Operation Manual

Developing Sites and Pages
Excluding pages from indexing (Robots file)
This method serves the same purpose as a robots meta tag, but instead of
adding a meta tag to individual Web pages, a single robots.txt file is created.
The robots.txt file is stored in the Web site's root folder and can be opened
in any text editor to verify which pages and folders are excluded.
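For reference, a robots.txt file follows the standard Robots Exclusion
Protocol: each Disallow line names a folder or page that crawlers should
skip. This is a minimal sketch; the folder and file names are hypothetical
examples, not names the program necessarily generates.

```
# Stored in the Web site's root folder, e.g. http://www.example.com/robots.txt
User-agent: *          # applies to all search engine crawlers
Disallow: /private/    # exclude this folder (hypothetical name)
Disallow: /draft.html  # exclude this page (hypothetical name)
```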
To enable a robots.txt file:
1. (For the site) Choose Site Properties... from the File menu.
OR
(For a page) Right-click the page in the workspace or Site tab and choose
Page Properties... (or choose the item from the Edit menu).
2. Check the Create search engine robots file option.
3. (For the site) To allow or prevent search engines from indexing the
entire site, check or uncheck the Index pages on this site option.
OR
(For a page) To prevent search engines from indexing the page, check
Override site search engine settings, then uncheck the Index this page
option.
Including pages in indexing
So far we've looked primarily at methods of excluding Web pages from
indexing. Without these controls, search engines index Web pages by
discovering page hyperlinks and crawling through them, harvesting keywords,
descriptions, and page text. However, this process may be inefficient if
only a limited number of inter-page hyperlinks are present throughout your
site. To help, a search engine sitemap file (sitemap.xml) can be created to
act as a local lookup from which crawlers begin investigating your site. The
file simply lists the pages in your Web site that you've decided can be
indexed. It also indicates to search engines when pages were last modified,
how often the search engine should re-check each page, and how "important"
pages are in relation to each other.
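A sitemap file follows the standard Sitemaps protocol: each url entry gives
a page address plus the optional modification date, check frequency, and
relative priority described above. This is an illustrative sketch; the URL,
date, and values shown are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/index.html</loc>
    <lastmod>2010-01-15</lastmod>      <!-- when the page was last modified -->
    <changefreq>weekly</changefreq>    <!-- how often crawlers should re-check -->
    <priority>0.8</priority>           <!-- relative importance, 0.0 to 1.0 -->
  </url>
</urlset>
```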
The Sitemap method is especially good for "advertising" your Web site
pages—with a greater likelihood of your pages appearing high in a user's
search results.