Operation Manual

Developing Sites and Pages
Excluding pages from indexing (Robots file)
This method serves the same purpose as a robots meta tag, but instead of
adding a meta tag to individual Web pages, a single robots.txt file is created.
The robots.txt file is stored in the Web site's root folder and can be opened
in any text editor to verify which pages and folders are excluded.
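For reference, a robots.txt file follows the standard Robots Exclusion
Protocol: each Disallow line names a folder or page that crawlers should
skip. This is a minimal sketch; the folder and file names are hypothetical
examples, not names the program necessarily generates.

```
# Stored in the Web site's root folder, e.g. http://www.example.com/robots.txt
User-agent: *          # applies to all search engine crawlers
Disallow: /private/    # exclude this folder (hypothetical name)
Disallow: /draft.html  # exclude this page (hypothetical name)
```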
To enable a robots.txt file:
1. (For the site) Choose Site Properties... from the File menu.
OR
(For a page) Right-click the page in the workspace or Site tab and choose
Page Properties... (or choose the item from the Edit menu).
2. Check the Create search engine robots file option.
3. (For the site) To allow or prevent search engines from indexing the
entire site, check or uncheck the Index pages on this site option.
OR
(For a page) To prevent search engines from indexing the page, check
Override site search engine settings, then uncheck the Index this page
option.
Including pages in indexing
So far we've looked primarily at methods of excluding Web pages from
indexing. Without these controls, search engines index Web pages by
discovering page hyperlinks and crawling through them, harvesting keywords,
descriptions, and page text. However, this process may be inefficient if
only a limited number of inter-page hyperlinks are present throughout your
site. To help, a search engine sitemap file (sitemap.xml) can be created to
act as a local lookup from which crawlers begin investigating your site. The
file simply lists the pages in your Web site that you've decided can be
indexed. It also indicates to search engines when pages were last modified,
how often the search engine should re-check each page, and how "important"
pages are in relation to each other.
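A sitemap file follows the standard Sitemaps protocol: each url entry gives
a page address plus the optional modification date, check frequency, and
relative priority described above. This is an illustrative sketch; the URL,
date, and values shown are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/index.html</loc>
    <lastmod>2010-01-15</lastmod>      <!-- when the page was last modified -->
    <changefreq>weekly</changefreq>    <!-- how often crawlers should re-check -->
    <priority>0.8</priority>           <!-- relative importance, 0.0 to 1.0 -->
  </url>
</urlset>
```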
The Sitemap method is especially good for "advertising" your Web site
pages—with a greater likelihood of your pages appearing high in a user's
search results.