Specifications
Cisco Internet Streamer CDS 2.0-2.3 Software Configuration Guide
OL-13493-04
Appendix B Creating Manifest Files
Manifest File Structure and Syntax
• externalPrefixes
The externalPrefixes attribute is optional and specifies additional prefixes so that a crawl job can
cover multiple protocols or multiple websites. Separate the prefixes with a vertical bar (|).
• externalServers
The externalServers attribute is optional and can be used for multiple host crawling jobs where each
host has a different user account. This attribute can be used to refer to the <host> tag with the proper
authentication information.
• keepExpiredContent
The keepExpiredContent attribute can be used to acquire expired content during an HTTP or HTTPS
crawl. When this attribute is set to true, expired content is fetched. When this attribute is set to
false, expired content is discarded. If this attribute is not specified, the default is false.
• keepFolder
The keepFolder attribute controls whether folders are fetched. A request URL that ends with a
forward slash (“/”) indicates a folder. If this attribute is set to false, folder URLs are not acquired.
• keepNoCacheContent
The keepNoCacheContent attribute can be used to acquire content during an HTTP or HTTPS crawl
that would normally not be cached. When this attribute is set to true, the acquirer fetches the
content even though an HTTP cache control header indicates that the content is not to be cached.
If this attribute is not specified, the default is false.
• keepQueryUrl
The keepQueryUrl attribute can be used to fetch URLs that contain “?” in the URL string. If this
attribute is set to true, URLs with “?” will be fetched during HTML parsing for a crawl job if the
URL meets the other crawling criteria set forth in the Manifest file.
This attribute is useful when you want to acquire content from a database, for example, where
multiple files are differentiated only by the portion of the URL string after the “?”. When this
attribute is not set, the portion of the URL after the “?” is discarded. As a result, URLs that share
the same string in front of the “?” appear as duplicates, and only the last “duplicate” URL found
is fetched.
• reportBrokenLinks
The reportBrokenLinks attribute is used to report links on an HTML web page that cannot be
fetched. If this attribute is set to true, all broken links encountered during a website crawl are
reported as errors. This attribute applies only to a website crawl, not to an index crawl. The default
is false, in which case broken links are not reported as errors.
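The interaction between keepFolder and keepQueryUrl can be illustrated with a short Python sketch. It is a simplified model of the behavior described above, not the acquirer's actual implementation. The function name select_urls, its parameters, and the www.example.com URLs are hypothetical. The sketch also makes two assumptions that the descriptions above do not settle: the folder check is applied after any query string has been discarded, and the URL returned for a group of "duplicates" is the last one found, with its query string left intact for readability.

```python
def select_urls(found_urls, *, keep_folder, keep_query_url=False):
    """Return the URLs a crawl would fetch under this simplified model.

    found_urls     -- URLs in the order they were found during HTML parsing
    keep_folder    -- models keepFolder (no default is assumed here)
    keep_query_url -- models keepQueryUrl (treated as false when not set)
    """
    selected = {}  # duplicate-detection key -> last URL found for that key
    for url in found_urls:
        base, has_query, _query = url.partition("?")
        # With keepQueryUrl, the whole URL identifies the content.
        # Without it, the part after "?" is discarded for comparison.
        key = url if (has_query and keep_query_url) else base
        # Assumption: a key ending in "/" denotes a folder.
        if not keep_folder and key.endswith("/"):
            continue
        # A later URL with the same key replaces the earlier one,
        # so only the last "duplicate" survives.
        selected[key] = url
    return list(selected.values())


if __name__ == "__main__":
    urls = [
        "http://www.example.com/docs/",
        "http://www.example.com/item.asp?id=1",
        "http://www.example.com/item.asp?id=2",
        "http://www.example.com/index.html",
    ]
    # keepQueryUrl not set: the two item.asp URLs collapse into one entry.
    print(select_urls(urls, keep_folder=True))
    # keepQueryUrl set, keepFolder false: both item.asp URLs are kept
    # and the folder URL is dropped.
    print(select_urls(urls, keep_folder=False, keep_query_url=True))
```

In the first call, the two item.asp URLs share the same string in front of the “?”, so only the second one remains. In the second call, each query string is treated as distinct content and the folder URL is skipped.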
The following attributes, described under the <host> tag, can also be specified in the
<crawler> tag:
• disableBasicAuth
• noProxy
• ntlmUserDomain
• password
• port
• proto
• proxyServer
• sslAuthType