Specifications

B-35
Cisco Internet Streamer CDS 2.0-2.3 Software Configuration Guide
OL-13493-04
Appendix B Creating Manifest Files
Manifest File Structure and Syntax
ttl="3000"
/>
crawler
The <crawler> </crawler> tag set supports crawling a website or an FTP server.
Attributes
start-url
The start-url attribute is required. It defines the URL at which to start the process of crawling the
website or FTP server. It is identical to the src attribute used in the <item> tag. (See the “src”
section on page B-28 under the item section.)
host
The host attribute specifies the host name if the starting URL specified in the start-url attribute is a
relative URL.
depth
The depth attribute is optional and defines the link depth to which a website is to be crawled or
directory depth to which an FTP server is to be crawled. If the depth is not specified, the default is
20. The following are the general depth values:
0 = Acquire only the starting URL
1, 2, 3, ... = Acquire the starting URL and its referred files
–1 = Infinite or no depth restriction
Depth is defined as the level of a website or the directory level of an FTP server, where 0 is the
starting URL.
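The depth values above can be sketched as a breadth-first walk with a level limit. This is only an illustration, not the crawler's actual implementation: the crawl function, the links map, and the sample paths are hypothetical, and the sketch assumes that a depth of n means links are followed up to n levels away from the starting URL.

```python
from collections import deque

def crawl(links, start_url, depth=20):
    """Return the URLs acquired from start_url under the depth rules.

    links is a hypothetical in-memory map of URL -> URLs it refers to;
    a real crawler would fetch and parse pages or directory listings.
    """
    acquired = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])  # the starting URL is at depth 0
    while queue:
        url, level = queue.popleft()
        # depth == -1 means no restriction; otherwise do not follow
        # links from a URL that already sits at the depth limit.
        if depth != -1 and level >= depth:
            continue
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                acquired.append(target)
                queue.append((target, level + 1))
    return acquired

# Hypothetical site layout used only for illustration.
site = {
    "/index.html": ["/a.html", "/b.html"],
    "/a.html": ["/a/deep.html"],
    "/a/deep.html": ["/index.html"],
}
```

With this layout, a depth of 0 acquires only /index.html, a depth of 1 adds /a.html and /b.html, and a depth of -1 (or the default of 20) also reaches /a/deep.html.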
prefix
The prefix attribute is optional and combines the hostname from the <server> tag with the value of
the prefix attribute to create a full prefix. Only content with URLs that match the full prefix is
acquired, as shown in this example:
<server name="xx"> <host name="www.cisco.com" proto="https" port="433" /> </server>
and with the following <crawler> tag:
prefix="marketing/eng/"
The full prefix is “https://www.cisco.com:433/marketing/eng/.” Only URLs that match this prefix
are crawled.
If a prefix is omitted, the crawler checks the default full prefix, which is the hostname portion of the
URL from the server. In the example, the default full prefix is “https://www.cisco.com:433.”
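The way the full prefix is assembled can be sketched as follows. The function names full_prefix and matches_prefix are hypothetical, and the sketch assumes the full prefix is a plain concatenation of the protocol, host name, port, and prefix value, as the example above suggests.

```python
def full_prefix(proto, host, port, prefix=""):
    """Combine the <host> values with the crawler prefix into a full prefix."""
    base = f"{proto}://{host}:{port}"
    # With no prefix attribute, the default full prefix is the host portion only.
    return f"{base}/{prefix}" if prefix else base

def matches_prefix(url, proto, host, port, prefix=""):
    """Exact string match: the URL must start with the full prefix."""
    return url.startswith(full_prefix(proto, host, port, prefix))
```

For the values in the example, full_prefix("https", "www.cisco.com", 433, "marketing/eng/") yields the full prefix quoted above, and omitting the last argument yields the default full prefix.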
accept
The accept attribute is optional and uses a regular expression to define acceptable URLs to crawl in
addition to matching the prefix. For example, accept="stock" means that only URLs that meet two
conditions are searched: the URL matches the prefix and contains the string “stock.” (See the
“Writing Common Regular Expressions” section on page B-5 for more information on using regular
expressions.)
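The two conditions can be sketched as a single check. The is_crawlable function and the sample URLs are hypothetical. The sketch assumes the regular expression may match anywhere in the URL (which is what "contains the string" implies) and uses Python's re module, whose syntax may differ in detail from the expression syntax the crawler accepts.

```python
import re

def is_crawlable(url, full_prefix, accept=None):
    """Return True if url passes the prefix check and the optional accept check."""
    # Condition 1: exact string match against the full prefix.
    if not url.startswith(full_prefix):
        return False
    # Condition 2: when accept is given, the regular expression must match
    # somewhere in the URL.
    if accept is None:
        return True
    return re.search(accept, url) is not None

# Full prefix taken from the prefix example earlier in this section.
PREFIX = "https://www.cisco.com:433/marketing/eng/"
```

Under these assumptions, a URL such as https://www.cisco.com:433/marketing/eng/stock/q1.html passes with accept="stock", while a URL under the same prefix that lacks the string "stock" does not.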
Note the following two key differences between the accept attribute and the prefix attribute:
The prefix attribute uses an exact string match, while the accept attribute uses a regular
expression.