Appendix B Creating Manifest Files
Working with Manifest Files
For a single item, you specify the item’s URL in the src attribute. There are two ways to specify the item
URL:
• Specify the src attribute with the absolute URL, as shown in the following format:
proto://username:password@domain-name:port/file-path/file-name
In the example, the first <item> tag uses the full path.
• Specify the origin server information using the <server><host> tags, and use the src attribute to
specify only the relative path.
In the example, every <item> tag except the first one uses a relative path. The second <item> tag
uses the Manifest file server; its src value, test.html, is relative to the Manifest file URL. The third
<item> tag, “project-two.html,” uses “my-origin-server-two.” The fourth <item> tag,
“project-one.html,” uses “my-origin-server-one.”
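The Manifest file example referred to above appears earlier in this appendix and is not reproduced here.
The following is a minimal sketch of how the two methods might look together in one Manifest file; the
root tag, the server attribute on the <item> tag, and all host names, user names, passwords, and file
names are illustrative assumptions and should be checked against the Manifest file DTD rather than read
as exact syntax.

<CdnManifest>
  <!-- Origin server definitions; the names and hosts are placeholders -->
  <server name="my-origin-server-one">
    <host name="origin-one.example.com" />
  </server>
  <server name="my-origin-server-two">
    <host name="origin-two.example.com" />
  </server>

  <!-- First item: absolute URL in the src attribute -->
  <item src="http://user:secret@origin.example.com:80/docs/index.html" />

  <!-- Second item: src is relative to the Manifest file URL -->
  <item src="test.html" />

  <!-- Third and fourth items: relative src values resolved against the
       named origin servers (the server attribute is assumed here) -->
  <item server="my-origin-server-two" src="project-two.html" />
  <item server="my-origin-server-one" src="project-one.html" />
</CdnManifest>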
Specifying a Crawl Job
The crawler feature methodically and automatically searches acceptable websites and makes a copy of
the visited pages for later processing. The crawler starts with a list of URLs to visit and identifies every
web link in the page, adding these links to the list of URLs to visit. The process ends after one or more
of the following conditions are met:
• Links have been followed to a specified depth.
• The maximum number of objects has been acquired.
• The maximum content size has been acquired.
By crawling a site at regular intervals, set with the time-to-live (ttl) attribute, these links and their
associated content can be updated to keep the content fresh. Use the <crawler> tag to specify
the website or FTP server crawler attributes. Table B-1 lists the attributes, states whether these attributes
are required or optional, and describes their functions.
Table B-1 Website or FTP Server Crawl Job Attributes
Attribute Description
start-url (Required) Identifies the URL to start the crawl job from. It can be a full path or
a relative path. If it is a relative path, the <server><host> tags are required to
specify the origin server information.
depth (Optional) Defines how deep to crawl the specified website.
The depth is the level of a website’s URL links or an FTP server’s
directories, where 0 is the URL or directory from which the crawl job starts.
0 = Acquire only the starting URL.
1, 2, 3, ... = Acquire the starting URL and the files it refers to, down to the specified depth.
–1 = Infinite; no depth restriction.
If the depth is not specified, the default of 20 is used.
Note It is not advisable to specify a depth of –1, because crawling a large
website takes a long time and is wasteful if not all of the content on that
website is required.
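As a rough illustration of the attributes in Table B-1, the following sketch shows a <crawler> tag in a
Manifest file. The start-url, depth, and ttl values are placeholders, the unit of the ttl interval and any
additional crawler attributes (such as limits on object count or content size) should be taken from the
Manifest file DTD, and the root tag is assumed to match the other sketches in this appendix.

<CdnManifest>
  <!-- Crawl the website two link levels deep, starting from the given URL.
       Because start-url is an absolute URL, no <server><host> tags are needed.
       The ttl attribute recrawls the site at a regular interval (interval
       units per the Manifest file specification) to keep the content fresh. -->
  <crawler start-url="http://origin.example.com/docs/" depth="2" ttl="1440" />
</CdnManifest>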