Appendix B Creating Manifest Files
Working with Manifest Files
For a single item, you specify the item’s URL in the src attribute. There are two ways to specify the item
URL:
• Specify the src attribute with the absolute URL, as shown in the following format:
proto://username:password@domain-name:port/file-path/file-name
In the example, the first <item> tag uses the full path.
• Specify the origin server information using the <server><host> tags, and use the src attribute to
specify only the relative path.
In the example, every <item> tag except the first one uses a relative path. The second <item> tag
uses the Manifest file server; its src value, test.html, is relative to the Manifest file URL. The third
<item> tag, “project-two.html,” uses “my-origin-server-two.” The fourth <item> tag,
“project-one.html,” uses “my-origin-server-one.”
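The Manifest file example referred to above appears earlier in this appendix and is not reproduced here.
The following is a minimal sketch of how the two methods might look together in one Manifest file; the
root tag, the server attribute on the <item> tag, and all host names, user names, passwords, and file
names are illustrative assumptions and should be checked against the Manifest file DTD rather than read
as exact syntax.

<CdnManifest>
  <!-- Origin server definitions; the names and hosts are placeholders -->
  <server name="my-origin-server-one">
    <host name="origin-one.example.com" />
  </server>
  <server name="my-origin-server-two">
    <host name="origin-two.example.com" />
  </server>

  <!-- First item: absolute URL in the src attribute -->
  <item src="http://user:secret@origin.example.com:80/docs/index.html" />

  <!-- Second item: src is relative to the Manifest file URL -->
  <item src="test.html" />

  <!-- Third and fourth items: relative src values resolved against the
       named origin servers (the server attribute is assumed here) -->
  <item server="my-origin-server-two" src="project-two.html" />
  <item server="my-origin-server-one" src="project-one.html" />
</CdnManifest>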
Specifying a Crawl Job
The crawler feature methodically and automatically searches acceptable websites and makes a copy of
the visited pages for later processing. The crawler starts with a list of URLs to visit and identifies every
web link in the page, adding these links to the list of URLs to visit. The process ends after one or more
of the following conditions are met:
• Links have been followed to a specified depth.
• The maximum number of objects has been acquired.
• The maximum content size has been acquired.
By crawling a site at regular intervals, set with the time-to-live (ttl) attribute, these links and their
associated content can be updated to keep the content fresh. Use the <crawler> tag to specify
the website or FTP server crawler attributes. Table B-1 lists the attributes, states whether these attributes
are required or optional, and describes their functions.
Table B-1 Website or FTP Server Crawl Job Attributes
Attribute Description
start-url (Required) Identifies the URL to start the crawl job from. It can be a full path or
a relative path. If it is a relative path, the <server><host> tags are required to
specify the origin server information.
depth (Optional) Defines how deep to crawl the specified website.
The depth is the level of a website’s URL links or an FTP server’s
directories, where 0 is the URL or directory from which the crawl job starts.
0 = Acquire only the starting URL.
1, 2, 3, ... = Acquire the starting URL and the files it refers to, down to the specified depth.
–1 = Infinite; no depth restriction.
If the depth is not specified, the default of 20 is used.
Note It is not advisable to specify a depth of –1, because crawling a large
website takes a long time and is wasteful if not all of the content on that
website is required.
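As a rough illustration of the attributes in Table B-1, the following sketch shows a <crawler> tag in a
Manifest file. The start-url, depth, and ttl values are placeholders, the unit of the ttl interval and any
additional crawler attributes (such as limits on object count or content size) should be taken from the
Manifest file DTD, and the root tag is assumed to match the other sketches in this appendix.

<CdnManifest>
  <!-- Crawl the website two link levels deep, starting from the given URL.
       Because start-url is an absolute URL, no <server><host> tags are needed.
       The ttl attribute recrawls the site at a regular interval (interval
       units per the Manifest file specification) to keep the content fresh. -->
  <crawler start-url="http://origin.example.com/docs/" depth="2" ttl="1440" />
</CdnManifest>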