Cisco Internet Streamer CDS 2.0-2.3 Software Configuration Guide
OL-13493-04
Appendix B Creating Manifest Files
Working with Manifest Files
A <match> subtag can specify multiple attributes. Attributes within a <match> tag have a Boolean AND
relationship: a file satisfies the <match> tag only if it satisfies every attribute. In the following
example, a file satisfies the match rule only if it has the .mpg file extension and is larger than
50 kilobytes.
<match extension="mpg" minFileSizeIn-KB="50" />
The <match> rules themselves have a Boolean OR relationship. A <matchRule> tag can have multiple
<match> subtags, and a file needs to satisfy only one of them. The <matchRule> tag can be specified
as a subtag of the <crawler> tag or as a subtag of the <item-group> tag. A <matchRule> subtag
specified in an <item-group> tag is shared by every <crawler> tag within that <item-group> tag.
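The AND/OR relationship described above can be sketched in a few lines of Python. This is a toy model for illustration only, not the CDS implementation: the Match class, the match_rule_accepts function, and the example URLs are hypothetical, only three of the <match> attributes are modeled, and the size check is strict ("larger than") because that is how the text words it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class Match:
    """Toy stand-in for one <match> subtag; attributes left as None are not checked."""
    extension: Optional[str] = None         # models extension="..."
    min_file_size_kb: Optional[int] = None  # models minFileSizeIn-KB="..."
    url_pattern: Optional[str] = None       # models url-pattern="..."

    def matches(self, url: str, size_kb: float) -> bool:
        # Every attribute that is set must hold (Boolean AND).
        if self.extension is not None:
            path = urlparse(url).path.lower()
            if not path.endswith("." + self.extension.lower()):
                return False
        # Assumption: strict comparison, following the "larger than" wording.
        if self.min_file_size_kb is not None and not size_kb > self.min_file_size_kb:
            return False
        if self.url_pattern is not None and re.search(self.url_pattern, url) is None:
            return False
        return True


def match_rule_accepts(matches: List[Match], url: str, size_kb: float) -> bool:
    # A <matchRule> is satisfied when any one of its <match> subtags is (Boolean OR).
    return any(m.matches(url, size_kb) for m in matches)


# Hypothetical rule: (.mpg AND larger than 50 KB) OR (URL contains /trailers/).
rule = [
    Match(extension="mpg", min_file_size_kb=50),
    Match(url_pattern=r"/trailers/"),
]

print(match_rule_accepts(rule, "http://www.example.com/a/big.mpg", 700))      # True
print(match_rule_accepts(rule, "http://www.example.com/a/small.mpg", 10))     # False
print(match_rule_accepts(rule, "http://www.example.com/trailers/t.avi", 10))  # True
```

The second file fails because it meets only one of the two attributes in the first <match>, and it does not meet the second <match> at all. The third file is retained even though it is small and not an .mpg file, because one satisfied <match> is enough.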
Note The accept and reject attributes of the <crawler> tag are easily misused as a crawler filter.
For example, to crawl for files with the .mpg extension, specifying only accept="\.mpg" does not
work. Although accept="\.mpg" is valid syntax, effectively no crawling occurs, because pages with
URLs that do not match the accept constraint are not searched. For example, if the starting URL is
index.html, this HTML file is parsed and any links not containing .mpg are rejected. If the .mpg
files are located at the second or lower link levels, they are not fetched because the links leading
to them have been rejected.
To properly crawl for the .mpg extension, use the <matchRule> tag. Specify
<matchRule><match extension="mpg" /></matchRule>. The whole site is crawled and only those files
with the .mpg extension are retained.
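The difference between the two approaches can be seen in a toy crawl. The sketch below is an illustration under stated assumptions, not the CDS crawler: the SITE link graph, the crawl function, and its accept and extension parameters are hypothetical stand-ins for the accept attribute and for <match extension="..." />.

```python
import re
from collections import deque
from typing import Dict, List, Optional

# Hypothetical site: each HTML page maps to the links it contains.
SITE: Dict[str, List[str]] = {
    "index.html": ["videos.html", "about.html"],
    "videos.html": ["clips/a.mpg", "clips/b.mpg"],
    "about.html": [],
}


def crawl(start: str,
          accept: Optional[str] = None,
          extension: Optional[str] = None) -> List[str]:
    """Return the files retained by a toy breadth-first crawl.

    accept    models the accept attribute: a link whose URL does not match
              the regular expression is rejected and never followed.
    extension models <match extension="...">: every link is followed, and
              the filter only decides which files are retained.
    """
    retained: List[str] = []
    seen = {start}
    queue = deque([start])  # the starting URL is always parsed
    while queue:
        page = queue.popleft()
        for link in SITE.get(page, []):
            if link in seen:
                continue
            seen.add(link)
            if accept is not None and re.search(accept, link) is None:
                continue  # rejected link: never fetched, never parsed
            if link in SITE:
                queue.append(link)  # an HTML page: parse it for more links
            elif extension is None or link.endswith("." + extension):
                retained.append(link)
    return retained


print(crawl("index.html", accept=r"\.mpg"))  # []
print(crawl("index.html", extension="mpg"))  # ['clips/a.mpg', 'clips/b.mpg']
```

With the accept filter, the link to videos.html is rejected at the first level, so the .mpg files behind it are never reached. With the extension filter, every page is visited and only the .mpg files are kept.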
Specifying Content Priority
A priority can be assigned to content objects to define their order of importance. The CDS software
processes content in order of priority: the higher the priority of a content object, the sooner it is
acquired from the origin server and the sooner it is distributed to the Service Engines.
Note Every content object acquired by running a crawl job has the same priority.
Three factors combine to determine content priority:
• Delivery Service priority: Content Distribution Priority drop-down list in the Acquisition and
  Distribution Properties area of the Delivery Service Definition page in the CDSM
• Item index: Content order listed in the Manifest file
• Item priority: Priority of the attributes specified in the <item> or <crawler> tag
Table B-2 <match> Subtag Attributes (continued)
Attribute      Description
prefix         (Optional.) Specifies a prefix as a match rule to filter out websites during
               a crawl job.
url-pattern    (Optional.) Specifies a regular expression as a match rule to filter out
               certain URLs.