Datasheet

29 www.microsoft.com/sharepoint
many clicks in the search click-through log are popular and therefore receive higher rank
scores than less-viewed items. Items that are linked to from many other items are also
considered more relevant to the user and therefore receive higher rank scores.
The Web Analyzer can scale out across many nodes to reduce the total time needed for
the analysis.
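The two ranking signals above (click popularity and inbound links) and the scale-out behavior can be illustrated with a toy sketch. This is not the Web Analyzer's actual algorithm. The function names, the equal weights, and the log damping are hypothetical choices made for illustration, and threads stand in for separate analysis nodes. Each partition of the click log and link graph is counted independently, and the partial counts are then merged, which is what allows this kind of analysis to be spread across machines.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(clicks, links):
    """Count clicks per item and inbound links per item for one slice of the data."""
    click_counts = Counter(clicks)
    # Ignore self-links so an item cannot boost itself.
    inlink_counts = Counter(target for source, target in links if source != target)
    return click_counts, inlink_counts


def split(seq, n):
    """Split seq into n interleaved slices (one per hypothetical analysis node)."""
    return [seq[i::n] for i in range(n)]


def static_rank(click_log, link_graph, workers=4, click_weight=1.0, link_weight=1.0):
    """Hypothetical static rank: weighted, log-damped click and inbound-link counts."""
    total_clicks, total_inlinks = Counter(), Counter()
    # Each worker counts its own partition; partial counts are merged afterwards.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(count_partition, split(click_log, workers), split(link_graph, workers))
        for clicks, inlinks in results:
            total_clicks.update(clicks)    # Counter.update adds counts
            total_inlinks.update(inlinks)
    items = set(total_clicks) | set(total_inlinks)
    # log1p damps large counts so a single very popular item does not dominate.
    return {
        item: click_weight * math.log1p(total_clicks[item])
        + link_weight * math.log1p(total_inlinks[item])
        for item in items
    }


# Usage: "a" is clicked most and linked to most, so it ranks highest.
click_log = ["a", "a", "a", "b", "c"]
link_graph = [("b", "a"), ("c", "a"), ("a", "b")]
ranks = static_rank(click_log, link_graph, workers=2)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['a', 'b', 'c']
```

Because counting is additive, the partitions can be processed in any order and on any number of workers without changing the final scores.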
Item Processing
The item processing service receives items to be indexed from indexing connectors. The
item processing service extracts content from source documents in various formats,
discovers and sets managed properties, and performs linguistic processing on the
content. The item processing service then sends the processed items to the indexing
service.
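The flow described above (receive an item, map crawled properties to managed properties, extract properties, apply linguistic processing, hand off to indexing) can be sketched as a small pipeline. This is a hypothetical illustration, not the product's API. The property names, the product dictionary, and all function names are assumptions. Plain text stands in for the output of format parsing (Office, PDF), and the "linguistic" step is only lowercase tokenization.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mapping table: crawled property name -> managed property name.
CRAWLED_TO_MANAGED = {
    "crawled:title": "title",
    "crawled:author": "author",
    "crawled:modified": "write",
}

# Hypothetical custom extractor dictionary of organization-specific product names.
PRODUCT_DICTIONARY = {"contoso cloud", "contoso desk"}


@dataclass
class ProcessedItem:
    item_id: str
    managed: dict = field(default_factory=dict)  # managed property -> value
    tokens: list = field(default_factory=list)   # normalized searchable text


def map_properties(crawled):
    """Keep only crawled properties that have a managed-property mapping."""
    return {CRAWLED_TO_MANAGED[k]: v for k, v in crawled.items() if k in CRAWLED_TO_MANAGED}


def extract_products(text):
    """Naive dictionary-based extraction: substring match of known product names."""
    lowered = text.lower()
    return sorted(name for name in PRODUCT_DICTIONARY if name in lowered)


def normalize(text):
    """Toy linguistic step: lowercase word tokens (real systems do far more)."""
    return re.findall(r"\w+", text.lower())


def process_item(item_id, body_text, crawled):
    """Run one item through mapping, extraction, and linguistic processing."""
    processed = ProcessedItem(item_id)
    processed.managed = map_properties(crawled)
    products = extract_products(body_text)
    if products:
        processed.managed["products"] = products
    processed.tokens = normalize(body_text)
    return processed


index = []  # in-memory stand-in for the indexing service


def send_to_index(item):
    """Hand a processed item to the (stand-in) indexing service."""
    index.append(item)


# Usage: the unmapped crawled property "crawled:filesize" is dropped.
item = process_item(
    "doc-1",
    "Release notes for Contoso Cloud, version 2.",
    {"crawled:title": "Release notes", "crawled:filesize": 1024},
)
send_to_index(item)
print(item.managed)  # {'title': 'Release notes', 'products': ['contoso cloud']}
```

Keeping the mapping in a table rather than in code mirrors the idea that, after discovery, the crawled-to-managed mapping is something you adjust as configuration.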
Key features of the item processing service are as follows:
Mapping from crawled properties to managed properties. Managed properties
contain the content that will be indexed, including metadata associated with the
items. You first run crawled-property discovery on an initial set of crawled
items; based on the results, you can then change the mapping to managed
properties.
Parsing of document formats such as Office and PDF. This includes extracting
searchable text and metadata from these formats.
Extracting properties from the retrieved content. Property extraction can
detect various properties, such as names and dates, in the documents and
map them to managed properties. This lets you query these properties and
base query refinement on them. It is also possible to create custom property
extractors using, for example, a dictionary of product names relevant to your
organization.
Linguistic processing of items before indexing. In search, linguistics is defined as
the use of information about the structure and variation of languages so that
users can more easily find relevant information. An item's relevance to a query
is not necessarily decided by the words common to both query and document,
but rather by the extent to which its content satisfies the user's need for