-
Google Search Appliance Search Protocol Reference Google Search Appliance software version 7.
-
Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com GSA-XML_100.09 December 2013 © Copyright 2013 Google, Inc. All rights reserved. Google and the Google logo are registered trademarks or service marks of Google, Inc. All other trademarks are the property of their respective owners. Use of any Google solution is governed by the license agreement included in your original contract.
-
Contents

Chapter 1 Introduction ... 5
Chapter 2 Request Format ...
-
Chapter 4 Dynamic Result Clustering Service /cluster Protocol ... 86
  Dynamic Result Clustering Request ... 88
  Dynamic Result Clustering JSON Request and Response ... 88
  Dynamic Result Clustering XML Request and Response ... 90
Chapter 5 Query Suggestion Service /suggest Protocol ...
-
Chapter 1 Introduction

The Google Search Appliance uses a simple HTTP-based protocol for serving search results. This enables you to control how search results are requested and how they are presented to end users. This guide describes the technical details of search requests and results. This guide assumes that you have a basic understanding of the HTTP protocol and the HTML document format. For terminology definitions, see the Google for Work Glossary.
-
Chapter 2 Request Format

The information in this section helps you create custom searches for your web site. By using search parameters, special query terms, and filters in your search requests, you can refine and enhance searches to serve your needs.
-
Using the POST Command

In some instances, your query strings might exceed the 2KB URL length limit of GET requests and be truncated. This might happen when you submit dynamic navigation queries containing a large number of metadata filters. You can avoid this limitation by submitting POST requests instead, which have a much larger body limit (10KB).
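A minimal sketch of choosing between GET and POST based on the two limits stated above, using only the Python standard library. The host name gsa.example.com is hypothetical, and sending the POST body to the same /search path is an assumption; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

GET_URL_LIMIT = 2 * 1024     # 2KB URL limit for GET requests (from the text above)
POST_BODY_LIMIT = 10 * 1024  # 10KB body limit for POST requests (from the text above)


def build_search_request(host: str, params: dict) -> Request:
    """Return a GET request when the URL fits in 2KB, otherwise a POST request.

    The host and the use of /search as the POST target are assumptions.
    """
    query = urlencode(params)
    get_url = f"http://{host}/search?{query}"
    if len(get_url) <= GET_URL_LIMIT:
        return Request(get_url, method="GET")

    body = query.encode("ascii")  # urlencode output is pure ASCII
    if len(body) > POST_BODY_LIMIT:
        raise ValueError("query string exceeds the 10KB POST body limit")
    return Request(
        f"http://{host}/search",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; long metadata filter lists fall through to the POST branch automatically.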
-
Submitting a Search Request

Typically, search users make search requests by entering search parameters in an HTML form rendered in a web browser, like the following:
-
Example 1. This request returns the first 10 results that match the search query terms “bill” and “material”:

q=bill+material&output=xml&client=test&site=operations&access=p

Example 2. This request returns results numbered 11-15 that match the same query terms and collection as Example 1. As specified by the proxystylesheet parameter, the results are rendered in the custom HTML output format defined by the front end named “test”.
-
as_epq Adds the specified phrase to the search query in parameter q. For example, to add the phrase “hello there”, use as_epq=hello+there. This parameter has the same effect as using the phrase special query term (see “Phrase Search” on page 28). Default value: Empty string

as_eq Excludes the specified terms from the search results.
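A short illustration of how these parameters look once URL-encoded. The query word "greeting" and the excluded term "draft" are example values only; the point is that the space inside the as_epq phrase is sent as "+".

```python
from urllib.parse import parse_qs, urlencode

# as_epq adds a phrase; as_eq excludes terms. The values here are examples only.
params = {"q": "greeting", "as_epq": "hello there", "as_eq": "draft",
          "client": "test", "site": "test"}
query_string = urlencode(params)  # spaces are encoded as "+"
print(query_string)
# q=greeting&as_epq=hello+there&as_eq=draft&client=test&site=test
```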
-
as_lq Specifies a URL and causes search results to show pages that link to that URL. This parameter has the same effect as the link special query term (see “Back Links” on page 24). No other query terms can be used with this parameter. For example, to return results that have links to http://myUrl.com/Page, use as_lq=http://myUrl.
-
as_sitesearch Limits search results to documents in the specified domain, host or web directory, or excludes results from the specified location, depending on the value of as_dt. This parameter has the same effect as the site or -site special query terms. It has no effect if the q parameter is empty.
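A sketch of the equivalence stated above, translating as_sitesearch plus as_dt into the corresponding site: or -site: query term. The as_dt values "i" (include) and "e" (exclude) are assumed here because the as_dt entry itself is not reproduced in this section; the function names are hypothetical.

```python
def sitesearch_term(as_sitesearch: str, as_dt: str = "i") -> str:
    """Return the site: term assumed to match as_sitesearch combined with as_dt.

    Assumption: as_dt="i" includes the location and as_dt="e" excludes it.
    """
    if as_dt == "i":
        return f"site:{as_sitesearch}"
    if as_dt == "e":
        return f"-site:{as_sitesearch}"
    raise ValueError(f"unsupported as_dt value: {as_dt!r}")


def effective_query(q: str, as_sitesearch: str, as_dt: str = "i") -> str:
    # as_sitesearch has no effect when the q parameter is empty.
    if not q.strip():
        return q
    return f"{q} {sitesearch_term(as_sitesearch, as_dt)}"
```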
-
Standard terms use only the search appliance’s internal contextual (synonym) files for query expansion. Local terms use all displayed and activated synonym files, including any uploaded files. After you configure and enable the appropriate query expansion files, set the query expansion policy for a front end. Each front end has a policy that specifies whether it uses the search appliance’s built-in logic (the “standard” set of terms), your own list of synonyms (the “local” set), or both (the “full” set).
-
exclude_apps Controls whether Google Apps content from the user’s Google Apps domain displays in search results, according to the following values:
• No value: If you omit the exclude_apps parameter from the search request, Google Apps content does not display in search results.
• 0 (exclude_apps=0): Google Apps content displays in search results, as determined by the Google Apps results sidebar element in the front end. See the table below.
-
gsaRequestID A GSA-generated ID that is set at the start of a query session and that exists only for the length of a query. Serving logs use this value, which is sent back to the search appliance with each subsequent request during the query session. Default value: None

ie Sets the character encoding that is used to interpret the query string. See “Internationalization” on page 34 for more information.
-
numgm Number of KeyMatch results to return with the results. The value can be between 0 and 50. Default value: 3

oe Sets the character encoding that is used to encode the results. See “Internationalization” on page 34 for more information.
-
proxycustom Specifies custom XML tags to be included in the XML results. The default XSLT stylesheet uses these values for this parameter: , . The proxycustom parameter can be used in custom XSLT applications. See “Custom HTML” on page 52 for more information. This parameter is disabled if the search request does not contain the proxystylesheet tag. If custom XML is specified, search results are not returned with the search request.
-
rc Requests an accurate result count for up to 1M documents. When rc=1, the result count is accurate, but the request might have high latency. When rc=0, the result count is estimated, as in the default search behavior described in “Appendix A: Estimated vs. Actual Number of Results” on page 105. Default value: 0

requiredfields Restricts the search results to documents that contain the exact meta tag names or name-value pairs. See “Meta Tags” on page 41 for more information.
-
sitesearch Limits search results to documents in the specified domain, host, or web directory. Has no effect if the q parameter is empty. This parameter has the same effect as the site special query term. Unlike the as_sitesearch parameter, the sitesearch parameter is not affected by the as_dt parameter. The sitesearch and as_sitesearch parameters are handled differently in the XML results. The sitesearch parameter’s value is not appended to the search query in the results.
-
As an example, if the result URLs contain files whose names are in Chinese characters and the ud parameter is set to 1, the Chinese characters appear. If the ud parameter is set to 0, the Chinese characters are escaped.
Default value:
• When a search request includes the proxystylesheet parameter, the default value for ud is 1 and cannot be modified.
• When the search request does not include the proxystylesheet parameter, the default value for ud is 0 and the value can be modified.
-
This search request includes the custom parameter myparam with a value of test+this. The space character (represented as "+") in the custom parameter myparam is replaced by the underscore character (_) in the XML output. The resulting XML output looks like this:
The unmodified value can be retrieved from the original_value attribute.
-
Character Description
Minus sign or hyphen (-): Treated as part of a query term if there is no space preceding it. A hyphen that is preceded by a space is the Exclude Query Term operator. A hyphen after a parenthesis is also treated as the Exclude Query Term operator. For example, the query Fmoc-Cys(Trt)-OH returns documents that contain Fmoc-Cys(Trt), excluding documents that also contain OH.
Decimal point (.
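The hyphen rules above can be approximated with a small tokenizer. This is a sketch of the documented behavior, not the appliance's query parser; the function name is hypothetical, and it only handles the first ")-" inside a token.

```python
from typing import List, Tuple


def split_exclusions(query: str) -> Tuple[List[str], List[str]]:
    """Split a query into included and excluded terms using the hyphen rules."""
    included: List[str] = []
    excluded: List[str] = []
    for token in query.split():
        # A hyphen preceded by a space (or at the start) is the Exclude operator.
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
            continue
        # A hyphen right after a closing parenthesis is also the Exclude operator.
        head, sep, tail = token.partition(")-")
        if sep and tail:
            included.append(head + ")")
            excluded.append(tail)
        else:
            # Any other hyphen is part of the term itself.
            included.append(token)
    return included, excluded
```

For example, split_exclusions("Fmoc-Cys(Trt)-OH") keeps Fmoc-Cys(Trt) and excludes OH, matching the example in the table.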
-
Back Links

The query prefix link: lists web pages that have links to the specified web page. No spaces can come between link: and the web page URL. The URL pattern for the linked-to web page must appear in the Follow and Crawl URL patterns on the Content Sources > Web Crawl > Start and Block URLs page in the Admin Console; otherwise, the link query does not produce any search results. For example, consider the query link:http://www.example.com/child.html. For this query to return any results, www.
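Both ways of asking for back links, encoded for a request. The target URL is the one from the example above; the client and site values are placeholders. With as_lq, q is omitted because no other query terms may accompany that parameter.

```python
from urllib.parse import urlencode

target = "http://www.example.com/child.html"

# Form 1: the link: special query term inside q (no space after "link:").
via_q = urlencode({"q": f"link:{target}", "client": "test", "site": "test"})

# Form 2: the as_lq parameter on its own, without other query terms.
via_as_lq = urlencode({"as_lq": target, "client": "test", "site": "test"})

print(via_q)      # q=link%3Ahttp%3A%2F%2Fwww.example.com%2Fchild.html&client=test&site=test
print(via_as_lq)  # as_lq=http%3A%2F%2Fwww.example.com%2Fchild.html&client=test&site=test
```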
-
Date Range Search

Restrict search to documents with modification dates that fall within a time frame. You can search any dates between 1900-01-01 and 2079-06-06. For a complete list of date formats, see “Acceptable Date Formats” on page 109. Date range searches by themselves do not return results and must be accompanied by a search term. Only documents that have a modification date are returned for a daterange query. Documents that do not have modification dates are excluded from the results.
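A sketch that enforces the two rules above: dates must lie between 1900-01-01 and 2079-06-06, and a date range needs at least one search term. The daterange:START..END syntax is an assumption modeled on the inmeta daterange example later in this guide; the function name is hypothetical.

```python
from datetime import date

MIN_DATE = date(1900, 1, 1)  # earliest searchable date (from the text above)
MAX_DATE = date(2079, 6, 6)  # latest searchable date (from the text above)


def date_range_query(terms: str, start: date, end: date) -> str:
    """Combine search terms with a date range (assumed daterange: syntax)."""
    if not terms.strip():
        # A date range on its own returns no results.
        raise ValueError("a date range search needs at least one search term")
    if not (MIN_DATE <= start <= end <= MAX_DATE):
        raise ValueError("dates must satisfy 1900-01-01 <= start <= end <= 2079-06-06")
    return f"{terms} daterange:{start.isoformat()}..{end.isoformat()}"
```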
-
The URLs used with site must contain fewer than 119 characters. The exclusion operator (-) can be applied to this to remove a web directory from consideration in the search. You can specify one site term per search request or multiple site terms using the Boolean OR operator. The search request parameters as_sitesearch (see “as_sitesearch” on page 13) and as_dt (see “as_dt” on page 10) can also be used to submit directory restricted searches. See also the site parameter (see “site” on page 19).
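A sketch of building a directory-restricted query with several site: terms joined by OR, rejecting URLs that reach the 119-character limit stated above. The function name and example URLs are hypothetical.

```python
from typing import List

MAX_SITE_URL_CHARS = 118  # URLs used with site: must contain fewer than 119 characters


def site_restrict(terms: str, sites: List[str]) -> str:
    """Restrict terms to one or more locations, OR-ing several site: terms."""
    if not sites:
        raise ValueError("at least one site is required")
    for url in sites:
        if len(url) > MAX_SITE_URL_CHARS:
            raise ValueError(f"site URL has {len(url)} characters; the limit is 118")
    return terms + " " + " OR ".join(f"site:{url}" for url in sites)
```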
-
You can exclude multiple file types by adding more -ext: terms to the search query. Sample usage: whitepaper -ext:pdf -ext:doc

File Type Filtering

The query prefix filetype: filters the results to include only documents with the specified MIME content type. No spaces can come between filetype: and the type. You can exclude file types by putting a minus sign before filetype, such as -filetype:pdf. For more information, see “File Type Exclusion” on page 27.
-
Number Range Search

To search for documents or items that contain numbers within a range, type your search term and the range of numbers separated by two periods (..). You can set ranges for weights, dimensions, prices (dollar currencies only), and so on. Be sure to specify a unit of measurement or some other indicator of what the number range represents. Sample usage: pencils $1.50..$2.50

Phrase Search

Search for complete phrases by enclosing them in quotation marks or by connecting them with hyphens.
-
Sample usage: intitle:google

Title Search (all terms)

If you precede a query with allintitle:, Google search restricts the results to those with all of the query words in the result title. For plain text files, the search appliance displays results using the first 70 KB of the file as the title. Because the document does not have a title, the allintitle special query term does not work for plain text files.
-
Wildcard Search If you precede a query with wildcard:, you can search by entering a wildcard pattern instead of the exact spelling of a term. By default, wildcard search is enabled for each front end of the search appliance. However, to use wildcard search, you must ensure that wildcard indexing is also enabled for your search appliance by using the Index > Index Settings page in the Admin Console.
-
Automatic Filtering

Google uses automatic filtering to ensure the highest quality search results. Google search uses two types of automatic filters:
• Duplicate Snippet Filter: If multiple documents contain identical titles as well as the same information in their snippets in response to a query, only the most relevant document of that set is displayed in the results.
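The Duplicate Snippet Filter rule can be sketched as keeping the first result for each (title, snippet) pair. This illustrates the documented behavior only; it assumes the input list is already ordered from most to least relevant and uses a hypothetical dict shape for results.

```python
from typing import Dict, List


def duplicate_snippet_filter(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the first result for each (title, snippet) pair.

    Because results are assumed to be ordered by relevance, the first
    occurrence of a pair is the most relevant document of that set.
    """
    seen = set()
    kept = []
    for result in results:
        key = (result["title"], result["snippet"])
        if key not in seen:
            seen.add(key)
            kept.append(result)
    return kept
```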
-
Automatic Language Filters The Google Search Appliance automatically detects the language of each search query and returns results in that language. For example, if a user submits a search query in Hungarian (lang_hu), results are automatically returned in Hungarian. The algorithm for automatically determining the language of a web document is not customizable. The language of a document is determined primarily by the language used for the majority of the text in the body of the document.
-
Language    Automatic Language Filter Name
Spanish     lang_es
Swedish     lang_sv
Turkish     lang_tr

If you want to filter languages other than the above, obtain the language code from ISO 639 (see http://www.loc.gov/standards/iso639-2/php/code_list.php), index a document corpus containing the desired languages, and run tests to confirm that the search results are as expected.
-
Combining Language Filters

Search requests that use the lr parameter support the Boolean operators identified in the following table, in order of precedence.

Boolean Operator / Sample Usage / Description
• Boolean NOT [ - ] / -lang_fr / Removes all results that are defined as part of the Language Filter immediately following the - operator. The example lr value would remove all results in French.
• Boolean AND [ . ] gloves.
-
Example 2. This request interprets the search query “gloves” using latin2 encoding, searches for results that are not in Hungarian or Czech, and returns results using latin2 encoding:

GET /search?q=gloves&client=test&site=test&lr=(-lang_hu).(-lang_cs)&ie=latin2&oe=latin2

Example 3.
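A sketch of building such a request in Python so that the query text is percent-encoded in the same charset that ie declares. The Polish query word and the GSA-to-Python codec mapping are this sketch's own illustrative choices; the lr expression excludes Hungarian and Czech.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Subset of the ie/oe encoding values from the Internationalization table,
# mapped to Python codec names (the mapping is an assumption of this sketch).
GSA_TO_CODEC = {"latin1": "iso-8859-1", "latin2": "iso-8859-2", "utf8": "utf-8"}


def build_lr_request(query: str, lr: str, encoding: str = "utf8") -> str:
    """Build a /search request path with a language restriction and ie/oe set.

    The query is percent-encoded in the charset named by ie, so the
    appliance can interpret it correctly.
    """
    codec = GSA_TO_CODEC[encoding]
    params = {"q": query, "client": "test", "site": "test", "lr": lr,
              "ie": encoding, "oe": encoding}
    return "/search?" + urlencode(params, encoding=codec)


path = build_lr_request("rękawice", "(-lang_hu).(-lang_cs)", "latin2")
```

In ISO-8859-2 the letter ę is byte 0xEA, so the query appears as r%EAkawice in the path.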
-
Language                  Encoding Value    Alternate Encoding Value
Polish                    latin2            ISO-8859-2
Romanian                  latin2            ISO-8859-2
Russian                   cyrillic          ISO-8859-5
Spanish                   latin1            ISO-8859-1
Swedish                   latin1            ISO-8859-1
Turkish                   latin3            ISO-8859-3
Turkish                   latin5            ISO-8859-9
Unicode (All Languages)   utf8              UTF-8

Sorting

Google search provides three sorting options for search results:
• “Sort By Relevance (Default)” on page 36
• “Sort By Date” on page 36
• “Sort by Metadata” on page 37

Sort By Relevance (Default)

By defa
-
Example

The following request returns the top 10 results that match the query “chicken teriyaki” in the “test” collection:

GET /search?q=chicken+teriyaki&output=xml&client=test&site=test&sort=date:D:S:d1

Results are sorted by date and relevancy.

Details

To sort the results by date, include the sort parameter in the search request, formatted as follows: date::: The following tables show the possible values for , , and .
-
• How much metadata exists for each document
• The sorting options specified

The value used to sort each document is available in the XML output in the FS tag.

How Sorting Works

When a search request is submitted with the sort parameter specified as described in the following sections, the Google Search Appliance retrieves the value corresponding to the given meta tag name for each search result. In some instances, as described below, the value is processed further before sorting.
-
Value   Description
E       Return the 1000 most relevant results, then sort by metadata. Default.
ED      Same as mode E, but also sort dates chronologically. Supported in GSA version 7.2.0.G.230 and later.
S       Return the 1000 most relevant results, then sort by metadata, then apply Advanced Score Reporting, Unification biasing, and filtering.
SD      Same as mode S, but also sort dates chronologically. Supported in GSA version 7.2.0.G.230 and later.
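A sketch of what mode E means: take the 1000 most relevant results, then reorder only that window by a meta tag value. The result shape (a numeric "relevance" and a "meta" dict) is hypothetical, and placing results that lack the meta tag last is an assumption, since the guide does not say where they go.

```python
from typing import Callable, Dict, List

RELEVANCE_WINDOW = 1000  # modes E and S sort only the 1000 most relevant results


def sort_mode_e(results: List[Dict], meta_name: str,
                key: Callable[[str], object] = str,
                descending: bool = False) -> List[Dict]:
    """Approximate mode E: keep the top results by relevance, then sort by metadata."""
    top = sorted(results, key=lambda r: r["relevance"], reverse=True)[:RELEVANCE_WINDOW]
    with_value = [r for r in top if meta_name in r["meta"]]
    without_value = [r for r in top if meta_name not in r["meta"]]  # assumed to go last
    with_value.sort(key=lambda r: key(r["meta"][meta_name]), reverse=descending)
    return with_value + without_value
```

A result ranked 1001st by relevance never appears, however well its meta value would sort; that is the practical consequence of the 1000-result window.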
-
Value   Description
F       Similar to Y, but also identifies and sorts negative and floating-point numbers. It identifies punctuation based on the specified language, so the decimal point is a period (.) for English but a comma (,) for German.
N       Sorts pure English numbers (containing only digits and the punctuation characters +, -, and .) faster than Y or F. However, values such as ABC2XYZ are not sorted.
-
Sort currency
sort=meta::::::F
$0.01 $1.00 $34 $1,234.56

Sort English-looking numbers
This is a very fast option if all the numbers are in the following formats.
sort=meta::::::N
-12345 -1234.56 -3 0 34.9 +35.172 +321 16003.58

Sort dates
sort=meta:::ED
January 30, 2012; February 18, 2013; October 2, 2013

Sort date-like words
Dates in word format are sorted alphabetically.
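Sort keys that mimic the orderings in the examples above, written as a sketch of the behavior rather than the appliance's implementation. The function names are hypothetical; n_key accepts only plain English numbers (as type N does), f_key strips currency symbols and thousands separators, and date_key orders word-format dates chronologically.

```python
import re
from datetime import datetime

PLAIN_NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")


def n_key(value: str) -> float:
    """Approximates type N: plain English numbers (digits and +-. only)."""
    if not PLAIN_NUMBER.fullmatch(value):
        raise ValueError(f"not a plain English number: {value!r}")
    return float(value)


def f_key(value: str, decimal_sep: str = ".") -> float:
    """Approximates type F for currency-like values such as '$1,234.56'.

    decimal_sep is "." for English-style values and "," for German-style ones.
    """
    thousands_sep = "," if decimal_sep == "." else "."
    digits = value.replace(thousands_sep, "")
    digits = re.sub(r"[^\d+\-" + re.escape(decimal_sep) + "]", "", digits)
    return float(digits.replace(decimal_sep, "."))


def date_key(value: str) -> datetime:
    """Orders dates written like 'January 30, 2012' chronologically."""
    return datetime.strptime(value, "%B %d, %Y")
```

Sorting the three sample dates with date_key gives chronological order, while a plain alphabetical sort puts February before January, which is the difference between the "Sort dates" and "Sort date-like words" rows.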
-
At search time, if the encoded value of the search attribute (requiredfields, partialfields, inmeta) plus the attr:value is greater than 121, then the search won't produce any results.
-
If any of the results contain the author, title, or keywords meta tags, the values of those meta tags are returned with the respective results. For example, the following tags could be returned with this search request:

Details

To specify multiple meta tag values to be returned, list all meta tag names separated by a period (.
-
Examples

Example 1: The following search request returns the first 10 results that match the query “checks” in the “test” collection and also contain either of the following meta tags. (The %2520 in the GET request shows double encoding: %20, an encoded space, is encoded again, so its % character becomes %25 and the result is %2520.)

GET /search?q=checks&output=xml&client=test&site=test&re
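A sketch of the two encoding passes that produce %2520. Because the meta tags for Example 1 are not shown here, the department values are illustrative only. Keeping ":" and "|" unencoded via safe=":|" is a readability choice of this sketch.

```python
from typing import Iterable, Tuple
from urllib.parse import quote, urlencode


def requiredfields_value(constraints: Iterable[Tuple[str, str]], operator: str = "|") -> str:
    """Join (name, value) pairs into a requiredfields expression.

    First encoding pass: each meta tag value is URL-encoded on its own,
    so "Human Resources" becomes "Human%20Resources".
    """
    return operator.join(f"{name}:{quote(value, safe='')}" for name, value in constraints)


rf = requiredfields_value([("department", "Human Resources"), ("department", "Finance")])

# Second encoding pass: the whole parameter is encoded again, so "%" becomes "%25"
# and the space ends up as %2520.
query_string = urlencode({"q": "checks", "output": "xml", "client": "test",
                          "site": "test", "requiredfields": rf}, safe=":|")
print(query_string)
```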
-
• Boolean AND [ . ]: Returns results that satisfy both meta tag constraints. Example: author:William.author:Jones
• Combined OR and AND with [ ( ) ]: Evaluates conditions in parentheses first: (department=Sales OR department=Finance) AND (author=William OR author=Jones). Example: (department:Sales|department:Finance).(author:William|author:Jones)

Boolean operators are left associative with equal precedence. You can use parentheses to change the order of precedence. For example, A .
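A minimal evaluator for the rules just stated: | is OR, . is AND, both are left associative with equal precedence, and parentheses are evaluated first. It models one document's metadata as a dict of value lists (multi-valued tags are allowed). The function name is hypothetical, and it assumes names and values contain none of ( ) | . ; it illustrates the semantics, not the appliance's implementation.

```python
import re
from typing import Dict, List

# Tokens: parentheses, the OR (|) and AND (.) operators, and name[:value] terms.
TOKEN = re.compile(r"[()|.]|[^()|.]+")


def matches(expr: str, meta: Dict[str, List[str]]) -> bool:
    """Evaluate a requiredfields-style expression against one document's meta tags."""
    tokens = TOKEN.findall(expr)
    pos = 0

    def operand() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ended unexpectedly")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = sequence()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in (")", "|", "."):
            raise ValueError(f"unexpected token {tok!r}")
        name, _, wanted = tok.partition(":")
        values = meta.get(name, [])
        # A bare name matches if the document has that meta tag at all.
        return wanted in values if wanted else bool(values)

    def sequence() -> bool:
        nonlocal pos
        result = operand()
        # Equal precedence: fold each operator into the running result, left to right.
        while pos < len(tokens) and tokens[pos] in ("|", "."):
            op = tokens[pos]
            pos += 1
            right = operand()  # always consume the operand, even if the result is decided
            result = (result or right) if op == "|" else (result and right)
        return result

    value = sequence()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return value
```

Left associativity matters: department:Sales|department:Finance.author:Jones is read as (Sales OR Finance) AND Jones, so a Sales document without author Jones does not match.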
-
Searches with unsupported expressions are not performed and do not return results.

Non-Alphanumeric Characters

By default, non-alphanumeric characters in a partialfields query separate the query terms in the same way as space characters. In general, use spaces as separators even when the original content uses a different character as a separator.
-
Character Description
Ampersand (&): Not treated as a separator. For example, for the meta tag:
Use a partialfields query like this (%2526 is a double URL-encoded ampersand character): partialfields=letters:a%2526b
Underscore (_): Not treated as a separator.
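A sketch of the splitting rule and the double encoding shown above. It assumes that & and _ are the only non-alphanumeric characters that do not separate terms; the full table may list further exceptions. The function names are hypothetical.

```python
import re
from typing import List
from urllib.parse import quote

# \w covers letters, digits, and underscore; & is added as a non-separator.
SEPARATORS = re.compile(r"[^\w&]+")


def partialfields_terms(value: str) -> List[str]:
    """Split a partialfields value into terms, treating other punctuation as spaces."""
    return [term for term in SEPARATORS.split(value) if term]


def double_encode(value: str) -> str:
    """URL-encode twice, so '&' becomes %26 and then %2526."""
    return quote(quote(value, safe=""), safe="")
```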
-
Usage Notes:
1. By default, documents that contain all query terms are returned. This behavior is similar to a Boolean AND, although there is no AND query term; it is simply the default way of processing query terms. You can change the default behavior by using the Boolean OR query term (OR) or the Boolean NOT query term (-). Note that you cannot use the NOT operator in an OR statement (for example, test OR -test1), and you cannot nest Boolean logic using parentheses.
2.
-
6. An inmeta search cannot search by multiple keywords or perform phrase searches. For example, consider the following meta tags:
The following query does not work correctly:
checks inmeta:department=Human+Resources+OR+checks inmeta:department=Finance
Instead, use multiple inmeta query terms, for example:
inmeta:department~Human OR inmeta:department~Finance
7.
-
11. Metadata can have multiple attributes with the same name. For example:
If multiple values are available and any of the attribute values match the search query, a link to the document appears in the search results.
12. While inmeta supports wildcard search, it does not support Boolean logic.
-
Example 4. The following open-ended date range search request returns results whose “date” meta tag values are later than 2007-01-01:

Monica inmeta:date:daterange:2007-01-01..

Date meta tags must contain only the date information. If you want to filter by date meta tags, make sure that the meta tag content fields contain nothing other than a date.

Limitations

For information about search request limitations, see Specifications and Usage Limits.
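A sketch of building inmeta terms that respect the notes above: one single-keyword term per value joined with OR, plus an open-ended date range in the form shown in Example 4. Treating "~" as a partial match and "=" as an exact match follows the examples in this section but is an assumption here; the function names are hypothetical.

```python
from datetime import date
from typing import List


def inmeta_any(name: str, values: List[str], partial: bool = False) -> str:
    """OR together one inmeta term per value.

    inmeta cannot match several keywords or a phrase in one term, so each
    value must be a single word.
    """
    op = "~" if partial else "="
    for value in values:
        if not value or " " in value or "+" in value:
            raise ValueError(f"inmeta values must be single keywords: {value!r}")
    return " OR ".join(f"inmeta:{name}{op}{value}" for value in values)


def inmeta_date_from(name: str, start: date) -> str:
    """Open-ended date range on a date-only meta tag."""
    return f"inmeta:{name}:daterange:{start.isoformat()}.."
```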
-
Chapter 3 Results Format

This section covers the following topics:
• “Custom HTML” on page 52
• “XML Output” on page 53

Custom HTML

This section describes the custom HTML results.
• “Custom HTML Output Overview” on page 52
• “Internationalization” on page 53

Custom HTML Output Overview

Google Search Appliance has a built-in XSLT (eXtensible Stylesheet Language Transformation) server and can generate custom HTML using your XSL stylesheet.
-
Notes:
• XSL stylesheets used by the XSLT server are cached for 15 minutes. To force the XSLT server to use the latest version of an XSL stylesheet, set the proxyreload input parameter to a value of 1 in your search request.
• XSL stylesheets that include other files may not be used with the Google search engine.
-
XML Output Overview

For maximum flexibility, Google provides search results in XML format. Using the Google XML results, you can use your own XML parser to customize the display for your search users. If you are using an XSL stylesheet to transform the XML results instead of developing your own XML parser, proceed to “Custom HTML” on page 52.

Notes:
• Element values are valid HTML and are suitable for display, unless otherwise noted in the XML tag definitions.
-
To get results in XML output format, use one of the following parameters in the search request:
• output=xml_no_dtd (recommended)
• output=xml

When you use the xml output format, the XML results include the line:
The DTD is available on the Google Search Appliance at http:///google.dtd.

Google XML Tag Definitions

This section contains an index of Google’s XML tags.
-
C
Format/Parent: HAS
Subtags: None
Definition: Indicates that the “cache:” special query term is supported for this search result URL. Cached results are suppressed and this element is not returned if the tag of the document contains the following tag: .
Attributes:
• SZ, Text (Integer + “k”): Provides the size of the cached version of the search result in kilobytes (“k”).
-
CACHE
Format/Parent: GSP
Subtags: CACHE_URL, CACHE_REDIR_URL, CACHE_LAST_MODIFIED, CACHE_LEGEND_FOUND?, CACHE_LEGEND_NOTFOUND?, CACHE_CONTENT_TYPE, CACHE_LANGUAGE, CACHE_ENCODING, CACHE_HTML
Definition: Encapsulates the cached version of a search result.
Attributes: None

CACHE_CONTENT_TYPE
Format/Parent: Text (MIME type) / CACHE
Subtags: None
Definition: MIME type of the cached result, as specified in the HTTP header that is returned when the document is crawled.
-
CACHE_HTML
Format/Parent: Text (HTML) (Custom HTML output only) / CACHE
Subtags: BLOB? (XML output only)
Definition: The cached version of the search result. All search results are stored in HTML format.
Attributes: None

CACHE_ENCODING
Format/Parent: Text / CACHE
Subtags: None
Definition: The encoding scheme of the cached result, as specified in the HTTP header that is returned when the document is crawled. (See “Internationalization” on page 34 for a list of common values.)
-
CACHE_LANGUAGE
Format/Parent: Text (Google language tag) / CACHE
Subtags: None
Definition: The language of the cached result as determined by Google’s automatic language classification algorithm. The value of this tag is the same as the values used for the automatic language collections, without the “lang_” prefix (see “Automatic Language Filters” on page 32).
-
CACHE_LEGEND_FOUND
Format/Parent: CACHE
Subtags: CACHE_LEGEND_TEXT*
Definition: Encapsulates query terms that are found in the visible text of the cached result returned.
Attributes: None

CACHE_LEGEND_NOTFOUND
Format/Parent: Text (Custom HTML output only) / CACHE
Subtags: BLOB? (XML output only)
Definition: Details of any query terms that are not visible in the cached result returned.
-
CACHE_LEGEND_TEXT
Format/Parent: Text (Custom HTML output only) / CACHE_LEGEND_FOUND
Subtags: BLOB (XML output only)
Definition: Details of a query term that is visible in the cached result. Query terms found in the cached result are automatically highlighted using the colors described in the attributes of this tag.
Attributes:
• fgcolor, Color attribute: The foreground color of the query term in the cached result. This value can be used directly in a color attribute for HTML tags.
-
CACHE_URL
Format/Parent: Text (Absolute URL) / CACHE
Subtags: None
Definition: Initial URL of the cached result.
Attributes: None

CRAWLDATE
Format/Parent: Text / R
Subtags: None
Definition: An optional element that shows the date when the page was crawled. It is shown only for pages that have been crawled within the past two days.
-
CT
Format/Parent: HTML / GSP
Subtags: None
Definition: Search comments. Example comment: Sorry, no content found for this URL
Attributes: None

CUSTOM
Format/Parent: GSP
Subtags: (Custom XML specified in the search request)
Definition: Encapsulates custom XML tags that are specified in the proxycustom input parameter.
-
ENT_SOURCE
Format/Parent: R
Subtags: None
Definition: Identifies the application ID (serial number) of the search appliance that contributes to a result. Example: S5-KUB000F0ADETLA
Attributes: None

ENTOBRESULTS
Format/Parent: GSP
Subtags: OBRES
Definition: Encapsulates the results returned by OneBox modules.
-
FI
Format/Parent: RES
Subtags: None
Definition: Indicates that document filtering was performed during this search. See “Automatic Filtering” on page 31 for more details.
Attributes: None

FS
Format/Parent: R
Subtags: None
Definition: Additional details about the search result.
-
GD
Format/Parent: Text (HTML) / GM
Subtags: None
Definition: Contains the description of a KeyMatch result.
Attributes: None

GL
Format/Parent: Text (URL) / GM
Subtags: None
Definition: Contains the URL of a KeyMatch result.
-
GM
Format/Parent: GSP
Subtags: GL, GD?
Definition: Encapsulates a single KeyMatch result.
Attributes: None

GSP
Format/Parent: This is the root element.
Subtags: (CT?, CUSTOM?, ENTOBRESULTS, GM*, PARAM+, Q, RES?, Spelling?, Synonyms?, TM) | CACHE
Definition: GSP = “Google Search Protocol”. Encapsulates all data that is returned in the Google XML search results.
Attributes:
• VER, Text: Indicates the version of the search results output. The current output version is “3.2”.
-
HAS
Format/Parent: R
Subtags: L?, C?
Definition: Encapsulates special features that are included for this search result.
Attributes: None

HN
Format/Parent: Text (URL-encoded web directory, see “Appendix B: URL Encoding” on page 107) / R
Subtags: None
Definition: Indicates that filtering has occurred and that additional results are available from the directory where this search result was found. The value of this tag is ready to be used with the site: query term (see “Directory Restricted Search” on page 25).
-
L
Format/Parent: HAS
Subtags: None
Definition: Indicates that the “link:” special query term is supported for this search result URL.
Attributes: None

LANG
Format/Parent: Text / R
Subtags: None
Definition: Indicates the language of the search result. The LANG element contains a two-letter language code. See “Automatic Language Filters” on page 32 for language codes.
-
M
Format/Parent: Text (Integer) / RES
Subtags: None
Definition: The estimated total number of results for the search. The estimate can be too high or too low. See “Appendix A: Estimated vs. Actual Number of Results” on page 105.
Attributes: None

MT
Format/Parent: R
Subtags: None
Definition: Meta tag name and value pairs obtained from the search result. Only meta tags (see “Meta Tags” on page 41) that are requested in the search request are returned.
-
NB
Format/Parent: RES
Subtags: PU?, NU?
Definition: Encapsulates the navigation information for the result set. The NB tag is present only if previous or additional results are available.
Attributes: None

NU
Format/Parent: Text (Relative URL) / NB
Subtags: None
Definition: Contains a relative URL pointing to the next results page. The NU tag is present only when more results are available.
-
OBRES
Format/Parent: ENTOBRESULTS
Subtags: The contents of the OBRES element are provided by the OneBox module and must conform to the OneBox Results Schema. See the specific OneBox module’s documentation for details. See also the Google OneBox for Enterprise Developer’s Guide.
Definition: Encapsulates a result returned by a OneBox module.
Attributes: None

OneSynonym
Format/Parent: HTML / Synonyms
Subtags: None
Definition: A related query for the submitted query, in HTML format.
-
PARAM
Format/Parent: GSP
Subtags: None
Definition: The search request parameters that were submitted to the Google Search Appliance to generate these results.
-
PC
Format/Parent: Text (Integer 0 or 1) / PARM
Subtags: None
Definition: Indicates whether the counts are exact or partial: 0 = exact, 1 = partial.
Attributes: None

PMT
Format/Parent: PARM
Subtags: PV+
Definition: Encapsulates results for one attribute. A maximum of 5k values (PV) are returned; all values are sorted by count or value, as configured, and the rest are discarded.
-
PU
Format/Parent: Text (Relative URL) / NB
Subtags: None
Definition: Contains a relative URL pointing to the previous results page. The PU tag is present only if previous results are available.
Attributes: None

PV
Format/Parent: PMT
Subtags: None
Definition: Encapsulates the count information for one value.
-
Q
Format/Parent: HTML / GSP
Subtags: None
Definition: The search query terms submitted to the Google Search Appliance to generate these results.
Attributes: None

R
Format/Parent: RES
Subtags: CRAWLDATE, FS?, HAS, HN?, LANG, MT*, RK, S?, T?, U, UD, UE
Definition: Encapsulates the details of an individual search result.
Attributes:
• N, Text (Integer): The index number (1-based) of this search result.
• L, Text (Integer): The recommended indentation level of the results.
-
RES
Format/Parent: GSP
Subtags: FI?, M, NB?, PARM?, R*, XT?
Definition: Encapsulates the set of all search results.
Attributes:
• SN, Text (Integer): The index (1-based) of the first search result returned in this result set.
• EN, Text (Integer): Indicates the index (1-based) of the last search result returned in this result set.
-
RK
Format/Parent: Text (Integer in the range 0-10)
Subtags: None
Definition: The RK element assigns a ranking score to each page on a scale from 0 (least important) to 10 (most important), based on how well the result matches the query. When search results are sorted by relevancy, the RK values are in decreasing order (highest to lowest). To see the RK values, you must view the search results in raw XML, as described in the following steps:
1. On the search page, enter a query and get results.
2.
-
S
Format/Parent: Text (HTML) / R
Subtags: None
Definition: The snippet for the search result. Query terms appear in bold in the results. Line breaks are included for proper text wrapping. In documents larger than 300KB, snippets may not contain query terms that occur beyond the first 300KB of the document. For non-HTML documents, the 300KB limit applies to the converted version, not the original document.
-
SCOREBIAS
Format/Parent: Text (XML) / R
Subtags: None
Definition: The SCOREBIAS tag can appear zero or more times as a child of the R tag (see “R” on page 76) for each result. The SCOREBIAS tag appears for each result biaser that is applied. The NAME attribute is the name of the result biaser. The VALUE attribute indicates the effect of the biaser. For biasers where the strength is expressed symbolically, such as source or collection biasing and metadata biasing.
-
Spelling
Format/Parent: GSP
Subtags: Suggestion+
Definition: Encapsulates alternate spelling suggestions for the submitted query. Only one spelling suggestion is returned at this time.
Attributes: None

Suggestion
Format/Parent: HTML / Spelling
Subtags: None
Definition: An alternate spelling suggestion for the submitted query, in HTML format.
Attributes:
• q, Text: The spelling suggestion.
• qe, Text: Internal-only attribute of the spelling suggestion.
-
Synonyms
Format/Parent: GSP
Subtags: OneSynonym+
Definition: Encapsulates the related queries for the submitted query. Up to 20 related queries may be returned, depending on the related queries list that is associated with the front end.
Attributes: None

T
Format/Parent: Text (HTML) / R
Subtags: None
Definition: The title of the search result.
-
TM
Format/Parent: Text (Floating-point number) / GSP
Subtags: None
Definition: Total server time to return search results, measured in seconds.
Attributes: None

U
Format/Parent: Text (Absolute URL) / R
Subtags: None
Definition: The URL of the search result.
-
UD
Format/Parent: Text (URL to display for non-ASCII URLs) / R
Subtags: None
Definition: The URL string to display when the URL in the U parameter is non-ASCII. This string displays UTF-8 characters and IDNA domain names properly.
Attributes: None

UE
Format/Parent: Text (URL-encoded version of the URL) / R
Subtags: None
Definition: The URL-encoded version of the URL that is in the U parameter.
-
XT
Format/Parent: RES
Subtags: None
Definition: Indicates that the estimated total number of results specified in this search result is exact. See “Automatic Filtering” on page 31 for more details.
-
Chapter 4 Dynamic Result Clustering Service /cluster Protocol
Dynamic result clustering narrows searches by providing dynamically formed subcategories that appear at the top or on the right side of the search results. The following illustration shows dynamic result clustering at the top of the search results (enclosed in the red box):
The search appliance generates alternative search queries by analyzing indexed documents based on a user’s current search query.
-
3. Triggers an AJAX call to the cluster service to populate the cluster position holders.
-
Dynamic Result Clustering Request
Administrators can test the /cluster feature by submitting a custom HTTP POST form. The search appliance processes cluster requests as follows:
1. The cluster request inherits all request parameters, and the search appliance transports the parameters into an internal search query. If any of the /search parameters (see “Search Parameters” on page 10) are present in the parameter list for the request to /cluster, they are passed to the internal search request.
2.
-
The search appliance returns the following JSON response: { "clusters": [ { "algorithm": "Concepts", "clusters": [ { "label": "canada chile culebra", "docs": [ 18,19,20,21,23,26,27,29,30,32] }, { "label": "dewey culebra", "docs": [ 1,9,36] } ] } ], "documents": [ { "url": "http://server.example.com/file42.pdf", "title": "TLA Annual Report 2009--Acronyms in the Public Sector ...", "snippet": "... Soy Flz (Culebra) Culebra 34,102 34,102 2.28 ...
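The following Python sketch walks a JSON response of this shape and lists each cluster label with its algorithm and document count. The sample is a trimmed, hypothetical response. This section does not spell out what the integers in docs refer to, so the sketch only counts them.

```python
import json

# Trimmed, hypothetical response with the same shape as the example above.
SAMPLE_JSON = """{
  "clusters": [
    {"algorithm": "Concepts",
     "clusters": [
       {"label": "canada chile culebra", "docs": [18, 19, 20]},
       {"label": "dewey culebra", "docs": [1, 9, 36]}
     ]}
  ],
  "documents": []
}"""


def cluster_summary(json_text):
    """Return (algorithm, label, number of docs) for every cluster in the response."""
    data = json.loads(json_text)
    summary = []
    for group in data.get("clusters", []):
        for cluster in group.get("clusters", []):
            summary.append((group.get("algorithm"), cluster["label"], len(cluster.get("docs", []))))
    return summary


for algorithm, label, count in cluster_summary(SAMPLE_JSON):
    print(f"{algorithm}: {label} ({count} docs)")
```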
-
Dynamic Result Clustering XML Request and Response
To have the POST form return XML output, add the coutput=xml parameter to the action= URL:
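Outside a browser, the same kind of request can be assembled in code. This Python sketch builds, but does not send, a POST request to /cluster using only the standard library. The host name gsa.example.com and the parameter values are hypothetical; q, site, and client stand in for /search parameters that the appliance passes through to its internal search, and coutput=xml goes on the action URL rather than in the form body, as described above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_cluster_request(appliance, params, xml_output=False):
    """Build (but do not send) a form-encoded POST request to the /cluster service."""
    url = f"http://{appliance}/cluster"
    if xml_output:
        # XML output is requested on the action URL, not in the form body.
        url += "?" + urlencode({"coutput": "xml"})
    body = urlencode(params).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical host and parameter values.
req = build_cluster_request(
    "gsa.example.com",
    {"q": "culebra", "site": "default_collection", "client": "default_frontend"},
)
print(req.get_method(), req.full_url, req.data)
```

To actually send the request, pass it to urllib.request.urlopen.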
-
The search appliance returns the following XML response:
-
The top-level entries are described in the following table. Entry Description The output from different clustering algorithms. There is only one supported cluster algorithm, so the value of must be Concepts. The category consists of: • A series of and subordinate pairs. • The subordinate is a series of
-
Chapter 5 Query Suggestion Service /suggest Protocol
The query suggestion service provides suggestions that complete a user’s search query. As a user types a query in the search box, a drop-down menu appears with suggestions to complete the query. The search appliance uses its users’ most popular search queries to determine the top suggestions that are listed for a query. Only queries that returned results are used to build the database of query suggestions.
-
Queries that contain Special Query Terms, such as inmeta, are excluded from the Query Suggestion database. If your users rely heavily on this type of query, consider using the equivalent Search Parameters, such as partialfields and requiredfields, instead.
After you enable query suggestions:
1. The search appliance sets the show_suggest element in the XSLT stylesheet to 1.
2. The search appliance provides access to the http://Search_Appliance/ss.
-
ss_g_max_to_display
The maximum number of query suggestions to show from the suggest server. If set to 0, an unlimited number of suggestion types is allowed. Default value: 10

ss_g_more_names_to_display
A literal string that displays for multiple suggestions. This value appears to the right of the query suggestion box. The default value applies to the English-language version; the value is internationalized, and the actual value depends on the current language setting.
-
ss_protocol
The three values are:
• legacy: Provides backward compatibility with the version 6.0 query suggestion feature for the token and max_matches variables. This setting excludes user-added results from the response. If an unknown format is set, legacy is assumed.
• os: Supports the OpenSearch format.
• rich: Rich text format (default for version 6.2 and later).
-
Query Suggestion Requests and Responses
The output format controls the query suggestion request and response:
• Legacy Format: Backward compatibility with version 6.0 (see “Legacy Format” on page 97)
• OpenSearch Format: Supports the OpenSearch protocol (see “OpenSearch Format” on page 98)
• Rich Output Format: Default format for version 6.2 and later (see “Rich Output Format” on page 99)

Legacy Format
The legacy format is similar to the suggest feature in the version 6.0 search appliance.
-
Response:
[ "", "", ..., "" ]
Or, if no result:
[]

OpenSearch Format
The os format uses the OpenSearch protocol.
Parameter: callback
Description: Provides a JSONP-compatible response from suggest. If you set callback=test, it returns:
/* GSA Suggest Service JSONP Response. */ test();
The prefix /* GSA Suggest Service JSONP Response.
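A non-browser client that receives a JSONP response has to strip the comment prefix and the callback wrapper before decoding the JSON. This Python sketch does that with a regular expression. The expected layout (optional comment, then callback(...) with an optional semicolon) is an assumption based on the callback=test example, and the payload shown is hypothetical.

```python
import json
import re

# Optional leading /* ... */ comment, then callback(...) with an optional semicolon.
_JSONP = re.compile(r"^\s*(?:/\*.*?\*/\s*)?([A-Za-z_$][\w$.]*)\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(text, callback):
    """Strip the comment prefix and callback wrapper, then decode the JSON payload."""
    match = _JSONP.match(text)
    if match is None or match.group(1) != callback:
        raise ValueError(f"not a JSONP response for callback {callback!r}")
    return json.loads(match.group(2))


# Hypothetical payload; real suggestion arrays are not reproduced here.
raw = '/* GSA Suggest Service JSONP Response. */ test(["chi", ["chicken"], [""], [""]]);'
print(unwrap_jsonp(raw, "test"))

# A legacy-format response is a plain JSON array and needs no unwrapping.
print(json.loads('["chicken", "chili"]'))
```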
-
Response:
[ "", [ "", "", ... "" ], [ "", "", ..., "" ], [ "", "", ..., "" ] ]
The client distinguishes between suggest and user-added results as follows:
• For suggest results, the term is non-empty, while the content and the URL are empty.
• For user-added results, the term is empty, the content is optional, and the URL is non-empty.
A client can choose to display both suggest and user-added results, or just one of them.
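The two rules above translate directly into code. This Python sketch assumes the response layout is [query, [terms], [contents], [urls]], with the three inner arrays aligned by position (the order in which the rules name them); the example data is hypothetical. Entries that match neither rule are ignored.

```python
def split_suggestions(response):
    """Split an os-format response into suggest terms and user-added results.

    Assumes the layout [query, [terms], [contents], [urls]] with aligned inner arrays.
    """
    query, terms, contents, urls = response
    suggest, user_added = [], []
    for term, content, url in zip(terms, contents, urls):
        if term and not content and not url:
            suggest.append(term)  # suggest result: term only
        elif not term and url:
            user_added.append({"content": content, "url": url})  # content may be ""
    return query, suggest, user_added


# Hypothetical response mixing one suggest result and one user-added result.
response = ["chi", ["chicken", ""], ["", "Cafeteria menu"], ["", "http://intranet.example.com/menu"]]
print(split_suggestions(response))
```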
-
Parameter: q
Description: The partial query string that a user enters in the search box. The minimum size is one character. If the length is 0 (that is, the search box is empty), the suggest client-side JavaScript doesn’t send a request to query suggestion. Even if an administrator implements a custom interface, sending an empty token returns an empty set as the result. The maximum size of the token parameter is not defined.
Default value: None

Parameter: site
Description: Collection name.
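A custom interface can mirror the client-side behavior by not sending any request for an empty query. This Python sketch assumes the service is reached at the /suggest path named in this chapter’s title; the host name and collection name are hypothetical, and only the q and site parameters are shown.

```python
from urllib.parse import urlencode


def suggest_url(appliance, q, site):
    """Return a /suggest URL, or None when q is empty (no request should be sent)."""
    if not q:
        # Mirrors the client-side JavaScript, which sends nothing for an empty box.
        return None
    return f"http://{appliance}/suggest?" + urlencode({"q": q, "site": site})


print(suggest_url("gsa.example.com", "chi", "default_collection"))
print(suggest_url("gsa.example.com", "", "default_collection"))
```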
-
Chapter 6 Advanced Search Reporting Service /click Protocol
Advanced search reporting enables administrators to see what types of links a user chooses on a search results page and, more generally, to track all actions that a user performs, such as clicking navigational links. This information enables administrators to improve access to and latency of search results, and to understand user click behavior.
-
7. The search appliance takes the /click URL information and uses it to write advanced search reporting information to a log file. Administrators view the log on the Admin Console. For more information on advanced search reporting and the Admin Console, see “Gathering Information about the Search Experience” in Creating the Search Experience.

Request Parameters
Advanced search reporting requires the site parameter.
-
url
Description: URL that the user clicked.
Example: url=http%3A//www.foo.com/

site
Description: Required parameter. If this parameter does not have a valid value, other parameters in the query string do not work as expected. Limits search results to the contents of the specified collection. You can search multiple collections by separating collection names with the OR character, notated as the pipe symbol (|), or the AND character, notated as a period (.).
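This Python sketch builds a /click URL from the two parameters described above, joining collection names with the pipe (OR) or the period (AND). The host and collection names are hypothetical, and a real /click request would normally carry additional parameters that are not covered here.

```python
from urllib.parse import quote_plus, urlencode


def click_url(appliance, clicked_url, collections, operator="|"):
    """Build a /click URL carrying the site and url parameters.

    operator is "|" to OR collections together or "." to AND them.
    """
    params = {"site": operator.join(collections), "url": clicked_url}
    # safe="/" keeps slashes literal, matching the url=http%3A//www.foo.com/ example.
    return f"http://{appliance}/click?" + urlencode(params, safe="/", quote_via=quote_plus)


print(click_url("gsa.example.com", "http://www.foo.com/", ["default_collection"]))
```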
-
Responses
In response to the /click URL, the search appliance always sends the same HTTP response back to the browser to acknowledge receipt. The response has a status code of 204 (No Content) and a MIME type of image/gif. The Google implementation running on the client uses JavaScript to generate the /click URL and load it as a dummy image object. The response is ignored because it does not affect client behavior; its purpose is to log the user interactions on the server side.
-
Appendices
This section contains:
• “Appendix A: Estimated vs. Actual Number of Results” on page 105
• “Appendix B: URL Encoding” on page 107
• “Appendix C: Date Formatting” on page 108
• “Appendix D: Compressed Results” on page 111

Appendix A: Estimated vs. Actual Number of Results
The Google Search Appliance does not guarantee the ability to return a particular number of results for any given search query.
-
How the Google Search Appliance Determines the Number of Results to Return
When search results are returned, the number of results is determined by one of the following conditions:
• If the Google Search Appliance has enough results to satisfy the search request, the requested number of results is returned.
• If the Google Search Appliance has fewer results than the number requested in the search request, the last page of results is returned.
-
The underlined text in the message should be a hypertext link that submits the same search again with the parameter filter=0. Google finds this method of informing users about automatic document filtering effective; it is the method used on the Google Internet search site. If you are using OneBox modules to provide additional query results to your users, note that the results served through a OneBox module are reported separately.
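The link that resubmits a search with filter=0 can be derived from the current search URL. This Python sketch keeps every other parameter and replaces any existing filter value; the example host and query are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def unfiltered_search_url(search_url):
    """Return the same search URL with filter=0, replacing any existing filter value."""
    parts = urlsplit(search_url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "filter"]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(unfiltered_search_url("http://gsa.example.com/search?q=chicken&site=default_collection"))
```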
-
Examples

Original String                         URL-Encoded String
chicken -teriyaki                       chicken+%2Dteriyaki
admission form site:www.stanford.edu    admission+form+site%3Awww.stanford.edu

Original String                         Doubly URL-Encoded String
William Shakespeare                     William%2BShakespeare
admission form site:www.stanford.edu    admission%2Bform%2Bsite%253Awww.stanford.edu

Appendix C: Date Formatting
The search appliance recognizes dates in most reasonable formats.
-
Acceptable Date Formats
The following table lists date formats that you can use with the Google Search Appliance.

Format      Separator   Example
YYYY-M-D    Hyphen      2008-2-27
YYYY-D-M    Hyphen      2008-27-2
YYYY.M.D    Period      2008.2.27
YYYY.D.M    Period      2008.27.2
YYYY/M/D    Slash       2008/2/27
YYYY/D/M    Slash       2008/27/2
D-M-YYYY    Hyphen      20-2-2008
M-D-YYYY    Hyphen      2-23-2008
D.M.YYYY    Period      20.2.2008
M.D.YYYY    Period      2.23.
-
Format      Separator   Example
MMDDYYYY    (none)      03232009
YYMMDD      (none)      090225
DDMMYY      (none)      150209
MMDDYY      (none)      021509
YYYY        (none)      2009

Date Formatting Notes
1. The YYYYMMDDHH and YYYYMMDDHHmm patterns for specifying dates are supported. However, the search appliance has no notion of sorting search results based on differences in the time portion of document dates.
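Several of the listed formats differ only in the order of day and month, so one string can fit more than one of them. This Python sketch maps each listed format to a datetime.strptime directive (an assumed mapping) and reports every format that yields a valid date. It illustrates the ambiguity only; it does not claim to reproduce how the search appliance resolves it, and the YYYYMMDDHH and YYYYMMDDHHmm patterns are left out.

```python
from datetime import datetime

# Listed date formats mapped to strptime directives (an assumed mapping).
# %m and %d accept one or two digits, which covers the single-letter M and D.
FORMATS = {
    "YYYY-M-D": "%Y-%m-%d", "YYYY-D-M": "%Y-%d-%m",
    "YYYY.M.D": "%Y.%m.%d", "YYYY.D.M": "%Y.%d.%m",
    "YYYY/M/D": "%Y/%m/%d", "YYYY/D/M": "%Y/%d/%m",
    "D-M-YYYY": "%d-%m-%Y", "M-D-YYYY": "%m-%d-%Y",
    "D.M.YYYY": "%d.%m.%Y", "M.D.YYYY": "%m.%d.%Y",
    "MMDDYYYY": "%m%d%Y", "YYMMDD": "%y%m%d",
    "DDMMYY": "%d%m%y", "MMDDYY": "%m%d%y",
    "YYYY": "%Y",
}


def interpretations(text):
    """Return (format name, date) for every listed format that parses text."""
    found = []
    for name, directive in FORMATS.items():
        try:
            found.append((name, datetime.strptime(text, directive).date()))
        except ValueError:
            pass  # text does not fit this format, or yields an invalid date
    return found


for sample in ["2008-2-27", "2008-27-2", "2008-2-3"]:
    print(sample, interpretations(sample))
```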
-
Examples of Rules

Rule #   Host or URL Pattern       Date Located In
1        www.foo.com/example/      Title
2        www.foo2.com/archives/    URL
3        www.foo.com/              Meta Tag
4        www.foo2.com/             Body
5        /                         Last Modified

Meta Tag Name: publication_date

Because the document http://www.foo.com/example/foo.html matches the URL pattern in rule 1, the search appliance first checks for the date in the title of the document. The URL doesn’t match rule 2, so the search appliance checks against rule 3.
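The rule walk in this example can be simulated. This Python sketch encodes the example rules and lists the ones that apply to a URL, in the order they would be tried. Matching a pattern as a plain substring of the URL is a simplification, not the search appliance’s actual URL-pattern syntax, and falling through to the next matching rule when a location holds no date is an assumption drawn from the example.

```python
# Example rules from the table: (rule number, host or URL pattern, date location).
RULES = [
    (1, "www.foo.com/example/", "Title"),
    (2, "www.foo2.com/archives/", "URL"),
    (3, "www.foo.com/", "Meta Tag"),
    (4, "www.foo2.com/", "Body"),
    (5, "/", "Last Modified"),
]


def matching_rules(url):
    """Rules whose pattern occurs in url, in rule order (substring match is a simplification)."""
    return [(number, where) for number, pattern, where in RULES if pattern in url]


def first_date_source(url, locations_with_date):
    """First matching rule whose location actually holds a date, or None."""
    for number, where in matching_rules(url):
        if where in locations_with_date:
            return number, where
    return None


url = "http://www.foo.com/example/foo.html"
print(matching_rules(url))
print(first_date_source(url, {"Meta Tag", "Last Modified"}))
```

With no date in the title, the sketch falls through to rule 3, as in the example above.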
-