Google Search Appliance Administrative API Developer’s Guide: Protocol Google Search Appliance software version 7.
Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com GSA-API_100.06 December 2013 © Copyright 2013 Google, Inc. All rights reserved. Google and the Google logo are registered trademarks or service marks of Google, Inc. All other trademarks are the property of their respective owners. Use of any Google solution is governed by the license agreement included in your original contract.
Contents Administrative API Developer’s Guide: Protocol .............................................................
GSA Unification 65
Configuring a GSA Unification Network 66
Adding a GSA Unification Node 66
Retrieving a Node Configuration 67
Retrieving All Node Configurations 68
Updating a Node Configuration 69
Deleting a Node 69
Administration 69
License Information 70
Import and Export 70
Event Log 72
System Status 73
Shut Down and Reboot 74
Index .......................................................................................................................
Administrative API Developer’s Guide: Protocol Introduction The Google Search Appliance Administration API enables administrators to configure a search appliance programmatically. This API provides functions for creating, retrieving, updating, and deleting search appliance configuration settings. The Google Search Appliance Administration API follows the principles of the Google Data APIs. Google Data APIs are based on both the Atom 1.0 and RSS 2.
• Create—Operations to add a new object, such as a collection or front end. To perform any of these operations, issue an HTTP POST request with the appropriate URL. The body of the POST request is an XML document that contains information about a resource to create. • Retrieve—Operations to request and obtain information about search appliance features.
The search appliance returns a response containing your authentication token in response to a POST request. The authentication token is the Auth value on that page, and you need to extract the token from the page. When you submit an API request, you must set the Content-Type and Authorization headers as follows:
Content-Type: application/atom+xml
Authorization: GoogleLogin auth=your-authentication-token
Note: Authentication tokens expire after 24 hours, or after 30 minutes when not in use.
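The token extraction and header setup described above can be sketched as follows. This is a minimal illustration, not the appliance's documented client code: it assumes the login response body is a set of key=value lines that includes an Auth=... line, and the function names extract_auth_token and build_api_headers are hypothetical.

```python
def extract_auth_token(login_response_body: str) -> str:
    """Return the Auth value from a login response body.

    Assumption: the body consists of key=value lines, one of which
    starts with "Auth=".
    """
    for line in login_response_body.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "Auth":
            return value
    raise ValueError("no Auth value found in login response")


def build_api_headers(token: str) -> dict:
    """Build the two headers that every API request must carry."""
    return {
        "Content-Type": "application/atom+xml",
        "Authorization": f"GoogleLogin auth={token}",
    }


# Hypothetical response body, used only to exercise the helpers.
sample_body = "SID=abc\nLSID=def\nAuth=your-authentication-token\n"
headers = build_api_headers(extract_auth_token(sample_body))
print(headers["Authorization"])
```

Because tokens expire, a client would typically request a new token and rebuild the headers when a request is rejected as unauthenticated.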
Content Format: Container
atom:entry
Definition: The atom:entry element encapsulates an API request or an API Atom response.
Child Elements: atom:id, gsa:content, atom:link
Content Format: Container
atom:id
Definition: The atom:id element's value is a permanent, unique identifier for a feed. This element is included in API responses.
Attributes
Name  Format  Description
rel  Text  The rel attribute identifies the relationship of the link to the API response feed.
• If the value of the rel attribute is self, then the href attribute value is a link to the URL you use to request the feed.
• If the value of the rel attribute is edit, then the href attribute value is the URL that you use to retrieve, update, or delete the resource.
gsa:content
Definition: The gsa:content tag specifies properties of the search appliance Admin Console settings. The atom:entry must contain at least one gsa:content. The name attribute specifies the name of the property, and the value of the property goes in the element content.
Example http://yourdomain.
XML Response Formats
Depending on the API request, the search appliance Administrative API returns XML responses. The XML response is a Google Data Atom entry. The entry must contain at least one gsa:content element. All search appliance information is placed in gsa:content elements. For example, the following list defines a GSAEntry response as an XML document that contains information about the crawl URLs. The client libraries convert this XML response into a GSAEntry object.
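As a rough illustration of the response format, the sketch below parses an Atom entry and collects its gsa:content properties into a dictionary. The sample document is hypothetical, and the gsa namespace URI used here (http://schemas.google.com/gsa/2007) is an assumption that should be checked against a real response from your appliance.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: namespace URI bound to the gsa: prefix in API responses.
GSA_NS = "http://schemas.google.com/gsa/2007"

# Hypothetical response, shaped like a crawlURLs entry of the config feed.
SAMPLE_ENTRY = """<entry xmlns='http://www.w3.org/2005/Atom'
       xmlns:gsa='http://schemas.google.com/gsa/2007'>
  <id>http://gsa:8000/feeds/config/crawlURLs</id>
  <gsa:content name='startURLs'>http://example.com/</gsa:content>
  <gsa:content name='followURLs'>http://example.com/</gsa:content>
</entry>"""


def parse_gsa_entry(xml_text: str):
    """Return (entry id, {property name: value}) for one Atom entry."""
    root = ET.fromstring(xml_text)
    entry_id = root.findtext(f"{{{ATOM_NS}}}id")
    properties = {}
    for element in root.findall(f"{{{GSA_NS}}}content"):
        # The name attribute carries the property name; the element
        # text carries the property value.
        properties[element.get("name")] = element.text or ""
    return entry_id, properties


entry_id, properties = parse_gsa_entry(SAMPLE_ENTRY)
print(entry_id)
print(properties["startURLs"])
```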
Crawl URLs
Retrieve and update crawl URLs for a search appliance using the crawlURLs entry of the config feed.
Property  Description
doNotCrawlURLs  Do not crawl URLs with the following URL patterns.
followURLs  Follow and crawl only URLs with the following URL patterns.
startURLs  Start crawling from the following URL patterns.
Data Source Feed Retrieve, delete, and destroy data source feed information for a search appliance using the feed feed. The Google Search Appliance supports an interface known as the “feeds interface,” which is different from a Google Data API feed. To differentiate between these terms, the feeds interface on the search appliance is referred to as a data source feed. For more information on data source feeds, see the Feeds Protocol Developer’s Guide.
Retrieving Data Source Feed Information To retrieve information about all data source feeds for a search appliance, send an authenticated GET request to the feed feed URL: http://Search_Appliance:8000/feeds/feed?query=feedDataSource The following example result includes current feeds values for the search appliance:
Note: To get information about all feeds, specify a query to get the feedDataSource value. Alternatively, you can get all the feeds if you do not supply a query. Whether or not you supply a query, you can get information about at most five feeds for each feedDataSource value.
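The GET request on the feed feed can be sketched with the Python standard library as shown here. The helper name build_feed_request and the host name gsa are illustrative assumptions, and the request object is only constructed, not sent.

```python
import urllib.parse
import urllib.request


def build_feed_request(appliance: str, token: str, query=None):
    """Build an authenticated GET request for the feed feed.

    When query is None, no query parameter is added to the URL.
    """
    url = f"http://{appliance}:8000/feeds/feed"
    if query is not None:
        url += "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Content-Type": "application/atom+xml",
        "Authorization": f"GoogleLogin auth={token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_feed_request("gsa", "your-authentication-token", "feedDataSource")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request).read()
```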
Destroying a Data Source Feed To destroy a data source feed from a search appliance, send an authenticated DELETE request to the feed feed URL: http://Search_Appliance:8000/feeds/feed/Feed_File_ID Note: You can only destroy a data source feed after you delete the feed. Feeds Trusted IP Addresses Retrieve and update the trusted IP addresses for feeds for a search appliance using the feedTrustedIP entry of the config feed.
The following example updates the feeds trusted IP specified in an entry: http://gsa:8000/feeds/config/feedTrustedIP feedTrustedIP 127.0.0.1
Crawl Schedule
Retrieve and update the crawl schedule of a search appliance using the crawlSchedule entry of the config feed.
The response is as follows: http://gsa:8000/feeds/config/crawlSchedule 2008-12-11T06:29:35.
Property  Description
order  The entries in crawler access rules are sequential rules. The order indicates the sequence. The order is an integer value starting from 1.
password  Password for authentication.
urlPattern  URL pattern that matches files with secure content.
username  User name for authentication.
http://gsa:8000/feeds/crawlAccessNTLM/http://example.com/ 2009-03-22T06:33:40.471Z http://example.com/ http://example.
Updating a Crawler Access Rule
To update a crawler access rule, send an authenticated PUT request to the following URL: http://Search_Appliance:8000/feeds/crawlAccessNTLM/urlPattern The following example request body shows the result:
Retrieving a Host Load Schedule To get the host load schedule information for a search appliance, send an authenticated GET request to the config feed URL: http://Search_Appliance:8000/feeds/config/hostLoad The result is an entry that contains the current host load schedule values for the search appliance:
Property  Description
forceURLs  URL patterns for pages to recrawl regardless of their response to If-Modified-Since request headers.
frequentURLs  URL patterns for pages on which content changes often (typically more than once a day).
Retrieving Freshness Tuning Settings
To get the settings for freshness tuning, send an authenticated GET request to the following URL: http://Search_Appliance:8000/feeds/config/freshness The response is as follows:
If you discover a set of URLs that you want crawled (usually because of changes made to the web pages, or because of a temporary error or misconfiguration present when the crawler last tried to crawl the URL), you can enter the pattern to inject it quickly into the queue of URLs the search appliance is crawling.
Property  Description
recrawlURLs  URL patterns to be recrawled.
The following example inserts a new connector manager: ConnectorManagerOne Connector Manager One Description http://example.
http://gsa:8000/feeds/connectorManager/ConnectorManagerTwo 2009-03-22T06:31:15.
OneBox Settings Retrieve or update a OneBox setting for a search appliance using the oneboxSetting entry of the config feed. Property Description maxResults Maximum number of OneBox results per search. timeout OneBox response timeout.
OneBox Modules Retrieve the names of and delete OneBox modules from a search appliance using the onebox feed. Note: This API does not support adding, updating, or viewing detailed configuration information for a OneBox module. Property Description logContent The log content for OneBox logs.
To view OneBox information for a search appliance, send an authenticated GET request to the onebox feed URL for a OneBox name: http://Search_Appliance:8000/feeds/onebox/OneBox_Name The result is an entry that includes current individual OneBox values for a search appliance: http://gsa:8000/feeds/onebox/oneboxone 2008-12-15T13:39:42.
The response result is as follows: http://gsa:8000/feeds/command/pauseCrawl 2008-12-11T08:55:57.
The response result is as follows: http://gsa:8000/feeds/stats/documentStatus 2008-12-11T08:38:05.
Creating a Collection To create a new collection, send an authenticated POST request to the following URL: http://Search_Appliance:8000/feeds/collection To create a new collection with a default setting, use the following entry:
The following example shows a sample result: http://gsa:8000/feeds/collection 2008-12-11T08:01:21.253Z
The following example response shows the result: http://gsa:8000/feeds/collection/default_collection 2008-12-11T08:18:04.
Crawl Errors: Value Description 7 Redirect with no location header 11 Document not found (404) 12 Other HTTP 400 Errors 14 HTTP 0 error 15 Permanent DNS failure 16 Empty document 17 Image conversion failed 22 Authentication failed 25 Conversion error 32 HTTP 500 error 33 Robots.
Value Description 26 Unhandled content type 27 No filter for content type 34 Robots.txt forbidden Listing Crawled Documents Query parameters: Parameter Description collectionName Name of the collection that you want to list. The default value is the last used collection. flatList false: List the files and directories that directly belong to an indicated URI. true: List all files starting with an indicated URI as a flat list. The default value is false.
Directory status entry properties: Property Description The URL of a directory. numCrawledURLs The number of crawled documents in a directory. numExcludedURLs The number of excluded URL patterns in a directory. numRetrievalErrors The number of retrieval errors for documents in a directory. type DirectoryContentData or HostContentData. Document status entry properties: Property Description The URL pattern of a document to check its status.
http://gsa:8000/feeds/diagnostics/http://server.com/secured/test1/ level_1_0 2009-03-26T04:47:40.813Z 2009-03-26T04:47:40.813Z http://server.
Getting Crawled Document Status Get the status for documents that have been crawled for a collection. Parameter Description collectionName Name of the collection for which you want to list the document status. The default value is the last used collection. To retrieve detailed information for a document, send an authenticated GET request to a document entry of the diagnostics feed. http://Search_Appliance:8000/feeds/diagnostics/ http%3A%2F%2Fserver.com%2Fsecured%2Ftest1%2Fdoc_0_2.
http://gsa:8000/feeds/diagnostics/http%3A%2F%2Fexample.com%2Fdoc.html 2009-03-26T05:41:43.724Z 2009-03-26T05:41:43.724Z
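The entry address above carries the document URL in percent-encoded form. A minimal sketch of building such an address follows; the helper name diagnostics_entry_url is hypothetical, and the sketch assumes that every reserved character in the document URL, including ":" and "/", must be encoded.

```python
from urllib.parse import quote


def diagnostics_entry_url(appliance: str, document_url: str) -> str:
    """Build the diagnostics feed entry URL for one crawled document."""
    # safe="" makes quote() encode "/" as well, so the document URL
    # becomes a single path segment.
    encoded = quote(document_url, safe="")
    return f"http://{appliance}:8000/feeds/diagnostics/{encoded}"


url = diagnostics_entry_url("gsa", "http://example.com/doc.html")
print(url)
# http://gsa:8000/feeds/diagnostics/http%3A%2F%2Fexample.com%2Fdoc.html
```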
Retrieving Content Statistics for All Document Types
To retrieve content statistics for all document types in a search appliance, send an authenticated GET request to the root entry of the contentStatistics feed. http://Search_Appliance:8000/feeds/contentStatistics A list of content statistics entries is returned.
Retrieving Content Statistics for a Document Type To retrieve content statistics for a document type in a search appliance, send an authenticated GET request to the content statistics entry of the contentStatistics feed. http://Search_Appliance:8000/feeds/contentStatistics/text%2Fpdf A content statistics entry is returned.
An example response result is as follows: http://gsa:8000/feeds/command/resetIndex 2008-12-11T09:00:21.
Front Ends, Remove URLs, and Relative OneBoxes Retrieve, update, and delete front ends, remove URLs, and relative OneBox modules for a search appliance using the frontend feed. A relative OneBox is a OneBox module that you assign to work with a front end. Remove URLs are URL patterns that you want to exclude from appearing in an index for a front end. Property Description frontendOnebox OneBox modules for a front end. Specify a comma-separated list of OneBox module names.
The following result is an entry that includes current individual front end values for a search appliance: http://gsa:8000/feeds/frontend/default_frontend 2008-12-15T16:21:26.
Deleting a Front End
To delete a front end from a search appliance, send an authenticated DELETE request to the frontend feed URL: http://Search_Appliance:8000/feeds/frontend
Output Format XSLT Stylesheet
Retrieve and update the XSLT template and other output format properties for each language of each front end using the frontend entry of the outputFormat feed.
Parameter  Description
language  Specify a language for the output format properties that you want to retrieve.
The result is an entry that includes all stylesheet information for the designated Front_End and Language_Code: http://gsa:8000/feeds/outputFormat/default_frontend 2008-12-09T23:59:51.
This value overwrites the stylesheet properties specified in the entry to update for the designated Front_End and Language_Code:
Property Description startLine The starting line number of the KeyMatch configuration to change. The minimum value is 0. updateMethod The method to change KeyMatch configurations. Possible values are: • update. Update part of the KeyMatch configuration table to the new configurations. You can also delete KeyMatch configurations using the update method, as shown in “Updating KeyMatch Settings” on page 50. • append. Add a new KeyMatch configuration to the end of the KeyMatch configuration table.
Updating KeyMatch Settings To change KeyMatch settings, send an authenticated PUT request to the following URL: http://Search_Appliance:8000/feeds/keymatch/Front_End The following example appends KeyMatch settings: append image,KeywordMatch,http://images.google.
Use related queries to associate alternative words or phrases with specified search terms.
Parameter  Description
query  A query string to perform a full-text search. For example, if you specify computer in the query parameter, then you can view all related query settings that contain the word computer.
startLine  The starting line number of the results. The default value is 0.
maxLines  The number of result lines in a response. The default value is 50 lines.
The following example retrieves related queries: http://ent1:8000/feeds/synonym/default_frontend 2008-12-15T06:41:20.954Z The following example replaces all related queries:
Update query suggestion blacklist entries as follows: PUT request URL: http://Search_Appliance:8000/feeds/suggest/suggestBlacklist bad_word_3 ^bad_word_1$ car[0-9]{4}.
The following result is an entry that includes the current serving status values for the search appliance: http://gsa:8002/feeds/status/servingStatus 2014-03-14T16:05:56.
Property Description topCount The number of top queries to be generated. withResults Indicates if a search has results. The default value is false. Listing a Search Report List a search report using the following query parameters: Parameter Description collectionName Collection name for the search report. The default value is all.collections. To list search report entries, send an authenticated GET request to the root entry of the searchReport feed.
http://gsa:8000/feeds/searchReport/aaa@default_collection 2009-03-26T07:26:55.991Z 2009-03-26T07:26:55.
Purpose Format Year year_year Date range range_month_day_year_month_day_year An example request with content is:
The following is a returned search report entry that contains log content (if the content is ready): http://gsa:8000/feeds/searchReport/aaa%40default_collection 2009-03-26T07:14:56.343Z 2009-03-26T07:14:56.
A search log entry is returned: http://gsa:8000/feeds/searchReport/bbb%40default_collection 2009-03-26T07:24:16.099Z 2009-03-26T07:24:16.
Property  Description
reportCreationDate  (Read only) The creation date of a search log.
reportDate  The dates for the queries that are collected in the search log.
reportName  (Write only) The report name, which is only needed when creating a search log.
reportState  (Read only) The status of the search log: 0: Initialized; 1: Report is in progress; 2: Report completed; 3: Non-final complete report is in progress; 4: Last report generation failed.
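A client that polls a search log may want to turn the numeric reportState into readable text. The mapping below is taken from the property description above; the function name describe_report_state and the handling of unknown codes are illustrative assumptions.

```python
# Codes and meanings as listed for the reportState property.
REPORT_STATES = {
    0: "Initialized",
    1: "Report is in progress",
    2: "Report completed",
    3: "Non-final complete report is in progress",
    4: "Last report generation failed",
}


def describe_report_state(value) -> str:
    """Translate a reportState value (int or numeric string) to text."""
    code = int(value)
    # Unknown codes are reported rather than raising, so a newer
    # appliance version does not break the client.
    return REPORT_STATES.get(code, f"Unknown state {code}")


print(describe_report_state("2"))
```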
http://gsa:8000/feeds/searchLog/aaa@default_collection 2009-03-26T06:44:31.094Z 2009-03-26T06:44:31.
A new search log entry is generated and returned: http://gsa:8000/feeds/searchLog 2009-03-26T06:42:28.742Z 2009-03-26T06:42:28.
A search log entry with logContent (if content is ready) is returned: http://gsa:8000/feeds/searchLog/aaa%40default_collection 2009-03-26T06:22:41.416Z 2009-03-26T06:22:41.
A search log entry is returned: http://gsa:8000/feeds/searchLog/bbb%40default_collection 2009-03-26T06:50:05.928Z 2009-03-26T06:50:05.
Configuring a GSA Unification Network Retrieve, update, create, or delete the GSA Unification node configuration and retrieve the node configuration of all nodes in the network on the Google Search Appliance. Property Description applianceId The ID of the search appliance, required to identify the node in node operations. federationNetworkIP The private tunnel IP address (virtual address) for the node. This address must be an RFC 1918 address.
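Before sending a node configuration, a client could check that the federationNetworkIP value falls inside one of the three RFC 1918 private ranges. The sketch below shows one way to do that. The helper name is_rfc1918 is hypothetical, and the appliance may apply further checks of its own.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """Return True if address is an IPv4 address in an RFC 1918 block."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not a valid IP address at all
    return ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS)


print(is_rfc1918("10.0.0.2"))    # True
print(is_rfc1918("172.32.0.1"))  # False, outside 172.16.0.0/12
```

The explicit block list is used instead of the is_private attribute because is_private also accepts loopback and link-local addresses, which are not RFC 1918 addresses.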
The following is an example of a request body: S4-JAX9N2PQ4GNAB SECONDARY 10.0.0.2 token host1.domain.
Retrieving All Node Configurations To retrieve information on all GSA Unification nodes, send an authenticated GET request to the following URL: http://Search_Appliance:8000/feeds/federation The following example shows a sample result for a secondary node:
Updating a Node Configuration
To update the configuration of a node in the GSA Unification network, send an authenticated PUT request to the following URL: http://Search_Appliance:8000/feeds/federation/Appliance_Id Note: The appliance ID cannot be changed in an update request. To change it, delete the search appliance from the network and add it again. The following example request body shows the result:
License Information
Retrieve license information for a search appliance using the licenseInfo entry of the info feed. Note: You can only view license information with this API; installing a new license is not supported.
Property  Description
applianceID  Provides the identification value for the Google Search Appliance software. This value is also known as the serial number for the software.
licenseID  Provides the unique license identification value.
Common query parameters for all requests: Parameter Description password The password of the exported configuration The importExport entry properties: Property Description xmlData The content of exported configuration password The password for generating configuration file Exporting a Configuration To export a search appliance configuration, send an authenticated GET request to the importExport entry of the config feed: http://Search_Appliance:8000/feeds/config/importExport?password=12345678 An im
Event Log
Retrieve the event log for a search appliance using the eventLog entry of the logs feed.
Parameter  Description
query  Query string for the logContent. The logContent contains many lines of logs. The query string applies to each line and only lines that contain the query string are returned.
startLine  The first logContent line to retrieve. The default value is 1.
maxLines  The maximum number of logContent lines to retrieve. The default value is 50 lines.
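The three query parameters can be combined into a request URL as sketched below. The entry address /feeds/logs/eventLog is inferred from the phrase "eventLog entry of the logs feed" and from the URL pattern of the other feeds in this guide, so treat it as an assumption. The helper name event_log_url is hypothetical.

```python
from urllib.parse import urlencode


def event_log_url(appliance: str, query=None, start_line=None, max_lines=None) -> str:
    """Build the event log URL, adding only the parameters that are set."""
    params = {}
    if query is not None:
        params["query"] = query
    if start_line is not None:
        params["startLine"] = start_line
    if max_lines is not None:
        params["maxLines"] = max_lines
    # Assumed entry address: eventLog entry of the logs feed.
    url = f"http://{appliance}:8000/feeds/logs/eventLog"
    return url + ("?" + urlencode(params) if params else "")


print(event_log_url("gsa", query="crawl error", start_line=1, max_lines=10))
```

Omitted parameters are left out of the URL so that the appliance applies its documented defaults.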
System Status Retrieve the system status for a search appliance using the systemStatus entry of the status feed. Property Description cpuTemperature Temperature of the CPU: 0 if okay, 1 if caution, 2 if critical. diskCapacity Remaining disk capacity of the search appliance: 0 if okay, 1 if caution, 2 if critical. machineHealth Health of the internal system components: 0 if okay, 1 if caution, 2 if critical.
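The 0, 1, and 2 codes shared by these properties lend themselves to a small enumeration. The sketch below reduces a set of status properties to the most severe level. It covers only the three properties described above, and the names HealthLevel and worst_health are hypothetical.

```python
from enum import IntEnum


class HealthLevel(IntEnum):
    OKAY = 0
    CAUTION = 1
    CRITICAL = 2


# Properties described above that use the 0/1/2 coding.
HEALTH_PROPERTIES = ("cpuTemperature", "diskCapacity", "machineHealth")


def worst_health(properties: dict) -> HealthLevel:
    """Return the most severe level among the known health properties."""
    levels = (
        HealthLevel(int(properties[name]))
        for name in HEALTH_PROPERTIES
        if name in properties
    )
    # With no known properties present, report OKAY rather than fail.
    return max(levels, default=HealthLevel.OKAY)


# Hypothetical property values as they might be parsed from a response.
status = {"cpuTemperature": "0", "diskCapacity": "1", "machineHealth": "0"}
print(worst_health(status).name)
```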
Shut Down and Reboot Shut down or reboot the search appliance. Property Description command Command sent to the search appliance. The command can be shutdown or reboot. runningStatus Indicates the search appliance status: • shuttingDown if you sent the shutdown command. • rebooting if you sent the reboot command. • running if the search appliance is operating normally.
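A request body carrying the command property might be assembled as in the following sketch. It is an illustration under stated assumptions: the gsa namespace URI is assumed, the appliance may require additional elements that are not shown here, and build_command_entry is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: namespace URI bound to the gsa: prefix.
GSA_NS = "http://schemas.google.com/gsa/2007"

ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("gsa", GSA_NS)


def build_command_entry(command: str) -> str:
    """Return an Atom entry whose command property is shutdown or reboot."""
    if command not in ("shutdown", "reboot"):
        raise ValueError("command must be 'shutdown' or 'reboot'")
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    content = ET.SubElement(entry, f"{{{GSA_NS}}}content", {"name": "command"})
    content.text = command
    return ET.tostring(entry, encoding="unicode")


body = build_command_entry("reboot")
print(body)
```

Validating the command before building the entry avoids sending a value that the property description does not allow.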
Index A Administration 69–74 atom:entry element 8 atom:feed element 7 atom:id element 8 atom:link element 8 atom:updated element 9 authentication 6 C collections create 32 delete 34 retrieve 32 update 34 command feed 23, 29 config feed 12, 16, 17, 21, 22, 27, 70, 71 connector managers delete 26 insert 24 retrieve 25 update 26 contentStatistics feed 40, 41, 42 crawl access rule update 21 crawl access rules delete 21 insert 19 retrieve 19 crawl and index 11–34 crawl diagnostics description entry parameters
feeds command 23, 29 config 12, 16, 17, 21, 22, 27, 70, 71 contentStatistics 40, 41, 42 data source 13–17 diagnostics 34, 36, 39 eventLog 72 federation 65 feed 14, 15 frontend 44, 45, 46 info 70 logs 72 onebox 28, 29 outputFormat 46, 47 searchLog 62, 64, 65 searchReport 55, 57, 58, 59 status 30, 54, 73 suggest 54 synonym 50 freshness tuning settings retrieve 23 update 23 front ends delete 46 retrieve 44 frontend feed 44, 45, 46 frontendOnebox property 45 G GSA Unification 65–69 add nodes 66 delete nodes 69
serving status, retrieve 54 shut down a search appliance 74 status and reports 55–72 status feed 30, 54, 73 suggest feed 54 synonym feed 50 system status, retrieve 73 T token, authentication 6 trusted IP addresses 16 U update operations 6 URL patterns crawl 12 recrawl 24 user name 6 X XML elements 7–11 request formats 10 response formats 11 XSLT stylesheet retrieve 46 update 47