Google Search Appliance Administrative API Developer’s Guide: Java Google Search Appliance software version 7.
Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com GSA-JAVAAPI_100.07 December 2013 © Copyright 2013 Google, Inc. All rights reserved. Google and the Google logo are registered trademarks or service marks of Google, Inc. All other trademarks are the property of their respective owners. Use of any Google solution is governed by the license agreement included in your original contract.
Administrative API Developer’s Guide: Java Introduction This guide provides Java programming information about how to use the Google Data API to create, retrieve, update, and delete information for one or more Google Search Appliance devices. Use the information in this guide to create or learn about coding Java applications that programmatically set the administrative functions for the Admin Console of a search appliance.
• Apache Ant version 1.7 or later (http://ant.apache.org/).
• Admin Console user name and password for the search appliance to which you direct your commands.

After you download the software and acquire search appliance credentials, get started as follows:

1. Browse to the Administrative API download site (https://code.google.com/p/google-enterprise-gdata-api).
2. Download the ZIP file gsa-admin-api-java-1.0.1.zip (http://google-enterprise-gdata-api.googlecode.com/files/gsa-admin-api-java-1.0.1.
Building Your Applications

You can build your own applications using the client library. Copy the following client library JAR files from the gdata/java/lib folder to your development folder and add them to your CLASSPATH environment variable:

• gdata-core-1.0.jar
• gdata-gsa-1.0.jar
• gdata-client-1.0.jar
• gdata-client-meta-1.0.jar
• gdata-gsa-meta-1.0.jar

You can then use the JAR files in your application.
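Before compiling against the client library, it can help to confirm that all five JAR files were actually copied. The following is a minimal sketch, not part of the client library: the class name RequiredJars, the method missingJars, and the default folder name "lib" are hypothetical. Only the JAR file names come from this guide.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical helper: reports which client library JAR files are absent.
final class RequiredJars {
    // JAR file names as listed in this guide.
    static final List<String> REQUIRED = Arrays.asList(
        "gdata-core-1.0.jar",
        "gdata-gsa-1.0.jar",
        "gdata-client-1.0.jar",
        "gdata-client-meta-1.0.jar",
        "gdata-gsa-meta-1.0.jar");

    // Returns the required JAR names that do not appear in presentFileNames.
    static List<String> missingJars(Collection<String> presentFileNames) {
        List<String> missing = new ArrayList<>();
        for (String jar : REQUIRED) {
            if (!presentFileNames.contains(jar)) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the development folder is passed as the first argument,
        // falling back to a folder named "lib" in the working directory.
        File dir = new File(args.length > 0 ? args[0] : "lib");
        String[] names = dir.list(); // null if dir is missing or not a directory
        Collection<String> present = new HashSet<>();
        if (names != null) {
            present.addAll(Arrays.asList(names));
        }
        System.out.println("Missing JARs: " + missingJars(present));
    }
}
```

An empty list means every JAR named above is present in the folder. The check says nothing about whether the files are on the classpath at run time.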
• “Pause or Resume Crawl” on page 20
• “Document Status” on page 21

Crawl URLs

Retrieve and update crawl URL patterns on a search appliance using the crawlURLs entry of the config feed.

Property         Description
doNotCrawlURLs   Do not crawl URLs that match the following patterns. Separate multiple URL patterns with newline delimiters.
followURLs       Follow and crawl only URLs that match the following patterns. Separate multiple URL patterns with newline delimiters.
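Because both properties carry several URL patterns in one newline-delimited string, a small helper can make it easier to edit a retrieved value before sending it back. The sketch below uses only the Java standard library. The class name CrawlPatterns and its methods are hypothetical, and dropping duplicate patterns is a design choice of this sketch, not documented appliance behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for newline-delimited URL pattern values.
final class CrawlPatterns {
    // Splits a property value into trimmed, non-empty patterns.
    static List<String> split(String value) {
        List<String> patterns = new ArrayList<>();
        if (value == null) {
            return patterns;
        }
        for (String line : value.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    // Joins patterns with "\n", dropping duplicates and keeping first-seen order.
    static String join(List<String> patterns) {
        Set<String> unique = new LinkedHashSet<>(patterns);
        return String.join("\n", unique);
    }

    public static void main(String[] args) {
        // Sample value shaped like a followURLs property; the URLs are placeholders.
        List<String> follow = split("http://www.example.com/\nhttp://intranet.example.com/docs/\n");
        follow.add("http://www.example.com/");  // duplicate, removed by join
        follow.add("http://wiki.example.com/");
        System.out.println(join(follow));
    }
}
```

The string returned by join is in the shape expected for the followURLs or doNotCrawlURLs property value.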
Data Source Feeds

Retrieve, delete, and destroy data source feed information for the search appliance using the feed feed. The following parameters let you search for a string and retrieve source statements.

Parameter   Description
query       The query string. When used to retrieve all feed information, the query parameter is overloaded to mean the feedDataStore. When getting information about a single feed, the parameter is a query string: only log statements that contain the query string are retrieved.
Retrieving Data Source Feed Information

Retrieve all data source feed information from a search appliance using the feed feed:

    // Send a request and print the response
    Map queries = new HashMap();
    queries.put("query", feedDataSource);
    GsaFeed myFeed = myClient.queryFeed("feed", queries);
    for (GsaEntry myEntry : myFeed.getEntries()) {
        // Get information on each myEntry
        System.out.println("Feed Name: " + myEntry.getGsaContent("entryID"));
        System.out.
Destroying Data Source Feeds

After deleting a data source feed, you can destroy the feed so that the feed no longer exists on the search appliance:

    myClient.deleteEntry("feed", FEED_NAME);

Trusted Feed IP Addresses

Retrieve and update trusted feed IP addresses using the feedTrustedIP entry of the config feed. Retrieve the IP addresses of trusted feeds using the trustedIPs property.

Property     Description
trustedIPs   Trusted IP addresses. This value is a list of one or more IP addresses.
Crawl Schedule

Retrieve and update the crawl schedule for a search appliance.

Property           Description
isScheduledCrawl
crawlSchedule      The crawl schedule is only available in scheduled crawl mode. The value of crawlSchedule has the format Day,Time,Duration, where:
                   • Day is a number representing the day of the week: 0 means Sunday and 1 means Monday.
                   • Time is a 24-hour representation of time. The time pertains to the search appliance and not the computer running the application to set the value.
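The Day,Time,Duration format can be checked on the client before a value is sent. The following is a minimal sketch under stated assumptions: the class CrawlScheduleEntry is hypothetical, day numbers 2 through 6 are assumed to continue from Tuesday to Saturday, and the Time and Duration fields are kept as raw strings because their exact formats are not assumed here. The sample value in main is illustrative only.

```java
import java.time.DayOfWeek;

// Hypothetical value class for one Day,Time,Duration triple.
final class CrawlScheduleEntry {
    final DayOfWeek day;
    final String time;      // kept verbatim; the exact time format is not assumed
    final String duration;  // kept verbatim; the units are not assumed

    private CrawlScheduleEntry(DayOfWeek day, String time, String duration) {
        this.day = day;
        this.time = time;
        this.duration = duration;
    }

    // Parses "Day,Time,Duration"; throws IllegalArgumentException on bad input.
    static CrawlScheduleEntry parse(String value) {
        String[] parts = value.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected Day,Time,Duration but got: " + value);
        }
        int dayNumber = Integer.parseInt(parts[0].trim());
        if (dayNumber < 0 || dayNumber > 6) {
            throw new IllegalArgumentException("Day must be between 0 and 6: " + dayNumber);
        }
        // 0 = Sunday and 1 = Monday per the guide; 2..6 = Tuesday..Saturday is assumed.
        DayOfWeek day = (dayNumber == 0) ? DayOfWeek.SUNDAY : DayOfWeek.of(dayNumber);
        return new CrawlScheduleEntry(day, parts[1].trim(), parts[2].trim());
    }

    public static void main(String[] args) {
        // Illustrative value only; the time and duration notations are assumptions.
        CrawlScheduleEntry entry = parse("1,0300,360");
        System.out.println(entry.day + " time=" + entry.time + " duration=" + entry.duration);
    }
}
```

Keeping Time and Duration as strings means the sketch validates only the field count and the day number, and leaves the rest of the validation to the search appliance.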
Crawler access rules instruct the crawler how to authenticate when crawling protected content.

Property   Description
domain     Windows domain for NTLM, or empty for HTTP Basic authentication.
isPublic   Indicates whether to allow users to view results of both the public content (normally available to everyone) and the secure (confidential) content. The value can be 1 to enable users to view content as public, or 0 to require users to authenticate to view secure content.
Retrieve an individual crawler access rule as follows:

    // Send the request and print the response
    GsaEntry myEntry = myClient.getEntry("crawlAccessNTLM", "urlPattern");
    System.out.println("URL Pattern: " + myEntry.getGsaContent("urlPattern"));
    System.out.println("User Name: " + myEntry.getGsaContent("username"));
    System.out.println("Order: " + myEntry.getGsaContent("order"));
    System.out.println("Domain: " + myEntry.getGsaContent("domain"));
    System.out.println("Is Public: " + myEntry.
Host Load Schedule

Retrieve and update host load schedule information from the search appliance using the hostLoad entry of the config feed.

Property          Description
defaultHostLoad   The default web server host load, a float value. This value measures the relative load on the search appliance based on the number of connections that a search appliance can handle.
Updating the Host Load Schedule

Update the host load schedule setting in a search appliance as follows:

    // Create an entry to hold properties to update
    GsaEntry updateEntry = new GsaEntry();
    // Add a property for the Host Load Schedule to updateEntry
    updateEntry.addGsaContent("defaultHostLoad", "2.4");
    updateEntry.addGsaContent("exceptionHostLoad", "* 3 5 1.2 \n www.example.com 1 6 3.6");
    updateEntry.addGsaContent("maxURLs", "3000");
    // Send the request
    myClient.
Recrawling URL Patterns

If you discover that a set of URLs that you want in the search index is not being crawled, you can inject a URL pattern into the queue of URLs that the search appliance is crawling. URLs may not appear in the index because changes were made to the web pages, or because a temporary error or misconfiguration was present when the crawler last tried to crawl the URL.

Property      Description
recrawlURLs   URL patterns to be recrawled.
Retrieving a List of Connector Managers

Retrieve a list of connector managers as follows:

    // Send the request and print the response
    GsaFeed myFeed = myClient.getFeed("connectorManager");
    for (GsaEntry myEntry : myFeed.getEntries()) {
        System.out.println("Status: " + myEntry.getGsaContent("status"));
        System.out.println("Description: " + myEntry.getGsaContent("description"));
        System.out.println("URL: " + myEntry.
The properties for retrieving a OneBox are as follows:

Property     Description
maxResults   Maximum number of results.
timeout      OneBox response timeout in milliseconds.

Updating OneBox Module Settings

Update the OneBox settings for a search appliance as follows. In this example, three results are requested and the timeout is set to 2000 milliseconds.

    // Create an entry to hold properties to update
    GsaEntry updateEntry = new GsaEntry();
    // Add properties for the OneBox settings to updateEntry
    updateEntry.
Retrieve an individual OneBox module’s log information from a search appliance as follows:

    // Send the request and print the response
    GsaEntry myEntry = myClient.getEntry("onebox", ONEBOX_NAME);
    System.out.println("OneBox Log: " + myEntry.getGsaContent("logContent"));

Note: You can only retrieve OneBox log entries individually.

Deleting a OneBox Module

Delete a OneBox module from a search appliance as follows:

    myClient.
Document Status

Retrieve document status using the properties that follow.

Property             Description
crawledURLsToday     The number of documents crawled since yesterday. (Note that the time pertains to the search appliance, not the computer sending this request.)
crawlPagePerSecond   Current crawling rate.
errorURLsToday       The number of document errors since yesterday.
filteredBytes        The number of document bytes that have been filtered.
foundURLs            The number of URLs found that match crawl patterns.
Collections

Retrieve, update, create, or delete the collections of documents on the search appliance.

Property         Description
collectionName   The name of the collection to create, which is only required when creating a new collection.
doNotCrawlURLs   The URL patterns of content that you want to exclude from this collection.
followURLs       The URL patterns of content that you want to include in this collection.
Retrieving a Collection

Retrieve the attributes of a single collection as follows:

    // Send the request and print the response
    GsaEntry myEntry = myClient.getEntry("collection", "default_collection");
    System.out.println("Follow URLs: " + myEntry.getGsaContent("followURLs"));
    System.out.println("Do Not Crawl URLs: " + myEntry.
Crawl Errors:

Errors   Retrieval Error
7        Redirect without a location header
11       Document not found (404)
12       Other HTTP 400 errors
14       HTTP 0 error
15       Permanent DNS failure
16       Empty document
17       Image conversion failed
22       Authentication failed
25       Conversion error
32       HTTP 500 error
33       The robots.
Excluded   Description
26         Unhandled content type
27         No filter for this content type
34         robots.txt forbidden

Listing Documents

Query parameters:

Parameter        Description
collectionName   Name of a collection that you want to list. The default value is the last used collection.
flatList         Indicates how the results are listed:
                 • false: (Default) List the files and directories specified by the URL.
                 • true: List all files specified by a URL as a flat list.
Directory status entry properties:

Property             Description
Entry Name           The URL of the directory.
numCrawledURLs       The number of crawled documents in this directory.
numExcludedURLs      The number of excluded URLs in this directory.
numRetrievalErrors   The number of retrieval error documents in this directory.
type                 DirectoryContentData or HostContentData.

Document status entry properties:

Property     Description
Entry Name   The URL of the document.
docState     The status of this document.
Viewing Index Diagnostics for a Document

Retrieve detailed information about a document by sending an authenticated GET request to a document status entry of the diagnostics feed. The parameter is as follows.

Parameter        Description
collectionName   Name of the collection for which you want to view crawl diagnostics.

A detailed document status entry is returned.
Detailed document status entry properties:

Property         Description
Entry Name       The URL of the document.
backwardLinks    The number of backward links to this document.
collectionList   A list of collections that contain this document.
contentSize      The size of the document content.
contentType      The type of the document.
crawlFrequency   The frequency at which the document is scheduled to be crawled, with possible values of seldom, normal, and frequent.
Common query parameters for all requests:

Parameter        Description
collectionName   Name of the collection for which you want to view content statistics.

Content statistics entry properties:

Property     Description
avgSize      The average document size for this content type.
Entry Name   The MIME type of the documents, such as text/plain.
maxSize      The maximum document size for the crawled files with this MIME type.
minSize      The minimum document size for the crawled files with this MIME type.
WARNING: Resetting an index deletes all the documents in the index. Depending on the number of documents to crawl, crawling an index can take many days to complete.

Property             Description
resetIndex           1 if index is reset, 0 if index is not reset.
resetStatusCode      Status code for resetting index.
resetStatusMessage   Status message: ERROR, PROGRESS, READY.
Front Ends: Remove URLs and a Relative OneBox

Retrieve, update, insert, or delete front ends to remove URLs or a relative OneBox for the search appliance using the frontend feed. Retrieve a front end using the following properties.

Property         Description
frontendOnebox   OneBox modules that are relative to this front end. This value is a comma-separated list of OneBox names. The OneBox modules are triggered for this front end in the order that you specify.
Inserting Front Ends and Remove URLs

Insert a front end and remove a URL from the search results as follows:

    // Create an entry to hold properties to insert
    GsaEntry insertEntry = new GsaEntry();
    insertEntry.setId(entryUrl);
    // Add properties to insertEntry
    insertEntry.addGsaContent("entryID", FRONTEND_NAME);
    insertEntry.addGsaContent("removeUrls", "http://www.example3.com/");
    // Send the request
    myClient.
Use the following properties to access the XSLT template information.

Property             Description
isDefaultLanguage    Set to 1 if the designated language is the default language for the specified front end, or to 0 if it is not.
isStyleSheetEdited   Set to 0 if the stylesheet is the default stylesheet that has not been previously edited. Set to 1 if the stylesheet has been edited.
language             When retrieving, the language is determined by the language that is specified by the query parameter.
Updating the Output Format XSLT Stylesheet

Update the output format stylesheet information in a search appliance as follows:

    // Create an entry to hold properties to update
    GsaEntry updateEntry = new GsaEntry();
    updateEntry.setId("default_frontend");
    // The language parameter is passed as part of
    // the entry because we cannot use a query parameter
    updateEntry.addGsaContent("language", "en");
    // Indicate that the XSLT stylesheet has default values
    updateEntry.
Use the following properties to set KeyMatch configurations.

Property        Description
line_number     The line_number of the KeyMatch configuration rule.
newLines        The new KeyMatch configuration to update. This value may include multiple KeyMatch statements. The line delimiter is \n.
numLines        The number of total result lines.
originalLines   The original KeyMatch configurations to change. The value may include multiple KeyMatch statements. The line delimiter is \n.
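Multi-statement newLines and originalLines values are easy to get wrong when concatenated by hand. The following is a minimal sketch of a builder, assuming the field order term,matchType,url,title that appears in this guide's sample KeyMatch strings. The class KeyMatchLines is hypothetical, and no escaping of embedded commas is attempted because that behavior is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder for \n-delimited KeyMatch statements.
final class KeyMatchLines {
    private final List<String> lines = new ArrayList<>();

    // Adds one statement in the assumed order: term,matchType,url,title.
    KeyMatchLines add(String term, String matchType, String url, String title) {
        for (String field : new String[] {term, matchType, url, title}) {
            if (field.contains("\n")) {
                // A newline inside a field would split the statement in two.
                throw new IllegalArgumentException("Field must not contain a newline: " + field);
            }
        }
        lines.add(term + "," + matchType + "," + url + "," + title);
        return this;
    }

    // Number of statements added so far.
    int count() {
        return lines.size();
    }

    // Joins all statements with the \n line delimiter.
    String build() {
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        String newLines = new KeyMatchLines()
            .add("image", "KeywordMatch", "http://images.google.com/", "Google Image Search")
            .add("rss feed", "PhraseMatch", "http://www.google.com/reader", "Reader")
            .build();
        System.out.println(newLines);
    }
}
```

The result of build() is a plain string that can be supplied as the newLines or originalLines property value.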
Changing KeyMatch Settings

The following example appends KeyMatch settings:

    // Create an entry to hold properties to append
    GsaEntry appendEntry = new GsaEntry();
    appendEntry.setId("myFrontend");
    appendEntry.addGsaContent("updateMethod", "append");
    // Prepare new content
    String newLines =
        "image,KeywordMatch,http://images.google.com/,Google Image Search\n" +
        "video,KeywordMatch,http://www.youtube.com/,Youtube\n" +
        "rss feed,PhraseMatch,http://www.google.com/reader,Reader";
    appendEntry.
The following example replaces KeyMatch settings:

    // Create an entry to hold properties to replace
    GsaEntry replaceEntry = new GsaEntry();
    replaceEntry.setId("myFrontend");
    replaceEntry.addGsaContent("updateMethod", "replace");
    // Prepare new content
    String newLines =
        "image,KeywordMatch,http://images.google.com/,Google Image Search\n" +
        "video,KeywordMatch,http://www.youtube.com/,Youtube\n" +
        "rss feed,PhraseMatch,http://www.google.com/reader,Reader";
    replaceEntry.
Use the following properties to access related queries.

Property        Description
line number     The line number of the related query configuration rule (in all the rules).
newLines        The new related query configuration to add. This value may include multiple lines of related query statements. The delimiter is \n.
numLines        The total number of result lines.
originalLines   The original related query configuration to change. This value may include multiple lines of related query statements. The delimiter is \n.
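When related query statements come back as one \n-delimited string, it can be convenient to work with them as pairs. The sketch below assumes each statement has the form query,relatedQuery, as in the sample value "airplane,aircraft\ngoogle,googol" used in this guide. The class RelatedQueries is hypothetical, and a repeated query simply overwrites the earlier pair in this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for \n-delimited related query statements.
final class RelatedQueries {
    // Parses "query,related" lines into an insertion-ordered map.
    static Map<String, String> parse(String lines) {
        Map<String, String> result = new LinkedHashMap<>();
        if (lines == null) {
            return result;
        }
        for (String line : lines.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Split on the first comma only (assumed statement shape).
            String[] pair = trimmed.split(",", 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected query,related but got: " + trimmed);
            }
            result.put(pair[0].trim(), pair[1].trim());
        }
        return result;
    }

    // Formats pairs back into the \n-delimited form.
    static String format(Map<String, String> pairs) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : pairs.entrySet()) {
            lines.add(entry.getKey() + "," + entry.getValue());
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        Map<String, String> pairs = parse("airplane,aircraft\ngoogle,googol");
        pairs.put("google", "search"); // change one related query locally
        System.out.println(format(pairs));
    }
}
```

Parsing before editing keeps the original statement order, so the original string and the formatted result can serve as originalLines and newLines respectively.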
Retrieving Related Queries

Retrieve related queries as follows:

    Map queryMap = new HashMap();
    // Initialize the query map
    queryMap.put("query", "myQuery");
    queryMap.put("startLine", "0");
    queryMap.put("maxLines", "50");
    // Send the request and print the response
    GsaEntry myEntry = myClient.getEntry("synonym", "myFrontend", queryMap);
    Iterator i = myEntry.getAllGsaContents().entrySet().iterator();
    while (i.hasNext()) {
        Map.Entry me = (Map.Entry)i.next();
        if (me.getKey().
The following example updates related queries:

    // Create an entry to hold properties to update
    GsaEntry updateEntry = new GsaEntry();
    updateEntry.setId("myFrontend");
    updateEntry.addGsaContent("updateMethod", "update");
    // Set the starting line number (passed as a string, like the other property values)
    updateEntry.addGsaContent("startLine", "0");
    // Provide the original content
    String originalLines = "airplane,aircraft\ngoogle,googol";
    updateEntry.
The query suggestion blacklist supports the regular expression syntax of the re2 library (http://code.google.com/p/re2/wiki/Syntax).
Retrieving Search Status

Retrieve the current search appliance search status as follows:

    GsaEntry myEntry = myClient.getEntry("status", "servingStatus");
    System.out.println("Queries Per Minute: " + myEntry.getGsaContent("queriesPerMinute"));

Reports

The sections that follow describe how to configure the Reports features of the Admin Console:

• “Search Reports” on page 42
• “Search Logs” on page 45

Search Reports

Generate, update, and delete search reports using the searchReport feed.
Property      Description
topCount      The number of top queries to generate.
withResults   Indicates whether the report counts only searches that have results. The default value is false.

Listing a Search Report

List search report entries by sending an authenticated GET request to the root entry of the searchReport feed.

Query parameter:

Parameter        Description
collectionName   Collection name of the search report. The default value is all.collections.
The following example generates and returns a new search report entry:

    GsaEntry insertEntry = new GsaEntry();
    insertEntry.addGsaContent("reportName", "bbb");
    insertEntry.addGsaContent("collectionName", "default_collection");
    insertEntry.addGsaContent("reportDate", "month_5_2009");
    insertEntry.addGsaContent("withResults", "true");
    insertEntry.addGsaContent("topCount", "100");
    myClient.
Search Logs

Generate, update, and delete a search log using the searchLog feed. A search log lists all search queries for a specified time frame in a format similar to a common log format (CLF).

Search log entry properties:

Property         Description
collectionName   (Write-only) The collection name. Use only to create a search log.
A list of search log entries is returned.

    GsaFeed myFeed = myClient.getFeed("searchLog");
    for (GsaEntry entry : myFeed.getEntries()) {
        System.out.println("Entry Name: " + entry.getGsaContent("entryID"));
        System.out.println("Report State: " + entry.getGsaContent("reportState"));
        System.out.println("Report Creation Date: " + entry.getGsaContent("reportCreationDate"));
        System.out.println("Report Date: " + entry.getGsaContent("reportDate"));
        System.out.println("Is Final: " + entry.
A search log entry is returned. The entry includes logContent if the content is ready.

    Map queries = new HashMap();
    queries.put("query", "User");
    queries.put("startLine", "1");
    queries.put("maxLine", "10");
    GsaEntry entry = myClient.queryEntry("searchLog", "bbb@default_collection", queries);
    System.out.println("Entry Name: " + entry.getGsaContent("entryID"));
    System.out.println("Report State: " + entry.getGsaContent("reportState"));
    System.out.println("Report Creation Date: " + entry.
GSA Unification is also known as dynamic scalability. The federation feed provides GSA Unification features.

Configuring a GSA Unification Network

Retrieve, update, create, or delete the GSA Unification node configuration and retrieve the node configuration of all nodes in the network on the Google Search Appliance.

Property      Description
applianceId   The ID of the search appliance, required to identify the node during node operations.
Adding a GSA Unification Node

Add a GSA Unification node as follows:

    // Create an entry to hold properties to insert
    GsaEntry insertEntry = new GsaEntry();
    insertEntry.setId(entryUrl);
    // In the following example code, add a secondary
    // node with arbitrary values for the various settings.
    // Add properties to insertEntry
    insertEntry.addGsaContent("entryID", "node_appliance_id");
    insertEntry.addGsaContent("nodeType", "SECONDARY");
    insertEntry.addGsaContent("federationNetworkIP", "10.0.0.2");
    insertEntry.
Updating a Node Configuration

Update the configuration of a node as follows:

    // Create an entry to hold properties to update
    GsaEntry updateEntry = new GsaEntry();
    // Add properties to updateEntry
    updateEntry.addGsaContent("entryID", "applianceId");
    updateEntry.addGsaContent("nodeType", "PRIMARY");
    updateEntry.addGsaContent("federationNetworkIP", "10.0.0.3");
    updateEntry.addGsaContent("secretToken", "new_secret_token");
    updateEntry.addGsaContent("hostname", "new_hostname");
    updateEntry.
Retrieving License Information

Retrieve license information using the following properties.

Property            Description
applianceID         Provides the identification value for the Google Search Appliance software. This value is also known as the serial number for the search appliance.
licenseID           Provides the unique license identification value.
licenseValidUntil   Identifies when the search appliance software license expires.
maxCollections      Indicates the maximum number of collections.
Exporting a Configuration

Export a search appliance configuration by sending an authenticated GET request to the importExport entry of the config feed. The following importExport entry is returned:

    Map queries = new HashMap();
    queries.put("password", "12345678");
    GsaEntry entry = myClient.queryEntry("config", "importExport", queries);
    System.out.println("XML Data: " + entry.
Retrieving an Event Log

Retrieve the event log information from a search appliance as follows:

    Map queries = new HashMap();
    queries.put("query", "User");
    queries.put("startLine", "10");
    queries.put("maxLine", "2");
    GsaEntry myEntry = myClient.queryEntry("logs", "eventLog", queries);
    System.out.println("Log Content: " + myEntry.getGsaContent("logContent"));
    System.out.println("Total Lines: " + myEntry.getGsaContent("totalLines"));
    System.out.println("From Line: " + myEntry.
Shutdown or Reboot

Shut down or reboot the search appliance.

Property        Description
command         Command sent to the search appliance. The command can be shutdown or reboot.
runningStatus   Indicates the search appliance status:
                • shuttingDown: If you sent the shutdown command.
                • rebooting: If you sent the reboot command.
                • running: If the search appliance is operating normally.
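Because a mistyped command could shut down a production appliance, or be rejected after a round trip, an application may want to validate the command value and interpret runningStatus on the client. The following is a minimal sketch using only the values listed above. The class PowerCommand and the enum RunningStatus are hypothetical names and are not part of the client library.

```java
// Hypothetical client-side helper for the command and runningStatus values.
final class PowerCommand {
    // Status values listed in this guide, mapped to enum constants.
    enum RunningStatus { SHUTTING_DOWN, REBOOTING, RUNNING }

    // Returns true only for the two commands named in this guide.
    static boolean isValidCommand(String command) {
        return "shutdown".equals(command) || "reboot".equals(command);
    }

    // Converts a runningStatus property value into an enum constant.
    static RunningStatus parseStatus(String value) {
        switch (value) {
            case "shuttingDown":
                return RunningStatus.SHUTTING_DOWN;
            case "rebooting":
                return RunningStatus.REBOOTING;
            case "running":
                return RunningStatus.RUNNING;
            default:
                throw new IllegalArgumentException("Unknown runningStatus: " + value);
        }
    }

    public static void main(String[] args) {
        String command = args.length > 0 ? args[0] : "reboot";
        if (!isValidCommand(command)) {
            System.err.println("Refusing to send unknown command: " + command);
            return;
        }
        System.out.println("Command accepted: " + command);
        System.out.println("Sample status: " + parseStatus("running"));
    }
}
```

Rejecting unknown status strings with an exception, instead of mapping them to a default, makes it visible if an appliance reports a value that is not listed here.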