Google Search Appliance Configuring GSA Unification Google Search Appliance software version 6.
Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com November 2011 © Copyright 2012 Google, Inc. All rights reserved. Google and the Google logo are registered trademarks or service marks of Google, Inc. All other trademarks are the property of their respective owners. Use of any Google solution is governed by the license agreement included in your original contract.
Contents
Configuring GSA Unification
Configuring GSA Unification

This guide contains the information you need to configure GSA unification. GSA unification, also called a unified environment, is a Google Search Appliance feature in which a group of search appliances is configured so that a body of documents spread out over several search appliances can be searched by a single search query.
One search appliance in the configuration is designated the primary search appliance or primary node. The other search appliances are designated the secondary search appliances or secondary nodes. Unified environments are typically set up so that end users’ search queries are directed to the primary search appliance. The primary search appliance searches its own index and issues a query to the indexes on the secondary search appliances.
• Search Appliance C indexes accounting documents. It is a secondary search appliance.

Here’s what happens when a user wants to search for technical support, sales, and accounting information about a particular customer, Buzzword Advertising.

1. The user browses to the search page for Search Appliance A, the primary search appliance in the configuration, and types Buzzword Advertising in the search box.
2. Search Appliance A searches its local index, which contains sales and marketing documents.
3.
A particular search appliance is able to act as both a primary and secondary node in relation to another search appliance. The following example illustrates a pair of unified environments consisting of two search appliances. In unified environment A, Search Appliance A is the primary node and Search Appliance B is the secondary node. In unified environment B, Search Appliance A is the secondary node and Search Appliance B is the primary node.
If a user needs to search documents in a collection that is not included in a composite collection, the user must use the search page for that collection’s search appliance instead of the search page on the primary search appliance.

How Crawling and Indexing Work

Crawling and indexing in a unified environment are similar to crawling and indexing in single search appliance deployments.
• Each search appliance in the configuration must be able to ping the other search appliances on their public IP addresses.
• The private IP addresses you choose must conform to the private address space as defined in RFC 1918 and must not overlap with the private address space used by the subnet to which the appliances are connected. For example, if the subnet where the search appliances are deployed uses 10.0.0.0/8, choose the private IP addresses from the 192.168.0.0/24 network. If the 192.168.0.
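The private-address rule above can be checked before you enter any addresses in the Admin Console. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `is_valid_private_choice` is hypothetical and is not part of any search appliance tooling.

```python
import ipaddress

# The three private address blocks defined in RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_valid_private_choice(candidate: str, deployment_subnet: str) -> bool:
    """Return True if `candidate` lies inside RFC 1918 space and does not
    overlap the subnet on which the appliances are deployed.

    Hypothetical helper for illustration only.
    """
    cand = ipaddress.ip_network(candidate)
    deployed = ipaddress.ip_network(deployment_subnet)
    in_rfc1918 = any(cand.subnet_of(block) for block in RFC1918_BLOCKS)
    return in_rfc1918 and not cand.overlaps(deployed)


if __name__ == "__main__":
    # The example from the text: appliances deployed on 10.0.0.0/8.
    print(is_valid_private_choice("192.168.0.0/24", "10.0.0.0/8"))
    # A range that overlaps the deployment subnet is rejected.
    print(is_valid_private_choice("10.1.0.0/24", "10.0.0.0/8"))
```

The check covers only the two conditions stated above. It does not test whether the appliances can reach each other.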
Using Authorization on the Primary Search Appliance

Use authorization on the primary search appliance when you want all authorization to be performed on the primary search appliance. The following table tells you how to configure the primary and secondary search appliances when authorization is performed only on the primary search appliance.
Type of User Authentication How the User is Authenticated and Results are Authorized What to do on the Primary Search Appliance What to do on the Secondary Search Appliances Forms-based authentication with user impersonation for secure serve User provides credentials on a form configured on the primary search appliance. The primary search appliance uses a cookie for authorization using the head requestor for each search result returned by a secondary search appliance.
The following table tells you how to configure the primary and secondary search appliances for delegated authorization. Note that forms authentication with IP binding, in which the authentication cookie is restricted to a single IP address, is not supported.
Type of User Authentication How the User is Authenticated and Results are Authorized What to do on the Primary Search Appliance What to do on the Secondary Search Appliances Multiple content sources protected by forms auth rules On the Universal Login form, URL patterns belonging to each of the content sources can be mapped to a Sample URL. The Sample URL is used with the user name and password that the user enters for that credential group. Configure one credential group for each content source.
Type of User Authentication How the User is Authenticated and Results are Authorized What to do on the Primary Search Appliance What to do on the Secondary Search Appliances Single connector using metadata and URL feeds Authorization is performed by sending a HEAD request. Any authentication method can be used for authenticating users. Authentication is performed on the primary GSA.
5. On a client machine that uses the Kerberos protocol for user authentication, open a browser. For example, this might be a Windows client machine that uses Active Directory on a Windows server for authentication.
6. Make sure the browser is configured as described in “Kerberos-Based Authentication” in Managing Search for Controlled-Access Content.
7. On the primary search appliance, go to the search page.
8.
In a unified environment of three search appliances, the administrator might configure a composite collection called MasterCollection on Search Appliance A as described in the following table.
• File type filters
• Domain filters
• Metatag filters

About Crawl Patterns

Unified environments function more efficiently when the set of URLs crawled on one node has few or no links to URLs crawled on other nodes. Google recommends that you set up the crawl patterns on each node so that there is minimal interlinking among the nodes.
About Timeout Intervals and Result Biasing

The timeout interval and scoring bias parameters are set on the GSA Unification > Host Configuration page for each search appliance in the configuration. The timeout interval determines how long the primary search appliance waits before timing out a request to a particular secondary node.
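To make the effect of a per-node timeout concrete, the following toy model shows a primary node keeping only the results from secondary nodes that answer within their configured interval. It is a sketch under stated assumptions, not a description of the appliance's internals: the names `NodeResponse` and `collect_results` are hypothetical, latencies are simulated values rather than measured ones, and results are simply concatenated, with no relevance scoring or scoring bias applied.

```python
from dataclasses import dataclass, field


@dataclass
class NodeResponse:
    """Simulated answer from one secondary node (hypothetical type)."""
    node: str
    latency_s: float              # simulated time the node took to answer
    results: list = field(default_factory=list)


def collect_results(local_results, responses, timeouts_s):
    """Combine the primary node's local results with results from every
    secondary node that answered within its own timeout interval.

    Returns (merged_results, names_of_timed_out_nodes).
    Assumption: an answer arriving exactly at the timeout is accepted.
    """
    merged = list(local_results)
    timed_out = []
    for resp in responses:
        if resp.latency_s <= timeouts_s[resp.node]:
            merged.extend(resp.results)
        else:
            timed_out.append(resp.node)
    return merged, timed_out


if __name__ == "__main__":
    responses = [
        NodeResponse("B", 0.4, ["support-doc"]),
        NodeResponse("C", 3.0, ["accounting-doc"]),
    ]
    merged, timed_out = collect_results(
        ["sales-doc"], responses, {"B": 2.0, "C": 2.0}
    )
    print(merged, timed_out)
```

In this model, a slow secondary node costs the user that node's results but does not hold up the response from the primary node.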
Task: Determine the secret token that the search appliances will use to recognize each other within the unified environment.
Description: The nodes in a unified environment use the secret tokens to authenticate to each other. The secret token must include only printable ASCII characters. Each search appliance in a unified environment has its own associated secret token, which you specify on the GSA Unification > Host Configuration page.
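You can check a candidate secret token against the printable-ASCII requirement before typing it into the Admin Console. The sketch below is illustrative only. The function name `is_printable_ascii_token` is hypothetical, and treating the space character (0x20) as printable is an assumption, because this guide does not say whether spaces are accepted.

```python
def is_printable_ascii_token(token: str) -> bool:
    """Return True if `token` is non-empty and every character falls in
    the printable ASCII range 0x20 (space) through 0x7E (tilde).

    Assumption: space counts as printable; the guide does not say so.
    """
    return bool(token) and all(0x20 <= ord(ch) <= 0x7E for ch in token)


if __name__ == "__main__":
    for candidate in ["s3cret-Token!", "t\u00f6ken", "tab\there"]:
        print(repr(candidate), is_printable_ascii_token(candidate))
```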
2. Complete the “Unified Environment Checklist” on page 18.
3. Log in to the Admin Console on the primary node.
4. Complete the GSA Unification > Host Configuration page on the primary node.
5. Log in to the Admin Console on each of the secondary nodes.
6. Complete the GSA Unification > Host Configuration page on each secondary node.
7.
5. Navigate to GSA Unification > Node Configuration.
6. Click the Edit link for each of the secondary search appliances in the configuration.
7. On the Test Mode drop-down list, choose Yes.
8. Click Save.
9. After you edit each of the secondary search appliances, click Apply Changes.
10. Log in to each of the secondary search appliances.
11. Navigate to GSA Unification > Node Configuration.
12. If any secondary search appliances are listed, perform the following steps.
a.
If you want to set up mirroring, but not in a unified environment, use GSA mirroring in a GSAn configuration. For more information, see Configuring GSA Mirroring.

To configure mirroring in a unified environment:

1. On the Admin Console for the primary node, navigate to GSA Unification > Nodes Configuration.
2. In the Nodes in Your GSA Unification Network section, click Add.
3. On the drop-down list, designate the remote search appliance for mirroring as a Replica node.
4.
Users See 404 Errors After Clicking Results

Several different configuration problems can cause 404 errors when users click search results. Check the URL patterns in the Follow and Crawl Only URLs settings on the primary and secondary search appliances. Ensure that all Follow and Crawl Only URLs on the secondary appliances also appear on the primary search appliance. If you are using a database crawl, a user might see a 404 error after clicking a search result.
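The Follow and Crawl Only URLs comparison described above can be done by hand, or with a small script if you copy the pattern lists out of each Admin Console into text. The following sketch assumes one pattern per list entry and compares patterns as exact strings. The function name `missing_on_primary` is hypothetical, and the script does not evaluate whether a broader pattern on the primary node already covers a narrower one on a secondary node.

```python
def missing_on_primary(primary_patterns, secondary_patterns):
    """Return the secondary node's patterns that do not appear verbatim
    in the primary node's pattern list, sorted for stable output.

    Blank entries and surrounding whitespace are ignored.
    """
    primary = {p.strip() for p in primary_patterns if p.strip()}
    secondary = {p.strip() for p in secondary_patterns if p.strip()}
    return sorted(secondary - primary)


if __name__ == "__main__":
    # Example patterns are placeholders, not values from this guide.
    primary = ["http://sales.example.com/", "http://support.example.com/"]
    secondary = ["http://support.example.com/", "http://accounting.example.com/"]
    for pattern in missing_on_primary(primary, secondary):
        print("Not on primary:", pattern)
```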