Google Search Appliance Configuring Distributed Crawling and Serving Google Search Appliance software version 7.
Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com GSA-DIST_100.02 December 2013 © Copyright 2013 Google, Inc. All rights reserved. Google and the Google logo are registered trademarks or service marks of Google, Inc. All other trademarks are the property of their respective owners. Use of any Google solution is governed by the license agreement included in your original contract.
Configuring Distributed Crawling and Serving

This guide contains the information you need to use distributed crawling and serving, a feature of the Google Search Appliance. Distributed crawling and serving is a scalability feature in which several search appliances are configured to behave as though they are a single search appliance. This increases the number of documents that can be crawled and served and simplifies search appliance administration.
You can use GSA mirroring with a distributed crawling and serving configuration. If a master or non-master primary node in the distributed crawling configuration fails, you can promote the mirror node to function as a primary node in the distributed crawling and serving configuration.

Limitations

For information about distributed crawling and serving limitations, see Specifications and Usage Limits.
After the distributed crawl configuration is set up, the four search appliances behave as if they are a single search appliance. Crawling, serving, collections, front ends, and other features are configured on Shard 0, the master node of the configuration. Feeds are sent only to the admin master. The crawl process is automatically distributed among the four search appliances. Any of the nodes can serve results.
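The roles described above can be sketched as a small model. This is only an illustration, not part of the search appliance software: the `Shard` class, the helper functions, and the example hostnames are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    number: int    # shard number; Shard 0 is the master node
    hostname: str  # hypothetical hostname, for illustration only

    @property
    def is_master(self) -> bool:
        return self.number == 0


def build_shards(hostnames):
    """Number the appliances 0, 1, 2, ...; the first one becomes the master."""
    return [Shard(number=i, hostname=h) for i, h in enumerate(hostnames)]


def admin_target(shards):
    """Configuration changes and feeds go only to the master node."""
    return next(s for s in shards if s.is_master)


def serving_targets(shards):
    """Any node in the configuration can serve results."""
    return list(shards)


shards = build_shards([
    "gsa-a.example.com",
    "gsa-b.example.com",
    "gsa-c.example.com",
    "gsa-d.example.com",
])
print("Send feeds and configuration to:", admin_target(shards).hostname)
print("Nodes that can serve:", [s.hostname for s in serving_targets(shards)])
```

In this sketch the master is simply the first appliance listed; in a real configuration the master is whichever appliance you designate as Shard 0.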
• The private IP addresses must conform to the private address space as defined in RFC 1918 and must not overlap with any other private address space used on your network.
• The private network addresses cannot be in the range spanning subnet /16 to /8.

Before You Configure Distributed Crawling and Serving

This section provides a checklist of information you need to collect and decisions you need to make before you configure distributed crawling and serving.
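One item worth verifying before configuration is the private address requirement stated above. The following is a minimal sketch using Python's standard `ipaddress` module. It checks only that a candidate network lies inside the RFC 1918 blocks and does not overlap networks already in use; the prefix-length restriction is not modeled. The function names and the example networks are assumptions made for illustration.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.IPv4Network("10.0.0.0/8"),
    ipaddress.IPv4Network("172.16.0.0/12"),
    ipaddress.IPv4Network("192.168.0.0/16"),
]


def is_rfc1918(network: str) -> bool:
    """Return True if the network lies entirely inside an RFC 1918 block."""
    net = ipaddress.IPv4Network(network)
    return any(net.subnet_of(block) for block in RFC1918_BLOCKS)


def overlaps_existing(network: str, existing) -> bool:
    """Return True if the network overlaps any private network already in use."""
    net = ipaddress.IPv4Network(network)
    return any(net.overlaps(ipaddress.IPv4Network(e)) for e in existing)


def check_private_network(candidate: str, existing) -> bool:
    """Both conditions must hold: RFC 1918 space, and no overlap."""
    return is_rfc1918(candidate) and not overlaps_existing(candidate, existing)


# Hypothetical networks already used on the corporate network.
in_use = ["10.0.0.0/8", "192.168.1.0/24"]

for candidate in ["192.168.255.0/24", "10.1.0.0/24", "8.8.8.0/24"]:
    print(candidate, check_private_network(candidate, in_use))
```

Here `192.168.255.0/24` passes, `10.1.0.0/24` fails because it overlaps a network already in use, and `8.8.8.0/24` fails because it is not private address space.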
Task: Determine whether the search appliances in the configuration crawled substantially similar bodies of documents.
Description: If the search appliances crawled similar bodies of documents, the indexes are substantially similar and rebalancing the index after you set up the distributed crawling and serving configuration will be inefficient. In this situation, Google recommends that you reset the index on the non-master nodes before you set up the configuration.

Configure feeds only on the master.
6. Under Distributed Crawling & Serving Administration, click Enable. A configuration form is displayed, listing each shard in the configuration by number. The master node is shard 0. Each additional shard is assigned a number incremented by 1. If there are four search appliances in the configuration, the shards are assigned numbers 0, 1, 2, and 3.
7.
5. Click Add. A form appears on which you enter information about the new node.
6. On the drop-down list, choose Secondary.
7. Type in the node’s GSA Appliance ID.
8. Type in the Appliance hostname or the IP address of the search appliance.
9. Type in the Admin username for the search appliance.
10. Type in the Password for the Admin username.
11. Type in the Network IP of the search appliance.
12. Type in the Secret token of this search appliance.
13. If Admin NIC is enabled on the shard that you are adding, click Admin NIC enabled on remote node? and type the IP address of the shard in IP Address.
14. Click Save.
15. Click the GSAn Configuration link.
16. Click Apply Configuration. This broadcasts the configuration data to all appliances in the GSAn network. Note that document serving is briefly interrupted on the master node after you click Apply Configuration.
17.
Recovering from Node Failure When GSA Mirroring is Enabled

When a primary search appliance fails in a distributed crawling configuration and GSA mirroring is enabled, promote a mirror node to primary and update the other search appliances in the configuration by importing a new GSAn configuration file.
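The choice of recovery path can be summarized as a small decision function. This is a sketch for planning purposes only: it combines the mirrored case above with the non-mirrored case described in the next section, and the names `RecoveryAction` and `choose_recovery` are hypothetical, not part of any search appliance interface.

```python
from enum import Enum


class RecoveryAction(Enum):
    PROMOTE_MIRROR = (
        "Promote a mirror node to primary and import a new GSAn "
        "configuration file on the other search appliances."
    )
    ADD_NEW_APPLIANCE = "Add a new Google Search Appliance to the configuration."
    RECREATE_AND_RECRAWL = (
        "Delete and recreate the distributed crawling and serving "
        "configuration, then recrawl the content."
    )


def choose_recovery(mirroring_enabled: bool, spare_appliance_available: bool) -> RecoveryAction:
    """Pick the recovery path for a failed primary node."""
    if mirroring_enabled:
        return RecoveryAction.PROMOTE_MIRROR
    if spare_appliance_available:
        return RecoveryAction.ADD_NEW_APPLIANCE
    return RecoveryAction.RECREATE_AND_RECRAWL


# Example: no mirroring, and no additional appliance on hand.
action = choose_recovery(mirroring_enabled=False, spare_appliance_available=False)
print(action.name)
print(action.value)
```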
Recovering from Node Failure When GSA Mirroring is Not Enabled

To recover from a node failure when GSA mirroring is not enabled, you must add a new Google Search Appliance to the configuration. If you do not have an additional search appliance, delete and recreate the distributed crawling and serving configuration and recrawl the content.

When the Failed Node is the Master Node

To recover from a node failure when GSA mirroring is not enabled and the failed node is the master node:
1.