Google Search Appliance Configuring Distributed Crawling and Serving Google Search Appliance software version 6.
Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com November 2011 © Copyright 2012 Google, Inc. All rights reserved. Google and the Google logo are registered trademarks or service marks of Google, Inc. All other trademarks are the property of their respective owners. Use of any Google solution is governed by the license agreement included in your original contract.
Configuring Distributed Crawling and Serving

This guide contains the information you need to use distributed crawling and serving, a feature of the Google Search Appliance. Distributed crawling and serving is a scalability feature in which several search appliances are configured to behave as though they were a single search appliance. This greatly increases the number of documents that can be crawled and served and greatly simplifies search appliance administration.
You can use GSA mirroring with a distributed crawling and serving configuration. If a master or nonmaster primary node in the distributed crawling configuration fails, you can promote the mirror node to function as a primary node in the distributed crawling and serving configuration.

Limitations

The following limitations apply to distributed crawling and serving in this release:
• All search appliances must be in the same data center.
Distributed Crawling Overview

In the following diagram, four search appliances are configured with distributed crawling. Each search appliance is designated as a particular shard in the distributed crawling configuration. Shard 0 is the master search appliance. The shard number is incremented by 1 for each additional search appliance in the configuration. The distributed crawling configuration is created on the master and the settings are exported in a configuration file.
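The GSA's internal URL-partitioning scheme is not documented here, but the shard-numbering idea above can be illustrated conceptually. The following sketch is an assumption-laden illustration only (the hash function and URLs are hypothetical, not the appliance's actual algorithm): it shows how a fixed set of shards, with shard 0 as the master, can split a crawl space deterministically.

```python
import hashlib

def shard_for_url(url: str, num_shards: int) -> int:
    """Illustrative only: deterministically map a URL to a shard.

    This is NOT the GSA's actual partitioning algorithm; it merely
    demonstrates that hashing assigns each URL to exactly one of the
    shards numbered 0 (the master) through num_shards - 1.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical URLs distributed across a four-appliance configuration.
urls = ["http://intranet/a", "http://intranet/b", "http://intranet/c"]
assignment = {u: shard_for_url(u, 4) for u in urls}
```

Because the mapping is deterministic, every appliance in the configuration can compute the same owner for a given URL without coordination, which is the property that lets multiple shards crawl disjoint portions of the same content.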
Serving from Master and Nonmaster Nodes

In this release, you can serve results from both the master and nonmaster nodes in distributed crawling and serving configurations, whether or not you have replicas configured and regardless of whether the mirroring configuration is active-active or active-passive. If you are using a load balancer, a client creates a separate session for each node that it uses.
Before You Configure Distributed Crawling and Serving

This section provides a checklist of information you need to collect and decisions you need to make before you configure distributed crawling and serving.

Task: Determine which Google Search Appliances will participate in the configuration.
Description: Any Google Search Appliance model running software version 6.0 or later can participate, but all search appliances must be the same model running the same software version.
Task: If you are using Kerberos, ensure that you configure Kerberos on the master and nonmaster nodes.
Description: Kerberos keytab files are unique and cannot be used on more than one search appliance. You must generate and import a different Kerberos keytab file for each search appliance. When you configure Kerberos on a nonmaster node, use a different Mechanism Name from the one used for the master.
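Before exchanging configuration data, each appliance in the planned configuration must be able to reach the others over the network. The sketch below is a generic pre-flight check, not a GSA tool: the hostnames are hypothetical, and the port is an assumption based on the Admin Console's default HTTP port (8000); adjust it to match your environment.

```python
import socket

ADMIN_PORT = 8000  # assumed default Admin Console HTTP port; verify for your setup

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False

# Hypothetical node hostnames for the planned configuration.
nodes = ["gsa-master.example.com", "gsa-shard1.example.com"]
status = {host: reachable(host, ADMIN_PORT) for host in nodes}
```

Running a check like this from each node against every other node before you begin helps rule out firewall or DNS problems as a cause of a failed Apply Configuration later.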
9. Click Add. A form appears on which you enter information about the new node.
10. On the drop-down list, choose Primary.
11. Type in the node’s GSA Appliance ID.
12. Type in the Appliance hostname or the IP address of the search appliance.
13. Type in the Admin username for the search appliance.
14. Type in the Password for the Admin username.
15. Type in the Network IP of the search appliance.
16. Type in the Secret token of this search appliance.
17. Click Save.
18. Click the GSAn Configuration link.
13. Click Save.
14. Click the GSAn Configuration link.
15. Click Apply Configuration. This broadcasts the configuration data to all appliances in the GSAn network. Note that document serving will be interrupted briefly on the master node after you click Apply Configuration.
16. Optionally, click Export and save the distributed crawling configuration file to your local computer.
17. On the admin master node, click Status and Reports > Resume Crawl.
3. Click Administration > Reset Index and click Reset the Index Now.
4. Log in to each node and reset the index on each node.
5. On the master node, click GSAn > Configuration.
6. Click the Edit link for the shard configuration that contains the failed node.
7. Delete the failed node.
8. Click Save.
9. Click the GSAn Configuration link.
10. Click Apply Configuration. This broadcasts the configuration data to all appliances in the GSAn network.
3. Log in to the Admin Console of the previous replica node to promote it to be the new master node.
4. Reconfigure GSAn distributed crawling and serving by selecting a previous nonmaster node as a nonmaster node for this master node.
5. Click Apply Configuration. This broadcasts the configuration data to all appliances in the GSAn network. Note that document serving will be interrupted briefly on the master node after you click Apply Configuration.
When the Failed Node is Not the Master Node

To recover from a node failure when GSA mirroring is not enabled and the failed node is not the master node:
1. Log in to the Admin Console of the master search appliance in the distributed crawling configuration.
2. Click GSAn > Configuration.
3. Edit the shard containing the failed node.
4. Delete the failed node.
5. Click Save.
6. Add the new search appliance to the configuration.
7. Click Save.
8. Click the GSAn Configuration link.
9. Click Apply Configuration.