Google Search Appliance Getting the Most from Your Google Search Appliance Google Search Appliance software version 7.
Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com September 2012 © Copyright 2012 Google, Inc. All rights reserved. Google and the Google logo are registered trademarks or service marks of Google, Inc. All other trademarks are the property of their respective owners. Use of any Google solution is governed by the license agreement included in your original contract.
Contents
Chapter 1 Introduction 6
  About This Document
  Using Google Search Appliance Documentation
  What Is Universal Search?
  Providing Universal Search with a Google Search Appliance
  Crawling and Indexing Content Sources
  Serving Search Results to Users
  Document Overview
Chapter 2 Planning
  Indexing Database Content 26
  Synchronizing a Database 28
  Learn More about Database Synchronization 28
  Testing Indexed Content 29
Chapter 5 Search Experience
  Showing Document Previews in Search Results 53
  Providing Document Previews in Search Results 53
  Learn More About Document Previews 54
  Customizing the User Interface 54
  How Does the Search Appliance Create the User Interface? 55
  Tools for Customizing the User Interface 55
  Editing an XSLT Stylesheet 57
  Learn More about User Interface Customization 57
  Collecting Metrics about User Clicks 58
  Setting Up Advanced Search Reporting 58
  Learn More about Advanced Search Reporting 58
Chapter 6 Essentials
Chapter 1 Introduction
The Google Search Appliance enables you to provide universal search to your users. You can get the most from your Google Search Appliance by using some or all of its many features to fine-tune and enhance universal search. Become familiar with the Google Search Appliance’s features by reading this document and apply those features that best suit your search solution.
Read more about the Google Search Appliance features and functions that interest you by clicking the links within this document to navigate to relevant documents in the library of public Google Search Appliance documentation. Also, for essential information about deploying a Google Search Appliance, see the Google Search Appliance “Notes from the Field,” http://support.google.com/gsa/bin/answer.py?answer=2721831.
• “Crawling and Indexing Content Sources” on page 8
• “Serving Search Results to Users” on page 8
This section provides an overview of each of these aspects.
For more information about enhancing the search experience, refer to “Using Features to Enhance the Search Experience” on page 30. Document Overview The following table lists the major topics in this document. To read about a specific topic, refer to the section listed in the table.
Chapter 2 Planning
As with other software system deployments, planning is the first and most important phase.
Chapter 3 Setting Up
Setting Up a Search Appliance
Before you can start implementing a universal search solution, you need to set up your Google Search Appliance. The following sections provide an overview of the setup process:
• “Installing and Configuring a Search Appliance” on page 11
• “Configuring Search Appliances for Load Balancing or Failover” on page 12
These topics are covered in depth in Google Search Appliance documentation.
• On the label on the back of the search appliance • On the Administration > License page in the Admin Console (see “Using the Admin Console” on page 59) • On the Google Enterprise Technical Support web site, if you log in with the credentials used for your Technical Support account If the search appliance experiences any problems during installation or configuration, attach a monitor directly to the search appliance.
The Google Search Appliance has two levels of user accounts: • Administrator accounts (see “Administrator Accounts” on page 13) • Manager accounts (see “Manager Accounts” on page 13) Each type of account has different permissions. Administrator Accounts An administrator has access to all functions in the Admin Console (see “Using the Admin Console” on page 59).
Setting Up User Accounts Set up a user account by selecting an account type and providing user information on the Administration > User Accounts page in the Admin Console, shown in the following figure. Learn More about User Accounts For more information about creating user accounts, refer to the Admin Console help page for the Administration > User Accounts page.
Chapter 4 Crawling and Indexing
After the Google Search Appliance has been set up (see “Setting Up a Search Appliance” on page 11), you can configure the search appliance to crawl the content sources that you identified during the planning phase, as described in “Planning” on page 10. Crawl is the process by which the Google Search Appliance discovers enterprise content and creates a master index. The resulting index consists of all of the words, phrases, and metadata in the crawled documents.
• Product documentation • Marketing literature The Google Search Appliance supports crawling of many types of formats, including word processing, spreadsheet, presentation, and others. The Google Search Appliance crawls content on web sites or file systems according to crawl patterns that you specify by using the Admin Console. As the search appliance crawls public content sources, it indexes documents that it finds. To find more documents, the crawler follows links within the documents that it indexes.
Configuring Crawl of Public Content To configure a search appliance to crawl a content source, you specify top-level URLs and directory addresses and links that the search appliance should follow by using the Crawl and Index > Crawl URLs page in the Admin Console. In addition to specifying start URLs, you can also specify URLs that the search appliance should not follow and crawl. By default, the search appliance crawls in continuous crawl mode.
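The interplay between URLs to follow and URLs not to crawl can be illustrated with a minimal sketch. It assumes simple prefix matching, which is only a stand-in for the appliance's own URL pattern syntax; the function name `should_crawl` and the example.com URLs are hypothetical.

```python
def should_crawl(url, follow_prefixes, do_not_crawl_prefixes):
    """Return True if url matches a follow prefix and no do-not-crawl prefix.

    Prefix matching is an assumption made for illustration; it is not the
    search appliance's actual pattern syntax.
    """
    # Exclusions win over inclusions.
    if any(url.startswith(prefix) for prefix in do_not_crawl_prefixes):
        return False
    return any(url.startswith(prefix) for prefix in follow_prefixes)


# Hypothetical patterns for an intranet site.
follow = ["http://intranet.example.com/"]
exclude = ["http://intranet.example.com/private/"]

print(should_crawl("http://intranet.example.com/docs/a.html", follow, exclude))
print(should_crawl("http://intranet.example.com/private/b.html", follow, exclude))
```

Checking exclusions first mirrors the idea that a do-not-crawl pattern overrides a broader follow pattern.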
If you prefer to have the search appliance crawl according to scheduled times, you must also perform the following additional tasks by using the Crawl and Index > Crawl Schedule page in the Admin Console:
1. Selecting scheduled crawl mode.
2. Creating a crawl schedule.
3. Saving the crawl schedule.
To schedule crawling times for a specific host, you can change the host load and times in the Crawl and Index > Host Load Schedule page.
The following table lists the access-control methods that the search appliance supports and whether the methods are supported for crawl, serve, or both. Method Crawl Serve HTTP Basic X X NTLM HTTP X X LDAP (Lightweight Directory Access Protocol) X Forms Authentication X X x.
Managing Serve of Controlled-Access Content
When a user issues a search request for controlled-access content, the search appliance verifies the user’s identity and determines whether the user has authorization to view the content. This check is performed before the search appliance displays any content in search results. By performing access-control checks on results in real time, the Google Search Appliance ensures that users see only the results that they are authorized to view.
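The idea of checking authorization before any result is displayed can be sketched as a filter over candidate results. This is an illustration only: the `acl` mapping, the group names, and the rule that unlisted URLs are public are all assumptions, not the appliance's internal mechanism.

```python
def authorized_results(results, user_groups, acl):
    """Keep only the result URLs that the user is allowed to view.

    acl maps a URL to the set of groups allowed to view it (a hypothetical
    structure); a URL missing from acl is treated as public.
    """
    visible = []
    for url in results:
        allowed_groups = acl.get(url)
        # Public content, or at least one group in common with the user.
        if allowed_groups is None or allowed_groups & user_groups:
            visible.append(url)
    return visible


# Hypothetical data: one public page and one page restricted to the "hr" group.
acl = {"http://hr.example.com/salaries": {"hr"}}
results = ["http://www.example.com/about", "http://hr.example.com/salaries"]

print(authorized_results(results, {"engineering"}, acl))
print(authorized_results(results, {"hr"}, acl))
```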
• To enable the search appliance to use IWA/Kerberos authentication during secure serve, use the Serving > Universal Login Auth Mechanisms > Kerberos page. • To configure the search appliance to use the Authentication SPI, use the Serving > Universal Login Auth Mechanisms > SAML page.
When connecting to a document repository through an enterprise connector, the Google Search Appliance uses a process called “traversal.” During traversal, the connector issues queries to the repository to retrieve document data to feed to the Google Search Appliance for indexing. The connector manager formats the content and any associated metadata for a feed to the Google Search Appliance, which then creates an index of the documents.
Obtaining the Connector Manager and Connectors To run a connector, you need the software for the connector manager and the connector. The following table lists methods for obtaining the software components that you need to use connectors, as well as the support provided for each component. Component Obtain by Support Source code for the connector manager and connectors Download the code from the Google Enterprise Connector Manager project (http://code.google.com/p/google-enterprise-connector-manager/).
6. If required by the connector, configuring secure crawling of the content management system by using the Admin Console page that is appropriate for the specific connector. 7. Restarting the connector. 8. Verifying that the search appliance is indexing URLs from the connector by using the Status and Reports > Crawl Diagnostics page. Learn More about Connectors For in-depth information about connectors, refer to the Google Search Appliance connector documents.
The following figure provides an overview of indexing hard-to-find content by using feeds.
2. Configuring the search appliance to accept the feed by using the Crawl and Index > Feeds page, shown in the following figure. To prevent unauthorized additions to your index, feeds are accepted only from machines that are specified on this page.
3. Running the feed client script.
4. Monitoring the feed by using the Admin Console.
5. Checking for search results from the feed within 30 minutes of running the feed client script.
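As a rough illustration of what a feed client script prepares before pushing, the following sketch builds a small feed document. The element and attribute names (`gsafeed`, `header`, `datasource`, `feedtype`, `group`, `record`, `content`) are assumptions to be verified against the Feeds Protocol Developer’s Guide, and the data source name and URL are hypothetical. The sketch stops short of sending the feed to the appliance.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, feedtype, records):
    """Build a feed document as a string.

    records is a list of (url, mimetype, body) tuples. Element names are
    assumptions; check them against the Feeds Protocol Developer's Guide.
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(root, "group")
    for url, mimetype, body in records:
        record = ET.SubElement(group, "record", url=url, mimetype=mimetype)
        ET.SubElement(record, "content").text = body
    return ET.tostring(root, encoding="unicode")


# Hypothetical data source and document.
feed_xml = build_feed(
    "sample_source",
    "incremental",
    [("http://intranet.example.com/doc1", "text/plain", "Hello from a feed.")],
)
print(feed_xml)
```

A real feed client would then push this document to the search appliance from a machine listed on the Crawl and Index > Feeds page.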
The following figure provides an overview of indexing content in databases.
Synchronizing a Database
Synchronize a database by performing the following tasks with the Admin Console:
1. Creating a new database source on the Crawl and Index > Databases page, shown in the following figure.
2. Setting URL patterns that enable the search appliance to crawl the database by using the Crawl and Index > Crawl URLs page.
3. Starting a database synchronization by using the Crawl and Index > Databases page.
Testing Indexed Content Once the content has been crawled and indexed, you can ensure that it is searchable by using the Test Center. The Test Center enables you to test search across the indexed content, limiting it to specific collections (see “Segmenting the Index” on page 48) or using specific front ends (see “Using Front Ends” on page 32) and verifying that the correct content is indexed and that the results are what you expect.
Chapter 5 Search Experience
Using Features to Enhance the Search Experience
A user’s search experience is based on accessing the Google Search Appliance to enter a search and receive results. The Google Search Appliance provides many built-in features that ensure a satisfactory search experience for users. For a list of these features, refer to “Built-In Search Experience Features” on page 31.
This section briefly describes each feature that you can use to enhance the search experience and contains links that you can follow to get more information about each feature. Built-In Search Experience Features Without any administrator intervention, the Google Search Appliance provides a rich search experience by using its built-in search features. The following table lists these built-in search features.
The URL for the default search page is http://SEARCH_APPLIANCE_NAME. For example, if your search appliance is named compgsa, the built-in search page is available at http://compgsa. You can also link this URL from your website to provide user access to it.
2. Selecting the front end that you want to edit. 3.
Because a KeyMatch is specific to a front end, you can aim a KeyMatch at a specific group of users. Setting Up KeyMatches You set up a KeyMatch by matching a search term to a specific URL and specifying a title for the match. In the preceding example, there are two KeyMatches for the search term “gsa.” In the first KeyMatch: • The URL is http://pm.altostrat.com/products/products/gsa • The title is “Google Search Appliance” In the second KeyMatch: • The URL is https://pm.altostrat.
Suggesting Alternative Search Terms along with Results The Google Search Appliance can suggest alternative search terms to users for their original keyword searches with its related queries feature. When users search with that term, the search appliance always presents the related query at the top of the search results. When a user clicks the related query, the search appliance runs the search again and returns additional results.
2. Setting up the related query on the Serving > Front Ends > Related Queries page, shown in the following figure. 3. Saving the related query. Learn More about Related Queries For in-depth information about setting up and using related queries, refer to “Using Related Queries to Suggest Alternative Searches” in Creating the Search Experience. Grouping Search Results by Topic The Google Search Appliance can group search results by topic with its dynamic result clusters feature.
For example, suppose a user is looking for information about the expense budget for NYSSA. He searches for this information using the term “expense nyssa.” A dynamic result cluster appears with the results, as shown in the following figure. Setting Up Dynamic Result Clusters By default, dynamic result clusters are disabled.
2. Enabling dynamic result clusters and specifying their placement by using either the Page Layout Helper or the XSLT Stylesheet Editor on the Serving > Front Ends > Output Format page, shown in the following figure. 3. Saving the page layout. Learn More about Dynamic Result Clusters For in-depth information about setting up and using dynamic result clusters, refer to “Using Dynamic Result Clusters to Narrow Searches” in Creating the Search Experience.
Providing Options for Navigating Search Results In many cases, content already has considerable metadata associated with it. As a search appliance administrator, you can use metadata to help users explore search results by using dynamic navigation. With dynamic navigation, when a user clicks on a metadata attribute value, the search results are filtered to contain results from the original search query that also have that specific attribute value.
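The filtering behavior described above can be sketched as follows. The result structure (a dictionary with a `metadata` entry) and the `department` attribute are hypothetical; the sketch shows only the narrowing step, not how the appliance stores or displays attributes.

```python
def filter_by_attribute(results, attribute, value):
    """Keep the results from the original query that carry the given
    metadata attribute value."""
    return [
        result
        for result in results
        if value in result["metadata"].get(attribute, [])
    ]


# Hypothetical results for one search query.
search_results = [
    {"url": "http://example.com/q3-forecast", "metadata": {"department": ["Sales"]}},
    {"url": "http://example.com/build-notes", "metadata": {"department": ["Engineering"]}},
    {"url": "http://example.com/untagged", "metadata": {}},
]

# Simulate the user clicking the "Sales" value of the "department" attribute.
narrowed = filter_by_attribute(search_results, "department", "Sales")
print([result["url"] for result in narrowed])
```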
2. Saving the dynamic navigation configuration. 3. Showing dynamic navigation attributes in a front end by using the Page Layout Helper on the Output Format tab of the Serving > Front Ends page. Learn More about Dynamic Navigation For information about using dynamic navigation, refer to “Using Dynamic Navigation to Help Users Explore Results” in Creating the Search Experience.
4. Perform the following tasks: a. Configuring a collection containing expert data b. Selecting meta tags for the configuration c. Configuring expert layout Learn More about Expert Search For complete information about enabling and configuring expert search, click Help Center > Social Connect > Expert Search in the Admin Console.
• External provider: The OneBox module calls a URL to get data from an external application that returns information as XML. Setting Up a OneBox Module Before you can set up a OneBox module, you must choose a front end where you want to implement it. For information about front ends, refer to “Using Front Ends” on page 32. Set up a OneBox module by: 1.
Integrating Personal Content from Google Apps Relevant results for a search query can include information from outside the search index, including personal content from Google Apps, as shown in the following figure. Integrating personal content is a feature that enables the Google Search Appliance to serve both private and public content directly from your Google Apps domain.
Set up integration of personal content by performing the following tasks with the Admin Console: 1. Enabling integration of personal content with a search appliance by using the Cloud Connect > Google Apps page, shown in the following figure. 2. Showing personal Google Apps content in a front end by using the Page Layout Helper on the Output Format tab of the Serving > Front Ends page.
For example, you might want to restrict search results by domain to ensure that searches in various regions return only results with local information. Suppose you want to restrict results on the pages in the United Kingdom to show only products and special offers available there, so you create a front end for users in the U.K. Suppose the domain name for the U.K. is www.mycompany.com.uk. You might use this domain name to create a domain filter so that when users in the U.K.
Controlling Automatic Searching of Synonyms The Google Search Appliance can automatically widen a search by adding terms that are synonymous with the search terms through query expansion. Query expansion helps users get search results that they would otherwise miss. When a user searches on a term, the search appliance expands the search to include synonymous terms.
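A minimal sketch of the expansion step, assuming a plain dictionary of synonyms: the appliance's synonym file format and matching rules are not shown here, and the function name `expand_query` and the sample synonyms are hypothetical.

```python
def expand_query(terms, synonyms):
    """Return the original terms plus any synonyms, without duplicates.

    synonyms maps a lowercase term to a list of synonymous terms
    (a hypothetical structure used only for illustration).
    """
    expanded = []
    for term in terms:
        if term not in expanded:
            expanded.append(term)
        for synonym in synonyms.get(term.lower(), []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


# Hypothetical synonym table.
synonym_table = {"car": ["auto", "automobile"]}

print(expand_query(["car", "rental"], synonym_table))
```

Keeping the original term first reflects the idea that expansion widens a search rather than replacing what the user typed.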
For example, given the search term “AltoStrat,” you might want code or design documents to appear high in search results for the engineering group, while you might want product specifications to appear higher for the marketing group. The following figure illustrates two different search results rankings for the same search term, “AltoStrat.” Because result biasing is specific to a front end, you can aim result biasing at a specific group of users.
2. Configuring source biasing, date biasing, or metadata and entity biasing for the policy on the Serving > Result Biasing > Edit page, shown in the following figure. A menu-driven interface allows weak or strong increases or decreases, and requires no complex coding or scripting. You can use 11 settings to adjust result biasing from least influence to most influence. 3. Enabling the result biasing policy by selecting it for use with a front end on the Serving > Filters page in the Admin Console.
• “Corporate Policies,” for any staff to search for policy documents
• “Engineering,” for technical users and other users who need to search for engineering documents
• “Europe Offices,” for users who are geographically located in the European offices
• “Marketing,” for marketing staff to search for marketing documents
• “Sales,” for sales staff to search for sales documents
To search a collection, a user can select the collection name from a pull-down menu on the search box, as illustrated in the
Set up a collection by performing the following steps with the Admin Console: 1. Providing a name on the Crawl and Index > Collections page, shown in the following figure. 2. Entering the URL patterns you want to include in the collection, as well as URLs that you don’t want to include. 3. Saving the collection. Before you add a pull-down menu for searching by collection, you must choose a front end where you want to implement it. Add a collection menu by performing the following steps: 1.
Setting Up User Results To add a user results configuration, use the Social Connect > User Results page. For each configuration, you can specify: • A name for the configuration. • A description of the configuration. • Whether user results are moderated, that is, if they require administrator approval before appearing in search results, and which front ends use the configuration.
A user sets up an alert by clicking My Alerts on the search page, logging in to the search appliance by using her LDAP user name and password, and choosing an hourly, weekly, or monthly schedule. After the user creates an alert, the search appliance sends the user an email whenever it finds new or changed documents about the topic of interest. Setting Up Alerts Alerts require that the user authenticate using their LDAP credentials.
Displaying Translations of Search Results The Google Search Appliance can translate titles and snippets in search results, as well as cached documents, into the user’s language in real time. The user’s language is determined by the default language set in the user’s browser. When translation is enabled, translation links appear in search results. The user can translate everything on the page or just individual titles, snippets, or cached documents.
• Refeed content feed data sources
• Resync content from databases
If you upgrade to 7.0 from an older version, your content must be recrawled, resynced, or refed after enabling this feature to get document previews. Learn More About Document Previews For complete information about providing document previews, click Help Center > Serving > Document Preview Module in the Admin Console.
• Displaying a public and secure search radio button The AltoStrat examples in this document present a customized user interface, as shown in the following figure. How Does the Search Appliance Create the User Interface? After the search appliance receives and executes a search query: 1. The search appliance returns search results in XML. 2. The search appliance applies an XSLT stylesheet to the XML results and creates the search results page in HTML. 3.
Using the Page Layout Helper Even if you do not have any special knowledge of XSLT, you can effectively customize a Google Search Appliance user interface using the Page Layout Helper. Use the Page Layout Helper to perform the following tasks: • Changing Global Attributes—In the Global Attributes section, you can quickly put your logo on pages, specify the fonts to use, and add the HTML header and the HTML footer code used on your web site.
Using the XSLT Stylesheet Editor If the elements that you want to change are not available in the Page Layout Helper, you must use the XSLT Stylesheet Editor to change them. This editor enables you to make changes directly in the XSLT stylesheet. The XSLT stylesheet contains sections for various components, preceded by comments so that you know whether a section can be customized. To work in the XSLT Stylesheet Editor, you need knowledge of XSLT, XML, and HTML.
Collecting Metrics about User Clicks
The Google Search Appliance’s advanced search reporting feature enables you to gather information about user clicks on search results. By using advanced search reporting, you can determine:
• If users are finding what they’re searching for
• If groups of users are searching for the same information
• If certain URLs are harder for users to find than others
By analyzing user clicks, you can also identify ways to improve the search experience.
Chapter 6 Essentials
Using the Admin Console
After the search appliance has been installed and configured, you can begin to use the Admin Console to crawl and index content sources in your organization, as well as to enhance, fine-tune, and optimize your search solution. The Admin Console is a web-based interface with pages that you use to set up and manage a search appliance.
Logging in to the Admin Console
Log in to the Admin Console by entering your administrator User Name and Password. You can log in to the Admin Console using HTTP or HTTPS:
• For a secure connection, use HTTPS on port 8443. Using HTTPS provides better protection for passwords and other information.
• For an insecure connection, use HTTP on port 8000. Using HTTP increases the risk of exposing passwords and other information to users on the network who are not authorized to see such information.
By clicking the Help Center link, which appears on each Admin Console page, you can navigate to the Help Center Welcome page. From this page, you can browse various help topics. By clicking a help link for a section of a page, you can navigate to context-sensitive help about the page section. Using Language Options The Google Search Appliance supports search and indexing in almost every language.
• Japanese • Korean • Norwegian • Polish • Portuguese-Brazil • Portuguese-Portugal • Russian • Slovak • Spanish • Swedish • Thai • Turkish • Vietnamese The language of the Admin Console is determined by the language setting in your browser. If the Admin Console does not appear in the language that you prefer, make sure that your browser is set for the preferred language.
The search appliance allows multiple stylesheets that present the search page, advanced search, and results pages in different languages, all associated with a single front end. The language-specific stylesheet is selected based on the Accept-language header sent from the user’s browser. The stylesheet is selected from the set of languages marked “active”; if there is no match, the default language is used.
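The selection logic can be sketched as follows. This is an illustration under stated assumptions: the q-value parsing follows the usual HTTP Accept-Language conventions, and the fallback from a regional tag such as `fr-CA` to its primary language `fr` is an assumption, not documented appliance behavior. The function names are hypothetical.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    preferences = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # A tag without a q-value has the highest preference.
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        preferences.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal q-values keep their header order.
    ranked = sorted(preferences, key=lambda pref: -pref[1])
    return [tag for tag, quality in ranked if quality > 0]


def select_stylesheet_language(header, active_languages, default_language):
    """Pick the first preferred language that is active, else the default."""
    active = {language.lower() for language in active_languages}
    for tag in parse_accept_language(header):
        if tag in active:
            return tag
        primary = tag.split("-")[0]  # Assumed fallback: "fr-ca" -> "fr".
        if primary in active:
            return primary
    return default_language


print(select_stylesheet_language("fr-CA,fr;q=0.8,en;q=0.5", ["en", "fr"], "en"))
print(select_stylesheet_language("de", ["en", "fr"], "en"))
```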
• Italian • Japanese • Korean • Latvian • Lithuanian • Norwegian • Polish • Portuguese • Romanian • Russian • Spanish • Swedish • Turkish Selecting Languages for Filtering To select languages for filtering search results, use the Filters tab on the Serving > Front Ends page in the Admin Console.
Whenever a user enters a search query that matches a synonym in one of these languages, the term is expanded. Enabling a Language Synonyms File You can enable or disable a synonyms file by using the Serving > Query Settings page in the Admin Console. Learn More about Language Synonyms Files For information about language synonyms files, refer to “Using Preconfigured Local Query Expansion Files” in Creating the Search Experience.
• Formatting search results by using an XSL stylesheet associated with a specific Front End • Limiting search results to the contents of a specified collection Restricting Searches Use query terms to restrict a search.
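As an illustration of how a search request can select a front end and a collection, the following sketch assembles a search URL. The parameter names (`q`, `site`, `client`, `proxystylesheet`, `output`) and the `/search` path are assumptions to be checked against the search protocol documentation; the collection and front end names are hypothetical, and `compgsa` is the example appliance name used earlier in this document.

```python
from urllib.parse import urlencode


def build_search_url(host, query, collection, front_end):
    """Assemble a search request URL.

    Parameter names are assumptions; verify them against the search
    protocol documentation before relying on them.
    """
    params = {
        "q": query,                    # The user's search terms.
        "site": collection,            # Limit results to one collection.
        "client": front_end,           # Front end handling the request.
        "proxystylesheet": front_end,  # Format results with its stylesheet.
        "output": "xml_no_dtd",        # Result format passed to the stylesheet.
    }
    return "http://" + host + "/search?" + urlencode(params)


search_url = build_search_url(
    "compgsa", "expense nyssa", "default_collection", "default_frontend"
)
print(search_url)
```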
• A scripting language, such as Python Learn More about the Feeds Protocol For complete documentation on feeds, refer to the Feeds Protocol Developer’s Guide.
• x.509 Certificates for user authentication When using the SAML Authorization SPI to serve secure content results from SMB shares, you must use Kerberos for user authentication. Useful Knowledge for Writing Web Services To write an Identity Provider or Policy Decision Point web service, you need a basic understanding of the following technologies. • XML—Extensible Markup Language • SAML 2.0—An XML-based standard whose primary use case is inter-domain single sign-on • SOAP 1.
Learn More about Developing Custom Connectors For information about developing a connector, refer to the Connector Developer’s Guide, http://google-enterprise-connector-manager.googlecode.com/svn/docs/devguide/. Monitoring a Search Appliance The Google Search Appliance provides extensive reports (see “Using Search Appliance Reports” on page 69) that can help you to analyze the content that has or has not been indexed and why.
Report | Description | Admin Console page
Search reports | A search report is a summary of information about user search queries for a specified timeframe. | Status and Reports > Search Reports
Search logs | Search log reports provide a monthly, weekly, or daily snapshot of search activity, segmented by collection. For each time period, the report shows the top 100 queries, top no match searches, traffic by day and hour, and so on.
The search appliance Admin Console also provides assistance in the form of help pages. For more information about this type of help, refer to “Using the Admin Console Help Center” on page 60. Getting Help from Google Enterprise Support Google provides technical support for the Google Search Appliance on the Enterprise Technical Support web site. The support term for your Google Search Appliance is generally two years. Your Google support account generally begins upon shipment of your search appliance.
Learn More about Google Partners You can find a directory of Google partners at the Google Enterprise Search Partner Directory (http://www.google.com/enterprise/search/partners/index.html). This site links customers to vendors whose solutions integrate and extend Google’s communication, collaboration, and enterprise search products. You might also visit the Google Apps Marketplace (http://www.google.
Chapter 7 Quick Reference
Google Search Appliance Administration Checklist
The following table provides a checklist of common activities for administering the Google Search Appliance. To read about a specific activity, refer to the section listed in the table.
Activity | Described in Section
Enabling a search appliance to return expert profile information with keyword searches | “Displaying Expert Profiles with Search Results” on page 40
Enabling a search appliance to return real-time, structured data with search results | “Providing Real-Time Connectivity to Business Applications” on page 41
Enabling a search appliance to serve results from Google Apps (documents, spreadsheets, presentations, and sites) | “Integrating Personal Content from Google Apps” on page 43
Index A access-control methods 19 accounts administrator 13 manager 13 setting up 14 user 12 Admin Console description 59 help center 60 languages 61 logging in 60 Administration > Certificate Authorities page 19 Administration > License page 12 Administration > SNMP page 70 administrator accounts 13 advanced search page 31 advanced search reporting 58 alerts 51 analytics 58 appliance ID 11 authentication methods 19 SAML SPI 67 authorization methods 19 SAML SPI 67 C cached pages 31 certificates, x.
document previews 53 duplicate directories 31 duplicate snippets 31 dynamic navigation 39 dynamic result clusters description 36 setting up 37 indexing content sources 8 databases 26 non-web repositories 21 installing a search appliance 11 Integrated Windows Authentication 19 intranets, crawling and indexing 8 E K Enterprise Support 71 expert search 40 Kerberos 19 KeyMatch tab 33 KeyMatches description 33 setting up 34 F failover 12 feed client 25 feeds 24 pushing 25 writing applications 66 xml docume
R rankings 46 real-time diagnostics 69 related queries 35 Related Queries tab 33 relevance 31 Remove URLs tab 33 reports 69 result biasing description 46 setting up 47 results grouping 31 language filtering 63 translation 53 results page, default 31 S SAML Authentication SPI 67 Authorization SPI 67 Service Provider Interfaces 19 web services 68 search 7 Search Box, Page Layout Helper 56 search experience 30 search logs 70 search page, default 31 search protocol 65 search reports 70 search results formattin