Google Search Appliance Connectors Administration Guide Google Search Appliance Connectors software version 4.0.3 Google Search Appliance software version 7.
Table of Contents
About this Guide
1 About Connectors 4.0
The Lister/Retriever model
2 What’s New in Connectors 4.
Stop running a connector
4 Enable Connector Security
Certificate Authorities
Self-signed certificates
Create a self-signed certificate for the GSA
Firefox
Chrome
OpenSSL (command line)
Create a self-signed certificate for the connector
Exchange certificates
Turn on security with the server.
About this Guide This Administration Guide is intended for anyone who needs to understand how to manage Google Search Appliance (GSA) Connectors 4.0. It provides overview information about the Connectors, as well as procedures that you can follow to install, configure, or monitor each of the Connectors. The guide assumes that you are familiar with Windows or Linux operating systems and configuring the Google Search Appliance by using the Admin Console.
1 About Connectors 4.0 Google Search Appliance connectors enable the Google Search Appliance to acquire content from external repositories and provide that content in search results. A Google Search Appliance with configured connectors can perform fast, unified, secure search across multiple systems and document repositories. A fundamental strength of the search appliance is discovering enterprise content in web pages and indexing it.
The Lister/Retriever model Connectors 4.0 are based on the lister/retriever model. In this model, the lister notifies the search appliance of the names of documents in the repository by sending URLs and ACLs in feeds. The search appliance uses the URLs to crawl documents in the repository over HTTP/HTTPS, and the retriever serves the content of each requested document. Each document in a repository is identified by a unique document identifier (DocId).
2 What’s New in Connectors 4.0? In release 4.0, connectors work seamlessly with more search appliance features than previous releases. Noteworthy features of Connectors 4.
Off-Board installation All Connectors 4.0 are installed on a separate host server rather than the search appliance itself. For more details on this topic, see Download the connector software. Connector configuration Because Connectors 4.0 are not built into the search appliance, they are not configured through the search appliance Admin Console, as in previous releases. Configuration is handled in the adaptor-config.properties file. In release 4.
SAML security messages In this release, SAML is the communication protocol between the search appliance and the connector for user authentication and authorization. This protocol replaces XML, which was used in previous releases. Simplified troubleshooting A strength of Connectors 4.0 is simplified troubleshooting.
3 General Information This section contains general information about Connectors 4.
Configuration properties file Configuration is handled in the adaptor-config.properties file. Each connector installation procedure in this documentation contains steps for creating its adaptor-config.properties file and adding minimal variables to make the connector operational. For example, after you create the configuration file, you need to add the server.port variable to point to the retriever port. For more information about configuration, see the section that pertains to your connector.
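As a rough illustration of the minimal configuration described above, the following sketch writes a two-line adaptor-config.properties file and checks that gsa.hostname, the one required option, is present. The host name gsa.example.com, the port 5678, and the CONFIG_FILE variable are hypothetical placeholders, not values taken from this guide. The script overwrites any existing file of the same name.

```shell
#!/bin/sh
# Sketch: write a minimal adaptor-config.properties file.
# gsa.example.com and 5678 are hypothetical placeholder values.
# CONFIG_FILE is an assumed variable; this overwrites an existing file.
CONFIG_FILE="${CONFIG_FILE:-adaptor-config.properties}"

cat > "$CONFIG_FILE" <<'EOF'
# Host name of the search appliance that receives feeds (required).
gsa.hostname=gsa.example.com
# Port that the connector's retriever listens on.
server.port=5678
EOF

# Fail early if the one required option is missing or empty.
if grep -Eq '^gsa\.hostname=.+' "$CONFIG_FILE"; then
  echo "config ok: $CONFIG_FILE"
else
  echo "gsa.hostname is not set in $CONFIG_FILE" >&2
  exit 1
fi
```

Replace the placeholder values with the ones for your environment, and add any connector-specific variables that your connector's section requires.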
Secure crawling and serving configurations Connectors 4.0 support the authentication and authorization configurations for crawling and serving that the GSA administrator configures for the search appliance. For information about secure crawling and serving configurations, see Managing Search for Controlled-Access Content. Admin Console access If the search appliance only allows HTTPS access to the Admin Console, then the connector must be running in secure mode.
ACLs are stored in the search appliance’s group database. Mark all documents as public Adding the variable adaptor.markAllDocsAsPublic=true to the adaptor-config.properties file enables you to treat all users as if they are members of all groups, thereby giving them access to all documents. The default value for adaptor.markAllDocsAsPublic is “false.
Download the connector software All connectors 4.0 must be installed on a host machine. This connector version does not support installing connectors on the Google Search Appliance. To download the software for a connector, visit http://googlegsa.github.io/adaptor/index.html. Executables are available for all the 4.0 connectors. Google provides the installation software for each 4.0 connector in a single binary file, as listed in the following table.
To register a connector as a service: 1. Download and extract prunsrv.exe from the latest Windows binary download of Apache Commons Daemon. If you are running on 64-bit Windows and will use a 64-bit JVM, then you should use the prunsrv.exe in the amd64/ directory. 2. Place prunsrv.exe in the same directory as the connector you would like to run as a service. 3. In the same directory where the connector .
com.google.enterprise.adaptor.sharepoint.SharePointUserProfileAdaptor
com.google.enterprise.adaptor.fs.FsAdaptor
com.google.enterprise.adaptor.ad.AdAdaptor
is the full path to the output log, for example, C:\sp\logs\stdout.log
is the full path to the error log, for example, C:\sp\logs\stderr.log
is the path to where the Java Virtual Machine dynamic link library is installed, for example, C:\Java\jdk1.7.0_67\jre\bin\server\jvm.
Use the ServicePassword parameter to specify the password for the account designated by the ServiceUser parameter, as shown in the following example: --ServicePassword password ^ Jvm options Use the JvmOptions parameter to specify a list of options in the form of -D or -X that will be passed to the JVM, as shown in the following example: ++JvmOptions=-Djava.util.logging.config.file=logging.
Where is the internal name of the connector: ● SharePoint: adaptor-sharepoint ● SharePoint User Profiles: adaptor-sharepoint-user-profile ● File Systems: adaptor-fs ● Active Directory: adaptor-ad To stop running a connector by using the search appliance Admin Console, perform either or both of the following actions: 1. On the Content Sources > Diagnostics > Crawl Status page, click Pause Crawl. 2.
4 Enable Connector Security In secure mode, the connectors communicate with the Google Search Appliance over HTTPS. You can enable security for any connector by configuring certificates and turning on security. Take note that you must enable security for the Connector for SharePoint and the Connector for SharePoint User Profiles.
● Exchange certificates ● Turn on security with the server.secure property Create a self-signed certificate for the GSA For information about creating a self-signed certificate for the search appliance, see the GSA Admin Console help page for Administration > SSL Settings. To get the GSA's freshly-created certificate to add it as a trusted host for the connector, follow the procedure for your preferred browser or the command line. Firefox 1. Navigate to the GSA's secure search: https://gsahostname/.
Create a self-signed certificate for the connector Generate a self-signed certificate for the connector and export the newly created certificate. 1. Within the connector’s directory, run the following command: keytool -genkeypair -keystore keys.jks -storepass changeit -keypass changeit -alias adaptor -keyalg RSA -validity 365 2. For "What is your first and last name?", enter the hostname of the connector’s computer. You are free to answer the other questions however you wish (including not answering them).
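If you prefer to script this step, a non-interactive variant is sketched below. It assumes that keytool's standard -dname option is used to supply the "first and last name" answer (the certificate's CN) on the command line. The CONNECTOR_HOST variable and the make_connector_keystore function are hypothetical names introduced for the example; the keystore name, alias, and passwords are the ones used in step 1.

```shell
#!/bin/sh
# Sketch: create the connector's self-signed key pair without prompts.
# CONNECTOR_HOST is an assumed variable; it defaults to this machine's host name.
CONNECTOR_HOST="${CONNECTOR_HOST:-$(hostname)}"

make_connector_keystore() {
  # -dname supplies the "first and last name" answer (the CN) up front,
  # so keytool does not ask any questions.
  keytool -genkeypair \
    -keystore keys.jks -storepass changeit -keypass changeit \
    -alias adaptor -keyalg RSA -validity 365 \
    -dname "CN=$CONNECTOR_HOST"
}

if ! command -v keytool >/dev/null 2>&1; then
  echo "keytool not found; install a JDK or JRE first" >&2
elif [ -f keys.jks ]; then
  echo "keys.jks already exists; leaving it unchanged"
else
  make_connector_keystore
fi
```

Run the script from the connector's directory so that keys.jks is created next to the connector. Set CONNECTOR_HOST explicitly if the GSA reaches the connector under a different host name than the one the machine reports.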
2. Under Add more Certificate Authorities, click Browse. 3. Navigate to the connector’s directory and select adaptor.crt. 4. Click Save. Turn on security with the server.secure property You can turn on security for the connector by using the server.secure property, which enables HTTPS and certificate checking. Add the following line to your adaptor-config.properties file: server.secure=true When server.secure=true, the connector uses the GSA's authentication configuration and HTTPS for all communication.
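One way to apply this setting from a script is sketched below. It removes any existing server.secure line and appends server.secure=true, so the property appears exactly once however often the script runs. The CONFIG_FILE variable and the enable_secure_mode function are hypothetical names introduced for the example.

```shell
#!/bin/sh
# Sketch: ensure server.secure=true appears exactly once in the config file.
# CONFIG_FILE is an assumed variable naming the connector's configuration file.
CONFIG_FILE="${CONFIG_FILE:-adaptor-config.properties}"

enable_secure_mode() {
  file="$1"
  touch "$file"
  # Keep every line except an existing server.secure setting.
  # grep exits non-zero when nothing is kept, which is fine here.
  grep -v '^[[:space:]]*server\.secure[[:space:]]*[=:]' "$file" > "$file.tmp" || true
  # Append the desired value and swap the new file into place.
  echo 'server.secure=true' >> "$file.tmp"
  mv "$file.tmp" "$file"
}

enable_secure_mode "$CONFIG_FILE"
grep '^server\.secure=' "$CONFIG_FILE"
```

Rewriting the line instead of blindly appending avoids ending up with two conflicting server.secure entries in the same file.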
(Linux / Unix systems): java \ -Djava.util.logging.config.file=src/logging.properties \ -Djavax.net.ssl.keyStore=keys.jks \ -Djavax.net.ssl.keyStoreType=jks \ -Djavax.net.ssl.keyStorePassword= \ -Djavax.net.ssl.trustStore=.jks \ -Djavax.net.ssl.trustStoreType=jks \ -Djavax.net.ssl.trustStorePassword=changeit \ -classpath adaptor-name-4.0.3-withlib.jar \ com.google.enterprise.adaptor.name.
You must include server.secure=true in the connector configuration before enabling these stricter features. To enable stricter security, perform the following steps by using the GSA Admin Console: 1. Click Administration > SSL Settings. 2. Make any of the following changes on this page: a. Uncheck Enable HTTP (non-SSL) access for Feedergate. b. Check Enable Client Certificate Authentication for Feedergate. c. Check Enable Server Certificate Authentication. 3. Click Save.
5 Configure Connector Logs The connectors log processing messages, including exceptions and warnings. Log messages appear in the Connector Dashboard and you can download the logs, as described in Download rich data about the connector.
java.util.logging.FileHandler.count=20 java.util.logging.ConsoleHandler.formatter=com.google.enterprise.adaptor.CustomFormatter com.google.enterprise.adaptor.CustomFormatter.useColor=true Change the location of logs By default, the logs are saved in logs/adaptor.*.log, in the same directory where the connector is running. To change the location of log files, edit the java.util.logging.FileHandler.pattern value in the logging.properties file: java.util.logging.FileHandler.pattern=logs/adaptor.%g.
To change the size of log files, edit the java.util.logging.FileHandler.limit value in the logging.properties file: java.util.logging.FileHandler.limit=10485760 Change the number of log files The connector writes to a log file until the size limit is reached, then starts writing to a new log file. By default, the connector writes to 20 log files, but you can change the number to suit your needs. There is no upper limit to the number of log files.
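When you choose values for the size limit and the number of log files, it can help to estimate how much disk space the logs may occupy in total. The sketch below multiplies the two settings, on the assumption that each file grows to roughly the limit before the connector rotates to the next one. The log_budget_bytes function is a hypothetical helper, and the figures are the example values shown in this guide.

```shell
#!/bin/sh
# Sketch: estimate the most disk space the rotating logs can occupy.
# Each file grows to roughly FileHandler.limit bytes and at most
# FileHandler.count files are kept, so the budget is about limit * count.
log_budget_bytes() {
  limit_bytes="$1"
  file_count="$2"
  echo $(( limit_bytes * file_count ))
}

# Example values from this guide: 10485760 bytes per file, 20 files.
total=$(log_budget_bytes 10485760 20)
echo "$total bytes (about $(( total / 1048576 )) MiB)"
# prints: 209715200 bytes (about 200 MiB)
```

Treat the result as an approximate upper bound, because a log file can slightly exceed the limit before rotation occurs.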
6 Monitor Connectors with the Dashboard The Dashboard is a web-based interface that provides information about the connector’s operation, with easy access to logs and error history. Use the Connector Dashboard to perform the following tasks: ● View information about the connector ● Start or restart feeds ● Encode sensitive values ● Download rich data about the connector You must start the connector to use the Dashboard.
where: ● HTTP or HTTPS--If you run the connector in secure mode, use HTTPS to log in to the Dashboard. ● is the hostname or IP address of the host that is running the connector ● is the dashboard port number, as specified in the adaptor-config.properties file for the connector To log in to the Connector Dashboard, use your search appliance user or administrator login credentials. You cannot log in to the Connector Dashboard with search appliance manager login credentials.
For each item, a signal indicates the status by color: ● Green for OK. The item is functioning. ● Yellow for alert. The item is not currently functioning, but no action is required. For example, the Dashboard displays yellow when the GSA is not currently crawling. ● Red for warning. The item is not functioning and requires attention. Statistics In the Statistics section, the Connector Dashboard displays the following information: ● A datestamp for when the connector program was started.
Encode sensitive values You can encode passwords and other sensitive configuration values and copy them to the adaptor-config.properties file. Values can be specified in the configuration as prefix:data, where the prefix specifies how the value is stored. You can encode the listed sensitive values for the following connectors: ● Connector for SharePoint--sharepoint.password ● Connector for SharePoint User Profiles--sharepoint.password ● Connector for Active Directory--ad.
Download rich data about the connector The Diagnostics zip archive contains rich data about the connector, including: ● Current configuration settings (in the config.txt file) ● Connector version, status, and statistics (in the state.txt file) ● Thread details (in the threaddump.txt file) ● Logs folder This data can help you diagnose connector issues. To download the archive, click Diagnostics zip file on the Dashboard.
7 Troubleshoot Connectors Connectors 4.
2. Make sure GSA crawling is not paused by using the Content Sources > Diagnostics > Crawl Status page. 3. Check the Connector status and recent log messages by using the Dashboard. 4. Ensure that the Connector fed the document URL to the search appliance by examining the feed file. 5. Ensure that the search appliance got the document by using the Index > Diagnostics > Index Diagnostics page in the Admin Console. 6.
documents: The server sent HTTP status code 503: Service unavailable
reduce host load
Feeds are not coming through (SharePoint, SharePoint User Profiles, File Systems, Active Directory)
● Make sure GSA can accept feeds from the connector host machine.
● Check connector logs for errors, such as failure to connect to look-up GSA, or failure to communicate with the repository.
Crawling is slow (SharePoint, File Systems)
Use the Dashboard to find:
● What is the mean duration of a request (Response Time)? A couple hundred milliseconds would be good.
● What is the max duration of a request? A file taking over a couple of minutes would be bad.
Document retrieval times out (SharePoint, File Systems)
The connector gives a document retrieval request 30 seconds to start and 3 minutes to complete. If you want to give your repository more time you can adjust adaptor.
8 Common configuration options The following table lists common configuration options, which are used by all connectors. If the administrator doesn’t set these options, defaults are used. The only required option is gsa.hostname. All others are optional.
Name | Meaning | Default
gsa.acceptsDocControlsHeader | Use X-Gsa-Doc-Controls HTTP header with namespaced ACLs. Otherwise ACLs are sent without namespace and as metadata. If not set, then an attempt to compute from gsa.version is made. | true
adaptor.
documents are marked as “public.” Take note that if this option is set to “true” for the Connector for Active Directory, the connector does not send users/groups to specify group memberships.
adaptor.pushDocIdsOnStartup | Whether to invoke Adaptor.getDocIds on process start (in addition to adaptor.fullListingSchedule). | true
docId.isUrl | If your connector document ids are already URLs, prevent them from being inserted into connector generated URLs. | false
feed.
feed.noRecrawlBitEnabled | Send bit telling the GSA to crawl your documents only once. | false
gsa.version | Version number used to configure expected GSA features. | Defaults to acquiring from GSA. Uses 7.0.14114 if acquiring fails.
gsa.characterEncoding | Character set used in feed files. | UTF-8
gsa.hostname | Machine to send feed files to. Process errors if not provided. | none (required)
gsa.samlEntityId | The SAML Entity ID that identifies the GSA. | http://google.com/enterprise/gsa/security-manager
journal.
server.docIdPath | Part of URL preceding encoded document ids. | /doc/
server.fullAccessHosts | Hosts allowed access without authentication (certificates still needed when in secure mode). | empty, but implicitly contains gsa.hostname
server.hostname | Hostname of a connector machine for URL generation. The GSA will use this hostname to crawl the connector. | lowercase of automatically detected hostname
server.keyAlias | Keystore alias where connector encryption (public and private) keys are stored.
checking.
server.useCompression | Compress retrieval responses. | true
transform.acl.X | Where X is an integer, match and modify principals as described. | no modifications
transform.pipeline | Sequence of transformation steps. | empty string (no