Using a Google Search Appliance with a Sitecore Website

Thursday, June 13, 2013 @ 03:01

By: Matt Gartman, Senior Developer/Systems Engineer

Part 1: GSA Setup

In this two-part article, Jon and I will discuss what it takes to set up and use an existing Google Search Appliance (GSA) with a Sitecore website. In this first part, I will cover the configuration settings your GSA needs in order to crawl your site. In the second part, Jon will go into the details of interacting with the GSA’s API to integrate search seamlessly into your Sitecore website.

The configuration we will set up today is very basic and assumes that the GSA is already configured with the base settings. It is based on version 7.0.14.G.114 of the GSA, and we have kept most of the default settings. We will access the admin console via http://{ip_address}:8000/EnterpriseController.

The following three areas will need to be configured:

Crawl and Index -> Collections

A Collection is a set of URL patterns to include in or exclude from a particular search, which lets you restrict a search to just the sites you want. For our example, we only want the GSA to return results from our new site, so we’ll create a new Collection and add our site’s URL to the “include content” section. If you have any Collections on the GSA that are set up to include all URLs (“/”) and you do not want this site included in them, now is the time to update those Collections to exclude the new site.

Serving -> Front Ends

Front Ends let you define the look and feel of the GSA’s search and results pages. Since we will be working directly against the API and handling the results with custom code in Sitecore, there isn’t much we need to do here. We will create a Front End for future use but keep all of its settings at their default values for now. If we were using the GSA to display results, we could configure our basic HTML and CSS settings here, as well as refine our search results further using Filters or by removing URLs.

Crawl and Index -> Crawl URLs

The final step is to tell the GSA to start crawling your site. We will add our site to both the “Start Crawling from the Following URLs” and “Follow and Crawl Only URLs with the Following Patterns” sections. If you need to exclude certain content or want to be more granular with your selection, you can use regular expressions to include or exclude content.
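Before entering include and exclude patterns in the admin console, it can help to preview which URLs a given set of rules would let through. The sketch below does this with Python's own `re` module. The site URL and every pattern in it are hypothetical, and the GSA has its own URL pattern syntax, so treat this as a way to reason about the rules rather than as something to paste into the appliance.

```python
import re

# Hypothetical patterns for illustration only. These are Python regular
# expressions, not GSA URL patterns; check the GSA documentation for the
# exact syntax your version expects.
INCLUDE = [re.compile(r"^http://www\.example\.com/")]
EXCLUDE = [
    re.compile(r"/sitecore/"),             # assumed path we want to skip
    re.compile(r"\.pdf$", re.IGNORECASE),  # skip PDF files in this example
]


def would_crawl(url):
    """Return True if url matches an include pattern and no exclude pattern."""
    if not any(pattern.search(url) for pattern in INCLUDE):
        return False
    return not any(pattern.search(url) for pattern in EXCLUDE)
```

Running a handful of representative URLs through `would_crawl` is a quick way to catch a rule that is too broad or too narrow before the crawler acts on it.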

These basic steps will get you up and running and ready to start coding against the API. Our GSA is configured for continuous crawl, as you can see in the “Status and Reports -> Crawl Status” section, so after a few minutes we should start seeing results in our collection.
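One simple way to confirm that the new collection is returning results is to request the GSA's search URL directly and look at the XML that comes back. The sketch below only builds that URL. The host name, collection name and front end name are placeholders, and the `q`, `site`, `client` and `output` parameters reflect the GSA search protocol as we understand it, so verify them against the documentation for your version.

```python
from urllib.parse import urlencode


def build_search_url(host, query, collection, front_end):
    """Build a GSA search URL that asks for raw XML results."""
    params = {
        "q": query,              # the search terms
        "site": collection,      # the Collection to search
        "client": front_end,     # the Front End to use
        "output": "xml_no_dtd",  # assumed value for XML output
    }
    return "http://{}/search?{}".format(host, urlencode(params))


# Placeholder values; substitute your own appliance, collection and front end.
url = build_search_url("gsa.example.com", "sitecore", "new_site", "new_frontend")
```

Opening the resulting URL in a browser should show the XML result set once the crawler has indexed some pages.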

 

Part 2 -> Coding against the GSA’s API