Sitecore 8 Lucene Text Index

Monday, October 19, 2015 @ 12:25

By William Cacy

I was recently asked to add basic search functionality to a client’s Sitecore 8 MVC site. The requirements were simple: enter a keyword and return all pages that contain it. In the displayed results, the searched-for term should be bolded with a yellow background to highlight it. While the information presented here is by no means new, it is a complete walkthrough of the setup, which is sometimes hard to find.

The Sitecore installation used in these examples is Sitecore 8 Update 3 (ver. 150427) with the Lucene search provider that comes out of the box.

The first thing I did was start work on a computed field that would populate the index for the keyword search. The information provided here was very useful, and the code from HtmlCrawledField.cs was used as a starting point for the class.

The PageContentField.cs file:

Generally speaking, this class retrieves each Sitecore page and crawls its content, stripping out HTML, scripts, comments, etc., so that all that is left is the actual text from the page. The end result, after indexing has completed, is a string field that contains <GUID of the Page>|<All text on the page>.
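To make the stripping step concrete, here is a minimal sketch of how rendered page HTML could be reduced to plain text and combined with the page GUID. It is not the PageContentField.cs code: the class and method names (PageTextExtractor, ToPlainText, BuildIndexValue) are hypothetical, it uses simple regular expressions where a real crawler would more likely use an HTML parser, and the GUID format shown is an assumption.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only (not part of Sitecore).
public static class PageTextExtractor
{
    // Matches <script>/<style> blocks and HTML comments, including their contents.
    private static readonly Regex NonContent = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>|<!--.*?-->",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any remaining tag.
    private static readonly Regex Tags = new Regex(@"<[^>]+>");

    // Matches runs of whitespace so they can be collapsed to one space.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        var text = NonContent.Replace(html, " ");   // drop scripts, styles, comments
        text = Tags.Replace(text, " ");             // drop the remaining markup
        text = WebUtility.HtmlDecode(text);         // turn &amp; back into &
        return Whitespace.Replace(text, " ").Trim();
    }

    // Builds the "<GUID>|<text>" value described above.
    // Assumption: the GUID is written in the default "D" format.
    public static string BuildIndexValue(Guid pageId, string html)
    {
        return pageId.ToString() + "|" + ToPlainText(html);
    }
}
```

In a real computed field, the HTML would come from requesting or rendering the page for the item being indexed, and the returned string would be the value that ends up in the index.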

I used a custom index to keep this stuff from bloating up the default Sitecore indexes. To create the custom index, I created a new config file in the /App_Config/Include section of the project and copied the default Sitecore.ContentSearch.Lucene.Index.Web.config file contents into the new file.

The SiteSearchIndex.Web.config file:

The configuration section points to the actual configuration that this index uses, which is illustrated next.

I left the update strategy defined below alone since I was only planning on updating this index when the site is published.

<strategies hint="list:AddStrategy">
	<strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
</strategies>

* For more information on Sitecore index update strategies, see this blog post by John West.

SiteSearchIndex.WebConfiguration.config was also created in the /App_Config/Include section of the project and is responsible for the actual configuration of the index.

The SiteSearchIndex.WebConfiguration.config file:

I didn’t want to include every field in this index, so I set the indexAllFields value to false. The fieldMap and fields nodes are where I defined the computed index field ‘pagecontentfield’ that is used in the queries and pointed it to the PageContentField.cs class created earlier.

Since I didn’t have a corresponding model for this field, I created the simple class outlined below for use in the queries.

The PageBaseSearchResult.cs file:

Utilizing this field required a new class to map the results into.

The SiteSearchResult.cs file:

With these two new classes, I’m able to use the Sitecore API to query the index from my controller.

The SearchResultsIndexAgent.cs file contains the methods used to carry out the search and populate my view model for the results view. The search method retrieves the index by its ID (sitesearch_web_index), which is defined in SiteSearchIndex.Web.config, and then queries it.

The SearchResultsIndexAgent.cs file methods:

These few lines perform the actual search of the index:

var searchIndex = ContentSearchManager.GetIndex("sitesearch_web_index");

using (var context = searchIndex.CreateSearchContext())
{
    // Find every indexed page whose crawled text contains the search term.
    var results = context.GetQueryable<PageBaseSearchResult>()
        .Where(x => x.PageContentField.Contains(term)).GetResults().ToList();
}


After these execute, the results are mapped and are ready for display in the view.
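As a rough illustration of what that mapping could involve, the sketch below splits a stored <GUID>|<text> value back into its two parts and pulls a short excerpt around the first occurrence of the term. The class and method names (PageContentParser, TryParse, Excerpt) and the character radius are hypothetical and are not taken from the SearchResultsIndexAgent.cs file.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class PageContentParser
{
    // Splits "<GUID>|<text>" at the first pipe; any later pipes stay in the text.
    public static bool TryParse(string fieldValue, out Guid pageId, out string text)
    {
        pageId = Guid.Empty;
        text = string.Empty;
        if (string.IsNullOrEmpty(fieldValue)) return false;

        int separator = fieldValue.IndexOf('|');
        if (separator < 0) return false;
        if (!Guid.TryParse(fieldValue.Substring(0, separator), out pageId)) return false;

        text = fieldValue.Substring(separator + 1);
        return true;
    }

    // Returns the first occurrence of the term with up to `radius`
    // characters of context on each side, or an empty string if not found.
    public static string Excerpt(string text, string term, int radius)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(term)) return string.Empty;

        int hit = text.IndexOf(term, StringComparison.OrdinalIgnoreCase);
        if (hit < 0) return string.Empty;

        int start = Math.Max(0, hit - radius);
        int end = Math.Min(text.Length, hit + term.Length + radius);
        return text.Substring(start, end - start);
    }
}
```

The parsed GUID can then be used to look up the item for its title and URL, while the excerpt supplies the text shown under each result.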

SearchResultsViewModel contains the fields used in the view to display the search results.

The SearchResultsViewModel.cs file:

SearchResults.cshtml handles the actual rendering of the results: the title of the page on which the search term appears, the portions of text that contain the term, and the link to the page within the site.
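For the highlighting requirement (bold text on a yellow background), a helper along the following lines could be called from the view. This is a sketch under assumptions rather than the code in SearchResults.cshtml: SearchHighlighter and Highlight are hypothetical names, and the inline style is just one way to get the yellow background.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SearchHighlighter
{
    // HTML-encodes the text and wraps each case-insensitive match of the
    // term in a bold, yellow-background element.
    public static string Highlight(string text, string term)
    {
        text = text ?? string.Empty;
        if (string.IsNullOrWhiteSpace(term)) return WebUtility.HtmlEncode(text);

        var output = new StringBuilder();
        int last = 0;

        // Regex.Escape makes the term a literal, so user input cannot inject a pattern.
        foreach (Match match in Regex.Matches(text, Regex.Escape(term), RegexOptions.IgnoreCase))
        {
            // Encode the unmatched text before this hit.
            output.Append(WebUtility.HtmlEncode(text.Substring(last, match.Index - last)));
            output.Append("<strong style=\"background-color: yellow;\">");
            output.Append(WebUtility.HtmlEncode(match.Value));
            output.Append("</strong>");
            last = match.Index + match.Length;
        }

        // Encode whatever follows the final hit.
        output.Append(WebUtility.HtmlEncode(text.Substring(last)));
        return output.ToString();
    }
}
```

Because the helper encodes everything except its own markup, its return value would be written to the page with Html.Raw in a Razor view; writing it with a plain @ expression would encode the strong tags as well.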

The SearchResults.cshtml file:

Once all of the components were put in place, a re-publish of the content was necessary along with a rebuild of the newly created ‘sitesearch_web_index’ that now appeared in the list of indexes available from the Developer toolbar.