Journey into the Aggregation Process

Tuesday, May 03, 2016 @ 02:00

By: Scott Gillis, Lead Consultant

The key to the new reporting scheme of the Sitecore Experience Platform is the combined use of two data stores: MongoDB for data collection, because it scales quickly and can manage the large amount of data being gathered on site visitors, and Microsoft's SQL Server for its reporting flexibility.

MongoDB is considered a 'NoSQL' data solution, where data is stored in a document-oriented structure instead of a traditional relational one. This document-oriented structure increases speed and scalability in the collection of our visitor data. The downside to this approach is that summarization and reporting are not nearly as clean as they are in a relational system like MS SQL, where clean reporting is one of the main benefits.
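To make the difference concrete, here is a minimal sketch of what moving from a document shape to a relational shape involves. The document layout, the field names, and the flatten_to_rows helper are all hypothetical, invented for illustration; they are not the actual xDB schema.

```python
# Hypothetical visitor document, shaped the way a document store might hold it.
# The field names are illustrative only and are NOT the real xDB schema.
visit_document = {
    "visitor_id": "v-001",
    "visits": [
        {"date": "2016-05-01", "pages": ["/", "/products"]},
        {"date": "2016-05-02", "pages": ["/contact"]},
    ],
}


def flatten_to_rows(document):
    """Turn one nested document into flat (visitor_id, date, page) rows."""
    rows = []
    for visit in document["visits"]:
        for page in visit["pages"]:
            rows.append((document["visitor_id"], visit["date"], page))
    return rows


rows = flatten_to_rows(visit_document)
for row in rows:
    print(row)
```

One document is quick to write because everything about the visitor lives in one place. Once the data is in flat rows, counting page views per day becomes a simple grouping operation, which is the kind of work a relational database is built for.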

The Journey to Reports

The journey we are going to embark on will take us from point 1 (MongoDB) to point 2 (MS SQL) to point 3 (all those cool reports talked about in the quick start).

Journey into the Aggregation Process - Image One

(Map taken from http://www.lib.utexas.edu/maps/historical/txu-pclmaps-virginia_battlefields_1892.jpg)

The first leg of this journey is moving the document-oriented data in MongoDB into a relational form so we can think about reporting. As with any good journey, options are nice to have, and in this case, Sitecore does not disappoint.

The first path is referred to as the 'Rebuilding Reporting' process. Rebuilding the reporting database re-aggregates all of the data that has been collected since your site launched. This is a very time- and resource-intensive process that requires proper planning to complete without causing any issues. Rebuilds can be triggered from code or via the admin page: https://<MY WEBSITE>/sitecore/admin/rebuildreportingdb.aspx

The second path is referred to as the 'Continuous Update' process. In its simplest form, this is a background task managed by Sitecore that gathers recent data from MongoDB, aggregates it, and then ships it to the reporting database. There are a number of options that can be configured to give you different throughput.

Which Train Do I Take?

As one reads through the Sitecore documentation on server setup, the terms 'role' and 'service' seem to be used interchangeably. In my writing, a server can have one or more roles, and each role provides one or more services.

Anyone researching infrastructure setup scenarios for the Sitecore Experience Platform will notice numerous references to the different roles/services for which a server can be configured. The most commonly referenced are:

  1. Content delivery server (CD)
  2. Content management server (CM)
  3. Processing server
  4. Reporting Service server
  5. Collection database server (aka MongoDB)
  6. Reporting database server
  7. Session database server

No matter how we process the data, we leverage the server role defined as the Processing Server. In the purest form, a Processing Server requires a Sitecore install but does NOT serve up any content. It doesn't even need the Sitecore admin screens to function. It does, however, require the instance to be licensed.

In most Sitecore installations, this role will actually be configured to run on one of your content management (CM) servers, which is a perfectly acceptable and supported method of installation. If you notice that your reports are not refreshing as quickly as the business needs, or the CM seems to be running extremely slowly for authors, this role should be the first one considered for moving off to its own server. For those who have purchased Sitecore's xDB Cloud (known as xCloud) offering, the Processing Server is included as part of the service.

Traveling the Lines of the Processing Server

A Processing Server role supports two services (features). The first service is called Processing. It uses the Sitecore Task Manager API to run a variety of distributed tasks against xDB and the reporting database.

The second service is called Aggregation. This is the heart of the journey to MS SQL! This service requires the Processing service to be configured on the same server. The Aggregation service (also referred to as the Aggregation Process) is the series of tasks that move the data from MongoDB to MS SQL.
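The rule that Aggregation needs Processing on the same server can be pictured as a simple dependency check. The sketch below is only a way to reason about a planned topology; the REQUIRES table and the missing_dependencies function are hypothetical names made up for this example and have nothing to do with Sitecore's own configuration files or APIs.

```python
# Hypothetical dependency table: each service maps to the services that
# must be present on the SAME server. Only the rule stated in this article
# is modeled here.
REQUIRES = {"aggregation": {"processing"}}


def missing_dependencies(services):
    """Return {service: [missing requirements]} for one server's service set."""
    services = set(services)
    problems = {}
    for service in sorted(services):
        missing = REQUIRES.get(service, set()) - services
        if missing:
            problems[service] = sorted(missing)
    return problems


# A server running both services satisfies the rule.
print(missing_dependencies({"processing", "aggregation"}))
# A server running Aggregation alone does not.
print(missing_dependencies({"aggregation"}))
```

Running a check like this against a planned server layout is a cheap way to catch a server that was given Aggregation without Processing before anything is deployed.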

Journey into the Aggregation Process - Image Two

The Aggregation process 'line' looks like the following:

  1. Sitecore Task Manager triggers the Aggregation Agents as defined in the configuration files
  2. Agents collect any unprocessed data from MongoDB and other data stores as defined by the agent
  3. Data is grouped, summarized, and prepped for reporting
  4. Batches of processed data are sent to SQL as table-valued parameters (TVP)
  5. Data arrives at the reporting database ready for consumption
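The steps above can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not Sitecore's implementation: the interaction records, the processed flag, and every function name here are hypothetical, and the send_batch callback stands in for the table-valued parameter call to SQL.

```python
from collections import Counter


def collect_unprocessed(interactions):
    """Step 2: pick up only the records that have not been aggregated yet."""
    return [i for i in interactions if not i["processed"]]


def summarize(interactions):
    """Step 3: group by (date, page) and count visits, giving report-ready rows."""
    counts = Counter((i["date"], i["page"]) for i in interactions)
    return sorted((date, page, n) for (date, page), n in counts.items())


def batches(rows, size):
    """Step 4: split the summarized rows into fixed-size batches."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def run_aggregation(interactions, send_batch, batch_size=2):
    """One pass of the hypothetical agent; returns the number of rows sent."""
    pending = collect_unprocessed(interactions)
    rows = summarize(pending)
    for batch in batches(rows, batch_size):
        send_batch(batch)  # stand-in for shipping a batch to the reporting database
    for interaction in pending:
        interaction["processed"] = True  # so the next pass skips these records
    return len(rows)


# Hypothetical sample data: three new interactions and one already processed.
sample = [
    {"date": "2016-05-01", "page": "/", "processed": False},
    {"date": "2016-05-01", "page": "/", "processed": False},
    {"date": "2016-05-01", "page": "/products", "processed": False},
    {"date": "2016-05-02", "page": "/", "processed": True},
]
sent = []  # collects the batches that would have gone to SQL
rows_sent = run_aggregation(sample, sent.append)
print(rows_sent, sent)
```

Marking records as processed is what lets a repeated background pass pick up only new data, which is the same idea behind the 'Continuous Update' path described earlier.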

Aggregation Bibliography

  1. Overview of the process that the data moves through to be reported on:
    https://doc.sitecore.net/sitecore_experience_platform/xdb_overview/processing_overview
  2. The 'go to' page for links explaining how to configure each of the different server roles:
    https://doc.sitecore.net/sitecore_experience_platform/xdb_configuration/configuring_servers
  3. Breakdown of the different server roles that can be applied to a Sitecore server installation:
    https://doc.sitecore.net/sitecore_experience_platform/xdb_configuration/server_configuration_features

As always, feel free to tweet me questions or comments @thecodeattic, or find me on the Sitecore Slack Community as @gillissm.

 

 

Scott Gillis, Lead Consultant at Paragon and 2017 Sitecore MVP, has been working with Sitecore for several years. He has a deep passion for helping clients leverage their content and data into powerful new capabilities in Sitecore and has produced successful outcomes as the technical lead on numerous complex implementations. Recently, Scott has been focusing on helping these clients take advantage of the wealth of data collected by Sitecore Experience Analytics.