Generating Trees in SQL with Common Table Expressions and Cross Apply

Friday, February 20, 2015 @ 09:25

By Corey Adler and John Ours

I’m a big fan of Entity Framework. I’ve loved using it on projects both personal and work related. It’s the tag that I hang around the most on Stack Overflow. Our Vice President of Technology (and my boss) John Ours, on the other hand, is not such a big fan. We’ve had many back and forth discussions about various features or perceived problems with Entity Framework, with me vigorously trying to defend it. Even with all of that, I will readily admit that there are times where the use of a scalpel like T-SQL is better than the sledgehammer that Entity Framework can be. This post will discuss one such use case, utilizing Common Table Expressions and the Cross Apply keyword in SQL to generate a tree of linked financial accounts and their quarterly data.

Imagine the following scenario: You have a tree of financial accounts, where each of the nodes of the tree will roll-up their values to become the value of the parent node and a node can have up to 2 parents (let’s assume that they don’t have any cycles). The values of the accounts change on a quarterly basis. If you were tasked to display the entire (or just part of the) tree of accounts, with their values, how would you do so?

My first instinct was to tackle this problem by using Entity Framework. I went ahead and created a recursive function called GetRelevantAccountIDs that would give back all of the IDs belonging to an Account’s children. That function is below. Note that I have 2 collections since there are 2 ParentIDs on the table, necessitating 2 different collections on the parent accounts to complete the 2 different one-to-many relationships on the table.

EF Function

Afterwards, with IDs in hand, I queried for the corresponding values to see what they were, and then display all of them, and their children. This solution does work, but it comes with a price attached to it: If you have a lot of accounts, and a lot of different quarters, and you’re querying near the top of the tree (or even at the root, if you want the full tree), then this query can take quite a long time to run. That might not be something that you want to have in your application. I tried optimizing it as much as I could, but as any experienced EF developer will tell you: There are limits to what EF can do in such circumstances.

That’s when John came up with the idea of using Common Table Expressions. A Common Table Expression is simply a query that you want SQL to know by name so that you can quickly query the database with that name, instead of having to repeat the same query repeatedly. A basic example of a CTE is below (found on http://msdn.microsoft.com/en-us/library/ms175972.aspx). Notice the use of “Sales_CTE” in the bottom query, referencing the CTE above it by name.

CTE Example

So how would this help? Well, it turns out that you can actually create a recursive CTE in SQL. In the same MSDN article there is an example of how this can be done, given a table of Employees, in which someone wants to generate a list of Managers (also found in the Employees table) and the people who directly report to them. That example is below.

Recursive CTE

How does this work? It’s not that simple (it took me a few tries before I got it), so bear with me here. You define an “anchor” member in the CTE that defines the first record that should come out of the database (i.e. the one with no manager here). After that comes the UNION with the second part of the CTE—which is the recursive part. In this part, the source table (here MyEmployees) is queried, joining with the set of items that have already been returned by previous iterations of the CTE—where the Employee record’s manager is found. Hence, after the CEO of the company is returned, then the next iteration grabs the records where MyEmployee’s ManagerID is equal to that CEO’s Employee ID. That then also makes the next iteration of the recursive call, this time using the records just returned from the last iteration—querying for employees in the MyEmployee’s table who’s ManagerID was just returned in the last iteration. And the query will continue like this until nothing gets returned in a single iteration, wherein the CTE has completed.

So let’s see how John worked this knowledge of recursive CTEs into creating a hierarchy of financial accounts.

Relevant Accounts CTE

As you can see, John wrapped the CTE inside of a function that will return a table full of Account IDs (similar to the C# function that we had above) given the ID of an account that you’re starting from. The actual CTE itself starts off by using that passed-in Account ID as part of its anchor member definition. The recursive member joins with itself based on the base Accounts table record’s different Master Account IDs. Each iteration of the recursive member will return those accounts that have either of their Master Account IDs already placed in the Rollups results and then run the next iteration based on those newly returned records.

Now that I’ve (hopefully) explained what a CTE is, let’s move on to another part of SQL that will help us here—that of the CROSS APPLY operator. To begin, let’s first understand the use of the plain APPLY as discussed on https://technet.microsoft.com/en-us/library/ms175156(v=sql.105).aspx. The APPLY operator is used whenever you want the rows of one table to be passed into a specified SQL function to return a data set in which each row contains the original row (from the first part of the query) with added columns resulting from the applied SQL function being run on that record. APPLY comes in 2 different flavors: OUTER and CROSS. OUTER APPLY returns every single row no matter what, even if that row doesn’t return any results from the applied function (the added columns being set to null, of course). Our focus, however, is on CROSS APPLY, which only returns rows from the first part of the query that produce a non-null result set from the applied function. To see how that works, let’s take a quick look at the MSDN example.

Cross Apply Example CTE

Cross Apply Example Query

The first picture above shows the recursive CTE to be used for the example—an example not unlike the ones shown above. Please note the @empid parameter in both the function as well as the Employees_Subtree CTE itself—it sets the anchor member to be a specific employee and then finds all of the employees under him. The second picture, though, is where the CROSS APPLY magic happens. The first part of the query is quite simple: Grab all of the departments found in the database. Once that’s been done then the CROSS APPLY occurs calling the CTE on each different department’s manager’s ID. This has the effect of creating a new row for each employee returned by the CTE with the first columns being the department returned by the first part of the query, and the last columns being those columns returned by the CTE. This is similar in a number of ways to a CROSS JOIN in SQL. Again, as mentioned before, this will only return departments (and their employees) where the department has a manager. If the department doesn’t have one, then that department is simply not returned, unless you use OUTER APPLY instead.

How did John apply (pun intended) this knowledge into grabbing all the quarterly financial data? Before we take a look, let me give a quick rundown of the table schema used for it:

Table Name Fields Purpose Relationships
Clients ID Contains information about the client Has One-to-Many with ClientDataPeriods
ClientDataPeriods ID, ClientID, PeriodDate Contains separate data for each different quarter for a specific client ClientID is a Foreign Key to the Clients table; Has One-to-Many with ClientAccounts
Accounts ID, AccountNumber, AccountName Contains static list of different financial accounts Has One-to-Many with ClientAccounts
ClientAccounts AccountID, ClientDataPeriodID, LinkedAccountID, Amount Contains quarterly-specific data about different financial accounts ClientDataPeriodID is a Foreign Key to the ClientDataPeriods table; LinkedAccountID is a ForeignKey to the Accounts table

 

Now onto the query.

Cross Apply Query

This query starts out by grabbing all of the basic information of each Account record, including the ID, Account Number, and Account Name. Once that’s done comes the CROSS APPLY with the function containing our recursive CTE to generate the tree under each different account. The subsequent LEFT JOIN (and its subquery) then connects those Account IDs to the relevant ClientAccounts given a specific ClientID and PeriodDate by joining the AccountIDs returned from the CTE to the LinkedAccountID of the ClientAccount records being returned. Note the use of the “links” alias both here as well as in the original SELECT clause grabbing the sum of all of these joined accounts. Putting it all together makes this query return each different account from the static Accounts table (along with its subaccounts), grabbing the relevant ClientAccounts from that quarter that link up to the different subaccounts, summing up the value from the Amount field of each of those ClientAccounts, and returning them as well onto the screen. It grabs all of these quarterly financial amounts, all rolled up nicely…and did in 2 seconds (given around 600+ different accounts).

I guess John wins this round.

In conclusion, CTEs can be a very handy tool in your arsenal, as long as you know when to use it, and how to use it. It will certainly beat trying to do it in Entity Framework which was really never meant to be used to do this number of potential calculations. It is very important to understand the places where using Entity Framework is beneficial to your application and to also understand the places in which nothing can replace straight up SQL. Will I ever convince John that Entity Framework is a good ORM? Will he ever convince me to drop it and find a different thing to obsess about? Probably neither, but that doesn’t mean we won’t keep trying. Until next time, I wish you good coding.