Photo by Faruk Kaymak on Unsplash

Recursive Queries

Unraveling business threads with recursive queries
Sjoerd Lagarde
Sjoerd Lagarde
Nov 10, 2023
SQL

We’re hiring full stack software engineers.

Join us remote/on-site in ’s-Hertogenbosch, The Netherlands 🇳🇱

Join us in building a fintech company that provides fast and easy access to credit for small and medium sized businesses — like a bank, but without the white collars. You’ll work on software wiring million of euros, every day, to our customers.

We’re looking for both junior and experienced software developers.

Find out more…
Ruby on Rails PostgreSQL Docker AWS React React Native

Picture this: the AI models have worked their magic on the bank statements of a new customer signup. All lights are green; we are happy to finance this company! Seems like smooth sailing, right? Well, not so fast. Here is the twist: this seemingly perfect company has a business sibling under the same corporate umbrella, a sibling that we are already financing. Now, why is this discovery crucial? Well, unknowingly lending money to related companies can pose a serious risk.

How did we tackle this problem? Well, cue the drumroll for our tech superhero: recursive queries in SQL.

Let’s start by visualising our company hierarchy. We have two siblings, and one holding company. While this example may seem simple and straightforward, in reality, these hierarchies can unfold into complexities similar to family trees, complete with nieces, nephews, uncles, aunts, and even grand-children in the corporate lineage.

Our task is, given the New sign-up company, to identify all related entities. That is, we want to find both Holding company and Existing customer company. To achieve this task, we turn to the power of recursive queries. Why? The hierarchical structure of companies makes recursive queries an ideal choice, allowing us to navigate through parent-child relationships and unveil the complete network of associated businesses.

Recursive SQL to the rescue

To understand recursive queries, let’s dive into an example sourced from postgresql.org.

with recursive t(n) as (
    values (1)
  union all
    select n+1 from t where n < 100
)
select sum(n) from t;

You might have already spotted it, but in this example we sum all the numbers from 1 to 100. But how does it work? A recursive query starts with WITH RECURSIVE. This creates an auxiliary query and allows the query to refer to its own output, setting the stage for recursion.

The definition of a recursive query looks like this:

with recursive t(n) as (
  -- non-recursive term

  union

  -- recursive term
)

Breaking down our example:

with recursive t(n) as (
  --- non-recursive term
  values (1)
        
  union all
        
  -- recursive term
  select n+1 from t where n < 100
)
select sum(n) from t;

Here, the non-recursive term initiates the recursion by providing an initial value, 1. This value is stored in a working table. Next, the recursive term kicks in. It operates on the working table: it processes the current value, 1, checks against the where-clause, and produces the next value, 2. The working table is emptied and the new value, 2, is put into the working table. This process continues iteratively until the where-clause (n < 100) is no longer met. The recursive query stops and returns all accumulated values: the numbers 1 to 100.

In essence, the non-recursive term sets the stage, and the recursive term operates on the working table iteratively, building a sequence of values until the recursion reaches its logical end. This powerful mechanism allows us to efficiently compute complex operations, as demonstrated in our summing example.

Roll up

Now that we know how recursive queries work, let’s dive into the first part of our query: the roll-up. In this phase, we’ll craft a recursive query to walk up the corporate tree, collecting parents, grand-parents and beyond. Our setup consists of two tables: Companies and CompanyRelations. These two tables allow us to make a many-to-many relation between companies. We need this relation, because a company can point to multiple other companies (e.g. a company can have multiple shareholder entities), and a company can be referred to by multiple other companies (e.g. a holding company).

Companies
| ID          | Name                        |
| ----------- | --------------------------- |
| 1           | Existing customer company   |
| 2           | Holding company             |
| 3           | New signup company          |


CompanyRelations
| FromID      | ToID        |
| ----------- | ----------- |
| 1           | 2           |
| 3           | 2           |

In our recursive query, we initiate the recursive process by selecting the ID of our New signup company. While I’ve used values(3) for clarity, in actual production code, this would typically be a parameter. Moving on, we define the recursive term, where we select the ID from our working table, rollup, and create a join with the CompanyRelations table. This uncovers all parent links, recursively.

with recursive rollup(ID) as (
  values(3)

  union

  select ToID as ID
  from rollup
    join CompanyRelations on CompanyRelations.FromId = ID
)

Roll down

With our roll-up strategy in place, we also need to perform a roll-down. The roll-down query walks down the tree, uncovering all children, grandchildren, and beyond.

with recursive rolldown(ID) as (
  select ID from rollup

  union

  select FromID as ID
  from rolldown
    join CompanyRelations on CompanyRelations.ToId = ID

The cool thing about our roll-down query is that it uses the output from the roll-up query as its input. The non-recursive term of roll-down simply selects the IDs from our roll-up query! Here you can also see that the non-recursive part is just a query and can return an arbitrary number of results. All of the results will go into the working table and serve as input for the recursive term.

Putting it all together

Now that we’ve seen both the roll-up and roll-down queries, let’s bring it all together in a single query.

with recursive rolldown(ID) as (
  with recursive rollup(ID) as (
    values(3)
    union
    select ToID as ID
    from rollup
      join CompanyRelations on CompanyRelations.FromId = ID
  )
  select ID from rollup
  union
  select FromID as ID
  from rolldown
    join CompanyRelations on CompanyRelations.ToId = ID
)
select Companies.ID, Name
from rolldown
  join Companies on Companies.ID = rolldown.ID

This ultimate query takes the results of our our roll-up and roll-down and selects the ID and Name from the Companies table. We now have found all related companies for our starting company.

The use of recursive queries has proven to be both an elegant and powerful tool in uncovering related companies. It has helped us to seamlessly traverse intricate business hierarchies, efficiently identifying connections and dependencies across our portfolio.

Floryn

Floryn is a fast growing Dutch fintech, we provide loans to companies with the best customer experience and service, completely online. We use our own bespoke credit models built on banking data, supported by AI & Machine Learning.

Topics
machine-learning people culture rails online-marketing business-intelligence Documentation agile retrospectives facilitation
© 2023 Floryn B.V.