Classifying 100k bank transactions: a hybrid approach

A combination of NLP and expert rules outperformed other strategies for categorising large numbers of bank transactions.
Arjan van der Gaag
Aug 27, 2020

How do you quickly categorise hundreds of thousands of bank transactions in order to determine loan application risk? At Floryn, we have seen great results with a hybrid system of domain rules and machine learning to support the speed and quality of our risk assessments for business loans.

Categorisation by rules

Given a single loan application, domain experts can pore over all the customer’s bank transactions to assess the risk of extending credit, for example by classifying certain transactions as income based on the description, amount or counterparty.

Table 1: example categorisation of three example bank transactions using domain rules

| Amount | Details | Notes |
|---|---|---|
| €2.000,00 | Bob’s Transport Company / Invoice ABC123 | Positive amount referring to “invoice”: category revenue |
| €-4.103,00 | Alice / Salary August 2020 | “Salary” mentioned: category staff |
| €5.000,00 | Shady Inc / Weapon parts | Blacklisted company name: category suspect |
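
The kind of rules shown in Table 1 could be sketched in Python roughly as follows. This is an illustration only, not Floryn's actual rule engine: the `Transaction` fields, the `BLACKLISTED_COUNTERPARTIES` set, the rule order and the `"unknown"` fallback are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical blacklist; in practice this would be maintained by domain experts.
BLACKLISTED_COUNTERPARTIES = {"shady inc"}


@dataclass
class Transaction:
    amount: float
    counterparty: str
    description: str


def categorise_by_rules(tx: Transaction) -> str:
    """Return a category from hand-written rules, checked in priority order."""
    # Rule 1: a blacklisted counterparty always wins.
    if tx.counterparty.lower() in BLACKLISTED_COUNTERPARTIES:
        return "suspect"
    # Rule 2: the word "salary" anywhere in the description.
    if re.search(r"\bsalary\b", tx.description, re.IGNORECASE):
        return "staff"
    # Rule 3: incoming money that refers to an invoice.
    if tx.amount > 0 and re.search(r"\binvoice\b", tx.description, re.IGNORECASE):
        return "revenue"
    # Assumed fallback when no rule matches.
    return "unknown"


examples = [
    Transaction(2000.00, "Bob's Transport Company", "Invoice ABC123"),
    Transaction(-4103.00, "Alice", "Salary August 2020"),
    Transaction(5000.00, "Shady Inc", "Weapon parts"),
]
categories = [categorise_by_rules(tx) for tx in examples]
```

Keyword rules like these are easy to read and audit, but they only match the exact words the experts thought of in advance.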

When you deal with many thousands of customers and hundreds of thousands of bank transactions every month, automation becomes crucial. We had previously built a system that automatically categorises new transactions using a set of business rules defined by our domain experts. However, we found that this approach has several shortcomings.

The latest developments in Natural Language Processing (NLP) seemed to offer a compelling alternative to a rules-based categorisation system. After evaluating different approaches we settled on using BERT. BERT is a method of pre-training language representations, meaning that we train a general-purpose “language understanding” model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like labeling transactions).

Transfer learning with NLP

Some of the most exciting breakthroughs in NLP, such as ELMo, BERT, ULMFiT and XLNet, are centred around transfer learning: a language model is first pre-trained on extremely large amounts of data, after which it can be applied to different tasks, such as categorising bank transactions. Although this approach is promising, we faced two challenges.

  1. Domain-specific language: bank transaction descriptions commonly consist of keywords, abbreviations and codes, which are far removed from the text the model is pre-trained on. This requires us to re-train the model on a representative dataset of sufficient size.

  2. Absence of ground truth: although we did have a dataset that was categorised with our rules-based system, we knew it contained mistakes. We did not want to train our model on those mistakes, since that would defeat the purpose of improving on the results.

Our big advantage was that we already had a dataset labelled both by human experts and by the existing rule system. Using an uncertainty model, we extracted an input dataset from these expert-labelled transactions and looked up edge cases where the NLP model had low confidence in the label it applied.
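
Looking up low-confidence edge cases could be sketched as below. The post does not say how uncertainty is measured, so this example assumes it is the highest predicted class probability; the `select_for_review` helper, the transaction ids and the 0.9 threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def select_for_review(
    predictions: Dict[str, Dict[str, float]],
    threshold: float,
) -> List[Tuple[str, str, float]]:
    """Return (transaction_id, predicted_label, confidence) for predictions
    whose confidence is below the threshold, least confident first."""
    uncertain = []
    for tx_id, probabilities in predictions.items():
        # Assumption: confidence is the probability of the most likely class.
        label, confidence = max(probabilities.items(), key=lambda item: item[1])
        if confidence < threshold:
            uncertain.append((tx_id, label, confidence))
    return sorted(uncertain, key=lambda row: row[2])


# Made-up model outputs for three transactions.
predictions = {
    "tx-1": {"staff": 0.97, "revenue": 0.02, "suspect": 0.01},
    "tx-2": {"staff": 0.55, "revenue": 0.40, "suspect": 0.05},
    "tx-3": {"staff": 0.30, "revenue": 0.45, "suspect": 0.25},
}
review_queue = select_for_review(predictions, threshold=0.9)
```

Sorting the least confident predictions first means expert time goes to the transactions where a corrected label is likely to teach the model the most.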

After re-training our pre-trained multilingual DistilBERT model, we compared both balanced and overall accuracy and found improved performance for some, but not all, transactions. We found we could use the uncertainty model again to decide dynamically which of the two approaches to use, resulting in a hybrid approach to categorisation.
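
The difference between the two metrics matters when categories are unevenly sized. The sketch below uses the standard definitions (overall accuracy is the share of correct labels, balanced accuracy is the mean of per-class recall) on made-up labels; it is not the evaluation code used for the project.

```python
from collections import defaultdict
from typing import Sequence


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of transactions that received the correct label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recall, so small categories count as much as large ones."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Made-up example: a classifier that labels everything as "revenue".
y_true = ["revenue"] * 8 + ["staff"] * 2
y_pred = ["revenue"] * 10

overall = overall_accuracy(y_true, y_pred)    # 8 of 10 correct
balanced = balanced_accuracy(y_true, y_pred)  # recall 1.0 for revenue, 0.0 for staff
```

Here the overall accuracy looks good while the balanced accuracy reveals that one category is never recognised, which is why reporting both is useful.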

Table 2: example of how, given high uncertainty, BERT can be used to detect where domain rules fall short.

| Amount | Details | Rules | Conf. | BERT | Outcome |
|---|---|---|---|---|---|
| €-4.103,00 | Alice / Salary August 2020 | “Salary” mentioned: category staff | 95% | | staff |
| €4.103,00 | Bob / Loon Aug 2020 | No keyword, category revenue | 80% | “Loon” matches “Salary”. | category staff |

A hybrid approach to categorisation

With the hybrid approach we use an uncertainty boundary to select either the result of the rules-based categorisation or that of the language model. Over 80% of “standard” transactions are categorised based on rules, while the remaining 20% are overruled by the DistilBERT model. The uncertainty model also helps us find mistakes in the business rules: the rules perform about 18% worse on the transactions that fall below the uncertainty boundary of 0.99.
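
The selection step could look roughly like the sketch below. It assumes that each transaction comes with a confidence score from the uncertainty model, that the rules are trusted at or above the boundary, and that the DistilBERT label takes over below it. The `Decision` type and `hybrid_category` function are hypothetical names; only the 0.99 boundary comes from our results.

```python
from dataclasses import dataclass

UNCERTAINTY_BOUNDARY = 0.99  # boundary value mentioned in the post


@dataclass
class Decision:
    category: str
    source: str  # "rules" or "model"


def hybrid_category(
    rule_category: str,
    model_category: str,
    confidence: float,
    boundary: float = UNCERTAINTY_BOUNDARY,
) -> Decision:
    """Pick the rules result when confidence is high, else the model result."""
    if confidence >= boundary:
        return Decision(rule_category, "rules")
    # Assumption: below the boundary the language model overrules the rules.
    return Decision(model_category, "model")


confident = hybrid_category("staff", "staff", confidence=0.995)
uncertain = hybrid_category("revenue", "staff", confidence=0.80)
```

Recording the `source` alongside the category makes it easy to review later which transactions were overruled by the model, and so where the rules may need fixing.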

The strength of this approach lies in recognising what the business rules already do well and getting creative with the ways in which you can unlock value from textual data to benefit machine learning models. With only a handful of iterations of relabelling data, we have improved classification performance significantly. With that, we are on the right track to handle ever greater volumes of incoming transactions and deliver even better risk assessments for our customers.

Internships at Floryn

This blog post is a summary of the thesis project of one of our interns, Emiel de Heij, and was co-written by him. He completed his graduation project at our company and obtained his master’s degree in Data Science & Entrepreneurship from the Jheronimus Academy of Data Science (JADS). We’re constantly giving interns the opportunity to build exciting new stuff and try out new technologies at Floryn, so if you’re looking for a technical internship, make sure you reach out to us.


Floryn is a fast-growing Dutch fintech. We provide loans to companies with the best customer experience and service, completely online. We use our own bespoke credit models built on banking data, supported by AI & Machine Learning.

© 2021 Floryn B.V.