PyData London 2022

Understanding your bank statement in 100ms
06-18, 14:15–15:00 (Europe/London), Tower Suite 2

In the last year, the global number of fintech companies has nearly doubled. Yet, despite the rapid growth, there is one area of banking that has been notoriously difficult to modernize: financial transactions. More than 1 billion transactions occur every day around the world. Transactions are different in every country and language, require knowledge of every merchant and location, depend on the context of the parties involved and are specific to each use case. At Ntropy, we enable developers to parse financial transactions in under 100ms with super-human accuracy, unlocking the path to a new generation of autonomous finance and powering products and services that have never before been possible. For the first time, we will discuss the key parts of our pipeline, made possible by the latest advancements in natural language understanding and unsupervised learning.


Every large tech company is becoming a fintech, from Amazon merchant loans to Google Wallet, Apple Card and Facebook payments. 82% of people use digital wallets, and 45% of all global transactions are digital. Transaction data has the potential to become the key piece of intelligence powering the products and services of the future, including automated lending, credit scoring, insurance, and financial management, but also healthcare, fitness, gaming and more.

However, decades of decentralized finance around the world have turned transactions into a tangled web of cryptic messages. Each financial transaction is a datapoint consisting of multiple modalities, including unstructured descriptions, amounts, currencies, directions of money movement, dates, locations, merchant category codes, and more. Regional inconsistencies, a lack of unified standards, tens of thousands of merchants opening and closing each day, and more than 200 million companies globally (far beyond what any model can remember) have meant that human common sense was essential to make sense of this data. Until now.

With recent breakthroughs in natural language processing, weak supervision and multi-task learning, algorithmic understanding of this data has finally become possible. In this talk, we introduce how we do this at scale at Ntropy.

The Ntropy API converts raw transactions into human-readable data points that we call “enriched” transactions, by combining data from multiple sources, including natural language models, search engines, internal databases, external APIs, and existing transaction data from across our network. There are 3 key parts of the pipeline:

  1. Extract all named entities from transaction descriptions (dates, locations, service words, merchant originators, receiving entity, payment processors, etc.)
  2. Uniquely identify these entities using entity linkage and information extraction across search engines and databases.
  3. Use the extracted information, in combination with all of the other information describing the transaction, to translate the transaction into a human-understandable label.
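
The three steps above can be sketched in a few lines of Python. This is only a toy illustration of the control flow, not Ntropy's implementation: the lookup tables, the regular expression, the `Enriched` class and every function name are hypothetical stand-ins for the language models, search engines and databases mentioned in the abstract.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical tables standing in for models, search engines and databases.
PROCESSORS = {"sq": "Square", "pp": "PayPal"}
KNOWN_MERCHANTS = {"amzn": "Amazon", "sbux": "Starbucks"}
MERCHANT_LABELS = {"Amazon": "online shopping", "Starbucks": "coffee shop"}

DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}\b")


@dataclass
class Enriched:
    description: str
    amount: float
    entities: dict = field(default_factory=dict)
    merchant: Optional[str] = None
    label: str = "unknown"


def extract_entities(description: str) -> dict:
    """Step 1: extract named entities from the raw description."""
    entities = {}
    date = DATE_PATTERN.search(description)
    if date:
        entities["date"] = date.group()
    for token in re.findall(r"[a-z]+", description.lower()):
        if token in PROCESSORS and "processor" not in entities:
            entities["processor"] = PROCESSORS[token]
        elif token in KNOWN_MERCHANTS and "merchant_token" not in entities:
            entities["merchant_token"] = token
    return entities


def link_merchant(entities: dict) -> Optional[str]:
    """Step 2: resolve the merchant token to a unique, canonical entity."""
    return KNOWN_MERCHANTS.get(entities.get("merchant_token"))


def assign_label(merchant: Optional[str], amount: float) -> str:
    """Step 3: combine the linked entity with other fields (here, the amount)."""
    if merchant is None:
        return "unknown"
    if amount > 0:  # assumption: positive amounts are money coming in
        return "refund"
    return MERCHANT_LABELS.get(merchant, "unknown")


def enrich(description: str, amount: float) -> Enriched:
    """Run the three steps in order and return the enriched transaction."""
    entities = extract_entities(description)
    merchant = link_merchant(entities)
    label = assign_label(merchant, amount)
    return Enriched(description, amount, entities, merchant, label)
```

For example, `enrich("SQ *SBUX 1234 LONDON 06/18", -4.50)` resolves the merchant to Starbucks, records Square as the payment processor and assigns the label "coffee shop".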

We cannot forget that transactions are real-time, and any form of processing typically has to be kept to 200ms or less! These components are held together by an optimized infrastructure that scales our models to minimize inference latency and resource usage, caches intermediate results, and can operate on and adapt to noisy, out-of-distribution data.
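
One simple way to picture the role of caching under a latency budget is to memoize an expensive lookup and time each call. The sketch below is an assumption-laden illustration rather than a description of Ntropy's infrastructure: `resolve_merchant` is a hypothetical slow lookup simulated with `time.sleep`, and only the 200ms figure comes from the text above.

```python
import time
from functools import lru_cache

LATENCY_BUDGET_MS = 200  # the budget quoted in the abstract


@lru_cache(maxsize=10_000)
def resolve_merchant(token: str) -> str:
    """Hypothetical slow lookup, standing in for a model call or search query."""
    time.sleep(0.05)  # simulate 50ms of work
    return token.strip().title()


def timed_resolve(token: str):
    """Return (merchant, elapsed milliseconds, whether the budget was met)."""
    start = time.perf_counter()
    merchant = resolve_merchant(token)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return merchant, elapsed_ms, elapsed_ms <= LATENCY_BUDGET_MS
```

Calling `timed_resolve("starbucks")` twice pays the simulated lookup cost only once; the second call is served from the in-process cache, which `resolve_merchant.cache_info()` makes visible through its hit and miss counters.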

Finally, as the meaning of each transaction depends on the type of account holder, industry and use case, a general model will never be the optimal solution. The Ntropy API allows developers to train custom models with their own data, on top of embeddings provided by our base model. Such fine-tuned models can be trained with as few as 150 labels, an amount a single person can produce in a few hours.
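
The idea of training a small custom model on top of fixed embeddings can be illustrated with a nearest-centroid classifier. Everything here is a stand-in chosen for the sketch: `embed` hashes character trigrams into a vector and is not the base model's embedding, and `train` and `predict` are hypothetical helpers, not the Ntropy API.

```python
import math
import zlib
from collections import defaultdict

DIM = 256  # size of the toy embedding space


def embed(text: str) -> list:
    """Toy embedding: hashed character trigrams, L2-normalised."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        vec[zlib.crc32(trigram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def train(examples):
    """Average the embeddings of each label to get one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for text, label in examples:
        vec = embed(text)
        current = sums.get(label, [0.0] * DIM)
        sums[label] = [a + b for a, b in zip(current, vec)]
        counts[label] += 1
    return {label: [x / counts[label] for x in total]
            for label, total in sums.items()}


def predict(centroids, text: str) -> str:
    """Return the label whose centroid has the largest dot product with the text."""
    vec = embed(text)
    return max(
        centroids,
        key=lambda label: sum(a * b for a, b in zip(centroids[label], vec)),
    )
```

Because the embedding stays fixed and only the per-label centroids are learned, a handful of labeled descriptions per category is enough to fit this kind of model, which is the property the paragraph above relies on.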

This talk is geared toward practitioners interested in how bank transactions can be understood by a machine. Expert knowledge of machine learning is not required.


Prior Knowledge Expected

No previous knowledge expected

CTO at Ntropy