PyData London 2022

Gabor Bakos

Gabor works in the Statistical Arbitrage team at Optiver. He is responsible for building systematic trading strategies and designing the data pipelines. Prior to joining the team, he worked at a systematic hedge fund for 3.5 years. He holds an MSc degree in Mathematical and Computational Finance from the University of Oxford.


Sessions

06-18
15:00
45min
Data Pipelining for Real-time ML Models
Gabor Bakos

Reinventing the wheel is usually not something we should strive for, so why did we build our data pipeline from scratch? Pipeline design involves numerous choices, and they strongly affect the potential use cases. When building a custom pipeline, you can make your own trade-offs between speed, throughput, simplicity and consistency of code/logic/data.

Market makers like Optiver are usually associated with ultra-low latency infrastructure; however, there are plenty of use cases where human latency (seconds) is acceptable. Computing derived metrics, training models and making predictions as new data arrives are just a few such applications, and they are the focus of this presentation.

We will tackle some of the questions we asked ourselves about the design choices for our data pipeline:
* Should you write code that is used by both the live and historical pipelines?
* How to improve the research-to-production cycle?
* How do we ensure that real-time and backtest results match?
* How to improve development speed?
* What trade-offs to make if inputs/data arrive asynchronously?
* How to improve performance and reduce resource usage?
* How can we speed up day-to-day research?
* What to do with stateful nodes?

Basic knowledge of finance and data pipelining might be beneficial, but no specific knowledge is required to follow the presentation.

Tower Suite 1