PyData London 2022

Clean Architecture: How to structure your ML projects to reduce technical debt
06-19, 11:45–12:30 (Europe/London), Tower Suite 2

Software engineering principles are frequently mentioned as a solution to data science's productivity problem. Unfortunately, they are rarely presented in a form comprehensive enough to be actionable or adapted for data-intensive use.

In this talk, I will present a framework that enables practitioners to structure their projects and manage changes throughout the product lifecycle with little effort.

The audience will also learn about the minimum set of programming concepts needed to make this a reality.

The key takeaway for any Data Scientist is that you don't need to be a master programmer to start taking care of your own codebase.


Donald Knuth said that the most important system design principle is "Layers of Abstraction". More often than not, Data Science solutions break this principle, leading to technical debt and a reduced ability to react to changes.

I will introduce the concept of Clean Architecture and the importance of decoupling in machine learning systems. We will look into how it resolves acute problems throughout the product lifecycle.

I will also introduce a minimal set of techniques that enable a refactoring cycle for moving legacy projects into the framework:
- Domain Data Model
- Dependency Inversion
- Adapter/Factory/Strategy design patterns
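
A minimal sketch of how these three techniques might fit together in a small scoring task. Every name here (`Transaction`, `TransactionSource`, `RowAdapter`, `make_strategy`, the `id` and `amt` row keys) is hypothetical and chosen for illustration only; it is one possible arrangement under those assumptions, not the framework itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Protocol


# Domain data model: the core logic talks about Transactions,
# not about DataFrames, CSV rows, or database cursors.
@dataclass(frozen=True)
class Transaction:
    customer_id: str
    amount: float


# Dependency inversion: the core declares the interface it needs,
# and the outer layers (files, databases, APIs) implement it.
class TransactionSource(Protocol):
    def load(self) -> Iterable[Transaction]: ...


# Strategy: interchangeable scoring rules behind one call signature.
ScoringStrategy = Callable[[List[Transaction]], float]


def mean_amount(transactions: List[Transaction]) -> float:
    return sum(t.amount for t in transactions) / len(transactions)


def max_amount(transactions: List[Transaction]) -> float:
    return max(t.amount for t in transactions)


# Core use case: depends only on the domain model and the abstractions.
def score_customers(
    source: TransactionSource, strategy: ScoringStrategy
) -> Dict[str, float]:
    grouped: Dict[str, List[Transaction]] = {}
    for t in source.load():
        grouped.setdefault(t.customer_id, []).append(t)
    return {cid: strategy(ts) for cid, ts in grouped.items()}


# Adapter: translates an external format (raw dict rows, an assumed
# layout) into the domain model.
class RowAdapter:
    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def load(self) -> Iterable[Transaction]:
        for row in self._rows:
            yield Transaction(customer_id=str(row["id"]), amount=float(row["amt"]))


# Factory: selects a strategy by name, e.g. from a config file.
def make_strategy(name: str) -> ScoringStrategy:
    strategies = {"mean": mean_amount, "max": max_amount}
    return strategies[name]


rows = [{"id": "a", "amt": "10"}, {"id": "a", "amt": "30"}, {"id": "b", "amt": "5"}]
scores = score_customers(RowAdapter(rows), make_strategy("mean"))
print(scores)  # {'a': 20.0, 'b': 5.0}
```

In this arrangement, swapping the data source or the scoring rule means writing a new adapter or strategy; `score_customers` itself does not change.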

Throughout the talk, I will discuss the economic and psychological rationale that justifies these steps from a business perspective.


Prior Knowledge Expected

No previous knowledge expected

I run Hypergolic, a boutique consultancy in London specialising in Machine Learning Product Management.

Formerly I was Head of Data Science at Arkera, a fintech startup in London, where I built market intelligence products with Natural Language Processing for Tier 1 investment banks and hedge funds.

Prior to that, I worked in mobile gaming for King Digital (makers of Candy Crush), specialising in player behaviour and monetisation.

I started my career as a quant researcher writing trading strategies at multiple investment managers.