PyData London 2022

“Off with their I/Os!” - or how to contain madness by isolating your code
06-18, 11:00–11:45 (Europe/London), Tower Suite 3

Engulfed in a tedious refactoring of your code, you’re adding the seventh layer of mocks to a test when you realise something must have gone wrong somewhere, but what? You’ve written readable code, split into functions and classes to avoid long chunks of code, and yet every time you end up with code that is hard to test, a test suite that runs for hours, functions with seventeen arguments, and you wonder whether you are mocking the code or the code is mocking you.

Follow the white rabbit with me to learn about common problems of code organization and I/O architecture, and some tricks for handling I/Os and isolating dependencies. We might encounter a bit of SOLID advice, and maybe even a nice hat!


The intended audience is intermediate to senior data scientists who have already encountered, or will soon encounter, problems with testing, maintaining or expanding a growing codebase.

This talk will help you understand the benefits of good architecture, with a focus on isolating your I/O (inputs/outputs) and other third-party dependencies, and will guide you through how to achieve it in practice, from simpler to more complex cases. I will present good practices from software engineering, with a focus on applying them in a data science context.
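To make “isolating your I/O” concrete, here is a minimal sketch of one common way to do it, often called “functional core, imperative shell”. It assumes a CSV input with hypothetical `group` and `value` columns; the function names `mean_by_group` and `run` are illustrative and not taken from the talk. The computation is a pure function that never touches a file, and reading and writing are pushed into a thin outer layer.

```python
import csv
import io
from typing import Iterable, TextIO


def mean_by_group(rows: Iterable[dict]) -> dict[str, float]:
    """Pure core: average the 'value' column per 'group', with no I/O at all."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for row in rows:
        group = row["group"]
        totals[group] = totals.get(group, 0.0) + float(row["value"])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


def run(source: TextIO, sink: TextIO) -> None:
    """Imperative shell (hypothetical): all reading and writing happens here, at the edge."""
    rows = csv.DictReader(source)
    result = mean_by_group(rows)
    writer = csv.writer(sink)
    writer.writerow(["group", "mean"])
    for group, mean in sorted(result.items()):
        writer.writerow([group, mean])


# Usage: in-memory buffers stand in for real files, so nothing needs mocking.
output = io.StringIO()
run(io.StringIO("group,value\na,1\na,3\nb,2\n"), output)
print(output.getvalue())
```

Because `run` accepts file-like objects rather than paths, a test can pass `io.StringIO` buffers instead of patching `open`, and `mean_by_group` can be tested with a plain list of dicts.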

Outline:
- (2 min) Intro
- (4 min) Functional programming ideas
- (6 min) Isolating I/O with a clean architecture (“onion” architecture)
- (5 min) Benefits in terms of testing and maintainability
- (8 min) How to isolate third-party dependencies using dependency injection and abstraction layers
- (5 min) Q&A
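Dependency injection behind an abstraction layer might look roughly like the sketch below. `FeatureStore`, `InMemoryFeatureStore` and `ChurnScorer` are hypothetical names, and the scoring formula is a placeholder. The business logic receives its data source as a constructor argument typed against a small `typing.Protocol`, so a test can hand it an in-memory fake instead of patching a database client.

```python
from typing import Protocol


class FeatureStore(Protocol):
    """Hypothetical abstraction layer: anything that can load features for a user."""

    def load(self, user_id: str) -> dict[str, float]: ...


class InMemoryFeatureStore:
    """Test double: satisfies FeatureStore structurally, without any I/O."""

    def __init__(self, data: dict[str, dict[str, float]]) -> None:
        self._data = data

    def load(self, user_id: str) -> dict[str, float]:
        return self._data[user_id]


class ChurnScorer:
    """Hypothetical business logic: the store is injected, not created inside."""

    def __init__(self, store: FeatureStore, threshold: float = 0.5) -> None:
        self._store = store
        self._threshold = threshold

    def is_at_risk(self, user_id: str) -> bool:
        features = self._store.load(user_id)
        # Placeholder scoring rule, for illustration only.
        score = 0.7 * features["inactivity"] + 0.3 * features["complaints"]
        return score > self._threshold


# Usage: inject a fake store; no database and no mock.patch needed.
store = InMemoryFeatureStore({
    "alice": {"inactivity": 1.0, "complaints": 0.0},
    "bob": {"inactivity": 0.0, "complaints": 0.0},
})
scorer = ChurnScorer(store)
print(scorer.is_at_risk("alice"), scorer.is_at_risk("bob"))
```

A production class backed by a real database would only need a `load(user_id)` method with the same signature to be accepted by `ChurnScorer`; the scoring logic itself would not change.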

I cannot promise that the Liskov substitution principle won’t be mentioned, but I will do my best to make it clear and understandable.
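As a rough illustration of the Liskov substitution principle: code written against a base type should keep working when given any of its subtypes. In this minimal sketch, `DataSource`, `ListSource`, `BrokenSource` and `count_records` are hypothetical names. `ListSource` respects the base class contract and can be swapped in anywhere, while `BrokenSource` returns `None` and breaks callers, even though `isinstance(source, DataSource)` is true for it.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical contract: fetch() always returns a list of records (maybe empty)."""

    @abstractmethod
    def fetch(self) -> list[dict]: ...


class ListSource(DataSource):
    """Respects the contract, so it can replace any other DataSource."""

    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def fetch(self) -> list[dict]:
        return list(self._records)


class BrokenSource(DataSource):
    """Violates the contract by returning None instead of an empty list."""

    def fetch(self):
        return None


def count_records(source: DataSource) -> int:
    """Written against the base type only; relies on fetch() returning a list."""
    return len(source.fetch())


print(count_records(ListSource([{"id": 1}, {"id": 2}])))  # 2
# count_records(BrokenSource()) raises TypeError: this subclass is not substitutable.
```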


Prior Knowledge Expected

No previous knowledge expected

Sarah Diot-Girard has been working with Machine Learning since 2012 and enjoys using data science tools to find solutions to practical problems. She is particularly interested in the practical issues, both ethical and technical, that arise from applying ML in real life. She has given talks on data privacy, algorithmic fairness, and software engineering best practices applied to data science.