PyData London 2022

Adam Klimont
  • Beyond medical image segmentation. The road towards clinical insights.
Ade Idowu
  • Document/sentence similarity solution using open source NLP libraries, frameworks and datasets
Adrin Jalali

I'm a computer scientist / bioinformatician who has become a core developer of scikit-learn and fairlearn, and I work as a Machine Learning Engineer at Hugging Face. I'm also an organizer of PyData Berlin.

These days I mostly focus on aspects of machine learning and tools which help with creating more ethical and fair decision making systems. This trend has influenced me to work on fairlearn, and to work on aspects of scikit-learn which would help tools such as fairlearn to work more fluently with the package; and at Hugging Face, my focus is to enable the community of these libraries to be able to share their models more easily and be more open about their work.

  • Measurement and Fairness: Questions and Practices to Make Algorithmic Decision Making more Fair
Afshin T. Darian

A. T. Darian is a Distinguished Contributor, Steering Council member, and maintainer for Project Jupyter. He works on JupyterLab, the integrated data environment for Jupyter notebook and data science. Visit his GitHub profile for full details:

  • Make your first Jupyter open-source contribution
Ahmet Melek
  • What is X up to? - NER and Relationship Extraction for Information Extraction
Alejandro Saucedo

Alejandro Saucedo is Director of Engineering at Seldon Technologies, where he leads teams of machine learning engineers focused on the scalability and extensibility of machine learning deployment and monitoring products. Alejandro is also the Chief Scientist at the Institute for Ethical AI & Machine Learning, where he contributes to policy and industry standards on the responsible design, development and operation of AI, including the fields of explainability, GPU acceleration, ML security and other key machine learning research areas. With over 10 years of software development experience, Alejandro has held technical leadership positions across hyper-growth scale-ups and has a strong track record building cross-functional teams of software engineers. He is currently appointed as governing council Member-at-Large at the Association for Computing Machinery, and is currently the Chairperson of the GPU Acceleration Kompute Committee at the Linux Foundation.

  • Accelerating High-Performance Machine Learning with HuggingFace, Optimum & Seldon
Alexander Hendorf

Alexander Hendorf is responsible for data and artificial intelligence at the digital excellence consultancy KÖNIGSWEG GmbH. Through his commitment as a speaker and chair of various international conferences and PyConDE & PyData Berlin, he is a proven expert in the field of data intelligence. He has been appointed a Python Software Foundation and EuroPython fellow for his various contributions. He has many years of experience in the practical application, introduction and communication of data and AI-driven strategies and decision-making processes.

  • Lessons Learned About Data & AI at Enterprises and SMEs
Anders Bogsnes

I'm the Head of the Python Enablement Team at Nordea Asset Management, driving enablement and adoption of Python for our teams.

I've previously worked as a tech lead for ML and Analytics, Sales Analytics as well as a Business Analyst and Management Advisor.

I also organize the PyData Copenhagen monthly meetup.

  • SQLAlchemy and you - making SQL the best thing since sliced bread
Anindya Datta

Founder & CEO, Mobilewalla
Anindya Datta is a leading technologist and innovator with core contributions in best-in-class large-scale data management solutions, artificial intelligence, and internet technologies. As Founder, CEO, and Chairman of Mobilewalla, Anindya has combined the industry’s most robust data set with deep artificial intelligence and data science expertise to help enterprises build high performing, resilient predictive models.

Prior to Mobilewalla, Anindya founded Chutney Technologies which was acquired by Cisco Systems in 2005. He has been on the faculties of major research universities and institutes in the United States and abroad, including the Georgia Institute of Technology, University of Arizona, National University of Singapore, and Bell Laboratories. Anindya obtained his undergraduate degree from the Indian Institute of Technology (IIT) Kharagpur, and his MS and Ph.D. degrees from the University of Maryland, College Park, USA.

  • Feature Engineering Made Simple
Asya Frumkin

Asya Frumkin is an algorithm developer and team lead, specializing in computer vision. She is also a proud Albino and visually impaired from birth. However, her disability never stopped her and gave her the irresistible drive to make a real impact. After building models in the automotive and medical imaging industries, it was fairly natural for her to join Evinced, a startup focused on improving digital accessibility. At Evinced she is responsible for end-to-end data solutions that help people like her take an equal part in the online world.

  • Can you Read This? (Or: how I Improved Text Readability on the Web for the Visually Impaired)
Chady Dimachkie
  • Understanding your bank statement in 100ms
Cheuk Ting Ho

Before working in Developer Relations, Cheuk was a Data Scientist at various companies, work that demanded strong numerical and programming skills, especially in Python. To follow her passion for the tech community, Cheuk is now the Developer Relations Lead at TerminusDB, an open-source graph database. Cheuk maintains its Python client and engages with its user community daily.

Besides her work, Cheuk enjoys talking about Python on her personal streaming platform and on podcasts. Cheuk has also been a speaker at universities and various conferences. Besides speaking at conferences, Cheuk also organises events for developers. Conferences that Cheuk has organized include EuroPython (of which she is a board member), PyData Global and Pyjamas Conf. Believing in Tech Diversity and Inclusion, Cheuk constantly organizes workshops and mentored sprints for minority groups. In 2021, Cheuk became a Python Software Foundation fellow.

  • Picking What to Watch Next - build a recommendation system
Chris Fonnesbeck

Chris is the Principal Quantitative Analyst in Baseball Research & Development for the Philadelphia Phillies. He is interested in computational statistics, machine learning, Bayesian methods, and applied decision analysis. He hails from Vancouver, Canada and received his Ph.D. from the University of Georgia.

  • Probabilistic Python: An Introduction to Bayesian Modeling with PyMC
Davide Frazzetto

I am Davide, Machine Learning Operations Engineer at Massive Entertainment – A Ubisoft Studio, and Pythonista at heart.

I have spent the past 7 years working with data science, both researching and developing machine learning applications and data platforms. My main interest is in bridging the gap between development and production in the machine learning world.
Eventually I decided to go back to one of my original passions and landed in the videogames industry, excited to get Avatar: Frontiers of Pandora and Star Wars released soon!

When I am not making or playing videogames, I like to keep myself active biking, bouldering, practicing yoga, and finally learning how to swim.
I am also an active community builder in and outside IT, with a strong interest in local NGO communities in my city, Copenhagen. It is not hard to find me around the city making food with my non-profit restaurant One Bowl.

  • A Hitchhiker’s Guide to MLOps
Dillon Gardner

Dillon is a data scientist with a passion for working on hard, messy problems. He has worked for companies in energy, agtech, and fintech as both an individual contributor and manager. Before starting work in data science, he did his PhD in physics at MIT.

  • AUC is worthless: Lessons in transitioning from academic to business data science
Dr. Ilia Zintchenko

CTO at Ntropy

  • Understanding your bank statement in 100ms
Dr. Jonathan Kernes
  • Understanding your bank statement in 100ms
Dr. Susan Mulcahy

Dr Susan Mulcahy is the Director of the Data Sparks Programme at Imperial College London, an innovative student placement programme that matches a real-world industry data science project with a team of our postgraduate students. This programme sits within Imperial Business Analytics, the research centre focused on bringing data science research closer to the world of business. Susan was previously the Senior Education Fellow of the Data Science Institute (DSI) at Imperial, where she developed the educational offering of the DSI for internal students and external industry engagements. She is also a Lecturer in Data Analytics at Ada National College for Digital Skills. Having also facilitated technical courses for corporate clients since 2013, Susan enjoys teaching/facilitating/presenting technical topics to a general audience.

Susan received her data-driven PhD from Imperial’s Bioengineering Department in 2016 where she researched indicators of traumatic brain injury using MATLAB on datasets collecting over 500 million data points per patient per day. In addition to this, she has an MBA from INSEAD in France and a BSc in Mechanical Engineering from Purdue University in the USA. Susan has been a Fellow of the Royal Geographical Society since 2002.

For outside interests, Susan seeks out adventure. In 1999, she spent three months riding her bicycle across the USA. These days, she can be found rowing weekly on the Thames (anything from singles to 8s, outside of lockdown), hiking up a rugged mountain in the Scottish Highlands, or sleeping in a tent in her back garden in London (which she did for 82 consecutive nights in lockdown v1.0 in search of a local adventure).

“Life is either a daring adventure or nothing.” – Helen Keller

  • Keynote: Congrats on making it through the conference! And other lighthearted thoughts on conversing about data
Eyal Kazin איל קאזין

Ex-cosmologist turned data scientist with over 15 years of experience in solving challenging problems. I am motivated by intellectual challenges, highly detail-oriented, and love visualising data results to communicate insights for better decisions within organisations.

My main drive as a data scientist is applying scientific approaches that result in practical and clear solutions. To accomplish this, I use whatever works, be it statistical/causal inference, machine/deep learning or optimisation algorithms. Being results-driven, I have a passion for helping stakeholders make data-driven decisions by quantifying and communicating the impact of interventions to non-specialist audiences in an accessible manner.

My claim to fame is that between 2004 and 2014 I lived on four different continents, including in three tennis Grand Slam cities (NYC, Melbourne, London).

  • Don't Stop 'til You Get Enough - Hypothesis Testing Stop Criterion with “Precision Is The Goal”
Franz Kiraly
  • sktime - python toolbox for time series: how to implement your own estimator
Gabor Bakos

Gabor works in the Statistical Arbitrage team at Optiver. He is responsible for building systematic trading strategies and designing the data pipelines. Prior to joining the team, he worked at a systematic hedge fund for 3.5 years. He holds an MSc degree in Mathematical and Computational Finance from the University of Oxford.

  • Data Pipelining for Real-time ML Models
Hanna van der Vlis

Hanna is a creative and passionate data scientist with experience in energy, agriculture, and credit risk. She has 3+ years of experience in data science and machine learning, and proven skills in ML Ops. She is currently working to help Kenyan smallholder farmers run more profitable businesses at Apollo Agriculture.

  • Clusterf*ck: A practical guide to Bayesian hierarchical modeling in Pymc3
Ian Ozsvald

Ian is a Chief Data Scientist and Coach. He has helped co-organise the annual PyDataLondon conference with 700+ attendees and the associated 11,000+ member monthly meetup. He runs the established Mor Consulting Data Science consultancy in London, gives conference talks internationally, often as keynote speaker, and is the author of the bestselling O'Reilly book High Performance Python (2nd edition). He has 19 years of experience as a senior data science leader, trainer and team coach. For fun he's walked by his high-energy Springer Spaniel, surfs the Cornish coast and drinks fine coffee. Past talks and articles can be found at:

  • Executives at PyData
  • Building Successful Data Science Projects
James Powell

James Powell serves on the board of NumFOCUS as co-Chairman and Vice President. NumFOCUS is the 501(c)(3) non-profit that supports all the major tools in the Python data analysis ecosystem (incl. pandas, numpy, jupyter, matplotlib, and others). At NumFOCUS, he helps build global open source communities for data scientists, data engineers, and business analysts. He helps NumFOCUS run the PyData conference series and has sat on speaker selection and organizing committees for over two dozen conferences. James is also a prolific speaker: since 2013, he has given over seventy conference talks at over fifty Python events worldwide.

  • Why do I need to know Python? I'm a pandas user…
Jay Alammar
  • Large Language Models for Real-World Applications - A Gentle Intro
Jennifer Hall

Jennifer is a senior data scientist in the NHS AI Lab Imaging Team in the NHS England Transformation Directorate, exploring practical, innovative and ethical applications of AI across the NHS. Prior to this she studied Astrophysics at university and has worked in innovation, analytics and data science teams in the finance and travel industries.

  • Making fake data generators for open source healthcare data science projects
Jim Dowling

Jim is the Co-founder and CEO of Hopsworks.

  • Python-centric Feature Stores
Jon Bannister
  • Notebooker: Production and Scheduling for your Jupyter Notebooks
Juan Luis Cano Rodríguez

Juan Luis (he/him/él) is an Aerospace Engineer with a passion for STEM, programming, outreach, and sustainability. He works as Data Scientist Advocate at Orchest, where he empowers data scientists by building an open-source, scalable, easy-to-use workflow orchestrator. He has worked as a Developer Advocate at Read the Docs, as a software engineer in the space, consulting, and banking industries, and as a Python trainer for several private and public entities.

Apart from being a long-time user and contributor to many projects in the scientific Python stack (NumPy, SciPy, Astropy) he has published several open-source packages, the most important one being poliastro, an open-source Python library for Orbital Mechanics used in academia and industry.

Finally, Juan Luis is the founder and former chair of the Python España association, the point of contact for the Spanish Python community, former organizer of PyCon Spain, which attracted more than 800 attendees in its last in-person edition in 2019, and current organizer of the PyData Madrid monthly meetups.

  • Beyond pandas: The great Python dataframe showdown
Julien Simon

Julien is currently Chief Evangelist at Hugging Face. He's recently spent 6 years at Amazon Web Services where he was the Global Technical Evangelist for AI & Machine Learning. Prior to joining AWS, Julien served for 10 years as CTO/VP Engineering in large-scale startups.

  • Machine Learning 2.0 with Hugging Face
Kajanan Sangaralingam

Head of Data Science, Mobilewalla

Kajanan Sangaralingam manages the Data Science and AI function at Mobilewalla. He is passionate about solving real business problems using innovative AI/machine learning approaches. Prior to Mobilewalla, Kajanan worked as a Senior Data Scientist at Singapore Telecommunications where he honed his skills processing and analyzing large volumes of structured and unstructured data. He earned his Ph.D. at the National University of Singapore and his Bachelor of Science in Information Technology degree at the University of Moratuwa, Sri Lanka. His early work experience included many roles as a Senior Software Engineer and Software Engineer at companies in various industries.

  • Feature Engineering Made Simple
Kishan Manani

Kishan is a machine learning and data science lead, course instructor, and open source software contributor. He contributes to well known Python packages including Statsmodels and Feature-engine. He has 10+ years of experience in applying machine learning and statistics in finance, e-commerce, and healthcare research. He leads data science teams to deliver data and machine learning products end-to-end.

Kishan attained a PhD in Physics from Imperial College London in applied large-scale time-series analysis and modelling of cardiac arrhythmias; during this time he taught and supervised undergraduates and master's students.

  • Feature engineering for time series forecasting
Laszlo Sragner

I run Hypergolic, a boutique consultancy in London specialising in Machine Learning Product Management.

Formerly I was Head of Data Science at Arkera, a fintech startup in London, where I built market intelligence products with Natural Language Processing for Tier 1 investment banks and hedge funds.

Prior to that, I worked in mobile gaming for King Digital (makers of Candy Crush), specialising in player behaviour and monetisation.

I started my career as a quant researcher writing trading strategies at multiple investment managers.

  • Clean Architecture: How to structure your ML projects to reduce technical debt
Marysia Winkels
  • Models schm-odels: why you should care about Data-Centric AI
Matthew Cooper

Matt is a senior data scientist in the NHS AI Lab Skunkworks Team. In his role, he aims to help organisations across the NHS get AI-driven applications into their hands quickly, so that data science and machine learning can help them in ways that are tailored to their day-to-day work. He has a background in financial regulation and consultancy, and studied Aerospace Engineering at university.

  • Making fake data generators for open source healthcare data science projects
Mia Chang

As an ML Specialist Solutions Architect based in Berlin, Germany, Mia shares best practices for running AI/ML workloads on the AWS cloud with customers. Before becoming a Solutions Architect, she worked as a backend engineer and data scientist, solving machine learning problems end-to-end, from model training to model deployment. She enjoys working in the tech industry because of the positive impact that technology has on people's daily lives. Talk with her about: AI/ML on AWS; any suggestions for summer vacation plans :) Reach out to her via:

  • Running the first automatic speech recognition (ASR) model with HuggingFace
Natan Mish

Senior Machine Learning Engineer at Zimmer Biomet. London School of Economics graduate with an MSc in Applied Social Data Science. Passionate about using Machine Learning to solve complicated problems. I have experience analysing and researching data in the financial, real estate, transportation and healthcare industries. Curious about (almost) everything and always happy to take on new experiences and challenges. I love finding bugs, especially if they're of my own making!

  • Data Validation for Data Science
Nick Radcliffe

Nick Radcliffe is a data scientist. He runs the consulting and software company Stochastic Solutions, which produces Miró, a commercial data analysis suite, and the open source Python TDDA Library for test-driven data analysis. He is also a Visiting Professor in the Department of Maths at the University of Edinburgh, and is acting Chief Data Scientist at Smart Data Foundry.

Nick has a background in parallel & high-performance computing from his time at EPCC.

  • Parallelism the Old Way: Using MPI in Python with mpi4py
Nick Sorros

Nick has been working as a data scientist for the last 10 years. Prior to setting up MantisNLP, he was working for the Wellcome Trust, initially to set up and lead the data science team. Prior to that he worked for a couple of startups at different stages of maturity, from a few to dozens of employees, in various sectors such as fintech and social networks. Before data science, Nick was studying and doing research at Imperial College.

During these years in the industry Nick found himself working more and more on NLP problems, from detecting the language of tweets and identifying which entrepreneur statements were factual to tagging grants with thousands of labels and finding references in policy documents. This led him to create MantisNLP, a data science consultancy focused on NLP with a remote-first culture and clients worldwide.

  • Extreme Multilabel Classification in the Biomedical NLP domain
Orian Sharoni
  • Audio Neural Networks without Ground Truth: How to avoid humans in the loop at all costs
Pedro Tabacof

Pedro Tabacof is based in Dublin and is currently a staff data scientist at Wildlife Studios (a mobile gaming company). Previously, he has worked at Nubank (fintech) and iFood (food delivery app). He has used and deployed machine learning models for anti-fraud, credit risk, lifetime value and marketing attribution, using XGBoost or LightGBM in almost all cases. Academically, he has a master's degree in deep learning and 300+ citations.

  • Unlocking the power of gradient-boosted trees (using LightGBM)
Pranjal Biyani

Pranjal is an experienced AI Scientist building the first AI-powered platform to accelerate R&D for Material Sciences across the globe. He loves opening black-box models to reveal insightful AI secrets that help decision makers adapt to ever-changing industry needs. He also loves to teach and mentor passionate individuals aspiring to be a part of the Data Science Community, all with his favourite language, Python!

  • How to Stack Neural Networks together? Ideas and Applications
Richard Pelgrim

Richard Pelgrim is a data scientist with a passion for communicating technical content in creative and compelling ways. Currently he does so as Developer Advocate at, the leading company built around the open-source Dask library for distributed computing in Python. Richard is regularly invited to give Dask tutorials at meet-ups and conferences and has a treasure chest of expert tips to support anyone looking to take their distributed computing to the next level.

  • Data Science at Scale with Dask
Robin Kahlow
  • Understanding your bank statement in 100ms
Sam Morley

I am a research software engineer working on the DataSig project. This project is all about bringing rough path theory and signature methods to data science applications. I maintain the Python package esig for computing signatures and the C++ library libalgebra that backs esig, along with various other similar libraries. Prior to this role I worked as a lecturer in mathematics, and I am the author of the book "Applying Math with Python".

  • Signature methods for time series data
Sarah Diot-Girard

Sarah Diot-Girard has been working with Machine Learning since 2012 and she enjoys using data science tools to find solutions to practical problems. She is particularly interested in practical issues, both ethical and technical, that arise from applying ML to real life. She has given talks about data privacy and algorithmic fairness, and about software engineering best practices applied to data science.

  • “Off with their I/Os!” - or how to contain madness by isolating your code
Simon Ward-Jones

Simon is a Senior Data Scientist at Deliveroo with 8 years of experience in Retail and Tech. He is interested in many areas of data science, having worked in machine learning and causal inference as well as experimentation design and statistics.

He is from Berkhamsted in the UK, studied maths in Oxford and now lives and works in London.

  • Introducing more of the standard library
Sylvain Corlay

Sylvain Corlay is the founder and CEO of QuantStack. He holds a PhD in applied mathematics from University Paris VI.

As an open-source developer, Sylvain Corlay is active in the Jupyter ecosystem. He is the co-creator of the Voilà dashboarding system and the Xeus C++ implementation of the Jupyter kernel protocol, and he maintains several other projects of the Jupyter stack. He is also a core contributor to conda-forge and several other scientific computing open-source projects, such as bqplot, xtensor, and ipyleaflet.

Beyond QuantStack, Sylvain does a lot of volunteer work for the community, as a member of the board of directors of NumFOCUS and as the vice chair of JupyterCon. He also co-organizes the PyData Paris Meetup.

Sylvain founded QuantStack in September 2016. Prior to founding QuantStack, he was a Quant Researcher at Bloomberg and an Adjunct Faculty member at the Courant Institute and Columbia University.

  • Possible Futures for Jupyter
Tambe Tabitha Achere

Tambe Tabitha Achere works as a Data Analyst at Social Finance UK, a not-for-profit organisation that partners with governments, service providers, the voluntary sector, and the financial community to tackle and scale solutions to social problems.

Her work in the Data + Digital Labs involves combining research and tech to reimagine public and social services for the 21st century. This involves partnering with people and communities to develop a deep understanding of their most challenging problems, then working together to design and build innovative human-centred services that are safe and trusted and that empower people to live happy and healthy lives.

How my name is pronounced
Tambe – The “be” is pronounced as “bear” without the r.
Tabitha – Ta-bee-tha
Achere – Both “e”s are short sounds. “e” as in egg.

  • How Pyodide and a new opensource community are improving children’s social work.
Tania Allard

Tania is the co-director at Quansight Labs and was previously a Sr. Developer Advocate at Microsoft. She has vast experience in academic research and industrial environments. Her main areas of expertise are data-intensive applications, scientific computing, and machine learning. Tania has conducted extensive work on the improvement of processes, reproducibility and transparency in research, data science and artificial intelligence. She is passionate about mentoring, open source, and its community, and is involved in a number of initiatives aimed at building more diverse and inclusive communities. She is also a contributor, maintainer, and developer of a number of open source projects and the Founder of PyLadies NorthWest.

In her free time she likes tinkering with electronics, nerding with mechanical keyboards, reading all the books and lifting heavy weights.

  • Keynote: Key Challenges in the PyData Ecosystem and How We Can All Make a Difference
Theodore Meynard

Theodore Meynard is a senior data scientist at GetYourGuide. He works on our recommender system to help customers find the best activities to book and locations to explore. Before GetYourGuide, he was building the recommendation system at plista to help online newspapers monetize their content. When he is not programming, he is also involved in PyData Berlin, helping to organize monthly meetups. Finally, he loves to ride his bike looking for the best bakery-patisserie in town.

  • Test your data like you test your code
Thomas Wiecki

Dr. Thomas Wiecki is an author of PyMC, the leading platform for statistical data science. To help businesses solve some of their trickiest data science problems, he assembled some of the best Bayesian modelers out there and founded PyMC Labs -- the Bayesian consultancy. He did his PhD at Brown University.

  • Solving Real-World Business Problems with Bayesian Modeling
  • Fuzzy Matching at Scale
Tomasz Bartczak

Tomasz is a Machine Learning Engineer at Cydar Medical. He works on a variety of medical image analysis tasks, such as segmentation and classification models, combining them to shape the future of the software supporting endovascular aortic repair. He contributes to all the stages of the machine learning project, from the problem statement definition, through literature review and experimentation, up to deployment & monitoring. In previous years he developed large-scale ranking models for an e-commerce platform. Before focusing on machine learning he worked in general software engineering.

  • Beyond medical image segmentation. The road towards clinical insights.
Usman Zafar

Usman Zafar is a Machine Learning Engineer at GoDataDriven. He has worked for several years as a Data Scientist, eventually moving into a Machine Learning Engineering role where he could focus more on the implementation of machine learning models in a scalable way. More recently, he has been interested in graph neural networks from the mathematical and engineering perspective as well as their ability to solve interesting problems in a new way.

  • Using graph neural networks to embrace the dependency within your data
Valerio Maggio

Valerio Maggio is a Researcher, Data scientist, and SSI fellow currently holding an appointment of Senior Research Associate in the Dynamic Genetics Lab at the MRC Integrative Epidemiology Unit, University of Bristol. Valerio holds a Ph.D. in Computer Science from the University of Naples "Federico II" with a thesis on Machine Learning for Software Maintainability. Valerio is well versed in open research software and best software development practices. His research interests span a broad range of topics in data science, from data processing to reproducible analytics, specifically focused on addressing challenges in public health. Valerio is also an open-source contributor and an active member of the Python community, where over the years he has led the organisation of many international conferences like PyCon/PyData Italy and EuroSciPy. In 2019 Valerio was awarded the honorary position of Microsoft Azure Cloud Research Software Engineer for his work on Scalable Machine Learning pipelines on Microsoft Azure.

  • Rethinking Data Visualisation with PyScript
Yizhar (Izzy) Toren
  • Testing, testing: On experimental drift and data driven product design
Vincenzo Crescimanna
  • Train Object Detection with small Datasets