PyData London 2022

Natan Mish

Senior Machine Learning Engineer at Zimmer Biomet. London School of Economics graduate with an MSc in Applied Social Data Science. Passionate about using Machine Learning to solve complicated problems. I have experience analysing and researching data in the financial, real estate, transportation and healthcare industries. Curious about (almost) everything and always happy to take on new experiences and challenges. I love finding bugs, especially if they're my own making!


Sessions

06-17
09:00
90min
Data Validation for Data Science
Natan Mish

Have you ever worked really hard on choosing the best algorithm, tuned the parameters to perfection, and built awesome feature engineering methods only to have everything break because of a null value? Then this tutorial is for you! Data validation is often neglected in the process of working on data science projects. In this tutorial, we will demonstrate the importance of implementing data validation for data science in commercial, open-source, and even hobby projects. We will then dive into some of the open-source tools available for validating data in Python and learn how to use them so that edge cases will never break our models. The open-source Python community will come to our help and we will explore wonderful packages such as Pydantic for defining data models, Pandera for complementing the use of Pandas, and Great Expectations for diving deep into the data. This tutorial will benefit anyone working on data projects in Python who want to learn about data validation. Some Python programming experience and understanding of data science are required. The examples used and the context of the discussion is around data science, but the knowledge can be implemented in any Python oriented project.

Tower Suite 2