PyData London 2022

Don't Stop 'til You Get Enough - Hypothesis Testing Stop Criterion with “Precision Is The Goal”
06-19, 11:45–12:30 (Europe/London), Tower Suite 3

In hypothesis testing, the stopping criterion for data collection is a non-trivial question that puzzles many analysts. This is especially true of sequential testing, where demands for quick results may lead to biased ones. I show why the belief that Bayesian approaches magically resolve this issue is misleading, and how to obtain reliable outcomes by treating sample precision as the goal.


Hypothesis testing may come off as a dark art. On the one hand, data collection is expensive. On the other, small data sets may lack the statistical power needed to draw meaningful conclusions. Combining these constraints with stakeholder requirements for quick answers from data makes choosing a stopping criterion for the sample size a challenging balancing act.

This is especially true if the data is collected in a sequential manner, where a person, or an algorithm, needs to determine when to stop collecting data to satisfy the project requirements without introducing confirmation bias.
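The risk of stopping on a favourable result can be sketched with a small simulation. The example below is illustrative only and is not taken from the talk: it assumes a fair coin (so the null hypothesis is true), a two-sided z-test at the 5% level, and hypothetical function names. It compares an analyst who tests after every observation and stops at the first "significant" result with one who tests once at a sample size fixed in advance.

```python
import math
import random


def significant_at_any_peek(n_max, rng, z_crit=1.96, min_n=10):
    """Flip a fair coin up to n_max times, testing after every flip.

    Returns True if a two-sided z-test against p = 0.5 ever looks
    'significant', i.e. the analyst would have stopped and declared an effect.
    """
    heads = 0
    for n in range(1, n_max + 1):
        heads += rng.random() < 0.5  # the null hypothesis is true by construction
        if n >= min_n:
            z = (heads - 0.5 * n) / math.sqrt(0.25 * n)
            if abs(z) > z_crit:
                return True
    return False


def significant_at_fixed_n(n, rng, z_crit=1.96):
    """Same test, but run only once at a sample size fixed in advance."""
    heads = sum(rng.random() < 0.5 for _ in range(n))
    z = (heads - 0.5 * n) / math.sqrt(0.25 * n)
    return abs(z) > z_crit


rng = random.Random(0)  # fixed seed so the run is repeatable
n_sims, n_max = 2000, 500
peeking_rate = sum(significant_at_any_peek(n_max, rng) for _ in range(n_sims)) / n_sims
fixed_rate = sum(significant_at_fixed_n(n_max, rng) for _ in range(n_sims)) / n_sims
print(f"false positive rate with peeking: {peeking_rate:.3f}")
print(f"false positive rate at fixed n:   {fixed_rate:.3f}")
```

Because there is no real effect in this setup, every "significant" result is a false positive, so the gap between the two printed rates is the cost of letting the result decide when to stop.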

This talk is aimed at anyone involved in experimentation, technical or managerial, who is interested in improving how they plan an experiment budget and interpret results after data collection. A basic understanding of statistics and some hypothesis testing experience are nice-to-haves but not essential, as I will outline the basics.

In this talk you will learn why, even though Bayesian approaches are more reliable than Frequentist ones for small data sets, they do not magically solve the problem of confirmation bias. This is followed by an introduction to John Kruschke's “Precision is the Goal” method, in which fixing the experiment's target precision level in advance yields robust results.
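A minimal sketch of a precision-based stopping rule follows. It is an illustration under stated assumptions, not the calculator shown in the talk: it estimates a single proportion with a uniform Beta(1, 1) prior, approximates a central 95% posterior interval by Monte Carlo sampling (Kruschke's formulation uses the highest density interval), and uses hypothetical function names and a simulated data source.

```python
import random


def interval_width(successes, failures, rng, mass=0.95, draws=4000):
    """Monte Carlo width of the central posterior interval for a proportion.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + successes, 1 + failures).
    """
    samples = sorted(
        rng.betavariate(1 + successes, 1 + failures) for _ in range(draws)
    )
    lo = samples[int((1 - mass) / 2 * draws)]
    hi = samples[int((1 + mass) / 2 * draws) - 1]
    return hi - lo


def collect_until_precise(true_rate, target_width, rng, batch=10, max_n=10_000):
    """Collect Bernoulli observations in batches until the interval is narrow enough.

    The stopping rule looks only at the interval width, never at whether
    the interval excludes some null value.
    """
    successes = failures = 0
    while successes + failures < max_n:
        for _ in range(batch):
            # Simulated observation; in practice this would be real data.
            if rng.random() < true_rate:
                successes += 1
            else:
                failures += 1
        width = interval_width(successes, failures, rng)
        if width <= target_width:
            break
    return successes + failures, width


rng = random.Random(1)  # fixed seed so the run is repeatable
n, width = collect_until_precise(true_rate=0.3, target_width=0.1, rng=rng)
print(f"stopped after n={n} observations, interval width={width:.3f}")
```

The target width is chosen before any data arrive, so the decision to stop depends on how precisely the quantity is known rather than on which conclusion the data currently favour.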

Using a Python demo calculator, I conclude with a discussion of why communicating these considerations to stakeholders matters for expectation management.

Slides: bit.ly/precision-goal

  • Motivation (3 minutes)
  • A Gentle Introduction to Hypothesis Testing (5 minutes)
  • The Problem with Extreme Stop Criteria in sequential hypothesis testing (10 minutes)
  • Precision is the Goal as a proposed Stop Criterion (9 minutes)
  • Real-life considerations in application and a demo with a Streamlit calculator (5 minutes)
  • Summary (3 minutes)
  • Q/A (5 minutes)

Prior Knowledge Expected

No previous knowledge expected

Ex-cosmologist turned data scientist with over 15 years' experience in solving challenging problems. I am motivated by intellectual challenges, highly detail-oriented, and love visualising data results to communicate insights for better decisions within organisations.

My main drive as a data scientist is applying scientific approaches that result in practical and clear solutions. To accomplish this, I use whatever works, be it statistical/causal inference, machine/deep learning or optimisation algorithms. Being results-driven, I have a passion for helping stakeholders make data-driven decisions by quantifying the impact of interventions and communicating it to non-specialist audiences in an accessible manner.

My claim to fame is that between 2004 and 2014 I lived on four different continents, including in three tennis Grand Slam cities (NYC, Melbourne, London).