PyData London 2022

Measurement and Fairness: Questions and Practices to Make Algorithmic Decision Making more Fair
06-18, 13:30–14:15 (Europe/London), Tower Suite 1

Machine learning is almost always used in systems which automate or semi-automate decision making processes, such as recommender systems, fraud detection, and healthcare recommendation systems. Many of these systems, if not most, can induce harm by giving a less desirable outcome in cases where they should in fact give a more desirable one, e.g. flagging an insurance claim as fraudulent when it is not.

In this talk we first go through the different sources of harm which can creep into a machine learning based system [1], and the types of harm such a system can induce [2].

Taking lessons from the social sciences, one can see the input and output values of automated systems as measurements of constructs, or as proxy measurements of those constructs. In this talk we go through a set of questions one should ask before and while working on such systems. Some of these questions can be answered quantitatively, and others qualitatively [3].

[1] Suresh, H., Guttag, J., Kaiser, D., & Shah, J. (2021). Understanding Potential Sources of Harm throughout the Machine Learning Life Cycle. MIT Case Studies in Social and Ethical Responsibilities of Computing, (Summer 2021). https://doi.org/10.21428/2c646de5.c16a07bb
[2] Crawford, K. (2017). The Trouble with Bias. NeurIPS 2017 Keynote. https://www.youtube.com/watch?v=fMym_BKWQzk
[3] Jacobs, A. Z., & Wallach, H. (2021). Measurement and Fairness. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21).


In this talk we first go through the different sources of harm that can creep into a data-driven system, such as historical bias, representation bias, measurement bias, aggregation bias, learning bias, evaluation bias, and deployment bias. We then cover the different types of harm such a system can induce, two examples being allocation harm and quality-of-service harm.
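As a rough illustration of how one of these harms can be quantified, the sketch below uses fairlearn's MetricFrame to measure a quality-of-service gap, i.e. whether the system performs worse for one group than another. The data, the group labels, and the choice of recall as the metric are all hypothetical and only for illustration; this is not a recipe from the talk.

# A minimal sketch: measuring a quality-of-service gap with fairlearn's MetricFrame.
# All data here is synthetic; the sensitive feature and metric are hypothetical choices.
import numpy as np
from sklearn.metrics import recall_score
from fairlearn.metrics import MetricFrame

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                   # true labels, e.g. "claim is fraudulent"
y_pred = rng.integers(0, 2, size=1000)                   # the system's decisions
group = rng.choice(["group A", "group B"], size=1000)    # hypothetical sensitive feature

# Recall computed per group; a large gap is one signal of quality-of-service harm.
mf = MetricFrame(
    metrics={"recall": recall_score},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.by_group)       # recall for each group
print(mf.difference())   # largest gap between groups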

Then we move to measurement and fairness. Academics in the social sciences use different jargon than the data scientists and computer scientists who implement automated systems. To bridge this gap, in this talk we explore concepts such as measurement, construct, construct validity, and construct reliability. We then go through face validity, content validity, convergent validity, discriminant validity, predictive validity, hypothesis validity, and computational validity. By the end of this talk, you will be able to apply these lessons from the social sciences in your daily data science projects and see whether you should intervene at any stage of your product’s life cycle to make it more fair.
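One of the quantitative checks touched on above is convergent validity: does a proxy measurement agree with an established measurement of the same construct? The sketch below, with entirely hypothetical column names and values, shows what such a check might look like in practice; most of the other validity questions remain qualitative.

# A minimal, hypothetical sketch of a convergent validity check:
# correlate a proxy feature against a reference measurement of the same construct.
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    "proxy_score": [120, 340, 80, 560, 210, 400],         # e.g. cost used as a proxy for need
    "reference_measure": [1.1, 2.9, 0.7, 4.8, 2.0, 3.5],  # an established measurement
})

corr, p_value = spearmanr(df["proxy_score"], df["reference_measure"])
print(f"Spearman correlation: {corr:.2f} (p = {p_value:.3f})")
# A weak correlation is a warning sign that the proxy does not capture
# the construct we actually care about.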


Prior Knowledge Expected

No previous knowledge expected

I'm a computer scientist / bioinformatician who has become a core developer of scikit-learn and fairlearn, and I work as a Machine Learning Engineer at Hugging Face. I'm also an organizer of PyData Berlin.

These days I mostly focus on aspects of machine learning and on tools that help with creating more ethical and fair decision making systems. This focus has led me to work on fairlearn, and on the parts of scikit-learn that help tools such as fairlearn work more fluently with the package. At Hugging Face, my focus is on enabling the communities around these libraries to share their models more easily and be more open about their work.