PyData London 2022

Parallelism the Old Way: Using MPI in Python with mpi4py
06-17, 15:30–17:00 (Europe/London), Tower Suite 2

MPI is one of the oldest, best-established and best-tested approaches to parallel computing, with bindings for most languages and availability on most systems. MPI uses explicit message passing and can be used on "shared-nothing" systems (in which each process/processor has its own memory, unavailable to other processors) as well as on shared-memory systems (both uniform and non-uniform).
This tutorial will provide a gentle introduction to parallel computing, specifically using MPI through the Python mpi4py library.


There are many different approaches to parallel computing available in Python, many of which are hampered by the global interpreter lock (the GIL). Rather than focusing on threads (for concurrency) or the multiprocessing module (for process-based parallelism) from the standard library, this tutorial will introduce MPI, the Message-Passing Interface, which is the most widely used and successful approach to parallel computing across all languages and systems.

The tutorial will introduce parallel computing, explain the difference between multi-threading and parallelism, briefly explain the GIL, and then introduce the well-established, cross-language MPI approach through the mpi4py module, which bypasses the GIL because each MPI process runs its own Python interpreter.

The tutorial will guide participants through a from-scratch construction of a task farm using MPI.

Nick Radcliffe is a data scientist. He runs the consulting and software company Stochastic Solutions, which produces Miró, a commercial data analysis suite, and the open-source Python TDDA library for test-driven data analysis. He is also a Visiting Professor in the Department of Maths at the University of Edinburgh, and is acting Chief Data Scientist at Smart Data Foundry.

Nick has a background in parallel & high-performance computing from his time at EPCC.