PyData London 2022

Why do I need to know Python? I'm a pandas user…
06-19, 15:00–15:45 (Europe/London), Tower Suite 3

You use pandas every day. You know every keyword argument on every function, even .melt! You even know whether it's .rename, .rename_axis, or .set_axis that you want—and you get it right on the first try! So why bother learning Python? Sure, pandas is written in it, but outside of assembling parts of the pandas API, what's there that has any value in your life?


It's common for data scientists to narrowly focus on the APIs of the tools they use every day—pandas, matplotlib, pymc, dask, &c.—to the detriment of any focus on the surrounding programming language. In the case of tools like matplotlib, the total amount of Python we need to know is limited to what existed when matplotlib was first developed. (Did you know that matplotlib predates @property? That explains a lot…) In the case of newer tools like dask or pymc or even pandas, we may encounter some newer parts of Python—e.g., context managers or descriptors—as part of these tools' API design, but it's very easy to accept these as mere “syntax.”
In this talk, we will discuss where a deeper understanding of pure Python has direct and immediate consequences for your work as a data scientist. We will look at where the parts of Python you may have skimmed over show up in analytical code, outside of the mere “syntax” of an API.
This talk will be organised around answering the following questions:
- why do generators even matter (and who cares about coroutines)?
- the itertools module is great… if I were writing scripts, but where does it show up in data analysis?
- object orientation seems like a bunch of bureaucracy—can it really simplify my analytical code?
- why should I bother with data types in the builtins and collections; is the pandas.DataFrame not enough?
- knowledge of Python internals would probably be useful, if I were a programmer writing scripts, but why do they matter for a data scientist?
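
As a rough illustration of the first, second, and fourth questions above, here is a minimal standard-library sketch. It is not taken from the talk; the `readings` data and the `group_means` helper are hypothetical names invented for this example. It shows a generator producing records lazily, `itertools.groupby` and `itertools.islice` consuming them without building intermediate lists, and `collections.Counter` doing a quick tally that would otherwise need a DataFrame.

```python
from collections import Counter
from itertools import groupby, islice


def readings():
    """Lazily yield (sensor, value) records; a stand-in for a large file or stream."""
    data = [("a", 1.0), ("a", 3.0), ("b", 2.0), ("b", 4.0), ("b", 6.0), ("c", 5.0)]
    for record in data:
        yield record


def group_means(records):
    """Yield (key, mean) for each run of consecutive records sharing a key."""
    # groupby only groups adjacent items, so the input is assumed to be sorted by key.
    for key, run in groupby(records, key=lambda r: r[0]):
        values = [value for _, value in run]
        yield key, sum(values) / len(values)


# Only the first two groups are ever computed; the rest of the stream is left unread.
first_two = list(islice(group_means(readings()), 2))

# A Counter gives per-sensor record counts in a single pass.
counts = Counter(sensor for sensor, _ in readings())
```

Because each stage is lazy, the same pipeline could in principle be pointed at a file too large to load at once, which is one place where generators may matter in analytical work.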


Prior Knowledge Expected

No previous knowledge expected

James Powell serves on the board of NumFOCUS as co-Chairman and Vice President. NumFOCUS is the 501(c)(3) non-profit that supports all the major tools in the Python data analysis ecosystem (incl. pandas, numpy, jupyter, matplotlib, and others). At NumFOCUS, he helps build global open source communities for data scientists, data engineers, and business analysts. He helps NumFOCUS run the PyData conference series and has sat on speaker selection and organizing committees for over two dozen conferences. James is also a prolific speaker: since 2013, he has given over seventy conference talks at over fifty Python events worldwide.