PyData London 2022

Using graph neural networks to embrace the dependency within your data
06-19, 10:15–11:00 (Europe/London), Tower Suite 3

Many of the machine learning models we use today rest on the core assumption that our data is tabular, but how often is this truly the case? What if our data points are not independent? By ignoring the potential interrelatedness of our data, do we lose meaningful information that our models could otherwise leverage? In this talk, we shall explore graph neural networks and highlight how they can solve interesting problems in a way that is intractable when we limit ourselves to tabular data.

We will look at the limitations of common algorithms and highlight how some clever linear algebra enables us to incorporate more meaningful information into our models. Social network data is a popular example of where relationships are relevant, but relationships exist in many other types of data where they may not be so obvious. Whether it's e-commerce, logistics or molecular data, relationships within your data likely exist, and making use of them can be incredibly powerful.

This talk will hopefully spark your curiosity and provide you with a way of looking at problems from a new angle. It is intended for anyone with an interest in machine learning and will only lightly touch on some technical details.


Representing data in a tabular way is almost second nature to us and we seldom think about the limitations this can cause. This talk aims to help attendees develop an intuition for how graph neural networks (GNNs) can potentially solve their problems more effectively by accounting for the relationships present in our data. With tools like Neo4j, Spark and PyTorch Geometric, implementing GNNs has never been easier.
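As a small taste of that tooling, here is a minimal sketch of a single graph convolution layer in PyTorch Geometric. The toy graph, feature values and layer sizes below are invented purely for illustration and are not taken from the talk itself.

    import torch
    from torch_geometric.data import Data
    from torch_geometric.nn import GCNConv

    # A toy graph with 3 nodes and 2 undirected edges (0-1 and 1-2).
    # edge_index lists each edge in both directions, as PyTorch Geometric expects.
    edge_index = torch.tensor([[0, 1, 1, 2],
                               [1, 0, 2, 1]], dtype=torch.long)
    x = torch.tensor([[1.0], [2.0], [3.0]])  # one scalar feature per node
    data = Data(x=x, edge_index=edge_index)

    # A single graph convolution layer mapping 1 input feature to a 4-dimensional embedding.
    conv = GCNConv(in_channels=1, out_channels=4)
    embeddings = conv(data.x, data.edge_index)
    print(embeddings.shape)  # torch.Size([3, 4])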

Below is a rough outline of the talk:

0-5 minutes:
- Introduction to the talk
- Limitations of methods that require tabular datasets
- Examples of different problems and how data can be represented as a graph

5-15 minutes:
- Explain how we go from a graph of nodes/edges to a matrix representation
- Theory of message passing and convolutions as a way to change the representation of a graph (a small code sketch of this idea appears after the outline)
- Different types of problems we can solve: node prediction, link prediction, graph-level classification

15-25 minutes:
- Provide one detailed use-case (link prediction task)
- List open-source frameworks that can help implement GNNs
- If time permits, share personal experiences/challenges

25-30 minutes:
- Q&A
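To give a flavour of the 5-15 minute segment, the sketch below (plain NumPy, with made-up numbers) shows how an edge list becomes an adjacency matrix and how one round of normalised message passing, in the spirit of the graph convolution of Kipf and Welling, updates node representations; the final dot product hints at how such embeddings could be used to score candidate links in a link prediction task.

    import numpy as np

    # Edge list for a tiny undirected graph of 4 nodes (a simple cycle).
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

    # Adjacency matrix representation of the graph.
    A = np.zeros((4, 4))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0

    # Add self-loops so each node keeps its own features during message passing.
    A_hat = A + np.eye(4)

    # Symmetric normalisation D^{-1/2} A_hat D^{-1/2}, as in the GCN propagation rule.
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

    # Node features: one 2-dimensional feature vector per node (made-up values).
    H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])

    # Random weight matrix standing in for a learned layer.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(2, 3))

    # One round of message passing / graph convolution: each node's new
    # representation is a normalised mix of its own and its neighbours' features.
    H_next = np.maximum(A_norm @ H @ W, 0.0)  # ReLU non-linearity
    print(H_next.shape)  # (4, 3): 4 nodes, 3-dimensional embeddings

    # A link-prediction style score: similarity between two node embeddings.
    score_0_2 = H_next[0] @ H_next[2]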


Prior Knowledge Expected

No previous knowledge expected

Usman Zafar is a Machine Learning Engineer at GoDataDriven. He worked for several years as a Data Scientist before moving into a Machine Learning Engineering role, where he could focus more on implementing machine learning models in a scalable way. More recently, he has been interested in graph neural networks from both a mathematical and an engineering perspective, as well as in their ability to solve interesting problems in a new way.