PyData London 2022

Nick Sorros

Nick has been working as a data scientist for the last 10 years. Prior to setting up MantisNLP, he worked for the Wellcome Trust, initially to set up and lead the data science team. Before that he worked for a couple of startups at different stages of maturity, from a few to dozens of employees, in sectors such as fintech and social networks. Before data science, Nick studied and did research at Imperial College.

During these years in industry, Nick found himself working more and more on NLP problems, from detecting the language of tweets and identifying which entrepreneur statements were factual to tagging grants with thousands of labels and finding references in policy documents. This led him to create MantisNLP, a data science consultancy focused on NLP with a remote-first culture and clients worldwide.


Sessions

06-19
11:45
45min
Extreme Multilabel Classification in the Biomedical NLP domain
Nick Sorros

Extreme multilabel classification refers to cases where the prediction space of a multilabel classifier is in the thousands or millions of labels, which is at least an order of magnitude more than in typical problems. The scale of such problems brings some unique challenges that one has to work around, such as memory, model size, and training and inference time. This talk will discuss 1) how you can overcome those challenges, 2) relevant state-of-the-art architectures for this problem, and 3) lessons learned from the development of a transformer-based NLP model to tag biomedical grants with 29K MeSH tags.
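To give a rough feel for why memory and model size become constraints at this scale, here is a back-of-the-envelope sketch. Only the 29K label count comes from the abstract; the hidden size, corpus size, and average labels per document are assumed values chosen for illustration, and the function names are hypothetical. It does not describe the model discussed in the talk.

```python
# Rough sizing for a multilabel classifier with a large label space.
# NUM_LABELS is taken from the abstract; every other constant is an
# assumption made purely for illustration.

NUM_LABELS = 29_000        # from the abstract: 29K MeSH tags
HIDDEN_SIZE = 768          # assumption: width of the encoder output
NUM_DOCS = 1_000_000       # assumption: number of training documents
AVG_LABELS_PER_DOC = 10    # assumption: average positive labels per document
BYTES_PER_FLOAT32 = 4
BYTES_PER_INT32 = 4


def output_layer_params(hidden_size: int, num_labels: int) -> int:
    """Weights plus biases of one linear layer producing a logit per label."""
    return hidden_size * num_labels + num_labels


def dense_label_bytes(num_docs: int, num_labels: int, bytes_per_cell: int = 1) -> int:
    """Storage for a dense docs-by-labels indicator matrix."""
    return num_docs * num_labels * bytes_per_cell


def sparse_label_bytes(num_docs: int, avg_labels_per_doc: int,
                       bytes_per_index: int = BYTES_PER_INT32) -> int:
    """CSR-style storage without a values array: one column index per
    positive label plus one row pointer per document (and one extra)."""
    column_indices = num_docs * avg_labels_per_doc * bytes_per_index
    row_pointers = (num_docs + 1) * bytes_per_index
    return column_indices + row_pointers


if __name__ == "__main__":
    params = output_layer_params(HIDDEN_SIZE, NUM_LABELS)
    print(f"output layer parameters: {params:,}")
    print(f"output layer size (float32): {params * BYTES_PER_FLOAT32 / 1e6:.1f} MB")
    print(f"dense label matrix: {dense_label_bytes(NUM_DOCS, NUM_LABELS) / 1e9:.1f} GB")
    print(f"sparse label matrix: {sparse_label_bytes(NUM_DOCS, AVG_LABELS_PER_DOC) / 1e6:.1f} MB")
```

Under these assumptions the dense label matrix is several hundred times larger than the sparse one, which is one reason sparse label representations are a common workaround when the label space is this large.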

Tower Suite 1