PyData London 2022

Running the first automatic speech recognition (ASR) model with HuggingFace
06-18, 11:00–11:45 (Europe/London), Tower Suite 1

Come and build your first audio machine learning model with an automatic speech recognition (ASR) use case! ASR powers popular applications such as voice-controlled assistants and speech-to-text tools. These applications take audio clips as input and convert the speech signal to text.
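As a taste of what such an application looks like in code, here is a minimal inference sketch using the HuggingFace transformers pipeline; the checkpoint name and audio file path are illustrative assumptions, not specifics from the talk:

from transformers import pipeline

# Load a pretrained English ASR model (checkpoint choice is an assumption;
# any CTC-based ASR checkpoint on the Hub works the same way).
asr = pipeline("automatic-speech-recognition",
               model="facebook/wav2vec2-base-960h")

# Transcribe a local audio clip (the path is a placeholder).
result = asr("sample.wav")
print(result["text"])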


This talk is aimed at Python developers and ML practitioners who know Python and are interested in working on audio machine learning use cases. I will keep the slides on ML algorithms to a minimum. Instead, I will walk through the types of ASR applications, such as automatic subtitling for videos and transcribing meetings, so you will know when ASR models are the right tool. I will then cover preprocessing of audio data, feature extraction, and fine-tuning Wav2Vec2 using HuggingFace. The notebook presented in the talk runs on Amazon SageMaker, but the concepts are cloud agnostic and apply to a local (on-premises) machine as well.
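To preview the feature-extraction and fine-tuning steps, here is a minimal sketch using the transformers library; the checkpoint, the dummy waveform, and the target transcript are illustrative assumptions, not the exact notebook from the talk:

import numpy as np
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Feature extraction: Wav2Vec2 expects 16 kHz mono float waveforms.
waveform = np.random.randn(16_000).astype(np.float32)  # 1 second of dummy audio
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")

# Tokenize a target transcript into label ids for the CTC loss.
labels = processor.tokenizer("HELLO WORLD", return_tensors="pt").input_ids

# A forward pass with labels returns the CTC loss; backpropagating it is
# one step of a fine-tuning loop (an optimizer step would follow).
loss = model(inputs.input_values, labels=labels).loss
loss.backward()
print(f"CTC loss: {loss.item():.3f}")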


Prior Knowledge Expected

Previous knowledge expected

As an ML Specialist Solutions Architect based in Berlin, Germany, Mia shares best practices with customers for running AI/ML workloads on the AWS cloud. Before taking on the SA role, she worked as a backend engineer and data scientist, solving machine learning problems end-to-end, from model training to model deployment. She enjoys working in the tech industry because of the positive impact technology brings to people's daily lives. Talk with her about: AI/ML on AWS; any suggestions for a summer vacation plan :) Reach out to her via: https://www.linkedin.com/in/mia-chang/