Feature engineering for time series forecasting PyData London 2022

Feature engineering for time series forecasting

06-19, 14:15–15:00 (Europe/London), Tower Suite 2

To use our favourite supervised learning models for time series forecasting we first have to convert time series data into a tabular dataset of features and a target variable. In this talk we’ll discuss all the tips, tricks, and pitfalls in transforming time series data into tabular data for forecasting.

Forecasting is the process of making predictions about the future based on past data. In the most traditional scenario, we have a time series and want to predict its future values. There are some challenges in creating forecasting features:

we need to transform time series data into tabular data with a well-designed set of features and a target variable;
when creating forecasting features we need to be extra careful to avoid data leakage via look-ahead bias;
time series data, as expected, changes over time; we need to take this into account when building forecasting features;
predicting the target value at multiple timesteps in the future requires us to think carefully about how to extrapolate our features from the past into the future.

We can forecast future values of the time series using off-the-shelf regression models like linear regression, tree-based models, support vector machines, and more. However, these models require tabular data as input. In forecasting we don’t start with a table of features and a target variable; we start with a set of time series, perhaps just one. We need to transform the time series into tabular data with a target variable and a set of features that can be used by supervised learning models. The main challenge is therefore to create a well-designed target variable and purpose-built features that allow us to predict the future values of a time series.
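As a minimal sketch of this transformation, the snippet below turns a single series into a table of lagged values and a target using pandas. The toy data, the helper name `make_lag_table`, and the choice of three lags are all illustrative assumptions, not something prescribed by the talk.

```python
import pandas as pd


def make_lag_table(series: pd.Series, lags=(1, 2, 3)) -> pd.DataFrame:
    """Hypothetical helper: build a table of lag features plus a target."""
    table = pd.DataFrame({"target": series})
    for lag in lags:
        # shift(lag) moves past values forward, so row t holds y[t - lag]
        table[f"lag_{lag}"] = series.shift(lag)
    # The earliest rows have no complete history, so they are dropped
    return table.dropna()


# Hypothetical toy data: ten daily observations
y = pd.Series(
    [10.0, 12.0, 13.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0, 21.0],
    index=pd.date_range("2022-01-01", periods=10, freq="D"),
    name="y",
)

table = make_lag_table(y, lags=(1, 2, 3))
print(table.head())
```

Each row of `table` can then be passed to a regression model: the `lag_*` columns are the features and `target` is the value to predict.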

Creating the target variable and features for time series forecasting comes with its own pitfalls. A major concern is a form of data leakage known as look-ahead bias. This is where you accidentally make a prediction using information that only becomes available in the future and is not known at prediction time. This can give you the illusion that you have a great forecasting model when, in practice, it will perform poorly. It is very easy to introduce look-ahead bias during feature engineering, and we show how you can avoid it.
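One common way this bias creeps in is through window features. The sketch below, on made-up data, contrasts a rolling mean whose window includes the value being predicted with one that is shifted so it only sees earlier observations. The column names and window size are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy series
y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Leaky: the window ending at row t includes y[t], the value we want to predict
leaky = y.rolling(window=3).mean()

# Safer: shift by one step first, so the window at row t only covers y[t-3] to y[t-1]
safe = y.shift(1).rolling(window=3).mean()

features = pd.DataFrame(
    {"target": y, "leaky_mean_3": leaky, "safe_mean_3": safe}
)
print(features)
```

In the shifted version, every feature value in a row could have been computed before that row's target was observed, which is the property to check for any new feature.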

Time series data change over time: future data may not have the same distribution and patterns as past data. This differs from the assumptions usually made about traditional tabular data. This change in distribution and patterns over time is called non-stationarity. In time series data, the mere presence of trend and seasonality can cause non-stationarity. Creating features that capture these dynamics is therefore a challenge in time series forecasting.
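As one possible illustration, trend and seasonality can be exposed to a model through a time index and calendar features, both of which are known in advance. The sketch below builds such features for a synthetic daily series and fits a least-squares model with NumPy. The data, the weekend effect, and the feature names are assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: linear trend plus a weekend uplift
index = pd.date_range("2022-01-03", periods=28, freq="D")  # starts on a Monday
t = np.arange(len(index))
weekend_effect = np.where(index.dayofweek >= 5, 5.0, 0.0)
y = pd.Series(100.0 + 2.0 * t + weekend_effect, index=index, name="y")

features = pd.DataFrame(index=index)
features["time_idx"] = t                    # lets a linear model fit the trend
features["day_of_week"] = index.dayofweek   # 0 = Monday ... 6 = Sunday
features["is_weekend"] = (index.dayofweek >= 5).astype(int)

# Least-squares fit on intercept, trend, and weekend flag
X = np.column_stack(
    [np.ones(len(t)), features["time_idx"], features["is_weekend"]]
)
coef, *_ = np.linalg.lstsq(X, y.to_numpy(), rcond=None)
print(coef)  # intercept, trend slope, weekend uplift
```

Because the synthetic series is noise-free, the fit recovers the values used to generate it; real data would of course be less tidy.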

We very often want to forecast multiple timesteps into the future. There are multiple ways to do this, such as 1) recursively applying a model that is built to forecast one step ahead, and 2) building a model that directly forecasts the target at a later time period in the future. A challenge is that the feature engineering required for these two methods is different.
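The difference between the two approaches can be sketched with a single lag feature and a least-squares fit in NumPy. In the recursive case the model predicts one step ahead and each prediction is fed back in as the next lag; in the direct case the target is moved `horizon` steps ahead of the feature. The series, the helper `fit_linear`, and the horizon of 3 are illustrative assumptions.

```python
import numpy as np


def fit_linear(x, target):
    """Hypothetical helper: least-squares fit of target = a + b * x."""
    X = np.column_stack([np.ones(len(x)), x])
    (a, b), *_ = np.linalg.lstsq(X, target, rcond=None)
    return a, b


# Hypothetical noise-free series: y[t] = 2 + 0.8 * y[t-1]
y = np.empty(30)
y[0] = 0.0
for i in range(1, len(y)):
    y[i] = 2.0 + 0.8 * y[i - 1]

horizon = 3

# 1) Recursive: one-step-ahead model, feature is the previous value
a1, b1 = fit_linear(y[:-1], y[1:])
recursive = y[-1]
for _ in range(horizon):
    # Feed each prediction back in as the lag for the next step
    recursive = a1 + b1 * recursive

# 2) Direct: the target sits `horizon` steps ahead of the feature
ah, bh = fit_linear(y[:-horizon], y[horizon:])
direct = ah + bh * y[-1]

print(recursive, direct)
```

On this noise-free toy series the two forecasts agree. With real data they generally differ: the recursive approach compounds its own errors, while the direct approach needs a separate model and a separately aligned feature table for each horizon.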

How can we create a set of features that allow us to predict future values of a time series based on its past values? And how can we add additional information to create a richer dataset for our forecasts? In this talk we will discuss all of these topics and more.

Prior Knowledge Expected: No previous knowledge expected

Kishan Manani

Kishan is a machine learning and data science lead, course instructor, and open source software contributor. He contributes to well-known Python packages including Statsmodels and Feature-engine. He has 10+ years of experience in applying machine learning and statistics in finance, e-commerce, and healthcare research. He leads data science teams to deliver data and machine learning products end-to-end.

Kishan attained a PhD in Physics from Imperial College London in applied large scale time-series analysis and modelling of cardiac arrhythmias; during this time he taught and supervised undergraduates and master's students.

Twitter: https://twitter.com/KishManani

LinkedIn: https://www.linkedin.com/in/kishanmanani/

Medium: https://medium.com/@kish.manani

Website: https://www.courses.trainindata.com/p/feature-engineering-for-forecasting
