It should not be surprising that in any field of human activity where many people work (more or less) independently across the world, the same thing ends up being known by different names. For example, in classical statistics the same variables are described as “explanatory variables” or “predictors”, while in the related ML world they are called “features” or “input variables”. For a more elaborate treatment of this topic, read Statistical Modeling: The Two Cultures by Leo Breiman, or the concise subchapter in the EMA book by P. Biecek and T. Burzykowski (which refers to Breiman’s paper).
In this vignette we consider not only data science terminology, but also the names used in the APIs of different packages (which are, of course, partially based on the scientific naming).
The table below shows which names were chosen, where they come from, and what their synonyms are in the scientific literature or in the APIs of other R/Python packages.
| Name | Explanation | Consistent with | Other names |
|---|---|---|---|
| index | A time variable, such as Date | Inspired by the index variable in the tsibble package | • .date_var in timetk • time_idx in the Python library pytorch-forecasting |
| key | A variable (or variables) that distinguishes the different time series in the dataset | • tsibble package • SQL databases • data.table | • id in modeltime.gluonts (for example: deep_ar) • group_ids in the Python library pytorch-forecasting |
| timesteps | The number of timesteps used to train the model | Timestep is a commonly used word in Deep Learning terminology to describe a “moment” in a time series (sequence). | Meaning partially reflected by: • lookback_length in modeltime.gluonts (e.g. nbeats) • lookback in forecastML |
| horizon | The length of the output sequence, i.e. how many steps ahead we’d like to forecast. If we consider that each future timestep refers to a separate horizon, horizon is the maximal horizon of the forecast | • horizons in forecastML • FPP book by Hyndman and Athanasopoulos • term used in scientific papers, e.g. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting by Lim et al., although its meaning in that paper differs slightly (it refers to a single timestep in the forecast, not to the maximal length of the forecast) | • prediction_length in modeltime.gluonts (e.g. nbeats) • h in the forecast package |
| predictors | Input variables | recipes package API from tidymodels | • ML: features or input variables • Statistics: explanatory variables, independent variables etc. |
| outcomes | Target variables | • recipes package API from tidymodels • outcome_col variable and outcome term in the forecastML vignettes | • ML: outputs, targets • Statistics: response, dependent variables etc. • target in the Python library pytorch-forecasting |
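
To make the roles of these names more concrete, here is a minimal sketch in plain Python of how they could fit together when a long-format dataset is cut into training windows. It is only an illustration and not the API of any package mentioned above: the function `make_windows`, the toy columns (`store`, `date`, `price`, `sales`) and the use of a single key column are all assumptions made for this example.

```python
from collections import defaultdict

# Toy long-format dataset (hypothetical): two series distinguished by "store".
rows = [
    {"store": store, "date": day, "price": 10 + day, "sales": base + day}
    for store, base in (("A", 100), ("B", 200))
    for day in range(1, 7)  # integers 1..6 stand in for real dates
]


def make_windows(rows, index, key, predictors, outcomes, timesteps, horizon):
    """Cut every series into (past, future) windows.

    past   : `timesteps` consecutive rows of predictor values (model input)
    future : the following `horizon` rows of outcome values (forecast target)
    """
    # key: group the rows into separate time series
    series = defaultdict(list)
    for row in rows:
        series[row[key]].append(row)

    windows = []
    for key_value, series_rows in series.items():
        # index: order the rows of each series in time
        series_rows.sort(key=lambda r: r[index])
        n_windows = len(series_rows) - timesteps - horizon + 1
        for start in range(max(n_windows, 0)):
            split = start + timesteps
            past = series_rows[start:split]
            future = series_rows[split:split + horizon]
            windows.append({
                "key": key_value,
                "past": [[r[p] for p in predictors] for r in past],
                "future": [[r[o] for o in outcomes] for r in future],
            })
    return windows


windows = make_windows(
    rows,
    index="date",
    key="store",
    predictors=["price", "sales"],
    outcomes=["sales"],
    timesteps=3,
    horizon=2,
)
```

With `timesteps = 3` and `horizon = 2`, each six-row series yields two windows: three past rows of predictor values followed by two future rows of outcome values.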
Bear in mind that the API may evolve. In particular, if we decide to implement new engines for the parsnip models provided in modeltime.gluonts, we will have to stick to the same argument names in both cases.