These functions return a proposed embedding size for each categorical feature. They are rules of thumb, based on empirical rather than theoretical findings, so their parameters can look like "magic numbers". Nevertheless, when you don't know what embedding size will be "optimal", such general rules are a good starting point; a plain-R sketch of both formulas follows the list below.

  • google Proposed on the Google Developers site: $$x^{0.25}$$

  • fastai Used in the fastai library: $$1.6 \cdot x^{0.56}$$
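
For intuition, both rules can be written out directly in R. This is only a minimal sketch: the exact rounding and the way max_size is applied are assumptions, not necessarily what embedding_size_google() and embedding_size_fastai() implement.

# hypothetical re-implementation; rounding up and capping at max_size are assumptions
rule_google <- function(x, max_size = 100) pmin(ceiling(x^0.25), max_size)
rule_fastai <- function(x, max_size = 100) pmin(ceiling(1.6 * x^0.56), max_size)

rule_google(c(10, 1000))
#> [1] 2 6
rule_fastai(c(10, 1000))
#> [1]  6 77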

Usage

embedding_size_google(x, max_size = 100)

embedding_size_fastai(x, max_size = 100)

Arguments

x

(integer) A vector of dictionary sizes (i.e. the number of unique values) for each categorical feature.

max_size

(integer) Maximum embedding size that can be proposed. Default: 100.

Value

A vector of proposed embedding sizes, one per feature.

Examples

dict_sizes <- dict_size(tiny_m5)
embedding_size_google(dict_sizes)
#>      item_id      dept_id       cat_id     store_id     state_id        value 
#>            3            2            2            2            2            4 
#>     wm_yr_wk      weekday         wday        month         year event_name_1 
#>            5            2            2            2            2            3 
#> event_type_1 event_name_2 event_type_2         snap 
#>            2            2            2            2 
embedding_size_fastai(dict_sizes)
#>      item_id      dept_id       cat_id     store_id     state_id        value 
#>           10            5            3            6            3           33 
#>     wm_yr_wk      weekday         wday        month         year event_name_1 
#>           37            5            5            6            4           11 
#> event_type_1 event_name_2 event_type_2         snap 
#>            4            4            3            2
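
For comparison, a smaller max_size can be passed to cap the proposals (assuming max_size acts as a simple upper bound on the returned sizes):

embedding_size_fastai(dict_sizes, max_size = 10)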