R/categorical.R
embedding_size.Rd
These functions return proposed embedding sizes for each categorical feature. They are rules of thumb: they are based on empirical rather than theoretical conclusions, and their parameters can look like "magic numbers". Nevertheless, when you don't know what embedding size will be "optimal", such general rules are a good starting point.
google: proposed on the Google Developers site, $$x^{0.25}$$
fastai: $$1.6 \cdot x^{0.56}$$
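For intuition, both rules can be written directly in base R. The helpers below, propose_google() and propose_fastai(), are hypothetical sketches; the rounding and capping used by the package's own functions may differ.

# Hypothetical re-implementations of the two rules of thumb,
# capped at max_size (rounding may differ from the package).
propose_google <- function(x, max_size = 100) pmin(ceiling(x^0.25), max_size)
propose_fastai <- function(x, max_size = 100) pmin(ceiling(1.6 * x^0.56), max_size)

propose_google(c(10, 1000, 100000))
propose_fastai(c(10, 1000, 100000))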
embedding_size_google(x, max_size = 100)
embedding_size_fastai(x, max_size = 100)
x: (integer) A vector with dictionary sizes for each feature
Proposed embedding sizes.
dict_sizes <- dict_size(tiny_m5)
embedding_size_google(dict_sizes)
#> item_id dept_id cat_id store_id state_id value
#> 3 2 2 2 2 4
#> wm_yr_wk weekday wday month year event_name_1
#> 5 2 2 2 2 3
#> event_type_1 event_name_2 event_type_2 snap
#> 2 2 2 2
embedding_size_fastai(dict_sizes)
#> item_id dept_id cat_id store_id state_id value
#> 10 5 3 6 3 33
#> wm_yr_wk weekday wday month year event_name_1
#> 37 5 5 6 4 11
#> event_type_1 event_name_2 event_type_2 snap
#> 4 4 3 2
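As a hedged illustration of how such a proposed size is typically consumed, the sketch below passes it to a torch embedding layer. It assumes the torch package is installed and is not part of this package's API.

library(torch)
# Dictionary size of one feature and the proposed embedding size for it
n_months  <- dict_sizes[["month"]]
month_emb <- nn_embedding(
  num_embeddings = n_months,                        # number of categories
  embedding_dim  = embedding_size_fastai(n_months)  # proposed embedding size
)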