Factorization Machine for Prediction
Factorization Machine for regression and classification
Note 1 : The PyTorch implementation is mine while the Keras and Theano versions were found on the web (references at the top of relevant notebooks).
Note 2 : The PyTorch implementation uses the efficient O(k.N) formula from Steffen Rendle.
Note 3 : I am quite new to PyTorch, so do not hesitate to point out any improvements you deem necessary.
Factorization Machines offer an efficient way to include interactions between pairs of variables in a linear or logistic equation.
When the problem has only a few explanatory variables one can simply extend the linear equation with product variables :
Y = [w1 x1 + w2 x2 + w3 x3 + w4 x4] + [w12 x1.x2 + w13 x1.x3 + w14 x1.x4 + w23 x2.x3 + w24 x2.x4 + w34 x3.x4]
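As a rough illustration of the equation above, here is a minimal pure-Python sketch with one explicit weight per pair of variables. The function name `linear_with_pairwise`, the dictionary layout of `w_pair` and all numeric values are assumptions made for this example; they do not come from the notebooks.

```python
from itertools import combinations


def linear_with_pairwise(x, w, w_pair):
    """Linear model extended with one explicit weight per pair of variables.

    x      : list of N feature values
    w      : list of N linear weights
    w_pair : dict mapping (i, j), with i < j, to the cross-weight w_ij
    """
    # First bracket of the equation: w1 x1 + ... + wN xN
    linear = sum(wi * xi for wi, xi in zip(w, x))
    # Second bracket: one product term per unordered pair (i, j)
    cross = sum(
        w_pair[(i, j)] * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return linear + cross


# Illustrative values only (hypothetical data)
x = [1.0, 2.0, 0.0, 3.0]
w = [0.5, -1.0, 2.0, 0.25]
w_pair = {pair: 0.1 for pair in combinations(range(4), 2)}  # 6 cross-weights

y = linear_with_pairwise(x, w, w_pair)
```

With 4 variables the dictionary holds 6 cross-weights, matching the 6 product terms in the equation.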
However, when the model has thousands of variables (which is likely to happen with one-hot encoding of categorical variables), this strategy is not optimal because the number of cross-weights grows quadratically. It can still be done (using regularization penalties), but Factorization Machines are a good alternative.
Factorization Machines constrain the previous equation so that each cross-weight is the dot product of two feature embedding vectors. Concretely, the model searches for vectors V1, V2, V3, V4 such that w(i,j) can be replaced by V(i).V(j) :
Y = [w1 x1 + w2 x2 + w3 x3 + w4 x4] + SUM(i<j)OF[ dotprod(V(i),V(j)) x(i).x(j) ]
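The pairwise term can be computed either by looping over all pairs or with the O(k.N) reformulation from Steffen Rendle mentioned in Note 2, which rewrites it as 0.5 * SUM(f)OF[ (SUM(i) V(i,f) x(i))^2 - SUM(i) (V(i,f) x(i))^2 ]. The sketch below is a pure-Python illustration, not the PyTorch code from the notebooks; the function names and the example values are assumptions.

```python
from itertools import combinations


def fm_pairwise_naive(x, V):
    """Pairwise term computed pair by pair: O(k.N^2)."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))  # dotprod(V(i), V(j))
        total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same term using the square-of-sum minus sum-of-squares identity: O(k.N)."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        s = sum(terms)                      # SUM(i) V(i,f) x(i)
        s_sq = sum(t * t for t in terms)    # SUM(i) (V(i,f) x(i))^2
        total += s * s - s_sq
    return 0.5 * total


def fm_predict(x, w, V):
    """Linear part plus factorized pairwise part (no bias term, as in the equation)."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return linear + fm_pairwise_fast(x, V)


# Illustrative values only (hypothetical data): N = 4 features, k = 2
x = [1.0, 2.0, 0.0, 3.0]
w = [0.5, -1.0, 2.0, 0.25]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
y = fm_predict(x, w, V)
```

Both functions return the same value for the pairwise term; the fast version only needs one pass over the features per embedding dimension.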
Instead of N.(N-1)/2 cross-weights (roughly 0.5 * N^2), the model "only" has to find N embedding vectors of dimension k, that is k.N parameters.
Some great links about Factorization Machines for binary classification :
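To give an order of magnitude, the short sketch below compares the two parameter counts. The values N = 1000 and k = 10 are assumptions picked for illustration, as are the function names.

```python
def n_cross_weights(n):
    """Number of explicit cross-weights: one per unordered pair of variables."""
    return n * (n - 1) // 2


def n_fm_parameters(n, k):
    """Number of parameters in the factorized pairwise term: N vectors of size k."""
    return n * k


n, k = 1000, 10  # hypothetical sizes
full = n_cross_weights(n)    # 499500 explicit cross-weights
fm = n_fm_parameters(n, k)   # 10000 embedding parameters
```

The gap widens as N grows, since the first count is quadratic in N and the second is linear.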
- https://arxiv.org/abs/1701.04099 (Field-aware Factorization Machines in a Real-world Online Advertising System)