Personalization and recommendation systems are used for tasks such as click-through-rate (CTR) prediction and ranking. Two perspectives contributed to the current design of these models:
Content filtering, where users select their preferred categories and are matched to products based on those preferences.
Collaborative filtering, where recommendations are based on users' past behaviors, such as prior product ratings.
Neighborhood methods, which provide recommendations by grouping similar users and products together, were also deployed.
Predictive methods, which characterize users and products by latent factors learned from past interactions.
Design and Architecture
Embeddings: they map each category to a dense representation in an abstract space; in this way sparse categorical features are turned into dense vectors.
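A minimal sketch of an embedding lookup (all sizes and the random table are hypothetical): the table is a matrix with one row per category, and a lookup is simply a row selection, equivalent to multiplying a one-hot vector by the table.

```python
import numpy as np

# Hypothetical sizes: a vocabulary of 5 categories embedded in 3 dimensions.
num_categories, dim = 5, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(num_categories, dim))  # embedding table, one row per category

# A lookup is a row selection: category index -> dense vector.
category = 2
e = W[category]                             # shape (3,)

# Equivalently, the one-hot encoding of the category times the table
# yields the same row, which is why embeddings densify sparse inputs.
one_hot = np.zeros(num_categories)
one_hot[category] = 1.0
assert np.allclose(one_hot @ W, e)
```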
Matrix factorization:
Let $w_i$ denote the latent vector of the $i$-th product and $v_j$ that of the $j$-th user, with $m$ and $n$ the total numbers of products and users; $r_{ij}$ is the rating of the $i$-th product by the $j$-th user.
The factors are fit by minimizing the squared error over the set $S$ of observed ratings:
$$\min_{w, v} \sum_{(i, j) \in S} \left(r_{ij} - w_i^\top v_j\right)^2$$
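As a sketch, the objective above can be minimized by stochastic gradient descent over the observed ratings; the toy data, sizes, and learning rate here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 4, 6, 2                       # products, users, latent dimension
# Observed ratings S as (i, j, r_ij) triples (toy data).
S = [(0, 1, 5.0), (1, 2, 3.0), (2, 4, 4.0), (3, 0, 1.0)]

w = rng.normal(scale=0.1, size=(m, d))  # product factors w_i
v = rng.normal(scale=0.1, size=(n, d))  # user factors v_j

lr = 0.05
for _ in range(500):
    for i, j, r in S:
        err = r - w[i] @ v[j]           # residual of the rating estimate
        w[i] += lr * err * v[j]         # gradient step on (r_ij - w_i^T v_j)^2
        v[j] += lr * err * w[i]

# After training, w_i^T v_j approximates each observed r_ij.
```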
Factorization machine:
Prediction function $\phi: \mathbb{R}^n \rightarrow T$, from input datapoint $x \in \mathbb{R}^n$ to target label $y \in T$. A factorization machine incorporates second-order interactions into a linear model:
$$\hat{y} = b + w^\top x + x^\top \operatorname{upper}(V V^\top)\, x$$
with $b \in \mathbb{R}$, $w \in \mathbb{R}^n$, $V \in \mathbb{R}^{n \times d}$, where $\operatorname{upper}$ keeps the strictly upper-triangular part of the matrix.
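A sketch of the factorization-machine prediction on hypothetical data, showing that the pairwise term $x^\top \operatorname{upper}(VV^\top)x$ admits an $O(nd)$ reformulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3                        # number of features, factor dimension (toy sizes)
x = rng.normal(size=n)             # input datapoint
b = 0.5                            # bias
w = rng.normal(size=n)             # linear weights
V = rng.normal(size=(n, d))        # factor matrix, one row v_i per feature

# Naive second-order term: sum over pairs i < j of <v_i, v_j> x_i x_j,
# i.e. x^T upper(V V^T) x with a strictly upper-triangular mask.
pairwise = sum(V[i] @ V[j] * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

# Equivalent O(n d) form: 0.5 * (||V^T x||^2 - sum_i x_i^2 ||v_i||^2).
fast = 0.5 * (np.sum((x @ V) ** 2) - np.sum((x ** 2) @ (V ** 2)))
assert np.allclose(pairwise, fast)

y_hat = b + w @ x + pairwise       # FM prediction
```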
Multilayer perceptron:
A series of fully connected layers interleaved with an elementwise activation function.
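A minimal forward-pass sketch of such a stack (layer sizes and weights are hypothetical; ReLU stands in for the activation):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp(x, layers):
    """Apply fully connected layers (W, b), with ReLU between them."""
    for k, (W, b) in enumerate(layers):
        x = W @ x + b
        if k < len(layers) - 1:    # no activation after the final layer
            x = relu(x)
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),   # 4 -> 8
          (rng.normal(size=(3, 8)), np.zeros(3))]   # 8 -> 3
out = mlp(rng.normal(size=4), layers)               # shape (3,)
```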
Users and products are described by many continuous and categorical features. Each categorical feature is represented by an embedding, while the continuous features are processed together by a bottom MLP to yield a dense representation of the same length as the embedding vectors.
Second-order interactions are computed explicitly as the dot products between all pairs of these vectors. The dot products are concatenated with the original processed dense features, post-processed by an output MLP, and fed into a sigmoid function to give a click probability.
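The interaction step can be sketched as follows (a toy-sized illustration; the feature counts and vectors are hypothetical): stack the processed dense vector with the embeddings, take all pairwise dot products, keep each distinct pair once, and concatenate with the dense features for the top MLP.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # shared embedding / dense dimension
dense = rng.normal(size=d)               # bottom-MLP output for continuous features
embs = [rng.normal(size=d) for _ in range(3)]  # one embedding per categorical feature

vecs = np.stack([dense] + embs)          # (4, d): dense vector plus embeddings
Z = vecs @ vecs.T                        # all pairwise dot products

# Keep each distinct pair once (strictly lower triangle), then
# concatenate with the processed dense features as top-MLP input.
i, j = np.tril_indices(len(vecs), k=-1)
interactions = Z[i, j]                   # shape (4 choose 2,) = (6,)
top_input = np.concatenate([dense, interactions])  # shape (10,)
```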
Parallelism
Embeddings contribute the majority of the parameters, with several tables requiring in excess of multiple GB of memory, so the tables are partitioned across devices (model parallelism).
The MLP parameters are smaller in memory but translate into sizeable amounts of compute.
Data parallelism is therefore preferred for the MLPs: they are replicated on every device and their gradients are synchronized.
Diagram: data parallelism for the MLP layers and model parallelism for the embedding tables in DLRM.
Diagram: full DLRM architecture, showing embedding lookups, bottom and top MLPs, and the interaction layer.