The simplest and perhaps most intuitive way to aggregate the sub-relevance tables is to manually evaluate the quality of each table, assign it a weight, and then compute a weighted linear combination of the tables to generate the result. However, this approach falls short in two ways: first, the quality of the relevance tables changes as they are updated; second, a single global weighting cannot capture all the useful information in the tables. For example, accurate relevance information may be neglected simply because the overall quality of the table containing it is low. We therefore use a more sophisticated aggregation algorithm.
The entire workflow is shown below; the following chapters describe each step in detail.
Fig 1. Relevance Table Aggregation Workflow
First, we have a front-end called "Libra" to collect domain experts' labeling results. For each labeling task, the expert is presented with three shows (referred to as a "show tuple", denoted by A, B, and C) and is asked: "Which of show B and show C is more relevant to show A?" The answer can be B or C. If neither B nor C is relevant, or the judgment is hard to make, the expert can also select "skip this tuple".
Note that the same show tuple may be labeled by different experts, or by the same expert at different times, to test whether consensus can be reached.
Instead of asking the expert to directly assign a relevance value between two shows, we prefer this pairwise approach because it is much easier for an expert to compare two relevance values than to decide on an absolute one.
Second, a machine-learning algorithm that takes the label results as ground truth and the sub-relevance tables as features is used to generate the final combined relevance table for our various products.
The following objective function is defined:

L = \sum_{(k,i,j)} \log\left(1 + e^{-\sigma\,(f(x_{ki}) - f(x_{kj}))}\right)

where (k, i, j) is a labeling result indicating that the expert prefers show i to show j for source show k; x_{ki} is the vector containing the relevance values from show k to show i in all sub-relevance tables; f is the scoring function used to generate the final relevance table; and \sigma is a parameter controlling the "smoothness" of the objective function. The goal is to minimize the objective L and obtain the optimal function f.
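As a minimal sketch, the objective can be computed as below. This assumes a pairwise logistic (RankNet-style) form for the loss; the function and variable names are illustrative, and feature vectors are stored as plain Python lists.

```python
import math

def pairwise_objective(labels, x, f, sigma=1.0):
    """Total objective L over all labeling results.

    labels: list of (k, i, j) tuples; the expert prefers show i to show j
            for source show k
    x:      dict mapping (k, i) -> vector of relevance values from the
            sub-relevance tables
    f:      scoring function mapping a feature vector to a scalar
    sigma:  "smoothness" parameter of the logistic loss
    """
    total = 0.0
    for (k, i, j) in labels:
        margin = f(x[(k, i)]) - f(x[(k, j)])
        total += math.log1p(math.exp(-sigma * margin))
    return total
```

Each correctly ordered pair (positive margin) contributes less than log 2 to the objective, while a tied pair contributes exactly log 2, so lower totals mean better agreement with the experts.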
We propose two ways to model f: linear combination and non-linear combination.
(1) Linear Combination
f is modeled as follows:

f(x) = w^{\top} x
where w is the weight vector. The optimal w can be found by stochastic gradient descent (SGD), as the following algorithm shows.
a. In each round of iteration, enumerate all labeling results, and for each label result (k, i, j) do the following:
(i). Calculate the gradient of the per-label loss w.r.t. the weight vector:
\nabla_{w} \ell = -\frac{\sigma\,(x_{ki} - x_{kj})}{1 + e^{\sigma w^{\top}(x_{ki} - x_{kj})}}
(ii). Update the weight vector:
w \leftarrow w - \eta\,\nabla_{w} \ell
where \eta is the learning rate.
b. The iteration is repeated until the weight vector converges or the total loss falls below a threshold.
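The SGD loop above can be sketched as follows. This is an illustration only, again assuming the pairwise logistic loss; the learning rate eta, the fixed round count, and plain-list vectors are simplifying choices, not the production setup.

```python
import math

def sgd_fit(labels, x, dim, sigma=1.0, eta=0.1, rounds=200):
    """Learn the weight vector w of f(x) = w . x by SGD.

    For each label (k, i, j), the per-label loss is
    log(1 + exp(-sigma * w . (x_ki - x_kj))).
    """
    w = [0.0] * dim
    for _ in range(rounds):
        for (k, i, j) in labels:
            # d = x_ki - x_kj
            d = [a - b for a, b in zip(x[(k, i)], x[(k, j)])]
            margin = sum(wc * dc for wc, dc in zip(w, d))
            # gradient of the per-label loss w.r.t. w is coef * d
            coef = -sigma / (1.0 + math.exp(sigma * margin))
            w = [wc - eta * coef * dc for wc, dc in zip(w, d)]
    return w
```

After training, sub-tables whose relevance values agree with the experts' preferences receive larger weights, and the learned w scores the preferred show of each labeled pair higher.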
(2) Non-linear (Additive) Combination
f is modeled as a boosted regression tree:

f(x) = \sum_{m=1}^{M} \gamma_m h_m(x)

where each h_m is a one-level tree, i.e. a binary function with only 0 and 1 as output, generated by thresholding one of the sub-relevance tables:

h_m(x) = \mathbb{1}\{x_{d_m} > \theta_m\}

and \gamma_m is the corresponding multiplier.
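Such a one-level tree is just an indicator on a single sub-table's value. A sketch (the feature index d and threshold theta here are hypothetical parameters that would be chosen during fitting):

```python
def make_stump(d, theta):
    """One-level tree: outputs 1 if sub-table d's relevance value exceeds
    threshold theta, and 0 otherwise."""
    return lambda x: 1.0 if x[d] > theta else 0.0
```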
The gradient boosting method is used to obtain the (sub-)optimal f, as the following algorithm describes.
a. Initialize the model with a constant function, e.g. f_0(x) = 0.
b. In each round of iteration, do the following:
(i). Enumerate all pairs of shows, and for each show pair (k, i) compute the pseudo-residual:
r_{ki}^{(m)} = -\left[\frac{\partial L}{\partial f(x_{ki})}\right]_{f = f_{m-1}}
(ii). Fit a sub-model h_m(x) to the pseudo-residuals r_{ki}^{(m)}.
(iii). Compute the multiplier \gamma_m by line search:
\gamma_m = \arg\min_{\gamma} \sum_{(k,i,j)} \log\left(1 + e^{-\sigma\,(f_{m-1}(x_{ki}) + \gamma h_m(x_{ki}) - f_{m-1}(x_{kj}) - \gamma h_m(x_{kj}))}\right)
(iv). Update the model:
f_m(x) = f_{m-1}(x) + \gamma_m h_m(x)
c. Repeat the above process until the total loss is below a threshold.
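The boosting loop can be sketched end to end as below. This is a toy illustration under the same pairwise-logistic-loss assumption: the stump is chosen by an exhaustive search over features and thresholds, and the multiplier by a coarse grid line search; a production implementation would use a proper tree learner and an exact line search.

```python
import math

def boosted_fit(labels, x, dim, sigma=1.0, rounds=10):
    """Gradient boosting with one-level trees (stumps) for a pairwise
    logistic loss.

    labels: list of (k, i, j); the expert prefers show i to show j for
            source show k
    x:      dict mapping (k, i) -> feature vector (one value per sub-table)
    Returns the list of (feature, threshold, gamma) stumps and the final
    score of every labeled item.
    """
    items = sorted({(k, i) for (k, i, j) in labels} |
                   {(k, j) for (k, i, j) in labels})
    scores = {it: 0.0 for it in items}
    model = []

    def total_loss(s):
        return sum(math.log1p(math.exp(-sigma * (s[(k, i)] - s[(k, j)])))
                   for (k, i, j) in labels)

    for _ in range(rounds):
        # (i) pseudo-residuals: r_ki = -dL/df(x_ki) at the current model
        r = {it: 0.0 for it in items}
        for (k, i, j) in labels:
            g = sigma / (1.0 + math.exp(sigma * (scores[(k, i)] - scores[(k, j)])))
            r[(k, i)] += g
            r[(k, j)] -= g
        # (ii) fit a stump h(x) = 1{x[d] > theta}: pick the (d, theta)
        # whose "on" region has the largest absolute residual sum
        best = None
        for d in range(dim):
            for theta in sorted({x[it][d] for it in items}):
                s = sum(r[it] for it in items if x[it][d] > theta)
                if best is None or abs(s) > abs(best[0]):
                    best = (s, d, theta)
        _, d, theta = best
        h = {it: 1.0 if x[it][d] > theta else 0.0 for it in items}
        # (iii) choose the multiplier gamma by a coarse grid line search
        gamma, best_l = 0.0, total_loss(scores)
        for cand in [c * 0.1 for c in range(-20, 21)]:
            trial = {it: scores[it] + cand * h[it] for it in items}
            l = total_loss(trial)
            if l < best_l:
                gamma, best_l = cand, l
        # (iv) update the model
        model.append((d, theta, gamma))
        scores = {it: scores[it] + gamma * h[it] for it in items}
    return model, scores
```

Because each stump thresholds a single sub-relevance table, a sub-table that is only accurate in part of its range can still contribute: a stump can isolate exactly the region where it is informative, which is what the global linear weighting cannot do.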
Significant improvements have been observed in experiments using the new aggregated relevance table, compared with the manually weighted linear combination method:
- a relative improvement of around 4.51% in eCTR (effective CTR) on the "YouMayAlsoLike" tray, and
- an improvement of around 5.69% in watch minutes from the part of autoplay that relies on the relevance table.
The relevance table is an important component of Hulu's recommendation system. Online experiments show that machine-learning-based relevance table aggregation significantly outperforms fixed weighted combination, especially with the non-linear combination method. There are still other questions to answer in our recommendation engine, such as how to ensure diversity, how to adjust the relevance table based on users' explicit feedback, and how to utilize context information.