In Hulu we continuously seek ways to improve our users’ content discovery experiences by various recommendation techniques. One of the most important components supporting the content discovery products is the relevance table.The relevance table could be simply regarded as a 2-dimensional real-valued matrix showing how "relevant" or "similar" every two pieces of content are. Ultimately we need one single relevance table to generate the recommendation results such as autoplay videos or top 10 recommended shows for our users. But the problem is that, internally instead of one single relevance table, we get many (sub-) relevance tables from different sources, for example we have relevance tables generated from our users' watch behaviors, search behaviors and the content metadata in Hulu, respectively. We will introduce how do we generate the final relevance table in production by aggregating these sub relevance tables with domain expert labeling information.Without loss of generality, to simplify the problem, let's assume all relevance tables are in the grain size of TV shows, i.e., each relevance table represents the relevance between TV shows.
The Workflow

The simplest and maybe the most intuitive way to aggregate the sub relevance tables is to manually evaluate the quality and assign a weight for each relevance table, then we can just do weighted linear combination of those relevance tables to generate the result. However apparently this is not good enough: First the quality of the relevance tables will change when they update; second this global model is not the best to capture all the useful information in those relevance tables. For example some accurate relevance information will be neglected because the overall quality of the relevance table containing it is low. So we use a more sophisticated aggregation algorithm.

The entire workflow is as the following. In the next chapters the details of the process will be described.

Fig 1. Relevance Table Aggregation Workflow

The "Libra" Front-end

First we have a front-end to collect domain experts’ label results with the name “Libra”. In each label result, the expert is presented three shows (referred to as a “show tuple”, denoted by A, B and C), and the expert should answer the question that “Which show in B and C is more relevant to show A?” The answer could be B or C. When both B and C are not relevant, or it’s hard to make the judgment, the expert could also select “skip this tuple”.

Note the same show tuple could be labeled by different experts or even the same expert at different time to test if consensus could be made.

Instead of letting the expert directly assign the relevance value between two shows, we prefer the above way because it's much easier for the expert to compare two relevance values than decide one relevance value.

The Machine-learning-based Relevance Aggregation

Second, a machine-learning-based algorithm based on the label results (as ground truth) and the sub-relevance tables (as features) is introduced to generate the final combined relevance table for our various products.

The following objective function is defined:

where (k,i,j) is the labeling result showing that the expert prefers show i to show j for source show k; fmq17 is the vector containing the relevance values from show k to show i in all sub-relevance tables; fmq17 is a scoring function to generate the final relevance table, and fmq17 is a parameter controlling the "smoothness" of the objective function. So the goal is to minimize the objective L and get the optimal function fmq17.

We propose two ways to model fmq17: linear combination and non-linear combination.

(1) Linear Combination

fmq17 is modeled as follows:

where w is the weight vector. The optimal w could be generated by stochastic gradient descent (SGD), as the following algorithm shows.

Algorithm I

a. For each round of iteration, enumerate all labeling results and for each label result (k,i,j), do the following:

  (i). Calculate the gradient of the loss function w.r.t the weight vector:   (ii). Update the weight vector: b. The iteration is repeated until the weight vector converges or the total loss is lower than a threshold.


(2) Non-linear (Additive) Combination

fmq17 is modeled as a boosted regression tree:

where fmq17 is a one-level tree, i.e.fmq17 is a binary function with only 0 and 1 as output and is generated by thresholding one of the sub relevance tables, and fmq17 is the corresponding multiplier.

The gradient boosting method is used to get the (sub-) optimal fmq17, as the following algorithm describes.

Algorithm II

a. Initialize fmq17

b. In each round of iteration, do the following:

  (i). Enumerate all pairs of shows and for each show pair , compute the pseudo-residuals for each item:

  (ii). Fit a sub-model fmq17 to pseudo-residuals fmq17.

  (iii). Compute the multiplier by:

  (iv). Update the model:

c. Repeat the above process until the total loss is below a threshold.



Significant improvements have been observed from experiments using the new aggregated relevance table, compared with the manual-weighted linear combination method. We have found:

  1. Around 4.51%+ on eCTR (effective CTR) on the "YouMayAlsoLike" tray relatively, and
  2. Around 5.69%+ on watch minutes from the part of autoplay that is related to the relevance table.

The relevance table serves as an important part in the recommendation system in Hulu. Online experiments show significant improvements on the machine-learning-based relevance table aggregation over fixed weighted combination, especially with the non-linear combination method,. Furthermore there are still other questions to answer in our recommendation engine, such as how to ensure diversity, adjust the relevance table by users’ explicit feedback, and utilize context information.