One of the reasons I love working on advertising at Hulu is that many of the problems we are trying to solve have an algorithmic flavor. The problem of predicting how many unique users a given advertising campaign will reach lends itself to an algorithmic approach. More specifically: if we are given two factors – the number of ad views the advertising campaign is scheduled to deliver and the dates the campaign is scheduled to run – how can we better predict the number of unique users the campaign will reach?

**A First Approach**

A first approach to this problem is to compute the average number of ad views per unique user in the given time period. For example, if I want to estimate the unique users reached by a two-week campaign with one million ad views, I might do the following:

- From historical data, we find that the average number of impressions seen by a user over two weeks is *X*.
- The ad will deliver one million views, so the estimated number of unique users it will hit is *1,000,000 / X*.
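In code, this first approach is just one division. A tiny Python sketch with hypothetical numbers (not real Hulu data):

```python
def naive_reach_estimate(total_views, avg_views_per_user):
    """First approach: divide campaign views by average views per user."""
    return total_views / avg_views_per_user

# Hypothetical example: if the average user sees 8 impressions over two
# weeks, a one-million-view campaign is estimated to reach 125,000 uniques.
print(naive_reach_estimate(1_000_000, 8))  # 125000.0
```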

**Analysis**

This approach drastically underestimates the actual number of uniques reached. Imagine that every user watched exactly *X* ads in the two-week period. Then this approach is actually calculating the smallest possible number of users that can be reached while still having the ad deliver one million views!
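To see the underestimate concretely, here is a small Python simulation with a made-up, skewed viewing distribution, comparing the averaging estimate against the uniques actually hit when individual views are drawn without replacement:

```python
import random

random.seed(0)

# Hypothetical distribution of impressions per user: most users see a few
# ads, while a heavy-watching minority sees many.
views_per_user = [1] * 700 + [5] * 200 + [40] * 100

total_views = sum(views_per_user)                 # 5700
avg_views = total_views / len(views_per_user)     # 5.7

campaign_views = 2000

# First approach: divide by the average.
naive_estimate = campaign_views / avg_views       # ~350.9

# Direct simulation: draw 2000 individual views without replacement.
balls = [uid for uid, v in enumerate(views_per_user) for _ in range(v)]
drawn = random.sample(balls, campaign_views)
actual_uniques = len(set(drawn))

print(naive_estimate)
print(actual_uniques)   # substantially larger than the naive estimate
```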

**A Second Approach**

To more accurately estimate the reach of this campaign, we need more information. We need to know the distribution of views by user – that is, how many users watched 1 ad in the time period, 2 ads, 3 ads, etc. The approach we use at Hulu is to run a simulation using this distribution. The concept is fairly simple: given the distribution of views per user, randomly pick one million views out of the distribution and keep track of how many unique users are reached. The distribution of views specifies a discrete probability distribution; the probability that a given view reaches a given user is the number of ad views for that user divided by the total number of ad views over all users.

We can reduce this problem to the problem of simulating the drawing of colored balls at random out of a bag where the number of balls of each color is given to us. The user IDs are the colors, and the number of ad views a user sees is the number of balls of that color in the bag. In our case, there are millions of colors and we are drawing many random balls from a given distribution, so we should choose an algorithm that scales well with the number of colors and makes the “choose” operation fast, possibly with the help of pre-computation.

There are two related problems to consider. The first is the classic problem of drawing from a discrete distribution with replacement, where we do not alter the distribution after each draw. Algorithms for this problem are well known, and we’ll discuss some standard ones. The second, more difficult, problem is to draw from a distribution without replacement, where we decrease the count of the drawn color by one after every draw. In this blog post, we’ll discuss an extension to one of the standard algorithms for the first problem that handles the changing distribution with the same asymptotic runtime efficiency that algorithms for the first problem achieve.

**Random Sampling from a Static Discrete Probability Distribution**

First we’ll look at the classic problem of drawing a random color from the distribution with replacement. The standard algorithms for solving this problem are a simple linear search and a faster binary search.

*Linear Search*

A simple solution to this problem is to iterate through the distribution. Say we are given the distribution in the form of a hash map mapping colors to counts, and we have standard random number generation routines.

```
//randomly choose an integer between 0 (inclusive)
//and sumCounts (exclusive)
int randInt = random(0, sumCounts);
int curSum = 0;
foreach (string color in dist.keys) {
    int colorCount = dist[color];
    curSum += colorCount;
    if (curSum > randInt)
        return color;
}
```

This solution requires *O(n)* time where *n* is the number of colors. Indeed, this is the best we can do if we must process the distribution when we choose the random color, but if we invoke the choose operation many times on a distribution, we can achieve a better amortized bound using some pre-computation.

*Binary search*

The first solution determines in which interval the randomly chosen integer falls. As an example, suppose the distribution has 3 red balls, 2 green balls and 1 white ball. In this distribution, we choose a random integer from 0 to 5 (6 possible choices corresponding to the 6 balls). The red interval is 0 to 2, the green interval is 3 to 4, and the white interval is 5 to 5. The end points of the intervals, 2, 4 and 5 respectively, will always be strictly increasing, and so we can do a binary search over them. We require pre-computation to compute the end points of the intervals, but after that, randomly choosing a color is an *O(log n)* operation.

```
//Pre-computation
int[] partialSums = new int[dist.length];
string[] colorMap = new string[dist.length];
int psumInd = 0;
foreach (string color in dist.keys) {
    if (psumInd == 0)
        partialSums[psumInd] = dist[color];
    else
        partialSums[psumInd] = partialSums[psumInd - 1] + dist[color];
    colorMap[psumInd] = color;
    psumInd++;
}

//Choosing a random color
int randInt = random(0, sumCounts);
/* If randInt is not in the array, most binary search implementations
   will let you compute the index at which it would be inserted to keep
   the array sorted. This is the index we want. */
int colorInd = binarySearch(partialSums, randInt);
return colorMap[colorInd];
```

The second solution requires *O(n)* for the pre-computation step, which is done once for the distribution. It requires *O(log n)* for the choose operation.
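For reference, this partial-sums-plus-binary-search scheme maps directly onto standard library routines in many languages. A minimal Python sketch using the 3-red/2-green/1-white example, where `bisect.bisect_right` returns exactly the insertion index described above:

```python
import bisect
import random

counts = [("red", 3), ("green", 2), ("white", 1)]

# Pre-computation: cumulative sums [3, 5, 6] and a parallel color map.
partial_sums, color_map = [], []
running = 0
for color, count in counts:
    running += count
    partial_sums.append(running)
    color_map.append(color)

def choose():
    # rand_int in [0, 6); bisect_right finds the first partial sum > rand_int
    rand_int = random.randrange(partial_sums[-1])
    return color_map[bisect.bisect_right(partial_sums, rand_int)]

# rand_int 0..2 -> red, 3..4 -> green, 5 -> white
```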

I should mention that there is another solution that yields *O(1)* time for the choose operation. We can expand out the distribution array, creating an array of length 6 with 3 reds, 2 greens, and 1 white in the above example. This approach uses space proportional to the sum of all the color counts, which is too much space for our application at Hulu. If space is not an issue, this is the fastest implementation of the choose operation.
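A sketch of the expanded-array idea in Python (toy distribution, for illustration only):

```python
import random

dist = {"red": 3, "green": 2, "white": 1}

# Expand the distribution into one entry per ball:
# space is proportional to the sum of all counts.
expanded = [color for color, count in dist.items() for _ in range(count)]

def choose():
    return random.choice(expanded)   # O(1) per draw
```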

**An Extension**: **Simulation Without Replacement**

The previous solutions solved the problem of drawing from the probability distribution with replacement. For our problem of estimating uniques, it is more accurate to do simulations without replacement – we would like to keep the ball out of the bag and alter the original distribution by decreasing the count of the chosen color by one. If we apply the binary search solution to this problem, we will need to update the partial sum array. Specifically, we need to decrease the partial sum of every element whose index is greater than or equal to the index of the chosen color. This update requires *O(n)* time making the final running time of the solution *O(n)*. Ideally, we would like to preserve the *O(log n)* bound. To solve the main difficulty of updating all counts after a certain index, we use a modified binary search tree that supports bulk updates.

First we start with a standard binary search tree where each node stores a color and the partial sum of the color:

```
class Node {
    int partialSum;
    string color;
    Node left, right;
}
```

The tree satisfies the standard binary search property: every node in the left sub-tree of a node has a smaller partial sum, and every node in the right sub-tree has a larger one. Searching this tree is easy, but we still run into the problem of updating all values after a certain value. If the tree allowed us to efficiently bulk-update all the nodes in the right sub-tree of a given node, we could use this operation to efficiently update all the nodes in the tree whose partial sum is greater than the chosen partial sum. We can modify the tree to support this operation by storing the partial sums of the nodes in a right sub-tree as deltas from the root of that sub-tree. For example, if the original tree was:

We modify the tree to look like:

All nodes are stored as the delta from the partial sum of their nearest “left ancestor” – the closest ancestor from which we descended to the right. A node with no such ancestor stores its partial sum directly. The stored values in the modified tree do not necessarily satisfy the binary search property, but the implied values (the partial sums) do. As we traverse the tree, we can easily compute the implied value of any node by keeping track of the accumulated deltas as we go, adding to the total if and only if we go down a right branch.

This tree allows us to efficiently update the counts of all nodes that follow the chosen node. First, observe that the set of nodes that follow the chosen node is not just the chosen node’s right sub-tree. For example, the set of all nodes greater than the 5 in the above tree includes the 7, as well as the 10 and all the nodes in 10’s right sub-tree. In general, the set of all nodes greater than the chosen node is precisely the union of the right sub-trees of all ancestors from which we went left in order to arrive at the chosen node (along with those ancestors themselves). We can decrease the counts of all these nodes by 1 by decreasing the stored delta of each such ancestor by 1. There are *O(log n)* such ancestors in a balanced tree (more about this in a bit), so this algorithm runs in *O(log n)* time.
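To make the delta encoding concrete, here is a small Python check on a tree of my own (partial sums 2, 3, 5, 7, 10, 12, 14 arranged as a balanced BST; this is not the exact tree from the example above). Each node stores its partial sum minus the implied value of its nearest left ancestor, and the true partial sums are recovered by accumulating deltas whenever we descend to the right:

```python
# Each node is (delta, left, right). The underlying partial sums are
# [2, 3, 5, 7, 10, 12, 14]; e.g. the node for 12 stores 12 - 7 = 5.
tree = (7,
        (3, (2, None, None), (2, None, None)),   # implied values 3, 2, 5
        (5, (3, None, None), (2, None, None)))   # implied values 12, 10, 14

def implied_sums(node, acc=0):
    """In-order list of true partial sums; the accumulator grows only
    when we go down a right branch."""
    if node is None:
        return []
    delta, left, right = node
    value = acc + delta
    return implied_sums(left, acc) + [value] + implied_sums(right, value)

print(implied_sums(tree))   # [2, 3, 5, 7, 10, 12, 14]
```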

**C# Implementation of this Algorithm:**

```
public int Draw() {
    if (_totalSum == 0) {
        throw new Exception("Tried to draw from empty dist.");
    }
    int r = rand.Next(0, _totalSum);
    Node n = Search(_tree, r);
    _totalSum--;
    return n.index;
}

/* Find the least node whose partial sum is greater than the
   targetDelta */
private Node Search(Node node, int targetDelta) {
    if (targetDelta >= node.delta || node.delta == 0) {
        if (node.right == null) {
            /* No nodes in this subtree have a partial sum greater
               than the targetDelta */
            return null;
        } else {
            /* All nodes in the right subtree have their partial sums
               increased by the delta of the root. Handle this by
               decreasing our target delta. */
            return Search(node.right, targetDelta - node.delta);
        }
    } else {
        /* Update the delta of this node and of ancestors that are
           greater than the chosen node */
        node.delta--;
        if (node.left == null) {
            return node;
        } else {
            /* Search the left subtree for the smallest node whose
               partial sum is greater than the target delta. If none
               exists, then the current node is the one we want. */
            Node leftSearch = Search(node.left, targetDelta);
            return leftSearch == null ? node : leftSearch;
        }
    }
}
```

**Balancing the tree**

We’re lucky: we know the number of nodes at pre-computation time, so we can build a balanced binary tree right off the bat. We start with the sorted partial sum array, take the midpoint of the array as the root, recursively build a tree out of the left sub-array and attach it as the left child of the root, and do the same for the right sub-array:

```
/* psums is the sorted array of partial sums we wish to convert
   into a tree. minInd and maxInd define the portion of the array
   we are working on. curSum is our current delta for the right
   subtree. */
private Node BuildTree(int[] psums, int minInd, int maxInd, int curSum) {
    if (minInd > maxInd)
        return null;
    Node ret = new Node();
    int mid = (minInd + maxInd) / 2;
    ret.index = mid;
    ret.delta = psums[mid] - curSum;
    ret.left = BuildTree(psums, minInd, mid - 1, curSum);
    ret.right = BuildTree(psums, mid + 1, maxInd, psums[mid]);
    return ret;
}
```
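As a sanity check on the whole scheme, here is a rough Python transcription of the build and draw logic above (variable names are mine). Draining the bag completely should return each color index exactly as many times as its count, whatever random values come up:

```python
import random

class Node:
    def __init__(self, index, delta, left=None, right=None):
        self.index, self.delta = index, delta
        self.left, self.right = left, right

def build_tree(psums, lo, hi, cur_sum):
    """Balanced tree over psums[lo..hi]; deltas are relative to cur_sum."""
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, psums[mid] - cur_sum,
                build_tree(psums, lo, mid - 1, cur_sum),
                build_tree(psums, mid + 1, hi, psums[mid]))

def search(node, target):
    """Return the least node whose implied partial sum exceeds target,
    decrementing it and every greater node along the way."""
    if target >= node.delta:
        # Neither this node nor its left subtree qualifies. The right
        # subtree's values are offset by node.delta, so shrink the target.
        if node.right is None:
            return None
        return search(node.right, target - node.delta)
    node.delta -= 1   # shrinks this node's sum and its right subtree's
    if node.left is None:
        return node
    left_hit = search(node.left, target)
    return node if left_hit is None else left_hit

counts = [3, 2, 1]        # e.g. red, green, white
psums, total = [], 0
for c in counts:
    total += c
    psums.append(total)

tree = build_tree(psums, 0, len(psums) - 1, 0)
drawn = [0] * len(counts)
while total > 0:
    drawn[search(tree, random.randrange(total)).index] += 1
    total -= 1

print(drawn)   # [3, 2, 1]: every ball is drawn exactly once
```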

**Optimizations**

Instead of explicitly storing the nodes, we can make the implementation faster by storing the tree in a flat array using the standard array representation of a binary tree: the root is at index 0, and the left and right children of the node at index *i* are at indices *2i + 1* and *2i + 2*, respectively. For the array to contain no null entries, our binary tree must be both balanced and complete – that is, the nodes at each level are filled in from left to right, and we move to the next level only after the current level is full.

Building a complete tree out of a sorted array is a bit more difficult, since taking the midpoint no longer suffices. That’s because, in a complete tree, nodes are filled in from left to right, so generally the left sub-tree will have more nodes than the right sub-tree. In turn, the index of the root will tend to be larger than the midpoint of the array. To compute the index of the root, we first have to compute the depth of the tree and the number of leaves in the last level.

Here’s an example: Say we want to compute the index of the root in a tree with 40 nodes.

- The largest power of two less than 40 is 32 = 2^5, so a complete tree with 40 nodes consists of five completely filled levels plus a partially filled sixth level.
- There are 2^5 – 1 = 31 nodes in the first five levels. One of them is the root, which leaves 30 non-root nodes, 15 of which are in the left sub-tree.
- The sixth level can hold up to 2^5 = 32 nodes, and the first 16 of these belong to the left sub-tree. Since 40 – 31 = 9, there are 9 nodes in this level, and since 9 < 16, all 9 are in the left sub-tree. The total number of nodes in the left sub-tree is therefore 15 + 9 = 24, so the root is the 25th node and sits at index 24.
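The same arithmetic as a quick Python helper (the function name is mine, not from our codebase):

```python
def complete_tree_root_index(num_nodes):
    """0-based index of the root when num_nodes sorted values are stored
    as a complete binary search tree."""
    # Largest power of two such that a full tree of pow2 - 1 nodes fits.
    pow2 = 1
    while pow2 * 2 - 1 <= num_nodes:
        pow2 *= 2
    max_leaves_before = pow2 // 2       # last-level slots in the left sub-tree
    full_nodes = pow2 - 1               # nodes in the completely filled levels
    num_before = (full_nodes - 1) // 2  # half of the non-root filled nodes
    num_before += min(num_nodes - full_nodes, max_leaves_before)
    return num_before

print(complete_tree_root_index(40))   # 24, as in the worked example
```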

The revised version of the above method to build a complete binary tree is this:

```
private Node BuildTree(int[] psums, int minInd, int maxInd, int curSum) {
    if (minInd > maxInd)
        return null;
    int numNodes = maxInd - minInd + 1;

    // Find the largest power of two such that a full tree with
    // pow2 - 1 nodes fits in numNodes.
    int pow2 = 1;
    while (pow2 * 2 - 1 <= numNodes) {
        pow2 *= 2;
    }
    int maxLeavesBefore = pow2 / 2;   // last-level slots in the left sub-tree
    pow2--;                           // nodes in the completely filled levels

    // Half of the non-root filled nodes, plus however many last-level
    // leaves fall in the left sub-tree.
    int numBefore = (pow2 - 1) / 2;
    numBefore += Math.Min(numNodes - pow2, maxLeavesBefore);
    int mid = numBefore + minInd;

    Node ret = new Node();
    ret.index = mid;
    ret.delta = psums[mid] - curSum;
    ret.left = BuildTree(psums, minInd, mid - 1, curSum);
    ret.right = BuildTree(psums, mid + 1, maxInd, psums[mid]);
    return ret;
}
```

**Possible improvements**

The algorithm is fast enough for *our* application at Hulu, but I’m sure it could still be improved. The most obvious area of improvement that I see is in the construction of the tree. Right now, we build a complete binary tree fairly haphazardly. Since a search can be terminated as soon as it finds the chosen node (without necessarily having reached a leaf), we can improve the average running time of the algorithm by placing nodes with larger counts closer to the root so that we don’t have to go as deep on average. This won’t improve the worst case time bound of this algorithm from *O(log n)* but it will make it run faster in practice.

Here we were able to use an algorithm to significantly improve the accuracy of our estimates for a problem that was not well handled by previous approaches. In this blog post, I presented an example of a problem that had an “algorithmic core” underlying it. Such problems appear quite often at Hulu, and being able to apply interesting algorithms to solve them and make an impact on the business is one of the best parts of working at Hulu.

*When Tony Zhang’s not mining vespene gas, he’s busy cracking Russian math competition problems with his bare hands. He also happens to work as a developer on inventory forecasting for our advertising team.*