Hulu Tech Blog

BeaconSpec: A domain-specific language for generating MapReduce jobs

April 10th, 2014 by Prasan Samtani

Perhaps of all the creations of man, language is the most astonishing

- Giles Lytton Strachey, Words and Poetry


Hulu viewers generate a tremendous amount of data: our users watch over 400 million videos and 2 billion advertisements a month. Processing and analyzing that data is critical to our business, whether it is for deciding which content to invest in in the future, or for convincing external partners of the superiority of Hulu as an advertising platform. In order to ingest, process, and understand that data, we rely on a framework called Hadoop (http://hadoop.apache.org/) – an open-source project that allows for the storage and distributed processing of large data-sets.

The two major components of Hadoop are a distributed file system (HDFS) and MapReduce – a framework for distributed computing over the cluster. In order to understand the rest of this blogpost, it is important to have a basic understanding of how MapReduce works.

When authoring a MapReduce job, programmers specify two functions – a map function that processes each chunk of data to produce a set of key-value pairs, and a reduce function that merges all values associated with the same key (1). A surprising number of conventional algorithms can be expressed in this manner and thereby converted into programs that can run in parallel. This is best understood with an example.

Let us imagine that we are trying to compute the total number of minutes watched for a given set of videos. Each video has a unique identifier, and what we would like at the end of this computation is a table of the following form:

Video identifier    Minutes watched
1                   25
2                   34
3                   126
4                   5

In a distributed file system such as HDFS, the raw data is stored on several separate computers, or nodes. A naive way to perform this computation would be for a program to serially request the raw data from each of the nodes and compute over it. Such an approach works fine for small data-sets like this example; however, we want an approach that scales to very large data-sets. For example, for any typical hour, we have about 50 GB of raw data on our Hulu cluster. Clearly, sequentially processing all this data would take a very, very long time and consume a lot of network resources. Using MapReduce, the mapper process runs on each node in parallel, generating a key-value pair consisting of the video identifier (the key) and the minutes watched (the value). Illustration 1 shows what the output of the mapping phase would look like if the data were distributed across three nodes.




From here, we go into the reduce phase, where all outputs with the same key are guaranteed to be sent to the same reducer. The reducer then computes some function over the intermediate output to produce a final output. In this case the function is summation, as all we want is the total number of minutes per video. Illustration 2 shows what this would look like if this processing ran on two reducers.
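To make the two phases concrete, here is a minimal sketch in Python (our actual jobs are written in Java; the records and node layout below are made up to match the table above) that simulates the mapper, the shuffle, and a summing reducer:

```python
from collections import defaultdict

# Raw records on each node: (video_id, minutes_watched); values are illustrative
node_data = [
    [(1, 10), (2, 14), (1, 15)],   # node 1
    [(3, 100), (2, 20), (4, 5)],   # node 2
    [(3, 26)],                     # node 3
]

def map_phase(records):
    # The mapper emits one key-value pair per record: (video_id, minutes)
    for video_id, minutes in records:
        yield video_id, minutes

# Shuffle: all pairs with the same key are routed to the same reducer
grouped = defaultdict(list)
for records in node_data:
    for key, value in map_phase(records):
        grouped[key].append(value)

# Reduce: sum the minutes watched for each video
totals = {video_id: sum(values) for video_id, values in grouped.items()}
print(totals)  # {1: 25, 2: 34, 3: 126, 4: 5}
```

In the real framework the three mappers and two reducers run on separate machines; the shuffle here stands in for Hadoop's sort-and-partition step.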




Event data from our various Hulu players is encoded in the form of beacons. A beacon is just a URL-encoded description of the event, as shown in the example below:

80    2013-04-01    00:00:00    /v3/playback/start?bitrate=650&cdn=Akamai&channel=Anime&client=Explorer&computerguid=EA8FA1000232B8F6986C3E0BE55E9333&contentid=5003673

The Hulu players on all our devices send a constant stream of beacons to our servers; these beacons are subsequently stored in HDFS, where they can be processed by our MapReduce jobs.
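Because a beacon is just a URL-encoded event, decoding one takes only a few lines. A sketch in Python, using the example beacon above (the leading timestamp columns are omitted):

```python
from urllib.parse import urlparse, parse_qs

beacon = ("/v3/playback/start?bitrate=650&cdn=Akamai&channel=Anime"
          "&client=Explorer&computerguid=EA8FA1000232B8F6986C3E0BE55E9333"
          "&contentid=5003673")

parsed = urlparse(beacon)
event_type = parsed.path  # the event name, e.g. '/v3/playback/start'
# parse_qs returns lists of values; each beacon field appears once here
params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
print(event_type, params["contentid"])
```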


So far, we’ve seen how to transform a conventional single-threaded computation into a MapReduce computation by specifying a mapper and a reducer function. We’ve also seen how we encode events as beacons and collect them onto our distributed file system, HDFS. However, in our experience, writing MapReduce jobs by hand is both tedious and error-prone, although, like most skills, you get better at it with practice. Additionally, the resulting code contains a significant amount of boilerplate, making the logic hard to see at first glance. The latter is, in practice, the most significant impediment to overall system understandability and debuggability. Since we run many (on the order of 150-175) different types of MapReduce jobs every hour, we wanted a solution that would let us encode the logic in a straightforward way that is easier to maintain than hand-written Java code.

We started by looking at our code and realized that the majority of our MapReduce jobs perform very similar functions – selecting a set of dimensions that we care about (for example, the video identifier, the zip code, etc.), performing some lookups against metadata tables in our key-value store (for example, deriving the zip code dimension from the IP address of the request), and aggregating over a corresponding fact (for example, the total minutes watched). We realized that we could embed the basic knowledge of how to do these things in a language, and then simply use programs written in that language to generate our MapReduce code for us. Since internally we refer to the events we receive from our players as beacons, and the purpose of this language is to process raw beacon data, we called the language BeaconSpec.

An example of BeaconSpec code is shown below:

basefact playback_start from playback/start {
    dimension harpyhour.id as hourid;
    required dimension video.id as video_id;
    required dimension contentPartner.id as content_partner_id;
    required dimension distributionPartner.id as distribution_partner_id;
    required dimension distributionPlatform.id as distro_platform_id;
    dimension distributionPlatform.isonhulu as is_on_hulu;
    dimension package.id as package_id;
    dimension package.isplusbypackage as is_plus_by_package;
    dimension plan.id as plan_id;
    dimension plan.isplusbyplan as is_plus_by_plan;
    dimension plusCategory.pluslowercategoryid as plus_lower_category_id;
    dimension plusCategory.plusuppercategoryid as plus_higher_category_id;
    dimension client.out as client;
    fact sum(count.count) as total_count;
    dimension packageAvailability.chosen as package_availability;
    dimension siteSessionId.chosen as site_session_id;
    dimension facebook.isfacebookconnected as is_facebook_connected;
}

There are a few key points to note. First, there is no specification of how to compute the final result, only a declaration of what we would like to compute; it is the role of the BeaconSpec compiler to take this declarative specification and convert it into imperative MapReduce code that can run on the cluster. Second, several keywords carry special meaning for us:

  • basefact denotes a metric that we want to measure;
  • from tells us which source beacons we need to process in order to compute the metric;
  • dimension denotes a particular dimension of the source beacon that we care about, and that should form part of the intermediate key coming out of the Mapper phase;
  • fact denotes a dimension of the source beacon that we want to aggregate over. In this particular example we are performing a summation, as the fact keyword is immediately followed by the sum specifier – we could just as easily calculate an average (avg) or take the maximum value (max).

From this specification, our in-house compiler produces runnable Java MapReduce code, a snippet of which is shown below:

[Screenshot: a snippet of the generated Java MapReduce code]

We use a variety of open-source technologies in order to build our compiler – in particular JFlex for lexical analysis & CUP for parser-generation. These are the Java cousins of the old C programs you probably used if you’ve ever taken a compilers class – lex and yacc.
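To give a flavor of what the lexing stage does (the real lexer is generated with JFlex; this toy Python tokenizer covers only a simplified subset of BeaconSpec and is purely illustrative):

```python
import re

# Token patterns for a simplified subset of BeaconSpec; keywords are tried
# before the general identifier pattern so they win the match.
TOKEN_RE = re.compile(r"""
    (?P<KEYWORD>\b(?:basefact|from|required|dimension|fact|as)\b)
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_./]*)
  | (?P<LBRACE>\{) | (?P<RBRACE>\}) | (?P<SEMI>;)
  | (?P<SKIP>\s+)
""", re.VERBOSE)

def tokenize(source):
    # Yield (token_kind, lexeme) pairs, dropping whitespace
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

line = "required dimension video.id as video_id;"
tokens = list(tokenize(line))
print(tokens)
```

A parser generated by CUP would then consume this token stream and build the declarative specification that drives code generation.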

A major advantage of a formal declarative specification of our process is that it allows us to extend functionality far beyond what we initially planned. For example, we are currently in the process of building out a program whose purpose is to validate beacons sent by implementations of our Hulu player on a range of devices. For this purpose, we can use BeaconSpec as an input to the validation program, which will subsequently examine incoming beacons and compare them to the specification, and send us reports about whether the incoming beacons match or deviate from the specification. As another example, as we move towards real-time processing of our incoming data, we are examining the possibility of creating a secondary code-generator for the BeaconSpec compiler that will output code to run on Apache Storm instead of MapReduce.

Related work

Apache Pig is a project that has similar goals to BeaconSpec. Pig programmers write their scripts in a language called Pig Latin, which is subsequently compiled into MapReduce code. However, unlike BeaconSpec, Pig is both imperative and general-purpose. We feel that for this particular use case, the advantages conferred by a declarative domain-specific language are too great to abandon for a general-purpose language. An imperative general-purpose language cannot avoid introducing boilerplate and insignificant details, which make the final program significantly less clear than what we can achieve with a declarative domain-specific language.

Summingbird is a project at Twitter which can generate code targeting MapReduce, Scalding or Storm. It offers a powerful set of abstractions for performing aggregations. An example of the canonical word-count program in Summingbird is shown below:

def wordCount(source: Iterable[String], store: MutableMap[String, Long]) =
   source.flatMap { sentence =>
     toWords(sentence).map(_ -> 1L)
   }.foreach { case (k, v) => store.update(k, store.get(k) + v) }

For an example of the equivalent code written directly in Java MapReduce, see this link. Summingbird was written to solve problems similar to the ones that led us to create BeaconSpec, and we believe it does so in a way that is significantly more expressive than Pig. However, it is written in a highly idiomatic style, and writing Summingbird programs has a steeper learning curve than BeaconSpec.

A few other languages have emerged around the Hadoop ecosystem, such as Kiji (which offers a table abstraction and several insertion/retrieval operators over HBase), and Hive (HiveQL) – which offers a subset of relational database operators that are compiled to MapReduce. We have not fully explored Kiji, however, we make heavy use of Hive at Hulu.

Sawzall (developed at Google) can arguably be acknowledged as the progenitor of all these languages; it is another general-purpose data-processing language, designed to compile to MapReduce code on Google’s proprietary distributed data-processing platform. A link to a paper on it can be found here.


The key takeaway is that we don’t want a general purpose language, we want a language that expresses exactly what we care about, and suppresses details that are not central to the task (2). Whenever you are working on a DSL, adding general purpose features to the language is a serious temptation, but one that must be avoided if you don’t want your project timeline to rival that of Duke Nukem Forever. This sentiment is best captured by the pioneer computer scientist Alan Perlis in the following quote:

Beware of the Turing tar-pit, in which everything is possible, but nothing of interest is ever easy.

-Alan Perlis, Epigrams in Programming



  1. MapReduce: Simplified data processing on large clusters (Jeffrey Dean and Sanjay Ghemawat, Google Inc., 2004) http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduce-osdi04.pdf
  2. Structure and interpretation of computer programs (Hal Abelson and Gerald Jay Sussman, MIT, 1984) http://mitpress.mit.edu/sicp/
  3. Epigrams in Programming (Alan Perlis) http://www.cs.yale.edu/homes/perlis-alan/quotes.html
  4. Interpreting the Data: Parallel Analysis with Sawzall (Pike et al, Google Inc.) http://research.google.com/archive/sawzall.html

Categorizing Customer Support Contacts with Machine Learning

December 17th, 2013 by Chris Liu


One of the cool things about being an intern at Hulu is that we are given the freedom to creatively apply our computer science knowledge to everyday problems. Our customer support process generated such a problem: we’ve got a stream of incoming customer support emails, and would like to automatically assign textual tags to them based on their content. This is a machine learning problem of text classification, and one that we tackle below.

A binary text classification problem is to learn a rule that classifies a concept given a training set. The training set consists of pairs (x_i, y_i) , with x_i a text from some instance space X , and y_i \in \{0,1\} a binary label. We assume there exists a true decision function f: X \rightarrow \{0,1\} with f(x_i) = y_i . Our goal is to find a hypothesis h: X \rightarrow \{0,1\} that best approximates f .

For example, if we take our instance space to be the set of all sentences in the English language, and y_i = 1 to mean the sentence has positive sentiment, then our problem is to find a classifier that classifies the sentiment of English sentences. This example already illustrates problems we will face later — how do we deal with the fact that the set of all sentences is potentially infinite? How do we represent our sentences in a way that best suits learning? And, most importantly, how do we best approximate f from just our training examples?

Hulu gets thousands of customer support emails daily. Each email is represented by a ticket and associated with a set of concise descriptors called tags. With a well-performing tag classifier, the benefits are tangible — we can save our Customer Support Advocates (CSAs) the need to manually tag tickets, and we can route tickets to specific groups of CSAs who specialize in a particular subject area. In general, we would be able to streamline the support experience.

Our problem comes with a unique set of challenges. We would like our service to run on Donki, Hulu’s internal service hosting platform. Since we need to limit memory usage, we keep our approach bounded to a somewhat conservative 512 MB. The size of our training set may be very large, and is potentially unbounded. With these challenges in mind, we will describe the approach taken, what worked, what didn’t, and the end result.


Given our wish for the service to run on Donki, Python seems like the natural choice. The NumPy/SciPy/Scikit-Learn stack is an accessible, open source, and comprehensive set of libraries that suffices for our purposes.

As with any machine learning problem, feature extraction, feature selection, and model selection are crucial. Feature extraction decides which tickets are suitable for training, and extracts the useful parts of suitable tickets. This is the process of canonicalizing a ticket. Feature selection turns the canonical ticket extracted above into a training set. Model selection turns the training set into a classifier.

Feature Extraction

We would like our canonicalized ticket to be faithful to the eventual prediction context. This means, for example, that the CSA’s response to the ticket should not be included. Fortunately, all changes to a ticket are recorded in chronological order, so we are able to extract only the text and tags relating to the customer’s first contact.

Feature extraction also relies on heuristics and domain-specific knowledge of the problem. Some additional ticket fields are also of value. We take the operating system field, the subject field, and whether the ticket was created by a Hulu Plus subscriber. For example, tickets from Hulu Plus subscribers are more likely to be about billing. To incorporate these additional fields, we append them to the existing text. For example, if a user is a Hulu Plus subscriber, we append the word “IsPlusSub”.

Sometimes, manually viewing the tickets and the extracted result is the only way of gaining insight. One particular issue was that users replying to emails would bring in a large number of highly suggestive words. For example, an automated email from Hulu contained a link allowing users to “unsubscribe” from these emails, which was highly suggestive for certain tags. A heuristic was used to filter out the quoted reply portions of the text. Another amusing example was that certain Hulu emails contained the words “Vizio”, “Samsung”, “Sony”, and other TV platforms on which Hulu Plus is available. A pre-existing rule would activate on these keywords, giving these tickets a large number of irrelevant tags.

Feature Selection

As we hinted before, our instance space X where we draw tickets from may be infinite. One common approach to this problem is the “bag of words” representation. We simplify our instance space from the set of all ticket text to the set of all “suitable” words, which is always finite. The “suitable” criteria is another heuristic, but intuitively, we want to exclude common but meaningless words. For example, the word “I” or “we”, or, in our case, “Hulu”, is meaningless with regards to the tags of a message. Excluding these words assists our model in finding meaningful relationships.

Our revised instance space is often called the vocabulary of our examples. Another optimization we can make is to expand our vocabulary using the technique of “n-grams”, which is a contiguous sequence of words of length n. n-grams can potentially capture relationships between words. For example, a ticket with words “not cancel” and another with “cancel not” would be equivalent in a vocabulary consisting solely of words. However, a 2-gram will capture this difference, since for the first ticket, the vocabulary will contain (not, cancel, not \;cancel) , while the second will contain (cancel, not, cancel\;not) .
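The point is easy to demonstrate with a pure-Python sketch (our real pipeline uses library vectorizers; this toy function is only for illustration):

```python
def ngrams(text, n_range=(1, 2)):
    # Collect all contiguous word sequences of length n_range[0]..n_range[1]
    words = text.lower().split()
    grams = set()
    for n in range(n_range[0], n_range[1] + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams

a = ngrams("not cancel")   # {'not', 'cancel', 'not cancel'}
b = ngrams("cancel not")   # {'cancel', 'not', 'cancel not'}
print(a == b)  # False: the word sets match, but the 2-grams differ
```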

In order to use the bag-of-words representation, we need words. This involves pre-processing and tokenizing the canonicalized tickets. We settled on the n-gram range of (1,2) through experimental testing. This process results in very high-dimensional data, since each word and each 2-sequence of words is a dimension. We attempted dimensionality reduction techniques to combat the so-called “curse of dimensionality”. Dimensionality reduction techniques such as Principal Component Analysis and Latent Semantic Analysis attempt to find linear combinations of the data that best correlate. Exploration on this front produced some graphics.



The semantic meaning of an “account_billing” tag and an “account_login_activation” tag is quite similar: both have to do with a user’s account problems. Hence, when we project the examples down to their two most “information preserving” dimensions, we do not see clear separation. However, “account_billing” and “technology_playback” tags have quite distinct meanings, and we see clear separation after dimensionality reduction. Just to confirm our intuition, we plot “android” vs “ios”, two tags with distinct meanings, and get the following:


It’s always nice when the mathematics are in line with our intuition. While these techniques were ultimately not used due to scalability problems, the ability to visualize and confirm our suspicions regarding the separability of tags was highly valuable.

For each document, we want to create a corresponding bag-of-words representation. The simplest approach is to scan over the set of all documents (also called the corpus) and build a dictionary consisting of the entire vocabulary. If the length of this dictionary is n , then a vector v of length n models a document, where the ith index represents an element of the vocabulary, and the value at the ith index is its frequency in the document. One common technique on top of the frequency count is “inverse document frequency” re-weighting, which intuitively says that words frequent across the corpus are inherently less meaningful.

The technique described above is often referred to as term frequency-inverse document frequency (tf-idf) vectorizing. It requires the entire vocabulary in memory, rendering it stateful, so we explored alternatives for scaling purposes. We settled upon the “hashing trick”: very simply, instead of building a vocabulary by scanning the corpus once ahead of time, we explicitly pre-define the output space to be a high-dimensional vector space. Then, for each word and n-gram, we hash it and increment that index of the output vector. Because collisions are unlikely in a high-dimensional space, the hashed vector is a faithful representation of the original text. The key thing we gain is statelessness, so we can use this technique on batches of documents. A brief survey of papers on this subject (here, and here) shows the hashing trick has both theoretical and empirical support.
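A minimal pure-Python sketch of the hashing trick (in production one would use a library implementation such as scikit-learn’s HashingVectorizer; the dimension and hash function here are illustrative):

```python
def hashed_vector(tokens, dim=2**20):
    # Map each token to an index by hashing; no vocabulary is stored,
    # so vectorization is stateless and works on arbitrary batches.
    vec = [0] * dim  # in practice this would be a sparse vector
    for token in tokens:
        index = hash(token) % dim
        vec[index] += 1
    return vec

# Tokens (including 2-grams) from a hypothetical ticket
doc = ["cancel", "subscription", "cancel subscription"]
v = hashed_vector(doc, dim=1024)
print(sum(v))  # total token count is preserved: 3
```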

Model selection

In order to select a model, we need a way to calculate performance and a dataset to calculate performance on. For a binary classification problem, performance is defined by the counts of true positives, true negatives, false positives, and false negatives. How we view these numbers is interesting and worth discussing. In our case, two relevant metrics are “precision” and “recall”. Precision measures the ratio \frac{\text{true positive}}{\text{true positive} + \text{false positive}} , whereas recall measures the ratio \frac{\text{true positive}}{\text{true positive} + \text{false negative}} . Achieving high precision means that when we do predict the positive label, we are mostly correct. High recall means we correctly predict most of the positively labeled examples. High precision usually comes at the cost of low recall, and vice versa. For example, if a model does not predict positive on any example except the very few it is most sure of, it achieves high precision but low recall. If a model predicts positive on every example, it achieves high recall but low precision.

Because our tool will be used as a pre-processing step in the handling of a ticket, we decided that high precision is more valuable than high recall. We would rather not have the classifier make a positive prediction, unless it is reasonably sure that this prediction is correct.

To get a dataset for testing, we employ cross validation and use a test set. In cross validation, we split the training set into n folds, and for 1 \leq i \leq n , we train on all folds other than the ith fold and score on the ith fold. We make a test set by holding back a portion of the training set from training and calculating performance on it after the model is trained. Both methods ensure the set we score on does not participate in the training of the model.

The overall metric we would like to optimize is the F_1 score, the harmonic mean of precision and recall. It can be calculated as 2\left(\frac{\text{precision}\ \cdot\ \text{recall}}{\text{precision} + \text{recall}}\right) . We surveyed a broad variety of classification algorithms, including Naive Bayes, K-Nearest Neighbors, Support Vector Machines, Stochastic Gradient Descent, Decision Trees, Random Forests, and AdaBoost. Unfortunately, the Decision Tree, Random Forest, and AdaBoost classifiers cannot handle sparse data, so they were immediately disqualified.
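With the counts in hand, these metrics reduce to a few lines; a sketch with made-up counts:

```python
def precision_recall_f1(tp, fp, fn):
    # Precision: of our positive predictions, how many were right?
    precision = tp / (tp + fp)
    # Recall: of the truly positive examples, how many did we find?
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of the two
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one tag's classifier
p, r, f1 = precision_recall_f1(tp=70, fp=30, fn=70)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.7 0.5 0.583
```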

The scores, training times, and prediction times of the different classifiers are listed below for a collection of 15,000 training and 6,500 test documents.

Classifier                   F1     Precision  Recall  Training time  Prediction time
Linear SVM                   0.590  0.703      0.509   68.5s          0.43s
Stochastic Gradient Descent  0.587  0.689      0.512   33.2s          0.37s
Logistic Regression          0.556  0.718      0.454   31.8s          0.43s
Bernoulli Naïve Bayes        0.438  0.353      0.577   2.20s          1.76s
Multinomial Naïve Bayes      0.450  0.361      0.597   33.0s          0.40s
K-nearest neighbors          0.477  0.429      0.538   n/a            446.23s

In the end, the Stochastic Gradient Descent classifier was chosen. It has a desirable combination of speed and performance, and can be tweaked to optimize a variety of objective functions. Furthermore, it is an online algorithm, meaning the training process can be split up into batches, alleviating memory concerns.

Odds and Ends

There are many tags that are simply unpredictable. Text classification relies on the label having a clear and distinguishable meaning. For example, we found that tags indicating an action, such as giving a customer a credit, performed very poorly, since it is hard to gauge from the contents of the text whether a credit will be given. As such, we implemented an automated pruning system in our classifier: we score each tag’s classifier on the test set at the end of training, and if that tag’s performance does not meet some minimum, we never predict true for that tag.

Secondly, we may want to partition tags into classes, where at most one tag can be predicted per class. For example, we may wish to avoid the many-device-tags problem described above and enforce a maximum of one device tag per ticket. To solve this problem, we appeal to geometry. Any linear classifier corresponds to a hyperplane in the feature space, and can be represented as a weight vector w . A prediction for an incoming vector x is made via sign(w \cdot x) , since a positive number means the example lies on the positive-label side of the hyperplane. However, if we have a tag class C and a set of classifiers (w_i\;:\; i \in C) , then we can output the prediction \arg \max_i (w_i \cdot x) , provided \max_i (w_i \cdot x) > 0 . This is commonly referred to as the one-vs-all method of extending binary classifiers to predict on a tag class C .
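The one-vs-all rule can be sketched in a few lines of Python, with made-up weight vectors standing in for trained classifiers:

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict_tag_class(classifiers, x):
    # classifiers: {tag_name: weight_vector}. Predict the argmax tag,
    # but only if that tag's score is on the positive side of its hyperplane.
    scores = {tag: dot(w, x) for tag, w in classifiers.items()}
    best_tag = max(scores, key=scores.get)
    return best_tag if scores[best_tag] > 0 else None

# Hypothetical weights for a two-tag device class
classifiers = {
    "android": [1.5, -0.5, 0.0],
    "ios":     [-0.5, 2.0, 0.1],
}
print(predict_tag_class(classifiers, [0.0, 1.0, 1.0]))  # ios
print(predict_tag_class(classifiers, [0.0, 0.0, 0.0]))  # None: no positive score
```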


Applying machine learning techniques to real world data is tricky, exciting, and rewarding. Here, we’ve outlined some of the decisions we’ve made, approaches that didn’t work, ones that did, and how the steps of feature extraction, feature selection, and model selection were performed. In the future, we may expand into classifying phone transcriptions, and personalize the classifier for individual CSAs, to further optimize the automation within our customer support platform.

Chris Liu is a software developer intern on our customer platform team.

Hulu iPad redesign: lessons and code

December 11th, 2013 by Bao Lei

Earlier this year we launched a brand new revision of our iPad application. With lots of new features like our show and video discovery panel, mini player, etc., we had to think very hard about the speed and stability of both the software we wrote and our development process. With a few months behind us now, we wanted to share some of the interesting technical tidbits from our product.

Building the UI

At Hulu, we choose not to use Interface Builder for our apps. Traditionally, we hand-coded the positions and resizing logic for each view. While this is frequently a lot of labor, we felt it left us with better control over the results than using a mouse to drag buttons. This approach keeps your views under control when the view is simple, or when you’re adding a few small features. With a big redesign, which included many different new pages and components, aligning everything manually with code becomes a lot less fun — not to mention challenging, as the design is often tweaked after observing actual on-device user interaction.

There had to be a better approach, and the most interesting one for us was Auto Layout, which Apple introduced along with iOS 6. While awesome, it posed two problems: (1) it works only on iOS 6 and above, while we still support iOS 5; (2) it is designed primarily for Interface Builder users, so it is a little harder to use programmatically.

Then we looked at one small but interesting part of Auto Layout: the ASCII-art-like Visual Format Language (VFL). That eventually became our solution: we would use VFL, but not Auto Layout. Our solution is now released as an open-source project called vfl2objc.

How does it work? We built a system to generate Objective-C code based on a VFL-defined layout. This allowed us to retain iOS 5 compatibility while keeping the layout instructions easy to read and easy to modify. The approach is performant because most of the computations are done at compile time (except the magic behind the autoresizing mask, but that is relatively simple). It is also very flexible: you can call any sizing or positioning APIs before or after the VFL-generated code block — unlike Auto Layout, which ignores the frame you manually set, our VFL code-generation tool won’t.

For example, you can go to the loadView method of a view controller class, and add the following:

// VFL begin
// VFL end
Then, run our vfl2objc script. This will generate the following:

    // You need to predefine superview before this.
    CGRect frame;
    frame = label.frame;
    frame.origin.x = 0 + 30;
    frame.size.width = superview.bounds.size.width - (0 + 30) - frame.origin.x;
    frame.origin.y = 0 + 10;
    frame.size.height = 20;
    label.frame = frame;
    label.autoresizingMask |= UIViewAutoresizingFlexibleWidth;
    label.autoresizingMask |= UIViewAutoresizingFlexibleBottomMargin;
    [superview addSubview:label];
    frame = button.frame;
    frame.origin.x = 0 + 50;
    frame.size.width = 100;
    frame.origin.y = 0 + 10 + 20 + 15;
    frame.size.height = 50;
    button.frame = frame;
    button.autoresizingMask |= UIViewAutoresizingFlexibleRightMargin;
    button.autoresizingMask |= UIViewAutoresizingFlexibleBottomMargin;
    [superview addSubview:button];
After triggering the script, native UI alignment code will appear between the VFL begin and VFL end markers. The generated lines are wrapped in curly braces so that you can fold them and look at just the VFL part. If your design changes, you can just edit the VFL definition and run the script again. Of course, you can also trigger the generation process automatically with an Xcode pre-build script.

To learn more about vfl2objc, please visit http://github.com/hulu/vfl2objc

Defining Events

While NSNotification is very handy, it is sometimes bug-prone because of its weak typing. For example, when you post a notification, the compiler cannot check that the notification name actually matches the name expected by the observers. Also, any information passed along with the notification is either a weakly typed object or a dictionary. In order to reduce the incidence of bugs, you need to document the notification clearly, and always keep this documentation up to date as your application changes. Decoupling documentation from implementation often leads to people updating one but not the other, which becomes a source of problems.

To deal with this issue, we introduced a replacement for NSNotification — the HUTypedEvents. The basic idea is to use a protocol, plus a method signature, to represent an event. This allows the compiler to check both the event name and parameters.

For example, suppose we have an event PlaybackStart with integer parameters videoID and showID. If we were to implement it with NSNotification, the code to define it would be:

const NSString *PlaybackStartNotification = @"PlaybackStart";
const NSString *PlaybackStartVideoIDKey = @"PlaybackStartVideoID";
const NSString *PlaybackStartShowIDKey = @"PlaybackStartShowID";
In order to post the event, somewhere inside the playback component we need to call:
[[NSNotificationCenter defaultCenter]
   postNotificationName:PlaybackStartNotification
   object:nil
   userInfo:@{PlaybackStartVideoIDKey: [NSNumber numberWithInt:videoID],
      PlaybackStartShowIDKey: [NSNumber numberWithInt:showID]}];
Anyone who needs to handle this event will need to do:
// during init
[[NSNotificationCenter defaultCenter]
   addObserver:self
   selector:@selector(handlePlaybackStart:)
   name:PlaybackStartNotification
   object:nil];
// implement handler
- (void)handlePlaybackStart:(NSNotification *)notification {
    int videoID = [[notification.userInfo
        objectForKey:PlaybackStartVideoIDKey] intValue];
    // some logic based on video ID
}
With HUTypedEvents, this is both simpler and safer. Create a singleton class, for example CentralEventHandler. Declare the event at the beginning of CentralEventHandler.h outside the class interface:
HUDeclareEvent(PlaybackStart, videoID:(int)videoID showID:(int)showID)
In the class interface, add:
HUDeclareEventRegistration(PlaybackStart, videoID:(int)videoID showID:(int)showID)
In the implementation for the class, simply add:
HUImplementEvent(PlaybackStart, videoID:(int)videoID showID:(int)showID)
Then, to trigger this event, call:
[[CentralEventHandler instance] triggerEventPlaybackStart__videoID:videoID showID:showID];
And to handle this event, let the class conform to protocol EventPlaybackStart, and then do:
// during init
[[CentralEventHandler instance] registerEventPlaybackStart__observer:self];
// implement handler
- (void)handleEventPlaybackStart__videoID:(int)videoID showID:(int)showID {
    // logic based on videoID and showID
}
This approach provides the following advantages:

  • When you trigger the event, you don’t have to look up or remember the event name, parameter names, or parameter types.
  • You don’t have to convert primitive types into objects.
  • Whenever you are triggering or handling your event, Xcode will provide autocomplete on the method and parameter names.
  • Whenever you register to receive an event, the compiler will check whether the class conforms to the right protocol, and will ensure that you implemented the right handler method with the right parameters and types.
  • If you want to find every use of this event — both where it’s fired, and where it’s handled — you can perform a global code search on “handleEventPlaybackStart”. The search results are much cleaner than when you attempt to do the same with NSNotification and search for PlaybackStartNotification.

To learn more about HUTypedEvents, please check out http://github.com/hulu/HUTypedEvents

Text Rendering

We’ve upgraded GSFancyText, our open-source rich text rendering framework. We moved all the markup text parsing, line breaking, and alignment to a background thread, and only do the final drawing on the main thread. Since our new application has a lot of scroll views containing many images and rich text labels, this new approach is much more performant.

The library’s usage is similar to the older version, except that you need to explicitly call updateDisplay after a GSFancyTextView object is constructed. In case you need to align some other UI based on the size of a GSFancyTextView obtained from the asynchronous calculation, you can use a new method called updateDisplayWithCompletionHandler.

The vfl2objc system, HUTypedEvents, and the updated GSFancyText are just a few examples of the small, fun technical improvements we folded into our big iPad app redesign project. While building the Hulu iOS application, we are constantly looking for better ways to make it easier to build a nice-looking, maintainable, and performant application. We’re happy to share some of our learnings and code with the wider development community.

Bao Lei is a senior software developer on the mobile team who works on our iOS platform.


RestfulGit: an open source web service for accessing git data

September 9th, 2013 by Rajiv Makhijani

We love Git at Hulu — it’s central to almost all software development here. RestfulGit was built to make it easier to build tools and processes that leverage data from our internal Git repositories. It provides a read-only RESTful web API for accessing low-level Git data. For compatibility, and to make it easier to build tools that can access both open-source hosted Git repos and internally hosted repos, the API was modeled on the GitHub Git DB API.

The API provides the following endpoints:


Retrieves a list of commit objects:

GET /repos/:repo_key/git/commits

optional: ?start_sha=:sha
optional: ?ref_name=:ref_name
optional: ?limit=:limit (default=50)

Retrieves a specific commit object:

GET /repos/:repo_key/git/commits/:sha


Retrieves a specific blob object:

GET /repos/:repo_key/git/blobs/:sha


Retrieves a specific tree object:

GET /repos/:repo_key/git/trees/:sha


Retrieves a list of refs:

GET /repos/:repo_key/git/refs

Retrieves a specific ref:

GET /repos/:repo_key/git/refs/:ref_name

Raw Files

Returns the raw file data for the file on the specified branch:

GET /repos/:repo_key/raw/:branch_name/:file_path
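As an illustration, here is a small Python helper that builds URLs for a few of these endpoints (the base URL and repo names are hypothetical, not part of RestfulGit itself):

```python
from urllib.parse import quote

# Hypothetical base URL for a RestfulGit deployment.
BASE = "http://git.example.com"

def commits_url(repo_key, start_sha=None, limit=50):
    """Build the URL for the commit-list endpoint."""
    url = "%s/repos/%s/git/commits" % (BASE, quote(repo_key))
    params = []
    if start_sha:
        params.append("start_sha=%s" % start_sha)
    if limit != 50:
        params.append("limit=%d" % limit)
    return url + ("?" + "&".join(params) if params else "")

def raw_file_url(repo_key, branch, file_path):
    """Build the URL for fetching raw file contents."""
    return "%s/repos/%s/raw/%s/%s" % (BASE, quote(repo_key), quote(branch), file_path)
```

A GET on the resulting URLs returns JSON (or raw bytes for the raw-file endpoint), mirroring the GitHub Git DB API shapes.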

For more information on the currently supported features, see the README. Of course, this project would not be possible without the many open-source projects it is built upon: Flask, pygit2, libgit2, and Python.

Find the source code on GitHub. Contributions welcome!


Tips and Highlights from Developing Mobile apps for Windows

July 15th, 2013 by Zachary Pinter

For the past year or so, our mobile team has been hard at work on creating native apps for the Microsoft ecosystem.

Now that we’ve launched both Hulu Plus for Windows Phone 8 and Hulu Plus for Windows 8, I’d like to take some time to reflect on their development by showing off some of the platform highlights.


Async and Await

The first and most obvious highlight is the async and await keywords added to C# 5. This pattern has been enormously helpful in cutting down code complexity and keeping our UI fast and responsive.

Typically, when writing code that makes use of asynchronous libraries (such as node.js), you end up structuring your project’s code around the handling of callbacks. With async/await, you get to write code that takes full advantage of asynchronous, non-blocking, off-the-ui-thread functions, while organizing your code much the same way as you might have done using synchronous calls.

There’s a lot of depth and detail about how this is done under the hood (see here), but I think it’s best to start with a few quick examples:

public async Task<string> Nonce()
{
    var res = await ExecuteSiteRequest(new NonceRequest());
    return res.Value;
}

public async Task<SiteUser> Login(string username, string password)
{
    var req = new AuthenticateRequest(username, password, await Nonce());
    var res = await ExecuteSiteRequest(req);
    var user = new SiteUser(res);

    return user;
}
In the above example, the actual delay involved in waiting for the HTTP calls triggered by ExecuteSiteRequest happen outside the UI thread. However, the rest of the logic runs on the UI thread. The intelligence of the async/await keywords comes from the compiler knowing how to rewrite the methods and choosing when to run the various fragments of code that surround the calls to ExecuteSiteRequest.

Notice how the await keyword is placed inside the constructor of the AuthenticateRequest object. This means that before the first AuthenticateRequest object ever gets constructed, the framework will first wait for the successful execution of the async Nonce() method. After the Nonce() method’s call to ExecuteSiteRequest returns successfully, the code in the Login method resumes where it left off (now on the UI thread) and is able to construct the AuthenticateRequest.

If any of the calls to ExecuteSiteRequest were to fail (e.g. a server-side error), an exception would bubble up on the UI thread just as if this were a blocking method (even if the original source of the exception came from a worker thread executing the HTTP request). We added a few keywords, but the organization of the code remains simple and straightforward.

Keep in mind that the value provided by these keywords is not limited to IO calls. The metaphor extends to anywhere you might want to break away from the current thread. For example, a common issue in UI development is stuttering/freezing and how to prevent it. The obvious first step is to move all your service calls off the UI thread, though you can still end up with a stuttering app if you take up too much CPU time on the UI thread. For example, you might execute your HTTP call off the UI thread, but then process the result on the UI thread. In C# 5.0, you can pass a lambda to Task.Run (which then executes that lambda on a worker thread pool) and await the result.

Here’s an example of how we use Task.Run to process JSON:

// Example:
//  var user = await JsonToObject<User>(userJson);
public async Task<T> JsonToObject<T>(string json)
{
    var res = await Task.Run<T>(() => {
        return JsonConvert.DeserializeObject<T>(json,
            new JsonSerializerSettings {
                MissingMemberHandling = MissingMemberHandling.Ignore
            });
    });
    return res;
}

The same sort of pattern can be applied to any snippet of code or function call that you might expect to be CPU intensive. Assuming the snippet doesn’t try to manipulate shared state, just add in the async/await keywords and wrap the snippet in a call to Task.Run.

Static Extension Methods

Extension methods have been in C# for quite some time (since 3.0), but I’ve found several handy use cases for them during the development of our Windows 8 and Windows Phone 8 applications.

One of them I’d like to highlight is type conversion helper methods. Like many mobile apps, we work with and parse data from a variety of different backend services and libraries. A lot of the time, this means converting to and from typed and primitive values (e.g. string, double, bool, int, object, Double). So, we created extension methods to make the process easier:

public static int AsInt(this object me, int defaultValue = default(int))
{
    if (me == null) return defaultValue;
    if (me is string)
    {
        int result;
        if (int.TryParse(me as string, out result))
            return result;
    }
    if (IsNum(me))
        return Convert.ToInt32(me);

    return defaultValue;
}

public static bool AsBool(this object me, bool defaultValue = default(bool))
{
    if (me == null) return defaultValue;

    if (me is string)
    {
        var meStr = (me as string);

        if (("true".Equals(meStr, StringComparison.OrdinalIgnoreCase)) ||
            ("1".Equals(meStr, StringComparison.OrdinalIgnoreCase)))
            return true;
        if (("false".Equals(meStr, StringComparison.OrdinalIgnoreCase)) ||
            ("0".Equals(meStr, StringComparison.OrdinalIgnoreCase)))
            return false;
    }
    if (me is bool)
        return (bool)me;
    return defaultValue;
}

Granted, in many cases you can just type cast and be done with it (e.g. var x = (int)foo). However, doing so naively can run into issues later on if the server-side data changes format (e.g. values that are largely whole numbers, but can have decimal values under some scenarios).

Some areas where these extension methods are helpful:

  • Saved settings from IsolatedStorageSettings

  • Parsing page parameters in NavigationContext.QueryString

  • JSON objects represented as IDictionary

Another benefit of using static extensions is that they can be chained and still behave correctly when null values are returned earlier in the chain.

For example:

protected virtual void LoadState(System.Windows.Navigation.NavigationEventArgs e,
                                 IDictionary<String, Object> pageState)
{
    // omitted code...

    bool resetHistory =
        NavigationContext.QueryString.ValueOrDefault("reset_history").AsBool();

    // omitted code...

    if (resetHistory)
    {
        while (NavigationService.RemoveBackEntry() != null)
            ; // do nothing
    }
}
In the above example, ValueOrDefault can return null (the default value for a string) if the “reset_history” string isn’t present in the QueryString dictionary. However, that’s no problem for our AsBool() helper, since it returns the default boolean value (false) if called on a null object.

The ability to consolidate all this type conversion logic into easy-to-call, composable extension methods makes our code both cleaner and safer.

For the full set of type conversion helpers we use, see ObjectHelper.cs.

XAML Data Binding and MVVM

A fairly common idea in UI development is that your views should be decoupled from your models in a way that allows for, at least in theory, alternative views to share the same model.

One of the best ways to achieve this on Microsoft platforms is through XAML and the MVVM pattern. XAML (eXtensible Application Markup Language) is an XML format for specifying UI layouts and templates. MVVM (Model-View-ViewModel) is a design pattern in the same genre as MVC with a focus on data binding.

Let’s see how this works in practice:

<Grid x:Name="detailsContainer" Grid.Row="1" Background="White"
  VerticalAlignment="Top" Grid.ColumnSpan="2">
    <Grid Margin="15">
        <Grid.RowDefinitions>
            <RowDefinition Height="Auto"/>
            <RowDefinition Height="Auto"/>
            <RowDefinition Height="*"/>
        </Grid.RowDefinitions>

        <b:EventBehavior Event="Tap" Action="{Binding TileTextClickedAction}"/>

        <TextBlock Grid.Row="0"
            Style="{StaticResource MastheadShowTitleTextStyle}"
            Text="{Binding Title}"
            Visibility="{Binding Title,
                               Converter={StaticResource HuluConverter}}"/>
        <TextBlock Grid.Row="1"
            Style="{StaticResource MastheadVideoTitleTextStyle}"
            Text="{Binding Subtitle}"
            Visibility="{Binding Subtitle,
                               Converter={StaticResource HuluConverter}}"/>
        <TextBlock Grid.Row="2"
            Style="{StaticResource MastheadDescriptionTextStyle}"
            Text="{Binding Description}" MaxHeight="80"
            Visibility="{Binding Description,
                               Converter={StaticResource HuluConverter}}"/>
    </Grid>
</Grid>

In the above example we have three text blocks where any given text block can be hidden/collapsed if the text it’s bound to is blank (null or an empty string). Any time the title, subtitle, or description changes, the UI will automatically be updated (and each text block will be made visible or collapsed as needed). Behind the scenes, there’s a view model object (a fairly simple/typical C# object) driving this view. However, the view model has no specific knowledge about what this view looks like or how it is arranged (nor does it need to). Instead, the view model’s chief concern is in providing a useful set of bindable properties (getters/setters) that any particular layout might want.

XAML and MVVM end up being a really nice workflow/pattern when it comes to iterating on designs and reusing code across different templates (e.g. templates for Windows 8 snapped mode versus full screen mode). With a bit of legwork, this reusability extends even across applications. For example, the bulk of our Windows Phone 8 templates are bound to the same view models we created for the Windows 8 app.

If you’re interested in learning more about the MVVM pattern and how it helped drive the development of our Windows 8 and Windows Phone 8 apps, check out this video from Build 2013.

XAML Control Template Overrides

Another perk of working with XAML-based frameworks is that you can extract and override the standard XAML templates used by the built-in controls.

For example, if you want a Windows Phone 8 LongListSelector to have some extra padding at the bottom of the list (but inside the scrollable region), just override the template and add the padding to the inner ViewportControl:

<Style x:Key="LongListSelectorWithBottomPadding" TargetType="fcontrols:HLongListSelector">
    <Setter Property="Template">
        <Setter.Value>
            <ControlTemplate TargetType="fcontrols:HLongListSelector">
                <Grid Background="{TemplateBinding Background}">
                    <VisualStateManager.VisualStateGroups>
                        <VisualStateGroup x:Name="ScrollStates">
                            <VisualStateGroup.Transitions>
                                <VisualTransition GeneratedDuration="00:00:00.5"/>
                            </VisualStateGroup.Transitions>
                            <VisualState x:Name="Scrolling">
                                <Storyboard>
                                    <DoubleAnimation Duration="0" To="1" Storyboard.TargetName="VerticalScrollBar" Storyboard.TargetProperty="Opacity"/>
                                </Storyboard>
                            </VisualState>
                            <VisualState x:Name="NotScrolling"/>
                        </VisualStateGroup>
                    </VisualStateManager.VisualStateGroups>
                    <Grid Margin="{TemplateBinding Padding}">
                        <Grid.ColumnDefinitions>
                            <ColumnDefinition Width="*"/>
                            <ColumnDefinition Width="auto"/>
                        </Grid.ColumnDefinitions>
                        <!-- Note the bottom padding being set to 80 -->
                        <ViewportControl x:Name="ViewportControl" Margin="0,0,0,80"/>
                        <ScrollBar x:Name="VerticalScrollBar" Grid.Column="1" Margin="4,0,4,0" Opacity="0" Orientation="Vertical"/>
                    </Grid>
                </Grid>
            </ControlTemplate>
        </Setter.Value>
    </Setter>
</Style>

The template itself is large and might look like a lot of gibberish due to all the built-in styling and transitions. However, we didn’t have to write all of that. You can find the built-in control’s default XAML using Expression Blend (just right-click on the component and choose Edit Template -> Edit a Copy). From there, we simply copy the default XAML and make our tweaks/changes as needed.

This separation of UI from behavior, found in all the system controls, ends up providing a lot of power when you start approaching the limits of the built-in controls and want to extend your UI to do things outside of the original design. For example, if you want to use the FlipView component on Windows 8, but you don’t like the appearance of the previous/next buttons, override the FlipView template and provide your own buttons (see how this works for our masthead). The logic of the control (animations, multitouch gestures, item recycling, etc.) all stays the same.

Zachary is a software developer on the mobile team at Hulu, and has worked on our Android, Windows 8, and Windows Phone apps.


Bank: an open source Statsd/Metricsd aggregation frontend

July 9th, 2013 by Feng Qi

At Hulu we make heavy use of Statsd and Metricsd. Metrics from our high-traffic services are continuously sent to Statsd/Metricsd clusters, and are eventually surfaced via Graphite. Humans or machines can then use the Graphite UI or API to set up customized monitoring and alerting systems.

While we enjoyed the connectionless packet transmission enabled by the UDP ports of Statsd and Metricsd, we found the metric data carried inside such UDP packets to be proportionally small. For example, to report that the API “hulu_api” has processed one request in 30 milliseconds, we send this message to Statsd: “hulu_api:30|ms”. Consider, for a moment, how this looks at the level of a UDP packet on top of IP. With a UDP header of 8 bytes and an IP header of 20 bytes, the 14-byte metric data accounts for only 33% of the bytes transferred. Aggregating many such packets into a single packet can increase this percentage. In the previous example, if we combine many hulu_api metrics and send 1000 bytes of metric data within one packet, the data portion of the payload will account for 97% of the bytes transferred. This is a great improvement. The extent to which such aggregation helps depends on the Maximum Transmission Unit (MTU) of a specific network path: sending a packet beyond that length causes fragmentation into multiple data frames, which again introduces header overhead.
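The arithmetic above can be made concrete with a small Python sketch of the data-to-wire ratio for a single unfragmented UDP/IP packet:

```python
UDP_HEADER = 8   # bytes
IP_HEADER = 20   # bytes

def data_fraction(payload_bytes):
    """Fraction of bytes on the wire that carry metric data,
    for a single unfragmented UDP-over-IP packet."""
    return payload_bytes / (payload_bytes + UDP_HEADER + IP_HEADER)

# len("hulu_api:30|ms") == 14 bytes of metric data
# data_fraction(14)   -> ~33% of the bytes transferred
# data_fraction(1000) -> ~97% of the bytes transferred
```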

The transmission-overhead savings of packet aggregation come with a few disadvantages. First, aggregation reduces the temporal resolution of time-series data, because packets within a time window are sent together. Second, the packet loss rate may increase due to checksum failures: longer packets have a higher probability of data corruption, and any checksum failure causes the receiver to discard the entire packet. Third, sending larger packets takes more time and hence increases latency. To overcome the first disadvantage, we can use a time threshold — when it is reached, we send the aggregated packet even if aggregation has not reached its maximum capacity. The second and third disadvantages cannot be compensated for easily, but it is usually reasonable to assume the network remains accurate and fast when packet sizes are close to, but below, the MTU.

When we have a target temporal resolution, we can perform some simple math-based data aggregations within the time window:

  1. Many packets of “Count” data type can be aggregated into one packet with their values summed: “hulu_api_success:10|c” and “hulu_api_success:90|c” can be aggregated to “hulu_api_success:100|c”
  2. Many packets of “Gauge” data type can be aggregated into one packet with the one latest value: “hulu_api_queue_size:30|g” and “hulu_api_queue_size:25|g” can be aggregated to “hulu_api_queue_size:25|g”
  3. Other data types are difficult for math aggregation without re-implementing functionalities of Statsd or Metricsd.

Implementing math-based aggregation is actually not trivial. It requires parsing incoming UDP data, organizing messages by metric name and data type, and performing the math operations. The benefit is that it reduces the actual amount of metrics information you send. When count or gauge data points are sampled at high frequency, hundreds of messages can be reduced to one!
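The count and gauge rules above can be sketched in Python (a hypothetical helper for illustration; Bank’s actual implementation differs):

```python
def aggregate(messages):
    """Aggregate Statsd-style "name:value|type" messages within one
    time window: counts are summed, gauges keep only the latest value,
    and other types pass through untouched."""
    counts = {}        # metric name -> running sum
    gauges = {}        # metric name -> latest value
    passthrough = []   # messages of other types, forwarded as-is
    order = []         # first-seen order of aggregated names
    for msg in messages:
        name, _, rest = msg.partition(":")
        value, _, mtype = rest.partition("|")
        if mtype == "c":
            if name not in counts and name not in gauges:
                order.append(name)
            counts[name] = counts.get(name, 0) + int(value)
        elif mtype == "g":
            if name not in counts and name not in gauges:
                order.append(name)
            gauges[name] = value
        else:
            passthrough.append(msg)
    out = []
    for name in order:
        if name in counts:
            out.append("%s:%d|c" % (name, counts[name]))
        if name in gauges:
            out.append("%s:%s|g" % (name, gauges[name]))
    return out + passthrough
```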

This aggregation logic could be implemented within each sender, but instead of spreading that complexity throughout our applications, we decided to build an independent daemon. A standalone daemon has all the advantages of modularization: regardless of which language a service is implemented in, and regardless of how many services a machine runs, the network admin just needs to bring up one daemon process to aggregate all the packets sent from that host.

We decided to call the project “Bank” because we save penny packets and hope the “withdrawals” will be large. Here are some interesting details of Bank:

  1. Bank sends an aggregated packet when one of the three criteria is met:
    • a configurable max packet length is reached
    • a configurable number of packets have been aggregated
    • a configurable time interval has elapsed
  2. Bank can be used as the frontend of both Statsd and Metricsd because it respects both protocols. For example, it supports the Metricsd data type “meter”, which allows clients to omit “|m”. Also, it does not send the Statsd-style multi-metric packet, which is not understood by Metricsd.
  3. Bank respects “delete”. A packet with data “hulu_api:delete|h” will cause Bank to discard the current histogram data of hulu_api and send “hulu_api:delete|h” downstream.
  4. Bank can parse packets that are already aggregated. Sometimes smart upstreams do a certain level of aggregation themselves, and math aggregation can be carried out at multiple levels, so it is nice to be able to parse already-aggregated packets.
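The three flush criteria in point 1 can be sketched as follows (a hypothetical Python illustration with invented names and defaults; Bank itself is written in C):

```python
import time

class Bank:
    """Sketch of Bank's flush logic: buffer metric messages and flush
    when any one of three configurable criteria is met."""

    def __init__(self, max_bytes=1400, max_count=100, max_age=1.0,
                 send=print, clock=time.monotonic):
        self.max_bytes = max_bytes    # stay below the path MTU
        self.max_count = max_count    # max messages per packet
        self.max_age = max_age        # seconds before a forced flush
        self.send = send              # callable taking the packet payload
        self.clock = clock
        self.buf = []
        self.buf_len = 0
        self.first_at = None

    def add(self, msg):
        # +1 accounts for the newline joining messages in the packet
        if self.buf and self.buf_len + 1 + len(msg) > self.max_bytes:
            self.flush()              # criterion 1: max packet length
        if self.first_at is None:
            self.first_at = self.clock()
        self.buf.append(msg)
        self.buf_len += len(msg) + (1 if len(self.buf) > 1 else 0)
        if len(self.buf) >= self.max_count:
            self.flush()              # criterion 2: max packet count

    def tick(self):
        # called periodically; criterion 3: time interval elapsed
        if self.first_at is not None and self.clock() - self.first_at >= self.max_age:
            self.flush()

    def flush(self):
        if self.buf:
            self.send("\n".join(self.buf))
        self.buf, self.buf_len, self.first_at = [], 0, None
```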

When our Seattle team tried Bank against their Statsd cluster, they decided to add downstream consistent hashing logic to it. To work with a cluster of Statsd machines, the same metric from different frontends should be sent to the same Statsd instance, because that instance needs all the data to do its statistics. Consistent hashing is a perfect choice in this scenario. Each Bank uses the TCP health check of Statsd to determine which downstream Statsd instances are up, and distributes packets based on metric names.
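A minimal Python sketch of this kind of consistent-hash routing (hypothetical hostnames; the real implementation is in C and also tracks health-check results):

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a point on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each healthy Statsd host is placed on the ring at several
    virtual points; a metric name routes to the next point clockwise.
    Removing a host only remaps the metrics that lived on it."""

    def __init__(self, hosts, replicas=100):
        self.ring = sorted((_hash("%s#%d" % (h, i)), h)
                           for h in hosts for i in range(replicas))
        self.keys = [k for k, _ in self.ring]

    def host_for(self, metric_name):
        idx = bisect.bisect(self.keys, _hash(metric_name)) % len(self.ring)
        return self.ring[idx][1]
```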

We implemented the initial version of Bank in Python over a weekend — indeed, most of the time was spent on parsing and assembling strings to comply with the Statsd and Metricsd protocols. That version was not as performant as we wanted, so we rewrote Bank in C. The C version of Bank only uses standard libraries, so it should be portable across most platforms.

You can find the code at https://github.com/hulu/bank — please feel free to file issues or suggest patches if you end up using Bank.

Feng is a software developer on the Core Services team.

GSAutomation — an open-source iOS test library

April 24th, 2013 by Bao Lei

We all wish testing applications was as much fun as watching Hulu content. Just sit back, watch some Family Guy or The Office, and the app is fully tested. While sometimes this is part of our quality assurance process, most of the time testing is much more mundane. Logging in, logging out, resetting passwords, adding items to your queue, checking watch history, checking search results, rotating the device, making sure you can’t watch Chasing Amy when logged in with a 13-year-old’s account, ensuring correct ad logic… And as tedious as this might be on a beautiful new iPad with a Retina display, you have to repeat the process on the iPad 3, the iPad 2, and the iPad 1. And then a new build comes, and we do it all over again.

Test automation is the obvious solution. On the iOS team we’ve traditionally relied on Apple’s UIAutomation, a framework that lets you simulate user behaviors (such as taps, scrolling, flicks, rotations, and string input) and check UI elements in JavaScript. At first, watching buttons get magically tapped by an invisible ghost navigating through the app is impressive. Eventually, however, we found ourselves spending more time maintaining the script than the automation saved us. One reason is that the automation script covers a rather small portion of test scenarios; the other is that the automated tests are sometimes fragile, producing false negatives caused by a change in the application’s view layering rather than a legitimate bug.

With the goals of making test scripts easier to write and making them more reliable and robust, we created a library called GSAutomation. (Wondering where the prefix GS comes from? Check out last year’s blog post on GSFancyText.) It is an extension/wrapper for UIAutomation that makes iOS app testers’ and test script developers’ lives a lot easier.

First, GSAutomation makes writing scripts parallel a real human tester’s behavior. For example, if you want to tap a button and then check whether some label says the right thing, just define a task array like:

task = [
  [Tap, "Button"],
  [Check, "Text I'm expecting", "Text I'm expecting from another label"],
];

And then call performTask(task).

A task array consists of a number of steps. Each step is itself an array starting with the action name, followed by a series of parameters (e.g. for “Tap” it’s the title of a button, for “Check” it’s a list of labels/text views.) The simplicity of this syntax means that it’s no longer a requirement to be a software developer to create a test. Anyone with a text editor and some light education about the syntax can chip in and get their favorite features covered by tests that are running nightly.

But why arrays? Why not just define some helper methods and write the scripts like:

Check("label1", "label2");

One reason is of course that in the Cult of Objective-C we think square brackets are more beautiful than parentheses. Another reason is that while iterating on the task arrays, we handled many instabilities and sharp edges with common logic, so the scripts are more robust across scenarios. For example, there is always a one-second delay between individual actions. Also, for tapping or checking results, if the element you are looking for doesn’t exist at first, we patiently wait a while longer, since the cause might be network latency or an old device’s slow CPU.

To make tests even more robust, GSAutomation offers a failure rescue mechanism for some actions. For example, when we test the player within the Hulu Plus app, if our script fails to tap the pause button it might not be because pausing is broken — instead, it could be that the control bar auto-hid after a few seconds of inactivity. So we simply tap the screen center and try the pause button again. The step array here would be:

[Tap, "pauseButton", [TapPoint, screenCenter()]]

The last parameter in this step is the rescue action if the original action failed. Note that in this example, the pauseButton doesn’t have a text title, so we refer to it by the image file name — credit for this flexibility goes to UIAutomation (and more fundamentally, UIAccessibility, which determines how UI elements are referred to within UIAutomation).
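The control flow described above — retry while an element is missing, then fall back to the rescue action and try once more — can be sketched in Python pseudocode (the real GSAutomation scripts are JavaScript on top of UIAutomation; all names here are invented for illustration):

```python
import time

def perform_task(task, retries=5, sleep=time.sleep):
    """Each step is (action, *args) where action(*args) returns True on
    success. A step may end with a rescue step (itself an (action, *args)
    tuple) to run if the main action keeps failing."""
    for step in task:
        action, args = step[0], list(step[1:])
        rescue = args.pop() if args and isinstance(args[-1], tuple) else None
        ok = _attempt(action, args, retries, sleep)
        if not ok and rescue is not None:
            rescue[0](*rescue[1:])                 # e.g. tap the screen center
            ok = _attempt(action, args, retries, sleep)
        if not ok:
            raise AssertionError("step failed: %r" % (step,))
        sleep(1.0)   # one-second pause between individual actions

def _attempt(action, args, retries, sleep):
    for _ in range(retries):
        if action(*args):
            return True
        sleep(1.0)   # the element may appear late on slow devices/networks
    return False
```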

Beyond the task-array-based workflow, GSAutomation also provides a list of helper methods that work around Apple’s long method name conventions and make basic jobs much simpler. For example, you can call isPad() to check whether the device is an iPad; you can call win() to get the main window; and you can call log(“some text”) instead of UIALogger.logDebug(“some text”).

Interested and want to give it a try? Check it out at http://github.com/hulu/gsautomation

The project includes an example with a simple iOS app project and GSAutomation-based test scripts. For the full definition of actions, parameters, and ways to reference a UI element, check the README on GitHub.

Bao Lei is a software developer on the mobile team who works on our iOS platform.


Python and Hulu

March 13th, 2013 by Ilya Haykinson

At Hulu, our development teams (and individual developers) have a lot of freedom in their choice of development tools. Our overriding principle is “make the best choice for your project”, which means we expect you to evaluate tools and platforms already used within the company as well as those that are new to us. While sometimes a project’s needs will drive its development team to choose something new, there are a lot of people at Hulu who choose Python.

This year, as part of our commitment to Python, we are helping to sponsor PyCon 2013. Look for a few Hulugans on the conference floor and at our booth too. If you are there, stop by our booth to chat about what kinds of things we do with Python.

We like Python for its speed of development and execution, its diverse libraries, and for being readable enough that new developers can quickly get acquainted with a project. We use Python widely, for tasks big and small. The small include scripts to help with deployment or monitoring, and wrappers around git and other tools. The large include systems that are core to our application API. Some internal examples are below.


Deejay

At the core of our device ecosystem is an application we call Deejay. When our desktop, living room, and mobile apps start up, they connect to Deejay to learn about the Hulu environment they’re connecting to. An iPhone in Japan needs to use a different information architecture than an iPhone in the US. The PS3 app needs a different set of icons than an Xbox. In addition to general configuration, the Deejay service is used to help in streaming video.

Obviously, to support these core API needs, the service needs to be very high performance. Python, together with CherryPy, gunicorn, and gevent, more than provides for this.
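As a toy illustration of the kind of lookup such a configuration service performs (all names and values here are invented, not Deejay’s actual data):

```python
# Hypothetical device/region configuration table.
CONFIGS = {
    ("iphone", "jp"): {"info_architecture": "jp-v2", "icons": "iphone-jp"},
    ("iphone", "us"): {"info_architecture": "us-v3", "icons": "iphone-us"},
    ("ps3", "us"):    {"info_architecture": "us-v3", "icons": "ps3-us"},
    ("xbox", "us"):   {"info_architecture": "us-v3", "icons": "xbox-us"},
}

DEFAULT = {"info_architecture": "us-v3", "icons": "generic"}

def environment_for(device, region):
    """Return the configuration a client should use on startup."""
    return CONFIGS.get((device, region), DEFAULT)
```

In the real service, an HTTP layer (CherryPy behind gunicorn/gevent, as noted above) would serve this lookup at scale.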


Python and the Tornado web server are behind the service we use at Hulu to resize, crop, and otherwise manipulate images displayed by our site and our device apps. We open-sourced the core of this service a while ago, and have used it extensively. The service provides a consistent HTTP front-end to some ImageMagick capabilities, and allows an app to request an image of a particular size with a particular effect applied — whatever an app needs at the moment. Given our scale, this service benefits greatly from living behind a cache (whether local or CDN-based).


Sod

There comes a time in every company’s life when devops gets sick and tired of developers saying, “I’d like another machine for my service”. We got to that point some time ago, and built Sod. This platform is the foundation of our private cloud. It allows any of our developers to create a Xen-based virtual machine (running CentOS, Ubuntu, or Windows), get it up on the correct part of our network, endow it with the desired amount of RAM and drive space, and launch it — all within less than a minute. Whether using a web user interface, an API, or a command-line tool, Sod is our centralized interface to a cluster of Xen nodes. It abstracts the interaction with Xen and handles its peculiarities. It also helps our devops team manage VM operations like cross-host moves. By integrating with our internal authentication system, it also helps us keep track of machine and service ownership throughout our environment. We built Sod with CherryPy, gunicorn, and gevent, and based its data store on MySQL.


There comes a further time in a company’s life when developers start spinning up VMs to host a one-off web service. The developers want the service to be fault-tolerant — so they spin up multiple instances and get them load-balanced. They want to know how well the service runs, so they build log collection systems. They want to have the service geo-distributed, so they create their own schemes for hosting this in multiple data centers.

Donki was built to make this much simpler. At its core, it’s a service for hosting other services. Developers write any WSGI-compliant service, then git push it to Donki, which handles the rest: deployment, pre-release smoke testing, DNS and load-balancer setup, data center distribution, log collection, and much more. Donki guarantees that at least two instances (or more, depending on load) are always up. Under the covers, Donki uses Sod and other internal devops services to orchestrate its hosting; in fact, the Sod service itself is hosted by Donki. We wrote Donki’s front-end using Django.
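The two-instance floor described above can be sketched as a simple scaling rule. The per-instance capacity and the policy itself are illustrative assumptions, not Donki's actual logic.

```python
MIN_INSTANCES = 2            # the floor Donki-style hosting guarantees
REQUESTS_PER_INSTANCE = 500  # assumed capacity of one instance (req/sec)

def desired_instances(requests_per_sec):
    """Scale with load, never dropping below the two-instance floor."""
    needed = -(-requests_per_sec // REQUESTS_PER_INSTANCE)  # ceiling division
    return max(MIN_INSTANCES, needed)
```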



What started off as a Hack Day project for a couple of us has become an important part of many people’s lives at Hulu. After noticing that we paid significant money for a telco phone conferencing service, we hacked together a Django-based system, Parley, that uses the Twilio API to provide phone conferencing. After a weekend we had a working prototype, and after a few more weeks we launched it to the company. By January of this year, we had 400+ users participating in 1,300+ calls per month, at a fraction of what the telco-based service cost us and with a great deal more flexibility. Like many other internal apps, Parley runs as a Donki-managed service.
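The heart of a Twilio-based conference service is small: when Twilio hits your webhook for an incoming call, you answer with TwiML that drops the caller into a named conference room. The `<Dial><Conference>` verb is real TwiML; the room naming and everything else here is an illustrative sketch, not Parley's actual code.

```python
from xml.sax.saxutils import escape

def conference_twiml(room):
    """Return TwiML that joins the current caller to the given room."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Dial><Conference>{}</Conference></Dial></Response>"
    ).format(escape(room))
```

A Django view would return this string with a `text/xml` content type; every caller who receives the same room name lands in the same conference.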


There are many more Python projects at Hulu. From small to big, we depend on the language and its ecosystem to drive our business. So it’s with pride that we sponsor this year’s PyCon, and hope to continue to have a wonderful symbiotic relationship with the language, its future advances, and its great influence for years to come.


Ghost Builds on Jenkins hosted on Windows

March 4th, 2013 by Jia Cao

At Hulu we use Jenkins as our continuous integration system. We use it on Windows, Linux, and OS X — depending on the platform of the project.

Recently, we noticed that our Windows build machine exhibited a strange issue: from time to time, builds were triggered automatically without new commits or any manual operation. We started calling them “ghost builds”. Here’s a snippet of our HipChat log:

PaymentsJenkins Build 4162 for project pay-net-master: SUCCESS. No commits. December 9, 2012
PaymentsJenkins Build 4163 for project pay-net-master: SUCCESS. No commits. December 10, 2012
PaymentsJenkins Build 4164 for project pay-net-master: SUCCESS. No commits. December 10, 2012
PaymentsJenkins Build 4165 for project pay-net-master: SUCCESS. No commits. December 10, 2012
PaymentsJenkins Build 4166 for project pay-net-master: SUCCESS. No commits. December 11, 2012
PaymentsJenkins Build 4167 for project pay-net-master: SUCCESS. No commits. December 11, 2012
PaymentsJenkins Build 4168 for project pay-net-master: SUCCESS. No commits. December 11, 2012
PaymentsJenkins Build 4169 for project pay-net-master: SUCCESS. No commits. December 12, 2012
PaymentsJenkins Build 4170 for project pay-net-master: SUCCESS. No commits. December 12, 2012

Now, errant automated builds don’t really hurt anyone. But at some point Sizheng decided that we ought to fix it.

Sizheng Chen   2:52 PM
i will pay a decent lunch if anyone can fix the ghost jenkin build issue

So, let’s figure out the issue for this “decent lunch”.

If you take a look at the log of the ghost build, you’ll find something interesting:

Started on Feb 3, 2013 2:40:34 PM
Using strategy: Default
[poll] Last Build : #94
[poll] Last Built Revision: Revision 82846c5e8a046e81c5e20874d2cd767449884304 (origin/develop)
Workspace has a .git repository, but it appears to be corrupt.
No Git repository yet, an initial checkout is required
Done. Took 12 sec
Changes found


It turns out this is a common issue with Jenkins on Windows. According to JENKINS-11547, the main cause is too many jobs making Git requests at once.

Workaround

Configure each Jenkins job with a different auto-polling schedule, so that their Git polls are less likely to overlap.

Here is what I changed:

pay-net-master: Polling SCM: */10 * * * *
pay-net-develop: Polling SCM: */11 * * * *


The ghost builds haven’t occurred since. And Sizheng owes me a lunch.

Jia Cao is a software developer in our Beijing office.


LA Scala Meetup at Hulu

December 13th, 2012 by ben.hardy@hulu.com

Hulu hosted the Los Angeles Scala Users Group’s October 2012 meet up as part of our ongoing support of the local tech community.

Our two speakers were our own Ben Hardy, who gave an introductory presentation on Scala’s Option class, and local Scala authority Paul Snively, who gave a detailed presentation on Slick. Slick is Scala 2.10’s type-safe database access layer, which provides a lightweight alternative to ORM and supports functional manipulation of objects in databases.

Stay tuned for more Hulu tech community events!

Check out the video of the event below.