Hulu Tech Blog

GSAutomation — an open-source iOS test library

April 24th, 2013 by Bao Lei

We all wish testing applications was as much fun as watching Hulu content. Just sit back, watch some Family Guy or The Office, and the app is fully tested. While sometimes this is part of our quality assurance process, most of the time testing is much more mundane. Logging in, logging out, resetting passwords, adding items to your queue, checking watch history, checking search results, rotating the device, making sure you can’t watch Chasing Amy when logged in with a 13-year-old’s account, ensuring correct ad logic… And as tedious as this might be to do on a beautiful new iPad with a Retina display, you have to repeat this process on the iPad 3, and iPad 2, and on the iPad 1. And then a new build comes, and we do it all over again.

Test automation is the obvious solution. On the iOS team we’ve traditionally relied on Apple’s UIAutomation, which is a cool framework that lets you simulate user behaviors (such as taps, scrolling, flicks, rotations, string input, etc) and check UI elements in JavaScript. At first, watching buttons get magically tapped by an invisible ghost navigating throughout the app is impressive. Eventually, however, we found ourselves spending more time maintaining the script than the time saved from the automation process. One reason is that the automation script covers a rather small portion of test scenarios, and the other reason is that the automated tests are sometimes fragile, causing false negatives due to a changed application view layering rather than a legitimate bug.

With the goals of making test scripts easier to write and more reliable to run, we created a library called GSAutomation. (Wondering where the prefix GS is from? Check out last year’s blog post on GSFancyText.) It is an extension/wrapper for UIAutomation, which makes iOS app testers’ and test script developers’ lives a lot easier.

First, GSAutomation lets a script read like the steps a real human tester would take. For example, if you want to tap a button and then check if some label says the right thing, just define a task array like:

task = [
  [Tap, "Button"],
  [Check, "Text I'm expecting", "Text I'm expecting from another label"],
]

And then call performTask(task)

A task array consists of a number of steps. Each step is itself an array starting with the action name, followed by a series of parameters (e.g. for “Tap” it’s the title of a button, for “Check” it’s a list of labels/text views.) The simplicity of this syntax means that it’s no longer a requirement to be a software developer to create a test. Anyone with a text editor and some light education about the syntax can chip in and get their favorite features covered by tests that are running nightly.

But why arrays? Why not just define some helper methods and then make the scripts like

Tap("Button");
Check("label1", "label2");

One reason is of course that in the Cult of Objective-C we think square brackets are more beautiful than parentheses. Another reason is that while iterating over the task arrays, we can handle many instabilities and sharp edges with some common logic, so the scripts are more robust across scenarios. For example, there is always a one-second delay between individual actions. Also, for tapping or checking results, if the element you are looking for doesn’t exist at the beginning, we patiently wait for some more time since we know it might be either network latency or an old device’s slow CPU.

To make tests even more robust, GSAutomation offers a failure rescue mechanism for some actions. For example, when we test the player within the Hulu Plus app, if our script fails to tap the pause button, it might not be because pausing is broken. Instead, the control bar may have auto-hidden after a period of inactivity. So we simply tap the screen center and try the pause button again. The step array here will be:

[Tap, "pauseButton", [TapPoint, screenCenter()]]

The last parameter in this step is the rescue action if the original action failed. Note that in this example, the pauseButton doesn’t have a text title, so we refer to it by the image file name — credit for this flexibility goes to UIAutomation (and more fundamentally, UIAccessibility, which determines how UI elements are referred to within UIAutomation).
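
The retry-and-rescue flow described above can be sketched as a toy model. GSAutomation itself is JavaScript on top of UIAutomation; the Ruby below only illustrates the control flow, and every name in it (run_step, perform_task, MAX_ATTEMPTS, STEP_DELAY, the fake tap actions) is hypothetical rather than part of the library.

```ruby
# Toy model of a task runner: retry each step, then fall back to a rescue step.
# All names here are hypothetical and not taken from GSAutomation.

MAX_ATTEMPTS = 3     # how many times to look for an element before giving up
STEP_DELAY   = 0.0   # the post describes a one-second delay; 0 keeps the sketch fast

# A step is [action, *params]; an Array as the last param is the rescue step.
def run_step(step, actions)
  action, *params = step
  rescue_step = params.last.is_a?(Array) ? params.pop : nil

  MAX_ATTEMPTS.times do
    return true if actions.fetch(action).call(*params)
    sleep STEP_DELAY   # wait for slow network or slow hardware, then retry
  end

  # The action kept failing: run the rescue step once, then try one more time.
  if rescue_step
    run_step(rescue_step, actions)
    return true if actions.fetch(action).call(*params)
  end
  false
end

# Runs every step in order and stops at the first step that fails.
def perform_task(task, actions)
  task.all? do |step|
    ok = run_step(step, actions)
    sleep STEP_DELAY   # fixed pause between individual actions
    ok
  end
end

# Fake UI: the pause button is hidden until the screen is tapped.
controls_visible = false
actions = {
  tap:       ->(name) { name != "pauseButton" || controls_visible },
  tap_point: ->(_point) { controls_visible = true }
}

task = [[:tap, "pauseButton", [:tap_point, :screen_center]]]
puts perform_task(task, actions)
```

Keeping the waiting and rescue logic inside the runner is what lets each step stay a plain array: the person writing the test lists what to do, and the shared loop decides how patient to be.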

Other than this task array based workflow, GSAutomation also provides a list of helper methods to work around the need to use Apple’s long method name convention and make basic jobs much simpler. So for example you can call isPad() to check whether the device is an iPad; you can call win() to get the main window; you can call log("some text") instead of UIALogger.logDebug("some text").

Interested and want to give it a try? Check it out at http://github.com/hulu/gsautomation

The project includes an example with a simple iOS app project and GSAutomation based test scripts. For the full definition of actions, parameters, and ways to reference a UI element, check the README on GitHub.

Bao Lei is a software developer on the mobile team who works on our iOS platform.


Python and Hulu

March 13th, 2013 by Ilya Haykinson

At Hulu, our development teams (and individual developers) have a lot of freedom in their choice of development tools. Our overriding principle is “make the best choice for your project”, which means that we expect you to evaluate tools and platforms that are already used within the company as well as those that are new to us. While sometimes the project’s needs will drive its development team to choose something new, there are a lot of people at Hulu who make the choice of Python.

This year, as part of our commitment to Python, we are helping to sponsor PyCon 2013. Look for a few Hulugans on the conference floor and at our booth too. If you are there, stop by our booth to chat about what kinds of things we do with Python.

We like Python for its speed of development and execution, its diverse libraries, and its readability, which lets new developers get acquainted with a project quickly. We use Python widely, for tasks big and small. The small might include scripts to help with deployment or monitoring, or wrappers around git or other tools. The large includes systems that are core to our application API. Some internal examples are below.

Deejay

At the core of our devices is an application we call Deejay. When our desktop, living room, and mobile apps start up, they connect to Deejay to learn about the Hulu environment they’re connecting to. An iPhone in Japan needs to know to use a different information architecture than an iPhone in the US. The PS3 app will need a different set of icons than an Xbox. In addition to general configuration, the Deejay service is used to help in streaming video.

Obviously, to support these core API needs, the service needs to be very high performance. Python, together with CherryPy, gunicorn, and gevent, more than provides for this.

Ectyper

Python and the Tornado web server are behind the service that we at Hulu use to resize, crop, and otherwise manipulate images that get displayed by our site and by our device apps. We open sourced the core of this service a while ago, and have used it extensively. The service provides a consistent HTTP front-end to some ImageMagick capabilities, and allows an app to request an image of a particular size with a particular effect applied, whatever an app needs at the moment. Given our scale, this service benefits greatly from living behind a cache (whether local or CDN-based).

Sod

There comes a time in every company’s life when devops gets sick and tired of developers saying, “I’d like another machine for my service”. We’d gotten to this point some time ago, and built Sod. This platform is the foundation of our private cloud. It allows any of our developers to create a Xen-based virtual machine (running CentOS, Ubuntu, or Windows), get it up on the correct part of our network, endow it with the desired amount of RAM and drive space, and launch it, all in less than a minute. Whether using a web user interface, an API, or a command-line tool, Sod is our centralized interface to a cluster of Xen nodes. It abstracts the interaction with Xen and handles its peculiarities. It also helps our devops team to manage VM operations like cross-host moves. By integrating with our internal authentication system it also helps us keep track of machine and service ownership throughout our environment. We built Sod with CherryPy, gunicorn, and gevent, and based its data store on MySQL.

Donki

There comes a further time in a company’s life when developers start spinning up VMs to host a one-off web service. The developers want the service to be fault-tolerant — so they spin up multiple instances and get them load-balanced. They want to know how well the service runs, so they build log collection systems. They want to have the service geo-distributed, so they create their own schemes for hosting this in multiple data centers.

Donki was built to make this much simpler. At its core it’s a service for hosting other services. Developers write any WSGI-compliant service, then git push it to Donki, which handles the rest. This includes deployment, pre-release smoke testing, setting up DNS and load balancing, handling data center distribution, log collection, and much more. Donki guarantees that at least 2 instances (or more, depending on load) are always up. Under the covers, Donki uses Sod and other internal devops services to orchestrate its hosting. In fact, the Sod service itself is hosted by Donki. We wrote Donki using Django for the front-end.

Parley

What started off as a Hack Day project for a couple of us has become an important part of many people’s lives at Hulu. After noticing that we pay significant money for a telco phone conferencing service, we hacked on a Django-based system to use the Twilio API to create a phone conferencing service. After a weekend it was a working prototype, and after a few more weeks we launched it to the company. By January of this year, we had 400+ users participating in 1,300+ calls per month, at a fraction of what our telco-based service cost us, and with a great deal of flexibility. Like many other such apps, Parley runs as a Donki-managed service.

There are many more Python projects at Hulu. From small to big, we depend on the language and its ecosystem to drive our business. So it’s with pride that we sponsor this year’s PyCon, and hope to continue to have a wonderful symbiotic relationship with the language, its future advances, and its great influence for years to come.


Ghost Builds on Jenkins hosted on Windows

March 4th, 2013 by Jia Cao

At Hulu we use Jenkins as our continuous integration system. We use it on Windows, Linux, and OS X — depending on the platform of the project.

Recently, we noticed that our Windows build machine exhibited a strange issue. From time to time, some builds were triggered automatically without new commits or any manual operations. We started calling them “ghost builds”. Here’s a snippet of our HipChat log:

PaymentsJenkins Build 4162 for project pay-net-master: SUCCESS. No commits. December 9, 2012
PaymentsJenkins Build 4163 for project pay-net-master: SUCCESS. No commits. December 10, 2012
PaymentsJenkins Build 4164 for project pay-net-master: SUCCESS. No commits. December 10, 2012
PaymentsJenkins Build 4165 for project pay-net-master: SUCCESS. No commits. December 10, 2012
PaymentsJenkins Build 4166 for project pay-net-master: SUCCESS. No commits. December 11, 2012
PaymentsJenkins Build 4167 for project pay-net-master: SUCCESS. No commits. December 11, 2012
PaymentsJenkins Build 4168 for project pay-net-master: SUCCESS. No commits. December 11, 2012
PaymentsJenkins Build 4169 for project pay-net-master: SUCCESS. No commits. December 12, 2012
PaymentsJenkins Build 4170 for project pay-net-master: SUCCESS. No commits. December 12, 2012

Now, errant automated builds don’t really hurt anyone. But at some point Sizheng decided that we ought to fix it.

Sizheng Chen   2:52 PM
i will pay a decent lunch if anyone can fix the ghost jenkin build issue

So, let’s figure out the issue for this “decent lunch”.

If you take a look at the log of the ghost build, you’ll find something interesting:

Started on Feb 3, 2013 2:40:34 PM
Using strategy: Default
[poll] Last Build : #94
[poll] Last Built Revision: Revision 82846c5e8a046e81c5e20874d2cd767449884304 (origin/develop)
Workspace has a .git repository, but it appears to be corrupt.
No Git repository yet, an initial checkout is required
Done. Took 12 sec
Changes found

Reason

It turns out this is a common issue with Jenkins on Windows. According to JENKINS-11547, the main reason is that there are just too many jobs making git requests at once.

Work Around

Configure Jenkins jobs with a different auto-polling schedule, so that they have less of a chance to overlap.

Here is what I changed:

pay-net-master: Polling SCM: */10 * * * *
pay-net-develop: Polling SCM: */11 * * * *
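
To see why the different step values help, here is a small sketch that lists the minutes of each hour at which two jobs would poll together. It assumes standard cron step semantics for the minute field (start at minute 0, then every N minutes); poll_minutes and collisions are hypothetical helper names, not Jenkins APIs.

```ruby
# Minutes of the hour at which a "*/step" cron minute field fires,
# assuming standard cron step semantics (start at 0, then every `step`).
def poll_minutes(step)
  (0..59).step(step).to_a
end

# Minutes at which two jobs with the given steps poll at the same time.
def collisions(step_a, step_b)
  poll_minutes(step_a) & poll_minutes(step_b)
end

puts "both jobs on */10:       #{collisions(10, 10).inspect}"
puts "one on */10, one on */11: #{collisions(10, 11).inspect}"
```

With identical schedules the two jobs collide at every poll, six times an hour. With steps of 10 and 11 they only coincide at minute 0.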

Result

The ghost builds haven’t occurred since. And Sizheng owes me a lunch.

Jia Cao is a software developer in our Beijing office.

LA Scala Meetup at Hulu

December 13th, 2012 by ben.hardy@hulu.com

Hulu hosted the Los Angeles Scala Users Group’s October 2012 meet up as part of our ongoing support of the local tech community.

Our two speakers included our own Ben Hardy, who gave an introductory presentation on Scala’s Option class, and local Scala authority Paul Snively, who gave a detailed presentation on SLICK. SLICK is Scala 2.10’s advanced type-safe database access layer, which provides a lightweight alternative to ORM, and provides functional manipulation techniques for objects in databases.

Stay tuned for more Hulu tech community events!

Check out the video of the event below.




Dominate Dragons with Git

November 26th, 2012 by Jeff Yang

People love git for many reasons. Git is super fast and works offline. Git offers cheap branches and effortless merging. Git has customizable project workflows.

Those reasons are all great. But git means much more than that to me. For most people, git is a tool used for managing source code. These people probably interact with git somewhat frequently. To me, git is a tool used for writing source code. I interact with git all the time.

But wait, you say. Git helps you write code? Don’t you have to write code before you use git? Write logical chunks of code and commit them. That’s how it works, right?

Nope. That’s not how I roll. I commit a lot. I don’t care if my code is logical. I don’t care if it’s hacky, or ugly, or if it isn’t DRY. I don’t even care if my code works. I commit it all anyways.

I wasn’t always like this. Git changed me. Git transformed the way I code. A lot has been written about customizable project workflows in git. This is different. This is a customized personal workflow. My customized personal workflow.

Be a Hero


You are playing a video game. At some point in the game, you encounter a dragon. What do you do when you reach that point? You save. You call it something like “before dragon”. As a rule, you should save before every critical point. You attack the dragon with a sword. Ouch! The dragon roasts you with his fiery breath.


The dragon killed you! What do you do? You restore of course! No loss there. This time, you search around and find a shield to help in battle. Now you successfully block the stream of fire and kill him with the sword. Yay! Now you save again. You do this because if something happens, you can always restore to “killed dragon” so you don’t have to fight the dragon again. Another rule, save after every critical point.


Congratulations! We saved the kingdom! But that poor dragon! Maybe we don’t have to kill him. Let’s try it out! After all, if you aren’t happy with the result, you can always restore to “killed dragon”. First, restore to “before dragon”. Now try negotiating with the dragon. As it turns out, the dragon loves riddles.

If you answer the riddle correctly, he asks the next riddle. If you answer incorrectly… chomp! What do you do? Once again, you save after every riddle. But now you just need to make progress with each question, you don’t need to restore to the previous question. Many games offer a quicksave (and quickload) option. This is a very convenient way to save that overwrites the same save game over and over. So, before every answer you hit quicksave. If you’re wrong, hit quickload. Take a moment to think about how trivial this makes the “negotiation” even if the riddles are very hard.


Congratulations! We saved the kingdom AND the dragon! Think about how great it is to be able to save. You can do anything you want. Explore! Experiment! Be fearless! Don’t hesitate–just do it! Don’t like it? Want to try something different? Restore! Something great happens? Save! When you take advantage of this it doesn’t only save you time, you actually end up playing an entirely different game.

This is so powerful. This is freedom without consequences. It’s like the movie Groundhog Day. In it, Bill Murray picks up all sorts of skills, learning, among other things, French, the piano, and ice sculpting as he repeats the same day over and over. Through multiple iterations he is able to fine tune all of his interactions, allowing him to get any girl, and allowing him to become the town hero.

Imagine if you could have as many do-overs in life as you wanted? How would you approach things differently? You wouldn’t need to worry as much. You wouldn’t need to prepare as much either. You could try crazy things. You could aim for your perfect outcome. Think about how awesome you’d be!

An Example


The real world might not offer you these powers, but real life source control can. In this example, I’ll be using Ruby/Rails. I’m going to write something that converts a date filter into a MySQL WHERE clause.

So basically something like:

equal current_year

would translate to

"BETWEEN '2012-01-01 00:00:00' AND '2012-12-31 23:59:59'"

Let’s start with some scaffolding code.

class Blog
  def self.build_where_clause(date_filter)
    return nil
  end

  class DateFilter
    attr_accessor :operator, :value

    def initialize(operator, value)
      @operator = operator
      @value = value
    end
  end
end

Now add this file and commit it.

git add blog.rb && git commit -m "scaffolding for Blog"

I’ll write some unit tests and add them (code not shown).

git add test.rb && git commit -m "unit tests"

Let’s start working on our build_where_clause method.
Start by adding the possible values.

 class Blog
   def self.build_where_clause(date_filter)
+    case date_filter.value
+    when :current_year
+    when :next_year
+    when :current_quarter
+    when :next_quarter
+    when :current_month
+    end
+
     return nil
   end

Now, I need to actually implement something… remember, before every critical point… commit.

git commit -am "starting implementation"

Quicksave


Let’s implement one case to get a feel for things.

+require 'active_support/core_ext'
+
 class Blog
   def self.build_where_clause(date_filter)
     case date_filter.value
     when :current_year
+      start_date = Date.today.beginning_of_year
+      end_date = Date.today.end_of_year.end_of_day
+
+      case date_filter.operator
+      when :equal
+        return "BETWEEN '#{start_date.to_s(:db)}' AND '#{end_date.to_s(:db)}'"
+      end
+

Here, I make a commit. I’m going to name the commit “stuff”. What a terrible name! Well, I don’t really want to stop and think of a name–that slows me down! I want to code! More on this later.
The test failed. Let’s fix it.

     when :current_year
-      start_date = Date.today.beginning_of_year
+      start_date = Date.today.beginning_of_year.beginning_of_day
       end_date = Date.today.end_of_year.end_of_day

This is really part of the current commit, so I’d like to do a quicksave here. The equivalent in git is:

git commit -a --amend

Let’s finish implementing current_year.

       start_date = Date.today.beginning_of_year.beginning_of_day
       end_date = Date.today.end_of_year.end_of_day
+      db_start_date = start_date.to_s(:db)
+      db_end_date = end_date.to_s(:db)
+
       case date_filter.operator
       when :equal
-        return "BETWEEN '#{start_date.to_s(:db)}' AND '#{end_date.to_s(:db)}'"
+        return "BETWEEN '#{db_start_date}' AND '#{db_end_date}'"
+      when :not_equal
+        return "NOT BETWEEN '#{db_start_date}' AND '#{db_end_date}'"
+      when :less_than
+        return "< '#{db_start_date}'"
+      when :less_than_or_equal
+        return "<= '#{db_end_date}'"
+      when :greater_than
+        return "> '#{db_end_date}'"
+      when :greater_than_or_equal
+        return ">= '#{db_start_date}'"
       end

And quicksave again.

git commit -a --amend

Slay the Dragon


We’ve finished current_year, time to do next_year. I can do this super quick! Copy, paste, change start and end date, DONE!

     when :next_year
+      start_date = (Date.today.beginning_of_year + 1.year).beginning_of_day
+      end_date = start_date.end_of_year.end_of_day
+
+      db_start_date = start_date.to_s(:db)
+      db_end_date = end_date.to_s(:db)
+
+      case date_filter.operator
+      when :equal
+        return "BETWEEN '#{db_start_date}' AND '#{db_end_date}'"
+      when :not_equal
+        return "NOT BETWEEN '#{db_start_date}' AND '#{db_end_date}'"
+      when :less_than
+        return "< '#{db_start_date}'"
+      when :less_than_or_equal
+        return "<= '#{db_end_date}'"
+      when :greater_than
+        return "> '#{db_end_date}'"
+      when :greater_than_or_equal
+        return ">= '#{db_start_date}'"
+      end
+

This works and was really easy to do, but now I obviously need to refactor it. This isn’t going to be quite as simple as the above implementation. I’ll equate this to fighting the dragon in our analogy. Therefore I’ll create a “before refactor” commit. Then I start fighting… er, coding.

 class Blog
   def self.build_where_clause(date_filter)
-      [deleted code]
+    start_date = get_start_date(date_filter.value)
+    end_date = get_end_date(date_filter.value)
+    return nil if start_date.nil? || end_date.nil?
       db_start_date = start_date.to_s(:db)
       db_end_date = end_date.to_s(:db)
@@ -25,33 +24,6 @@ class Blog
         return ">= '#{db_start_date}'"
       end
-      [deleted code]
     return nil
   end

git commit -am "stuff"

+  def self.get_start_date(value)
+    case value
+    when :current_year
+      return Date.today.beginning_of_year.beginning_of_day
+    when :next_year
+      return (Date.today.beginning_of_year + 1.year).beginning_of_day
+    when :current_quarter
+    when :next_quarter
+    when :current_month
+    end
+
+    return nil
+  end

git commit -am "stuff"

+  def self.get_end_date(value)
+    case value
+    when :current_year
+      return Date.today.end_of_year.end_of_day
+    when :next_year
+      return (Date.today + 1.year).end_of_year.end_of_day
+    when :current_quarter
+    when :next_quarter
+    when :current_month
+    end
+
+    return nil
+  end

git commit -am "stuff"

Friend the Dragon


This works, but I don’t like it. There is repetitive code and the start and end date calculations should really be treated as one unit. It’s a mess. I want to start over. With a project of this size, you could go either way, forward or back. But I’m sure you’ve encountered hairy refactoring scenarios with multiple files, and you just want to rethink it and start all over again. Often, I end up restarting the refactoring with a completely different perspective. What you might be tempted to do is hit undo many many times, and maybe redo a few times if you went back too much. You have to remember how much to undo and you have to remember which files to undo. Not this time! Instead of undo we will restore to the exact spot I want to start from.

To look at my log I type:

git log --oneline master..HEAD
5d66c8c stuff
5290a6a stuff
fd0173a stuff
4f1031b before refactor
e71ed80 stuff
606ca1b starting implementation
5d28af8 unit tests
36efc45 scaffolding for Blog

To restore my “before refactor” commit I type:

git reset --hard HEAD~3

This tells git to reset to 3 commits before the HEAD commit. You could also specify the hash like so:

git reset --hard 4f1031b

If this is an alternative that you might come back to, you can type git tag experiment1 before you do the reset to “save” the current commit as a tag called experiment1. Very quick and easy.

Side note: git has something called the reflog. Every change you make, every commit, every reset, it’s all recorded in the reflog. It’s kind of like git log, but it shows your change history rather than your repository history. As long as you’ve committed your code, it’s very hard to lose it. For example, what if you did want to tag the commit but you did the reset already? Or what if you typed git reset --hard HEAD~4 by accident? Try it out with “git reflog” or “git log -g”.

Now, let’s try to refactor this code in a different way.

 class Blog
   def self.build_where_clause(date_filter)
-      [deleted code]
+    start_date, end_date = get_date_range(date_filter.value)
+    return nil if start_date.nil? || end_date.nil?
       db_start_date = start_date.to_s(:db)
       db_end_date = end_date.to_s(:db)
@@ -25,33 +23,6 @@ class Blog
         return ">= '#{db_start_date}'"
       end
-      [deleted code]
     return nil
   end

git commit -am "stuff"

+  def self.get_date_range(value)
+    today = Date.today
+
+    case value
+    when :current_year
+      start_date = today.beginning_of_year
+      end_date = today.end_of_year
+
+    when :next_year
+      start_date = today.beginning_of_year + 1.year
+      end_date = start_date.end_of_year
+
+    when :current_quarter
+    when :next_quarter
+    when :current_month
+    end
+
+    start_date = start_date.beginning_of_day if !start_date.nil?
+    end_date = end_date.end_of_day if !end_date.nil?
+
+    return start_date, end_date
+  end

git commit -am "stuff"

Show Off!


Great! We are done refactoring! Let’s implement current_quarter.

     when :current_quarter
+      current_quarter = today.month / 3
+      start_date = Date.new(today.year, (current_qurter * 3) + 1, 1)
+      end_date = start_date + 3.months - 1.day
+

A coworker stops by and wants to see a demo of my code. But my code isn’t working! Let’s even assume that you don’t trust the refactored code yet. Not a problem!

Create a show_off branch:

git checkout -b show_off

Reset to “before refactor”:

git reset --hard 4f1031b

Show off my code.

Go back to dev branch:

git checkout dev

Isn’t this cool? We can jump anywhere we want in the code and then jump right back to where we left off! If you prefer, you can also use tags.

Track Down Bugs


Back to current_quarter. Something is failing. I spot a typo and fix it.

-      start_date = Date.new(today.year, (current_qurter * 3) + 1, 1)
+      start_date = Date.new(today.year, (current_quarter * 3) + 1, 1)

git commit -a --amend

My unit tests are failing. Let’s put in some debugging code and fix the problem. Here is my diff.

   def self.get_date_range(value)
     today = Date.today
+    puts "today = #{today}"
     case value
     when :current_year
@@ -39,10 +40,16 @@ class Blog
       end_date = start_date.end_of_year
     when :current_quarter
-      current_quarter = today.month / 3
+      current_quarter = (today.month - 1) / 3
       start_date = Date.new(today.year, (current_quarter * 3) + 1, 1)
       end_date = start_date + 3.months - 1.day
+      puts "current_quarter = #{current_quarter}"
+      test_var = current_quarter * 3
+      puts "test_var = #{test_var}"
+      puts "start_date = #{start_date}"
+      puts "end_date = #{end_date}"
+
     when :next_quarter
     when :current_month
     end
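
The one-line fix in that diff is an off-by-one in the quarter arithmetic. The sketch below prints both expressions for every month so the difference is visible. It uses only Ruby's standard library instead of ActiveSupport, and quarter_index and quarter_range are hypothetical helper names, not part of the code in this post.

```ruby
require 'date'

# Zero-based quarter index for a month (0 = Jan-Mar, ..., 3 = Oct-Dec).
def quarter_index(month)
  (month - 1) / 3
end

# First and last day of the quarter containing `today`, using only the
# standard library (the post uses ActiveSupport's 3.months and 1.day).
def quarter_range(today)
  start_date = Date.new(today.year, quarter_index(today.month) * 3 + 1, 1)
  end_date = (start_date >> 3) - 1   # >> 3 adds three months, - 1 subtracts a day
  [start_date, end_date]
end

# Compare the original expression with the fixed one for every month.
(1..12).each do |month|
  buggy = month / 3
  fixed = quarter_index(month)
  flag = buggy == fixed ? "" : "  <- differs"
  puts format("month %2d: month / 3 = %d, (month - 1) / 3 = %d%s", month, buggy, fixed, flag)
end
```

The two expressions disagree in March, June, September, and December, the last month of each quarter. In December the original expression gives 4, so the start month becomes 13, which Date.new rejects as an invalid date.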

Take Advantage of Diffs


Look at that last diff. It’s quite clear which part is the fix and which part is not. Even though I have debugging code in multiple places, I don’t have to remember where it is. All I have to do is pull up this diff. Now, when I encounter an issue, I am free to throw whatever I can think of at the problem. I can add debugging code all over the place, in multiple files. I can delete whole chunks of code or short circuit it to quickly get to the meat of the problem. I don’t have to worry about what I changed or about screwing anything up. In this case, I commit the one line and throw away the rest. Sometimes I will even create a debugging code commit so that I can leave it in while working. Later, I can remove that commit in one clean stroke without having to remember any of what I did.

Diffs are a central part of my workflow. I always have my diffs pulled up on my screen. We’ve already seen how useful they are for filtering out debug code. Diffs are great for spotting bad code in general. You essentially get to code review your code before every commit and it’s all in digestible chunks. I’ve caught many potential bugs this way. You can reduce your mistakes drastically by having a habit of checking your diffs.

More importantly, the diff allows git to keep context for me. At any given point I have a current train of thought. The diff highlights my current train of thought for me. There was a typo that I fixed in my code above. Notice how in the previous diff I don’t even see that? I already fixed it! I don’t need to think about it anymore. In this way, I keep my brain free of baggage. Every piece of code in this post is in the form of a diff because it’s such a great way to display your current context. You know exactly what I’m thinking and exactly what changed.

Squash!


Let’s go back to the code.

While I’m working, I notice that there are some cases that I missed in my unit tests. I fix that and create a commit.

git commit -am "squash with unit tests"

I don’t want to deal with squashing it right now, I’m in the middle of something. I leave the commit there. More on squashing in a bit.

I finish the rest of the work and commit it.

git commit -am "finish rest of work"

Now I review the log of all my work so far.

git log --oneline master..HEAD
ebf3815 finish rest of work
99d4ce5 squash with unit tests
7ef5312 stuff
ddd0260 stuff
2e0a5b3 stuff
4f1031b before refactor
e71ed80 stuff
606ca1b starting implementation
5d28af8 unit tests
36efc45 scaffolding for Blog

We’ve created a bunch of commits to facilitate our work. Some of those commits have logic split up across several commits, some are missteps, some are incomplete, and some leave you in a non-working state. We need this while working but later on nobody cares how you got here. They care about being able to view (and roll back) changes in logically consistent and atomic operations. You want to clean up your history so everything is clear and easy to understand. To do this you combine your commits together. Git calls this squashing. The steps are:

  1. Create an interactive rebase
  2. Reorder your commits
  3. Choose which commits to squash

Let’s start the interactive rebase:

git rebase -i master

This shows you the following (note this is in reverse order from git log):

pick 36efc45 scaffolding for Blog
pick 5d28af8 unit tests
pick 606ca1b starting implementation
pick e71ed80 stuff
pick 4f1031b before refactor
pick 2e0a5b3 stuff
pick ddd0260 stuff
pick 7ef5312 stuff
pick 99d4ce5 squash with unit tests
pick ebf3815 finish rest of work

Now we reorder our commits. Move the “squash with unit tests” commit after “unit tests”. Then we choose which commits to squash by changing the “pick” to “squash”.

pick 36efc45 scaffolding for Blog
pick 5d28af8 unit tests
squash 99d4ce5 squash with unit tests
pick 606ca1b starting implementation
squash e71ed80 stuff
squash 4f1031b before refactor
squash 2e0a5b3 stuff
squash ddd0260 stuff
squash 7ef5312 stuff
pick ebf3815 finish rest of work

What squashing does is combine the commits together. We are moving the unit test commit next to the other unit test commit and combining them together. We are combining the implementation commits together.

Here is the log after we are done (once again in reverse order from above):

git log --oneline master..HEAD
81c0339 finish rest of work
6cd3185 starting implementation
27ed5cc unit tests
36efc45 scaffolding for Blog

This seems like a lot of extra work. Or is it?

Sort Cards Efficiently


You need to sort a deck of cards. You pick up the deck in your right hand and go through each card, moving it to your left hand in the correct place. It takes a while and when you are done you have a sorted deck of cards in your left hand.

Reset to before sort. You pick up the deck and toss the cards on the family room floor. You spread them out, sliding them to the correct positions. When all the cards are spread out you clean them all up into one deck at the end. Yes, you have to clean up the deck at the end but this is MUCH faster. You are optimizing for the actions that occur most frequently (sorting) rather than the one action (keeping the deck orderly).

I LOVE this. You are trading space for time. In the real world you have to handle garbage collection yourself but often that cost is negligible. This is a life hack. I use it EVERYWHERE. Folding laundry. Building a crib. Planning a trip. Browsing the web. This is an entire blog post by itself.

In exchange for speed while coding I use a lot of extra commits. At the end I garbage collect that and figure out which commits belong where. You can adjust depending on your needs. The better you name your commits, the easier this part is. I find that thinking about names slows me down so I try to spend as little time on naming as I can get away with.

Git Gives You Freedom


Git gives you the freedom to work the way you want to work.

  1. Freedom from Baggage. Baggage is all the stuff you try to keep track of in your head. You can only remember so many things. Baggage takes time to retrieve. Baggage causes stress. Baggage makes you look back. Get baggage out of your head and into git. You don’t have to keep track of your code or remember your context anymore.

  2. Freedom from Fear. Fear holds you back, makes you slow, and causes hesitation before you act. Don’t be afraid of forgetting things. Don’t be afraid of moving forward or doing too much. You can easily jump back to any previous state in an instant. It doesn’t matter which path you took, or how much code you’ve changed since then. You’re free to do anything you want once you’ve saved your commit. You can forge ahead and back out without fear of consequences.

High Quality, High Productivity


Ultimately our goal is to write high quality code and be super productive. We want to get stuff done, get it done fast, and make it great.

  1. Focus. Freedom gives you focus. You don’t have to remember, you don’t have to think, you don’t have to worry. Your head should be clear, focused, and streamlined. In my example, every step was focused and concise, even if it was incorrect. Focus helps with quality because you are only concentrating on one thing and git only shows this one thing. With only one thing to think about, bad code pops out at you. Focus lets you be productive because you’re not wasting effort on other things and because it’s hard to become distracted.

  2. Speed. Freedom gives you speed. When you don’t have to do extra work, or think about other things, you can work faster. When you aren’t afraid to act, when you aren’t afraid to try, there is no need to hesitate. It’s a given that speed can help your productivity. What about quality? Speed is awesome for quality. Fast iterations are so powerful. We solved the dragon’s riddles easily. We tried both ways of dealing with the dragon and chose the best option. Iteration gives you the ability to tweak. Be aggressive. Improve your code, improve your product.

In the beginning of this post I asked how you would approach things differently if you could have do-overs in life. This is what I chose to do with git. What are you going to do? Explore how you can use git to tailor a workflow that fits you.

Jeff Yang is a software developer on the Ad Platform team at Hulu.

Edits:
Explained HEAD~3 better.

Simple Service Monitoring with Canary

August 2nd, 2012 by Eric Buehl

Photo credit to Michael Sonnabend

During one of our first internal hackathons at Hulu we set out to redo a portion of our service monitoring tools. Within two days we had completed most of the functionality for a solution we call “Canary” — named after the miner’s canary. Canaries were once used during mining operations to alert miners (by the absence of singing) to the presence of invisible yet deadly gasses. Similarly, the canary service sniffs out potential problems and alerts the appropriate parties.

While Hulu has a multitude of other monitoring systems for machine, application, and overall system health, Canary provides a last-resort notification if a service instance dies for any reason — ranging from application crashes, to machine lockups, to network issues.

Design and Implementation:

The design goals for canary were as follows:

  • Make it dead simple to integrate
  • Zero configuration for service owners
  • Consume as little host resources as possible
  • Centralize all notifications so they can be routed and filtered to the most appropriate owner

Canary is implemented in two parts. The first is a few lines of code in each participating service. Here is the Python version in its entirety:

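import socket

# PORT is the UDP port the canary server listens on; its value is not given in this post.
def heartbeat(identifier, hostname=socket.gethostname()):
    message = "HEARTBEAT_" + hostname + "_" + identifier
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    # "<broadcast>" is Python's name for the IPv4 broadcast address
    sock.sendto(message.encode(), ("<broadcast>", PORT))
    sock.close()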
Participating services proactively send periodic undirected “heartbeats” onto the network using a UDP broadcast. Each heartbeat contains host and service-identifying information. They can be sent at arbitrary points within critical event loops similar to a watchdog timer or in a separate thread as long as they are sent frequently enough.

The second part is the canary server. This server listens for heartbeat packets on the network and notifies someone over email when they stop being received. When a new heartbeat is seen for the first time, the server expects to receive periodic updates matching the same identifying string or else an alert is fired.
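
The server's bookkeeping can be sketched roughly as follows. This is a minimal illustration, not Canary's actual code: the class name HeartbeatTracker, the 60-second timeout, and the host names are assumptions, and it is written in Ruby only to match the other examples on this page. A real server would call record for every UDP packet it receives and send an email for each key returned by expired.

```ruby
# Minimal sketch of heartbeat bookkeeping (hypothetical names, not Canary's real code).
class HeartbeatTracker
  def initialize(timeout_seconds)
    @timeout = timeout_seconds
    @last_seen = {}   # heartbeat string => time it was last received
  end

  # Call for every packet received; the first sighting registers the sender.
  def record(message, now = Time.now)
    return unless message.start_with?("HEARTBEAT_")
    @last_seen[message] = now
  end

  # Heartbeats that have been silent for longer than the timeout.
  # Each one is reported once, then forgotten until it is seen again.
  def expired(now = Time.now)
    dead = @last_seen.select { |_, seen| now - seen > @timeout }.keys
    dead.each { |key| @last_seen.delete(key) }
    dead
  end
end

tracker = HeartbeatTracker.new(60)
start = Time.at(0)
tracker.record("HEARTBEAT_web01_cage", start)
tracker.record("HEARTBEAT_web02_cage", start)
tracker.record("HEARTBEAT_web01_cage", start + 50)
silent = tracker.expired(start + 90)   # web02 has been quiet for 90 seconds
puts silent
```

Keying on the full heartbeat string mirrors the description above: the server needs no per-service configuration, because the first heartbeat it sees is what registers a sender.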

Because broadcasts are used, only machines on the same network segment can monitor for heartbeats; however, routing tricks can be employed to ensure that all heartbeats are collected in a central location. We relay heartbeats between networks by converting broadcast packets to unicast packets, using two methods: Cisco’s “ip helper-address” option applied to a routing interface, and an active relay that listens within each network of a zone.

Canary doesn’t have to be limited to services as it can be anything that should be constantly running: cron jobs, backups, etc. A simple future enhancement would be to support arbitrary per-instance expected arrival times. More important services might want to have a shorter timeout period while others — like a backup job — may only be expected to heartbeat once a day.

Eric Buehl is a software developer in the DevOps team at Hulu.

Automating System Provisioning and Application Deployment With Chef

July 6th, 2012 by Steve Jang

Chef is an open source configuration management and system integration framework that is generating a lot of interest and momentum these days. At Hulu, we are also actively investing in Chef infrastructure to simplify operations.

Among the many benefits that Chef provides, I am personally intrigued by the possibility of unifying “server provisioning” and “application deployment” under a single framework. Traditionally, server provisioning has been the territory of system administrators (a.k.a. the ops team), while software developers usually guide application deployment.

This division is a natural one, since correctly provisioning a server or an entire infrastructure often involves fairly different skill sets and knowledge from those of software development. Nonetheless, the requirements for how applications are installed and configured come from software developers. This division is unfortunate, since it invites developers to be one layer removed from actual production considerations. This often results in an application working perfectly in the development environment, but breaking once it is deployed to production.

There is no silver bullet for this problem, but bringing system provisioning and application deployment under the same framework is a step in the right direction. This creates an environment in which developers and operations can collaborate closely towards a common goal: completely automating the process of bringing up a functional application server starting from bare metal hardware.

System Provisioning

An automated provisioning process should be managed under source control, just like any application code. It should be deterministic, repeatable, flexible, well-organized and predictably convergent. Here are some of the Chef features and characteristics that can help implement such an automated process.

1) Chef recipes, roles and “run lists” define the order of configuration changes deterministically. They are just Ruby code, and they will execute in the same sequence of operations every time they run.

2) Chef attributes can be overridden at multiple levels of organization, and you can normalize your configuration items (e.g. node attributes in Chef) into cookbook, environment, role or node defaults and overrides. Here is the actual node attribute precedence from low to high:

cookbook default < environment default < role default < node default
    < cookbook set < node set < cookbook override < role override
              < environment override < node override
By using this precedence rule, you can configure node attributes across an entire Chef environment with a single configuration change, or override one specific node’s attribute without making changes to the rest of the environment.
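
To make the precedence chain concrete, here is a small sketch in plain Ruby. It is not Chef's implementation (Chef also deep-merges nested attributes); the resolve helper and the revision values are made up for illustration. It only shows that, for a single attribute, the value set at the highest level wins.

```ruby
# Precedence levels from lowest to highest, as listed above.
PRECEDENCE = [
  "cookbook default", "environment default", "role default", "node default",
  "cookbook set", "node set",
  "cookbook override", "role override", "environment override", "node override"
].freeze

# Given { level => value } for one attribute, return the winning [level, value],
# or nil if the attribute is not set anywhere.
def resolve(values_by_level)
  level = PRECEDENCE.reverse.find { |name| values_by_level.key?(name) }
  level && [level, values_by_level[level]]
end

# Hypothetical values for a single attribute set at three different levels.
revision = resolve(
  "cookbook default"     => "master",
  "role default"         => "v1.3.0",
  "environment override" => "v1.4.2"
)
puts revision.inspect
```

Here the environment override beats both defaults, which is what lets one environment-level change apply to every node in that environment.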

3) Chef provides search capability in the form of a Ruby API as well as a web service. This allows your recipes to query the server and make configuration decisions based on the query result. With this feature, you can write your recipes in such a way that every node in your infrastructure can automatically converge towards the correct configuration for the environment to which it belongs.

The above features are only a subset of the capabilities provided by the Chef framework. There are yet other benefits, such as an active community of experienced system administrators and developers. But enough talk about provisioning-related benefits: let’s examine how you might integrate your application deployment with Chef!

Application Deployment: Ruby on Rails Example

If you work on web applications, you often deploy your applications straight from source code without going through a build process or packaging system. Chef supports such deployment scenarios very well. Let’s go over a Ruby on Rails deployment scenario to make this more concrete.

Once you have your Ruby environment set up, Rails scaffolding makes getting up and running with Ruby on Rails a breeze. However, when you are ready to deploy your Rails application to production there are many questions that need to be answered:

  1. How will you install the version of Ruby necessary for your Rails application?
  2. How do you create your application’s service account?
  3. How do you set up data and log directories? What about log rotation?
  4. How do you set up a reverse proxy, such as nginx?
  5. How do you start/stop/monitor your Rails application process?

Let’s look at how you can use Chef to address these questions! We will use an application called “cage” as a convenient example below. Cage is a Rails application I wrote for the Hulu Hackathon in March 2012.

Quick Introduction to Chef Concepts

Before we dive into details around the recipe for deploying a Rails application, let’s review a minimum set of basic Chef concepts.

The Chef server is the repository of infrastructure configuration information. It is basically a data storage backend and a corresponding web service front end that provides RESTful APIs for Chef clients to access configuration information about the node that the client is running on. A node maps to a machine in your infrastructure, and every node belongs to a Chef environment. You can organize your infrastructure into multiple Chef environments for management purposes. For example, you might have production, staging, and QA Chef environments.

A cookbook is a collection of related recipes, attributes, and template files, which collectively describe how a software package is to be installed and configured. Recipes consist of resources, which represent the smallest unit of configuration activity. For example, a “user” resource represents a local user account to be provisioned according to resource attributes, such as user ID, group ID and home directory. Cookbooks are uploaded to the Chef server to represent infrastructure components that are available.

Each node in Chef has a run list that lists the recipes and roles assigned to it, in order of execution. A role consists of other roles and recipes that should be executed. Hence, you can compose a new application role based on existing roles and recipes. For example, an application role may consist of an nginx recipe, a rails recipe, and a log rotation recipe.
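
The way roles nest inside run lists can be sketched in plain Ruby. This is an illustration of the idea rather than Chef's own expansion code; the ROLES table, the expand helper, and the recipe names "users" and "ntp" are hypothetical.

```ruby
# Hypothetical role definitions: a role's run list may name recipes or other roles.
ROLES = {
  "hulu-common" => ["recipe[users]", "recipe[ntp]"],
  "cage"        => ["role[hulu-common]", "recipe[cage]"]
}.freeze

# Flatten a run list into an ordered list of recipe names, expanding roles in place
# and keeping only the first occurrence of each recipe.
def expand(run_list, roles = ROLES)
  run_list.flat_map do |item|
    if (match = item.match(/\Arole\[(.+)\]\z/))
      expand(roles.fetch(match[1]), roles)
    else
      [item[/\Arecipe\[(.+)\]\z/, 1] || item]
    end
  end.uniq
end

recipes = expand(["role[cage]"])
puts recipes.inspect
```

Because a role expands in place, the order of entries in the run list determines the order in which recipes execute.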

A data bag stores arbitrary information about the infrastructure in a nested hash structure. Just like any other Chef object, it can be accessed via the RESTful API. A data bag does not belong to a specific Chef environment, so it should be used to store truly global configuration items. You can also encrypt a data bag to store sensitive information that you need to keep out of your source code repository.

You can read about these concepts in more detail on the Opscode community wiki.

Cage Cookbook Directory Structure

Here is my cookbook structure for cage (inside Hulu’s Chef git repo).

cage
├── attributes
│   └── default.rb                 # default cookbook attributes
├── metadata.rb                    # cookbook definition
├── README.rdoc
├── recipes
│   ├── default.rb                 # includes service, deploy, nginx, runit recipes
│   ├── deploy.rb                  # cage::deploy recipe contains deploy_revision resource
│   ├── nginx.rb                   # cage::nginx recipe
│   ├── runit.rb                   # cage::runit recipe (= process init/monitoring)
│   └── service.rb                 # cage::service recipe
└── templates
    └── default
        ├── logrotate.conf.erb     # logrotate configuration template
        ├── nginx.erb              # nginx configuration template
        ├── service.yml.erb        # application configuration template
        ├── sv-cage-log-run.erb    # runit svlogd “run” file
        └── sv-cage-run.erb        # runit “run” file
As you can see, rather than having one default recipe file, I broke up the recipe into four small pieces, and included them in the main recipe file, recipes/default.rb.

Ruby Installation

If you are running on an Ubuntu platform (e.g. Ubuntu 10.04 LTS), you may not have the latest Ruby package in the official Ubuntu apt repository. For example, Cage was developed as a Rails 3.2 application running on Ruby 1.9.3. Also, I would like my application’s Ruby installation to be separate from the system Ruby installation, if there is one.

For this, we have an internal Debian package called hulu-ruby19, which installs a clean copy of Ruby into the /opt/hulu/ruby-1.9.3 directory. This package also contains the latest RubyGems as well as the latest bundler gem. The idea is that once this package is installed, any Ruby gems can be managed by bundler from there on. We also have an internal Debian repository that serves this package. So, in my Chef recipe, I just need to add the following line at the top of the cage/recipes/deploy.rb file.

package 'hulu-ruby19'

Application Account Creation

It is usually a bad idea to run your applications as root. At Hulu, our applications run under application-specific machine-local accounts, and each account needs to be created. For Cage, I add the following snippet to the JSON file that defines the service account data bag.

"cage": { "uid": 975, "gid": 975, "shell" : "/bin/false", "home": "/tmp", "system": true},

We store our accounts in a data bag, and create them consistently across all Linux boxes. In this case, we are creating an account called “cage” with a UID/GID of 975. This is a “system” account that my application will run under, and it has no interactive login credentials. Once you have this data bag item, it’s trivial to map the data to the corresponding user resource in the Chef DSL.

Application Directories

Now that we have the two external dependencies taken care of (Ruby installation and user account creation), let’s examine how Cage is installed. All files used by our application are installed into /opt/hulu/cage directory, which is laid out as follows.


$ tree -L 3 /opt/hulu/cage
/opt/hulu/cage
├── current -> /opt/hulu/cage/releases/5ed123d91ee74210341765d76271d314fe58f3d0
├── releases
│   ├── 5ed123d91ee74210341765d76271d314fe58f3d0
│   │   ├── app
│   │   ├── conf -> /opt/hulu/cage/shared/conf
│   │   ├── config
│   │   ├── config.ru
│   │   ├── data -> /opt/hulu/cage/shared/data
│   │   ├── db
│   │   ├── doc
│   │   ├── Gemfile
│   │   ├── Gemfile.lock
│   │   ├── lib
│   │   ├── log -> /opt/hulu/cage/shared/log
│   │   ├── public
│   │   ├── Rakefile
│   │   ├── README.rdoc
│   │   ├── script
│   │   ├── test
│   │   ├── tmp -> /opt/hulu/cage/shared/tmp
│   │   └── vendor
│   └── f419316f9bc63bc20ee7f348f7519461771916ee
│       └── ...
└── shared
    ├── ...
    └── vendor_bundle
First, specific snapshots of the application repository are installed into the /opt/hulu/cage/releases/<revision> directory. Then, symbolic links are created into the /opt/hulu/cage/shared directory, which survives across application deployments. Finally, the /opt/hulu/cage/current symbolic link is switched to point to the latest release (in this case the 5ed123d91ee74210341765d76271d314fe58f3d0 directory). We use symbolic links here because Ruby on Rails assumes that it can write to certain subdirectories of its installation root. For example, Rails will create log files under the Rails.root + “/log” directory.

If you have used Capistrano before, this process should be familiar. This style of deployment was directly ported from Capistrano, and is available as a Chef resource called “deploy_revision”. Let’s examine the use of the deploy_revision resource inside cage/recipes/deploy.rb below.

CAGE_SERVICE_ROOT = '/opt/hulu/cage'
deploy_revision CAGE_SERVICE_ROOT do
  deploy_to         CAGE_SERVICE_ROOT
  repo              'ssh://hulu-internal-git-repository/repos/cage.git'
  ssh_wrapper       "/home/chefclient/bin/gitssh"
  revision          node['cage']['revision_tag']
  action            node['cage']['release_action']
  shallow_clone     true
  enable_submodules true
  migrate           false
  environment       "RAILS_ENV" => node['cage']['rails_env']
  purge_before_symlink %w{conf data log tmp public/system public/assets}
  create_dirs_before_symlink []
  symlinks(                        # the arrow is sort of reversed:
    "conf"   => "conf",            # current/conf          -> shared/conf
    "data"   => "data",            # current/data          -> shared/data
    "log"    => "log",             # current/log           -> shared/log
    "tmp"    => "tmp",             # current/tmp           -> shared/tmp
    "system" => "public/system",   # current/public/system -> shared/system
    "assets" => "public/assets"    # current/public/assets -> shared/assets
  )
  before_restart do
    Dir.chdir '/opt/hulu/cage/current'
    system("/opt/hulu/ruby-1.9.3/bin/bundle install") or raise "bundle install failed"
    system("RAILS_ENV=#{node.cage.rails_env} /opt/hulu/ruby-1.9.3/bin/rake assets:precompile")
  end
  notifies :restart, "service[cage]"
  notifies :restart, "service[nginx]"
end
The above recipe code does the following:

1) Check out the revision specified as node attribute node['cage']['revision_tag'] from the git repository specified by “repo”, using the ssh_wrapper in chefclient user’s home directory. This wrapper is used to specify the necessary credentials to ssh when accessing git repository. Here is the content of /home/chefclient/bin/gitssh.

#!/bin/sh
exec ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i "/home/chefclient/.ssh/id_rsa" "$@"
The -o options given in the ssh command line above are used to avoid interactive warnings like:
The authenticity of host 'hulu-internal-git-repository (10.12.49.133)' can't be established.
RSA key fingerprint is 89:62:0f:9c:78:d6:f5:ce:e6:b8:36:38:e4:c7:d4:f0.
Are you sure you want to continue connecting (yes/no)?
The -i option specifies the private key to be used for accessing the git repository.

Notice that the node['cage']['revision_tag'] attribute can be overridden at the Chef environment level (e.g. “production” or “staging” or “qa”) to specify the revision to be deployed across all machines that belong to each Chef environment.

2) The node['cage']['release_action'] attribute may be “deploy”, “force_deploy” or “rollback”. The Chef client will examine the currently deployed revision, and will bail out if it determines that this revision is already deployed (i.e. that the target release tag is the current revision tag). The “force_deploy” option is a convenient way to override this behavior, allowing a full redeployment of the application. This is particularly handy if you are debugging the Chef recipe that performs deployment.

3) Other interesting resource attributes are “purge_before_symlink” and “symlinks”. These attributes specify symbolic links to be created from /opt/hulu/cage/releases/<revision>/ directories to /opt/hulu/cage/shared directory.

Note that the direction of the arrows in the “symlinks” attribute appears to be reversed. “system” => “public/system” means to create a symbolic link from “/opt/hulu/cage/releases/<revision>/public/system” to the /opt/hulu/cage/shared/system directory.
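
Since the direction is easy to get wrong, here is a short plain-Ruby sketch that spells out which path is the link and which is the target for each entry. It only computes paths and does not touch the filesystem; the planned_links helper is hypothetical and not part of Chef.

```ruby
# Expand a "symlinks" hash into { link path => target path }.
# Key: path under shared/ (the target). Value: path inside the release (the link).
def planned_links(deploy_root, revision, symlinks)
  symlinks.each_with_object({}) do |(shared_path, release_path), links|
    link   = File.join(deploy_root, "releases", revision, release_path)
    target = File.join(deploy_root, "shared", shared_path)
    links[link] = target
  end
end

links = planned_links(
  "/opt/hulu/cage",
  "5ed123d91ee74210341765d76271d314fe58f3d0",
  "log" => "log", "system" => "public/system"
)
links.each { |link, target| puts "#{link} -> #{target}" }
```

Reading each hash entry as "shared path => release path" keeps the mapping straight: the key names what survives across deployments, and the value names where the release expects to find it.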

4) There are several hooks that allow you to inject some custom actions during deployment. In this example, we are using the “before_restart” hook to precompile Rails assets (under Rails.root + “/app/assets” directory in source tree) before restarting the service. This is required if your Rails application is running in “production” mode.

5) Finally, notice the “notifies” attributes at the bottom of the resource definition. Notification resources in Chef are used to delay certain actions until prerequisites are satisfied. Elsewhere in our “cage” recipe, we defined the cage service resource and the nginx service resource. Here, we are notifying those resources to “restart” themselves once deployment is finished.

Nginx Set Up

You can use the community nginx cookbook to set up nginx fairly easily. In our example, I wanted to show how you can serve up your application from port 80, which is by default assigned to the default nginx site. Here is what our recipes/nginx.rb looks like:

template "#{node.nginx.dir}/sites-available/default" do
  source "nginx.erb"
  owner "root"
  group "root"
  mode 0644
  variables(
    :service_name     => "cage",
    :service_port     => "80",
    :worker_port      => "8800",
    :nginx_access_log => "/opt/hulu/cage/shared/log/nginx-access.log",
    :nginx_error_log  => "/opt/hulu/cage/shared/log/nginx-error.log"
  )
end
nginx_site 'default' do
  options(:enable => true)
end
Here is what cage/templates/default/nginx.erb looks like:
server {
  listen <%= @service_port -%>;
  server_name <%= @service_name -%>;
  access_log <%= @nginx_access_log -%>;
  error_log <%= @nginx_error_log -%> warn;
  location / {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass http://localhost:<%= @worker_port -%>/;
  }
}
This will set up /etc/nginx/sites-available/default by instantiating the ERB template, and enable the nginx default site that will proxy for our Cage application.
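
If you want to see what the template produces without running Chef, Ruby's standard ERB library can render it directly. The sketch below is an approximation under stated assumptions: TemplateContext is a hypothetical stand-in for Chef's template rendering, and the template is an abbreviated copy of the one above. It relies on the same convention the template uses, where each entry in “variables” is read as an instance variable.

```ruby
require "erb"

# Abbreviated copy of the nginx template; the "-%>" tags require trim_mode "-".
TEMPLATE = <<~NGINX
  server {
    listen <%= @service_port -%>;
    server_name <%= @service_name -%>;
    location / {
      proxy_pass http://localhost:<%= @worker_port -%>/;
    }
  }
NGINX

# Hypothetical stand-in for Chef's template context:
# every variable becomes an instance variable visible to the template.
class TemplateContext
  def initialize(variables)
    variables.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  def render(template)
    ERB.new(template, trim_mode: "-").result(binding)
  end
end

config = TemplateContext.new(
  service_name: "cage", service_port: "80", worker_port: "8800"
).render(TEMPLATE)
puts config
```

Rendering the template locally like this is a quick way to catch typos in variable names before a chef-client run writes the file to disk.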

Runit Set Up

Runit is a process initialization and monitoring solution often used by Opscode-authored cookbooks. One obvious benefit of using runit is that you don’t need to worry about writing your own init.d script, which is often a hassle to get right. Runit will capture STDOUT from your application, log it, and rotate the log automatically. Also, runit monitors your application process and restarts it if it dies for some reason.

There is a community cookbook for Runit that works well on Ubuntu. Here is how I use this cookbook.

1) Include the runit recipe, and add a runit_service resource as follows.

include_recipe 'runit'
runit_service 'cage' do
  options(
    :service_curr => '/opt/hulu/cage/current',
    :owner        => 'cage',
    :group        => 'www-data',
    :unicorn      => '/opt/hulu/ruby-1.9.3/bin/unicorn',
    :logdir       => '/opt/hulu/cage/shared/log',
    :rails_env    => 'production'
  )
end
We will be running our application process as ‘cage’, but our effective group ID will be www-data. Also, we will be using unicorn to run our Rails application.

2) Create templates/default/sv-cage-run.erb, which will turn into an /etc/sv/cage/run file when instantiated (replace “cage” in the ERB file name with your application name). The content of the ERB file looks like the following.

#!/bin/bash
cd <%= @options[:service_curr] %>
exec 2>&1
exec chpst -u <%= @options[:owner] %>:<%= @options[:group] %> <%= @options[:unicorn] %> -E <%= @options[:rails_env] %> -c config/unicorn.rb
3) Create templates/default/sv-cage-log-run.erb, which looks like this:
#!/bin/sh
exec chpst -u root:root svlogd -tt <%= @options[:logdir] %>
Note that we are running as root here. I found this to be a bit of a quirk in runit, but nothing too serious. This file gets instantiated into /etc/sv/cage/log/run, which is the process that captures STDOUT from the main run file (/etc/sv/cage/run) and writes it to the given log directory in a file named “current” (e.g. /opt/hulu/cage/shared/log/current).

Once you have this in your recipe, runit will take care of starting your application during boot, as well as monitoring and restarting your application process as necessary.

The process that runs our application’s “run” file is a program called runsv, which will spawn the unicorn process for us. runsvdir is the grandfather process that monitors all runsv processes, and restarts them as necessary. You can use ps to see this relationship as follows.

$ ps axjf | grep -e cage -e unicorn -e runsvdir
    1  3776  3776  3776 ?  -1 Ss       0   0:44 runsvdir -P /etc/service log: .......................
 3776  9273  9273  9273 ?  -1 Ss       0   0:00 \_ runsv cage
 9273  9274  9273  9273 ?  -1 S        0   0:00    \_ svlogd -tt /opt/hulu/cage/shared/log
 9273 19694  9273  9273 ?  -1 Sl     975   0:02     \_ unicorn master -c config/unicorn.rb
19694 19700  9273  9273 ?  -1 Sl     975   0:00         \_ unicorn worker[0] -c config/unicorn.rb
19694 19703  9273  9273 ?  -1 Sl     975   0:00         \_ unicorn worker[1] -c config/unicorn.rb
19694 19706  9273  9273 ?  -1 Sl     975   0:00         \_ unicorn worker[2] -c config/unicorn.rb
19694 19709  9273  9273 ?  -1 Sl     975   0:00         \_ unicorn worker[3] -c config/unicorn.rb
Note that once your Rails application is under runit’s control, you cannot simply use the UNIX kill command to stop it, because runit will restart the process. The runit recipe conveniently maps the /etc/init.d/cage command to the /usr/bin/sv command, so you can start and stop your application in the usual way (e.g. “/etc/init.d/cage stop” or “service cage stop”).

Role: Cage

It’s also a good idea to create a role that contains your recipe, so that people always think in terms of “roles” rather than individual recipes when putting together a new environment. Here is what the role file looks like for Cage. Here “role[hulu-common]” is the common set of recipes that run on all Hulu machines, and contains basic settings such as user accounts and NTP configuration.

name "cage"
description "Automated visual verification service"
run_list(
  "role[hulu-common]",
  "recipe[cage]"
)

Deploying the Recipe

Once you have the cookbook and role written, you need to upload your cookbook to the Chef server, and run chef-client on your target machine to deploy the role. The Chef framework provides a command line tool called “knife”, which is used to administer the Chef server. Here are the knife commands I run to deploy cage.

knife cookbook upload cage
knife role from file roles/cage.rb
knife ssh -x chefclient 'chef_environment:staging AND role:cage' 'sudo /etc/init.d/chef-client stop'
knife ssh -x chefclient 'chef_environment:staging AND role:cage' 'sudo chef-client'
knife ssh -x chefclient 'chef_environment:staging AND role:cage' 'sudo /etc/init.d/chef-client start'
Here I am using knife ssh to run chef-client on all nodes that are supposed to be running Cage in Chef environment named “staging”. And, voilà! My Rails application will be up and running in production mode on all machines that have the role “cage” assigned to them.

Conclusion: Push vs. Pull

In this article, I tried to highlight the Chef features that are designed to help deploy your applications. If you are currently using Capistrano, Fabric or just plain rsync/ssh as your application deployment mechanism, you are used to a “push” model. In contrast, Chef is based on a “pull” model.

This is an important distinction, in that a “push” model implies a lot of explicit actions on the system operators’ part. For example, operators need to worry about which machines to push changes to, and when to push the changes. The idea behind a “pull” model is that each machine in the infrastructure downloads and applies its own configuration, without the explicit involvement of operators.

By moving your application deployment from a “push” model to a “pull” model, you make deployment a regular part of provisioning and infrastructure maintenance, which makes the deployment process more scalable and robust. This increased robustness is a core benefit of Chef-based application deployment, in addition to promoting close collaboration between software developers and system administrators under a unified framework and process. This is why we are investing our time in Chef.

GrannySmith for iOS: Open-Source Text Formatting, VoiceOver, and More

June 11th, 2012 by Bao Lei

What I love about working at Hulu is having the opportunity to solve complex problems with creative solutions that improve our user experience. A few months ago, our customer support team received an email from a Hulu Plus subscriber requesting VoiceOver support on our iPhone and iPad app so users that are visually impaired can access Hulu content.

Enabling VoiceOver would have been extremely simple for an app that uses plain UILabels to display text. But in our scenario, it was much less trivial.

The Hulu Plus app had styled text in multiple places, and some of it required complicated alignment. For example, the description next to a video thumbnail has a bold show title (which can be one or two rows), a regular video title, and detailed information in gray. Depending on the video’s expiration date, there may also be a line containing an expiration notice, which begins with an icon. And depending on whether captions exist for the video, a cc icon may appear on the last row. This group of metadata is vertically center-aligned.

The code dealing with rich text was a bit clunky. The text and icons were rendered in the -drawRect: methods of UIView subclasses, with alignment based on multiple -sizeWithFont: calculations and if statements. The logic varied from place to place, so there were similar but unsharable implementations at various locations (e.g. the video page, the show page, thumbnails, and message popovers).

When we received the VoiceOver request, we figured it would be a good opportunity to refactor our text-rendering code base. And that led to the birth of GSFancyText.

What is GSFancyText?

First off, why the “GS” prefix? Because Hulu + iOS = green apple. And what do you think of when you hear “green apple”? Granny Smiths! GSFancyText is the first Granny Smith project of many that will come in the future.

GSFancyText is a rich text drawing library that allows users to format styled text with an HTML/CSS-like markup system. For example, the big chunk of code that we used to format and align the video description can now be replaced, with the help of GSFancyText, by a single line of markup like “<p><strong>Family Guy</strong></p><p>Death has a shadow</p><p class=detail>S. 1 : Ep. 1 (22:31)</p>”.

It follows the syntax of CSS and HTML, includes some CSS-like attributes (“text-color”, “font-size”, “text-align”, etc) and provides several tags (“<p>”, “<span>”, etc) into which you can insert styles. It is not a true subset of CSS/HTML largely due to the differences between mobile apps and the web.

Of course we considered other options such as directly using HTML in a UIWebView or using NSAttributedString. But we decided to make our own style/markup parsing and rich-text drawing system, because of the following advantages:

  1. It’s faster and consumes less memory than UIWebViews.
  2. We can reuse the styles and parsing results in many places.
  3. We can easily modify the style or text of a small part of a paragraph.
  4. We can potentially extend the system with more fancy features, like Quartz 2D features, animations, gestures, etc.
  5. It makes localization simple. For example, a phrase marked as bold might be at a different position in the sentence in Japanese. In this case we can use a single NSLocalizedString to represent a sentence with various styles.
  6. It’s easy to extract the plain text, on which we can enable the VoiceOver support.

How does it work?

The following example shows how simple it is to use GSFancyText:

    NSString* styleSheet = @".green {color:#00ff00; font-weight:bold}";
    [GSFancyText parseStyleAndSetGlobal: styleSheet];
    GSFancyText* fancyText = [[GSFancyText alloc] initWithMarkupText: @"<span class=green>Hulu</span> Plus"];

Then we can draw this fancyText object directly in a custom UIView’s -drawRect: method:

    [fancyText drawInRect: rect];

Or create a GSFancyTextView to display it:

    GSFancyTextView* fancyView = [[GSFancyTextView alloc] initWithFrame:frame fancyText:fancyText];

Beyond that, GSFancyText’s killer feature is the ability to insert any image or native iOS drawing code anywhere inside the styled paragraph. For example, if we want to insert a TV icon between Hulu and Plus, we can simply do:

    GSFancyText* fancyText = [[GSFancyText alloc] initWithMarkupText: @"Hulu <lambda id=tv width=40 height=40> Plus"];
    [fancyText defineLambdaID:@"tv" withBlock:^(CGRect rect) {
        // Draw the icon at its natural size, anchored at the origin of the lambda's rect.
        UIImage* image = [UIImage imageNamed: @"tv"];
        [image drawInRect: CGRectMake(rect.origin.x, rect.origin.y, image.size.width, image.size.height)];
    }];

The lambda tag is magical. You can draw images, call CoreGraphics methods, draw text interlaced with images – virtually anything you can do with Objective-C code.

Often a UI designer will ask for consistent, custom styles across the whole app. To support this, we’ve made GSFancyText styles and parsing results reusable via a global stylesheet. Styles are parsed only once, then can be used anywhere, for every GSFancyText object. In many cases, a markup text structure can be reused too. One typical example is a table with many cells based on styled text. All cells have the same format, but the text and attributes (e.g., color) differ. In this scenario, we can keep a global, static copy of the parsed structure and replace the text and styles inside certain tags. For example, if we have a GSFancyText object (let’s call it fancyText) based on the markup string “<p id=title_line>the dummy title</p>”, we can simply call:

    [fancyText changeToText:@"the real title" forID:@"title_line"];

This changes “the dummy title” to “the real title,” leaving the object’s markup structure intact.

Fascinated? Hoping to use this in the next version of your app? Check it out on our GitHub page:

https://github.com/hulu/grannysmith

Or check out our wiki page for a full list of supported syntax and styling attributes:

https://github.com/hulu/GrannySmith/wiki/GSFancyText

Implementation challenges

Finally, we want to share some of the challenges this project posed. Parsing style sheets and markup text isn’t all that difficult. But what about line breaks? They can be trickier than you might think at first. Line-breaking rules vary among natural languages. In English we typically break lines at the spaces between words, and we generally don’t break in the middle of a word. But in Chinese, we can place line breaks between any two characters, as long as a line doesn’t begin with certain punctuation marks. And there are many languages that none of us at Hulu knows well enough to comment on, so we can’t make assumptions about their rules. To solve this problem, we left the burden of rule determination to Apple’s -sizeWithFont: method and designed the following algorithm, which is otherwise universal across natural languages:

  1. Take enough characters from the beginning of the string to fill up a little more than one line.
  2. Measure the substring taken in step 1 (with the width limit applied) and use the calculated width as our target width.
  3. Remove characters from the end of the substring until its height (with the width limit applied) is equal to one line and its width is equal to the target width.
  4. Form a line with the current substring.
  5. Go back to step 1, starting from the first character after the ones used to form the last line.
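
The steps above can be sketched in a few lines. This is a language-neutral illustration in Python, not the GSFancyText source: `measure` is a hypothetical stand-in for a platform measurement call such as -sizeWithFont:, and it makes two simplifying assumptions (every character is one unit wide, and the reported width is that of the first wrapped line). The point is that `break_lines` never looks at spaces or punctuation itself; all of the language-specific rules stay inside the measurer.

```python
def wrap(text, max_width):
    """Toy greedy word wrap: the 'rules' that the measurer hides from the caller."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_width:
            current = candidate
            continue
        if current:
            lines.append(current)
        while len(word) > max_width:  # hard-split a word wider than the line
            lines.append(word[:max_width])
            word = word[max_width:]
        current = word
    lines.append(current)
    return lines


def measure(text, max_width):
    """Hypothetical measurer: returns (width of first wrapped line, line count)."""
    lines = wrap(text, max_width)
    return len(lines[0]), len(lines)


def break_lines(text, max_width, measure=measure):
    """Split text into lines using only the measurer's answers."""
    lines, start, n = [], 0, len(text)
    while start < n:
        # Step 1: grow the substring until it spills onto a second line.
        end = start + 1
        while end < n and measure(text[start:end], max_width)[1] == 1:
            end += 1
        # Step 2: the measured width becomes the target width.
        target_width, _ = measure(text[start:end], max_width)
        # Step 3: trim from the end until one line of exactly that width remains.
        while end > start + 1:
            width, height = measure(text[start:end], max_width)
            if height == 1 and width == target_width:
                break
            end -= 1
        # Steps 4 and 5: emit the line and continue after it.
        lines.append(text[start:end].strip())
        start = end
    return lines


print(break_lines("the quick brown fox jumps", 10))
```

Measuring once per added or removed character keeps the sketch short but costs quadratic time per line; a production version would start from an estimated character count instead of growing from a single character.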

We also put some thought into designing the data structure for storing parsed markup. This structure has to facilitate searching and text/style replacement. We used a tree structure with two kinds of nodes: container nodes and content nodes. A container node is based on a markup tag. It stores an array of child nodes as well as the styles defined by its class. A content node can be either a piece of text or a lambda block. It inherits the styles of its parent container node. The root node of a tree is a special kind of container node. In addition to its array of children, it also stores two hash maps for fast lookup by ID or by class name. Each node also stores a reference to its parent. So when we append a new subtree under container node A, all styles along the ancestral path of node A are passed on to the new subtree (a container node in the new subtree accepts a passed-down style only if it does not already define that style itself).
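
To make the shape of this tree concrete, here is a minimal sketch in Python. It is an illustration of the structure described above rather than the library’s actual classes: all class, method, and attribute names (`ContainerNode`, `RootNode`, `by_id`, `change_text`, and so on) are hypothetical, and styles are modeled as plain dictionaries.

```python
class ContentNode:
    """Leaf node: a piece of text or a drawing callback (a 'lambda block')."""

    def __init__(self, text=None, block=None):
        self.text = text
        self.block = block
        self.parent = None

    @property
    def styles(self):
        # Content nodes have no styles of their own; they use the parent's.
        return self.parent.styles if self.parent else {}


class ContainerNode:
    """Node created from a markup tag; holds children and effective styles."""

    def __init__(self, tag, node_id=None, class_name=None, own_styles=None):
        self.tag = tag
        self.node_id = node_id
        self.class_name = class_name
        self.own_styles = dict(own_styles or {})
        self.styles = dict(self.own_styles)
        self.children = []
        self.parent = None

    def root(self):
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def inherit(self, inherited):
        # A style defined on this node wins over one passed down from above.
        self.styles = {**inherited, **self.own_styles}
        for child in self.children:
            if isinstance(child, ContainerNode):
                child.inherit(self.styles)

    def append(self, child):
        child.parent = self
        self.children.append(child)
        if isinstance(child, ContainerNode):
            child.inherit(self.styles)
            root = self.root()
            if isinstance(root, RootNode):
                root.index(child)


class RootNode(ContainerNode):
    """Root container: also keeps hash maps for lookup by ID and by class."""

    def __init__(self):
        super().__init__("root")
        self.by_id = {}
        self.by_class = {}

    def index(self, node):
        if node.node_id:
            self.by_id[node.node_id] = node
        if node.class_name:
            self.by_class.setdefault(node.class_name, []).append(node)
        for child in node.children:
            if isinstance(child, ContainerNode):
                self.index(child)

    def change_text(self, new_text, node_id):
        # Swap the text under a tag without touching the parsed structure.
        node = self.by_id[node_id]
        node.children = []
        node.append(ContentNode(new_text))


root = RootNode()
title = ContainerNode("p", node_id="title_line")
title.append(ContentNode("the dummy title"))
detail = ContainerNode("p", class_name="detail",
                       own_styles={"color": "gray", "font-size": "12"})
span = ContainerNode("span", own_styles={"color": "green"})
span.append(ContentNode("Hulu"))
detail.append(span)
root.append(title)
root.append(detail)
root.change_text("the real title", "title_line")
print(root.by_id["title_line"].children[0].text, span.children[0].styles)
```

In this sketch the subtree under `detail` is built first and attached afterwards, so the inherited styles and the root’s lookup maps are both refreshed at the moment of attachment.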

We didn’t just assume the performance of this code. We constructed different test cases to compare the rendering speed of GSFancyText to some other solutions. Results vary from case to case, but in general, if we reuse the parsing result of GSFancyText, its speed is quite similar to (just a little slower than) directly drawing text in a customized UIView, while using a UIWebView is normally at least 10 times slower. In our real-world example (the Hulu Plus featured video table), GSFancyText takes about 6 milliseconds to render a table cell on an iPhone 4S, while drawing directly in the -drawRect: method takes about 5 milliseconds (the code for this implementation was quite ugly). The extra millisecond is mainly consumed by replacing the text in the pre-parsed structure. We made several optimizations to improve performance, like skipping the line-break logic if we have already reached the line-count limit and skipping the -sizeWithFont: calculation when assigning the space for the last segment in a line.

What’s next?

We are constantly improving the code, and we look forward to seeing your fancy app with GSFancyText making big money in the App Store.

Bao Lei (aka “The THUNDER STORM”) is a software developer on the mobile team who works on our iOS platform.


At a glance: Hulu hits Rails Conf 2012

May 14th, 2012 by Andrew Carter

RailsConf 2012

RailsConf has been one of my favorite conferences ever since I started going to it in 2007. I have been primarily a Rails developer since late 2006, long before coming to Hulu. I’ve seen it transition from a promising upstart to the platform it is today. The influence of Rails has been huge. The ideals of Rails – convention over configuration, DRY, test first, and agile – are quickly becoming pervasive throughout software development.

As Hulu has been built with Ruby on Rails since the beginning, we decided to give back and sponsor this year’s RailsConf in Austin. We had seven Hulu employees at the conference. Our web site as well as several services are built with Ruby on Rails.

We gave a talk, “Building Asynchronous Communication Layer,” on our project for programmatically managing devices like PlayStation 3 and Android mobile phones. The project allows us to do things like automate playback testing. The technology used to build the framework includes XMPP, JavaScript, and Ruby.

Here’s my daily summary of the events:

Day 1

The morning opened with David Heinemeier Hansson’s keynote. He talked about progress, both for us as programmers and for the Rails community. He said that it’s OK to disrupt and to not get too comfortable, and emphasized that programmers new to the platform really need to learn, even when it is hard or they make mistakes… I’d agree strongly with that. I’m not a fan of creating artificially easy on-ramps. I want to see Rails include anyone who wants to join, but at the same time I expect developers to learn the right set of skills. Some of the talk was a bit preachy, but the sentiment of not settling and of accepting change is good advice.

The next talk I went to was Sarah Mei’s presentation on Backbone.js. It was a very well done talk. Sarah is an excellent presenter, striking a good balance between content and moving the audience through the topic. She gave a quick tour of what Backbone.js is about and did a nice job of relating it to the Rails structure. She did such a good job that I was starting to think I should look at this framework much more closely as a solution for my own projects, but then at the very end she sort of torpedoed the entire talk by saying that she probably wouldn’t use it anymore. I appreciate the honesty, but it was kind of a surprise given the strong case she had made over the previous 45 minutes.

I went to Mark Bates’s talk on CoffeeScript. I’m already a believer, so this was not really new to me. It certainly reinforced how great CoffeeScript is, and I think Mark convinced at least one of my teammates to consider it seriously.

I hit the first major speed bump at Andy Maleh’s talk on Rails engines. He did little to convince me to look at Rails engines. His solutions using engines sounded more like hacks to me as I didn’t see a clear path to code reuse. It felt like trading one complexity for another.

John Bender’s talk on Progressive Enhancement for Mobile Web was next. John is active with jQuery Mobile. He talked a lot about the state of targeting mobile browsers and had some very particular scorn for Android (in his words, the “new IE”). One thing I wish he had talked about more is progressive design, since he talked almost exclusively about segregating mobile from desktop browsers. I think it is time to be thinking mobile first, with progressive scaling up. It’s disappointing to see a mobile site separated from the main site.

Finally, the closing keynote for the day was by Rich Hickey. Rich talked about simplifying, but not from the programmer’s perspective. He had a lot of great points – we often design software to make our lives as programmers easy, but not necessarily the user’s. The tone of his talk came off as a little sanctimonious, though; I would have liked a little more acknowledgement of the pragmatism that often leads to the decisions we make.

Day 2

Aaron Patterson presented the keynote in the morning. He didn’t try to compete with his over-the-top presentation from last year. Aaron was downright subdued compared to his previous talks and seemed to be making a bit of a back-to-basics appeal, talking about normalizing things like the queue interface in Rails, etc.

The first talk I went to on the second day was by Mike Moore, on presenters and decorators. Mike gave a great talk with good takeaways. He did an excellent job breaking down the decorator, presenter, and mediator patterns. It was great timing for me, as I’ve been looking at these very things for a project I’m doing now.

Next was Ilya Grigorik on making the web faster. This was a good overall talk on leveraging a number of tools and techniques. Since Ilya is a Google engineer, he was heavily biased toward the Google toolkit, but his advice was valid regardless of which toolkit you might use. A key takeaway was focusing on perceived load time for the user over most other performance metrics. It’s a really good point: the user’s experience trumps most anything else.

After lunch, I attended Will Leinweber’s talk on schemaless SQL in Postgres. There are some pretty amazing things coming to PostgreSQL, like the new key-value hstore. They are also integrating V8 into the programming space of Postgres, which opens up some new scenarios that compete directly with solutions like MongoDB. It’s early for much of this, but it’s a very forward-looking approach for Postgres to be taking.

My colleague Steve Jang and I presented a short tour of our automation project for Hulu devices called Bender. We use XMPP, Ruby, and JavaScript to create a communication framework for controlling devices and running scripts. I think the talk went well, but we weren’t sure if the material would be useful to people or not. People had some great questions and so it was great to get a conversation going on projects we have here at Hulu.

Day 3

Yehuda Katz talked about The Next Five Years for Rails. Honestly, Yehuda’s talk should have been the day 3 keynote, since I think it was the most honest and informative talk of the entire conference. Yehuda did a great job calling out what it is about Rails that attracted all of us to the platform in the first place. He said (and I agree) that the next big thing should be making Rails just as good for JSON API services as it is for HTML. It’s a mixed world more than ever; the number of web services whose clients are primarily mobile devices is only increasing.

Nick Quaranto had a great tour of Basecamp Next: Code Spelunking. It was actually nice to see that 37signals has some of the same kinds of tradeoffs in their software as everyone else. One of the big takeaways was to always be pragmatic about things like changing data stores. There were numerous things I want to investigate including all the cool JavaScript console tricks, using GitHub for API documentation (see 37signals BCX API), and the strong parameters gem.

Jared Ning had a nice overview of minitest. If you hadn’t seen it before, it was a good tour of what minitest is about and how it draws from both Test::Unit and RSpec. I’m already in the minitest camp so there wasn’t a lot new for me. I liked the explanation Jared used to talk about mocking and stubbing. I agree with his advice – use mocks and stubs sparingly and only after you test against the real thing.

Summary

Overall, it was another great RailsConf. As a developer who switches among multiple platforms and languages, it’s always great to dive into my favorite language: Ruby. Rails feels to me like it is continuing to mature. It’s moving a little slower now, but I think that’s to be expected. There are still lots of opinions among the faithful, and that’s healthy. As always, I look forward to applying what I’ve learned to our own projects.

It was great to share what we are doing with Ruby and Rails at Hulu. We would love to talk to anyone passionate about Ruby (as you can see from our list of jobs). I’m glad I got a chance to visit Austin, but I’m definitely looking forward to going back again when the conference returns to Portland.

Hackathon Spring 2012

April 23rd, 2012 by Ilya Haykinson

As we like to do every so often, a few weeks ago we held our hackathon. For the uninitiated, a hackathon is a chance for developers to take a break from their regular duties and work on something new, something unexpected, something experimental. For this hackathon, most of our development team in LA went off-site, grouped around makeshift workspaces, and hacked on a bunch of different projects in small teams. Our Seattle team also formed ad-hoc groups to write some new code. This time around we chose to spend a Thursday and a Friday working on our projects, and then came together early the following week to share our results with each other. In all, there were 18 different projects that made it far enough to be presented. They ran the gamut from setting up an NFS/HDFS proxy to a keg bot that allows you to remotely pour yourself some beer. Below are a few projects that we worked on.

Blinker

Yupeng and Eugene got curious and wanted to create a visualization of Hulu streams across the US. The result is Blinker, an internal website that plots points on a map corresponding to IP addresses beginning video streams.

To build this, they put up a CherryPy-based web service that listens for UDP unicasts containing rough geo-location data. A quick modification to our stream control service allowed it to send data for each request to the Blinker service. A web page embeds a Google Map, and uses WebSockets (and ws4py on the backend) to continually update data.
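
As a rough illustration of the receiving side, here is a minimal sketch in Python using only the standard library. It is not the Blinker code (which used CherryPy and ws4py): the JSON payload with `lat` and `lon` fields is an assumed message format, and the function names are hypothetical.

```python
import json
import socket


def parse_datagram(payload):
    """Decode one datagram into a (lat, lon) tuple, or None if it is malformed.

    Assumed (hypothetical) wire format: UTF-8 JSON such as
    {"lat": 34.05, "lon": -118.24}.
    """
    try:
        message = json.loads(payload.decode("utf-8"))
        return float(message["lat"]), float(message["lon"])
    except (UnicodeDecodeError, ValueError, KeyError, TypeError):
        return None


def listen(host, port, on_point, max_datagrams=None):
    """Receive UDP datagrams and hand each valid point to `on_point`.

    `max_datagrams` bounds the loop (useful for demos); None means run forever.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        received = 0
        while max_datagrams is None or received < max_datagrams:
            payload, _sender = sock.recvfrom(4096)
            received += 1
            point = parse_datagram(payload)
            if point is not None:
                on_point(point)
```

In a setup like the one described, `on_point` would be the hook that forwards each point to the connected browsers, for example `listen("0.0.0.0", 9999, broadcast)` with a hypothetical `broadcast` function and port. UDP suits this kind of feed because the sender never waits on the listener, and a dropped point on a live map is harmless.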

 

ShowGraph

Hang and Feng decided to spend their hackathon looking at graphical representations of show similarity. They used our recommendation system’s show similarity data to perform multidimensional scaling and clustering, resulting in a zoomable chart that plots shows in similarity clusters.

The scaling and clustering was built using R, and the front-end was developed using processing.js.

Parley

Fed up with the complexity and cost of our current phone conferencing system, David, Sherin and Ilya came up with Parley – Hulu’s phone conference room system. Using Django and the Twilio API, they built a service that lets Hulu employees create conference rooms with access codes. Callers can then call into an access number and participate in a phone conference. Meanwhile, the employee who created the room can see a log of room accesses and optionally invite people by typing in their phone number (or selecting them from the company contact list) and having the system call them with an offer to join a phone conference in progress.

Cage

One of our teams in Seattle works on software for connected devices – gaming consoles, set-top boxes, internet-connected televisions, and the like. Since these are often fairly closed platforms, debugging and testing can be a challenge, with some validation steps requiring a human to eyeball a screen and see if it’s correctly displaying an application. To help humans with this task, Steve and Dallas built Cage – a system that uses an Android phone and a backend service to capture screenshots of an application at various stages of testing.



When a connected device application runs, it displays a QR code describing the device and the test being performed. The Android app uses the camera to capture the barcode, and then takes a picture of the application as it performs various tasks. These photos are then sent to a service which displays the photo that was taken alongside an “expected results” picture. This will allow us to run some validation tests unattended, and then have humans verify the outcomes in bulk.

DJ Bot

Unsatisfied with a quiet environment, Scott decided to build a bot to let us all play DJ. Scott took Alain Gilbert’s code for creating bots that talk to the turntable.fm API, tweaked it a bit, and integrated it into our node.js-based bot framework. Since our bot sits in all the channels in our HipChat chatrooms, we can now control a “DJ session” by issuing commands like this:

!dj start             starts playing
!dj stop              stops playing
!dj skip              skips the current song
!dj info              shows the current song info
!dj history           shows playlist history
!dj search            search for songs, artists, albums
!dj q list            shows songs in your dj queue
!dj q clear           removes all songs in your dj queue
!dj q add             adds song as next in your dj queue, where is from search output
!dj q rm              removes song at from your dj queue
!dj q mv              moves a song index index in your queue

And the bot commands a turntable session appropriately. Just add a big speaker, and we’re (collaboratively) grooving.