Hulu Plus launched out of beta in November of last year and it's currently available on a number of mobile and living room connected devices. One of the core delivery protocols that Hulu relies on for video streaming is called HTTP Live Streaming (HLS). The protocol is documented by Apple Inc. in an Internet-Draft submitted to the IETF, and HLS is now widely used: it is already available as a video delivery vehicle on some of the major devices on the market, like the iPhone/iPad, Sony PlayStation®3, Roku, Android 3.0, and more.
In a nutshell, the HLS protocol delivers video over HTTP via a playlist of small segments that are made available in a variety of bitrates from one or more delivery servers. This allows the playback engine to switch between different bitrates and content delivery networks (CDNs) on a segment-by-segment basis, which helps compensate for network variance and for infrastructure failures that might occur during playback.
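To make the playlist structure concrete, here is a minimal sketch of how a client might read the variant list out of an HLS master playlist. The sample playlist and its URIs are made up for illustration, and the parser only handles the `#EXT-X-STREAM-INF` tag and its `BANDWIDTH` attribute; it is not a complete HLS parser.

```python
import re

# Matches KEY=value pairs in an #EXT-X-STREAM-INF tag; quoted values may contain commas.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^",]*)')


def parse_master_playlist(text):
    """Return (bandwidth_bps, uri) pairs for every variant in an HLS master playlist."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    variants = []
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        attrs = dict(ATTR_RE.findall(line.split(":", 1)[1]))
        bandwidth = int(attrs["BANDWIDTH"]) if "BANDWIDTH" in attrs else None
        # The variant playlist URI is the next line that is not a tag or comment.
        uri = next((l for l in lines[i + 1:] if not l.startswith("#")), None)
        variants.append((bandwidth, uri))
    return variants


# Hypothetical master playlist with two bitrate variants.
MASTER = """#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=650000
650k/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000,CODECS="avc1.4d401e,mp4a.40.2"
1500k/prog_index.m3u8
"""

for bandwidth, uri in parse_master_playlist(MASTER):
    print(bandwidth, uri)
```

Each variant URI points at a media playlist that lists the actual segments, which is what lets a player hop between variants one segment at a time.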
But what determines a quality viewing experience and uninterrupted video playback? That largely depends on the client's playback engine. Different device platforms usually have different playback engines, each with its own implementation of the HLS protocol. These implementations often differ not only in how completely they implement HLS, but also in their streaming and bitrate switching heuristics, such as how to act under different network conditions. The differences are especially noticeable when network stability changes suddenly, when content plays on low-bandwidth networks, and when parts of the video delivery infrastructure fail.
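As an illustration of the kind of heuristic that varies between engines, the sketch below smooths measured segment download throughput with an exponentially weighted moving average and picks the highest bitrate that fits under a safety margin. The smoothing factor, the 0.8 margin, and the function names are assumptions chosen for the example; they do not describe any particular device's engine.

```python
class ThroughputEstimator:
    """Exponentially weighted moving average of measured download throughput."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight given to the newest sample
        self.estimate = None    # current estimate in bits per second

    def add_sample(self, nbytes, seconds):
        sample = nbytes * 8 / seconds
        if self.estimate is None:
            self.estimate = sample
        else:
            self.estimate = self.alpha * sample + (1 - self.alpha) * self.estimate
        return self.estimate


def choose_bitrate(bitrates_bps, throughput_bps, safety=0.8):
    """Pick the highest bitrate that fits within a safety margin of the estimate."""
    budget = throughput_bps * safety
    fitting = [b for b in bitrates_bps if b <= budget]
    # Fall back to the lowest bitrate when nothing fits.
    return max(fitting) if fitting else min(bitrates_bps)


est = ThroughputEstimator(alpha=0.5)
est.add_sample(1_000_000, 4.0)            # fast segment: 2,000,000 bit/s
estimate = est.add_sample(1_000_000, 8.0)  # slow segment pulls the average down
print(estimate, choose_bitrate([650_000, 1_500_000, 3_000_000], estimate))
```

A larger `alpha` reacts faster to a sudden drop but also switches more often on noise, which is exactly the kind of trade-off that makes engines behave differently on an unstable network.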
Hulu reaches millions of consumers every month, so we're exposed to a wide spectrum of network conditions. Hitting edge playback scenarios is not uncommon. We provide a client-facing application, so maintaining the best viewing experience possible, even in suboptimal conditions, ultimately falls to the Hulu Plus app. When a user reports a problem with playback, we need to be able to simulate the user's environment (network conditions, device, content played) so we can determine the root cause of the problem and find out whether there is a viable solution for it. Sometimes this will result in discovering issues with the playback engines. Unless we can reproduce these sorts of problems, it would be virtually impossible to hand them off to a device manufacturer to fix. Other times, the right solution would be to make the UI more forgiving, or to make the app smarter about recovering from unexpected failures. Either way, having the ability to reproduce the troubled scenarios is key to taking appropriate action.
At Hulu, to be able to reproduce these kinds of scenarios, we recently started working on an infrastructure service called DripLS (abbreviated from Drip LiveStreaming). The purpose of the service is to traffic-shape a video stream in accordance with a set of rules. It's an attempt to simulate real-world network conditions to help ensure that clients and streaming engines degrade gracefully and deliver the best viewing experience possible. DripLS acts as an intermediary between the server hosting HLS video segments and the HLS client, caching segments that need to be traffic shaped and rewriting the m3u8 playlists that the HLS clients receive. The basic flow of the service is outlined in Figure 1.
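The playlist-rewriting half of that flow can be sketched in a few lines: every segment URI in a media playlist is resolved against the playlist's own URL and replaced with a URL that routes through the intermediary, while tag lines pass through untouched. The proxy base URL and the `src` query parameter are hypothetical, and this sketch does not cover the caching that DripLS also performs.

```python
from urllib.parse import quote, urljoin


def rewrite_media_playlist(text, playlist_url, proxy_base):
    """Point every segment URI in an m3u8 media playlist at a proxy endpoint.

    `proxy_base` and the `src` parameter are illustrative assumptions.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # Resolve relative segment URIs against the original playlist location.
            absolute = urljoin(playlist_url, stripped)
            line = proxy_base + "?src=" + quote(absolute, safe="")
        out.append(line)
    return "\n".join(out) + "\n"


MEDIA = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10,
seg0.ts
#EXTINF:10,
seg1.ts
#EXT-X-ENDLIST
"""

print(rewrite_media_playlist(
    MEDIA,
    "http://cdn1.example.com/v/650k/prog_index.m3u8",
    "http://dripls.example/segment",
))
```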
Figure 1. DripLS workflow
For example, DripLS allows us to simulate a sudden network drop that will cause the video playback to “stall.” It can simulate missing segments that will cause a playback “skip,” too, or simulate a mid-stream CDN failure, thus exercising CDN fallback scenarios. It’s also capable of serving video files as they would be transmitted on a low-bandwidth or “lossy” network. DripLS has almost countless useful applications for validating video playback, and these are just a few of the ones we’ve been able to capture and experiment with since we built the service. The results have already helped us in making streaming more reliable and resilient to failures and making our client-side monitoring infrastructure more aware of these problems when they occur in production.
How does it work?
DripLS appears as a normal HLS endpoint that can be used directly by any HLS client. This means any device that supports HTTP Live Streaming can use the service without additional provisioning. To achieve the desired traffic shaping, the DripLS URL accepts a set of rules in its query string, which control how the incoming stream is shaped. The DripLS URL is in the following format:
A sample DripLS URL might look like this:
In the example above, the transmitted stream, denoted by cid (content id), is instructed to return HTTP error code 404 for the variant playlist encoded at a 650 kbit/s bitrate, and HTTP error code 500 for every video segment file in the 1500 kbit/s variant playlist. Additionally, segment 2 from CDN 1 in all variant playlists will be transmitted at 10 kb/s with 1% packet loss.
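The rule syntax itself is not reproduced here, so instead of parsing a query string, the sketch below models the three rules from this example as plain Python objects and shows how a request for a playlist or segment could be matched against them. The `Rule` fields, the first-match ordering, and every name in it are assumptions for illustration, not DripLS's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical rule; None in a match field means "match anything"."""
    kind: str                      # "e" (HTTP error) or "net" (network shaping)
    bitrate_kbps: Optional[int]    # variant bitrate the rule applies to
    target: str                    # "playlist" or "segment"
    segment: Optional[int] = None  # segment index within the variant playlist
    cdn: Optional[int] = None      # CDN the segment is served from
    status: int = 0                # HTTP status returned by "e" rules
    rate_kbps: int = 0             # transmission rate for "net" rules
    loss_pct: float = 0.0          # packet loss for "net" rules

    def matches(self, target, bitrate_kbps, segment=None, cdn=None):
        return (self.target == target
                and self.bitrate_kbps in (None, bitrate_kbps)
                and self.segment in (None, segment)
                and self.cdn in (None, cdn))


# The three rules described in the example above.
RULES = [
    Rule("e", 650, "playlist", status=404),
    Rule("e", 1500, "segment", status=500),
    Rule("net", None, "segment", segment=2, cdn=1, rate_kbps=10, loss_pct=1.0),
]


def first_match(rules, target, bitrate_kbps, segment=None, cdn=None):
    """Return the first rule matching the request, or None to pass it through."""
    for rule in rules:
        if rule.matches(target, bitrate_kbps, segment, cdn):
            return rule
    return None


print(first_match(RULES, "segment", 3000, segment=2, cdn=1))
```

With first-match ordering, segment 2 from CDN 1 in the 1500 kbit/s variant gets the 500 error rather than the shaped transmission; whether DripLS resolves overlapping rules this way is not stated here.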
DripLS supports two sets of rule classes: e<> and net<>. Matches from the e<> class result in direct rewrites of URLs in the HLS m3u8 playlists to specific URLs that raise the specified HTTP error code. Matches from the net<> class are a little more involved and result in caching the matched segments and transmitting them under the network conditions the rule specifies.
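For the e<> class, the rewritten URLs only need to land on something that answers with the requested status code. A minimal sketch of such an endpoint, written as a standard-library WSGI application, could look like the following; the `/error/<code>` path layout is a hypothetical choice, not DripLS's actual URL scheme.

```python
from http import HTTPStatus


def error_app(environ, start_response):
    """WSGI app: /error/404 answers 404, /error/500 answers 500, and so on.

    The /error/<code> path layout is an illustrative assumption.
    """
    path = environ.get("PATH_INFO", "")
    prefix = "/error/"
    try:
        code = int(path[len(prefix):]) if path.startswith(prefix) else 400
        phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    except ValueError:
        code, phrase = 400, HTTPStatus(400).phrase
    body = f"{code} {phrase}\n".encode()
    start_response(f"{code} {phrase}", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, error_app).serve_forever()` serves it on port 8000; a playlist line rewritten to `http://localhost:8000/error/404` then fails exactly as the rule requests.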
DripLS uses a combination of technologies to achieve the desired traffic-shaping effect. Under the hood, the current setup consists of two nginx sites, the first of which proxies to the second on a different port, and a CherryPy server behind them that handles the business logic for DripLS (all on a single machine). A segment request always comes in through the first nginx site, which listens on port 80 and proxies the request to the second nginx site on an arbitrary port that has already been shaped for that segment; the second site then forwards the request to the CherryPy instance. This setup is needed because, to attain the desired traffic shaping, DripLS makes use of tc (traffic control), netem (a Linux kernel module), and iptables (network rule chaining), for which the smallest level of granularity is a port.
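Because shaping happens per port, something has to hand out a dedicated port to each segment that needs shaping and take it back afterwards. A minimal, thread-safe sketch of such a port pool follows; the class name, the port range, and the segment key format are hypothetical and not taken from the DripLS code.

```python
import threading


class PortPool:
    """Hands out one dedicated port per shaped segment (illustrative sketch)."""

    def __init__(self, first, last):
        self._free = list(range(first, last + 1))
        self._by_segment = {}
        self._lock = threading.Lock()

    def reserve(self, segment_key):
        """Return the port for a segment, reserving a new one if needed."""
        with self._lock:
            if segment_key in self._by_segment:
                return self._by_segment[segment_key]
            if not self._free:
                raise RuntimeError("no free ports left for shaping")
            port = self._free.pop(0)
            self._by_segment[segment_key] = port
            return port

    def release(self, segment_key):
        """Return a segment's port to the pool once shaping is no longer needed."""
        with self._lock:
            port = self._by_segment.pop(segment_key, None)
            if port is not None:
                self._free.append(port)


pool = PortPool(9000, 9099)
print(pool.reserve("cid123/1500k/seg2"))
```

Reusing the same port for repeated requests of the same segment means the port only has to be shaped once, which matches the idea of a port that is "already pre-shaped for the segment."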
The basic architecture of the service is shown in Figure 2.
Figure 2. DripLS architecture
Every time an HLS segment is to be traffic-shaped, the shaping is applied exclusively to a port reserved for transmitting that segment to the client. The port is shaped via a small custom shell script (see set_ts_lo.sh) according to the traffic-shaping rule that the segment matched. The URL for the segment is then rewritten so that the front nginx site can do a location proxy_pass to the second nginx site, which accepts the request on the already-shaped port. So when transmission of the segment's data starts, netem/iptables make sure that it adheres to the network rules already applied to the port.
set_ts_lo.sh – Script to simplify interaction with netem, tc, iptables
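The contents of set_ts_lo.sh are not shown here, so the following is only a sketch of one common way to shape a single port with tc and netem, not a reconstruction of that script. It assumes a Linux host with iproute2 and a netem that supports the `rate` option (available in newer kernels), and it shapes packets leaving the reserved port on the loopback device, which is the direction the segment data travels from the second nginx site back to the front one. The `/shaped/<port>/...` URL layout is likewise a hypothetical example of how the front site could route requests.

```python
import shlex


def shaping_commands(port, rate_kbit, loss_pct, dev="lo", band=4):
    """Return tc commands (as argument lists) that shape traffic sent from `port`.

    Run each with subprocess.run(cmd, check=True) as root. Shapes one port only.
    """
    return [
        # Root prio qdisc with an extra band; the default priomap only uses the
        # first three bands, so band 4 gets nothing except filtered traffic.
        ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:",
         "prio", "bands", str(band)],
        # netem on that band: cap the rate and drop a percentage of packets.
        ["tc", "qdisc", "add", "dev", dev, "parent", f"1:{band}",
         "handle", f"{band}0:", "netem",
         "rate", f"{rate_kbit}kbit", "loss", f"{loss_pct}%"],
        # Steer packets whose IP source port is the reserved port into that band.
        ["tc", "filter", "add", "dev", dev, "protocol", "ip", "parent", "1:0",
         "prio", "1", "u32", "match", "ip", "sport", str(port), "0xffff",
         "flowid", f"1:{band}"],
    ]


def shaped_segment_url(front_host, port, segment_path):
    """Hypothetical URL layout the front nginx site could proxy to 127.0.0.1:<port>."""
    return f"http://{front_host}/shaped/{port}/{segment_path.lstrip('/')}"


for cmd in shaping_commands(port=9000, rate_kbit=10, loss_pct=1):
    print(shlex.join(cmd))
print(shaped_segment_url("dripls.example", 9000, "cid123/1500k/seg2.ts"))
```

Shaping many ports at the same time would need the root qdisc created only once with more bands (or an htb class per port), since adding a second root qdisc on the same device fails.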
Although DripLS can be used as a remote cloud service, running the service and the device on the same network helps keep “last mile” variations in the network between the service and the device from skewing the results. That said, running DripLS on a remote network has so far yielded consistent results for us. Simulations via DripLS are an alternative to hardware-based testing, which is a common way to validate behavior under altered network conditions. The DripLS approach has several key advantages. Namely, it allows multiple developers to use the service at once; it allows precise and consistent simulations; it makes it easier to test a variety of scenarios; it requires little to no setup; it can test on any network; and, last but not least, it makes it possible and easy to share pre-shaped streams with partners.
Currently, we're using DripLS mainly for manual testing and ad-hoc reproduction of some interesting playback scenarios. When we receive major device firmware upgrades, we use the service to re-check the basics and the more common edge-case scenarios. We also want to expand DripLS capabilities with support for additional delivery protocols in the future. We're in the process of deeply integrating our device tests with DripLS, and also arriving at a more standardized set of Acid tests that we can execute across a variety of devices. These efforts will help us establish a level of confidence that the playback engine on a device, and the Hulu app running on top of it, are able to cope with a variety of network conditions and playback scenarios.
Use it, and make it better
DripLS has been so useful for us that we decided to share it with the world as an open-source tool. You can find DripLS on GitHub at https://github.com/hulu/DripLS. Please feel free to fork, comment, improve, fix, and repurpose it as you see fit. We also welcome your comments in the discussion group at http://groups.google.com/group/dripls-dev. Please let us know if you use DripLS, how you like it, and what changes you'd like to see.
Ludo Antonov, a software engineer, is building things that make your brain go mushi-mush. * Header image by Branden Williams, licensed CC-BY 2.0 (see http://www.flickr.com/photos/captbrando/3336992646)