
Automating System Provisioning and Application Deployment With Chef

July 6th, 2012 by Steve Jang

Chef is an open source configuration management and system integration framework that is generating a lot of interest and momentum these days. At Hulu, we are also actively investing in Chef infrastructure to simplify operations.

Among the many benefits that Chef provides, I am personally intrigued by the possibility of unifying “server provisioning” and “application deployment” under a single framework. Traditionally, server provisioning has been the territory of system administrators (a.k.a. the ops team), while software developers usually guide application deployment.

This division is a natural one, since correctly provisioning a server or an entire infrastructure often requires skill sets and knowledge quite different from those of software development. Nonetheless, the requirements for how applications are installed and configured come from software developers. The division is unfortunate, since it leaves developers one layer removed from actual production considerations. This often results in an application working perfectly in the development environment, but breaking once it is deployed to production.

There is no silver bullet for this problem, but bringing system provisioning and application deployment under the same framework is a step in the right direction. This creates an environment in which developers and operations can collaborate closely towards a common goal: completely automating the process of bringing up a functional application server, starting from bare metal hardware.

System Provisioning

An automated provisioning process should be managed under source control, just like any application code. It should be deterministic, repeatable, flexible, well-organized and predictably convergent. Here are some Chef features and characteristics that can help implement such a process.

1) Chef recipes, roles and “run lists” define the order of configuration changes deterministically. They are just Ruby code, and they will execute in the same sequence of operations every time they run.

2) Chef attributes can be overridden at multiple levels of organization, and you can normalize your configuration items (e.g. node attributes in Chef) into cookbook, environment, role or node defaults and overrides. Here is the actual node attribute precedence from low to high:

cookbook default < environment default < role default < node default
    < cookbook set < node set < cookbook override < role override
              < environment override < node override
By using this precedence rule, you can configure node attributes across an entire Chef environment with a single configuration change, or override one specific node’s attribute without making changes to the rest of the environment (see the environment file sketch after this list).

3) Chef provides search capability in the form of a Ruby API as well as a web service. This allows your recipes to query the Chef server and make configuration decisions based on the results. With this feature, you can write your recipes in such a way that every node in your infrastructure automatically converges towards the correct configuration for the environment to which it belongs (a sketch follows this list).
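
Here is a rough sketch of the search-driven recipe logic described in item 3. This is illustrative only: the role name, template path, and variable names are hypothetical, not taken from an actual cookbook.

# Find all database nodes in this node's own Chef environment,
# and render their addresses into an application config file.
db_nodes = search(:node, "role:database AND chef_environment:#{node.chef_environment}")

template '/etc/myapp/database.yml' do
  source 'database.yml.erb'
  variables(:db_hosts => db_nodes.map { |n| n['ipaddress'] })
end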
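
And here is what the environment-level override mentioned in item 2 might look like, written as a hypothetical environment file. The "cage" attribute names echo those used later in this article; the values are made up.

# environments/staging.rb (hypothetical)
name "staging"
description "Staging environment"
override_attributes(
  "cage" => {
    "rails_env"    => "staging",  # picked up by every node in staging
    "revision_tag" => "HEAD"      # revision to deploy across the environment
  }
)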

The above features are only a subset of the capabilities provided by the Chef framework. There are other benefits as well, such as an active community of experienced system administrators and developers. But enough about provisioning-related benefits: let’s examine how you might integrate your application deployment with Chef!

Application Deployment: Ruby on Rails Example

If you work on web applications, you often deploy your applications straight from source code without going through a build process or packaging system. Chef supports such deployment scenarios very well. Let’s go over a Ruby on Rails deployment scenario to make this more concrete.

Once you have your Ruby environment set up, Rails scaffolding makes getting up and running with Ruby on Rails a breeze. However, when you are ready to deploy your Rails application to production there are many questions that need to be answered:

  1. How will you install the version of Ruby necessary for your Rails application?
  2. How do you create your application’s service account?
  3. How do you set up data and log directories? What about log rotation?
  4. How do you set up a reverse proxy, such as nginx?
  5. How do you start/stop/monitor your Rails application process?

Let’s look at how you can use Chef to address these questions! We will use an application called “cage” as a convenient example below. Cage is a Rails application I wrote for the Hulu Hackathon in March 2012.

Quick Introduction to Chef Concepts

Before we dive into details around the recipe for deploying a Rails application, let’s review a minimum set of basic Chef concepts.

The Chef server is the repository of infrastructure configuration information. It is basically a data storage backend and a corresponding web service front end that provides RESTful APIs for Chef clients to access configuration information about the nodes they run on. A node maps to a machine in your infrastructure, and every node belongs to a Chef environment. You can organize your infrastructure into multiple Chef environments for management purposes. For example, you might have production, staging, and QA Chef environments.

A cookbook is a collection of related recipes, attributes, and template files, which collectively describe how a software package is to be installed and configured. Recipes consist of resources, which represent the smallest unit of configuration activity. For example, a “user” resource represents a local user account to be provisioned according to resource attributes, such as user ID, group ID and home directory. Cookbooks are uploaded to the Chef server to represent the infrastructure components that are available.

Each node in Chef has a run list, which lists the recipes and roles assigned to it in order of execution. A role consists of other roles and recipes that should be executed, so you can compose a new application role from existing roles and recipes. For example, an application role may consist of an nginx recipe, a Rails recipe, and a log rotation recipe.

A data bag stores arbitrary information about the infrastructure in a nested hash structure. Just like any other Chef object, it can be accessed via a RESTful API. A data bag does not belong to a specific Chef environment, so it should be used to store truly global configuration items. You can also encrypt a data bag to store sensitive information that you need to keep out of your source code repository.

You can read about these concepts in more detail at the Opscode community wiki site.

Cage Cookbook Directory Structure

Here is my cookbook structure for cage (inside Hulu’s Chef git repo).

cage
├── attributes
│   └── default.rb                 # default cookbook attributes
├── metadata.rb                    # cookbook definition
├── README.rdoc
├── recipes
│   ├── default.rb                 # includes service, deploy, nginx, runit recipes
│   ├── deploy.rb                  # cage:deploy recipe contains deploy_revision resource
│   ├── nginx.rb                   # cage:nginx recipe
│   ├── runit.rb                   # cage:runit recipe (= process init/monitoring)
│   └── service.rb                 # cage:service recipe
└── templates
    └── default
        ├── logrotate.conf.erb     # logrotate configuration template
        ├── nginx.erb              # nginx configuration template
        ├── service.yml.erb        # application configuration template
        ├── sv-cage-log-run.erb    # runit svlogd “run” file
        └── sv-cage-run.erb        # runit “run” file
As you can see, rather than having one monolithic default recipe, I broke the recipe up into four small pieces and include them in the main recipe file, recipes/default.rb (sketched below).
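
Judging from the tree, recipes/default.rb is presumably little more than four include_recipe calls. The ordering below is my assumption (account and configuration first, then deployment, then the services that sit in front of it):

# cage/recipes/default.rb (sketch)
include_recipe 'cage::service'   # service account / service.yml configuration
include_recipe 'cage::deploy'    # code checkout via deploy_revision
include_recipe 'cage::nginx'     # reverse proxy configuration
include_recipe 'cage::runit'     # process init/monitoring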

Ruby Installation

If you are running on an Ubuntu platform (e.g. Ubuntu 10.04 LTS), you may not have the latest Ruby package in the official Ubuntu apt repository. For example, Cage was developed as a Rails 3.2 application running on Ruby 1.9.3. I would also like my application’s Ruby installation to be separate from the system Ruby installation, if any.

For this, we have an internal Debian package called hulu-ruby19, which installs a clean copy of Ruby into the /opt/hulu/ruby-1.9.3 directory. This package also contains the latest rubygems as well as the latest bundler gem. The idea is that once this package is installed, any Ruby gems can be managed by bundler from there on. We also have an internal Debian repository that serves up this package. So, in my Chef recipe, I just need to add the following line at the top of the cage/recipes/deploy.rb file.

package 'hulu-ruby19'
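
This package resource assumes the node already knows about our internal Debian repository. If you need to wire that up in Chef as well, the community apt cookbook provides an apt_repository resource; here is a sketch with a hypothetical repository name and URI:

# Register the internal Debian repository (name and URI are made up).
apt_repository 'hulu-internal' do
  uri          'http://apt.internal.example.com/ubuntu'
  distribution node['lsb']['codename']   # e.g. "lucid" on Ubuntu 10.04
  components   ['main']
end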

Application Account Creation

It is usually a bad idea to run your applications as root. At Hulu, our applications run under application-specific, machine-local accounts, and each account needs to be created. For Cage, I add the following snippet to the JSON file that defines the service account data bag.

"cage": { "uid": 975, "gid": 975, "shell" : "/bin/false", "home": "/tmp", “system”: true},

We store our accounts in a data bag, and create them consistently across all Linux boxes. In this case, we are creating an account called “cage” with a UID/GID of 975. This is a “system” account that my application will run under, with no interactive login credentials. Once you have this data bag item, it’s trivial to map the data to the corresponding user resource in the Chef DSL, as sketched below.
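
As a sketch, the mapping might look like the following. The data bag name ("accounts") and item name ("service_accounts") are assumptions for illustration, not Hulu's actual layout.

# Fetch the account definition from the data bag and create the account.
acct = data_bag_item('accounts', 'service_accounts')['cage']

user 'cage' do
  uid    acct['uid']
  gid    acct['gid']
  shell  acct['shell']
  home   acct['home']
  system acct['system']
end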

Application Directories

Now that we have the two external dependencies taken care of (Ruby installation and user account creation), let’s examine how Cage is installed. All files used by our application are installed into the /opt/hulu/cage directory, which is laid out as follows.


$ tree -L 3 /opt/hulu/cage
/opt/hulu/cage
├── current -> /opt/hulu/cage/releases/5ed123d91ee74210341765d76271d314fe58f3d0
├── releases
│   ├── 5ed123d91ee74210341765d76271d314fe58f3d0
│   │   ├── app
│   │   ├── conf -> /opt/hulu/cage/shared/conf
│   │   ├── config
│   │   ├── config.ru
│   │   ├── data -> /opt/hulu/cage/shared/data
│   │   ├── db
│   │   ├── doc
│   │   ├── Gemfile
│   │   ├── Gemfile.lock
│   │   ├── lib
│   │   ├── log -> /opt/hulu/cage/shared/log
│   │   ├── public
│   │   ├── Rakefile
│   │   ├── README.rdoc
│   │   ├── script
│   │   ├── test
│   │   ├── tmp -> /opt/hulu/cage/shared/tmp
│   │   └── vendor
│   └── f419316f9bc63bc20ee7f348f7519461771916ee
│       └── ...
└── shared
    ├── ...
    └── vendor_bundle
First, a specific snapshot of the application repository is installed into the /opt/hulu/cage/releases/<revision> directory. Then, symbolic links are created into the /opt/hulu/cage/shared directory, which survives across application deployments. Finally, the /opt/hulu/cage/current symbolic link is switched to point to the latest release (in this case, the 5ed123d91ee74210341765d76271d314fe58f3d0 directory). We use symbolic links here because Ruby on Rails assumes that it can write to certain subdirectories under its installation root. For example, Rails will create log files under the Rails.root + "/log" directory.

If you have used Capistrano before, this process should be familiar. This style of deployment was directly ported from Capistrano, and is available as a Chef resource called deploy_revision. Let’s examine the use of the deploy_revision resource inside cage/recipes/deploy.rb below.

CAGE_SERVICE_ROOT = '/opt/hulu/cage'
deploy_revision CAGE_SERVICE_ROOT do
  deploy_to         CAGE_SERVICE_ROOT
  repo              'ssh://hulu-internal-git-repository/repos/cage.git'
  ssh_wrapper       "/home/chefclient/bin/gitssh"
  revision          node['cage']['revision_tag']
  action            node['cage']['release_action']
  shallow_clone     true
  enable_submodules true
  migrate           false
  environment       "RAILS_ENV" => node['cage']['rails_env']
  purge_before_symlink %w{conf data log tmp public/system public/assets}
  create_dirs_before_symlink []
  symlinks(                        # the arrow is sort of reversed:
    "conf"   => "conf",            # current/conf          -> shared/conf
    "data"   => "data",            # current/data          -> shared/data
    "log"    => "log",             # current/log           -> shared/log
    "tmp"    => "tmp",             # current/tmp           -> shared/tmp
    "system" => "public/system",   # current/public/system -> shared/system
    "assets" => "public/assets"    # current/public/assets -> shared/assets
  )
  before_restart do
    Dir.chdir '/opt/hulu/cage/current'
    system("/opt/hulu/ruby-1.9.3/bin/bundle install") or raise "bundle install failed"
    system("RAILS_ENV=#{node.cage.rails_env} /opt/hulu/ruby-1.9.3/bin/rake assets:precompile")
  end
  notifies :restart, "service[cage]"
  notifies :restart, "service[nginx]"
end
The above recipe code does the following:

1) Check out the revision specified by the node attribute node['cage']['revision_tag'] from the git repository specified by “repo”, using the ssh_wrapper in the chefclient user’s home directory. This wrapper supplies the necessary credentials to ssh when accessing the git repository. Here is the content of /home/chefclient/bin/gitssh.

#!/bin/sh
exec ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i "/home/chefclient/.ssh/id_rsa" "$@"
The -o options given in the ssh command line above are used to avoid interactive warnings like:
The authenticity of host 'hulu-internal-git-repository (10.12.49.133)' can't be established.
RSA key fingerprint is 89:62:0f:9c:78:d6:f5:ce:e6:b8:36:38:e4:c7:d4:f0.
Are you sure you want to continue connecting (yes/no)?
The -i option specifies the private key to be used for accessing the git repository.

Notice that the node['cage']['revision_tag'] attribute can be overridden at the Chef environment level (e.g. “production”, “staging” or “qa”) to specify the revision to be deployed across all machines that belong to each Chef environment.

2) The node['cage']['release_action'] attribute may be “deploy”, “force_deploy” or “rollback”. The Chef client will examine the currently deployed revision, and will bail out if it determines that this revision is already deployed, i.e. that the target release tag is the current revision tag. The “force_deploy” option is a convenient way to override this behavior, allowing a full redeployment of the application. This is particularly handy when you are debugging a Chef recipe that performs deployment.

3) Other interesting resource attributes are “purge_before_symlink” and “symlinks”. These attributes specify the symbolic links to be created from the /opt/hulu/cage/releases/<revision>/ directory into the /opt/hulu/cage/shared directory.

Note that the direction of the arrows in the “symlinks” attribute appears to be reversed: "system" => "public/system" means to create a symbolic link from /opt/hulu/cage/releases/<revision>/public/system to the /opt/hulu/cage/shared/system directory.

4) There are several hooks that allow you to inject custom actions during deployment. In this example, we are using the “before_restart” hook to precompile Rails assets (under the Rails.root + "/app/assets" directory in the source tree) before restarting the service. This is required if your Rails application runs in “production” mode.

5) Finally, notice the “notifies” attributes at the bottom of the resource definition. Notifications in Chef are used to delay certain actions until their prerequisites are satisfied. Elsewhere in our “cage” cookbook, we define the cage service resource and the nginx service resource. Here, we are notifying those resources to restart themselves once deployment is finished.

Nginx Setup

You can use the community nginx cookbook to set up nginx fairly easily. In this example, I wanted to show how you can serve your application from port 80, which by default is assigned to the default nginx site. Here is what our recipes/nginx.rb looks like:

template "#{node.nginx.dir}/sites-available/default" do
  source "nginx.erb"
  owner "root"
  group "root"
  mode 0644
  variables(
    :service_name     => "cage",
    :service_port     => "80",
    :worker_port      => "8800",
    :nginx_access_log => "/opt/hulu/cage/shared/log/nginx-access.log",
    :nginx_error_log  => "/opt/hulu/cage/shared/log/nginx-error.log"
  )
end
nginx_site 'default' do
  options(:enable => true)
end
Here is what cage/templates/default/nginx.erb looks like:
server {
  listen <%= @service_port -%>;
  server_name <%= @service_name -%>;
  access_log <%= @nginx_access_log -%>;
  error_log <%= @nginx_error_log -%> warn;
  location / {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass http://localhost:<%= @worker_port -%>/;
  }
}
This will set up /etc/nginx/sites-available/default by instantiating the ERB template, and enable the nginx default site that will proxy for our Cage application.

Runit Setup

Runit is a process initialization and monitoring solution often used by Opscode-authored cookbooks. One obvious benefit of using runit is that you don’t need to worry about writing your own init.d script, which is often a hassle to get right. Runit will capture STDOUT from your application, and log and rotate it automatically. Runit also monitors your application process and restarts it if it dies for some reason.

There is a community cookbook for Runit that works well on Ubuntu. Here is how I use this cookbook.

1) Include the runit recipe, and add a runit_service resource as follows.

include_recipe 'runit'
runit_service 'cage' do
  options(
    :service_curr => '/opt/hulu/cage/current',
    :owner        => 'cage',
    :group        => 'www-data',
    :unicorn      => '/opt/hulu/ruby-1.9.3/bin/unicorn',
    :logdir       => '/opt/hulu/cage/shared/log',
    :rails_env    => 'production'
  )
end
We will be running our application process as ‘cage’, but our effective group ID will be www-data. Also, we will be using unicorn to run our Rails application.

2) Create templates/default/sv-cage-run.erb, which will turn into a /etc/sv/cage/run file when instantiated (replace “cage” in the ERB file name with your application name). The content of the ERB file looks like the following.

#!/bin/bash
cd <%= @options[:service_curr] %>
exec 2>&1
exec chpst -u <%= @options[:owner] %>:<%= @options[:group] %> <%= @options[:unicorn] %> -E <%= @options[:rails_env] %> -c config/unicorn.rb
3) Create templates/default/sv-cage-log-run.erb, which looks like this:
#!/bin/sh
exec chpst -u root:root svlogd -tt <%= @options[:logdir] %>
Note that we are running as root here. I found this to be a bit of a quirk in runit, but nothing too serious. This file gets instantiated into /etc/sv/cage/log/run, which runs the process that captures STDOUT from the main run file (/etc/sv/cage/run) and writes it to the given log directory in a file named “current” (e.g. /opt/hulu/cage/shared/log/current).

Once you have this in your recipe, runit will take care of starting your application during boot, as well as monitoring and restarting your application process as necessary.
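
The run file above starts unicorn with -c config/unicorn.rb, which lives in the application repository rather than in the cookbook. A minimal unicorn configuration consistent with this setup might look like the following; this is my illustration, not Cage’s actual configuration. The worker count matches the ps output below, and the listen port matches the worker_port that nginx proxies to.

# config/unicorn.rb (illustrative)
worker_processes 4    # four workers, as seen in the ps output below
listen 8800           # the port nginx proxies to (worker_port above)
timeout 30            # kill any worker stuck longer than 30 seconds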

The process that runs our application’s “run” file is a program called runsv, which spawns the unicorn process for us. runsvdir is the parent process that monitors all runsv processes, and restarts them as necessary. You can use ps to see this relationship as follows.

$ ps axjf | grep -e cage -e unicorn -e runsvdir
    1  3776  3776  3776 ?  -1 Ss       0   0:44 runsvdir -P /etc/service log: .......................
 3776  9273  9273  9273 ?  -1 Ss       0   0:00 \_ runsv cage
 9273  9274  9273  9273 ?  -1 S        0   0:00    \_ svlogd -tt /opt/hulu/cage/shared/log
 9273 19694  9273  9273 ?  -1 Sl     975   0:02     \_ unicorn master -c config/unicorn.rb
19694 19700  9273  9273 ?  -1 Sl     975   0:00         \_ unicorn worker[0] -c config/unicorn.rb
19694 19703  9273  9273 ?  -1 Sl     975   0:00         \_ unicorn worker[1] -c config/unicorn.rb
19694 19706  9273  9273 ?  -1 Sl     975   0:00         \_ unicorn worker[2] -c config/unicorn.rb
19694 19709  9273  9273 ?  -1 Sl     975   0:00         \_ unicorn worker[3] -c config/unicorn.rb
Note that once your Rails application is under runit’s control, you cannot simply use the UNIX kill command to stop it. The runit recipe conveniently maps the /etc/init.d/cage command to the /usr/bin/sv command, so you can start and stop your application in the usual way (e.g. “/etc/init.d/cage stop” or “service cage stop”).
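
For reference, the underlying sv invocations look like this (status, restart, and stop are standard sv subcommands):

$ sudo sv status cage    # show process state and uptime
$ sudo sv restart cage   # restart the unicorn master
$ sudo sv stop cage      # same effect as /etc/init.d/cage stop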

Role: Cage

It’s also a good idea to create a role that contains your recipe, so that people always think in terms of “roles” rather than individual recipes when putting together a new environment. Here is what the role file looks like for Cage. The “role[hulu-common]” role is the common set of recipes that runs on all Hulu machines, and contains basic settings such as user accounts and NTP configuration.

name "cage"
description "Automated visual verification service"
run_list(
  "role[hulu-common]",
  "recipe[cage]"
)
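
To attach this role to a node, you can edit the node’s run list with knife; for example (the node name here is hypothetical):

knife node run_list add cage-staging-01 'role[cage]'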

Deploying the Recipe

Once you have the cookbook and role written, you need to upload the cookbook to the Chef server, and run chef-client on your target machines to deploy the role. The Chef framework provides a command line tool called knife, which is used to administer the Chef server. Here are the knife commands I run to deploy Cage.

knife cookbook upload cage
knife role from file roles/cage.rb
knife ssh -x chefclient 'chef_environment:staging AND role:cage' 'sudo /etc/init.d/chef-client stop'
knife ssh -x chefclient 'chef_environment:staging AND role:cage' 'sudo chef-client'
knife ssh -x chefclient 'chef_environment:staging AND role:cage' 'sudo /etc/init.d/chef-client start'
Here I am using knife ssh to run chef-client on all nodes that are supposed to be running Cage in the Chef environment named “staging”. (The chef-client daemon is stopped before the manual run and restarted afterwards, so that the manual run does not collide with a scheduled daemon run.) And, voilà! My Rails application will be up and running in production mode on all machines that have the role “cage” assigned to them.

Conclusion: Push vs. Pull

In this article, I tried to highlight the Chef features that are designed to help deploy your applications. If you are currently using Capistrano, Fabric, or just plain rsync/ssh as your application deployment mechanism, you are used to a “push” model. In contrast, Chef is based on a “pull” model.

This is an important distinction, in that a “push” model implies a lot of explicit action on the system operators’ part. For example, operators need to worry about which machines to push changes to, and when to push them. The idea behind a “pull” model is that each machine in the infrastructure downloads and applies its own configuration, without explicit involvement from operators.

By moving your application deployment from a “push” model to a “pull” model, you make deployment a regular part of provisioning and infrastructure maintenance, which in turn makes the deployment process more scalable and robust. This increased robustness is a core benefit of Chef-based application deployment, in addition to the way it promotes close collaboration between software developers and system administrators under a unified framework and process. This is why we are investing our time in Chef.

2 Comments
  • Igor says:

    I’ve been using the deploy resource for quite a long time, both revision- and timestamp-based, but recently on several projects I’ve refactored it to use the application + application_ruby cookbooks. I think it looks better now: it incorporates generation of the service script and automates some of the bundler work. At first sight it can look a little complex and obscure, but it is actually nothing more than a declarative description of application deployment, and if you have already been using the deploy resource you won’t have any problems with it. You can see a real-world example of using this cookbook to deploy a Rails application here: https://github.com/iafonov/stacker/blob/master/vendor-cookbooks/copycopter/recipes/deploy.rb

    Thanks!

  • Shawn says:

    Thanks for such a great, well-written article. My experience learning Chef has ranged from exhilaration to the pits of despair, but with articles like yours I keep picking up more on how to use Chef correctly. I can never go back to the push model; life is just too short.
