Redis has become an indispensable technology at Hulu as we continue to scale and build new features and products. Redis is a lightning-fast, in-memory key-value store with a very light hardware footprint, which makes it ideal for new projects. So ideal, in fact, that we found dozens of virtual machines (VMs) and bare-metal servers in our infrastructure dedicated entirely to hosting a redis-server. Redis is clearly an easy choice from a programmer's perspective, but because it is so efficient, most of that Redis-dedicated hardware was badly underused. CPU and disk I/O were the most commonly underused resources, and network I/O was generally well within limits too.
To make the underutilization worse, guaranteeing high availability means claiming even more hardware. Although Redis has recently made huge strides in high availability (HA), you must still set up some form of automated master-slave failover to build a robust service on top of it. At Hulu, robustness and sensible degradation are among the foremost concerns on every project, so naturally there were several solutions in use (including Redis Sentinel).
Initially, it was easy to maintain multiple clusters, and the total amount of hardware used wasn't excessive. But every new project using Redis meant another cluster to maintain. Fast-forward a few years, and these HA solutions had grown more numerous and more burdensome to maintain, on top of a lot of underutilized hardware. The common denominator: Redis.
We use Redis for a lot of services, so it was time to organize how we use it. Monaco was built to provide that organization. Monaco is a clustered service that deploys, configures, and maintains highly available redis-servers across a cluster of machines. By providing a robust, distributed management layer, Monaco can fill those machines to capacity with redis-servers in a predictable way, which lets us use much more of each machine's resources.
In simplest terms, Monaco is Redis as a Service.
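To make "fill the machines to capacity in a predictable way" concrete, here is a minimal sketch of one possible placement policy. It assumes a best-fit rule with a memory-utilization cap (the `max_util` parameter and the `Node`/`place` names are hypothetical); Monaco's actual scheduler may work differently.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_mb: int       # memory this machine may devote to hosted redis-servers
    allocated_mb: int = 0  # memory already promised to hosted redis-servers

    def headroom_mb(self, max_util: float) -> float:
        # Memory still available under the utilization cap.
        return self.capacity_mb * max_util - self.allocated_mb


def place(nodes, size_mb, replicas, max_util=0.8):
    """Choose `replicas` distinct nodes for one hosted Redis DB.

    Hypothetical best-fit policy: among nodes with enough headroom, prefer the
    fullest ones, so machines fill toward the cap instead of spreading thinly.
    Each replica lands on a different machine, so one host failure cannot
    take out every copy.
    """
    candidates = [n for n in nodes if n.headroom_mb(max_util) >= size_mb]
    if len(candidates) < replicas:
        raise RuntimeError(f"only {len(candidates)} nodes can fit {size_mb} MB")
    candidates.sort(key=lambda n: n.headroom_mb(max_util))  # fullest first
    chosen = candidates[:replicas]
    for n in chosen:
        n.allocated_mb += size_mb
    return [n.name for n in chosen]
```

A conservative cap such as `max_util=0.8` leaves room for Redis overhead (replication buffers, forks for persistence) while still packing machines far more densely than one redis-server per VM.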
Because Monaco automates every aspect of creating, maintaining, and monitoring Redis servers, our developers now use Monaco whenever they need Redis for a project. As Monaco proved its reliability, more and more existing services migrated their Redis usage into it as well. Today, we have 1.5 TB of managed redis-servers in Monaco, and that number grows every day. Our hardware is also filled to our (somewhat conservative) target limits. We're not just cutting down on VM overhead: Monaco has made utilization of our Redis-hosting hardware consistent and maximized.
How it Works
Before I describe how Monaco works: we've open-sourced this project, so you can dive in as deep as you like at http://github.com/hulu/monaco (I'll wait, go take a look).
At its heart, Monaco is itself built on a Redis cluster. It uses Redis to distribute its internal state, communicate among nodes, and detect state changes. Much like redis-sentinel, Monaco uses leader election to maintain the replication roles of that redis-server cluster, but that's where the similarities end.
Each Monaco node takes its role, either "master" or "slave," from the replication role of its redis-server in this internal cluster. Because leader election is built on a proven consensus algorithm (Raft), we can assume this state is consistent across the cluster. For clarity, I will call the Redis DB that Monaco uses internally the "Monaco DB," to distinguish it from the Redis servers that Monaco hosts.
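The property that makes a Raft-elected master trustworthy is its vote-granting rule: each node votes at most once per term, only for a candidate whose log is at least as up to date as its own, and a candidate wins only with a strict majority. The sketch below shows just that rule, with no RPCs, timers, or persistence; the `Voter` and `run_election` names are illustrative and are not taken from Monaco's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_term: int = 0
    last_log_index: int = 0

    def request_vote(self, term, candidate, cand_last_term, cand_last_index):
        """Raft's RequestVote rule, simplified to a local method call."""
        if term < self.current_term:
            return False                   # candidate is from a stale term
        if term > self.current_term:
            self.current_term = term       # newer term: any earlier vote no longer counts
            self.voted_for = None
        # "At least as up to date": compare last log term first, then index.
        up_to_date = (cand_last_term, cand_last_index) >= (
            self.last_log_term, self.last_log_index)
        if self.voted_for in (None, candidate) and up_to_date:
            self.voted_for = candidate
            return True
        return False


def run_election(candidate, term, cand_last, voters, cluster_size):
    """The candidate votes for itself, then needs a strict majority of the cluster."""
    votes = 1 + sum(v.request_vote(term, candidate, *cand_last) for v in voters)
    return votes > cluster_size // 2
```

Because a node can grant only one vote per term and a win needs a strict majority, two candidates can never both become master in the same term. That is why every Monaco node can treat the current master as unambiguous.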
The master node monitors the health of the cluster and of the hosted redis-servers it maintains. It can also create, edit, and delete hosted redis-servers in the cluster. Through keyspace notifications on the Monaco DB, every node learns immediately of any state change it needs to enact.
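Redis publishes keyspace notifications on channels named `__keyspace@<db>__:<key>`, with the event name (such as `hset` or `del`) as the message body, once they are enabled with the `notify-keyspace-events` setting (for example `CONFIG SET notify-keyspace-events KA`). The sketch below shows how a node might route those messages to handlers. The `app:` key prefix is a hypothetical layout, not Monaco's real schema, and the messages are passed in directly so no server is needed.

```python
import re

# Keyspace channels look like "__keyspace@<db>__:<key>"; the payload is the event name.
KEYSPACE_RE = re.compile(r"^__keyspace@(\d+)__:(.+)$")


def parse_keyspace(channel, event):
    """Return (db, key, event) for a keyspace channel, or None for anything else."""
    m = KEYSPACE_RE.match(channel)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), event


class Dispatcher:
    """Route Monaco DB changes to handlers by key prefix (hypothetical key layout)."""

    def __init__(self):
        self.handlers = []  # list of (key prefix, callback(key, event))

    def on(self, prefix, callback):
        self.handlers.append((prefix, callback))

    def handle(self, channel, event):
        """Invoke every matching handler; return how many fired."""
        parsed = parse_keyspace(channel, event)
        if parsed is None:
            return 0
        _db, key, ev = parsed
        fired = 0
        for prefix, callback in self.handlers:
            if key.startswith(prefix):
                callback(key, ev)
                fired += 1
        return fired
```

In a live deployment, the `channel` and `data` fields of messages from a pattern subscription such as `PSUBSCRIBE __keyspace@0__:*` (for example via redis-py's `pubsub().psubscribe(...)`) would be fed into `Dispatcher.handle`.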
Every node, master or slave, monitors the Monaco DB and maintains its own state. Subscriptions on the local Monaco DB notify the Monaco process of any relevant work. Each slave node also reports the status of its hosted redis-servers, so the master can perform any failover as necessary.
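Here is one way the master could turn those status reports into a failover decision. This is a minimal sketch under a hypothetical policy: act only when no healthy master is reported, then promote the healthy slave with the largest replication offset (a value Redis exposes as `slave_repl_offset` in `INFO replication`). The `ReplicaStatus` shape is illustrative and is not Monaco's report format.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    node: str          # Monaco node hosting this copy of the Redis DB
    role: str          # "master" or "slave", as reported by that node
    healthy: bool
    repl_offset: int   # higher offset means less data lost on promotion


def choose_failover(statuses):
    """Return the node whose copy should be promoted, or None.

    None means either a healthy master exists (no failover needed) or no
    healthy slave is available to promote.
    """
    if any(s.role == "master" and s.healthy for s in statuses):
        return None
    slaves = [s for s in statuses if s.role == "slave" and s.healthy]
    if not slaves:
        return None
    return max(slaves, key=lambda s: s.repl_offset).node
```

Preferring the slave with the highest replication offset minimizes the amount of acknowledged-but-unreplicated data lost, because Redis replication is asynchronous.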
Finally, a web application, which can run on any subset of the nodes in a Monaco cluster, provides a simple interface with explanations of every Monaco API function. To populate the interface's graphs, the open-source release includes a stat-reporting daemon, which stores recent stats in the Monaco DB. For large installations, however, this is likely inadequate: the Monaco DB could grow quite large and slow (especially with all the replication in a large cluster).
Monaco hosts a bunch of Redis clusters on its hardware, but now what? How do your Redis clients reach them robustly? The answer varies by deployment, because Monaco has a few modular components.
The “exposure” module defines how Monaco’s master informs your infrastructure of changes to Monaco-hosted Redis clusters. Think of it as a hook triggered by Monaco state changes. At Hulu, we’ve implemented this module to create, edit, and delete load balancer configurations, so our Monaco-hosted Redis DBs are always reachable through a virtual IP.
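To show what such a hook might look like, here is a minimal sketch. The `Exposure` interface and its `create`/`update`/`delete` methods are hypothetical, and Monaco's real module API may differ. The toy `LoadBalancerExposure` keeps one virtual IP per hosted DB and only moves the backend when the master changes; a real module would call your load balancer's API instead of updating dictionaries.

```python
class Exposure:
    """Hypothetical hook interface; the default does nothing, like Monaco's."""

    def create(self, app_id, master_host, port):
        pass

    def update(self, app_id, master_host, port):
        pass

    def delete(self, app_id):
        pass


class LoadBalancerExposure(Exposure):
    """Toy stand-in for a load balancer integration: one VIP per hosted DB."""

    def __init__(self, vip_pool):
        self.free_vips = list(vip_pool)
        self.vips = {}      # app_id -> VIP handed out to clients
        self.backends = {}  # VIP -> (host, port) currently behind it

    def create(self, app_id, master_host, port):
        vip = self.free_vips.pop(0)
        self.vips[app_id] = vip
        self.backends[vip] = (master_host, port)
        return vip

    def update(self, app_id, master_host, port):
        # After a failover, clients keep the same VIP; only the backend moves.
        self.backends[self.vips[app_id]] = (master_host, port)

    def delete(self, app_id):
        vip = self.vips.pop(app_id)
        del self.backends[vip]
        self.free_vips.append(vip)
```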
The "stat" module defines how the Monaco stat reporter publishes stats. By default it writes them to the Monaco DB, but at Hulu we use a Graphite stats publisher to integrate better with our service-monitoring pipeline.
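Graphite's plaintext protocol is simply one `<metric path> <value> <timestamp>` line per data point, sent to a Carbon listener (TCP port 2003 by default). The sketch below formats stats that way and also shows the "just drop the stats" option. The module classes and the metric names are hypothetical. The network sink is injected so the formatting can be exercised without a Carbon server, and in production it might write to a socket opened with `socket.create_connection`.

```python
import time


def graphite_lines(prefix, stats, timestamp=None):
    """Format stats as Graphite plaintext: one '<path> <value> <timestamp>' line each."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join(
        "%s.%s %s %d\n" % (prefix, name, value, ts)
        for name, value in sorted(stats.items())
    )


class GraphiteStatModule:
    """Hypothetical stat module that hands formatted lines to a sink callable."""

    def __init__(self, prefix, sink):
        self.prefix = prefix
        self.sink = sink  # e.g. a function that writes the text to a Carbon socket

    def publish(self, stats, timestamp=None):
        self.sink(graphite_lines(self.prefix, stats, timestamp))


class DropStatModule:
    """The minimal option for large deployments: discard stats entirely."""

    def publish(self, stats, timestamp=None):
        pass
```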
The exposure module is clearly the more interesting of the two (although large deployments will need to write a stat module, even if it just drops the stats). Because our implementation keeps each hosted Redis DB behind a load-balancer virtual IP, we can use simple Redis clients with no HA logic on the client side, just retries on disconnect.
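"Just retries on disconnect" can be as small as the wrapper below. It is a minimal sketch assuming exponential backoff; the attempt count and delays are illustrative. It catches the built-in `ConnectionError` by default. With redis-py you would pass `retry_on=(redis.exceptions.ConnectionError,)` instead, since that class does not derive from the built-in one.

```python
import time


def with_retries(op, attempts=3, base_delay=0.05,
                 retry_on=(ConnectionError,), sleep=time.sleep):
    """Call op(), retrying on connection errors with exponential backoff.

    With a stable VIP in front of the current master, a reconnect after a
    failover simply reaches the new master, so this is all the client needs.
    """
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```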
However, the default exposure module does nothing: infrastructure varies too widely for any one module to work everywhere. If you'd rather not implement a custom exposure module, you can still use Monaco reliably with a single web API call. All cluster information is available through the web API in JSON format, which also enables more complex use cases such as splitting reads and writes across master and slave nodes.
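A client that skips the exposure module could look up a hosted DB's topology from the web API and route commands itself. The sketch below assumes a hypothetical JSON shape (`{"master": [host, port], "slaves": [[host, port], ...]}`). Monaco's actual response format may differ, so check the API descriptions in its web interface.

```python
import json
import random


def endpoints(cluster_json):
    """Return (master, slaves) from a topology document in the assumed schema."""
    info = json.loads(cluster_json)
    master = tuple(info["master"])
    slaves = [tuple(s) for s in info.get("slaves", [])]
    return master, slaves


def pick(cluster_json, for_write, rng=random):
    """Send writes to the master and spread reads across slaves when there are any."""
    master, slaves = endpoints(cluster_json)
    if for_write or not slaves:
        return master
    return rng.choice(slaves)
```

Reads served by slaves can lag slightly behind the master, because Redis replication is asynchronous. Reads that must see the latest write should still go to the master.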
Redis Conf 2015
Monaco has been under development for a while, and it was a treat to share it with the Redis community at RedisConf 2015. Toward the end of the conference, Salvatore Sanfilippo (creator of Redis) addressed the audience about open-source software, and that really motivated us to give back to the open-source community. If you're interested in more about the motivations behind Monaco and its architecture, here's a link to the talk: http://redisconference.com/video/monaco/