Supervisors are “have you tried turning it off and on again?” turned into a programming strategy. We’ve all made that trade-off: should I keep debugging what’s wrong, or just reboot? Often, the reboot fixes it and we can just move on.
But supervisors can also be used to design systems that go down and stay down. In this post we’re going to talk about when we’d want to design this kind of system and how exactly to do it.
We’re going to design a system of RabbitMQ consumers that fail at the first sign of trouble. Do not pass go, do not collect 200 dollars.
Let’s talk about what’s going on here. Our application starts a ConsumerGroup. This is a Supervisor that starts 2 complementary processes: a ConsumerSupervisor, responsible for starting our consumers, and a ConsumerMonitor. We want our consumer monitor to… monitor consumers. At the first sign of danger, it will instruct the ConsumerSupervisor to stop the presses and kill all of its children.
After we’ve fixed the problem, we’ll bring everything back online. OK, let’s get started.
Since we covered how to set up Producer and Consumer pools in a previous post, I won’t go into too much detail here. After setting them up, our supervision tree should look like this.
Great, on to our supervisors.
Let’s start at the base of our Supervision tree, the Consumers. We’re going to use ExRabbitPool’s Consumer module to save us some boilerplate, and we’ll customize our restart strategy to support our “Burn the world” approach.
The biggest difference here is we set the restart option to :temporary. Supervised processes can set 1 of 3 restart options:
:permanent (default): If it dies, bring it back to life no matter what.
:transient: Only bring me back if I die under suspicious conditions. If I die with a “normal” reason (:normal, :shutdown, or {:shutdown, term}), then it’s fine. Get me nice flowers.
:temporary: If I die at all, leave me dead.
My friends, what are we if not temporary processes, trying to handle the right messages, lest we be killed by one destined for someone else?
:temporary works for us here since we want to stop consumers from potentially doing more damage. If it dies, let it die.
ToxicityConsumer looks pretty similar, except it has a different exchange and queue.
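To make the :temporary behaviour concrete, here is a minimal sketch that uses a plain GenServer instead of ExRabbitPool’s Consumer module. DemoConsumer is a hypothetical stand-in, not the consumer from this post; the only part that matters is the restart option passed to use GenServer.

```elixir
defmodule DemoConsumer do
  # The :restart option is copied into the child_spec/1 that `use GenServer`
  # generates, so any supervisor starting this module sees restart: :temporary.
  use GenServer, restart: :temporary

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts), do: {:ok, opts}
end

{:ok, sup} = Supervisor.start_link([DemoConsumer], strategy: :one_for_one)
[{_id, pid, _type, _modules}] = Supervisor.which_children(sup)

# Simulate a crash and wait until the process is really gone.
ref = Process.monitor(pid)
Process.exit(pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
after
  1_000 -> raise "consumer did not exit"
end

# Give the supervisor a moment to handle the exit signal.
Process.sleep(50)

# A :temporary child is never restarted, so nothing should be left.
remaining = Supervisor.which_children(sup)
IO.inspect(remaining, label: "children after crash")
```

Swapping :temporary for :permanent in the sketch should leave one child in the list, with a new pid.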
On to the ConsumerSupervisor. Since it’s supervising processes that have the :temporary restart option, the strategy doesn’t really matter. We’re going to leave it with the default.
We’ve added a couple of additional functions that we need to support our ConsumerMonitor process. The first one returns a list of all of the pids this supervisor is managing. The second function terminates all of the children.
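Those two helper functions could look roughly like the sketch below. The module names, the child ids, and the function names child_pids/0 and terminate_children/0 are assumptions made for illustration, and :one_for_one is simply the strategy picked for this sketch. The Supervisor calls themselves (which_children/1 and terminate_child/2) are the standard ones.

```elixir
defmodule DemoWorker do
  # Hypothetical stand-in for a consumer.
  use GenServer, restart: :temporary

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  @impl true
  def init(name), do: {:ok, name}
end

defmodule DemoConsumerSupervisor do
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    children = [
      Supervisor.child_spec({DemoWorker, :consumer_a}, id: :consumer_a),
      Supervisor.child_spec({DemoWorker, :consumer_b}, id: :consumer_b)
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  # Pids of every child that is currently running.
  def child_pids do
    for {_id, pid, _type, _modules} <- Supervisor.which_children(__MODULE__),
        is_pid(pid),
        do: pid
  end

  # Stops every child. Results are ignored on purpose: a child may already
  # be gone by the time we ask for it to be terminated.
  def terminate_children do
    for {id, _pid, _type, _modules} <- Supervisor.which_children(__MODULE__) do
      Supervisor.terminate_child(__MODULE__, id)
    end

    :ok
  end
end

{:ok, _sup} = DemoConsumerSupervisor.start_link()
pids = DemoConsumerSupervisor.child_pids()
:ok = DemoConsumerSupervisor.terminate_children()
IO.inspect(DemoConsumerSupervisor.child_pids(), label: "children after terminate")
```

Because the children are :temporary, terminating them also removes their child specs, so the supervisor ends up empty rather than holding stopped entries.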
ConsumerMonitor will live up to its name, but take a look at its restart option.
We’re setting its restart option so that, if this puppy dies for ANY reason other than what’s on line 21, I want it alive. Notice we pass in a supervisor as an argument when we start it.
On start, we monitor each of the supervisor’s pids and just wait… as soon as a process dies, we instruct the supervisor to execute order 66.
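The monitor-then-burn flow described above can be sketched as follows. This is an illustration under assumptions, not the post’s code: DemoMonitor is a hypothetical name, restart: :transient is my guess at a fitting option, and the supervised Agents are stand-ins for real consumers.

```elixir
defmodule DemoMonitor do
  # Assumption: :transient, so a crash restarts the monitor
  # but a deliberate :normal stop is final.
  use GenServer, restart: :transient

  def start_link(supervisor), do: GenServer.start_link(__MODULE__, supervisor)

  @impl true
  def init(supervisor) do
    # Monitor every child that is currently running.
    for {_id, pid, _type, _mods} <- Supervisor.which_children(supervisor),
        is_pid(pid) do
      Process.monitor(pid)
    end

    {:ok, supervisor}
  end

  @impl true
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, supervisor) do
    # One child died: take the rest down too, then stop normally.
    for {id, _pid, _type, _mods} <- Supervisor.which_children(supervisor) do
      Supervisor.terminate_child(supervisor, id)
    end

    {:stop, :normal, supervisor}
  end
end

# Two :temporary Agents stand in for the consumers.
worker = fn id ->
  %{id: id, start: {Agent, :start_link, [fn -> id end]}, restart: :temporary}
end

{:ok, sup} = Supervisor.start_link([worker.(:a), worker.(:b)], strategy: :one_for_one)
{:ok, mon} = DemoMonitor.start_link(sup)
ref = Process.monitor(mon)

# Kill one consumer and wait for the monitor to finish its cleanup.
[{_id, victim, _type, _mods} | _rest] = Supervisor.which_children(sup)
Process.exit(victim, :kill)

reason =
  receive do
    {:DOWN, ^ref, :process, ^mon, reason} -> reason
  after
    1_000 -> raise "monitor did not stop"
  end

IO.inspect(reason, label: "monitor exit reason")
IO.inspect(Supervisor.which_children(sup), label: "children left")
```

Stopping with :normal is what would keep a :transient monitor from being restarted after it has done its job.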
ConsumerGroupSupervisor ties it all together. Pay special attention to the strategy option.
A supervisor starts its children sequentially, in the order they’re listed. We completely start ConsumerSupervisor before we start the ConsumerMonitor. This ensures the pids are started and ready to be monitored. The :rest_for_one strategy allows the monitor to fail and recover without disturbing the consumers, while still allowing us to heal the system. More on that later.
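Here is a small sketch of how :rest_for_one treats the two children differently. DemoGroup and the child ids are hypothetical, and plain Agents stand in for the real ConsumerSupervisor and ConsumerMonitor, so this only shows the ordering behaviour, not the actual modules.

```elixir
defmodule DemoGroup do
  use Supervisor

  def start_link(opts \\ []), do: Supervisor.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    # Agents are stand-ins for the real ConsumerSupervisor and ConsumerMonitor.
    children = [
      %{id: :consumer_supervisor, start: {Agent, :start_link, [fn -> :ok end]}},
      %{id: :consumer_monitor, start: {Agent, :start_link, [fn -> :ok end]}}
    ]

    # :rest_for_one restarts the crashed child plus every child started after it.
    Supervisor.init(children, strategy: :rest_for_one)
  end
end

# Hypothetical helper: snapshot of child id => pid.
pids = fn sup ->
  Map.new(Supervisor.which_children(sup), fn {id, pid, _type, _mods} -> {id, pid} end)
end

{:ok, group} = DemoGroup.start_link()
before = pids.(group)

# Crash the second child: only it is restarted.
Process.exit(before.consumer_monitor, :kill)
Process.sleep(100)
after_monitor_crash = pids.(group)

# Crash the first child: both are restarted.
Process.exit(after_monitor_crash.consumer_supervisor, :kill)
Process.sleep(100)
after_supervisor_crash = pids.(group)

IO.inspect(before, label: "before")
IO.inspect(after_monitor_crash, label: "after monitor crash")
IO.inspect(after_supervisor_crash, label: "after supervisor crash")
```

The order of the children list is what makes this work: the child everything else depends on goes first.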
Let’s add this to our application and take it out for a spin.
Let’s take a look at what this looks like in observer.
Supervisors are awesome
Our tree dies when it’s supposed to, and comes back up in a fresh state when needed. LOVE IT!
Copyright © 2021 Steven Nuñez - HostileDeveloper