Circuit Breakers in Golang

Ioshellboy
6 min read · Jul 10, 2021


What does your mind picture as soon as you hear the term ‘Circuit Breaker’?

Literally ‘Breaking the circuit’

Probably not what my evil mind conjured up: literally breaking a circuit with a hammer.

We all have circuit breakers installed at home to cut off an abnormally high flow of current from the grid. Let us first look at how that works before we jump into “Circuit Breakers in Microservices”.

A typical circuit breaker setup

As shown in the image above, a typical circuit breaker setup has two main components: (1) a soft iron core tightly wrapped with the live wire, and (2) the contacts. As long as the contacts form a junction, current flows from the external power supply to our houses. If the junction breaks, the current stops flowing.

The soft iron core acts as an electromagnet when current flows through the wire wrapped around it. When a higher-than-expected current flows through it, the electromagnet becomes strong enough to pull the adjacent contact away, which breaks the junction and opens (trips) the circuit.

You must be wondering what this has to do with the microservice architecture. In my defence, the analogy maps over closely, as we will see now!

Cascading Failures in the Microservice architecture

The microservice architecture has replaced the monolith in many systems, but there are some key issues we must address to make such systems resilient.

One of the problems with Microservices is Cascading Failures. Let’s take an example to understand it better.

Cascading Failures

In the figure above, the actor calls our Primary Service, which depends on the upstream services A, B and C. Now say Service A is a read-heavy system that depends on a database. This database has its own limits and, when overloaded, can start resetting connections. This affects not only Service A’s performance but also the Primary Service, whose goroutines keep waiting on Service A and pile up, clogging the service.

This is the proverbial ‘one bad apple spoils the barrel’. Anyone who has had an awful wine will surely relate. Let us work through an example to verify this.

Example Setup

Let us build a Netflix-like application. One of the microservices would be the Movies Service, which is responsible for serving the feed page. This endpoint also depends on the Recommendation Service to provide apt recommendations for the user.

The recommendation service exposes a route /recommendations that returns a list of recommended movies, while also logging the number of goroutines every 500ms.
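A minimal sketch of such a service might look like the following. The movie titles, the helper serveOnce and the port in the trailing comment are assumptions made for illustration, not the original code. The demo drives the handler in memory so that it terminates on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"runtime"
	"time"
)

// recommendations returns static placeholder data (titles are made up).
func recommendations() []string {
	return []string{"Movie A", "Movie B", "Movie C"}
}

// recoHandler serves the recommendation list as JSON.
func recoHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(recommendations()); err != nil {
		log.Printf("encode: %v", err)
	}
}

// logGoroutines logs the goroutine count on every tick until stop is closed.
func logGoroutines(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			log.Printf("goroutines: %d", runtime.NumGoroutine())
		case <-stop:
			return
		}
	}
}

// serveOnce (hypothetical helper) runs a handler against an in-memory
// request and returns the status code and body, so no port is needed.
func serveOnce(h http.HandlerFunc, target string) (int, string) {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	stop := make(chan struct{})
	defer close(stop)
	go logGoroutines(500*time.Millisecond, stop)

	code, body := serveOnce(recoHandler, "/recommendations")
	fmt.Println(code, body)

	// A real service would instead register the route and listen, e.g.:
	//   http.HandleFunc("/recommendations", recoHandler)
	//   log.Fatal(http.ListenAndServe(":8081", nil)) // port is an assumption
}
```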

The movies service exposes a route /movies that returns feed along with recommendations. To fetch recommendations, it in turn calls the upstream Recommendation Service.
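Here is a rough sketch of the unprotected version of that call. The Feed type, the placeholder titles and the choice to answer with 502 on an upstream error are assumptions for illustration. Both services are wired together in-process with httptest so the example finishes by itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Feed is a hypothetical response shape for the /movies route.
type Feed struct {
	Movies          []string `json:"movies"`
	Recommendations []string `json:"recommendations"`
}

// fetchRecommendations calls the upstream Recommendation Service.
func fetchRecommendations(client *http.Client, baseURL string) ([]string, error) {
	resp, err := client.Get(baseURL + "/recommendations")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("upstream returned %s", resp.Status)
	}
	var recos []string
	if err := json.NewDecoder(resp.Body).Decode(&recos); err != nil {
		return nil, err
	}
	return recos, nil
}

// moviesHandler builds the /movies handler. With no protection in place,
// an upstream failure fails the whole feed.
func moviesHandler(client *http.Client, recoURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		recos, err := fetchRecommendations(client, recoURL)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		feed := Feed{
			Movies:          []string{"Feed Movie 1", "Feed Movie 2"},
			Recommendations: recos,
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(feed)
	}
}

// demo starts a stand-in recommendation service, calls /movies once and
// returns the status code and decoded feed.
func demo() (int, Feed) {
	reco := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_ = json.NewEncoder(w).Encode([]string{"Reco 1", "Reco 2"})
	}))
	defer reco.Close()

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/movies", nil)
	moviesHandler(reco.Client(), reco.URL)(rec, req)

	var feed Feed
	if err := json.NewDecoder(rec.Body).Decode(&feed); err != nil {
		return rec.Code, Feed{}
	}
	return rec.Code, feed
}

func main() {
	code, feed := demo()
	fmt.Println(code, feed.Movies, feed.Recommendations)
}
```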

With this setup, let us hit the movies endpoint at a rate of 100 requests/sec for 3 seconds. We get a 100% success rate, with the 99th-percentile latency in the low milliseconds. That is expected, since only static data is being served.

Now, let’s simulate a recommendation service that takes too long to respond: add a wait of 20 seconds to the recoHandler and attack again. The success rate drops sharply and response times suffer. Furthermore, the number of goroutines stuck on the recommendation service during the attack goes up dramatically.

The recommendation service’s slowdown hurt end users: even the movies feed, which could have been served on its own, was not served. This is exactly what a cascading failure does to our system.
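To see the pile-up in miniature, the sketch below fires a batch of concurrent callers at a deliberately slow upstream and records how many requests were in flight at the same time. The function name peakInFlight is hypothetical, and the delay is scaled down from 20 seconds to a few hundred milliseconds so the demo finishes quickly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// peakInFlight starts a slow stand-in upstream, hits it with the given
// number of concurrent callers and returns the highest number of requests
// that were being handled at the same moment.
func peakInFlight(callers, delayMs int) int64 {
	var inFlight, peak int64

	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n := atomic.AddInt64(&inFlight, 1)
		defer atomic.AddInt64(&inFlight, -1)

		// Raise the recorded peak if this request pushed concurrency higher.
		for {
			p := atomic.LoadInt64(&peak)
			if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
				break
			}
		}

		time.Sleep(time.Duration(delayMs) * time.Millisecond) // the artificial wait
		fmt.Fprint(w, "[]")
	}))
	defer upstream.Close()

	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// No timeout and no breaker: each caller simply waits.
			resp, err := http.Get(upstream.URL)
			if err == nil {
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak in-flight upstream requests:", peakInFlight(50, 200))
}
```

Every caller goroutine stays parked for the full delay, so the longer the upstream takes, the more goroutines accumulate on both sides.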

Circuit Breaker to the rescue

The circuit breaker is a simple but important concept, as it helps us keep our services highly available. A circuit breaker has 3 states:

  • Closed State

The closed state refers to the state where the junction is closed and the data flows through it. This is our ideal state wherein the upstream service is working as expected.

  • Open State

The open state is the state in which the circuit has tripped because the upstream service wasn’t responding as expected. Calls are short-circuited, that is, failed immediately without reaching the upstream, which saves the upstream service from being overwhelmed while it is already struggling. Moreover, the calling service’s business logic gets faster feedback about the upstream’s availability, without having to wait for the upstream’s response.

  • Half Open State
Half Open — Switch to close state

If the circuit is open, we would want it to be closed as soon as the upstream service is available again. While you can do it via manual intervention, the preferred approach should be to let some requests pass through the circuit after some delay from the time the circuit was last opened.

If these requests to the upstream service succeed, we can safely close the circuit.

Half Open — Remain in open state

On the flip side, if these requests fail, the circuit returns to the open state.

The state diagram of a circuit breaker pattern thus looks like:

  1. If the circuit is closed, it can be opened if the failures exceed the configured threshold.
  2. If the circuit is open, it becomes half open after the configured sleep window.
  3. If the circuit is half open, it can be:
  • Opened again, if the requests that were allowed to pass through also fail.
  • Closed, if the requests that were allowed to pass through succeed.
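The transitions above can be sketched as a tiny state machine. This is a simplified illustration and not how any particular library implements it: it counts consecutive failures instead of an error percentage, and it lets every call through while half open. All names here (Breaker, NewBreaker, ErrOpen, errUpstream) are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// State is one of the three circuit breaker states.
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

func (s State) String() string {
	return [...]string{"closed", "open", "half-open"}[s]
}

var (
	ErrOpen     = errors.New("circuit open: call rejected")
	errUpstream = errors.New("upstream failed") // stand-in failure for the demo
)

// Breaker is a deliberately small sketch of the pattern.
type Breaker struct {
	mu          sync.Mutex
	state       State
	failures    int           // consecutive failures seen so far
	maxFailures int           // failures needed to open the circuit
	sleepWindow time.Duration // how long to stay open before probing
	openedAt    time.Time
}

func NewBreaker(maxFailures, sleepWindowMs int) *Breaker {
	return &Breaker{
		maxFailures: maxFailures,
		sleepWindow: time.Duration(sleepWindowMs) * time.Millisecond,
	}
}

// State reports the current state.
func (b *Breaker) State() State {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.state
}

// Call runs fn unless the circuit is open and still inside its sleep window.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.state == Open {
		if time.Since(b.openedAt) < b.sleepWindow {
			b.mu.Unlock()
			return ErrOpen // fail fast without touching the upstream
		}
		b.state = HalfOpen // sleep window elapsed: let a probe through
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.state == HalfOpen || b.failures >= b.maxFailures {
			b.state = Open
			b.openedAt = time.Now()
		}
		return err
	}
	b.state = Closed
	b.failures = 0
	return nil
}

func main() {
	b := NewBreaker(2, 100)
	fail := func() error { return errUpstream }
	ok := func() error { return nil }

	b.Call(fail)
	b.Call(fail)
	fmt.Println(b.State(), b.Call(ok)) // circuit is open, so the call is rejected

	time.Sleep(150 * time.Millisecond)
	fmt.Println(b.Call(ok), b.State()) // the probe succeeds and the circuit closes
}
```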

Circuit Breaker Implementation in Golang

While there are multiple libraries to choose from, one of the most commonly used is hystrix. As its documentation puts it, Hystrix is a latency and fault tolerance library designed by Netflix to isolate points of access to remote systems, services and 3rd party libraries, to stop cascading failure and to enable resilience in complex distributed systems where failure is inevitable.

The circuit breaker implementation in hystrix depends on the following configurations:

  1. Timeout: how long to wait for the upstream service to respond.
  2. Max Concurrent Requests: the maximum number of concurrent calls allowed to the upstream service.
  3. Request Volume Threshold: the minimum number of requests that must be seen before the circuit breaker evaluates whether the state needs to change.
  4. Sleep Window: the delay between the circuit opening and its move to the half open state.
  5. Error Percent Threshold: the percentage of failed requests at which the circuit opens.
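To make the interplay between the Request Volume Threshold and the Error Percent Threshold concrete, here is a simplified sketch of the decision. It is not hystrix's actual code: the Settings type and the shouldOpen function are hypothetical, and it assumes the request and failure counts come from whatever window the breaker is tracking.

```go
package main

import "fmt"

// Settings holds the two thresholds that decide when a circuit opens.
type Settings struct {
	RequestVolumeThreshold int // minimum requests before the error rate is evaluated
	ErrorPercentThreshold  int // error percentage at which the circuit opens
}

// errorPercent returns the failure rate as a whole percentage.
func errorPercent(requests, failed int) int {
	if requests == 0 {
		return 0
	}
	return failed * 100 / requests
}

// shouldOpen reports whether the circuit should open for the observed
// request and failure counts.
func shouldOpen(s Settings, requests, failed int) bool {
	if requests < s.RequestVolumeThreshold {
		return false // too little traffic to judge the upstream
	}
	return errorPercent(requests, failed) >= s.ErrorPercentThreshold
}

func main() {
	s := Settings{RequestVolumeThreshold: 20, ErrorPercentThreshold: 50}

	fmt.Println(shouldOpen(s, 10, 10)) // every call failed, but volume is below 20
	fmt.Println(shouldOpen(s, 40, 10)) // enough volume, only 25% errors
	fmt.Println(shouldOpen(s, 40, 20)) // enough volume and 50% errors
}
```

The volume threshold keeps a couple of failed calls on a quiet endpoint from opening the circuit on their own.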

Let us use this in our example of movies and recommendations and implement a circuit breaker pattern while fetching recommendations.

With Hystrix, you can also implement fallback logic for when the circuit is open. This logic varies from case to case, e.g. fetching from a cache.
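The shape of a fallback can be sketched with the standard library alone. The helper below is a stand-in written for this example, not the hystrix API: it takes a run function and a fallback function, much as hystrix-go's hystrix.Do takes a command name, a run function and a fallback function. The names do, getRecommendations and errCircuitOpen are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errCircuitOpen stands in for the error a breaker returns when it
// rejects a call.
var errCircuitOpen = errors.New("circuit open")

// do runs the primary function and, if it fails, hands the error to the
// fallback. A nil fallback means the error is returned as is.
func do(run func() error, fallback func(error) error) error {
	if err := run(); err != nil {
		if fallback != nil {
			return fallback(err)
		}
		return err
	}
	return nil
}

// getRecommendations returns upstream recommendations, or an empty list
// when the call fails, so the feed can still be served.
func getRecommendations(fetch func() ([]string, error)) []string {
	var recos []string
	_ = do(func() error {
		var err error
		recos, err = fetch()
		return err
	}, func(err error) error {
		recos = []string{} // degrade gracefully: a feed with 0 recommendations
		return nil
	})
	return recos
}

func main() {
	down := func() ([]string, error) { return nil, errCircuitOpen }
	up := func() ([]string, error) { return []string{"Reco 1"}, nil }

	fmt.Println(len(getRecommendations(down)), len(getRecommendations(up)))
}
```

Returning an empty list is only one option. A cache lookup would slot into the same fallback function.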

With this updated logic, let us attack again at a rate of 100 requests per second for 3 seconds.

Voilà! A 100% success rate: when the circuit is open, we simply serve the feed and return 0 recommendations. Also, since we no longer call the upstream service while the circuit is open, the recommendation service is not overwhelmed, and far fewer goroutines pile up on it than before.

Where to go from here?

I would recommend checking out:

  1. About Netflix Hystrix
  2. How hystrix works?
  3. Hystrix bucketing
