Saturday, October 03, 2015

Handling failures and improving resilience in microservices

In a microservices architecture, one has to build services that can handle failures. For e.g. If a microservice calls another dependent microservice that is down, then we need to handle this using timeouts and implement the Circuit Breaker pattern.

Netflix has open-sourced an incredibly useful library called as Hystrix to solve such problems. Anyone building large scale distributed architectures on the Java platform would find Hystrix a boon. When you make a remote service call through Hystrix libraries, it does the following:

  1. If the remote service call does not return within a specified threshold, Hystrix times-out the call.
  2. If a service is throwing errors and the number of errors exceed a threshold, then Hystrix would trip the circuit-breaker and all requests would fail-fast for a specified amount of time (recovery period)
  3. Hystrix enables developers to implement a fall-back action when a request fails, for e,g returning a default value or a null value or from cache. 
The full operating model of Hystrix is explained in great details on Github wiki 

It was also interesting to learn that the tech guys at Flipkart have taken Hystrix and implemented a service proxy on top of it called 'Phantom'. Looks like the advantage of using Phantom is that your consumers do not have to code against the Hystrix libraries.