Monday, January 22, 2018

Ruminating on Netflix Conductor Orchestration

We have been evaluating Netflix's open-source Conductor project and are intrigued by its design decisions.

Netflix Conductor can be used to orchestrate microservices. Microservices are typically loosely coupled through pub/sub semantics on top of a resilient message broker such as Kafka. But this simple and proven event-driven architecture did not meet Netflix's needs.

A good article on the challenges faced by the Netflix team and the genesis of Conductor is available here. Snippets from the article:

"Pub/sub model worked for simplest of the flows, but quickly highlighted some of the issues associated with the approach:

  • Process flows are “embedded” within the code of multiple applications - e.g. if you have a pipeline of publishers and subscribers, it becomes difficult to understand the big picture without proper design documentation.
  • There is tight coupling and assumptions around input/output, SLAs etc, making it harder to adapt to changing needs - e.g. how would you verify that a particular task responds within a 3-second SLA? If it does not respond in time, the task must be marked as failed. This is very difficult to do with pub/sub unless we code it ourselves.
  • No easy way to monitor the progress - When did the process complete? At what step did the transaction fail?

To address these issues, the Netflix team created Conductor. Conductor servers are stateless and can be deployed on multiple machines to meet scale and availability needs.
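
To make the contrast with pub/sub concrete, here is a minimal Python sketch of the central-orchestrator idea. It is not Conductor's API: the names Step, WorkflowRun, execute and sla_seconds are hypothetical, and the steps are plain local functions rather than remote workers. It only illustrates the three points above: the flow is declared in one place, each step has an explicit SLA, and the run records where it stopped. The SLA is set to 0.1 seconds instead of 3 seconds so the example finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One task in the flow (hypothetical type, not a Conductor class)."""
    name: str
    run: Callable[[Dict[str, Any]], Dict[str, Any]]
    sla_seconds: float


@dataclass
class WorkflowRun:
    """Central record of progress: overall status plus per-step history."""
    status: str = "RUNNING"
    failed_step: Optional[str] = None
    history: List[Tuple[str, str]] = field(default_factory=list)
    output: Dict[str, Any] = field(default_factory=dict)


def execute(steps: List[Step], payload: Dict[str, Any]) -> WorkflowRun:
    """Run steps in order; stop at the first step that fails or misses its SLA."""
    run = WorkflowRun()
    data = dict(payload)
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        for step in steps:
            future = pool.submit(step.run, data)
            try:
                # Wait at most sla_seconds for this step's result.
                data = future.result(timeout=step.sla_seconds)
                outcome = "COMPLETED"
            except FutureTimeout:
                outcome = "TIMED_OUT"
            except Exception:
                outcome = "FAILED"
            run.history.append((step.name, outcome))
            if outcome != "COMPLETED":
                run.status = "FAILED"
                run.failed_step = step.name
                return run
        run.status = "COMPLETED"
        run.output = data
        return run
    finally:
        # Do not block on a step that is still running past its SLA.
        pool.shutdown(wait=False)


def reserve(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "reserved": True}


def charge(data: Dict[str, Any]) -> Dict[str, Any]:
    time.sleep(0.5)  # deliberately slower than the 0.1 s SLA below
    return {**data, "charged": True}


def notify(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "notified": True}


# The whole flow is visible in one place instead of being spread
# across publishers and subscribers.
flow = [
    Step("reserve", reserve, sla_seconds=0.1),
    Step("charge", charge, sla_seconds=0.1),
    Step("notify", notify, sla_seconds=0.1),
]
result = execute(flow, {"order_id": 42})
print(result.status, result.failed_step, result.history)
```

In this run the "charge" step misses its SLA, so the run is marked as failed at that step and "notify" never starts. A real orchestrator would persist this state and dispatch tasks to remote workers; the sketch keeps everything in memory for brevity.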