Sunday, October 04, 2015

Service Discovery Mechanisms in Microservices

In a microservices-based architecture, we do not know the number of instances of a service or their IP addresses beforehand. This is because microservices typically run in VMs or Docker containers that are dynamically spawned based on load.

So consumers would need some kind of service discovery mechanism to communicate with microservices. There are two options to design this -

a) Server-side Service Discovery - Here the consumers make a request to a load-balancer/service registry and then the request is routed to the actual service end-point. This paradigm is clearly explained on this blog here. An example of this design pattern is the AWS Elastic Load Balancer.

b) Client-side Service Discovery - Here the consumers use a small library for making service calls. This library queries the service registry and picks an actual service end-point, doing the load balancing on the client side. Netflix uses this approach: its service registry is called Eureka and its client library is called Ribbon.
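
To make the client-side option a bit more concrete, here is a minimal sketch in Python. It is not how Eureka or Ribbon are implemented; the in-memory `ServiceRegistry`, the `DiscoveryClient` class, the service name and the addresses are all hypothetical, and a real registry would be a networked service with health checks.

```python
import itertools
from typing import Dict, List


class ServiceRegistry:
    """Hypothetical in-memory stand-in for a networked service registry."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = {}

    def register(self, service: str, endpoint: str) -> None:
        # Instances announce themselves when they are spawned.
        self._instances.setdefault(service, []).append(endpoint)

    def lookup(self, service: str) -> List[str]:
        return list(self._instances.get(service, []))


class DiscoveryClient:
    """Client-side library: asks the registry, then load-balances round-robin."""

    def __init__(self, registry: ServiceRegistry) -> None:
        self._registry = registry
        self._counter = itertools.count()

    def choose(self, service: str) -> str:
        endpoints = self._registry.lookup(service)
        if not endpoints:
            raise LookupError(f"no instances registered for {service!r}")
        # Rotate through whatever instances are currently registered.
        return endpoints[next(self._counter) % len(endpoints)]


registry = ServiceRegistry()
registry.register("orders", "10.0.0.5:8080")  # made-up addresses
registry.register("orders", "10.0.0.6:8080")
client = DiscoveryClient(registry)
picks = [client.choose("orders") for _ in range(3)]
```

The consumer never hard-codes an address; it only knows the logical service name and lets the library pick an instance on every call.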

Saturday, October 03, 2015

Handling failures and improving resilience in microservices

In a microservices architecture, one has to build services that can handle failures. For example, if a microservice calls a dependent microservice that is down, we need to handle this using timeouts and the Circuit Breaker pattern.

Netflix has open-sourced an incredibly useful library called Hystrix to solve such problems. Anyone building large-scale distributed architectures on the Java platform would find Hystrix a boon. When you make a remote service call through the Hystrix library, it does the following:

  1. If the remote service call does not return within a specified threshold, Hystrix times out the call.
  2. If a service is throwing errors and the number of errors exceeds a threshold, Hystrix trips the circuit breaker and all requests fail fast for a specified amount of time (the recovery period).
  3. Hystrix enables developers to implement a fall-back action when a request fails, e.g. returning a default value, a null value or a value from cache.
The full operating model of Hystrix is explained in great detail on the GitHub wiki.
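
The trip / fail-fast / fall-back behaviour described above can be sketched in a few lines of Python. This is only an illustration of the Circuit Breaker pattern, not the Hystrix API: the `CircuitBreaker` class and its parameters are hypothetical, and the timeout step is left out (the remote call is assumed to raise an exception when it fails or times out).

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, then fails fast."""

    def __init__(self, failure_threshold: int = 3, recovery_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, remote: Callable[[], str], fallback: Callable[[], str]) -> str:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.recovery_seconds:
                # Circuit is open: fail fast without touching the remote service.
                return fallback()
            # Recovery period is over: let one trial call through. The failure
            # count is kept, so a single new failure re-opens the circuit.
            self._opened_at = None
        try:
            result = remote()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0  # a success closes the circuit again
        return result
```

A caller would wrap every remote invocation, for instance `breaker.call(fetch_profile, lambda: "default profile")`, where `fetch_profile` is whatever function makes the network call.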

It was also interesting to learn that the tech guys at Flipkart have taken Hystrix and implemented a service proxy on top of it called 'Phantom'. Looks like the advantage of using Phantom is that your consumers do not have to code against the Hystrix libraries. 

Ruminating on SemVer

Semantic Versioning (aka SemVer) of components has become mainstream today. The official page laying out the guidelines is available here -

Following SemVer, each component has a three-part version number in the format 'Major.Minor.Patch' - e.g. 2.3.23
  • You increment the major version when you make incompatible changes.
  • You increment the minor version when you add functionality in a backward-compatible manner.
  • You increment the patch version when you make backward-compatible bug fixes.
  • With SemVer, pre-releases can be denoted by appending a hyphen and an identifier such as 'alpha' or 'beta'. For e.g. a pre-release for version 3.0.0 could be 3.0.0-alpha.1.
Following SemVer is a boon in managing dependencies between components. So if component A is using version 4.2.3 of component B, then you know that as long as the version of B does not become 5.x.y, there would be no breaking changes. You can specify dependencies in the manifest file of a component.
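
As a rough illustration of the dependency rule above, the following Python sketch checks whether moving from one version to another stays within the same major version. The function names are made up for this example, and it deliberately ignores build metadata and the special treatment of 0.y.z versions; package managers implement far richer range rules.

```python
import re
from typing import Tuple

# Major.Minor.Patch with an optional pre-release suffix such as "-alpha.1".
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def parse(version: str) -> Tuple[int, int, int, str]:
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a Major.Minor.Patch version: {version!r}")
    major, minor, patch, pre = match.groups()
    return int(major), int(minor), int(patch), pre or ""


def is_compatible_upgrade(current: str, candidate: str) -> bool:
    """True if candidate is a stable release with the same major version
    as current and is not older than current."""
    cur = parse(current)
    cand = parse(candidate)
    if cand[3]:
        return False  # pre-releases are treated as unsafe in this sketch
    return cand[0] == cur[0] and cand[:3] >= cur[:3]
```

So `is_compatible_upgrade("4.2.3", "4.3.0")` accepts the upgrade, while a move to "5.0.0" or to "3.0.0-alpha.1" is rejected.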

While using SemVer for software components is fine, does it make sense to have the x.y.z version in the URL of public APIs?
APIs are the interfaces you expose to your consumers. Do your consumers really need to know about the bug fixes you have made, or the new features you have added? Maybe, maybe not!
IMHO, just using a single version number in your API URL would suffice for the majority of real-life business use cases. For e.g.

A good blog post by APIGEE on API versioning is available here. As stated in the blog - "Never release an API without a version and make the version mandatory."

Ruminating on Netflix Simian Army

A friend of mine introduced me to a suite of powerful tools used at Netflix for testing the resilience and availability of their services. The suite is called 'Simian Army', which essentially is a collection of tools such as 'Chaos Monkey', 'Latency Monkey', 'Security Monkey', etc.

I was aware that Netflix runs its entire IT infrastructure on AWS and was happy to hear that all the tools are available on Github here -

A good introduction to the genesis behind these tools is given on the Netflix blog here -

Another interesting blog on the lessons that Netflix learned after migrating to AWS is available here.

Wednesday, September 16, 2015

Ruminating on Apple's DEP

Apple's Device Enrollment Program (DEP) makes it easy for enterprises to roll out Apple devices to their employees, agents, partners, etc.

DEP helps in automating the enrollment of the device in an MDM (Mobile Device Management) platform. The enterprise can also streamline the initial set-up process and modify it to suit its needs.

For any organization embarking on a mobile strategy, it is worthwhile to check if the selected MDM platform has support for DEP. 

Tuesday, September 15, 2015

Advantage of using Story Points instead of hours

Using story points for estimating user stories is helpful because it encourages us to use 'relative sizing' and to estimate the 'size of work' rather than the actual effort required.

Mike Cohn has given a good analogy by relating this concept to running a trail. Two people can agree on the fact that the trail is 5 miles long, but one may take 30 mins and the other may take 45 mins.

During the Planning Poker game, each developer is given cards with the numbers 1, 2, 3, 5 and 8 on them. Then the Scrum Master and Product Owner take the size estimates from all developers to arrive at a consensus.

The Fibonacci scale is quite popular for estimating the size of a user story or epic, as there is sufficient difference between the numbers to prevent confusion. For example, if the scale were sequential, there would be a debate around sizing a story as 6, 7 or 8. A Fibonacci scale makes relative sizing easier.

Do we need a dedicated Scrum Master?

The need for a full-time Scrum Master is often a topic of hot debate in many Agile projects. Based on the numerous agile projects that we have successfully executed, I would give the following recommendations -

  • If your team is adopting SCRUM for the first time, then it is better to have a full-time Scrum Master. He would be responsible for ensuring that all agile processes are followed and everyone understands the rules of the game. The Scrum Master essentially acts as an evangelist educating teams on all aspects of SCRUM.
  • Once the teams have become comfortable with SCRUM processes, then we can have a part-time Scrum Master. IMHO, the technical architect or tech lead is most suited to play this role.
  • One of the main functions of a Scrum Master is to remove all impediments that the team faces. To be successful in this role, you need someone who can understand the technical complexities, business drivers and has a good rapport with the product owner. Hence architects are a good fit for the role of a Scrum Master. 
  • The Scrum Master also runs the daily Scrum and the weekly Scrum of Scrums to facilitate collaboration across teams. He also leads the retrospectives and encourages shared learning.

Static code analyzers for native mobile app development

Listing down the tools used by my mobility team for static code analysis of mobile apps.

For iOS, the most popular tool is Clang. The default IDE (Xcode) also comes with a built-in static code analyzer.

Sonar also provides a commercial plug-in for Objective-C that can be very useful if you are already using Sonar for all other platforms. There is another open-source Sonar plug-in for Objective C available here -

For Android, the most popular static code analyzer is lint. Lint integrates very well with Eclipse and Android Studio.

Facebook recently released an open-source static code analyzer for Android and iOS called Infer. Facebook uses Infer to detect bugs in its Android and iOS apps.

Ruminating on Less and Sass

CSS has been a boon to all web developers and allows for the clear separation of presentation from HTML markup. But CSS comes with its own limitations. For e.g.
  • CSS does not have the ability to declare variables. Hence if you want a color to be used across multiple element types, you have to repeat the color. 
  • CSS does not support nesting of properties. Hence we end up repeating the code again and again. 
To counter these limitations, new languages known as 'CSS-extension' languages have cropped up. These languages support variables, nesting, etc. and make it super-easy to define themes in CSS.

Two of the most popular CSS extension languages are Less and Sass. These languages are compiled into plain CSS before being deployed to production.

Sunday, September 13, 2015

Ruminating on the timelessness of the Agile Manifesto

I had signed the Agile Manifesto a decade back (in 2005) and was amazed to realize how relevant its principal tenets are even today!

It is imperative for any software development project to imbibe the following principles to succeed -
  1. Individuals and interactions over processes and tools
  2. Working software over comprehensive documentation
  3. Customer collaboration over contract negotiation
  4. Responding to change over following a plan

Applying the Start-Stop-Continue paradigm to Sprint Retrospective

We all know the importance of Retrospective meetings after a Sprint. This is an excellent time to reflect on what worked, what did not work and what areas need improvement.

A simple way to conduct a retrospective with the entire team is to follow the Start-Stop-Continue model. You ask each team member to articulate -

  • what according to him/her should we start doing, 
  • what should we stop doing and 
  • what should we continue doing (with some changes if required). 

Then after collecting everyone's views, the team should brainstorm and debate around all the ideas presented and select the top 3 or 5 ideas that they would implement in the next sprint.

Many teams start skipping the retrospective if their project is running smoothly, but it is important to remember that there is always scope for improvement, no matter how well your team is currently functioning.

Wednesday, August 05, 2015

Ruminating on Data Lake

Anyone trying to understand a Data Lake should peruse the wonderful article by Martin Fowler on the topic -

Jotting down important points from the article -

  1. Traditional data warehouses (and data marts) have a fixed schema - it could be a star schema or a snowflake schema. But having a fixed schema imposes many restrictions on data analysis. A Data Lake is essentially schema-less.
  2. Data warehouses also typically cleanse the incoming data and improve the data quality. They also aggregate data for faster reporting. In contrast, a Data Lake stores raw data from source systems. It is up to the data scientist to extract the data and make sense of it.
  3. We still need Data Marts - Because the data in a data lake is raw, you need a lot of skill to make any sense of it. Relatively few people work in the data lake; as they uncover generally useful views of data in the lake, they can create a number of data marts, each of which has a specific model for a single bounded context. A larger number of downstream users can then treat these lake-shore marts as an authoritative source for that context.

Monday, July 27, 2015 - A nifty tool

We used to use browser tools such as Firebug to find out more 'backend' information about a particular site - for e.g. what servers does it run on? What server-side web technology is being used? What web content management tool is being used? etc.

Found a nifty website that gives all this info in the form of a neat table -
A useful tool to have in the arsenal for any web-master. 

Friday, July 24, 2015

Correlation does not imply Causation !

One of the fundamental tenets that any analytics newbie needs to learn is that - Correlation does not imply Causation !

Using statistical techniques, we might find a relationship between two events, but that does not mean that one event causes the other. Jotting down a few amusing examples that I found on the internet.
  • The faster windmills are observed to rotate, the more wind is observed. Therefore, wind is caused by the rotation of windmills.
  • Sleeping with one's shoes on is strongly correlated with waking up with a headache. Therefore, sleeping with one's shoes on causes headache.
  • As ice cream sales increase, the rate of drowning deaths increases sharply. Therefore, ice cream consumption causes drowning.
  • Since the 1950s, both the atmospheric CO2 level and obesity levels have increased sharply. Hence, atmospheric CO2 causes obesity.
  • The more firemen are sent to a fire, the more damage is done.
  • Children who get tutored get worse grades than children who do not get tutored
  • In the early elementary school years, astrological sign is correlated with IQ, but this correlation weakens with age and disappears by adulthood.
  • My dog is more likely to have an accident in the house if it’s very cold out.
A good site showcasing such spurious correlations is here -
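
The ice cream example can be reproduced with a few made-up numbers. In the Python sketch below, every figure is invented purely for illustration: temperature drives both ice cream sales and drownings, so the two end up strongly correlated even though neither causes the other.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented monthly figures; temperature is the hidden common cause.
temperature_c = [18, 22, 26, 30, 34, 38]
ice_cream_sales = [120, 150, 170, 210, 240, 260]
drownings = [1, 2, 2, 4, 5, 5]

sales_vs_drownings = pearson(ice_cream_sales, drownings)
temperature_vs_sales = pearson(temperature_c, ice_cream_sales)
temperature_vs_drownings = pearson(temperature_c, drownings)
```

All three coefficients come out high, which is exactly the trap: the statistic alone cannot tell us that temperature is the common driver.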

Thursday, July 23, 2015

Using the Solver add-in in Excel for finding optimal solutions

Today we learned about a nifty tool in Excel that can be used to find the 'maximizing' or 'optimal' solution to a problem. For e.g. given a set of constraints, should we make cars or trucks?

The below links would give a quick idea on how to use this tool to find optimal solutions and also carry out 'what-if' analysis. You enter the objective, constraint and decision variable cells and let the tool do the magic.
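
For readers who want to see what kind of problem Solver handles, here is a tiny cars-versus-trucks example in Python. All the profit and resource figures are made up, and the brute-force search is only a stand-in for illustration; Solver itself uses proper optimization algorithms rather than trying every combination.

```python
from typing import Tuple

# Decision variables: how many cars and trucks to build.
# All figures below are hypothetical.
PROFIT = {"car": 300, "truck": 500}        # objective: maximize total profit
LABOUR_HOURS = {"car": 10, "truck": 20}    # constraint 1: labour hours per unit
STEEL_TONS = {"car": 2, "truck": 3}        # constraint 2: steel per unit
AVAILABLE_HOURS = 400
AVAILABLE_STEEL = 70


def best_plan() -> Tuple[int, int, int]:
    """Return (profit, cars, trucks) for the most profitable feasible plan."""
    best = (0, 0, 0)
    for cars in range(AVAILABLE_HOURS // LABOUR_HOURS["car"] + 1):
        for trucks in range(AVAILABLE_HOURS // LABOUR_HOURS["truck"] + 1):
            hours = cars * LABOUR_HOURS["car"] + trucks * LABOUR_HOURS["truck"]
            steel = cars * STEEL_TONS["car"] + trucks * STEEL_TONS["truck"]
            if hours <= AVAILABLE_HOURS and steel <= AVAILABLE_STEEL:
                profit = cars * PROFIT["car"] + trucks * PROFIT["truck"]
                best = max(best, (profit, cars, trucks))
    return best
```

With these invented figures the search settles on 20 cars and 10 trucks. Changing a constraint and re-running is the code equivalent of a 'what-if' analysis.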

Wednesday, July 15, 2015

How can large enterprises compete with new-age digital startups?

Chief Executive magazine recently featured an article by Nitin Rakesh on how large enterprises can compete with digital startups. The article is available at the following links:
Retraining Goliath to face digital David

The article advises large enterprises to capitalize on their strengths - i.e.

a) Utilize financial power to acquire digital competitors - How Allstate acquired Esurance..
b) Leverage existing brand equity - How Amex partnered with Walmart to launch Bluebird..
c) Mine existing customer data - Leverage customer insights to deliver highly personalized services.
d) If possible collaborate rather than compete with digital startups.

Thursday, June 25, 2015 - Next Generation Web Crawler

We had used many open source web crawlers in the past, but recently a friend of mine referred me to a cool tool that essentially parses the data on any website and structures it into a table of rows/columns - "Turn web pages into data". This data can be exported as a CSV file and the tool also provides a REST API to extract the data. This kind of higher abstraction over raw web crawling can be extremely useful for developers.

We can use the magic tool for automatic extraction or use their free tool to teach it how to extract data. 

Ruminating on Email marketing software

Recently we were looking for mass email software for a marketing use-case. Jotting down the various online tools/platforms that we are currently evaluating.

  1. Mailjet - Has a free plan for 200 emails/day
  2. MailChimp - Has a free plan for 12000 emails/month
  3. Campaign Monitor
  4. Active Campaign 
  5. Salesforce Marketing Cloud 

APIs in Fleet Management

Fleet Management software is used by fleet owners to manage their moving assets. The software enables them to have a centralized data-store of their vehicle and driver information and also maintain maintenance logs (service and repair tracking).

The software also allows us to schedule preventive maintenance activities, monitor fuel efficiency, maintain fuel card records, calculate metrics such as "cost per mile" etc. You can also setup reminders for certification renewals and license expiration.

It was interesting to see Fleetio (a web-based fleet management company) roll out an API platform for their fleet management software. Their vision is to become a digital hub for all fleet related stuff and turn their software product into a platform that can be leveraged by partners to create a digital ecosystem.

The API would allow customers to seamlessly integrate data in Fleetio with their operational systems in real time. For e.g. pulling work orders from your fleet management system and pushing them to your accounting software in real time, pushing mileage updates from a bespoke remote application to your fleet management software, integrating driver records with Payroll systems, etc. All the tedious importing and exporting of data is gone !

TomTom also has a web-based fleet management platform called WEBFLEET that provides an API (Webfleet.connect) for integration. The Fleetlynx platform also has an API to integrate with Payroll and Maintenance systems.

Saturday, June 20, 2015

Ruminating on bimodal IT

Over the past couple of years, Gartner has been evangelizing the concept of bimodal IT to organizations for succeeding in the digital age. A good note by Gartner on the concept is available here.

Mode 1, which refers to the traditional "run the business" model, focuses on stability and reliability.
Mode 2, which typically covers "change the business" initiatives, focuses on speed, agility, flexibility and the ability to operate under conditions of uncertainty.

Bimodal IT would also need resources with different skills. As an analogy, Mode 1 IT resources would be the marathon runners, whereas Mode 2 IT resources need to be like sprinters. It would be difficult for an IT resource to be both. There is a risk that he might end up as a mid-distance runner... and today's IT does not need mid-distance runners.

Tuesday, June 16, 2015

Ruminating on Section 508 Accessibility standards

In the UX world, you would often come across phrases such as "compliance with Section 508". So what exactly is Section 508 and how does it relate to User Experience?

"Section 508" is actually an amendment to the Rehabilitation Act of 1973 and was signed into law in 1998. This law mandates that all IT assets developed by or purchased by Federal agencies be accessible to people with disabilities. The law lays down web guidelines that should be followed while designing and developing websites.

It is important to note that Section 508 does not directly apply to private sector web sites or to public sites which are not U.S. Federal agency sites. But there are other forces at play that may compel an organization to make its websites accessible. The ADA (Americans with Disabilities Act), passed way back in 1990, prohibits organizations from discriminating on the basis of disability.
The following link reveals examples of lawsuits filed for violation of the ADA -

Beyond the legal regulations, there are also open initiatives aimed at improving the accessibility of websites. W3C has an initiative named "Web Accessibility Initiative (WAI)" that lays down standards and guidelines for accessibility. There is also a standard for content authoring called - "Web Content Accessibility Guidelines (WCAG)".

The following sites provide good reading material on Accessibility -

Jotting down the high level guidelines that should be followed for accessibility.

  1. A text equivalent for every non-text element shall be provided (e.g., via "alt", "longdesc", or in element content).
  2. Equivalent alternatives for any multimedia presentation shall be synchronized with the presentation. For e.g.  synchronized captions.
  3. Web pages shall be designed so that all information conveyed with color is also available without color, for example from context or markup. Color is not used solely to convey important information. Ensure that foreground and background color combinations provide sufficient contrast when viewed by someone having color deficits or when viewed on a black and white screen. 
  4. Documents shall be organized so they are readable without requiring an associated style sheet. If style-sheets are turned off, the document should still be readable. 
  5. Client-side image maps are used instead of server-side image maps. Appropriate alternative text is provided for the image as well as each hot spot area.
  6. Data tables have column and/or row headers appropriately identified (using the <th> element).
  7. Pages shall be designed to avoid causing the screen to flicker with a frequency greater than 2 Hz and lower than 55 Hz. No element on the page flashes at a rate of 2 to 55 cycles per second, thus reducing the risk of optically-induced seizures.
  8. When electronic forms are designed to be completed on-line, the form shall allow people using assistive technology to access the information, field elements, and functionality required for completion and submission of the form, including all directions and cues.
  9. When a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required.

Friday, June 12, 2015

Implementing sliding window aggregations in Apache Storm

My team was working on implementing CEP (Complex Event Processing) capabilities using Apache Storm. We evaluated multiple options for doing so - one option was using a lightweight in-process CEP engine like Esper within a Storm Bolt.

But there was another option of manually implementing CEP-like aggregations (over a sliding window) using Java code. The following links show us how to do so.

Rolling Count Bolt on Github
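
To give a feel for what such a hand-rolled aggregation does, here is a small sliding-window counter in Python. It is not the Storm bolt referred to above (which is written in Java); the `SlidingWindowCounter` class is a hypothetical, single-process sketch that assumes events arrive in timestamp order.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple


class SlidingWindowCounter:
    """Counts events per key over the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, str]] = deque()  # (timestamp, key), oldest first
        self._counts: Counter = Counter()

    def add(self, key: str, timestamp: float) -> None:
        self._events.append((timestamp, key))
        self._counts[key] += 1
        self._expire(timestamp)

    def counts(self, now: float) -> Dict[str, int]:
        self._expire(now)
        return dict(self._counts)

    def _expire(self, now: float) -> None:
        # Drop events that have slid out of the window and undo their counts.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]
```

In a Storm topology, logic of this kind would live inside a bolt, with the window advanced on a timer rather than on each query.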

While the above code would help in satisfying certain scenarios, it would not provide the flexibility of a CEP engine. We need to understand that CEP engines (such as Tibco BE, Esper and StreamInsight) are fundamentally different from Apache Storm, which is more of a highly distributed stream computing platform.

A CEP engine would provide you with SQL-like declarative queries and OOTB high-level operators like time windows, temporal patterns, etc. This brings down the complexity of writing temporal queries and aggregates. CEP engines can also detect patterns in events. But most CEP engines do not support a distributed architecture.

Hence it makes sense to combine CEP with Apache Storm - for e.g. embedding Esper within a Storm Bolt. The following links would serve as good reference -

Monday, June 01, 2015

Ruminating on Shipping Containers and Docker

Today during one of the lectures at IIMB, I was introduced to a book called 'The Box' by Marc Levinson.

The book narrates the story of how the invention of the shipping container completely changed the face of global commerce. A snippet from the book -

"the cost of transporting goods was decisive in determining what products they would make, where they would manufacture and sell them, and whether importing or exporting was worthwhile. Shipping containers didn't just cut costs but rather changed the whole economic landscape. It changed the global consumption patterns, revitalizing industries in decay, and even allowing new industries to take shape."

A nice video explaining the same is available on YouTube -

A similar revolution is happening in the IT landscape by means of a new software container concept called Docker. In fact, the logo of Docker contains an image of shipping containers :)

Docker provides an additional layer of abstraction (through a docker engine, a.k.a docker server) that can run a docker container containing any payload. This has made it really easy to package and deploy applications from one environment to the other.

A Docker container encapsulates all the code and its dependencies required to run an application. Containers are quite different from virtual machines. A hypervisor running on a 'Host OS' essentially loads the entire 'Guest OS' and then runs the apps on top of it. In Docker architecture, you have a Docker engine (a.k.a Docker server) running on the Host OS. Each Docker server can host many Docker containers. Docker clients can remotely talk with Docker servers using a REST API to start/stop containers, patch them with new versions of the app, etc.

A good article describing the differences between them is available here -


All docker containers are isolated from each other using the Linux Kernel process isolation features.

In fact, it is these OS-level virtualization features of Linux that have enabled Docker to become so successful.

Other OS such as Windows or MacOS do not have such features as part of their core kernel to support Docker. Hence the current way to run Docker on them is to create a light-weight Linux VM (boot2docker) and run docker within it. A good article explaining how to run Docker on MacOS is here -

Docker was so successful that even Microsoft was forced to admit that it was a force to be reckoned with !
Microsoft is now working with Docker to enable native support for docker containers in its new Nano server operating system -

This IMHO, is going to be a big game-changer for MS and would catapult the server OS as a strong contender for Cloud infrastructure. 

Ruminating on bare metal cloud environments

Virtualization has been the underpinning technology that powered the Cloud revolution. In a typical virtualized environment, you have the hypervisor (virtualization software) running on the Host OS. This type of hypervisor is called a "Type 2 hypervisor".

But there are hypervisors that can be installed directly on the hardware (i.e. on bare metal). These hypervisors, known as "Type 1 hypervisors", do not need a host OS to run and have their own device drivers and other software to interact with the hardware components directly. A major advantage of this is that any problems in one virtual machine do not affect the other guest operating systems running on the hypervisor.

The below image from Wikipedia gives a good illustration.

Thursday, May 14, 2015

Ruminating on Apple HealthKit backup

While my team was working on the Apple HealthKit iOS APIs, we came to know a few interesting things that many folks are not aware of. Jotting down our findings -
  • HealthKit data is only locally stored on the user's device
  • HealthKit data is not automatically synced to iCloud - even if you have enabled iCloud syncing for all apps.
  • HealthKit data is not backed up as part of normal device backup in iTunes. So if you restore your device, all HealthKit data would be lost !
  • HealthKit is not available on iPads. 
The only way we can take a backup of HealthKit data is to enable "encrypted backup" in iTunes. If this option is selected in iTunes, then your HealthKit data would get backed up.

Another interesting point from a developer's perspective is that the HealthKit store is encrypted on the phone and is accessible by authorized apps only when the device is unlocked. If the device is locked, no authorized app can access the data during that time. But apps can continue sending data via the iOS APIs. 

Thursday, February 05, 2015

Comparing two columns in Excel to find duplicates

Quite often, you have to compare two columns in Excel to find duplicates or 'missing' rows. Though there are many ways to do this, the following MS article gives a simple solution.

Depreciation of fixed assets in accounting

Would like to recommend the following site that gives a very simple explanation of the concept of depreciation in accounting. Worth a perusal for beginners.

Monday, January 26, 2015

Patient Engagement Framework for Healthcare providers

HIMSS (Healthcare Information and Management Systems Society) has published a good framework for engaging patients so as to improve health outcomes.

Patients want to be engaged in their healthcare decision-making process, and those who are engaged as decision-makers in their care tend to be healthier and have better outcomes. The whole idea is to treat patients not just as customers, but as partners in their journey towards wellness.

The following link provides a good reference for designing technology building blocks for improving patient experience.

Inform Me --- Engage Me --- Empower Me --- Partner with me

Ruminating on Open Graph Protocol

Ever wondered how some links on Facebook are shown with an image and a brief paragraph? I dug deeper to understand what Facebook was doing behind the scenes to visualize the link.

To my surprise, there was something called the "Open Graph Protocol" that defines a set of rules for telling Facebook how your shared content should be displayed on it.

For e.g. we can add the following meta-tags to any web page and Facebook would parse these tags when you post the link to this page.

  • <meta property="og:title" content=""/>
  • <meta property="og:type" content=""/>
  • <meta property="og:url" content=""/>
  • <meta property="og:image" content=""/>
  • <meta property="fb:admins" content=""/>
  • <meta property="og:site_name" content=""/>
  • <meta property="og:description" content=""/>

More information can be found at this link -
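
To see how a consumer of these tags might read them, here is a small Python sketch using only the standard library. It is not Facebook's actual crawler; the `OpenGraphParser` class and the sample page are hypothetical and simply collect every meta tag whose property starts with "og:" or "fb:".

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class OpenGraphParser(HTMLParser):
    """Collects og:* and fb:* meta properties from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Self-closing tags such as <meta .../> are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith(("og:", "fb:")):
            self.properties[name] = attributes.get("content") or ""


def extract_open_graph(html: str) -> Dict[str, str]:
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# A made-up page used only to exercise the parser.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Example article"/>
<meta property="og:type" content="article"/>
<meta name="viewport" content="width=device-width"/>
</head><body></body></html>
"""

sample_tags = extract_open_graph(SAMPLE_PAGE)
```

The title and type are picked up, while the ordinary viewport meta tag is ignored because it has no Open Graph property.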

Router blocking HTTPS traffic?

Recently I got a new cloud router for my broadband connection. Though the speed was very good, I was facing intermittent problems in accessing HTTPS sites. For e.g. webmail would hang sometimes, payment gateway pages would not load, the Amazon app would not load screens, etc.

At first, I was not sure whether the router was to blame or the internet connection itself. A quick Google search revealed that this is a common problem faced by many routers and has to do with the MTU (Maximum Transmission Unit) size limit. I was surprised that the MTU size would affect HTTPS, which is an application-level protocol.

The following links show an easy method to find out the correct MTU size for your network using the ping command. For e.g. ping -f -l 1472
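
The number 1472 in that ping command is not arbitrary. On the standard assumption of a 20-byte IPv4 header (no options) and an 8-byte ICMP header, a 1472-byte payload adds up to the common 1500-byte Ethernet MTU. The small Python sketch below just does this arithmetic; the function names are made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def mtu_from_ping_payload(largest_unfragmented_payload: int) -> int:
    """MTU implied by the largest ping payload that gets through unfragmented."""
    return largest_unfragmented_payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ping payload that fits in one packet for a given MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
```

So if the largest payload that gets through without fragmentation is, say, 1464 bytes, the MTU to configure on the router would be 1492.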