Thursday, June 25, 2015

Next Generation Web Crawler

We have used many open source web crawlers in the past, but recently a friend of mine referred me to a cool tool that essentially parses the data on any website and structures it into a table of rows and columns - "Turn web pages into data". This data can be exported as a CSV file, and the tool also provides a REST API to extract the data. This kind of higher abstraction over raw web crawling can be extremely useful for developers.

We can use the magic tool for automatic extraction or use their free tool to teach it how to extract data. 

Ruminating on Email marketing software

Recently we were looking for mass email software for a marketing use-case. Jotting down the various online tools/platforms that we are currently evaluating.

  1. Mailjet - Has a free plan for 200 emails/day
  2. MailChimp - Has a free plan for 12000 emails/month
  3. Campaign Monitor
  4. Active Campaign 
  5. Salesforce Marketing Cloud 

APIs in Fleet Management

Fleet Management software is used by fleet owners to manage their moving assets. The software enables them to have a centralized data-store of their vehicle and driver information and also maintain maintenance logs (service and repair tracking).

The software also allows us to schedule preventive maintenance activities, monitor fuel efficiency, maintain fuel card records, and calculate metrics such as "cost per mile". You can also set up reminders for certification renewals and license expiration.
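To make metrics such as "cost per mile" concrete, here is a minimal Python sketch. The `VehicleLog` record and its field names are hypothetical and not taken from any particular fleet management product, and the formula (fuel plus maintenance cost divided by miles driven) is only one simple way to define the metric.

```python
from dataclasses import dataclass


@dataclass
class VehicleLog:
    """Hypothetical per-vehicle totals for one reporting period."""
    miles_driven: float
    fuel_gallons: float
    fuel_cost: float
    maintenance_cost: float


def cost_per_mile(log: VehicleLog) -> float:
    """Operating cost (fuel + maintenance) divided by miles driven."""
    if log.miles_driven <= 0:
        raise ValueError("miles_driven must be positive")
    return (log.fuel_cost + log.maintenance_cost) / log.miles_driven


def miles_per_gallon(log: VehicleLog) -> float:
    """Fuel efficiency over the reporting period."""
    if log.fuel_gallons <= 0:
        raise ValueError("fuel_gallons must be positive")
    return log.miles_driven / log.fuel_gallons


if __name__ == "__main__":
    june = VehicleLog(miles_driven=1000, fuel_gallons=50,
                      fuel_cost=150, maintenance_cost=100)
    print(cost_per_mile(june), miles_per_gallon(june))
```

A real product would likely fold in more cost categories (insurance, depreciation, tolls), so treat the formula above as an assumption.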

It was interesting to see Fleetio (a web-based fleet management company) roll out an API platform for their fleet management software. Their vision is to become a digital hub for everything fleet related and turn their software product into a platform that partners can leverage to create a digital ecosystem.

The API would allow customers to seamlessly integrate data in Fleetio with their operational systems in real time. For example, you could pull work orders from your fleet management system and push them to your accounting software in real time, push mileage updates from a bespoke remote application to your fleet management software, or integrate driver records with payroll systems. All the tedious importing and exporting of data is gone!

TomTom also has a web-based fleet management platform called WEBFLEET that provides an API (Webfleet.connect) for integration. The Fleetlynx platform also has an API to integrate with payroll and maintenance systems.

Saturday, June 20, 2015

Ruminating on bimodal IT

Over the past couple of years, Gartner has been evangelizing the concept of bimodal IT to organizations for succeeding in the digital age. A good note by Gartner on the concept is available here.

Mode 1, which refers to the traditional "run the business" model, focuses on stability and reliability.
Mode 2, which typically covers "change the business" initiatives, focuses on speed, agility, flexibility and the ability to operate under conditions of uncertainty.

Bimodal IT would also need resources with different skills. As an analogy, Mode 1 IT resources would be the marathon runners, whereas Mode 2 IT resources need to be like sprinters. It would be difficult for an IT resource to be both. There is a risk that such a person might end up as a mid-distance runner, and today's IT does not need mid-distance runners.

Tuesday, June 16, 2015

Ruminating on Section 508 Accessibility standards

In the UX world, you would often come across phrases such as "compliance with Section 508". So what exactly is Section 508 and how does it relate to User Experience?

"Section 508" is actually an amendment to the Rehabilitation Act of 1973 and was signed into law in 1998. This law mandates that all IT assets developed by or purchased by federal agencies be accessible to people with disabilities. The law includes web guidelines that should be followed while designing and developing websites.

It is important to note that Section 508 does not directly apply to private sector websites or to public sites which are not U.S. federal agency sites. But there are other forces at play that may force an organization to make its websites accessible. The ADA (Americans with Disabilities Act), which was passed way back in 1990, prohibits discrimination on the basis of disability.
The following link reveals examples of lawsuits filed for violation of the ADA -

Beyond the legal regulations, there are also open initiatives aimed at improving the accessibility of websites. W3C has an initiative named "Web Accessibility Initiative (WAI)" that lays down standards and guidelines for accessibility. There is also a standard for content authoring called "Web Content Accessibility Guidelines (WCAG)".

The following sites provide good reading material on Accessibility -

Jotting down the high level guidelines that should be followed for accessibility.

  1. A text equivalent for every non-text element shall be provided (e.g., via "alt", "longdesc", or in element content).
  2. Equivalent alternatives for any multimedia presentation shall be synchronized with the presentation, for example, synchronized captions.
  3. Web pages shall be designed so that all information conveyed with color is also available without color, for example from context or markup. Color is not used solely to convey important information. Ensure that foreground and background color combinations provide sufficient contrast when viewed by someone having color deficits or when viewed on a black and white screen. 
  4. Documents shall be organized so they are readable without requiring an associated style sheet. If style-sheets are turned off, the document should still be readable. 
  5. Client-side image maps are used instead of server-side image maps. Appropriate alternative text is provided for the image as well as each hot spot area.
  6. Data tables have column and/or row headers appropriately identified (using the <th> element).
  7. Pages shall be designed to avoid causing the screen to flicker with a frequency greater than 2 Hz and lower than 55 Hz. No element on the page flashes at a rate of 2 to 55 cycles per second, thus reducing the risk of optically-induced seizures.
  8. When electronic forms are designed to be completed on-line, the form shall allow people using assistive technology to access the information, field elements, and functionality required for completion and submission of the form, including all directions and cues.
  9. When a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required.
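Two of the guidelines above (text equivalents for images and image map areas, and header cells for data tables) lend themselves to a simple automated check. Below is a minimal sketch using only Python's standard `html.parser` module. The class and function names are my own, and the check is a rough heuristic rather than a full Section 508 or WCAG audit: it cannot judge whether the alt text is meaningful, and it will also flag layout tables that legitimately have no headers.

```python
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Flags <img>/<area> tags without alt and tables without any <th>."""

    def __init__(self):
        super().__init__()
        self.issues = []
        # One flag per currently open <table>; set to True once a <th> is seen.
        self._tables = []

    def handle_starttag(self, tag, attrs):
        attr_names = {name for name, _ in attrs}
        if tag in ("img", "area") and "alt" not in attr_names:
            line, _ = self.getpos()
            self.issues.append(f"line {line}: <{tag}> has no alt attribute")
        elif tag == "table":
            self._tables.append(False)
        elif tag == "th" and self._tables:
            self._tables[-1] = True

    def handle_endtag(self, tag):
        if tag == "table" and self._tables:
            has_header = self._tables.pop()
            if not has_header:
                line, _ = self.getpos()
                self.issues.append(f"line {line}: <table> has no <th> header cells")


def check_html(html: str) -> list:
    """Return a list of human-readable accessibility issues found in html."""
    checker = AccessibilityChecker()
    checker.feed(html)
    checker.close()
    return checker.issues


if __name__ == "__main__":
    sample = '<img src="logo.png"><table><tr><td>42</td></tr></table>'
    for issue in check_html(sample):
        print(issue)
```

An empty `alt=""` is deliberately not flagged, since that is the usual way to mark a purely decorative image.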

Friday, June 12, 2015

Implementing sliding window aggregations in Apache Storm

My team was working on implementing CEP (Complex Event Processing) capabilities using Apache Storm. We evaluated multiple options for doing so - one option was using a lightweight in-process CEP engine like Esper within a Storm Bolt.

But there was another option of manually implementing CEP-like aggregations (over a sliding window) using Java code. The following links show us how to do so.

Rolling Count Bolt on Github
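To illustrate the idea of a hand-rolled sliding window aggregation, here is a minimal sketch in plain Python rather than Storm's Java API. It is not a translation of the linked code, and the class and method names are my own. The window is divided into a fixed number of slots; counts go into the newest slot, and each call to `advance()` reports the totals over the whole window and then drops the oldest slot.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts keys over a sliding window made of a fixed number of slots."""

    def __init__(self, num_slots: int):
        if num_slots < 1:
            raise ValueError("num_slots must be at least 1")
        # deque with maxlen discards the oldest slot automatically on append.
        self._slots = deque([defaultdict(int)], maxlen=num_slots)

    def increment(self, key) -> None:
        """Record one occurrence of key in the current (newest) slot."""
        self._slots[-1][key] += 1

    def totals(self) -> dict:
        """Sum the counts of every key across all slots in the window."""
        result = defaultdict(int)
        for slot in self._slots:
            for key, count in slot.items():
                result[key] += count
        return dict(result)

    def advance(self) -> dict:
        """Return the window totals, then slide the window by one slot."""
        snapshot = self.totals()
        self._slots.append(defaultdict(int))
        return snapshot


if __name__ == "__main__":
    counter = SlidingWindowCounter(num_slots=2)
    counter.increment("storm")
    counter.increment("storm")
    print(counter.advance())
    counter.increment("esper")
    print(counter.advance())
```

In a Storm topology, I would expect `increment()` to be called for each incoming tuple and `advance()` to be driven by a periodic timer such as a tick tuple, with the returned totals emitted downstream. That wiring is an assumption and is left out of the sketch.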

While the above code would help in satisfying certain scenarios, it would not provide the flexibility of a CEP engine. We need to understand that CEP engines (like TIBCO BE, Esper, StreamInsight) are fundamentally different from Apache Storm, which is more of a highly distributed stream computing platform.

A CEP engine would provide you with SQL-like declarative queries and out-of-the-box high level operators like time windows, temporal patterns, etc. This brings down the complexity of writing temporal queries and aggregates. CEP engines can also detect patterns in events. But most CEP engines do not support a distributed architecture.

Hence it makes sense to combine CEP with Apache Storm, for example by embedding Esper within a Storm Bolt. The following links would serve as good reference -

Monday, June 01, 2015

Ruminating on Shipping Containers and Docker

Today during one of the lectures at IIMB, I was introduced to a book called 'The Box' by Marc Levinson.

The book narrates the story of how the invention of the shipping container completely changed the face of global commerce. A snippet from the book -

"the cost of transporting goods was decisive in determining what products they would make, where they would manufacture and sell them, and whether importing or exporting was worthwhile. Shipping containers didn't just cut costs but rather changed the whole economic landscape. It changed the global consumption patterns, revitalizing industries in decay, and even allowing new industries to take shape."

A nice video explaining the same is available on YouTube -

A similar revolution is happening in the IT landscape by means of a new software container concept called Docker. In fact, the logo of Docker contains an image of shipping containers :)

Docker provides an additional layer of abstraction (through a Docker engine, a.k.a. Docker server) that can run a Docker container containing any payload. This has made it really easy to package applications and deploy them from one environment to another.

A Docker container encapsulates all the code and dependencies required to run an application. Containers are quite different from traditional virtualization technology. A hypervisor running on a 'Host OS' essentially loads an entire 'Guest OS' and then runs the apps on top of it. In the Docker architecture, you have a Docker engine (a.k.a. Docker server) running on the Host OS. Each Docker server can host many Docker containers. Docker clients can remotely talk with Docker servers using a REST API to start/stop containers, patch them with new versions of the app, etc.

A good article describing the differences between them is available here -


All Docker containers are isolated from each other using the Linux kernel's process isolation features.

In fact, it is these OS-level virtualization features of Linux that have enabled Docker to become so successful.

Other operating systems such as Windows or MacOS do not have such features as part of their core kernel to support Docker. Hence the current way to run Docker on them is to create a light-weight Linux VM (boot2docker) and run Docker within it. A good article explaining how to run Docker on MacOS is here -

Docker was so successful that even Microsoft was forced to admit that it was a force to be reckoned with!
Microsoft is now working with Docker to enable native support for Docker containers in its new Nano Server operating system -

This, IMHO, is going to be a big game-changer for MS and would make the server OS a strong contender for Cloud infrastructure.

Ruminating on bare metal cloud environments

Virtualization has been the underpinning technology that powered the Cloud revolution. In a typical virtualized environment, you have the hypervisor (virtualization software) running on the Host OS. These types of hypervisors are called "Type 2 hypervisors".

But there are hypervisors that can be installed directly on the hardware (i.e. hard disk). These hypervisors, known as "Type 1 hypervisors", do not need a host OS to run and have their own device drivers and other software to interact with the hardware components directly. A major advantage of this is that any problems in one virtual machine do not affect the other guest operating systems running on the hypervisor.

The below image from Wikipedia gives a good illustration.