Wednesday, August 17, 2016

Load Testing Tools for IoT platforms over MQTT

While IoT platforms can be tested using traditional load testing tools if standard protocols such as HTTP are used, there is a suite of other tools you can use if you need to test the MQTT throughput capacity of your IoT platform.

Jotting down a list of tools that can be used to test the MQTT broker of IoT platforms.

Thursday, August 11, 2016

Ruminating on MQTT

MQTT is a lightweight messaging protocol over TCP/IP that supports the publish-subscribe paradigm. It is most suited for low bandwidth / high latency and unreliable networks and hence is a natural fit for field IoT devices.  A good MQTT primer is available here.

MQTT has been the de-facto protocol for all our IoT applications, but of late we have started experimenting with MQTT even for mobile apps, after we learned that the Facebook Messenger app uses MQTT :)

MQTT sessions can survive across TCP re-connects and are thus very useful in unreliable network conditions. Also, in MQTT you can specify the QoS level - e.g.
  • Fire and forget (QoS 0)
  • At least once (QoS 1)
  • Exactly once (QoS 2)
It is very important to check whether the MQTT broker you choose supports the required QoS levels. 
MQTT supports a hierarchy of topics, so a subscriber can subscribe to a top-level topic and receive all the messages published under it. 
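As a quick illustration of these concepts, the sketch below uses the Python paho-mqtt client (1.x API) to publish and subscribe with QoS 1 over a topic hierarchy and a persistent session; the broker hostname and topic names are placeholders.

  import paho.mqtt.client as mqtt

  def on_connect(client, userdata, flags, rc):
      # Subscribe to the whole 'sensors' hierarchy with QoS 1 (at least once)
      client.subscribe("sensors/#", qos=1)

  def on_message(client, userdata, msg):
      print(msg.topic, msg.payload, "qos:", msg.qos)

  # clean_session=False asks the broker to keep the session across re-connects
  client = mqtt.Client(client_id="demo-client", clean_session=False)
  client.on_connect = on_connect
  client.on_message = on_message
  client.connect("broker.example.com", 1883)
  client.publish("sensors/floor1/temperature", "23.5", qos=1)
  client.loop_forever()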

Most of the popular open source message brokers such as ActiveMQ, RabbitMQ and HiveMQ already support MQTT. A good comparison of the various MQTT brokers is available here - https://github.com/mqtt/mqtt.github.io/wiki/server-support

A performance benchmark of MQTT brokers is available here - http://www.scalagent.com/IMG/pdf/Benchmark_MQTT_servers-v1-1.pdf

Sunday, July 24, 2016

Cool illustrated guide to Kubernetes

If you want to understand the magic of Kubernetes and how it can be used to manage Docker containers at a high level, then the following illustrated guide is awesome :)

https://deis.com/blog/2016/kubernetes-illustrated-guide/

Cool DevOps Tools - Gerrit and Let's Chat

I would recommend that all teams leverage the following open source tools to add more juice to their DevOps operations and improve team collaboration.

Gerrit - A valuable web-based code review tool that comes with Git embedded. Can be very useful to help your junior team-mates learn about good coding practices and refactoring. A good introduction video is here - https://www.youtube.com/watch?v=Wxx8XndqZ7A

Let's Chat - Digital Natives don't like to write long emails and abhor email chains. Use this on-premise hosted web-based chat server to create discussion rooms and share knowledge and get questions answered.

Scaling Node-RED horizontally for high volume IoT event processing

We were pretty impressed with the ease of visual programming in Node-RED. Our productivity in prototyping actually increased by 40-50% using Node-RED. We used Node-RED both on the gateways as well as the server for sensor event processing.

But we were not sure if Node-RED could be used to ingest and process a large volume of events - i.e. thousands of events/sec. I posted the question on the Node-RED Google Groups forum and got some interesting answers. Jotting down the various options below.

  1. If your input is over HTTP, then you can use any of the standard load-balancing techniques to load balance requests over a cluster of nodes running the same Node-RED flow - e.g. HAProxy, Nginx, etc. It is important to note that since we are running the same flow on many nodes, we cannot store any state in context variables. We have to store state in an external service such as Redis. 
  2. If you are ingesting over MQTT, then you have multiple options:
    • Option A: Let each flow listen to a different topic. You can have different gateways publish to different topics on the MQTT broker - e.g. flow instance 1 subscribes to device/a/#, flow instance 2 subscribes to device/b/#, and so on.
    • Option B: Some MQTT brokers (e.g. HiveMQ) support the concept of a 'Shared Subscription', which is equivalent to point-to-point messaging - i.e. only one consumer in a subscription group gets each message, and the broker load-balances across the consumers using round-robin. A good explanation of how to enable this using HiveMQ is given here - http://www.hivemq.com/blog/mqtt-client-load-balancing-with-shared-subscriptions/. The good thing about the HiveMQ support for load-balancing consumers is that there is no change required in the consumer code. You can continue using any MQTT consumer - only the topic URL would change :)
    • Option C: You put a simple Node-RED flow in front for message ingestion that reads the payload and makes an HTTP request to a cluster of load-balanced Node-RED flows (similar to option 1).
    • Option D: This is an extension of Option C and entails creating a buffer between message ingestion and message processing using Apache Kafka. We ingest the messages from devices over MQTT, extract the payload and post it on a Kafka topic. Kafka can support a message-queue paradigm using the concept of consumer groups. Thus we can have multiple Node-RED flow instances subscribing to the Kafka topic using the same consumer group (see the sketch after this list). This option also makes sense if your message broker does not support load-balancing consumers. 
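To make Option D concrete, here is a rough sketch (not production code) of an MQTT-to-Kafka bridge plus a consumer-group worker, using the paho-mqtt and kafka-python libraries; the broker addresses, topic names and group id are my own assumptions.

  import paho.mqtt.client as mqtt
  from kafka import KafkaProducer, KafkaConsumer

  # --- Ingestion bridge: extract the MQTT payload and post it onto a Kafka topic ---
  producer = KafkaProducer(bootstrap_servers="kafka1:9092")

  def on_message(client, userdata, msg):
      producer.send("iot-events", value=msg.payload)

  bridge = mqtt.Client(client_id="mqtt-kafka-bridge")
  bridge.on_message = on_message
  bridge.connect("broker.example.com", 1883)
  bridge.subscribe("device/#", qos=1)
  # bridge.loop_forever()   # run this in the ingestion process

  # --- Worker: every instance joins the same consumer group, so Kafka
  # load-balances the topic partitions across workers (point-to-point semantics)
  consumer = KafkaConsumer("iot-events",
                           bootstrap_servers="kafka1:9092",
                           group_id="event-processors")
  for record in consumer:
      print("processing", record.value)   # stand-in for the actual flow logic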

Thus, leveraging the above options we can scale Node-RED horizontally to handle a huge volume of events. 

Wednesday, July 20, 2016

Extending SonarQube with custom rules

SonarQube has today become our de-facto standard for code analysis. We also use it for our migration projects, where we define custom rules to check if the current application can be ported to the new technology stack.

The links below give a good overview of writing custom rules in SonarQube for Java, .NET and JS.

1. Custom Rules in Java
2. Custom Rules in .NET - using the Roslyn analyzer.
3. Custom Rules in JavaScript 

By leveraging the code templates and SDKs provided by these tools, it is easy to create new custom rules. Behind the scenes, the analysers first create a syntax tree of the code, and then for each rule a visitor design pattern is applied to run through all the nodes and apply the check/business logic.
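SonarQube's rule APIs are language-specific, but purely to illustrate the visitor idea, the toy rule below walks a Python syntax tree using the standard-library ast module and flags a pattern; the rule itself is made up.

  import ast

  class NoPrintRule(ast.NodeVisitor):
      """Toy rule: flag calls to print(), visiting every node of the syntax tree."""
      def visit_Call(self, node):
          if isinstance(node.func, ast.Name) and node.func.id == "print":
              print("Rule violation at line %d: avoid print() in production code" % node.lineno)
          self.generic_visit(node)   # keep walking the rest of the tree

  source = "def handler(x):\n    print(x)\n    return x * 2\n"
  NoPrintRule().visit(ast.parse(source))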

After doing the analysis, it is also possible to auto-remediate / refactor the source code using predefined rules. The following open source tools can be used for auto-remediation.

http://autorefactor.org/html/samples.html
http://walkmod.com/
https://github.com/facebook/pfff

Friday, July 01, 2016

Ruminating on serverless execution environments

For the past few months, I have been closely watching the serverless execution trends in the industry. I find the whole concept of writing serverless code on the cloud extremely exciting and a great paradigm shift.

Especially for mobile and IoT apps, I think the serverless execution environments below hold great promise. Developers don't have to worry about provisioning servers and horizontal scaling of their apps - everything is seamlessly handled by the cloud. And you only pay when your code is invoked!
  1. IBM OpenWhisk - http://www.ibm.com/cloud-computing/bluemix/openwhisk/
  2. Azure Functions - https://azure.microsoft.com/en-in/services/functions/
  3. Google Cloud Functions - https://cloud.google.com/functions/docs/
  4. AWS Lambda - https://aws.amazon.com/lambda/
It is also interesting to note that IBM has made the OpenWhisk platform open source under the Apache 2 license. The entire source code is available here - https://github.com/openwhisk/openwhisk
A good article explaining the underlying components of OpenWhisk is available here
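As a flavour of how little code is involved, below is a minimal Python action in the style OpenWhisk expects - a main() function that receives the request parameters as a dict and returns a dict as the JSON response. The action name and parameter are my own examples.

  # hello.py
  def main(params):
      name = params.get("name", "world")
      return {"greeting": "Hello " + name}

The action can then be deployed and invoked with the wsk CLI - e.g. 'wsk action create hello hello.py' followed by 'wsk action invoke hello --result --param name IoT'.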

Design Patterns for Legacy Migration and Digital Modernization

While designing the approach for any legacy migration, the following design patterns crafted by Martin Fowler can be very helpful.

Instead of a rip-and-replace approach to legacy modernization, the gist of the above patterns is to slowly build a new system around the edges of the old system. To do this, we leverage event-driven architecture paradigms to capture inbound events to the old system and route these events to the new system. This is done incrementally till we can retire the old system. 

Having been in the architecture field for over a decade, I have realized that 'current state' and 'future state' architectures are just temporal states of reality!

It's impossible to predict the future; we can only be prepared for it by designing our systems to be modular and highly flexible to change. Build an architecture that can evolve with time and is future-ready, rather than trying to be future-proof. 

Another humble realization is that the code we are writing today is nothing but the legacy code of tomorrow :)
And in today's fast-paced world, systems become 'legacy' within a short period of time. Legacy need not just mean a 50-year-old mainframe program. Even a monolithic Java application can be considered legacy. Gartner now defines legacy as any system that is not sufficiently flexible to meet the changing demands of the business. 

Thursday, June 23, 2016

Business benefits of Telematics

Telematics can provide tremendous value to OEMs and Tier-1 vendors in improving the quality of their products and also delivering a superior customer experience.

Jotting down my thoughts on the business value of implementing telematics.

1. Predictive Maintenance - Every organization wants to reduce the maintenance costs associated with sudden unexpected failure of components in a vehicle. The downtime that occurs due to a failure can result in huge losses for all parties involved. Thus, it is imperative to make the paradigm shift from reactive maintenance to preventive/predictive maintenance.
Telematics would provide organizations with the ability to discover issues before they cause downtime and take the appropriate proactive steps to reduce costs. Various machine learning techniques are used to identify patterns.

2. Improve Product Quality - The insights gathered from telematics can be used as a feedback loop for product development teams. It would also help the management prioritize R&D investments in the appropriate areas.

3. Optimize Warranty Costs - Telematics data can provide visibility into anticipated component/part recalls and help forecast warranty reserves accordingly. Using the power of analytics on telematics data, we can also identify suspicious warranty claims and effectively structure future warranty contracts.

   

Friday, June 10, 2016

Ruminating on LoRa technology

LoRa (Long Range) is a wireless technology developed for the Internet of Things. It enables long-range data communication with very low power requirements - e.g. a transmitter battery can last for 10 years and communicate with nodes 15-20 km away!

LoRa technology was initially developed by Semtech, but it has now become a standard and is being further developed by the LoRa Alliance.

A good tutorial covering the basics of LoRa is available here - http://www.radio-electronics.com/info/wireless/lora/basics-tutorial.php

Most IoT applications only need to exchange small data packets at low throughput. LoRa is designed for exactly such low data-rate connectivity. 

Monday, May 30, 2016

Ruminating on IoT datastores

The most popular data-store choices for storing high volumes of IoT sensor data are NoSQL time-series databases. 

The following link contains a good list of NoSQL time-series databases that can be used in an IoT project. We have worked with both OpenTSDB and KairosDB and found both of them to be enterprise-grade. 
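As an example of how simple the write path is, the sketch below pushes one reading to OpenTSDB's HTTP /api/put endpoint using the requests library; the host, metric name and tags are placeholders.

  import time
  import requests

  datapoint = {
      "metric": "machine.temperature",
      "timestamp": int(time.time()),     # epoch seconds
      "value": 72.4,
      "tags": {"plant": "pune", "device": "sensor-42"}
  }
  resp = requests.post("http://opentsdb.example.com:4242/api/put", json=datapoint)
  resp.raise_for_status()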



Tuesday, May 24, 2016

Ruminating on Power BI

We were building our dashboard using Power BI and were looking at the various options available to refresh the data.

The following link gives a good overview of the various data refresh options - https://powerbi.microsoft.com/en-us/documentation/powerbi-refresh-data/#databases-in-the-cloud

It is also possible to pump live streaming data into Power BI using its REST APIs - https://powerbi.microsoft.com/en-us/documentation/powerbi-developer-overview-of-power-bi-rest-api/
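A rough sketch of pushing rows into a push-enabled dataset over the REST API is shown below; the dataset id, table name and Azure AD access token are placeholders, and the endpoint shape follows the Power BI "push rows" API.

  import requests

  dataset_id = "YOUR_DATASET_ID"        # a push-enabled dataset
  token = "YOUR_AAD_ACCESS_TOKEN"
  url = ("https://api.powerbi.com/v1.0/myorg/datasets/"
         + dataset_id + "/tables/SensorReadings/rows")

  payload = {"rows": [{"deviceId": "sensor-42", "temperature": 72.4}]}
  resp = requests.post(url, json=payload,
                       headers={"Authorization": "Bearer " + token})
  resp.raise_for_status()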

But we were a bit concerned about the dataset size limit of 10 GB in the Power BI Pro version. Reza Rad has written an excellent article on this - http://www.radacad.com/step-beyond-the-10gb-limitation-of-power-bi

Essentially in Power BI, you have two options - either import the entire dataset into memory OR establish a live connection between Power BI and your data-source.

Power BI uses some nifty compression techniques for all data that is imported into it - Reza observed an 800 MB file compress down to an 8 MB Power BI file. Hence, for all practical purposes, the 10 GB limit should suffice for most use-cases.
In case you are working with large volumes of data (GB, TB, PB), then a live connection with the data source is the only option.

Some snippets from Reza's article:
"Live connection won’t import data into the model in Power BI. Live connection brings the metadata and data structure into Power BI, and then you can visualize data based on that. With every visualization, a query will be sent to the data source and brings the response.

Limitations of Live Connection - 
1. With Live connection, there won’t be any Data tab in Power BI to create calculated measures, columns or tables. You have to create all calculations at the data source level.
2. Multiple Data Sources is not supported.
3. No Power Q&A
4. Power Query still is available with Live Connection. This gives you ability to join tables, flatten them if you require, apply data transformation and prepare the data as you want. Power Query can also set the data types in a way that be more familiar for the Power BI model to understand.
5. You need to do proper index and query optimization at data-source."

Monday, May 23, 2016

Ruminating on API Key Security

Kristopher Sandoval has written an excellent blog post on the prevalent practice of using API keys to secure your APIs.

We must not rely solely on API keys to secure our APIs, but rather use open standards such as OAuth 2, OpenID Connect, etc. to secure access to them. Many developers also use insecure methods such as storing API keys in mobile apps or pushing API keys to GitHub.
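For instance, here is a minimal sketch of the OAuth 2 client-credentials grant - the client exchanges its credentials for a short-lived access token instead of shipping a static API key; the token endpoint, client id/secret and scope are placeholders.

  import requests

  # Exchange client credentials for an access token (placeholder endpoint/credentials)
  token_resp = requests.post(
      "https://auth.example.com/oauth/token",
      data={
          "grant_type": "client_credentials",
          "client_id": "my-client-id",
          "client_secret": "my-client-secret",
          "scope": "read:devices",
      },
  )
  access_token = token_resp.json()["access_token"]

  # The short-lived token, not a static API key, is sent on every API call
  api_resp = requests.get(
      "https://api.example.com/v1/devices",
      headers={"Authorization": "Bearer " + access_token},
  )
  print(api_resp.status_code)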

Snippets from the article (http://nordicapis.com/why-api-keys-are-not-enough/) -

"Most developers utilize API Keys as a method of authentication or authorization, but the API Key was only ever meant to serve as identification.
API Keys are best for two things: identification and analytics (API metrics).

If an API is limited specifically in functionality where “read” is the only possible command, an API Key can be an adequate solution. Without the need to edit, modify, or delete, security is a lower concern."

Another great article by NordicAPIs is on the core concepts of Authentication, Authorization, Federation and Delegation - http://nordicapis.com/api-security-the-4-defenses-of-the-api-stronghold/
The next article demonstrates how these 4 core concepts can be implemented using OAuth and OpenID Connect protocols - http://nordicapis.com/api-security-oauth-openid-connect-depth/



Serverless Options for Mobile Apps

A lot of MBaaS platforms today provide mobile developers with tools that enable them to quickly roll out mobile apps without worrying about the backend.
In a traditional development project, we would first have to build the backend storage DB, develop the APIs and then build the mobile app.

But if you are looking for a quick go-to-market approach, then you can use the following options:

  • Google Firebase Platform - Developers can use the Firebase SDK and directly work with JSON objects. All data is stored (synced) with the server automatically. There is no need to write any server-side code. REST APIs are also available to access data from the server for other purposes. 
  • AWS MBaaS: The AWS Mobile SDK provides libraries for working with DynamoDB (the AWS NoSQL store). The developer just uses the DynamoDB object mapper to map objects to table attributes. Again, there is no need to write server-side code - everything is handled automatically. 
  • Other open source MBaaS platforms such as BaasBox, Convertigo, etc. 

Open Source API Management Tools

For folks who are interested in setting up their own API management tools, given below are a few options:

HTTP proxy tools for capturing network traffic

In the past, we had used tools such as Fiddler and Wireshark to analyse the network traffic between clients and servers. But these tools need to be installed on the machine, and within corporate networks this entails obtaining the proper Infosec approvals.

If you are looking for a nifty network traffic capture tool that does not need installation, then 'TcpCatcher' is a good option. It is a simple jar file that can run on any machine with Java.

Whenever we are using such proxy tools, we have two options -
1. Change the client to point to the IP of the tool, instead of the server. The tool then forwards the request to the server. (Explicit man in the middle)
2. Configure the tool's IP as a proxy in your browser. (Implicit man in the middle)

Update: 25May2016
The TcpCatcher jar started behaving strangely today with an alert stating - "This version of TcpCatcher has expired. Please download the latest version". We had the latest version, so this looks like a bug in the tool.

We moved on to the Burp Suite free edition. This tool is also available as a jar file and can run on any machine with Java. There is an excellent article by Oleg Nikiforov that explains how to set up the Burp proxy and use it to intercept all HTTP requests. You can also download their root certificate and install it on your machine or mobile phone to log all HTTPS traffic.
We could set up Burp in under 20 minutes to monitor all HTTPS traffic between our mobile apps and APIs.

Friday, May 20, 2016

Utilizing Azure AD for B2C mobile apps

We had successfully utilized Azure Active Directory for authentication of enterprise mobile apps. But can Azure AD be used for B2C apps? The answer is YES - Microsoft has released a preview version of Azure AD B2C that can be used for all customer-facing apps.

In an Azure AD tenant, each user has to sign in with a long user ID in email form - e.g. {name}@{tenant}.onmicrosoft.com. This is not feasible for B2C apps; hence in Azure AD B2C, it is possible to log in with any email address, and even plain usernames are supported. These accounts are called Local Accounts in Azure AD B2C. Social identity logins are also supported - e.g. Facebook, Google+, LinkedIn, and Amazon.

For more details on Azure AD B2C please refer to the following links:

https://azure.microsoft.com/en-in/documentation/articles/active-directory-b2c-faqs/

https://azure.microsoft.com/en-in/services/active-directory-b2c/



Thursday, May 12, 2016

Fundamentals of NFC communication

NFC communication happens through the exchange of NDEF (NFC Data Exchange Format) messages. An NDEF message is a binary format message that consists of a set of records - with each record containing a header and a payload.

The 'Beginning NFC' book on Safari is an excellent source for getting your basics right - https://www.safaribooksonline.com/library/view/beginning-nfc/9781449324094/ch04.html
I would highly recommend buying this book.

I always wanted to know the maximum length of an NDEF message, and the above book answers it as follows:

"In theory, there is no limit to the length of an NDEF message. In practice, the capabilities of your devices and tags define your limits. If you’re exchanging peer-to-peer messages between devices and no tags are involved, your NDEF messages are limited only by the computational capacity of your devices, and the patience of the person holding the two devices together. If you’re communicating between a device and a tag, however, your messages are limited by the tag’s memory capacity.

NDEF record payloads are limited in size to 2^32–1 bytes long, which is why the payload length field of the header is four bytes (or 2^32 bits).

It’s not a protocol designed for long exchanges because the devices need to be held literally in contact with each other."
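To make the record layout concrete, the sketch below hand-packs a single short NDEF text record (flags byte, type length, payload length, type, payload); in a real app you would use an NFC library instead, and the text/language values are just examples.

  import struct

  lang = b"en"
  text = b"hello nfc"
  # Text-record payload: status byte (language-code length) + language code + text
  payload = bytes([len(lang)]) + lang + text

  header = 0xD1   # MB=1, ME=1, SR=1 (short record, 1-byte payload length), TNF=0x01 (well-known)
  record = struct.pack("BBB", header, 1, len(payload)) + b"T" + payload
  print(record.hex())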

Wednesday, May 11, 2016

Ruminating on JWT

JWT (JSON Web Token) has gained a lot of traction in the past couple of years and is slowly becoming the standard choice for all authentication and authorization communication.

The best way to learn about JWT is to head straight to their site - https://jwt.io/introduction/
I was impressed with the quality of the documentation. Core concepts are explained in simple and lucid language. It took me days to understand SAML, whereas I could grasp even the complex concepts of JWT in minutes :)
Also, we can store all authorization claims in the JWT payload, reducing the need to make another database call to check authorization access levels.
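A small sketch using the PyJWT library shows the idea - the claims ride inside the signed token, so the API can verify them without a database lookup; the signing secret and claim values are placeholders.

  import time
  import jwt   # pip install PyJWT

  secret = "shared-signing-secret"
  claims = {"sub": "user-123", "roles": ["claims-approver"],
            "exp": int(time.time()) + 3600}
  token = jwt.encode(claims, secret, algorithm="HS256")

  # The API verifies the signature and reads the claims straight from the token
  decoded = jwt.decode(token, secret, algorithms=["HS256"])
  print(decoded["roles"])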

But it is important to note that the JWT specification does not talk about encrypting the payload - that is out of its scope. You can encrypt the payload if you want, but you would need to control the client/server code - i.e. the JWT encoding/decoding libraries.

Since the JWT payload is not encrypted, it is of utmost importance that JWTs are passed over TLS (HTTPS). Eran Hammer has written a good blog post on the perils of using a bearer token without TLS. A bearer token is called so because the 'bearer' - i.e. whoever holds the token - is given all the rights that the token specifies. A good analogy is 'cash': whoever has the cash can spend it, irrespective of who its rightful owner was.

Ruminating on Biometric security

Fingerprint scanners are becoming ubiquitous in many smartphones. There are also a few other pure software biometric solutions that are gaining traction in the market. Jotting down a few references.

http://www.eyeverify.com/  - EyeVerify maps the unique veins (blood vessels) and other micro-features in and around your eyes to create a digital key (eye-print) equal to a 50-character complex password. They claim to be more than 99.99% accurate, and it can work with any existing 1+ MP (megapixel) smart device camera!

Nuance Vocal Password: A user's voice is analyzed for hundreds of unique characteristics that are then compared to the voiceprint on file. 

Monday, May 09, 2016

Ruminating on browser fingerprinting

I was aware of sites using first-party and third-party cookies to track user activity on the web. But the use of browser fingerprinting to uniquely identify a user was quite intriguing.

The following sites can tell you the various unique characteristics of your browser environment that can be tracked by websites.

https://amiunique.org/

https://panopticlick.eff.org/

Browser fingerprinting entails collecting information like your user-agent, IP address, installed plug-ins and their version numbers, timezone, screen resolution, screen size/color depth, fonts installed, etc.

Looks like the only way you can be absolutely sure that you are not being tracked is by using the Tor browser :)

Tuesday, May 03, 2016

Ruminating on User Journey Maps

Creating user journey maps is an integral part of any UX or design thinking process. There are many ways in which you can create a user journey map. The links below serve as guidance on the different approaches one can take to illustrate customer journey maps.

http://www.joycehostyn.com/blog/2010/03/22/visualizing-the-customer-experience-using-customer-experience-journey-maps/

I liked the sample journey maps created by Lego and Starbucks.

  • Before creating a user journey map, you have to define personas - based on customer segmentation and personality types. 
  • Then identify the customer experience journeys that you want to illustrate for each persona - e.g. the transactional process of buying car insurance, the lifetime journey of an insurance customer, etc. 
  • Each journey is then broken down into various stages or phases that the customer goes through.
  • For each step, identify the customer emotions (e.g. positive, negative, neutral) and think about improving the customer experience - making it a 'wow' moment. 

Joyce also has a great presentation on SlideShare that shows many examples of customer journey maps and how they can be used to create superior customer experiences. My personal favourite was the example below - a simple yet powerful tool to create wow moments for your customers.


There is another great blog post by ThoughtWorks on facilitating collaborative design workshops.

Saturday, April 09, 2016

TeamViewer for screencasting Android phones to the desktop

We were using Android screencast over ADB and USB connectivity. But a colleague of mine recently told me that TeamViewer is another easy alternative for screencasting your Android phone.

I had used TeamViewer in the past, but gradually shifted to WebEx and Join.me for my personal use. I checked out TeamViewer on Android, screencast it to my Mac, and it ran smoothly. I could observe a lag only when viewing videos.

Jotting down the steps to setup Android screencast using TeamViewer.

  1. Install TeamViewer QuickSupport on your Android phone - https://play.google.com/store/apps/details?id=com.teamviewer.quicksupport.market&hl=e
  2. Install TeamViewer on your mac machine -  https://www.teamviewer.com/hi/download/mac.aspx
Run the Android app and it will show you a secret number, which should be passed out-of-band to the computer connecting to the Android phone. And voila! - you are ready to screencast.

Monday, March 28, 2016

Ruminating on Descriptive, Predictive and Prescriptive Analytics

Michael Wu made the three types of analytics famous - Descriptive, Predictive and Prescriptive.
A good article on this is available on InformationWeek here. Jotting down some snippets from the article and the Wikipedia page.

"The purpose of descriptive analytics is to summarize what happened. Descriptive analytics looks at past performance and understands that performance by mining historical data to look for the reasons behind past success or failure. Most management reporting - such as sales, marketing, operations, and finance - uses this type of post-mortem analysis.

Predictive analytics utilizes a variety of statistical modeling, data mining, and machine learning techniques to study recent and historical data, thereby allowing analysts to make predictions about the future. Predictive analytics answers the question what will happen. 
In the most general cases of predictive analytics, "you basically take data that you have; to predict data you don't have." For e.g. predicting the sentiment from social data. 

Prescriptive analytics goes beyond descriptive and predictive models by recommending one or more courses of action -- and showing the likely outcome of each decision."

Friday, March 25, 2016

Ruminating on Design Thinking

Design Thinking is a philosophy or a set of principles that can be applied to any creative endeavor. User Experience SMEs have been using similar methodologies for UX design for decades.

Design thinking also embraces many Agile concepts such as iterative prototyping and closer end-user engagement to get a first-hand experience of the customer's business context (i.e. being more human-centric or customer-centric), along with a fail-fast culture.

Jotting down a few video links for folks who want to learn about Design Thinking.

https://hbr.org/2008/06/design-thinking  -  A good video at the bottom of the article.

http://dschool.stanford.edu/dgift/#crash-course-video  - A free video crash course on design thinking.

https://www.infosys.com/insights/human-revolution/Pages/design-thinking.aspx

http://www.forbes.com/sites/reuvencohen/2014/03/31/design-thinking-a-unified-framework-for-innovation/#abaa3d856fca


Friday, February 26, 2016

Ruminating on IoT revolutionizing the supply chain

Smart sensors (IoT) are pushing the frontiers of supply chain technology. By utilizing sophisticated sensors, logistics providers can enable greater visibility into their supply chains.

Smart sensors today can measure a variety of environmental variables such as GPS coordinates, temperature, light sensitivity, humidity, pressure, and shock events. These sensors then wirelessly transmit the data to enterprise systems in real time. For example, SenseAware provides smart sensors that can be dropped into any package and enable tracking of all these environmental variables throughout the shipment journey. Roambee provides intelligent sensors called 'bees' to monitor shipments.

Given below are some of the business use-cases where smart sensors can add value to the supply chain.

  1. Cold Chain: A lot of shipments need tight temperature controls during the entire journey - e.g. bio-medicines, insulin, blood, live organs, vaccines, perishable food items, etc. By using temperature sensors, organizations can monitor the temperature excursions and take corrective action like re-icing the shipment, etc. 
  2. Improve security of high-value products: By utilizing sensors, we can now track the location of each shipment in real-time and raise alerts if a shipment has deviated from its planned route. Most sensor-based platforms enable users to define geofences and trigger alerts if the shipment is moved outside of the geofence. This can be very useful for high-value products such as gems, jewelry, surgical items, etc. 
  3. Enable faster cash collection: In many industries, suppliers are unable to invoice their customers till they get confirmation of the shipment delivery. By leveraging 'light sensors', suppliers can be notified that their shipment has been opened and hence considered to be delivered. This would enable suppliers to raise quicker invoices and thus faster cash collections. 
  4. Reduce buffer inventory: Many manufacturing units maintain buffer inventory to avoid stock-out situations. Lack of information on inbound logistics (delivery dates) results in higher buffer inventory. By leveraging smart sensor-based logistics, manufacturing firms would have greater visibility into inbound goods delivery as they can track the location of shipments in real-time. This can result in tighter inventory controls and lower TCO. 
  5. Reduce insurance premiums: Over a period of time, all the data collected by sensors can be utilized by an insurance firm to reduce the premiums for customers who take tangible steps to ensure the safety and quality of the delivered goods. For e.g. If Pharma company A is doing a better job at maintaining tight temperature controls than Pharma company B, then it makes sense for the insurer to incentivize Pharma company A. 
  6. Avoid delivery penalties: Large retailers such as Walmart have stringent rules on shipment delivery times and impose a penalty if a shipment arrives earlier or later than its scheduled time-slot. By leveraging smart logistics, vendors can monitor their shipment delivery times and take corrective action. 

Thus, smart sensor-based logistics can provide business value across a range of industries. The combination of smart hardware sensors and scalable software platforms can help organizations build a new central nervous system for their supply chain.

Thursday, February 25, 2016

Ruminating on EMPI

EMPI (Enterprise Master Patient Index) is a consolidated hub of all patient-related information that acts as a single source of truth.

Hospitals have various departmental systems such as lab systems, radiology systems, EMR systems and other health information systems that operate in isolation. Typically, patient data is spread out across these disparate systems, and it is challenging to get a 360-degree view of the patient.

Hence, hospitals create an EMPI hub that assigns a unique ID to each patient. EMPI systems use algorithms to match and link records across disparate systems. The algorithms also identify duplicate records and reduce the number of false negatives. The typical attributes used by the matching algorithms are first name, last name, DOB, sex, social security number, address and more. The matching algorithms (deterministic, probabilistic/fuzzy) must consider typos, misspellings, transpositions, aliases, etc.
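Purely as a toy illustration of fuzzy matching (real EMPI engines use far richer models, weights and blocking strategies), the sketch below scores two patient records with the standard-library difflib; the attribute weights and threshold are made up.

  from difflib import SequenceMatcher

  def similarity(a, b):
      return SequenceMatcher(None, a.lower(), b.lower()).ratio()

  record_a = {"first": "Jon",  "last": "Smith", "dob": "1980-04-12"}
  record_b = {"first": "John", "last": "Smyth", "dob": "1980-04-12"}

  # Made-up weights: last name and DOB matter more than first name
  score = (0.3 * similarity(record_a["first"], record_b["first"]) +
           0.4 * similarity(record_a["last"],  record_b["last"]) +
           0.3 * (1.0 if record_a["dob"] == record_b["dob"] else 0.0))

  print("probable match" if score > 0.85 else "no automatic link - review manually")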

Besides the internal attributes, many organizations also source data from external third parties (e.g. Equifax) that can be used for increasing the accuracy of the matching engine. This is helpful as people change addresses, phone numbers, etc. with time.

Many traditional MDM product vendors such as IBM and Infor provide platforms to implement EMPI.
A few organizations have also started using NoSQL and other Big Data platforms for creating a customer hub, as explained here.

Friday, February 12, 2016

Analysis of Healthcare spend in the US

The US Department of Health has released interesting stats on healthcare spend across various dimensions. The report is a bit old and is available at http://archive.ahrq.gov/research/findings/factsheets/costs/expriach/

Some eye-opening snippets from the report are copied here -

  1. Five percent of the population accounts for almost half (49 percent) of total health care expenses.
  2. The top 5 chronic diseases are Diabetes, Hypertension, Heart Disease, Asthma and Mood Disorders. Treatment for these diseases accounts for almost 50% of the total healthcare spend.
  3. 5% of Medicare fee-for-service beneficiaries accounted for 43 percent of total spending, with 25 percent accounting for 85 percent of all spending.
  4. The elderly and disabled, who constituted around 25 percent of the Medicaid population, accounted for about 70 percent of Medicaid spending.
  5. The five most expensive health conditions were heart disease, cancer, trauma, mental disorders, and pulmonary conditions.

Wednesday, February 10, 2016

Ruminating on Usage based Insurance

Many auto insurance firms have started launching usage-based insurance (UBI) products - i.e. priced based on how much you drive (miles) and how you drive. These are called PAYD (Pay As You Drive) and PHYD (Pay How You Drive) respectively.

Insurance firms typically ask their members to plug an OBD device into their vehicles. The OBD device then syncs the data wirelessly to the backend platforms of the insurance firm.

Allstate's Drivewise program is an example of this. It was enlightening to know the various parameters that are captured by the device and transmitted back to the servers. The full list of parameters is available here - https://www.allstate.com/landingpages/drivewisedevice.aspx

Some of the parameters are:
  • GPS trail
  • VIN and Odometer readings
  • Hard Braking Events
  • High Speed Events
  • Acceleration Events
  • Vehicle Error Codes
  • A comprehensive trip report - seconds in acceleration, seconds in deceleration, miles driven in each speed band, constant speed miles, varying speed miles, etc. 
With the help of these parameters, an insurance firm can assign a 'Safe Driver' score to each member and reward members for safe driving. There was another interesting parameter that could indicate whether hypermiling took place :) 
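Purely as a toy illustration of such a score (the event names, weights and thresholds below are made up and are not Allstate's model), one could combine the trip-report parameters like this:

  def safe_driver_score(trip):
      # Penalize risky events, normalized per 100 miles driven (illustrative weights)
      penalties = (4 * trip["hard_braking_events"] +
                   6 * trip["high_speed_events"] +
                   2 * trip["rapid_acceleration_events"])
      per_100_miles = penalties / max(trip["miles_driven"], 1) * 100
      return max(0, 100 - per_100_miles)

  print(safe_driver_score({"hard_braking_events": 3, "high_speed_events": 1,
                           "rapid_acceleration_events": 5, "miles_driven": 240}))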
Besides the OBD device, auto insurance firms need to invest in creating a scalable cloud platform to process this vast amount of OBD data. Technologies such as Big Data analytics, CEP, scalable messaging and event propagation engines, and next-best-action modules are integrated to build such a scalable and modular UBI platform.

Friday, January 22, 2016

Combining NLP with Machine Learning

SAS has published an interesting article titled 'Your personal data scientist'. I have always been a fan of virtual assistants such as Siri and Google Now and depend on them for most of my day-to-day task management. We have also built some cool use-cases around using NLP for self-service.

The idea of building an NLP wrapper on top of your analytics engine is a cool one and can have a plethora of use-cases. For example, a business decision-maker may want to know the top 10 sales performers, or the sales in a particular geography last quarter, etc.

We need to build an NLP front-end that can intelligently convert natural-language text into queries that can be executed against the machine learning engine.
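As a toy sketch of that idea (real implementations would use a proper NLP/intent engine; the question patterns, table and column names below are made up), such a front-end could map questions to structured queries like this:

  import re

  def to_query(question):
      q = question.lower()
      m = re.match(r"top (\d+) sales", q)
      if m:
          return ("SELECT rep_name, total_sales FROM sales "
                  "ORDER BY total_sales DESC LIMIT " + m.group(1))
      m = re.match(r"sales in (\w+) last quarter", q)
      if m:
          return ("SELECT SUM(amount) FROM sales "
                  "WHERE region = '" + m.group(1) + "' AND quarter = 'last'")
      return None

  print(to_query("Top 10 sales talent"))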