Tech Talk: Analytics

Monday, March 28, 2016

Ruminating on Descriptive, Predictive and Prescriptive Analytics

Michael Wu made the 3 types of analytics famous - Descriptive, Predictive and Prescriptive.
A good article on this is available on InformationWeek here. Jotting down some snippets from the article and the Wikipedia page.

"The purpose of descriptive analytics is to summarize what happened. Descriptive analytics looks at past performance and understands that performance by mining historical data to look for the reasons behind past success or failure. Most management reporting - such as sales, marketing, operations, and finance - uses this type of post-mortem analysis.

Predictive analytics utilizes a variety of statistical modeling, data mining, and machine learning techniques to study recent and historical data, thereby allowing analysts to make predictions about the future. Predictive analytics answers the question what will happen.
In the most general cases of predictive analytics, "you basically take data that you have; to predict data you don't have." For example, predicting sentiment from social data.

Prescriptive analytics goes beyond descriptive and predictive models by recommending one or more courses of action -- and showing the likely outcome of each decision."
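To make the three types concrete, here is a minimal sketch in Python. The sales figures, the candidate actions and their uplift/cost numbers are all invented for illustration; they do not come from the article. The descriptive step summarizes the past, the predictive step extrapolates with a least-squares line, and the prescriptive step compares the likely outcome of each action.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data only).
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """Descriptive: summarize what happened."""
    return {"total": sum(values), "average": mean(values), "best": max(values)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


def predict(xs, ys, next_x):
    """Predictive: use data we have to estimate data we do not have."""
    slope, intercept = fit_line(xs, ys)
    return slope * next_x + intercept


def prescribe(baseline, actions):
    """Prescriptive: show the likely outcome of each action, recommend the best."""
    outcomes = {
        name: baseline * (1 + uplift) - cost
        for name, (uplift, cost) in actions.items()
    }
    return max(outcomes, key=outcomes.get), outcomes


summary = describe(sales)
forecast = predict(months, sales, 7)

# Assumed (uplift fraction, fixed cost) for each hypothetical action.
actions = {
    "do_nothing": (0.0, 0.0),
    "discount": (0.10, 20.0),
    "ad_campaign": (0.05, 5.0),
}
recommended, outcomes = prescribe(forecast, actions)
print(summary, forecast, recommended)
```

A real prescriptive system would use an optimization or simulation model rather than a fixed table of uplifts; the point here is only the division of labour between the three steps.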

Friday, January 22, 2016

Combining NLP with Machine Learning

SAS has published an interesting article titled - 'Your personal data scientist'. I have always been a fan of virtual assistants such as Siri, Google Now and depend on them for most of my day-to-day task management. We have also built some cool use-cases around using NLP for self-service.

The idea of building an NLP wrapper on top of your Analytics engine is a cool one and can have a plethora of use-cases. For example, a business decision maker may want to know the top 10 sales talent, or the sales in a particular geography last quarter.

We need to build an NLP front-end that can intelligently convert natural language text to queries that can be executed against the machine learning engine.
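As a rough sketch of what such a front-end does, the Python snippet below turns a very narrow class of English questions into a structured query. It is a toy keyword-and-regex parser, not real NLP, and every field name in it (`sales_amount`, `salesperson`, `LAST_QUARTER`, etc.) is a hypothetical placeholder for whatever the analytics engine actually exposes.

```python
import re

# Hypothetical vocabulary mapping user words to fields of an assumed sales
# data set. These names are illustrative only.
METRICS = {"sales": "sales_amount", "revenue": "sales_amount"}
ENTITIES = {
    "sales talent": "salesperson",
    "salespeople": "salesperson",
    "products": "product",
}
PERIODS = {"last quarter": "LAST_QUARTER", "last year": "LAST_YEAR"}


def parse_question(text):
    """Convert a narrow class of English questions into a structured query."""
    q = text.lower().strip()
    query = {"metric": None, "group_by": None, "limit": None,
             "region": None, "period": None}

    # "top 10 ..." becomes a row limit.
    top = re.search(r"\btop (\d+)\b", q)
    if top:
        query["limit"] = int(top.group(1))

    # The first known metric word found in the question wins.
    for word, column in METRICS.items():
        if re.search(rf"\b{re.escape(word)}\b", q):
            query["metric"] = column
            break

    # Entity phrases decide what the result is grouped by.
    for phrase, column in ENTITIES.items():
        if phrase in q:
            query["group_by"] = column
            break

    for phrase, token in PERIODS.items():
        if phrase in q:
            query["period"] = token
            break

    # "in <place>" up to a time phrase, a question mark, or end of text.
    region = re.search(r"\bin ([a-z][a-z ]*?)(?= last\b| this\b|\?|$)", q)
    if region:
        query["region"] = region.group(1)

    return query


print(parse_question("Who are the top 10 sales talent?"))
print(parse_question("What were the sales in Europe last quarter?"))
```

The structured dictionary is the useful part of the design: it can be translated to SQL, an OLAP query or a model-scoring call without the language layer knowing which engine sits behind it.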

Thursday, January 15, 2015

Applying Analytics to Clinical Trials

The link below is a good article on using Big Data Analytics to improve the efficiency of clinical trials.

http://archive.expresspharmaonline.com/specials/pharma-technology-review/2126-leveraging-data-science-to-accelerate-clinical-trial-results

Snippets from the article -

"Recruiting patients has been a challenge for pharmaceutical companies. 90 per cent of trials are delayed with patient enrollment as a primary cause.
Effective target segmentation for enrollment is a key to success. Traditional methods of enrollment rely upon campaign and segmentation based on disease lines across wider populations. Using data science, we can look at the past data to identify proper signals and help planners with more precise and predictive segmentation.

Data scientists will look at the key attributes that matter for a given patient to successfully get enrolled. For each disease type, there may be several attributes that matter. For example, a clinical trial that is focused on a new diabetes medication targets populations’ A1C levels, age group, demographics, outreach methods, and site performance. Data science looks at the above attribute values for the target users past enrollment data and then builds ‘patient enrollment propensity’ and ‘dropout propensity’ models. These models can generate multi variant probabilities for predicting future success.

In addition to the above modeling, we can identify the target segment’s social media footprint for valuable clues. We can see which outreach methods are working, and which social media channels the ‘generation Googlers’ are using. Natural language processing (NLP) techniques to understand the target population’s sentiment on clinical trial sites, physicians, and facilities can be generated and coded into a machine understandable form. Influencer segments can be generated from this user base to finely tune campaign methods for improving effectiveness."
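The article does not say which algorithm sits behind the 'patient enrollment propensity' model. One common choice for a propensity score is logistic regression, so here is a minimal sketch under that assumption. The two features (a scaled A1C level and a scaled distance to the trial site) and all the data points are invented for illustration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            x = [1.0] + list(row)  # prepend 1.0 for the bias term
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grads[j] += error * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights


def propensity(weights, row):
    """Predicted probability that a patient with these attributes enrolls."""
    x = [1.0] + list(row)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical past enrollment data: (scaled A1C, scaled distance to site).
past_patients = [
    (0.9, 0.1), (0.8, 0.2), (0.7, 0.3), (0.6, 0.2),
    (0.3, 0.8), (0.2, 0.9), (0.4, 0.7), (0.1, 0.6),
]
enrolled = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = enrolled, 0 = did not enroll

weights = train_logistic(past_patients, enrolled)
likely = propensity(weights, (0.85, 0.15))
unlikely = propensity(weights, (0.15, 0.85))
print(round(likely, 3), round(unlikely, 3))
```

A 'dropout propensity' model would have the same shape, with the label changed to whether an enrolled patient later left the trial.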

Tuesday, February 11, 2014

Examples of Real Time Analytics in Healthcare

The Healthcare industry has always striven to reduce the cost of care and improve the quality of care. To do this, Payers are focusing more on the prevention and wellness of members.

Digitization of clinical information and real-time decision making are important IT capabilities that are required today. Given below are some examples of real-time analytics in Healthcare.

Hospital-acquired infections are very dangerous for premature infants. Monitors can detect patterns in infected premature babies up to 24 hours before any symptoms appear. This real-time data can be captured and run through a real-time analytics engine to identify such cases and ensure that adequate treatment is given ASAP.
Use real-time event processing as an early warning system, based on historical patterns or trends. For example, if a few members are exhibiting the behavior patterns of prior patients who relapsed into a critical condition, then we can plan a targeted intervention for these members.
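The early-warning idea can be sketched as a sliding window over a stream of readings. The snippet below is only an illustration in Python: the baseline, tolerance and readings are made-up numbers, and a moving-average threshold is a stand-in for whatever pattern detection a real monitoring product uses.

```python
from collections import deque
from statistics import mean


class EarlyWarningMonitor:
    """Flags a stream whose recent average drifts away from a baseline."""

    def __init__(self, baseline, tolerance, window=5):
        self.baseline = baseline      # expected value, e.g. from history
        self.tolerance = tolerance    # allowed drift before alerting
        self.readings = deque(maxlen=window)  # keeps only the latest readings

    def observe(self, value):
        """Add one reading; return True once the full window has drifted."""
        self.readings.append(value)
        if len(self.readings) < self.readings.maxlen:
            return False  # not enough data yet
        return abs(mean(self.readings) - self.baseline) > self.tolerance


# Hypothetical stream of a vital-sign measurement for one patient.
monitor = EarlyWarningMonitor(baseline=50.0, tolerance=5.0, window=3)
stream = [50, 51, 49, 44, 42, 40]
alerts = [monitor.observe(reading) for reading in stream]
print(alerts)
```

Because the monitor only keeps the last few readings in memory, it can run per patient inside an event-processing pipeline and raise the alert as soon as the drift appears, rather than in a nightly batch.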

Ruminating on Decision Trees

Decision trees are tree-like structures that can be used for decision making, classification of data, etc.
The following simple example (on the IBM SPSS Modeler Infocenter Site) shows a decision tree for making a car purchase.

Another example of a decision tree that can be used for classification is shown below. These diagrams are taken from the article available at - www.cse.msu.edu/~cse802/DecisionTrees.pdf

Any tree with a branching factor of 2 (every decision node has exactly 2 branches) is called a "binary decision tree". Any tree with a variety of branching factors can be represented as an equivalent binary tree. For example, the binary tree below will evaluate to the same result as the first tree.

It is easy to see that such decision tree models can help us in segmentation. For example, segmentation of patients into high-risk and low-risk categories, high-risk credit vs. low-risk credit, etc.
An excellent whitepaper on Decision Trees by SAS is available here.
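To illustrate both points (segmentation, and the equivalence of a multi-way tree and a binary tree), here is a small Python sketch. The credit-risk rules and attribute names are invented for illustration and are not taken from the articles referenced above.

```python
from itertools import product


def multiway_tree(applicant):
    """Root node with three branches, one per income band."""
    band = applicant["income_band"]
    if band == "low":
        return "high-risk"
    elif band == "medium":
        return "high-risk" if applicant["has_defaulted"] else "low-risk"
    else:  # "high"
        return "low-risk"


def binary_tree(applicant):
    """Same logic, but every node asks a single yes/no question."""
    if applicant["income_band"] == "low":        # yes -> leaf
        return "high-risk"
    if applicant["income_band"] == "high":       # yes -> leaf
        return "low-risk"
    # Only "medium" reaches this node.
    return "high-risk" if applicant["has_defaulted"] else "low-risk"


# Check every combination of attribute values for equivalence.
applicants = [
    {"income_band": band, "has_defaulted": defaulted}
    for band, defaulted in product(["low", "medium", "high"], [True, False])
]
equivalent = all(multiway_tree(a) == binary_tree(a) for a in applicants)

# Segmentation: group applicants by the leaf they land in.
segments = {"high-risk": [], "low-risk": []}
for a in applicants:
    segments[binary_tree(a)].append(a)
print(equivalent, {name: len(members) for name, members in segments.items()})
```

The binary version trades one three-way question for two yes/no questions, which is exactly how a multi-way split gets unrolled into a deeper binary tree.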

Decision trees can also be used in predictive modeling - this is known as Decision Tree Learning or Decision Tree Induction. Other names for such tree models are classification trees or regression trees, collectively known as Classification And Regression Trees (CART).
Essentially, "Decision Tree Learning" is a data mining technique in which a decision tree is constructed by slicing and dicing the data using statistical algorithms. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments.
For example, on Wikipedia there is a good example of a decision tree that was constructed by looking at the historical data of Titanic passengers.

Decision Tree constructed through Data Mining of Titanic passengers.
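To show what "splitting a data set into branch-like segments" means in practice, below is a bare-bones sketch of decision tree induction in Python using Gini impurity, the split criterion used by CART. The eight passenger records are invented for illustration (they are not real Titanic data), so the resulting tree need not resemble the one on Wikipedia.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (score, feature, threshold) with the lowest weighted impurity."""
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
            if not left or not right:
                continue  # a split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


def build_tree(rows, labels, max_depth=3):
    """Split recursively until a node is pure or max_depth is reached."""
    split = best_split(rows, labels) if max_depth > 0 and gini(labels) > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, feature, threshold = split
    left = [i for i, r in enumerate(rows) if r[feature] <= threshold]
    right = [i for i, r in enumerate(rows) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], max_depth - 1),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], max_depth - 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Invented passenger records: is_male is 0 or 1, age is in years.
passengers = [
    {"is_male": 0, "age": 29}, {"is_male": 0, "age": 40},
    {"is_male": 0, "age": 8}, {"is_male": 1, "age": 5},
    {"is_male": 1, "age": 7}, {"is_male": 1, "age": 30},
    {"is_male": 1, "age": 45}, {"is_male": 1, "age": 60},
]
outcome = ["survived"] * 5 + ["died"] * 3

tree = build_tree(passengers, outcome)
print(tree)
print(predict(tree, {"is_male": 1, "age": 50}))
```

Production libraries add pruning, handling of categorical and missing values, and far faster split search, but the core loop is the same: try candidate splits, keep the one that makes the branches purest, and recurse.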

Once such a decision tree model has been created, it can be exported as a standard PMML file. This PMML file can then be used in a real time scoring engine such as JPMML.

There is another open source project called 'Openscoring' that uses JPMML behind the scenes and provides us with a REST API to score data against our model. A simple example (with probability prediction mode) for identifying a flower based on attributes is illustrated here: https://github.com/jpmml/openscoring

Decision Trees can also be modeled in Rule Engines. IBM iLog BRMS suite (WODM) supports the modeling of rules as a Decision Tree.

Monday, February 11, 2013

Ruminating on Big Data

Came across an interesting infodeck on Big Data by Martin Fowler. There is a lot of hype around Big Data and there are tens of pundits defining Big Data in their own terms :) IMHO, right now we are at the "peak of inflated expectations" and "height of media infatuation" in the hype cycle.

But I agree with Martin on the fact that there is considerable fire behind the smoke. Once the hype dies down, folks would realize that we don't need another fancy term, but actually need to rethink the basic principles of data management.

There are 3 fundamental changes that would drive us to look beyond our current understanding around Data Management.

Volume of Data: Today the volume of data is so huge that the traditional data management technique of creating a centralized database system is no longer feasible. Grid-based distributed databases are going to become more and more common.
Speed at which Data is growing: Due to Web 2.0, the explosion in electronic commerce, Social Media, etc., the rate at which data (mostly user-generated content) is growing is unprecedented in the history of mankind. According to Eric Schmidt (Google CEO), every two days now we create as much information as we did from the dawn of civilization up until 2003. Walmart is clocking 1 million transactions per hour and Facebook has 40 billion photos! This image would give you an idea of the amount of Big Data generated during the 2012 Olympics.
Different types of data: We no longer have the liberty to assume that all valuable data would be available to us in a structured format, well defined using some schema. There is going to be a huge volume of unstructured data that needs to be exploited. For example, emails, application logs, web click streams, messaging events, etc.

These 3 challenges of data are also popularly called the 3 Vs of Big Data (volume of data, velocity of data and variety of data). To tackle these challenges, Martin urges us to focus on the following 3 aspects:

Extraction of Data: Data is going to come from a lot of structured and unstructured sources. We need new skills to harvest and collate data from multiple sources. The fundamental challenge would be to understand how valuable some data could be. How do we discover such sources of data?
Interpretation of Data: Ability to separate the wheat from the chaff. What data is pure noise? How to differentiate between signal and noise? How to avoid probabilistic illusions?
Visualization of Data: Usage of modern visualization techniques that would make the data more interactive and dynamic. Visualization can be simple with good usability in mind.

As this blog entry puts it in words - "Data is the new oil ! Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value."

NoSQL databases are also gaining popularity. Application architects would need to consider polyglot persistence for datasets having different characteristics. For example, columnar data stores (aggregate oriented), graph databases, key-value stores, etc.

Monday, February 20, 2012

Business Intelligence vs Analytics

My colleague Sandeep Raut has a very simple blog-post explaining the differences between traditional BI and Analytics. Summarizing a few key points from the blog below.

"BI traditionally is concerned with creating reports on past data or even current live data. We create OLAP cubes using which we can slice & dice the data, even do a drill down. Analytics is about analyzing the data using mathematics/statistics to identify patterns. These patterns can then be used to predict what may happen in the future. Analytics is about identifying relationships between key data variables that were unknown before. It is about surfacing unknown patterns."

But in my humble opinion, should Analytics not be a subset of BI? I can understand the hype that product vendors create to differentiate their products in the market, but can Analytics exist in isolation from BI? Even predictive data analysis using "real-time" data/text mining techniques would logically fall under BI.
After all, BI is all about meeting business needs through actionable information!
Maybe it is just a game of words and semantics. I remember a few years back, the term DSS (Decision Support Systems) was more widely used than BI :)
