Monday, November 18, 2013

Twitter Bootstrap has come a long way!

The latest version of Bootstrap (version 3) really rocks! I was mighty impressed with the default JavaScript controls available and the ease with which we can build responsive web designs.

For beginners, there is an excellent tutorial on bootstrap by Syed Rahman at:

Syed has also authored other interesting articles on Bootstrap, and they are available here:

Aah! It's Lorem ipsum

It's interesting to learn something new every day. Today I learned about 'Lorem ipsum'.
I had seen such text so many times and often mistakenly thought that the browser was doing some language translation :)
You can also generate your Lorem Ipsum on this site:

Ruminating on Agile and Fixed Price Contracts

Fixed Price Contracts have become a necessary evil for all agile practitioners to deal with. Over the last few years, I am seeing more and more projects being awarded based on fixed-bid contracts. In a fixed price contract, 3 things are fixed - time, scope and price.

The dangers of fixed price contracts are known to all. But customers still want to go with fixed price because they want to reduce financial risk, choose a vendor based on price, support annual budgeting, etc.

But unfortunately, fixed bid contracts have the highest risk of failure. In fact, Martin Fowler calls it the 'Fixed Scope Mirage'. Martin suggests fixing the price, but keeping the scope negotiable. He cites a real-life case study that worked for him.

In this article, Scott Ambler raises questions on the ethics behind fixed price contracts and also elaborates on their dire consequences. I have personally seen tens of projects compromising on quality to meet the unrealistic deadlines of fixed price projects. The same story gets repeated again and again - the project slips, pressure is put on developers to still deliver on time and on budget, they begin to cut corners, and quality is the first victim.

This InfoQ article gives some ideas on how to structure Agile fixed-bid contracts. Cognizant has put forward another idea in a thought paper called 60/40/20 that can be used on Agile projects.

IMHO, any kind of fixed price contract would only work if there is a good degree of trust between the customer and the IT vendor. One of the fundamental values of the Agile manifesto is "customer collaboration over contract negotiation".

Wednesday, November 13, 2013

Beautiful REST API documentation

InfoQ has published a list of tools that can be used for creating some cool documentation for our Web APIs.

I was particularly impressed with Swagger and the demo available here.
RAML from MuleSoft was also quite interesting. The list of projects available for RAML makes it a serious candidate for documenting our REST APIs. 

Thursday, October 24, 2013

Ruminating on HIE (Health Information Exchange)

The site gives a very good understanding of HIE. This is not to be confused with HIX (Health Insurance Exchange). Snippet from the site:

The term "health information exchange" (HIE) actually encompasses two related concepts: 
Verb: The electronic sharing of health-related information among organizations 
Noun: An organization that provides services to enable the electronic sharing of health-related information

Today, most organizations are leveraging HL7 as the standard for HIE. HL7 v3 is XML based, whereas the more prevalent HL7 v2.0 is ASCII text based. Another standard that is used in HIE is CCR (Continuity of Care Record).

Blue Button is another attempt at simplifying HIE. Providers and health plans participating in the Blue Button initiative allow you to download all your health records as a plain text file from their site. The image below gives you a good idea of the advantages of having access to your health record.

A new initiative called 'Blue Button+' would allow information to be downloaded as XML (HL7 based) and enable machine-to-machine exchange of health information. 

HL7 FHIR (Fast Healthcare Interoperability Resources) is an interesting initiative that uses REST principles for HIE. 

Friday, October 18, 2013

Ruminating on ACO (Accountable Care Organizations)

The article below on National Journal gives an excellent overview of ACOs and the reasons they were formed.
Some snippets from the article that will give you a good understanding of ACOs: 
"ACOs are groups of providers that have been assigned a projected budget per patient. If the cost of caring for the patient comes in below that level, the group/payer shares the savings. The idea is that doctors will better coordinate care to prevent wasteful or ineffective treatment. 

With accountable care organizations, the theory is that if the provider does a good job taking care of the patient, something the insurer can track with quality metrics, the patient's health will be better, they will use fewer and less expensive services, and, therefore, they will cost less to insure. 

Medicare is running two pilot versions of the program. In one, providers may sustain losses if they're over budget but can be handsomely rewarded if they're under. The other rewards providers for coming in under budget but has no downside risk. The government is monitoring quality to make sure providers aren't skipping necessary treatment to come in under budget. 

Making ACOs work will require many organizational changes on the part of providers. They'll have to orient their systems more around quality than quantity. They'll have to track patients closely, using new analytics, to make sure their status is improving. And they may focus on high-risk, high-cost patients, using analytics and tailored interventions to help them. The payoff for improving the health of that population could be substantial."

Thursday, October 17, 2013

What is HIMSS?

The Healthcare Information and Management Systems Society (HIMSS) is a not-for-profit organization dedicated to improving healthcare through innovative use of modern information technology.

Found a wealth of information on the HIMSS site that is definitely worth a perusal: 

Tuesday, October 15, 2013

What is a Digital Enterprise?

In my previous blog post, I had ruminated on some of the business drivers that are pressing organizations towards a digital transformation. Today, someone in my team asked me a simple question - what exactly is a Digital Enterprise? As has always been the case, there is no industry-standard definition for a Digital Enterprise, but there are some common themes that can be understood :) Jotting down my thoughts on the same.

A Digital Enterprise is any organization that successfully leverages modern disruptive technologies to gain a competitive edge in their business and create a better customer experience to drive business growth. The technology strategies that form the architecture foundation of a digital enterprise are listed below. Though many industry pundits are harping on the importance of SMAC technologies, IMHO there are other traditional technologies (and technology strategies) that would also play an important role in a 'Digital Enterprise'.

  1. Web Property Consolidation: This entails consistent user experience, uniform branding, consolidation of multiple web sites under a single ECM (Enterprise Content Management) platform. 
  2. Digitization of Transactions: Believe it or not, many legacy systems still do not have all business processes completely automated and hence require manual entry or paper work. Digitizing end-to-end transactions enables STP (Straight Through Processing) using BPM and SOA platforms. 
  3. 360-degree view of Subject Areas: Creating a 360-degree view of your customer to enable effective cross-selling and up-selling. 
  4. Social Media Strategy:  Social as a platform for customer service. Social as a platform for VoC (Voice of Customer). Social as a platform for information dissemination. Sentiment Analysis on social sites, etc.
  5. Mobility: Multi-channel delivery, Usability, Native Apps vs. HTML5/CSS3 Apps.
  6. Big Data Analytics: Enable full-volume analytics rather than just sampling. Leverage open source platforms such as Hadoop. NoSQL stores such as MongoDB.
  7. Cloud Computing: Leverage economies of scale, faster provisioning, quicker go-to-market, etc. Build strategy for public vs. private clouds.
  8. Personalization:  Create a unique customer experience, fine-grained targeted marketing, highly personalized interactions based on past history, demographics, life events and analytics insight. 

Tuesday, October 08, 2013

Social Tools in the Enterprise

Microsoft has conducted an interesting survey on the demand for social tools in the enterprise.
The results of the survey are available here:

Monday, October 07, 2013

MongoDB caveats

Types of NoSQL data stores

There are essentially 4 types of NoSQL databases that are getting popular. As architects, when we consider polyglot persistence it's important to understand the pros and cons of each NoSQL type and then select the best fit for the given problem context.

  1. Key-Value stores: Redis, Riak
  2. Column Family stores (Aggregate-oriented): Cassandra, HBase
  3. Document oriented databases: MongoDB, CouchDB
  4. Graph databases: Neo4J
A good article comparing all these datastores is available here:  Another good article that compares these stores against the CAP theorem is here:
Martin Fowler's infodeck on NoSQL is also worth a perusal. 

Ruminating on Bloom Filters

Of late, I have been trying to understand "Bloom Filters" and the reason why they are used in many popular NoSQL datastores such as Cassandra, HBase, etc.  Even the Google Chrome browser uses one to match URLs of malicious sites.

So what exactly is a Bloom filter? A Bloom filter is a data structure; essentially a bit array (aka bit vector). Using a Bloom filter, we can quickly test for the containment of a particular element in a given set. For e.g. suppose you have a set of 100K URLs that are malicious. How do you check if the entered URL belongs to the list? You can store all the 100K URLs in memory and iterate through them, but that would be expensive in terms of CPU cycles and memory requirements.

A Bloom filter is a very simple and efficient way to test for containment - with a one-sided error probability. What this means is that if the Bloom filter states that an element is NOT present in the list, then it is 100% true. But if it returns true for the containment test, then it means that the element "MAY" be there; i.e. you could have the risk of a 'false positive'. You can design your Bloom filter data structure in such a way, so that you can predict the probability of 'false positives'; for e.g. 3% probability in a set of 100 million entries. Hence a bloom filter can return 'false positives' but cannot return 'false negatives'.

With the above understanding, you can guess why Bloom filters are popular with NoSQL data-stores. It is used to quickly check whether a row exists in the database or not, before going for disk-access.

The following articles would give you a quick understanding of how Bloom Filters work:

In a nutshell, in simple terms the following steps are taken:
  1. Take a bit array (all elements of the array set to 0)
  2. For each element in your set, do the following:
    • Hash the element k times (with different hash algorithms). Let's say you hashed it 2 times and the values were 7 and 10. Set the bits at these indexes in the bit array to 1. 
    • Do this for all the elements in the set. Thus you would end up with the bit array (Bloom filter) having 0s and 1s scattered around. 
  3. Now to test whether an element is present in the set or not, we hash the element using the same hash functions. Now check if there is a 1 or 0 in the Bloom filter at those index positions. If there is a 0, that means the element is definitely not in the list. If there is a 1, then the element may be there and you can proceed with the costly disk-access or service call. 
This stackoverflow discussion also has a good explanation of how bloom filters work.
Google's popular Guava library also has a Bloom Filter class that can be used in Java projects. The trick is to select the right size of the bit array and the right hash functions to get excellent performance.
.NET implementations are available here and here
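The steps above can be sketched as a toy Java class. This is purely illustrative - a real implementation such as Guava's BloomFilter uses better hash schemes and computes the optimal bit-array size and hash count from the desired false-positive rate:

```java
import java.util.BitSet;

// Toy Bloom filter: an m-bit vector probed by k hash functions.
public class ToyBloomFilter {
    private final BitSet bits;
    private final int m;   // number of bits in the filter
    private final int k;   // number of hash functions

    public ToyBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Derive the i-th bit index for an element via double hashing (h1 + i*h2).
    private int index(String element, int i) {
        int h1 = element.hashCode();
        int h2 = (h1 >>> 16) | 1;          // force h2 to be odd so probes spread out
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(String element) {
        for (int i = 0; i < k; i++) bits.set(index(element, i));
    }

    // false => definitely absent; true => possibly present (false positives allowed)
    public boolean mightContain(String element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(element, i))) return false;
        }
        return true;
    }
}
```

Every added element sets its k bits; a lookup can only answer "definitely not present" or "maybe present", which is exactly why the data stores consult the filter before paying for disk access.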

Monday, September 30, 2013

Comparison of Java APIs for Excel

The Aspose folks have put up a comparison of their Java Excel API vs popular open source APIs such as JExcelAPI and Apache POI.

The comparison is worth a perusal:

Using Spring MVC to create REST style services

We have been always using Apache CXF or Axis 2 to build web services, but recently I was pretty impressed with the simplicity with which we can create REST style services using Spring MVC. OOTB, Spring MVC integrates with Jackson (for JSON serialization) and JAXB (for XML serialization).

We just have to annotate our Spring Controller class with the @RequestMapping and @ResponseBody annotations and let Spring handle the rest. The following links would provide more info on the same.

Using Spring MVC with Ext-JS MVC
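As a minimal sketch of the idea (the Customer class and the URL are hypothetical, not from any real project), a REST-style endpoint with Spring MVC looks roughly like this:

```java
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class CustomerController {

    // GET /customers/42 returns the Customer serialized to JSON by Jackson;
    // @ResponseBody writes the object to the response instead of resolving a view.
    @RequestMapping(value = "/customers/{id}", method = RequestMethod.GET)
    @ResponseBody
    public Customer getCustomer(@PathVariable("id") long id) {
        return new Customer(id, "Jane Doe");   // stubbed lookup for illustration
    }

    public static class Customer {
        private final long id;
        private final String name;
        public Customer(long id, String name) { this.id = id; this.name = name; }
        public long getId() { return id; }
        public String getName() { return name; }
    }
}
```

With Jackson on the classpath, a GET to /customers/42 would return something like {"id":42,"name":"Jane Doe"} without any extra serialization code.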

Saturday, September 28, 2013

Maths Formula Evaluators

A teammate today introduced me to a couple of open-source libraries that can be used for evaluating mathematical expressions given as strings. One can set variables and evaluate the expression.
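To illustrate what such libraries do under the hood (this is a hand-rolled toy, not the API of any of those libraries), here is a minimal recursive-descent evaluator supporting +, -, *, /, parentheses and variables:

```java
import java.util.Map;

// Minimal recursive-descent evaluator: expr := term (('+'|'-') term)*
//                                      term := factor (('*'|'/') factor)*
//                                      factor := number | variable | '(' expr ')'
public class TinyEval {
    private final String src;
    private final Map<String, Double> vars;
    private int pos;

    private TinyEval(String src, Map<String, Double> vars) {
        this.src = src;
        this.vars = vars;
    }

    public static double eval(String expr, Map<String, Double> vars) {
        return new TinyEval(expr.replace(" ", ""), vars).parseExpr();
    }

    private double parseExpr() {
        double v = parseTerm();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            double t = parseTerm();
            v = (op == '+') ? v + t : v - t;
        }
        return v;
    }

    private double parseTerm() {
        double v = parseFactor();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            double f = parseFactor();
            v = (op == '*') ? v * f : v / f;
        }
        return v;
    }

    private double parseFactor() {
        char c = src.charAt(pos);
        if (c == '(') {
            pos++;                         // consume '('
            double v = parseExpr();
            pos++;                         // consume ')'
            return v;
        }
        int start = pos;
        if (Character.isLetter(c)) {       // variable name lookup
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            return vars.get(src.substring(start, pos));
        }
        while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        return Double.parseDouble(src.substring(start, pos));
    }
}
```

For e.g. TinyEval.eval("2*(x+3)", Map.of("x", 4.0)) evaluates to 14.0. The real libraries add functions (sin, log, etc.), operator precedence tables and error handling on top of the same idea.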

Thursday, September 26, 2013

Are you a "chicken" or "pig" in a project?

Loved the below classic story of the Chicken and Pig. In any project you would have committed people and folks who only get involved :)

Ruminating on MongoDB concurrency

In my previous post, I had discussed the transaction support in MongoDB. Since MongoDB implements a readers-writer lock at the database level, I was a bit concerned that a writer lock for a long query may block all read operations for the time the query is running.

For e.g. if we fire a MongoDB query that updates 10,000 documents and takes 5 mins, would all reads be blocked for the 5 mins till all the records are updated? If yes, then that would be disastrous.

Fortunately this does not happen, due to lock yielding as stated on the MongoDB site.
Snippet from the site:
"Write operations that affect multiple documents (i.e. update() with the multi parameter,) will yield periodically to allow read operations during these long write operations. Similarly, long running read locks will yield periodically to ensure that write operations have the opportunity to complete."

Read locks are shared, but read locks block write locks from being acquired. Write locks prevent both other writes and reads. But MongoDB operations yield periodically to keep threads waiting for locks from starving. An interesting blog post that shows stats on the performance of MongoDB locks is available here
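The readers-writer semantics described above are the same idea as Java's ReentrantReadWriteLock; here is a minimal sketch of the pattern (without MongoDB's yielding behavior):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Readers-writer lock: many concurrent readers, but writers get exclusive access.
public class ReadersWriterDemo {
    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private static int value = 0;

    public static int read() {
        lock.readLock().lock();            // shared: other readers may hold it concurrently
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void write(int v) {
        lock.writeLock().lock();           // exclusive: blocks both readers and writers
        try {
            value = v;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A long-held write lock here would starve readers exactly as I feared above, which is why MongoDB's periodic lock yielding matters so much.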

Wednesday, September 25, 2013

Cloud Computing Guidance for Data Protection Act Compliance

A good list of questions has been compiled that any organization should consider before it decides to leverage cloud computing solutions. The article also gives some good examples to help the reader understand the consequences of sharing personal data on the cloud. It also serves as a good primer for Cloud Computing.

Ruminating on EDI

Despite being very old, EDI (Electronic Data Interchange) is still a dominant force in many industries; especially in healthcare. Organizations have invested so much in EDI, that it's not economically practical to shift to other standards such as XML.

A good primer on EDI is available at:

There are two popular standards for EDI: X12 and EDIFACT. X12 was designed as the standard for EDI transactions in the US, and EDIFACT emerged out of X12 for international use. EDIFACT is a global EDI standard supporting multi-country and multi-industry exchange. There are a lot of differences between X12 and EDIFACT, one important difference being the fact that X12 assigns numeric values to documents whereas EDIFACT lists names or abbreviations.

In the EDI world, we have the concept of a VAN (Value Added Network). VANs are service providers that act as messaging hubs between trading partners and handle the EDI communication.

An EDI interchange message is made up of "Segments" and "Elements". Each segment begins with a two- or three-character identifier (ISA, GS, ST, N1, REF) and ends with a delimiter. The elements within each segment are also separated by a different delimiter. A good example of a HIPAA X12 claim transaction is available here
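To make the segment/element structure concrete, here is a toy splitter for a hypothetical X12 fragment, assuming the common '~' segment terminator and '*' element separator (real interchanges declare their actual delimiters in the ISA segment):

```java
// Splits an EDI interchange string into segments, and each segment into elements.
// The '~' and '*' delimiters are assumptions based on common X12 defaults.
public class EdiSplitter {
    public static String[][] parse(String interchange) {
        String[] segments = interchange.split("~");
        String[][] out = new String[segments.length][];
        for (int i = 0; i < segments.length; i++) {
            out[i] = segments[i].split("\\*");   // element separator within a segment
        }
        return out;
    }
}
```

For a fragment like "ISA*00*ZZ~GS*HC*SENDER~", the first element of each segment ("ISA", "GS") is the segment identifier mentioned above, and the remaining strings are its elements.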

Friday, September 20, 2013

Good links to understand Health Insurance Domain

Was quite impressed with the simple and lucid language used to explain the basics of health insurance in the US. The following articles are worth a perusal for anyone wanting to jump-start on the fundamental concepts in healthcare.

HMOs vs. PPOs – What Are the Differences Between HMOs and PPOs?
What Is a Health Insurance Provider Network? and also this link.
What's the Difference Between Copay and Coinsurance? and also here.
Why Is Health Care So Expensive?
What Is Hospice Care?
What’s the Difference Between Medicare and Medicaid?
Why do we need healthcare reforms? and What are the dangers of having uninsured people?
What is ACA (Affordable Care Act) or ObamaCare? Another good link here.
What Is a Health Insurance Exchange?
Bronze, Silver, Gold & Platinum–Understanding the Metal-Tier System
What are Health Saving Account and Flexible Saving Accounts?
Understanding Claims Adjudication
What are ICD-9 or ICD-10 Codes?
What is HIPAA regulation around privacy of patient information?
Myths About HIPAA, Patients and Medical Records Privacy
Explanation of Benefits - Understanding Your EOB  : EOB goes to the member
What is ERA? : ERA goes to the provider during electronic payments. EDI 835
What is Individual Mandate?
What is MLR? (Medical Loss Ratio). Another link here.
What are Consumer Directed Health Plans (CDHP)?
What is the National Practitioner Data Bank (NPDB)?
What is SBC? (Summary of Benefits and Coverage)
What is provider credentialing?
What are tiered network health plans?
Who runs Medicare and Medicaid? Centers for Medicare & Medicaid Services (CMS)
What is OIC (Office of the Insurance Commissioner)?
What is medical necessity?
What does NCQA (National Committee for Quality Assurance ) do?
What Is the Coordination of Benefits?

Thursday, September 19, 2013

Transactions in MongoDB

In my previous blog post, we had gone through some of the advantages of MongoDB in terms of schema flexibility and performance. The Metlife case study showed how we can quickly create a customer hub using MongoDB that supports flexible dynamic schemas.

Another added advantage of using MongoDB is that you don't have to worry about ORM tools, as there is no object-relational impedance mismatch. Also, you don't have to worry about creating an application cache, as MongoDB by default uses all available memory for its working set.

But what about transactions? Any OLTP application would need full support for ACID to ensure reliability and consistency of data. The following articles shed good light on the transaction support in MongoDB.

MongoDB only supports "Atomicity" at the document level. It's important to remember that we can have nested documents and MongoDB would support atomicity across the nested documents. But if we need multi-object transaction support, then MongoDB is not a good fit.

If your application needs to "join" objects frequently, then MongoDB is also not suitable in that respect. For e.g. loading reference data (static data) from master tables along with the transaction data.  
MongoDB locks (readers-writer lock) are at the database level; i.e. the entire database gets locked during a write operation. This can result in lock contention when you have a large number of write operations.

Looking at the pros and cons of MongoDB, IMHO it is best suited for read-heavy applications. For e.g. a consolidated high-performance read-only customer hub, a data store for content management systems, product catalogs in e-commerce systems, etc.

Log4javascript - Logging on the client side

Being an ardent fan of the Log4J and Log4Net frameworks, I was pleased to see a JavaScript port of the framework for client-side logging - log4javascript

With the plethora of pure JS web frameworks in the market today, it's very important to have a solid, reliable logging framework on the client side (browser) that would enable us to debug problems easily. The default way to display log messages is in a separate popup window, featuring advanced search and filtering functionality, as well as a command line. You can also log to an in-page log4javascript console, to Firebug, to the browser's built-in error console or back to the server via Ajax POST.

I liked the OOTB Ajax appender that can be used to asynchronously post messages back to the server. We would need to write the server-side code for the service; this is not included by default in the library.

A few organizations are using MongoDB to store log files. Having both client-side and server-side logs stored in a document database can be very useful for audit and debugging purposes. 

Thursday, September 05, 2013

Ruminating on multicore CPUs and hyperthreading

My laptop has an i3 processor and I knew that i3 processors have 2 physical cores. But when I open Task Manager, I can see 4 CPUs. Even opening Device Manager shows 4 CPUs. So I was a bit confused about this.

A bit of research on the internet showed that the i3 processors have Hyper-Threading (HT) enabled by default. What that means is that the OS sees each 'physical' core as two 'logical cores'. This enables the operating system to schedule tasks to both the logical CPUs simultaneously.

The performance increases because whenever there is a cache miss or the CPU enters a wait state due to a dependency, the other waiting thread can be put on the CPU core immediately. This ensures optimal utilization of our CPU core resources.

So how is this different from traditional multithreading? In multithreading, the OS does the time-division multiplexing between multiple threads, whereas in HT the OS sees the core as two logical CPUs. Also, in HT the threads in question are hardware-level threads; the OS maps its OS-level threads onto these hardware threads.
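You can observe this from the JVM as well - availableProcessors() reports logical processors, so on an HT-enabled dual-core i3 it typically returns 4 (2 physical cores x 2 hardware threads):

```java
// Reports the number of logical processors the JVM sees. On an HT-enabled
// dual-core i3 this is typically 4; the exact number depends on the machine.
public class CoreCount {
    public static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors: " + logicalProcessors());
    }
}
```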

Wednesday, September 04, 2013

NoSQL for Customer Hub MDM

The following article on informationweek is an interesting read on the use of MongoDB NoSQL for building a customer MDM solution.

MongoDB, being a document oriented NoSQL database, has its core strength in maintaining flexible schemas and storing data as JSON or BSON objects. Let's look at the pros and cons of using MongoDB as an MDM solution.
  1. One of the fundamental challenges faced in creating a customer hub is the aggregation of disparate data from a variety of different sources. For e.g. a customer could have bought a number of products from an insurance firm. Using a traditional RDBMS would entail the complexities of joining the table records and fulfilling all the referential constraints of the data. Also, each insurance product may have different fields and dimensions. Should we create a table for each product type? In MongoDB, you can store all the policies of the customer in one JSON object. You can store different types of policy for each customer with full flexibility and maintain a natural hierarchy (parent-child) of relationships. 

  2. Another problem that insurance firms face is that of legacy policy records. Certain insurance products such as annuities have a long life period, but a lot of regulations and business needs change over the years, and your old policy records may not have all the fields that are captured in new policy records. How do you handle such cases? Having a strict schema would not help, and hence a solution like MongoDB offers the necessary flexibility to store sparse data. 

  3. MongoDB also has an edge in terms of low TCO for scalability and performance. Its auto-sharding capabilities enable massive horizontal scalability. It also uses memory-mapped files OOTB, which is of tremendous help given the prominence of 64-bit computing and tons of available RAM. 
On the negative side, I am a bit concerned about the integrity of data in the above solution. Since there is no referential integrity, are we 100% confident of the accuracy of the data? We would still need to use data profiling, data cleansing and data matching tools to find unique customers and remove duplicates. 
Metlife is using this customer hub only for agents and has not exposed this data to the customers, as there are concerns about data integrity and accuracy. But what if we need to enable customers to self-service all their policies from a single window on the organization's portal? We cannot show invalid data to the customer. 
Also, from a skills perspective, MongoDB needs specialized resources. It's easy to use and develop with, but for performance tuning and monitoring you need niche skills. 

Gender Diversity in the Architecture Field

Being passionate about gender diversity, I have always been concerned about the under-representation of women in the software architecture field. Over the years, I have endeavored to motivate and inspire my female colleagues to take up leadership roles in the technology stream; but in vain.

I have often introspected on the reasons why women don’t take up, or don’t make it to, senior leadership roles in the enterprise architecture domain. Popular opinions range between the polarized extremes of “lack of interest” and “lack of competence”, or both. I strongly beg to differ with the false assumption that women lack the logical skills to make good architects. In my career, I have seen brilliant women intellectuals with very strong programming and design skills. Women also tend to have better “EQ” (Emotional Intelligence) than men in general, and this tremendously helps in areas such as decision-making, stakeholder communication and collaboration, conflict management, etc. So the “lack of competence” excuse is only for lame male chauvinists.

 I have mixed opinions on the “lack of interest” argument. Today we have compelling scientific evidence that proves that there are fundamental differences in the way the brains of men and women are hardwired. If you are not convinced of this, please peruse the books of John Gray. Many of his books were an eye-opener for me :) Considering these gender differences in the way our brains are structured, can we make a generalized statement that most women are not passionate enough about cutting edge technology or software architecture? For e.g. when you get a new Blu-ray player, media server or any electronic gadget at home, who is the one to fiddle with it till all the functions are known - the husband or wife? The son or daughter? Who watches family soap operas and who watches hi-tech action movies? Are men in general more interested in technology than women? Or is it because of lack of opportunities and gender bias? I don't have a clear answer, but I know for sure that mother nature has hardwired our brains differently. Family responsibilities and raising children are another challenge that must be forcing many women to make a choice on what's most important to them.

Maybe it’s time to change our preconceived notions about leadership and not equate it with aggressiveness and other ‘alpha-male’ characteristics? Lack of role models also proves to be detrimental in motivating women to pursue a technical career path in the architecture field. But this is a “chicken-n-egg” problem and an initial momentum is required to correct this.

Today’s world needs software architects with versatile skills and not just hard-core technical skills. We need architects who are better at brainstorming and collaboration, who can build on the ideas of others rather than aggressively push one’s own idea. In the Agile world, collaboration and communication is a key skill and women have a natural advantage in these areas.

What should be done to encourage more women to take up careers in software architecture and design field? What proactive steps can be taken to bridge this diversity gap? Your thoughts are welcome.

Friday, August 23, 2013

Agile Survey

VersionOne has released the results of the survey it conducted on Agile practices in the industry. The report is available at:

Quite an interesting read and worth a perusal. 

Tuesday, August 06, 2013

Ruminating on Single Page Applications

In the past, I had blogged about the categorization of UI frameworks and the design philosophy behind them.
Of late, the open source market has been flooded with a plethora of SPA (Single Page Application) JavaScript frameworks. Most of these frameworks implement the MVC, MVP or MVVM patterns on the client side and integrate with back-end services using REST & JSON.

We are evaluating the following frameworks that have gained popularity over the past few months. IMHO, Gmail is still the best SPA application out there :) (based on knockout.js) (Google's baby for SPA) (based on backboneJS)

A good comparison of the various frameworks is given here. IMHO, so many frameworks cause a lot of confusion and developers spend a lot of time comparing the features and choosing the best fit.

In SPA, all of the HTML, JS and CSS is downloaded in the first request. All subsequent requests are AJAX requests that only retrieve data from services and update the UI. The browser loads the JS files, requests the data from services and then generates the HTML DOM dynamically.
The obvious advantages of SPA are the high performance of the web application and a seamless look-n-feel, like that of a mobile app. For SPA, there are challenges around SEO and browser history (back button) that need to be addressed within the app. 

Amazon S3 for websites

I was always under the impression that the Amazon AWS S3 service could only be used for storing media content on the cloud - for e.g. images, JS, CSS, video files, etc.

But what surprised me is that we can host an entire static web site on an Amazon S3 bucket. No need to use a web server on an EC2 instance. Caveat Emptor: S3 does not support dynamic web sites.

The following links are handy to understand how to configure a S3 bucket for static web hosting. You can also pick up a good domain name for your static site using the domain name services provided by Amazon’s “Route 53″ DNS service. Amazon also offers a Content Delivery Network service (CloudFront CDN) that replicates content across Amazon’s network of global edge locations. So the S3 bucket would serve as the origin server and the CDN would provide the edge servers.
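For reference, a typical bucket policy for public-read static hosting looks like the sketch below (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadForWebsite",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```

Along with this policy, you enable website hosting on the bucket and specify the index and error documents; S3 then serves the objects over plain HTTP like any static web server.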

Tuesday, July 23, 2013

Financial Advisors and Social Media

A lot of financial advisors are using social media to interact with their customers and engage them. But as pointed out on this link, there are certain SEC regulatory constraints that financial advisors should be wary of while using social media.

First, all disclosures (earnings, operations, etc.) should be made available to all parties at the same time. We cannot share information only on social media and not through other channels.
The second challenge is around handling negative comments on social media sites. How to officially respond? Can anyone put comments on the Facebook page?
The third important facet is that of customer privacy. If an advisor responds to a customer's question on Twitter or Facebook and inadvertently discloses some kind of financial information about the customer, then it could be considered a violation of privacy by the SEC.

Also, from a regulatory perspective, it could be required to store all communication and retain it for the future. There are a number of players in the market that provide services for archiving all social media interactions. E.g.

Sunday, July 14, 2013

How to get geographical location from IP address?

There are a lot of free services available that would enable you to roughly get the geographical location of your ISP from your IP address. To get the exact street address, you would need to contact the ISP and get further details.

There is a popular IP address mapping software used by enterprises called Quova, that has been renamed to Neustar IP Intelligence now. More information can be found at this link.

IP geo-location software can be used to detect fraud and target advertising. Almost all online e-commerce stores use such services or products. 
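Under the hood, most IP geo-location products work off a database that maps ranges of integer-encoded IP addresses to locations. A minimal Python sketch of such a range lookup; the ranges and locations below are made up purely for illustration:

```python
import bisect
import ipaddress

# Toy "GeoIP database": (range_start, range_end, location), with IPs encoded
# as integers. Real products ship millions of such ranges.
GEO_DB = sorted([
    (int(ipaddress.ip_address("1.0.0.0")), int(ipaddress.ip_address("1.0.0.255")), "Brisbane, AU"),
    (int(ipaddress.ip_address("8.8.8.0")), int(ipaddress.ip_address("8.8.8.255")), "Mountain View, US"),
    (int(ipaddress.ip_address("81.2.69.0")), int(ipaddress.ip_address("81.2.69.255")), "London, GB"),
])

STARTS = [start for start, _, _ in GEO_DB]

def locate(ip: str) -> str:
    """Binary-search the sorted ranges for the one containing this IP."""
    n = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and GEO_DB[i][0] <= n <= GEO_DB[i][1]:
        return GEO_DB[i][2]
    return "unknown"
```

The lookup is O(log n) over the sorted range starts, which is why these databases can answer queries quickly even with millions of entries.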

Street address normalization for MDM solutions

Quite often we need to verify street addresses or normalize them to check for duplicate customers. The following article gives a good overview of the various techniques we can use for this problem context.

There are also a lot of commercial and open source software for address verification and geo-coding. For e.g.
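To illustrate the simplest of these techniques, here is a rough Python sketch of token-based address normalization. The abbreviation dictionary is a toy one; real products (e.g. USPS CASS-certified tools) use rich postal reference data:

```python
import re

# Toy abbreviation dictionary; commercial tools use far richer ones.
ABBREVIATIONS = {
    "ST": "STREET", "AVE": "AVENUE", "RD": "ROAD",
    "APT": "APARTMENT", "N": "NORTH", "S": "SOUTH",
}

def normalize_address(address: str) -> str:
    """Uppercase, strip punctuation, collapse whitespace, expand abbreviations."""
    tokens = re.sub(r"[.,#]", " ", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

# Two differently formatted addresses normalize to the same key,
# so the records can be flagged as potential duplicate customers.
a = normalize_address("12 N. Main St., Apt 4")
b = normalize_address("12 North Main Street Apartment 4")
```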

Saturday, July 13, 2013

Ruminating on "Headless" concepts

Today afternoon, within a span of 2 hours, I heard the term 'headless' being prefixed to multiple words, e.g. headless system, headless app, headless service and finally headless UI testing !

A headless system is essentially a system with no monitor and IO components (mouse, keyboard, etc.).
A headless app is an application that does not have a UI to interact with - analogous to background daemon threads, etc. A headless service only has backend logic and no frontend UI.

So essentially the term headless is used to describe the concept of not having any user interface. So what does headless UI testing mean?
Check out PhantomJS - a headless, scriptable WebKit browser that makes this possible. A good article summarizing this concept is available here

Thursday, July 11, 2013

Importance of Geocoding for business

Geocoding is the process of finding out the geographical coordinates (latitude/longitude) from street address, post code, etc. A lot of organizations are interested in geocoding their customer addresses, because it enables them to serve the customer better. For e.g.
  • A healthcare provider can use the geocoding information of its customers, to help them locate the nearest physician or pharmacy. 
  • An insurance firm can use geocoding information to find out the actual physical location of an insured property and determine the underwriting risk for floods, earthquakes, etc.
  • E-commerce sites usually have a find-a-nearby-store option that enables customers to find out the nearest store to pick up their goods from, based on their GPS coordinates. 
Thus geocoding can help a business in answering many questions that would help it drive growth. For e.g.
  • What geographical area do most of our customers come from?
  • Are there geographical areas where we have not penetrated? If yes, why?
  • Is our sales force aligned with our customer territories? 
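Once addresses are geocoded, a question like find-a-nearby-store reduces to a distance computation over coordinates. A sketch in Python using the haversine (great-circle distance) formula; the store names and locations are hypothetical:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

def nearest_store(customer, stores):
    """customer: (lat, lon); stores: dict of name -> (lat, lon)."""
    return min(stores, key=lambda s: haversine_km(*customer, *stores[s]))

# Hypothetical geocoded store locations.
stores = {"Midtown": (40.754, -73.984), "Brooklyn": (40.678, -73.944)}
```

For a real e-commerce site the distance scan would run over a spatial index rather than a plain loop, but the core calculation is the same.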

Householding and Hierarchy Management

Found this rather interesting article on householding concepts in MDM. Many organizations struggle to define business rules to identify customers belonging to a same home or household.

Another interesting read is this blog post on how "" is using hierarchy management to link VIPs/actors together. The graphs are interactive and worth a perusal if you are a Hollywood buff :)

Wednesday, July 10, 2013

Ruminating on Digital Transformation Initiatives

A lot of organizations are embarking on multi-year, multi-million dollar value, digital transformation initiatives.
Many large organizations grew by M&A and this has resulted in multiple brands and disparate web properties with inconsistent user experience. Consolidation of the various web domains/properties and brands is the primary business driver behind digital initiatives. The philosophy of having "One Face to the Customer" is at play here.

For e.g. an insurance firm may have different LOBs across life, auto & property, and retirement. In large organizations, each LOB operates as a separate entity and has its own IT team. In such cases, creating a shared services team for digital initiatives makes sense.

Each LOB could have its own web presence and different corporate branding. Many times, end customers do not realize that the different products/policies they have bought belong to the same insurance firm. The customers are always routed to different websites and have separate logins for each site.

From the insurance firm's perspective, there is no 360-degree view of the customer. This severely limits their ability to service customers and aggressively cross-sell or up-sell to them. To resolve this business challenge, organizations should embrace a paradigm shift in their thought process - from being product/policy centric to being customer centric. From an IT perspective, this could entail creating a Customer Hub (MDM) and having a consolidated customer self-service portal that would serve as a single window for servicing all of the customer's policies or products.

Having a customer MDM solution would also enable organizations to run better analytics around demographic information and past customer behaviour. This in turn, would help in delivering a more personalized user experience and fine grained marketing.

Another important driver for digital transformations is the need to support multi-channel delivery. Today, content delivery to end users on mobiles and tablets is considered table stakes. Defining and executing an effective mobile strategy is of paramount importance in any digital initiative.

Organizations are also actively looking at leveraging Social Media and Gamification techniques to engage better with customers. It's also important to choose a powerful Content Management tool that would enable faster go-to-market for digital content changes; controlled and owned by business, rather than IT. 

Thursday, June 20, 2013

WS-Security Username Token Implementation using WCF

The following article on the Microsoft site is an excellent tutorial for beginners looking to use open standards such as WS-Security to secure their WCF services. Perusal highly recommended.

WS-Security with Windows Communication Foundation

Tuesday, June 18, 2013

Contracts in REST based services

Traditionally, REST based services did not have formal contracts between the service consumer and service provider. There used to be an out-of-band agreement between them on the content of the messages being passed.

Also, the service provider (e.g. Amazon) would publish API libraries and sample code across popular languages such as Java, C#/.NET, etc. Most developers would easily understand how to use the service by looking at the examples.

Sometime back, there was a debate on InfoQ on the topic of having standards for describing contracts for REST based services. There were interesting differences of opinion on this.

There was a standard defined called WADL that was the equivalent of WSDL for REST based services. Apache CXF supports WADL, but I have not seen many enterprises embracing this. Also WADL supports only XML payloads. What about JSON payloads?

I like the DataContract abstraction in .NET WCF. Using WCF configuration, we can specify whether the message binding in a REST service should happen as XML or JSON.
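As a rough illustration of the DataContract idea outside of WCF, here is a Python sketch that uses a dataclass as an explicit contract for a JSON payload. The Customer shape and its fields are hypothetical, and real frameworks offer far richer validation:

```python
import json
from dataclasses import dataclass, asdict, fields

@dataclass
class Customer:
    """An explicit contract for the payload, loosely analogous to a WCF DataContract."""
    id: int
    name: str
    email: str

def from_json(cls, payload: str):
    """Reject payloads whose keys do not match the declared contract."""
    data = json.loads(payload)
    expected = {f.name for f in fields(cls)}
    if set(data) != expected:
        raise ValueError(f"payload keys {set(data)} != contract {expected}")
    return cls(**data)

# Serialize an instance to JSON and parse it back through the contract.
doc = json.dumps(asdict(Customer(1, "Jane", "jane@example.com")))
```

The point is simply that the consumer and provider share one declared shape, instead of an out-of-band verbal agreement.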

Monday, June 17, 2013

Ruminating on Claims based Identity

Most folks still stick with RBAC (Role Based Access Control) mechanisms for enabling security in their applications. A Claims based Identity solution is more comprehensive than RBAC and offers much more flexibility in implementing security.

In RBAC, typically the onus of authenticating users and checking permissions lies on the application itself. In Claims based solutions, the security constraints of the application are decoupled from the application business logic. The application receives a security token from an STS (Security Token Service) it trusts, and thus does not have to worry about authenticating the user or extracting security related info regarding the user. All the required information is available in the STS security token as a set of claims.

Thus a Claims based Identity solution decouples the application from the complexities of authentication and authorization, and isolates it from any changes to the security policies that need to be applied.
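A toy sketch of the trust relationship in Python: the "STS" signs a set of claims with a secret it shares with the application, and the application only verifies the signature rather than re-authenticating the user. Real solutions use SAML or JWT tokens and typically asymmetric signatures; the claim names and secret below are made up:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"shared-secret-between-sts-and-app"  # illustration only

def issue_token(claims: dict) -> str:
    """The 'STS' serializes the claims and signs them."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def read_claims(token: str) -> dict:
    """The application only verifies the signature; it never re-authenticates."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("token not issued by a trusted STS")
    return json.loads(base64.urlsafe_b64decode(body))

token = issue_token({"sub": "jdoe", "role": "underwriter", "region": "EMEA"})
```

The application code downstream simply reads claims such as "role" from the verified token, instead of querying its own user store.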

The following articles are of great help to any newbie in understanding the fundamentals of Claim based Identity solutions.

A Guide to Claims Based Identity - An excellent guide to understand some fundamental concepts around tokens, claims and STS.

Microsoft Windows Identity Foundation (WIF) Whitepaper for Developers - A very good article around WIF basics and also includes sample code to extend IPrinciple objects and intercept security token processing.

Claims Based Architectures - One of the best online articles that explains how Web SSO and thick client SSO can be implemented using Claims. 

Tuesday, June 11, 2013

Ruminating on Data Masking

A lot of organizations are interested in 'Data Masking' and are actively looking out for solutions around the same. IBM and Informatica Data Masking tools are leaders in Gartner's magic quadrant.

The need for masking data is very simple - How do we share enterprise data that is sensitive with the development teams, testing teams, training teams and even the offshore teams?
Besides masking data, there are other potential solutions for the above problem - i.e. using Test Data Creation tools and UI playback tools. But data masking and subsetting continue to remain popular means of scrambling data for non-production use.

Some of the key requirements for any Data Masking Solution are:
  1. Meaningful Masked Data: The masked data has to be meaningful and realistic. It should be capable of satisfying all the business rules, e.g. for post codes, credit card numbers, SSNs, bank account numbers, etc. For instance, if we change the DOB, should we also change the 'Age'? 
  2. Referential Integrity: If we are scrambling primary keys, then we need to ensure that the relationships are maintained. One technique is to make sure that the same scramble functions are applied to all of the related columns. If we are masking data across databases, then we also need to ensure integrity across those databases.
  3. Irreversible Masking: The masked data should be irreversible; it should be impossible to recreate the sensitive data. 
A good architecture strategy for building a data-masking solution is to design a Policy driven Data Masking Rule Engine. The business users can then define policies for masking different data-sets.
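As a sketch of the referential-integrity and irreversibility requirements: a deterministic masking function built on a one-way hash maps the same input to the same masked value every time, so joins across tables survive masking while the original value cannot be recovered. The salt and the 'CUST-' format below are purely illustrative:

```python
import hashlib

# Keep the salt secret; without it the mapping cannot be replayed by an attacker.
SALT = "project-specific-salt"

def mask_key(value: str) -> str:
    """Deterministic, irreversible masking: the same input always yields the
    same masked value, so foreign keys stay consistent across tables."""
    digest = hashlib.sha256((SALT + value).encode()).hexdigest()
    return "CUST-" + digest[:10].upper()

# The same customer id masks identically in both tables, preserving joins.
orders_fk = mask_key("C1009")
customer_pk = mask_key("C1009")
```

A policy-driven rule engine, as suggested above, would essentially let business users pick which columns get which such masking functions.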

A lot of data masking tool vendors are now venturing beyond static data masking. Dynamic Data Masking is a new concept that masks data in real time. Also there is a growing demand for masking data in unstructured content such as PDF, Word or Excel files.

Wednesday, June 05, 2013

Data Privacy Regulations

As architects, we often have to design solutions within the constraints of data privacy regulations such as HIPAA, PCI, UK Data Protection Act, SOX, etc.

The exact data privacy requirements differ from one regulatory act to the other. But there are some common themes or patterns, as defined below, that would help us to structure our thoughts, when we think of protecting sensitive data using technology solutions.
  • Data at Rest: Protecting all data at rest in databases, files, etc. Most databases offer TDE features. Data in flat files needs to be encrypted, either by using file encryption or disk volume encryption. Another important aspect is data on portable mobile/tablet devices. Data on portable media such as USBs, CDs and DVDs also needs to be considered. 
  • Data in Motion: Use secure protocols such as Secure-FTP, HTTPS, VPN, etc. Never use public FTP servers. All remote access to IT systems should be secure and encrypted.
  • Data in Use: Data in OLTP databases that is created and updated regularly. E.g. Online data entry using portals, data entry in excel sheets, data in generated reports, etc. 
  • Data that is archived: Data could be archived either in an online archive or an offline archive. We need to protect this data as per the privacy requirements. Here is the link to an interesting HIPAA violation incident. 
Besides Data Security, most of these Regulatory Acts also cover rules around physical security, network security, etc. 

Tuesday, May 21, 2013

Keeping pace with technology innovations in the Travel industry

I have been following the following 2 sites for the last few months to keep myself updated on the interesting trends happening in the Travel Industry - especially around technology innovations.

Found another interesting strategy presentation on the Thomas Cook website that is worth a perusal. The group seems to be on track to deliver results based on a sound business strategy. Found their strategy of exclusive 'Concept Hotels' quite intriguing. 

Friday, April 19, 2013

Portals vs Web Apps

I have often debated the real value of Portal servers (and the JSR 286 portlet specification). IMHO, portal development should be as simple and clean as possible, and I have personally always found designing and developing portlets to be comparatively complex.

Kai Wähner has a good article on Dzone that challenges the so-called advantages of portal servers. Jotting down some of the excerpts from the article and also sharing my thoughts.
Let's start by dissecting the advantages of portals one-by-one.

  • SSO:  With so many proven solutions and open standards for SSO, I think there is little value in utilizing the SSO capabilities of a portal server.
  • Aggregation of multiple applications on a single page: This can easily be achieved using IFrames or any other mashup technology. For e.g. in SharePoint, we have a page-viewer web part that renders any remote web page as an IFrame.
  • Uniform appearance: You just need a good CSS3 developer to create some good style-sheets. Also, all web application frameworks have the concept of master pages and page templates.
  • Personalization: Depending on the complexity of personalization, we can achieve it using role based APIs or some custom development. 
  • Drag and Drop Panels: Again easily done using JQuery UI widgets (pure-javascript). Just check out the cool
  •  Unified Dashboard: Again can be done using IFrames or JS components from Ext-JS or JQuery

Hence I feel we really need to think hard and ask the right questions before we blindly jump on the portal bandwagon and spend millions of dollars on commercial portal servers.
This link also lists down some questions that are handy during the decision making process.

Marketing folks of portal servers often tout the personalization features of Portal servers. I would like to remind them that the most personalized website in the world - "Facebook" - runs on PHP :)

Tuesday, April 16, 2013

Web Application Performance Optimization Tips

The following link on Yahoo Developer network on Web App Performance is timeless ! I remember having used these techniques around 8 yrs ago and all of them are still valid. A must read for any web application architect.

Another cool utility provided by Yahoo for optimizing image file size on a web page is "".
Just upload any image to this site and it would optimize the image size and allow the new image file to be downloaded for your use. 

Tuesday, April 02, 2013

Ruminating on Availability and Reliability

High availability is a function of both hardware + software combined. In order to design a highly available infrastructure, we have to ensure that all the components are made highly available and not just the database or app servers. This includes the network switches, SSO servers, power supply, etc.

The availability of each component is calculated and then we typically multiply the availabilities of all components together to get the overall availability, usually expressed as a percentage.

Common patterns for high availability are: Clustering & load-balancing, data replication (near real time), warm standby servers, effective DR strategy, etc. From an application architecture perspective availability would depend on effective caching, memory management, hardened security mechanisms, etc.

Application downtime occurs not just because of hardware failures, but could be due to lack of adequate testing (including unit testing, integration testing, performance testing, etc.) It's also very important to have proper monitoring mechanisms in place to proactively detect failures, performance issues, etc.

So how is availability typically measured? It is expressed as a percentage; for e.g. 99.9% availability.
To calculate the availability of a component, we need to understand the following 2 concepts:

Mean Time Between Failure (MTBF): It is defined as the average length of time the application runs before failing. Formula: Total Hours Ran / No. of failures (count)

Mean Time To Recovery (MTTR): It is defined as the average length of time needed to repair and restore service after a failure. Formula: Hours spent on repair / Failure Count

Formula: Availability = (MTBF / (MTBF + MTTR)) X 100

Using the above formula, we get the following percentages:

3 nines (99.9% availability) represents about 8.76 hours of service outage in a single year. 
4 nines (99.99% availability) comes to roughly 53 minutes of outage in a year. 
5 nines (99.999% availability) represents only about 5 minutes of outage per year.
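These formulas are easy to sanity-check in code. A quick Python sketch (the MTBF/MTTR figures are made up for illustration):

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), returned as a fraction."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_hours_per_year(avail: float) -> float:
    """Expected outage per year for a given availability fraction."""
    return (1 - avail) * 365 * 24

# A component failing on average every 999 hours and taking 1 hour to
# restore gives exactly "3 nines".
a = availability(999, 1)

# For components in series, availabilities multiply: two 3-nines components
# chained together deliver less than 3 nines end to end.
composite = a * a
```

This also makes the earlier point concrete: the weakest link (network switch, SSO server, power supply) drags the whole chain's availability down.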

Monday, March 25, 2013

Ruminating on Server Side Push Techniques

The ability to push data from the server to the web browser has always been a pipe-dream for many web architects. Jotting down the various techniques that I used in the past and the new technologies on the horizon that would enable server side push.
  • Long Polling (Comet): For the last few years, this technique has been the most popular and is used behind the scenes by multiple Ajax frameworks, such as Dojo, the WebSphere Ajax toolkit, etc. The fundamental concept behind this technique is for the server to hold on to the request and not respond till there is some data. Once the data is ready, the server pushes the data to the browser as the HTTP response. After getting the response, the client again makes a new poll request and waits for the response. Hence the term - "long polling". 
  • Persistent Connections / Incomplete Response: Another technique in which the server never ends the response stream, but always keeps it open. There is a special MIME type called multipart/x-mixed-replace, that is supported by most browsers except IE :) This MIME type enables the server to keep the response stream open and send data in deltas to the browser.
  • HTML 5 WebSockets: The new HTML 5 specification brings to us the power of WebSockets that enable full-duplex bidirectional data flow between browsers and servers. Very soon, we would have all browsers/servers supporting this. 
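The hold-the-request-until-data-arrives idea behind long polling can be simulated in-process. A conceptual Python sketch only: in a real implementation the server holds an actual HTTP response open, whereas here a blocking queue stands in for the held request:

```python
import queue
import threading
import time

# The shared channel plays the role of the held HTTP connection.
channel = queue.Queue()

def server_publishes_later():
    """The 'server' has no data yet; it responds only once data is ready."""
    time.sleep(0.2)
    channel.put("price-update: ACME 42.10")

def long_poll(timeout=5):
    """The 'client' request blocks until the server has data (or times out),
    exactly like a held response in Comet-style long polling."""
    return channel.get(timeout=timeout)

threading.Thread(target=server_publishes_later).start()
message = long_poll()  # returns as soon as the data is pushed
```

After receiving the message, a real client would immediately issue the next poll request, giving the illusion of a continuous server push.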

Monday, March 18, 2013

What is the actual service running behind svchost.exe

I knew that a lot of Windows Services (available as DLLs) run under the host process svchost.exe during start-up. But is there any way to find out which service is actually running behind each svchost.exe instance? Sometimes, an svchost.exe process occupies a lot of CPU/memory resources and we need to know the actual service behind it.

The answer is very simple on Windows 7. Press "Ctrl+Shift+Esc" to open the Task Manager.
Click on "Show processes from all users". Just right click on any svchost.exe process and in the context menu, select "Go To Service". You would be redirected to the Services Tab, wherein the appropriate service would be highlighted.

Another nifty way is to use the following command on the cmd prompt:
tasklist /svc /fi "imagename eq svchost.exe"

Tuesday, March 12, 2013

Behind the scenes..using OAuth

Found the following cool article on the web that explains how OAuth works behind the scenes..

OAuth 2.0 is essentially an authorization framework that enables a third-party application to obtain limited access to any HTTP service (web application or web service). It is a protocol specification (a token-passing mechanism) that allows users to control which applications have access to their data without revealing their passwords or other credentials. Thus it can also be used for delegated authentication, as mentioned here.

OAuth is also very useful when you are exposing APIs that third party applications may use. For e.g. all Google APIs can now be accessed using OAuth 2.0 protocol specification. In fact, for web-sites and mobile apps running on Android/iOS, Google has released a solution called as Google+ Sign-In for delegating authentication to Google. More information is available here:

The basic first step for any application using OAuth is to register/create a Client ID (client key) on the OAuth Authorization Server (e.g. Google, Facebook) along with a secret. (This is the crux of the solution, which I had missed in my earlier understanding :) Since the application is registered with the Service Provider, it can now make requests for access to services.) The application then creates a request token that the user authorizes, and finally exchanges it for access tokens that are used to access the services.
To understand these concepts, Google has also made a cool web app called OAuth PlayGround, where developers can play around with OAuth requests.
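The first leg of the flow, redirecting the user to the provider's authorization endpoint, is essentially careful URL construction. A hedged Python sketch: the endpoint, client id and scope names below are hypothetical, and a real app must also handle the redirect back and the token exchange:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical authorization endpoint; real providers publish their own.
AUTH_ENDPOINT = "https://provider.example.com/oauth2/auth"

def build_authorization_url(client_id, redirect_uri, scope, state):
    """Step 1 of the authorization-code grant: send the user's browser here.
    The user consents at the provider; the app never sees the password."""
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,           # CSRF protection, echoed back by the provider
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)

url = build_authorization_url("my-app-id", "https://myapp.example.com/cb",
                              "profile.read", "xyz123")
```

The provider redirects back to redirect_uri with a short-lived code, which the app then exchanges (along with its secret) for the access token.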

A good illustration for OAuth is provided on the Magento website here

Thursday, March 07, 2013

Jigsaw puzzles in PowerPoint and Visio

Found this cool tutorial on the web that can be used to make jigsaw puzzles in PowerPoint or Visio. One of my friends actually used this technique to create a good visualization on technology building blocks.

Monday, March 04, 2013

Long file names on Windows

Just spent the last 30 mins in total frustration over the way Windows 7 handles long file names. I was essentially trying to copy the "LifeRay Social Office" portal folder structure from one location to the other.

On my Windows 7 desktop, the copy command from Explorer just won't work ! No error message, no warning, just that the window disappears. I did a remote desktop to the server and tried to copy from there. On the Windows 2000 server box, I at least got an error message - "Cannot copy file". But that's it, no information on why the copy did not work.

I debugged further and tried to copy each individual file, and only then did I get a meaningful error message - "The file Name(s) would be too long for the destination folder." So essentially the total path (string-length) of the files was too long for Windows to handle.

A quick Google search showed that this is a core Windows problem. Windows Explorer (File Explorer in Windows 8 and Windows Server 2012) uses ANSI API calls, which are limited to 260 characters in the path. There are some hot fixes available as a patch on Windows, but I did not try them yet.

So what are the options then? MS has released a tool called RoboCopy that can handle this problem. Another popular tool is LongPathTool. In my case, fortunately I had the JDK installed on my box. I used the jar command to deflate/inflate the folder structure on either side of the copy and it worked like a charm :) Strangely, WinZip on Windows 7 did not work, as it threw some weird error about long file names.

There is another headache due to long file names: you also cannot delete such directories from Windows Explorer ! I tried using the rmdir command from the command prompt and thankfully that worked !!!
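For completeness: the Unicode variants of the Win32 file APIs accept paths longer than the 260-character MAX_PATH limit if the path carries the \\?\ extended-length prefix, and scripts can exploit this when copying deep folder trees. A small Python helper sketching the prefix rule (note that UNC share paths use the \\?\UNC\ form instead):

```python
def to_extended_length(path: str) -> str:
    r"""Prefix an absolute Windows path with \\?\ so that Unicode Win32 APIs
    bypass the ~260-character MAX_PATH limit. UNC paths (\\server\share\...)
    must use the \\?\UNC\server\share\... form."""
    if path.startswith("\\\\?\\"):
        return path  # already in extended-length form
    if path.startswith("\\\\"):  # UNC share
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path
```

Passing such prefixed paths to file operations on Windows is one programmatic way around the copy and delete failures described above.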

Monday, February 11, 2013

Ruminating on Visualization Techniques

The following link contains a good illustration of the various kinds of visualization techniques one can use to communicate ideas or clarify the business value behind the data.

We are also experimenting with a new cool JS library called D3.js. Some pretty good visualization samples are available here.

This library can be used for basic charting and also can be used for impressive visualizations. We found this tutorial to be invaluable in understanding the basics of D3. 

Anscombe's Quartet

We often use statistical properties such as "average", "mean", "variance" and "std. deviation" during performance measurement of applications/services. Recently a friend of mine pointed out that relying only on calculated stats can be quite misleading. He pointed me to the Wikipedia article on Anscombe's quartet.

Anscombe's quartet comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed.
By just looking at the data sets, it's impossible to predict that the graphs would be so different. Only when we plot the data points on a graph can we see the way the data behaves. Another testimony to the power of data visualization !

Useful Command Line Tools for Windows 7

Jotting down some Win 7 commands that I often use for quick access to information. One can directly type these commands in the 'Run' window.

  • msconfig: Get to know the programs that are configured to start during boot. Disable those programs that you are not interested in.
  • msinfo32: Quick summary of your system information. Gives detailed info on hardware resources, OS details, system properties, etc.
  • control: Quick access to the Control Panel
  • eventvwr: Quick access to the Event Viewer
  • perfmon: Useful tool to monitor the performance of your system using performance counters.
  • resmon: Great tool to check out the resource utilization of CPU, Memory and Disk IO.
  • taskmgr: Quick access to Task Manager
  • cmd: Opens the command prompt
  • inetcpl.cpl : Opens the internet settings for proxy, security etc. 

Ruminating on Big Data

Came across an interesting infodeck on Big Data by Martin Fowler. There is a lot of hype around Big Data and there are tens of pundits defining Big Data in their own terms :) IMHO, right now we are at the "peak of inflated expectations" and "height of media infatuation" in the hype cycle.

But I agree with Martin on the fact that there is considerable fire behind the smoke. Once the hype dies down, folks would realize that we don't need another fancy term, but actually need to rethink about the basic principles of data-management.

There are 3 fundamental changes that would drive us to look beyond our current understanding around Data Management.
  1. Volume of Data: Today the volume of data is so huge, that traditional data management techniques of creating a centralized database system is no longer feasible. Grid based distributed databases are going to become more and more common.
  2. Speed at which Data is growing: Due to Web 2.0, explosion in electronic commerce, Social Media, etc. the rate at which data (mostly user generated content) is growing is unprecedented in the history of mankind.  According to Eric Schmidt (Google CEO), every two days now we create as much information as we did from the dawn of civilization up until  2003. Walmart is clocking 1 million transactions per hour and Facebook has 40 billion photos !!! This image would give you an idea on the amount of Big Data generated during the 2012 Olympics. 
  3. Different types of data: We no longer have the liberty to assume that all valuable data would be available to us in a structured format - well defined using some schema. There is going to be a huge volume of unstructured data that needs to be exploited. For e.g. emails, application logs, web click stream analysis, messaging events, etc. 
These 3 challenges of data are also popularly called the 3 Vs of Big Data (volume of data, velocity of data and variety of data). To tackle these challenges, Martin urges us to focus on the following 3 aspects:
  1. Extraction of Data: Data is going to come from a lot of structured and unstructured sources. We need new skills to harvest and collate data from multiple sources. The fundamental challenge would be to understand how valuable some data could be? How do we discover such sources of data?
  2. Interpretation of Data: Ability to separate the wheat from the chaff. What data is pure noise? How to differentiate between signal and noise? How to avoid probabilistic illusions?
  3. Visualization of Data: Usage of modern visualization techniques that would make the data more interactive and dynamic. Visualizations can be kept simple, with good usability in mind. 
As this blog entry puts it in words - "Data is the new oil ! Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value."

NoSQL databases are also gaining popularity. Application architects would need to consider polyglot persistence for datasets having different characteristics. For e.g. columnar data stores (aggregate oriented), graph databases, key-value stores, etc.