Monday, February 13, 2012

Ruminating on Data Virtualization

The industry is flooded with confusing terms when it comes to understanding 'Data Virtualization'. We have IaaS (Information as a Service), Data Services, EII (Enterprise Information Integration), Data Federation, and so on! The point is that there are no industry-standard definitions for these analyst-coined terms, and there is a lot of overlap between them.

Rick Lans tries to clear the air with some simple definitions here. Another interesting post by Barry Devlin sheds more light on the concept of data virtualization.

The core concept behind data virtualization is to create an abstraction layer (Data Access Layer) that hides the complexities of the underlying disparate data sources and provides applications with a unified view of the enterprise data. This can be implemented either with "SOA style" Data Services or by creating a virtual data layer that can be queried using SQL-like semantics. More info can be found at these links: Link1 & Link2
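To make the "virtual data layer" idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the two "disparate sources" are simulated as separate in-memory SQLite databases, and the names `crm`, `billing`, `customers`, `invoices` and `customer_totals` are hypothetical. A real data virtualization product would federate queries across heterogeneous systems instead.

```python
import sqlite3


def build_virtual_layer() -> sqlite3.Connection:
    """Return a connection that exposes one unified view over two sources."""
    conn = sqlite3.connect(":memory:")

    # Two separate databases stand in for two disparate enterprise systems.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")

    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )

    # The abstraction layer: a view that hides where each column comes from.
    # SQLite only lets TEMP views span attached databases, hence TEMP here.
    conn.execute(
        """
        CREATE TEMP VIEW customer_totals AS
        SELECT c.id AS customer_id, c.name, SUM(i.amount) AS total_billed
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return conn


if __name__ == "__main__":
    layer = build_virtual_layer()
    # The application queries the view and never touches the source tables.
    query = "SELECT name, total_billed FROM customer_totals ORDER BY name"
    for name, total in layer.execute(query):
        print(name, total)
```

The consuming application only ever sees `customer_totals`; the join across the two sources stays inside the layer, which is the whole point of the abstraction.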

Red Hat has a nice whitepaper explaining the concept of Data Services in a SOA environment. This post explains the benefits of data virtualization. Composite Software is a leader in data virtualization techniques and has shared a couple of interesting case studies that demonstrate the use of their data virtualization platform.

One thought that came to my mind was regarding the challenges in accessing NoSQL data from the data virtualization layer. While some types of NoSQL data, such as XML documents and key/value pairs, can be exposed as a relational SQL view, it may not be possible to have a uniform query interface for unstructured data. Most NoSQL data stores expose some kind of programmatic API (typically including a Java API) that can be used for querying. Would it be possible to create a common set of metadata for both structured and unstructured data?
In such scenarios, IMHO, the only strategy for data virtualization is to use Data Services.
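
As a rough sketch of what I mean by the Data Services approach, the Python example below puts a relational source and a document-style source behind one service call. Everything here is hypothetical and for illustration only: `RelationalSource`, `DocumentSource` and `CustomerDataService` are names I made up, and a plain dictionary stands in for a real NoSQL store, which would be reached through its own client API.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerSource(Protocol):
    """Anything that can return a partial customer record by id."""

    def fetch(self, customer_id: int) -> dict: ...


class RelationalSource:
    """Structured data, queried with SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def fetch(self, customer_id: int) -> dict:
        row = self._conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return {"name": row[0]} if row else {}


class DocumentSource:
    """Stand-in for a NoSQL document store (a dict keyed by string id)."""

    def __init__(self, documents: dict) -> None:
        self._documents = documents

    def fetch(self, customer_id: int) -> dict:
        doc = self._documents.get(str(customer_id))
        return {"notes": doc.get("notes", [])} if doc else {}


class CustomerDataService:
    """The data service: callers see one record, not the individual stores."""

    def __init__(self, sources: list) -> None:
        self._sources = list(sources)

    def get_customer(self, customer_id: int) -> Optional[dict]:
        merged: dict = {}
        for source in self._sources:
            merged.update(source.fetch(customer_id))
        return {"id": customer_id, **merged} if merged else None


def build_demo_service() -> CustomerDataService:
    """Wire up sample data for both sources (illustrative values only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
    documents = {"1": {"notes": ["prefers email"]}}
    return CustomerDataService([RelationalSource(conn), DocumentSource(documents)])


if __name__ == "__main__":
    print(build_demo_service().get_customer(1))
```

The service contract (`get_customer`) is the only thing shared between the structured and unstructured sides, so no common query language or common metadata model is needed; each source keeps its native access method.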