The industry is flooded with confusing terms when it comes to understanding 'Data Virtualization'. We have IaaS (Information as a Service), Data Services, EII (Enterprise Information Integration), Data Federation, and so on. The point is that there are no industry-standard definitions for these analyst-coined terms, and there is a lot of overlap between them.
Rick Lans tries to clear the cloud with some simple definitions here. Another interesting post by Barry Devlin throws more light on the concept of data virtualization.
The core concept behind data virtualization is to create an abstraction layer (Data Access Layer) that hides the complexities of the underlying disparate data sources and provides a unified view of the enterprise data to the applications. This can be implemented using "SOA style" Data Services or by creating a virtual data layer that can be queried using SQL-like semantics. More info can be found at these links: Link1 & Link2
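To make the "virtual data layer" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `VirtualDataLayer` class, the `customers` and `orders` tables, and the two sources (an in-memory SQLite database standing in for a relational system, and a CSV string standing in for a flat-file export). For simplicity the sketch copies rows into a scratch SQLite database on every query; a real data virtualization platform would typically push work down to the sources instead.

```python
import csv
import io
import sqlite3


def make_crm_source():
    """Hypothetical source 1: a relational database holding customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    conn.commit()
    return conn


# Hypothetical source 2: a flat-file export of orders.
ORDERS_CSV = "order_id,customer_id,amount\n100,1,250.0\n101,1,75.5\n102,2,40.0\n"


class VirtualDataLayer:
    """Presents both sources as plain SQL tables to the application."""

    def __init__(self, crm_conn, orders_csv):
        self._crm = crm_conn
        self._orders_csv = orders_csv

    def query(self, sql, params=()):
        # Build a fresh scratch database per query so that results
        # reflect the current state of the underlying sources.
        scratch = sqlite3.connect(":memory:")
        try:
            scratch.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
            scratch.executemany(
                "INSERT INTO customers VALUES (?, ?)",
                self._crm.execute("SELECT id, name FROM customers"),
            )
            scratch.execute(
                "CREATE TABLE orders "
                "(order_id INTEGER, customer_id INTEGER, amount REAL)"
            )
            reader = csv.DictReader(io.StringIO(self._orders_csv))
            scratch.executemany(
                "INSERT INTO orders VALUES (?, ?, ?)",
                [
                    (int(r["order_id"]), int(r["customer_id"]), float(r["amount"]))
                    for r in reader
                ],
            )
            return scratch.execute(sql, params).fetchall()
        finally:
            scratch.close()


layer = VirtualDataLayer(make_crm_source(), ORDERS_CSV)
totals = layer.query(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
)
print(totals)
```

The application only sees SQL over `customers` and `orders`; it does not need to know that one lives in a database and the other in a file.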
Red Hat has a nice whitepaper explaining the concept of Data Services in a SOA environment. This post explains the benefits of data virtualization. Composite Software is a leader in data virtualization techniques and has shared a couple of interesting case studies that demonstrate the use of their data virtualization platform.
One thought that came to my mind was regarding the challenges in accessing NoSQL data from the data virtualization layer. While some types of NoSQL data stores, such as XML documents and key/value pairs, can be exposed as a relational SQL view, it may not be possible to have a uniform query interface for unstructured data. Most NoSQL data stores expose some kind of programmatic API (for example, a Java API) that can be used for querying. Would it be possible to create a common set of metadata for both structured and unstructured data? In such scenarios, IMHO, the only strategy for data virtualization is to use Data Services.
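As a rough illustration of the Data Services strategy, the sketch below puts a relational store and a key/value store behind one service contract. All names here are hypothetical (`DataService`, `RelationalProfileService`, `KeyValuePreferenceService`, `CustomerService`, the `prefs:<id>` key format), and a plain Python dict stands in for a real NoSQL client API.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class DataService(ABC):
    """Contract the applications code against; hides the store type."""

    @abstractmethod
    def get(self, customer_id: int) -> Optional[dict]:
        ...


class RelationalProfileService(DataService):
    """Reads customer profiles from a SQL database."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def get(self, customer_id: int) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}


class KeyValuePreferenceService(DataService):
    """Reads JSON preference documents from a key/value store."""

    def __init__(self, store: dict):
        self._store = store  # assumption: a dict stands in for a NoSQL client

    def get(self, customer_id: int) -> Optional[dict]:
        raw = self._store.get(f"prefs:{customer_id}")
        return None if raw is None else json.loads(raw)


class CustomerService(DataService):
    """Composite service that merges both sources into one view."""

    def __init__(self, profiles: DataService, prefs: DataService):
        self._profiles = profiles
        self._prefs = prefs

    def get(self, customer_id: int) -> Optional[dict]:
        profile = self._profiles.get(customer_id)
        if profile is None:
            return None
        # Missing preferences are treated as an empty document.
        prefs = self._prefs.get(customer_id) or {}
        return {**profile, "preferences": prefs}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")
kv_store = {"prefs:1": json.dumps({"newsletter": True})}

service = CustomerService(
    RelationalProfileService(conn), KeyValuePreferenceService(kv_store)
)
print(service.get(1))
```

The trade-off is that the caller gets a fixed service operation rather than an ad hoc query language, which is exactly why this approach still works when the underlying store has no SQL-like interface.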