Wednesday, July 20, 2022

Ruminating on Data Lakehouse

In my previous blog posts, we discussed Data Lakes and the Snowflake architecture.

Because Snowflake combines the capabilities of a traditional data warehouse and a data lake, it also markets itself as a Data Lakehouse.

A competing open-source alternative, led by Databricks, is Delta Lake. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing on top of existing data lakes.
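To make "ACID transactions on top of a data lake" a little more concrete, here is a minimal toy sketch of the general idea: an ordered commit log kept next to the data files, where a writer can only publish version N+1 if nobody else has already done so. This is an illustration under my own assumptions, not Delta Lake's actual implementation; the class `ToyTableLog`, the `_toy_log` directory and the file names are all hypothetical.

```python
import json
import os
import tempfile


class ToyTableLog:
    """Toy append-only commit log (hypothetical; not Delta Lake's real code)."""

    def __init__(self, table_dir):
        self.log_dir = os.path.join(table_dir, "_toy_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _path(self, version):
        # Zero-padded names keep commit files in version order.
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self):
        versions = [
            int(name.split(".")[0])
            for name in os.listdir(self.log_dir)
            if name.endswith(".json")
        ]
        return max(versions, default=-1)

    def commit(self, read_version, actions):
        """Publish `actions` as version read_version + 1.

        Mode "x" creates the file exclusively, so a second writer that
        read the same version gets FileExistsError instead of silently
        overwriting the first writer's commit.
        """
        version = read_version + 1
        with open(self._path(version), "x") as f:
            json.dump(actions, f)
        return version

    def snapshot(self):
        """Replay all commits in order to get the current set of data files."""
        files = set()
        for version in range(self.latest_version() + 1):
            with open(self._path(version)) as f:
                for action in json.load(f):
                    if action["op"] == "add":
                        files.add(action["file"])
                    elif action["op"] == "remove":
                        files.discard(action["file"])
        return sorted(files)


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        log = ToyTableLog(table_dir)
        log.commit(-1, [{"op": "add", "file": "part-0.parquet"}])

        # Two writers both read the table at the same version...
        seen = log.latest_version()
        log.commit(seen, [{"op": "add", "file": "part-1.parquet"}])
        try:
            # ...so the second one must lose and retry from a fresh read.
            log.commit(seen, [{"op": "add", "file": "part-2.parquet"}])
            conflict = False
        except FileExistsError:
            conflict = True
        return log.snapshot(), conflict
```

Calling `demo()` commits one file, then lets two writers race from the same version; the loser's commit is rejected, so readers replaying the log never see a half-applied or conflicting change. A real system also has to make the write of each commit file atomic and handle retries, which this sketch leaves out.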

A good comparison between Snowflake and Databricks Delta Lake is available here: https://www.firebolt.io/blog/snowflake-vs-databricks-vs-firebolt

Enterprises that are embarking on a data platform modernization strategy can ask the following questions to arrive at a best-fit choice:

  • Does the data platform separate compute from storage? This lets you scale each one independently as your data volumes and processing needs increase.
  • Does the data platform support cloud-native storage? Most of the cloud-native storage services from the hyperscalers (e.g. AWS S3, Google Cloud Storage, Azure Data Lake Storage) have been battle-tested for scalability.
  • What are the use cases you want to run on your data platform? For example: Reporting/BI, Streaming Data Analytics, Low-latency dashboards, etc.
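The questions above can be turned into a small checklist helper. This is only a sketch: the `PlatformProfile` fields, the `evaluate` function and the "Platform A" example are hypothetical names I made up for illustration, and the answers you fill in should come from your own evaluation of each vendor.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformProfile:
    """Answers to the three checklist questions for one candidate platform."""
    name: str
    separates_compute_and_storage: bool
    uses_cloud_native_storage: bool
    supported_use_cases: set = field(default_factory=set)


def evaluate(platform, required_use_cases):
    """Return the checklist result for one platform against your use cases."""
    missing = sorted(set(required_use_cases) - platform.supported_use_cases)
    return {
        "name": platform.name,
        "separates_compute_and_storage": platform.separates_compute_and_storage,
        "uses_cloud_native_storage": platform.uses_cloud_native_storage,
        "missing_use_cases": missing,
        # A platform "fits" only if every question gets a yes.
        "fits": (
            platform.separates_compute_and_storage
            and platform.uses_cloud_native_storage
            and not missing
        ),
    }


# Hypothetical candidate; replace with the answers from your own assessment.
candidate = PlatformProfile(
    name="Platform A",
    separates_compute_and_storage=True,
    uses_cloud_native_storage=True,
    supported_use_cases={"reporting_bi", "streaming_analytics"},
)

result = evaluate(candidate, ["reporting_bi", "low_latency_dashboards"])
```

Here `result["missing_use_cases"]` lists the use cases the candidate does not cover, which is usually the most useful part of the output when comparing several platforms side by side.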