(Big)Data in DataLake vs. DataWarehouse
Data Lake vs. Data Warehouse in Big Data
Big Data use cases are emerging across verticals such as insurance, healthcare, manufacturing, finance, retail and more. Customers are using Big Data to improve top-line and bottom-line results and to deliver business value. In this data-driven era, enterprise readiness and data management are becoming increasingly vital. Hadoop and NoSQL have become common environments for data management, and the data lake is emerging as a new repository that serves as a single source of truth and addresses Big Data challenges such as volume, variety and velocity.
Is Big Data == Data Lake?
No. Big Data is not the same thing as a data lake. Let’s look at the commonly used definitions of each. If an existing data system is struggling with volume, velocity or variety, then it probably has a Big Data problem. There are many tools for dealing with data mass, data speed and data variety, and the de facto choice among them is Hadoop, which is designed for distributed storage and parallel processing. Big Data is not a recent idea either: the term is more than 10 years old and is commonly credited to Roger Magoulas, a director at O’Reilly Media.
Data lake is a term for a vital component of the analytics pipeline in the Big Data world. The whole idea is to have a single store for all of the raw data that any data application might need to analyze or engineer. Many data systems currently use Hadoop to work on the data in the lake, but the concept is bigger than just Hadoop. A single store that pulls together all the data that apps and systems want to analyze may sound like a data warehouse or a data mart, but there is a large distinction between a data lake and a data warehouse. The data lake stores raw data in the same form the data source provides it, with no schema defined up front. Each data source can use whatever schema it likes, and it is up to the data consumers to impose a schema on that data for their own purposes.
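To make the "consumers make the schema" idea concrete, here is a minimal sketch in plain Python. It is only an illustration: the RAW_EVENTS records, their field names and the read_with_schema helper are all made up for this example and do not come from any particular data lake product. The raw lines stay exactly as the sources wrote them, and each consumer picks the fields and types it cares about at read time.

```python
import json

# Hypothetical raw events, stored as-is. Each source keeps its own shape.
RAW_EVENTS = [
    '{"source": "web", "user": "u1", "page": "/home"}',
    '{"source": "sensor", "device_id": "d7", "temp_c": "21.5"}',
    '{"source": "web", "user": "u2", "page": "/pricing"}',
]


def read_with_schema(raw_lines, source, fields):
    """Apply a consumer-defined schema while reading (hypothetical helper).

    fields maps a field name to the type the consumer wants for it.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") != source:
            continue  # this consumer is not interested in other sources
        rows.append({name: cast(record[name]) for name, cast in fields.items()})
    return rows


# Two consumers read the same raw store with two different schemas.
page_views = read_with_schema(RAW_EVENTS, "web", {"user": str, "page": str})
temperatures = read_with_schema(
    RAW_EVENTS, "sensor", {"device_id": str, "temp_c": float}
)
```

Nothing in the stored data changes when a new consumer arrives with a new schema. In a warehouse, by contrast, the sensor reading would typically have to be converted to a number and fitted to a table definition before it could be loaded.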
DataLake vs. DataWarehouse
Top 10 Astonishing Things in a Data Lake
- Store Massive Data Sets
- Mix Disparate Data Sources
- Ingest Bulk Data
- Ingest High-Velocity Data
- Apply Structure to Unstructured/Semi-Structured Data
- Make Data Available for MPP SQL Analysis
- Achieve Data Integration
- Improve Machine Learning & Predictive Analytics
- Deploy Real-Time Automation at Scale
- Achieve Continuous Innovation at Scale
To conclude, a data lake is a large storage repository that holds data in its native format until it is needed. Put simply, the data lake is the evolution of the Enterprise Data Warehouse (EDW) into an active repository for structured, semi-structured and unstructured data, one that retains every attribute of the data so that we can run all of our analysis and processing against it. Another way to define a data lake is as the combination of NoSQL and Hadoop. It is the primary landing zone for disparate sources such as clickstreams, weblogs and sensor data, and it helps the business make more holistic decisions.
As always, I am happy to answer any questions you have. Do reply to know more.