2nd Version of Data Lake vs. Data Warehouse
Many thanks for the 800+ views, 100+ likes, and 15+ comments on (Big) Data in Data Lake vs. Data Warehouse. All the comments and suggestions were the motivation behind this 2nd version. To name a few: Ricky Barron, Winston Sucher, Vinay, Ben Sharma, and Sanjay Pande.
The Data Lake Architecture
The four functions of a Data Lake:
- Ingestion – Scalable and extensible, to capture both streaming and batch data. Provides the ability to apply business logic, filters, validation, data quality checks, and routing according to business requirements.
- Storage/Retention – Depending on the requirements, data is placed into Hadoop HDFS, Hive, HBase, Elasticsearch, or in-memory stores. Metadata is managed, and policy-based data retention is provided.
- Data Processing – Processing is provided for both batch and near-real-time use cases. Workflows are provisioned for repeatable data processing, and late-arriving data is handled.
- Access – Dashboards and applications provide valuable business insights. Data is made available to consumers through APIs, MQ feeds, and DB access.
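To make the ingestion, storage/retention, and processing functions more concrete, here is a minimal, illustrative Python sketch. It is not the architecture's actual implementation: the record fields (`id`, `source`, `event_time`, `searchable`, `random_access`), the routing rules, the retention periods in `RETENTION_DAYS`, and the watermark-based late-data rule are all hypothetical assumptions. In a real Data Lake these steps would be delegated to the ingestion, storage, and processing engines named above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: every record must carry these keys to pass validation.
REQUIRED_FIELDS = {"id", "source", "event_time"}

# Hypothetical retention policy: days to keep data in each storage target.
RETENTION_DAYS = {"hdfs": 365, "hbase": 90, "elasticsearch": 30}


def validate(record: dict) -> bool:
    """Ingestion: a basic data-quality check that required fields are present."""
    return REQUIRED_FIELDS.issubset(record)


def route(record: dict) -> str:
    """Ingestion: pick a storage target using illustrative (assumed) rules."""
    if record.get("searchable"):
        return "elasticsearch"
    if record.get("random_access"):
        return "hbase"
    return "hdfs"  # default: bulk storage for batch processing


def arrival_status(event_time: datetime, watermark: datetime,
                   allowed_lateness: timedelta) -> str:
    """Processing: classify a record relative to the current watermark."""
    if event_time >= watermark:
        return "on_time"
    if event_time >= watermark - allowed_lateness:
        return "late"  # within tolerance: re-run the affected window
    return "too_late"  # outside tolerance: divert for separate handling


def is_expired(target: str, ingested_at: datetime, now: datetime) -> bool:
    """Retention: True once data is older than its target's policy allows."""
    return now - ingested_at > timedelta(days=RETENTION_DAYS[target])


def ingest(records, watermark, allowed_lateness):
    """Validate, route, and tag each record; collect rejects separately."""
    routed, rejected = {}, []
    for rec in records:
        if not validate(rec):
            rejected.append(rec)
            continue
        status = arrival_status(rec["event_time"], watermark, allowed_lateness)
        routed.setdefault(route(rec), []).append((rec["id"], status))
    return routed, rejected


if __name__ == "__main__":
    wm = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    batch = [
        {"id": 1, "source": "web", "event_time": wm},
        {"id": 2, "source": "app", "event_time": wm - timedelta(minutes=5),
         "searchable": True},
        {"id": 3, "source": "iot", "event_time": wm - timedelta(hours=2),
         "random_access": True},
        {"id": 4, "source": "web"},  # missing event_time, so it is rejected
    ]
    routed, rejected = ingest(batch, wm, allowed_lateness=timedelta(minutes=30))
    print(routed)
    print(len(rejected), "record(s) rejected")
```

Keeping validation, routing, late-data classification, and retention as separate small functions mirrors the split of responsibilities in the list above, so each rule can be changed (for example, a new routing target or a different retention period) without touching the others.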
Data Lake vs. Data Warehouse
References – http://www.kdnuggets.com/, the Architecting Data Lake eBook, and the community.
Interesting? Please subscribe to our blogs at www.dataottam.com to stay up to date on Big Data, Analytics, and IoT.
As always, please feel free to send suggestions or comments to firstname.lastname@example.org