Pleased to share a single slide on Data Lake vs. Data Warehouse, covering definitions, key properties, use cases & user groups. To conclude, for an enterprise data-driven organization, both the data warehouse & the data lake play a vital role. In a nutshell: Data Lake + Data Warehouse = Business Value! Thanks for your (treasured) time and engagement. As […]
There is no fixed definition of big data. Broadly, it refers to the technologies that organizations use to extract, store, transform and access data, whether structured, semi-structured or unstructured. Data is currently multiplying at a rapid pace, and it becomes difficult for organizations to have innovative & customer-centric […]
Thank you for your valuable time; it’s much appreciated. This time I’d like to share a post called “Quick Card On – Apache Hive Joins !”, a handy Apache Hive joins reference card or cheat sheet. An SQL JOIN clause is used to combine rows from two or more tables, based on a common […]
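To make the JOIN idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than Hive, purely so it runs anywhere; the customers and orders tables and their rows are hypothetical. INNER JOIN and LEFT JOIN read the same way in HiveQL, but this is an illustration, not the reference card itself.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive warehouse (assumption:
# used only because it ships with Python; table names and rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN: only customers that have at least one matching order.
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    JOIN orders o ON c.id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every customer; the order columns are NULL when nothing matches.
left = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    ORDER BY c.id, o.order_id
""").fetchall()

if __name__ == "__main__":
    print("inner:", inner)
    print("left: ", left)
```

The customer with no orders drops out of the inner join and shows up with an empty amount in the left join, which is the main difference to keep in mind when choosing between the two.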
Hadoop compression techniques bring benefits to Hadoop I/O operations, such as space savings and faster processing. We have many compression formats and algorithms, each with pros and cons. Nothing new is added here; it is just consolidated to keep it handy for production implementations. All techniques exhibit a space/time trade-off. We’ve options from […]
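The space/time trade-off can be seen with a small sketch. It assumes Python's standard gzip and bz2 modules as stand-ins for the gzip and bzip2 formats; it does not use Hadoop's codec classes, and the sample log line is hypothetical. Timings vary by machine, so treat the numbers as indicative only.

```python
import bz2
import gzip
import time

# Hypothetical sample payload: a repetitive log line, which compresses well.
SAMPLE = b"2018-01-01 INFO job=etl status=ok rows=1024\n" * 20000

# Each entry maps a label to (compress function, decompress function).
CODECS = {
    "gzip-1": (lambda d: gzip.compress(d, compresslevel=1), gzip.decompress),
    "gzip-9": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2-9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
}


def measure(data):
    """Return {label: (compressed_size_in_bytes, seconds_to_compress)}."""
    results = {}
    for label, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Round-trip check: decompressing must give back the original bytes.
        assert decompress(packed) == data
        results[label] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    print(f"original {len(SAMPLE)} bytes")
    for label, (size, secs) in measure(SAMPLE).items():
        print(f"{label:8s} {size:8d} bytes  {secs:.4f} s")
```

Comparing the size and time columns for your own data is a quick way to decide whether a faster or a tighter setting suits a given job.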
This time I’m going with a post called “The 10 Distributed SQL Query Engine for Big Data!” Many thanks for your time; it’s truly appreciated! Data…Data…Data… Yep, it’s everywhere, from software to salt stores, and all of it is tagged as Big Data. But who is the friend who can help us get the insights/values from the data […]
Metadata, in general, refers to data about data, and it is the kernel of data processing & storage. In the Hadoop ecosystem it can be many things, as listed below. Ref: Hadoop Application Architectures, by Mark Grover, Ted Malaska, Jonathan Seidman & Gwen Shapira. Many thanks to the whole Analytics & Big Data community.
Tons of thanks for your time. Big Data is a catchphrase used to describe a massive volume of both structured and unstructured data that is so large that it’s difficult to process using traditional database and software tools. The field is flooded with a glossary of new terms like Hadoop Distributed File System, NoSQL and NewSQL. Happy to share the list […]
Tons of thanks for your valuable time. This time we’d like to share the details of how data movement happens in the big data ecosystem, in a post named “The Data Movement in Big Data Ecosystem”. Ingesting data into Hadoop is vital, from systems like RDBMS, mainframes, logs, machine-generated data, event data […]
Many thanks for your cherished time. This time we’d like to share the details of the 3 S’s of Spark. As we all know, the 3 V’s of Big Data are Volume, Variety & Velocity, with further core V’s like Veracity & Value added later. Big Data is defined as a collection of […]
Spark began life in 2009 as a project within the AMPLab at the University of California, Berkeley. More specifically, it was born out of the necessity to prove out the concept of Mesos, which was also created in the AMPLab. Spark was first discussed in the Mesos white paper titled Mesos: A Platform for Fine-Grained Resource […]
The tips below were not written by me (Kumar Chinnakali). They were learnt from mammothdata.com, and I felt they could help our big data community, where Apache Spark is currently changing the world of Analytics & Big Data. Mammothdata team, tons of thanks for sharing with us. Spark is written in Scala, so new features […]
Thanks for your time; I definitely try to value yours. In Part 1, we discussed Apache Spark libraries and Spark components like the Driver, DAG Scheduler, Task Scheduler, and Worker. Now in Part 2, we will discuss the basics of Spark concepts like Resilient Distributed Datasets, Shared Variables, SparkContext, Transformations, Actions, and Advantages of […]