This time I am sharing a blog post called "The 10 Distributed SQL Query Engine for Big Data"! Many thanks for your time, it's truly appreciated! Data… Data… Data… Yep, it's everywhere, from software companies to salt stores, and all of it is tagged as Big Data. But who is the friend who can help us get insights and value from the data […]
Tons of thanks for your valuable time! This time we would like to share the details of how data moves through the big data ecosystem, in a post named "The Data Movement in Big Data Ecosystem". Ingesting data into Hadoop is vital, from systems like RDBMS, mainframes, logs, machine-generated data, event data […]
Many thanks for your cherished time! This time we would like to share the details of the 3 S's of Spark. As we all know, the 3 V's of Big Data are Volume, Variety & Velocity, and further V's such as Veracity & Value have since been added. Big Data is defined as a collection of […]
Spark began life in 2009 as a project within the AMPLab at the University of California, Berkeley. More specifically, it was born out of the necessity to prove out the concept of Mesos, which was also created in the AMPLab. Spark was first discussed in the Mesos white paper titled Mesos: A Platform for Fine-Grained Resource […]
The tips below were not written by me (Kumar Chinnakali). I learned them from mammothdata.com and felt they could help our big data community, where Apache Spark is currently changing the world of Analytics & Big Data. Mammothdata team, tons of thanks for sharing them with us. Spark is written in Scala, so new features […]
Thanks for your time; I definitely try to value yours. In Part 1 we discussed Apache Spark libraries and Spark components like the Driver, DAG Scheduler, Task Scheduler, and Worker. Now in Part 2 we will discuss the basics of Spark concepts like Resilient Distributed Datasets, Shared Variables, SparkContext, Transformations, Actions, and Advantages of […]
Computes an approximate histogram of a numerical column using a user-specified number of bins. The output is an array of (x, y) pairs as Hive struct objects that represent the histogram's bin centers (x value) and bin heights (y value). Even though this function creates a histogram with non-uniform bin widths, to some extent its […]
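The description above matches Hive's `histogram_numeric(col, b)` aggregate. To make the idea of "approximate, non-uniform bin widths" concrete, here is a minimal Python sketch of one single-pass technique for building such a histogram: keep at most `num_bins` (center, count) bins and, whenever a new value overflows that budget, merge the two adjacent bins whose centers are closest. This assumes the streaming bin-merging approach described by Ben-Haim and Tom-Tov, which is our understanding of what Hive's implementation is based on; the function name `approx_histogram` is hypothetical, and this is an illustration, not Hive's actual code.

```python
import bisect


def approx_histogram(values, num_bins):
    """Hypothetical sketch: approximate histogram with at most num_bins bins.

    Bins are (center, count) pairs kept sorted by center. When inserting a
    value would exceed num_bins, the two adjacent bins with the smallest gap
    between centers are merged into one count-weighted bin.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be >= 1")
    centers = []  # sorted bin centers (the x values)
    counts = []   # bin heights (the y values), parallel to centers
    for v in values:
        v = float(v)
        i = bisect.bisect_left(centers, v)
        if i < len(centers) and centers[i] == v:
            # Value equals an existing center: just increase that bin's height.
            counts[i] += 1
            continue
        # Otherwise start a new bin of height 1 at the value itself.
        centers.insert(i, v)
        counts.insert(i, 1)
        if len(centers) > num_bins:
            # Pick the adjacent pair of bins whose centers are closest.
            j = min(range(len(centers) - 1),
                    key=lambda k: centers[k + 1] - centers[k])
            total = counts[j] + counts[j + 1]
            # The merged center is the count-weighted mean of the two centers.
            merged = (centers[j] * counts[j]
                      + centers[j + 1] * counts[j + 1]) / total
            centers[j:j + 2] = [merged]
            counts[j:j + 2] = [total]
    return list(zip(centers, counts))


# With enough bins for every distinct value, the result is an exact histogram:
print(approx_histogram([1, 2, 2, 3], 5))  # [(1.0, 1), (2.0, 2), (3.0, 1)]
```

Because merges always happen where centers are densest, crowded regions end up summarized by fewer, wider bins while sparse regions keep narrow ones, which is why the bin widths are non-uniform. Each merge preserves the total count and the count-weighted sum of the values, so the result stays a reasonable summary of the column even though individual bin boundaries are approximate.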