Big Data Meets Microsoft Azure! For Big Data & Cloud Community members, this post on “Big Data, Meet Azure” is all about doing big data on the Azure public cloud. And sure, we need no definition for Big Data and Cloud Computing, but in a line, I would like to call both the Super Nova for […]
How to Ingest HDFS in JSON Format Using Apache Sqoop? by NS Saravanan The current project uses a lambda architecture, so data from source systems is extracted in two ways: real-time streaming (the speed layer) and batch processing (the batch layer). The speed layer is implemented using Attunity > Kafka > Spark Streaming. The output of the Spark stream […]
The 4 Key Concepts in the Anatomy of an Apache Spark Job! For Big Data & Cloud Community members: Apache Spark is awesome at handling any workload, such as batch, streaming, real-time, and ad-hoc. However, to fine-tune and optimize our Apache Spark applications, we need to have a grip on the Apache Spark […]
What is Beyond Classic Hadoop? Is it Spark and Flink? In this blog, we will explore two new big data friends of Hadoop: Apache Spark and Apache Flink. If we look at Hadoop's improvements with parallel MapReduce processing, speed is the very first focus. However, MapReduce was designed and developed for […]
A First Look at Big Data Apache Flink! There is an abundance of interest in learning how to analyze streaming data in large-scale systems, partly because there are situations in which the time-value of data makes real-time analytics so eye-catching. But gathering the in-the-moment insights made possible by very low latency applications is just one of the […]
3 Solutions for Big Data’s Small Files Problem! In this post we will discuss efficient solutions to the “small files” problem. And what is a small file in a Big Data Hadoop environment? In the Hadoop world, a small file is a file whose size is much smaller than the HDFS block […]
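The teaser above only defines the problem, but the usual remedy is to compact many small files into fewer large ones before (or after) they land on HDFS. As a minimal local sketch of that idea (the `mergeSmallFiles` helper and the file names are my own invention, not from the post; on a real cluster you would instead use Hadoop archives or rewrite the data with fewer output partitions):

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Compact many small files into one larger file. This local-filesystem
// version just shows the mechanics of the compaction step.
def mergeSmallFiles(inputs: Seq[Path], output: Path): Unit = {
  val merged = inputs.flatMap(p => Files.readAllLines(p).asScala)
  Files.write(output, merged.asJava)
}

// Demo: three "small files" become one file holding all the records.
val dir = Files.createTempDirectory("smallfiles")
val parts = (1 to 3).map { i =>
  val p = dir.resolve(s"part-$i.txt")
  Files.write(p, java.util.List.of(s"record-$i"))
  p
}
val out = dir.resolve("merged.txt")
mergeSmallFiles(parts, out)
```

The point is that one file of N records costs the NameNode a single metadata entry, where N tiny files would cost N of them.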
Self-Learn Yourself Scala in 21 Blogs – #6 Blog 6 – Recursion and Tail Recursion in Functional Programming. Missed the previous blogs? Have a quick look at Self-Learn Yourself Scala in 21 Blogs (#1, #2, #3, #4, #5). In this blog let’s understand recursion and tail recursion in functional programming. Recursion is frequently used […]
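Since the excerpt is cut off before any example, here is the classic illustration of the two styles in Scala (factorial is my choice of example, not necessarily the one the post uses). In the tail-recursive version the recursive call is the last action, so the compiler can rewrite it into a loop, and `@tailrec` makes that guarantee explicit:

```scala
import scala.annotation.tailrec

// Plain recursion: each call waits for the result of the next,
// so every call occupies a stack frame.
def factorial(n: BigInt): BigInt =
  if (n <= 1) 1 else n * factorial(n - 1)

// Tail recursion: the accumulator carries the partial result, the
// recursive call is in tail position, and @tailrec asks the compiler
// to verify it can be compiled to a constant-stack loop.
def factorialTR(n: BigInt): BigInt = {
  @tailrec
  def loop(n: BigInt, acc: BigInt): BigInt =
    if (n <= 1) acc else loop(n - 1, n * acc)
  loop(n, 1)
}
```

Both compute the same values; only `factorialTR` stays safe for large `n`.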
Scalable Apache Spark Solution to the Big Data Secondary Sort Problem! – Part 2 In Part 1, we discussed the Spark solution to secondary sort for larger data sets. Now let’s deep-dive into Choice #2. Choice #2: If we have a smaller data set, this choice will fit, like reading and buffering all of the […]
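The excerpt stops mid-sentence, but the "buffer everything" choice it starts to describe can be sketched with plain Scala collections (the `(key, value)` data here is made up; in Spark the equivalent would be a `groupByKey` followed by an in-memory sort of each value list):

```scala
// Choice #2 sketch: gather all of a key's values in memory, then sort them.
// This is only safe when every key's value list fits in a single
// worker's memory, which is why it suits smaller data sets.
val data = Seq(("k1", 3), ("k2", 5), ("k1", 1), ("k1", 2))

val sortedPerKey = data
  .groupBy(_._1)                                    // bucket values by key
  .map { case (k, pairs) => (k, pairs.map(_._2).sorted) } // sort each bucket
```

The trade-off versus the Part 1 approach is memory: buffering is simple, but a single hot key with millions of values will blow the heap.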
Scalable Apache Spark Solution to the Big Data Secondary Sort Problem! – Part 1 In the Big Data era, the secondary sort problem relates to sorting the values associated with a key in the reduce phase. It can be called value-to-key conversion. The secondary sorting technique helps us sort the values in ascending or […]
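To make "value-to-key conversion" concrete before the Spark details, here is a minimal local sketch using plain Scala collections (the station/temperature records are hypothetical; in Spark the same pattern is typically paired with a custom partitioner and `repartitionAndSortWithinPartitions`):

```scala
// Hypothetical input: (stationId, (timestamp, temperature)) pairs.
val records = Seq(
  ("s1", (3, 20.5)), ("s2", (1, 18.0)),
  ("s1", (1, 19.2)), ("s1", (2, 21.0))
)

// Value-to-key conversion: promote the timestamp out of the value and
// into a composite key, so an ordinary sort by key also delivers each
// key's values in ascending timestamp order.
val byCompositeKey = records
  .map { case (station, (ts, temp)) => ((station, ts), temp) }
  .sortBy { case ((station, ts), _) => (station, ts) }
```

The framework's shuffle sort then does the value ordering for free, instead of each reducer buffering and sorting values itself.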
Relationship between MapReduce, Spark, YARN, and HDFS! In the Big Data era, Hadoop is the de facto standard for developing big data applications using the MapReduce framework. Hadoop is composed of one or more master nodes and any number of slave nodes, depending on the data needs. Hadoop simplifies distributed applications by […]
Self-Learn Yourself Apache Spark in 21 Blogs – #8 In this blog, let us discuss how to load data, what lambdas are, how to transform data, and more on transformations. Want a quick read of the other blogs in this learning series? Apache Spark can load from any input source, like […]
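Before the Spark-specific details (which the excerpt cuts off), the core idea is that transformations are just lambdas chained over a dataset. A minimal local sketch, using a plain Scala collection as a stand-in for an RDD (the sample lines are invented; Spark's `filter`, `flatMap`, and `map` on RDDs follow the same shape):

```scala
// A stand-in "dataset" of raw lines, as if loaded from a text file.
val lines = Seq("spark is fast", "spark scales", "hadoop stores")

// Transformations expressed as lambdas, chained one after another.
val sparkWordPairs = lines
  .filter(line => line.contains("spark")) // keep only matching records
  .flatMap(line => line.split(" "))       // one word per record
  .map(word => (word, 1))                 // pair each word with a count of 1
```

In Spark proper these calls would be lazy, building a lineage graph that only executes when an action (like `count` or `collect`) is called.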
This blog introduces the convergence of complementary technologies – the Spark, Mesos, Akka, Cassandra and Kafka (SMACK) stack. We will see how Apache Kafka can help us get data under control and what its role is in our data pipeline, how Spark & Akka help us process the data, and how Cassandra helps us to […]
The 1-2-3-4-5-6-7-8-9 of Cognitive Computing! Dear Data...