What is Beyond Classic Hadoop? Is it Spark and Flink? In this blog, we will explore two new big data companions to Hadoop: Apache Spark and Apache Flink. When we look at the improvements over Hadoop's parallel-processing MapReduce, speed is the very first focus. However, MapReduce is designed and developed for […]
A First Look at Big Data Apache Flink! There is an abundance of interest in learning how to analyze streaming data in large-scale systems, partly because in some situations the time-value of data makes real-time analytics especially attractive. But gathering in-the-moment insights made possible by very low latency applications is just one of the […]
3 Solutions for Big Data’s Small Files Problem! In this blog, we will discuss efficient solutions to the “small files” problem. So what is a small file in a Big Data Hadoop environment? In the Hadoop world, a small file is a file whose size is much smaller than the HDFS block […]
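To make the definition concrete, here is a minimal Scala sketch. It assumes the 128 MB default HDFS block size (`dfs.blocksize` in Hadoop 2.x and later) and an arbitrary 75% cutoff for “much smaller”. It flags small files and greedily packs them into block-sized batches that could each be merged into one larger file. The `SmallFiles` object, its methods and the threshold are hypothetical illustrations, not necessarily one of the three solutions the post covers.

```scala
object SmallFiles {
  // Assumption: 128 MB, the default HDFS block size in Hadoop 2.x and later.
  val BlockSize: Long = 128L * 1024 * 1024

  // A file counts as "small" below 75% of one block.
  // The 0.75 threshold is an arbitrary, illustrative choice.
  def isSmall(sizeBytes: Long, threshold: Double = 0.75): Boolean =
    sizeBytes < (BlockSize * threshold).toLong

  // Greedily pack the small files (in the given order) into batches whose
  // combined size stays within one block; each batch could then be merged
  // into a single larger file, reducing the number of files HDFS tracks.
  def packIntoBlocks(sizes: Seq[Long]): Seq[Seq[Long]] = {
    val start = (Vector.empty[Vector[Long]], Vector.empty[Long], 0L)
    val (done, current, _) = sizes.filter(isSmall(_)).foldLeft(start) {
      case ((batches, cur, total), size) =>
        if (total + size <= BlockSize) (batches, cur :+ size, total + size)
        else (batches :+ cur, Vector(size), size) // close batch, start a new one
    }
    if (current.isEmpty) done else done :+ current
  }
}
```

Files at or above the cutoff are left alone, since they already fill most of a block.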
Self-Learn Yourself Scala in 21 Blogs – #6 Blog 6 – Recursion and Tail Recursion in Functional Programming. Missed the previous blogs? Have a quick look at Self-Learn Yourself Scala in 21 Blogs (#1, #2, #3, #4, #5). In this blog, let’s understand recursion and tail recursion in functional programming. Recursion is frequently used […]
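As a quick preview of the difference, here is a minimal Scala sketch using factorial as an assumed example (the post’s own examples may differ). The plain version keeps one stack frame per pending multiplication. The tail-recursive version carries the running product in an accumulator, so the recursive call is the last action.

```scala
import scala.annotation.tailrec

object RecursionDemo {
  // Plain recursion: each call must wait for factorial(n - 1) before it can
  // multiply, so the call stack grows linearly with n.
  def factorial(n: Int): BigInt =
    if (n <= 1) BigInt(1) else BigInt(n) * factorial(n - 1)

  // Tail recursion: the recursive call is the final action and the partial
  // result travels in `acc`. @tailrec makes compilation fail if the compiler
  // cannot turn `loop` into a loop.
  def factorialTail(n: Int): BigInt = {
    @tailrec
    def loop(i: Int, acc: BigInt): BigInt =
      if (i <= 1) acc else loop(i - 1, acc * BigInt(i))
    loop(n, BigInt(1))
  }

  def main(args: Array[String]): Unit = {
    println(factorial(5))     // 120
    println(factorialTail(5)) // 120
  }
}
```

The `@tailrec` annotation does not make a method tail-recursive by itself. It only asks the compiler to verify that the optimization applies.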
Scalable Apache Spark Solution to Big Data Secondary Sort Problem! – Part 2 In Part 1, we discussed the Spark solution to secondary sort for larger data sets. Now let’s deep dive into Choice #2. Choice #2: If we have a smaller data set, this choice fits, like reading and buffering all of the […]
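For Choice #2, here is a minimal sketch in plain Scala collections, assuming small hypothetical (key, value) pairs. It buffers all values per key, then sorts each buffer in memory. On a Spark pair RDD, the analogous pattern would be `groupByKey()` followed by `mapValues(_.toList.sorted)`, which is only safe when every key’s values fit in an executor’s memory.

```scala
object InMemorySecondarySort {
  // Hypothetical sample records: (key, value).
  val records: Seq[(String, Int)] =
    Seq(("k2", 9), ("k1", 3), ("k2", 1), ("k1", 7), ("k1", 2))

  // Choice #2 sketch: buffer every value of a key, then sort that buffer.
  // Memory per key grows with the number of its values, so this suits
  // smaller data sets only.
  def sortValuesPerKey(data: Seq[(String, Int)]): Map[String, List[Int]] =
    data.groupBy(_._1).map { case (key, pairs) =>
      key -> pairs.map(_._2).toList.sorted
    }
}
```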
Scalable Apache Spark Solution to Big Data Secondary Sort Problem! – Part 1 In the Big Data era, the secondary sort problem relates to sorting the values associated with a key in the reduce phase. It is also called value-to-key conversion. The secondary sorting technique helps us sort the values in ascending or […]
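To illustrate value-to-key conversion, here is a minimal local Scala sketch (not Spark code). Each value is promoted into a composite (key, value) key, so a single sort orders records by key and then by value. Grouping on the natural key alone then yields each key’s values already in ascending order. In Spark the sort would instead happen during the shuffle, for example with `repartitionAndSortWithinPartitions` and a partitioner on the natural key. The sample data and names here are hypothetical.

```scala
object SecondarySortDemo {
  // Hypothetical sample records: (stationId, temperature).
  val records: Seq[(String, Int)] =
    Seq(("s2", 30), ("s1", 25), ("s2", 12), ("s1", 40), ("s1", 5))

  def secondarySort(data: Seq[(String, Int)]): Seq[(String, Seq[Int])] = {
    // Value-to-key conversion: sort on the composite key (naturalKey, value),
    // which orders by natural key first and by value second.
    val sorted = data.sortBy { case (key, value) => (key, value) }
    // Group on the natural key only. groupBy keeps the sorted order of the
    // values inside each group, so no per-key sort is needed afterwards.
    sorted
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2) }
  }
}
```

For descending values, sorting on `(key, -value)` is one simple option for this sample data.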
Relationship between MapReduce, Spark, YARN, and HDFS! In the Big Data era, Hadoop is the de facto standard for developing big data applications using the MapReduce framework. A Hadoop cluster is composed of one or more master nodes and any number of slave nodes, depending on how much data must be handled. Hadoop simplifies distributed applications by […]
Self-Learn Yourself Apache Spark in 21 Blogs – #8 In this blog, let us discuss how to load data, what lambdas are, how to transform data, and more on transformations. You can also take a quick read of the other blogs in this learning series. Apache Spark can load data from many input sources, like […]
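As a taste of lambdas passed to transformations, here is a minimal sketch that uses a plain Scala `List` as a stand-in for data Spark might load (for example with `sc.textFile(...)`), so it runs without a Spark installation. The sample lines are hypothetical. On an RDD the same `filter`, `map` and `flatMap` calls accept the same function literals, but they are lazy and only run when an action such as `collect()` is called.

```scala
object LambdaTransformations {
  // Hypothetical input standing in for lines loaded by Spark.
  val lines: List[String] =
    List("spark is fast", "hadoop mapreduce", "spark streaming")

  // A lambda with an explicit parameter name: keep lines mentioning "spark".
  val sparkLines: List[String] = lines.filter(line => line.contains("spark"))

  // Placeholder syntax `_` as shorthand for a one-argument lambda:
  // count the words in each remaining line.
  val wordCounts: List[Int] = sparkLines.map(_.split(" ").length)

  // flatMap: one input line produces many output words.
  val words: List[String] = lines.flatMap(_.split(" "))
}
```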
This blog introduces the convergence of complementary technologies: the Spark, Mesos, Akka, Cassandra, and Kafka (SMACK) stack. We will see how Apache Kafka can help us get data under control and what its role is in our data pipeline, how Spark and Akka help us process the data, and how Cassandra to […]
Today, emerging big data technology firms focus on helping enterprises build breakthrough software solutions powered by disruptive enterprise software trends like machine learning and data science, cyber-security, enterprise IoT, and cloud. Hadoop is one of the proven technologies in the big data space, but is it the only one? Nope, we have many more technologies which […]
Blog 2 – Let’s get started with Scala. Just type scala in your environment to start the Scala interpreter; if everything is fine, you will see the scala> prompt. If you have problems with the installation, please follow the link, which has step-by-step explanations. Now we are ready to explore Scala commands. Now […]
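As a preview of those first commands, here is a minimal sketch, wrapped in a hypothetical `FirstSteps` object so it also compiles as a file. The same lines can be typed one at a time at the scala> prompt. It shows immutable `val` bindings, a mutable `var`, type inference, and string interpolation.

```scala
object FirstSteps {
  val greeting: String = "Hello, Scala" // val: immutable binding, type given explicitly
  var counter = 0                       // var: mutable binding, type Int inferred
  counter += 1                          // reassigning a var is allowed
  val sum = 1 + 2                       // type Int inferred from the expression
  // s-interpolator: $name splices a value into the string.
  val message = s"$greeting! sum = $sum, counter = $counter"
}
```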
The Bot 101 [Part 1] For me, bot is a new word; on first time...
Getting Started with Google Cloud Platform! Last month got...
Top 10 Reasons to Run Hadoop in the Public Cloud! Hadoop...