Scalable Apache Spark Solution to Big Data Secondary Sort Problem! – Part 2 In Part -1, we have discussed about the Spark solution to Secondary for larger data sets. Now let’s deep dive in Choice #2 Choice #2: If we have smaller data set then choice will fit, like read and buffer all of the […]
Scalable Apache Spark Solution to Big Data Secondary Sort Problem! – Part 1 In Big Data era the secondary sort problem is relates to sorting values associated with a key in the reduce phase. It can be called as value-to-key conversion. The secondary sorting technique will help us to sort the values in ascending or […]
Understand Kappa Architecture in 2 minutes What is Kappa Architecture ? Kappa architecture makes all the data processing in Near Real Time or Streaming mode, which in simple terms removing the batch layer from Lambda Architecture makes it a Kappa Architecture, to know quickly about lambda Architecture visit Understand Lambda Architecture in 2 minutes. Evolution […]
Relationship between MapReduce, Spark, YARN, and HDFS ! In Big Data era Hadoop is the de facto standard for developing of big data applications by using MapReduce framework. And Hadoop is composed of one or more master nodes and any number of slave nodes depends up on the data needed. Hadoop simplifies distributed applications by […]
Understand Lambda Architecture in 2 minutes What is Lambda Architecture ? Lambda architecture which provides us a combined solution of realtime data with batch data. What is the Need for Lambda Architecture ? lambda Architecture was implemented mainly due to the Latency provided by the Map reduce paradigm, where the batch views was created on […]
Self-Learn Yourself Apache Spark in 21 Blogs – #8 In this blog let us discuss on How to loading data, what is Lambdas, How to do Transforming Data and more on Transformations. And want to have quick read on the other blogs in this learning series. Apache Spark can load from any input sources like […]
Self-Learn Yourself Scala in 21 Blogs – #5 Blog 5 – Does functional programming matters and what are monads? Missed the previous blogs have a quick look with Self-Learn Yourself Scala in 21 Blogs (#1, #2, #3, #4). In this blog let’s understand for Scala developers does the functional programming matters and also what is […]
11 Key Tuning Checklists for Apache Hadoop! Apache Hadoop is a well know and de-facto framework for processing large big data sets through distributed & parallel computing. YARN(Yet Another Resources Negotiator) allowed Hadoop to evolve from a simple MapReduce engine to a big data ecosystem that can run heterogeneous (MapReduce and non-MapReduce) apps concurrently. This results […]
This blog introduces the convergence of complementary technologies – Spark, Mesos, Akka, Cassandra and Kafka (SMACK) stack. And we will see how Apache Kafka can help us to get data under control and what is it role in our data pipeline, how Spark & Akka help us to process the data, and how Cassandra to […]
Today emerging big data technology firm focused on helping enterprises build breakthrough software solutions powered by disruptive enterprise software trends like Machine learning and data science, Cyber-security, Enterprise IOT, and Cloud. So Hadoop is one of the proven software in big data space, but is it only Hadoop. Nope we have many more technologies which […]
Blog 4 – OOP to Functional Programming We are already using functional programming using Scala with the previous blog series of Self-Learn Yourself Scala in 21 Blogs (#1, #2, #3). Let’s start with defining functional programming which is a programming paradigm that models computation as the evaluation of expressions and expressions are built using functions […]
Blog 3 – Functional Programming & Data Structures In Scala Functional programming and functional data structures are very interesting and powerful. Actually it supports both types data structures called immutable & mutable. And let us discuss on two more vital concepts called type parameterization and higher-order functions. The type is similar to Java generics; it […]
The Bot 101 [ Part 1 ] For me bot is new word, on first time...
Getting Started with Google Cloud Platform ! Last month got...
PocketGear on Getting Started with Google Cloud Platform !...
Top 10 Reasons to Run Hadoop in the Public Cloud ! Hadoop...