Understand Kappa Architecture in 2 minutes What is Kappa Architecture? Kappa architecture performs all data processing in near-real-time or streaming mode; in simple terms, removing the batch layer from the Lambda architecture gives you the Kappa architecture. To learn about the Lambda architecture quickly, visit Understand Lambda Architecture in 2 minutes. Evolution […]
Understand Lambda Architecture in 2 minutes What is Lambda Architecture? Lambda architecture provides a combined solution for real-time and batch data. What is the need for Lambda architecture? Lambda architecture was implemented mainly to address the latency of the MapReduce paradigm, where the batch views were created on […]
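The serving-layer idea behind the Lambda architecture (batch views merged with recent real-time updates) can be sketched in a few lines of plain Python. This is a conceptual sketch only; the `batch_view` and `realtime_view` dictionaries of per-key counts are hypothetical stand-ins, not any real framework's API.

```python
def merge_views(batch_view, realtime_view):
    """Serving layer: combine precomputed batch results with speed-layer deltas.

    The batch layer periodically recomputes batch_view from the master dataset;
    the speed layer holds only increments that arrived since the last batch run.
    """
    merged = dict(batch_view)
    for key, delta in realtime_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged

batch_view = {"page_a": 100, "page_b": 40}   # recomputed nightly by the batch layer
realtime_view = {"page_b": 3, "page_c": 1}   # events since the last batch run

print(merge_views(batch_view, realtime_view))
```

Queries hit the merged result, so they see both the accurate (but stale) batch view and the latest streaming increments.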
This blog introduces the convergence of complementary technologies – Spark, Mesos, Akka, Cassandra and Kafka (the SMACK stack). We will see how Apache Kafka can help us get data under control and what its role is in our data pipeline, how Spark and Akka help us process the data, and how Cassandra […]
Today's emerging big data technology firms are focused on helping enterprises build breakthrough software solutions powered by disruptive enterprise software trends like machine learning and data science, cyber-security, enterprise IoT, and cloud. Hadoop is one of the proven software platforms in the big data space, but is it the only one? No, we have many more technologies which […]
8 Breaking Changes in Apache Flink 1.0.0! Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying […]
Looking For College Projects?
Self-Learn Yourself Scala in 21 Blogs – #1 Blog 1 – Scala the basics Thanks to communities like LinkedIn, Hadoop, Spark, Apache Software, Yahoo and more, from dataottam. As a new learning and sharing initiative, we, the dataottam team, launched “Self-Learn Yourself Scala in 21 Blogs”. Scala is where object-oriented meets functional, to have the best […]
What are RDDs, Actions, and Transformations? In Blog 6, we will see the RDD and RDD inputs, with hands-on examples. Click to have a quick read of the other blogs in this learning series. Hey, my dear friends. Before getting into a deeper dive, let’s have a look at who the Spark Core maintainers are […]
How to have our basic statistics (Mean, Median, SD, Var, Cor, Cov) computed using the R language? The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solution) framework. Context: In statistics, Mean, Median, […]
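The post above computes these statistics in R; as a language-neutral sketch of the same six quantities, here they are using only Python's standard library (the sample data `x` and `y` are made up for illustration, and `cov`/`cor` are written by hand to mirror R's sample-based `cov(x, y)` and `cor(x, y)`):

```python
import statistics
from math import sqrt

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 3.0, 5.0, 5.0, 6.0, 7.0, 8.0]

def cov(a, b):
    """Sample covariance, like R's cov(a, b) (divides by n - 1)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def cor(a, b):
    """Pearson correlation, like R's cor(a, b)."""
    return cov(a, b) / (statistics.stdev(a) * statistics.stdev(b))

print(statistics.mean(x))      # R: mean(x)
print(statistics.median(x))    # R: median(x)
print(statistics.stdev(x))     # R: sd(x)  (sample standard deviation)
print(statistics.variance(x))  # R: var(x) (sample variance)
print(cov(x, y))               # R: cov(x, y)
print(cor(x, y))               # R: cor(x, y)
```

Like R, `statistics.stdev` and `statistics.variance` use the sample (n − 1) denominator, so the numbers match what `sd(x)` and `var(x)` print in an R session.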
In this new year 2016, we should be excited that the Apache Spark community has released and announced the availability of Apache Spark 1.6, the 7th release on the 1.x line. Committers – the number of contributors to Spark has crossed 1,000, which has doubled. Patches – the Apache Spark 1.6 release includes 1,000+ patches. Run […]
Best wishes to you this holiday, and Happy New Year, from all of us at dataottam. This blog introduces Spark’s core abstraction for working with data, the RDD (Resilient Distributed Dataset). An RDD is simply a distributed collection of elements or objects (Java, Scala, or Python objects, including user-defined ones) spread across the Spark cluster. In Spark […]
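The transformation-versus-action distinction at the heart of RDDs can be illustrated without a Spark cluster. This is a plain-Python analogy, not Spark's API: Python's lazy `map`/`filter` iterators play the role of transformations (they only describe a new dataset), while `list(...)` and `reduce` play the role of actions (they force computation and return a result to the driver).

```python
from functools import reduce

data = range(1, 6)  # stand-in for an RDD holding [1, 2, 3, 4, 5]

# "Transformations" build a lazy pipeline; nothing is computed yet
# (analogous to rdd.map(...) and rdd.filter(...)).
squared = map(lambda n: n * n, data)
evens = filter(lambda n: n % 2 == 0, squared)

# "Actions" force evaluation and bring a result back
# (analogous to rdd.collect() and rdd.reduce(...)).
result = list(evens)
total = reduce(lambda a, b: a + b, result)

print(result, total)
```

In real Spark the same shape appears as `rdd.map(...).filter(...).collect()`, with the extra benefit that the lazily described pipeline is distributed across the cluster and recomputed from lineage on failure.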
As of this writing, Drill is a very active Apache incubating project led by MapR, with six to seven companies actively participating and more than 250 people currently on the Drill mailing list. The goal of Drill is to create an interactive analysis platform for big data using a standard SQL-supporting relational database management system […]
Top 12 excuses for why our big data isn’t paying off...
The Bot 101 [ Part 4 ] Dear Bot community members, thanks...
The List of 10+ Bot Platforms for Developers and Architects!...
Top 150 Big Data & Cloud Computing Terminologies for...