What is the best big data solution for working with databases from Splunk? The answer is Splunk DB Connect! In this blog we will see how Splunk DB Connect helps us integrate databases with Splunk. It […]
The 7 Habits of Successful Big Data and NoSQL Projects, by Ben Lorica! Let's have […]
Scalable Apache Spark Solution to the Big Data Secondary Sort Problem! – Part 1 In the Big Data era, the secondary sort problem relates to sorting the values associated with a key in the reduce phase. It is also called value-to-key conversion. The secondary sorting technique helps us sort the values in ascending or […]
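The value-to-key conversion mentioned above can be sketched in plain Python (a minimal illustration with made-up records, not the Spark solution from the post): the value is folded into a composite sort key, so after sorting, a group-by yields each key's values already in order. In Spark, this idea commonly shows up via `repartitionAndSortWithinPartitions` on a composite-key RDD.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (key, value) records to demonstrate the technique.
records = [("a", 3), ("b", 1), ("a", 1), ("b", 2), ("a", 2)]

# Value-to-key conversion: sort on the composite key (key, value),
# so the "shuffle" orders the values for us.
sorted_records = sorted(records, key=lambda kv: (kv[0], kv[1]))

# Group by the natural key; each group's values arrive pre-sorted.
grouped = {k: [v for _, v in g]
           for k, g in groupby(sorted_records, key=itemgetter(0))}
# grouped == {"a": [1, 2, 3], "b": [1, 2]}
```

The point of the pattern is that no per-key in-memory sort is needed in the reducer: the ordering falls out of the sort on the composite key.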
Understand Kappa Architecture in 2 minutes What is Kappa Architecture? Kappa architecture performs all data processing in near-real-time or streaming mode. In simple terms, removing the batch layer from the Lambda architecture makes it a Kappa architecture; to learn quickly about the Lambda architecture, visit Understand Lambda Architecture in 2 minutes. Evolution […]
Understand Lambda Architecture in 2 minutes What is Lambda Architecture? Lambda architecture provides a combined solution of real-time data with batch data. What is the need for Lambda Architecture? Lambda architecture was introduced mainly due to the latency of the MapReduce paradigm, where the batch views were created on […]
This blog introduces the convergence of complementary technologies – Spark, Mesos, Akka, Cassandra, and Kafka (the SMACK stack). We will see how Apache Kafka can help us get data under control and what its role is in our data pipeline, how Spark & Akka help us process the data, and how Cassandra […]
Today, emerging big data technology firms are focused on helping enterprises build breakthrough software solutions powered by disruptive enterprise software trends like machine learning and data science, cyber-security, enterprise IoT, and cloud. Hadoop is one of the proven software platforms in the big data space, but is it the only one? No, we have many more technologies which […]
8 Breaking Changes in Apache Flink 1.0.0! Apache Flink is an open source platform for distributed stream and batch data processing. Flink's core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying […]
Looking For College Projects?
Self-Learn Yourself Scala in 21 Blogs – #1 Blog 1 – Scala the basics Thanks to communities like LinkedIn, Hadoop, Spark, Apache Software, Yahoo, and more… from dataottam. As a new learning and sharing initiative, we, the dataottam team, launched "Self-Learn Yourself Scala in 21 Blogs". Scala is where object-oriented meets functional, to have the best […]
What are RDDs, Actions, and Transformations? In Blog 6, we will see the RDD and RDD inputs, with hands-on examples. Click to have a quick read of the other blogs in this learning series. Hey, my dear friends. Before getting into a deeper dive, let's have a look at who the Spark Core maintainers are […]
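As a rough analogy for the RDD model (a sketch assuming no Spark cluster at hand), Python generators can illustrate the distinction the blog series draws: transformations are lazy and only build up a lineage of work, while an action forces evaluation and returns a result.

```python
# Hypothetical input data standing in for an RDD's source.
data = range(1, 6)

# "Transformations" build a lazy pipeline; nothing runs yet.
doubled = (x * 2 for x in data)        # like rdd.map(lambda x: x * 2)
big = (x for x in doubled if x > 4)    # like .filter(lambda x: x > 4)

# An "action" forces evaluation and returns a result to the caller.
result = list(big)                     # like .collect()
# result == [6, 8, 10]
```

Unlike real RDDs, these generators are single-use and not fault-tolerant; the analogy only captures the lazy-transformation vs. eager-action split.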
How to compute basic statistics (mean, median, SD, var, cor, cov) using the R language? The dataottam team has come up with a blog-sharing initiative called "Celebrate the Big Data Problems". In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework. Context: In statistics, mean, median, […]
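Although the post itself uses R, the same six statistics can be sketched with Python's standard library (a minimal example over made-up sample data; covariance and correlation are computed by hand so it also runs on Python versions before 3.10):

```python
import statistics as st

# Hypothetical paired samples.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 5.0, 9.0]

mean_x = st.mean(x)      # 5.0
median_x = st.median(x)  # 5.0
sd_x = st.stdev(x)       # sample standard deviation
var_x = st.variance(x)   # sample variance

# Sample covariance and Pearson correlation, by hand.
n = len(x)
mean_y = st.mean(y)
cov_xy = sum((a - mean_x) * (b - mean_y)
             for a, b in zip(x, y)) / (n - 1)
cor_xy = cov_xy / (sd_x * st.stdev(y))
```

The R one-liners the post refers to (`mean`, `median`, `sd`, `var`, `cor`, `cov`) map one-to-one onto these computations.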
Big Data Meets Microsoft Azure! For Big Data & Cloud...
How to Ingest HDFS in JSON format using Apache Sqoop?...
The 4 Key Concepts in the Anatomy of an Apache Spark Job!...
The 1-2-3-4-5-6-7-8-9 of Cognitive Computing! Dear Data...