Big Data Splunk’s Best & Better Practices ! Introduction to Splunk We see servers, devices, apps, logs, traffic, and clouds. We see data, big data, and fat data everywhere. Splunk offers the leading platform for Operational Intelligence. It enables the curious to look closely at what others ignore, namely machine data, and find […]
A First Look at Big Data Apache Flink! There is an abundance of interest in learning how to analyze streaming data in large-scale systems, partly because there are situations in which the time-value of data makes real-time analytics so appealing. But gathering in-the-moment insights made possible by very low latency applications is just one of the […]
3 Solutions for Big Data’s Small Files Problem ! In this post we will discuss efficient solutions to the “small files” problem. But what is a small file in a Big Data Hadoop environment? In the Hadoop world, a small file is a file whose size is much smaller than the HDFS block […]
The 9 Key steps to implement Big Data DevOps ! Per the wiki definition: DevOps (a clipped compound of development and operations) is a culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes. Per Gene Kim (author of The […]
Tuning Handbook of Apache Kafka! We all know the power and advantages of Apache Kafka. It is a publish-subscribe messaging system that has three major components: Apache Kafka Consumer, Apache Kafka Producer, and Apache Kafka Broker. This doc is all about how we can achieve maximum throughput when planning to run Kafka in production or in POCs. […]
Top 11 Apache Hadoop YARN Frameworks Part of the core Hadoop project, YARN is the architectural center of Hadoop that allows multiple data processing engines such as interactive SQL, real-time streaming, data science and batch processing to handle data stored in a single platform, unlocking an entirely new approach to analytics. YARN is the foundation […]
The Pyramid of Internet of Things (IoT) Alright, what is the Internet of Things (IoT)? How does it differ from the Internet of Everything? What is M2M? All of these questions would be running through your mind if you’re a beginner/newbie to this child protocol. So, the simplest answer is “They all are the same”. […]
Scalable Apache Spark Solution to Big Data Secondary Sort Problem! – Part 2 In Part 1, we discussed the Spark solution to the secondary sort problem for larger data sets. Now let’s dive deep into Choice #2. Choice #2: If we have a smaller data set then this choice will fit: read and buffer all of the […]
Scalable Apache Spark Solution to Big Data Secondary Sort Problem! – Part 1 In the Big Data era, the secondary sort problem relates to sorting values associated with a key in the reduce phase. It is also called value-to-key conversion. The secondary sorting technique will help us to sort the values in ascending or […]
Understand Kappa Architecture in 2 minutes What is Kappa Architecture? Kappa Architecture does all data processing in near-real-time or streaming mode. In simple terms, removing the batch layer from Lambda Architecture makes it a Kappa Architecture. For a quick overview of Lambda Architecture, visit Understand Lambda Architecture in 2 minutes. Evolution […]
Relationship between MapReduce, Spark, YARN, and HDFS ! In the Big Data era, Hadoop is the de facto standard for developing big data applications using the MapReduce framework. Hadoop is composed of one or more master nodes and any number of slave nodes, depending on the amount of data. Hadoop simplifies distributed applications by […]
The 8th Habit of Highly Effective Big Data Programmers ! Last week I read an interesting book called “The Seven Habits of Highly Effective Big Data Programmers” by Rekha Joshi. I am happy to share with the community what I found encouraging in the book. Let’s first understand what Big Data is. Just by listening to the […]