The 9 Key Steps to Implement Big Data DevOps! Per the Wikipedia definition: DevOps (a clipped compound of development and operations) is a culture, movement, or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes. Per Gene Kim (author of The […]
Tuning Handbook of Apache Kafka! We all know the power and advantages of Apache Kafka. It is a publish-subscribe messaging system with three major components: the Apache Kafka Consumer, the Apache Kafka Producer, and the Apache Kafka Broker. This doc is all about how to achieve maximum throughput when planning to run Kafka in production or in POCs. […]
Self-Learn Yourself Scala in 21 Blogs – #6 Blog 6 – Recursion and Tail Recursion in Functional Programming. Missed the previous blogs? Have a quick look at Self-Learn Yourself Scala in 21 Blogs (#1, #2, #3, #4, #5). In this blog, let’s understand recursion and tail recursion in functional programming. Recursion is frequently used […]
Top 11 Apache Hadoop YARN Frameworks Part of the core Hadoop project, YARN is the architectural center of Hadoop that allows multiple data processing engines such as interactive SQL, real-time streaming, data science and batch processing to handle data stored in a single platform, unlocking an entirely new approach to analytics. YARN is the foundation […]
The Pyramid of Internet of Things (IoT) Alright, what is the Internet of Things (IoT)? How does it differ from the Internet of Everything? What is M2M? All of these questions are probably running through your mind if you’re a beginner/newbie to this child protocol. So, the simplest answer is “They are all the same”. […]
Scalable Apache Spark Solution to Big Data Secondary Sort Problem! – Part 1 In the Big Data era, the secondary sort problem relates to sorting the values associated with a key in the reduce phase. It is also called value-to-key conversion. The secondary sorting technique helps us sort the values in ascending or […]
Relationship between MapReduce, Spark, YARN, and HDFS! In the Big Data era, Hadoop is the de facto standard for developing big data applications using the MapReduce framework. Hadoop is composed of one or more master nodes and any number of slave nodes, depending on the data needed. Hadoop simplifies distributed applications by […]
The 8th Habit of Highly Effective Big Data Programmers! Last week I read an interesting book called “The Seven Habits of Highly Effective Big Data Programmers” by Rekha Joshi. I am happy to share with the community what encouraged me in the book. Let’s first understand what Big Data is. Just by listening to the […]
Self-Learn Yourself Apache Spark in 21 Blogs – #8 In this blog, let us discuss how to load data, what lambdas are, how to transform data, and more on transformations. You may also want to have a quick read of the other blogs in this learning series. Apache Spark can load from any input source like […]
Self-Learn Yourself Scala in 21 Blogs – #5 Blog 5 – Does functional programming matter, and what are monads? Missed the previous blogs? Have a quick look at Self-Learn Yourself Scala in 21 Blogs (#1, #2, #3, #4). In this blog, let’s understand whether functional programming matters for Scala developers and also what is […]
11 Key Tuning Checklists for Apache Hadoop! Apache Hadoop is a well-known, de facto framework for processing large data sets through distributed and parallel computing. YARN (Yet Another Resource Negotiator) allowed Hadoop to evolve from a simple MapReduce engine to a big data ecosystem that can run heterogeneous (MapReduce and non-MapReduce) apps concurrently. This results […]
Today, emerging big data technology firms are focused on helping enterprises build breakthrough software solutions powered by disruptive enterprise software trends like machine learning and data science, cybersecurity, enterprise IoT, and cloud. Hadoop is one of the proven software platforms in the big data space, but is it only Hadoop? Nope, we have many more technologies which […]
The Artistic Guide to Big Data: Hadoop/Spark We love...
Are Docker and Containers the Same? I am thrilled in this new...
25 Free Must-Read Books in New Year 2018 on Open Source,...
Apache Spark is Superstar; but it’s Supernova on Azure for...