Data Lake Architecture Considerations & Composition
In our last blog we saw the key benefits of a Data Lake; now let’s deep dive into the internals of a Data Lake by discussing its key considerations and composition. Architecture Considerations: With any solution, it is practically difficult to arrive at a one-size-fits-all architecture; hence […]
How do we compute basic statistics (Mean, Median, SD, Var, Cor, Cov) using the R language? The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework. Context: In statistics, Mean, Median, […]
In Blog 5, we will look at the Apache Spark languages with some basic hands-on work. Click to have a quick read of the other Apache Spark blogs in this learning series. With our Apache Spark cloud setup in place, we are now ready to develop big data Spark applications. Before getting started with building Spark applications, let’s […]
Celebrate the Big Data Problems – #2
How do we identify the number of buckets for a Hive table when executing HiveQL DDLs? The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, […]
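The excerpt stops before the post’s answer, so the following is only a common rule of thumb, not necessarily the approach the post takes: choose enough buckets that each one holds roughly one HDFS block of data. The 128 MB target and the power-of-two rounding are assumptions, and `estimate_bucket_count` is a hypothetical helper.

```python
import math


def estimate_bucket_count(table_size_bytes: int,
                          target_bucket_bytes: int = 128 * 1024 * 1024,
                          round_to_power_of_two: bool = True) -> int:
    """Hypothetical rule of thumb: enough buckets that each holds roughly
    target_bucket_bytes of data (128 MB assumed, a common HDFS block size)."""
    if table_size_bytes <= 0:
        return 1
    buckets = math.ceil(table_size_bytes / target_bucket_bytes)
    if round_to_power_of_two:
        # Round up to the next power of two (an assumed convention, not a Hive requirement).
        buckets = 1 << (buckets - 1).bit_length()
    return buckets


# Example: a hypothetical 10 GB table needs 80 buckets of 128 MB, rounded up.
print(estimate_bucket_count(10 * 1024**3))  # prints 128
```

The resulting number would go into the `CLUSTERED BY (...) INTO n BUCKETS` clause of the table’s DDL; in practice the bucketing column’s cardinality and skew also matter, not just raw table size.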
In Blog 4, we will see what Apache Spark Core and its ecosystem are, as well as Apache Spark on the AWS Cloud. Click to have a quick read of blog 1, blog 2, and blog 3 in this learning series. Apache Spark has many components, including Spark Core, which is responsible for Task Scheduling, Memory Management, Fault Recovery, […]
In this blog we will share the titles for learning Apache Spark, the basics of Hadoop, which is one of the big data tools, and the motivations for Apache Spark, which is not a replacement for Apache Hadoop but its big data companion.
Blog 1 – Introduction to Big Data
Blog 2 – Hadoop, Spark’s Motivations
Blog […]
The term Data Lake has been gaining popularity recently, as many enterprises have incorporated it into their analytics software. Every word and phrase used to describe a Data Lake has given us useful insight into how we interpret it. So we at dataottam decided to understand the various ways Data Lake […]
We have received many requests from friends who regularly read our blogs to provide a complete guide to sparkling in Apache Spark. So we have come up with a learning initiative called “Self-Learn Yourself Apache Spark in 21 Blogs”. We have drilled down into various sources and archives to provide a perfect learning path […]
Best wishes for the New Year from the whole dataottam team. In this blog, let’s discuss why and how Apache Hadoop YARN came about. YARN’s requirements emerged and evolved from the practical needs of long-standing Hadoop cluster deployments, both small and large, and we discuss how each of these requirements ultimately shaped YARN. […]
As of this writing, Drill is a very active Apache incubating project led by MapR, with six to seven companies actively participating and more than 250 people currently on the Drill mailing list. The goal of Drill is to create an interactive analysis platform for Big Data using a standard SQL-supporting relational database management system […]
Is Apache Hadoop the only option for implementing Big Data? No, Hadoop is not the only option for big data problems; it is one of several solutions. The HPCC (High Performance Computing Cluster) Systems technology is an open source, data-driven, data-intensive processing and delivery platform developed by LexisNexis Risk Solutions. HPCC Systems incorporates […]
Thanks to Zaloni and to Creating a Data-Driven Organization by Carl Anderson. It is a fantastic, very well narrated book, and I would like to share our learning with our big data & IoT community. Many organizations think that simply because they generate a lot of reports or have many dashboards, they are data-driven. Although those activities […]