Celebrate the Big Data Problems – #4 What are the possible ways of command-level searching in Linux? The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework. Context: Search in […]
Data Lake Architecture Considerations & Composition In our last blog we saw the key benefits of a Data Lake; now let’s dive deep into the internals of a Data Lake by discussing the key considerations and composition. Architecture Considerations: As with any solution, it is practically difficult to arrive at a one-size-fits-all architecture; hence […]
What are RDDs, Actions, and Transformations? In Blog 6, we will look at the RDD and RDD input, with hands-on examples. Click for a quick read of the other blogs in this learning series. Hey, my dear friends. Before diving any deeper, let’s have a look at who are the Spark Core Maintainers […]
A Data Lake has a flexible definition. To back up this statement, the dataottam team took the initiative and released an eBook called “The Collective Definition of Data Lake by Big Data Community”, which contains many definitions from various business-savvy people and technologists. In a nutshell, a Data Lake is a data store and data processing system, where […]
In Blog 5, we will look at the Apache Spark languages with basic hands-on examples. Click for a quick read of the other Apache Spark blogs in this learning series. With our cloud setup of Apache Spark in place, we are now ready to develop big data Spark applications. And before getting started with building Spark applications let’s […]
Celebrate the Big Data Problems – #2 How to identify the number of buckets for a Hive table while executing the HiveQL DDLs? The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, […]
Self-Learn Yourself IoT in 21 Blogs – #1 – In this blog we will see: What is IoT? Why do we need it? What is its significance and impact on modern life? Time to Greet the New Clone that is set to rule the world! Hello All! Well, this is my first blog for dataottam and […]
Big Data is a problem statement, and it can be solved with tools like Apache Hadoop. But setting up Apache Hadoop as the infrastructure for our proofs of concept and proofs of value is a little challenging. Hence we bring you 3-click ideas to get your Apache Hadoop installed. What are the Prerequisites? Ubuntu 14.04, Internet Connection […]
In Blog 4, we will look at Apache Spark Core and its ecosystem, and at Apache Spark on the AWS Cloud. Click for a quick read of blog 1, blog 2, and blog 3 in this learning series. Apache Spark has many components including Spark Core which is responsible for Task Scheduling, Memory Management, Fault Recovery, […]
In Blog 3, we will look at Apache Spark’s history and its role as a unified platform for Big Data; you may also like a quick read of blog 1 and blog 2. Spark was initially started by Matei Zaharia at UC Berkeley AMPLab in 2009, and open sourced in 2010 under a BSD license. In 2013, the […]
In this new year, 2016, we should be excited that the Apache Spark community has released and announced the availability of Apache Spark 1.6, which is the 7th release on the 1.x line. Committers – Contributors to Spark have crossed 1000, double the earlier number. Patches – Apache Spark 1.6 includes 1000 patches. Run […]
Best wishes to you this holiday, and Happy New Year, from all of us at dataottam. This blog introduces Spark’s core abstraction for working with data, the RDD (Resilient Distributed Dataset). An RDD is simply a distributed collection of elements or objects (Java, Scala, Python, and user-defined functions) across the Spark cluster. In Spark […]
The List of 10+ Bot Platform for Developer and Architects!...
Top 150 Big Data & Cloud Computing Terminologies for...
The Bot 101 [ Part 3 ] Dear Bot community members, thanks...
Top 5 Focuses to Improve Cloud-Native Application...