The Apache Hive community has announced the availability of Apache Hive 2.0.0, its largest release. It brings exciting improvements in new functionality, performance, optimization, security, and usability. Let us explore the features in detail below. HBase to store Hive metadata – The current metastore […]
Blog 2 – Let’s get started with Scala. Just type scala in your environment to start the Scala interpreter; if everything is fine, you will be greeted with the scala> prompt. If you have a problem with installation, please follow the link, which has step-by-step explanations. So we are good to explore the Scala commands. Now […]
Self-Learn Yourself Scala in 21 Blogs – #1 Blog 1 – Scala: the basics. Thanks to communities like LinkedIn, Hadoop, Spark, Apache Software, Yahoo, and more… from dataottam. As a new learning and sharing initiative, we, the dataottam team, launched “Self-Learn Yourself Scala in 21 Blogs”. Scala is where object-oriented meets functional to have best […]
Self-Learn Yourself Apache Spark in 21 Blogs – #7 Key Concepts of Resilient Distributed Datasets (RDDs) and more… In this blog we look at how to create RDDs and what operations we can perform on them. Have a quick read of the other blogs in this learning series. In simple terms, RDD (Resilient Distributed Dataset); if data in […]
Celebrate the Big Data Problems – #4 What are the possible ways of command-level searching in Linux? The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework. Context: Search in […]
Data Lake Architecture Considerations & Composition In our last blog we saw the key benefits of a Data Lake; now let’s dive deep into the internals of a Data Lake by discussing the key considerations and composition. Architecture Considerations: As with any solution, it is practically difficult to arrive at a one-size-fits-all architecture; hence […]
What are RDDs, Actions, and Transformations? In Blog 6, we will look at the RDD and RDD input, with hands-on practice. Click to have a quick read of the other blogs in this learning series. Hey, my dear friends. Before diving deeper, let’s have a look at who are the Spark Core Maintainers […]
TCP/IP Layer-wise IoT Protocols Hello, everyone!! Thanks a lot for your valuable response to the previous blog. In this post, I will explain the basics of the TCP (Transmission Control Protocol)/IP (Internet Protocol) stack and the IoT protocols associated with each layer. Anyone who has prior knowledge of the TCP/IP stack can […]
How do we compute basic statistics (Mean, Median, SD, Var, Cor, Cov) using the R language? The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework. Context: In statistics, Mean, Median, […]
A Data Lake has a flexible definition. To bear this out, the dataottam team took the initiative and released an eBook called “The Collective Definition of Data Lake by Big Data Community”, which contains many definitions from various business-savvy people and technologists. In a nutshell, a Data Lake is a data store and data processing system, where […]
In Blog 5, we will look at the Apache Spark languages with basic hands-on practice. Click to have a quick read of the other Apache Spark blogs in this learning series. With our cloud setup of Apache Spark in place, we are now ready to develop big data Spark applications. But before getting started with building Spark applications, let’s […]
Celebrate the Big Data Problems – #2 How to identify the number of buckets for a Hive table while executing HiveQL DDLs? The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, […]
The Bot 101 [Part 1] For me, bot is a new word; on first time...
Getting Started with Google Cloud Platform! Last month got...
Top 10 Reasons to Run Hadoop in the Public Cloud! Hadoop...