Google Cloud 101-The Gist of GCP for the Cloud Community
Many thanks to the Stockholm Google Cloud OnBoard team, which inspired me to share the post below. The HenryTheOwl second edition, “Google Cloud 101-The Gist of GCP for the Cloud Community”, tries to answer: what is Google Cloud all about? Why we […]
The Artistic Guide to Big Data: Hadoop/Spark
We love community. Hence, as a first initiative of the new year 2018, we are sharing the coffee chat ideas on “The Artistic Guide to Big Data: Hadoop/Spark”. The idea is to bring an artistic touch to the explanation of big data concepts. We are not talking of […]
Apache Spark is Superstar; but it’s Supernova on Azure for Big Data Analytics Initiatives
Dear Cloud & Data Community, Happy Christmas! In this post I am happy to share the facts about Apache Spark, especially what makes it so special, a supernova, when it is spun up on Azure HDInsight. Big Data […]
Big Data Meets Microsoft Azure!
For Big Data & Cloud Community members, this post on “Big Data, Meet Azure” is all about doing big on the Azure public cloud. Sure, we need no definition of Big Data and Cloud Computing, but in a line, I would like to call both a supernova for […]
How to Ingest HDFS in JSON format using Apache Sqoop? by NS Saravanan
Our current project uses a lambda architecture, so data from the source systems is extracted in two ways: real-time streaming (the speed layer) or batch processing (the batch layer). The speed layer is implemented using Attunity > Kafka > Spark Streaming. The output of the Spark stream […]
The 4 Key Concepts in the Anatomy of an Apache Spark Job!
For Big Data & Cloud Community members: Apache Spark is awesome at handling any workload, such as batch, streaming, real-time, and ad hoc. However, to fine-tune and optimize our Apache Spark applications, we need to have a grip on the Apache Spark […]
What is Beyond Classic Hadoop? Is it Spark and Flink?
In this blog, we will explore Hadoop's two new big data friends: Apache Spark and Apache Flink. If we look at improvements to Hadoop's parallel processing with MapReduce, speed is the very first focus. However, MapReduce is designed and developed for […]
A First Look at Big Data Apache Flink!
There is an abundance of interest in learning how to analyze streaming data in large-scale systems, partly because there are situations in which the time-value of data makes real-time analytics so eye-catching. But gathering in-the-moment insights made possible by very low latency applications is just one of the […]
3 Solutions for Big Data’s Small Files Problem!
In this post we will discuss efficient solutions to the “small files” problem. What is a small file in a Big Data Hadoop environment? In the Hadoop world, a small file is a file whose size is much smaller than the HDFS block […]
Self-Learn Yourself Scala in 21 Blogs – #6
Blog 6 – Recursion and Tail Recursion in Functional Programming. If you missed the previous blogs, have a quick look at Self-Learn Yourself Scala in 21 Blogs (#1, #2, #3, #4, #5). In this blog let’s understand recursion and tail recursion in functional programming. Recursion is frequently used […]
Scalable Apache Spark Solution to Big Data Secondary Sort Problem! – Part 2
In Part 1, we discussed the Spark solution to secondary sort for larger data sets. Now let’s deep dive into Choice #2. Choice #2: If we have a smaller data set then this choice will fit, like read and buffer all of the […]
Scalable Apache Spark Solution to Big Data Secondary Sort Problem! – Part 1
In the Big Data era, the secondary sort problem relates to sorting the values associated with a key in the reduce phase. It is also called value-to-key conversion. The secondary sorting technique helps us sort the values in ascending or […]