Top 10 Reasons to Run Hadoop in the Public Cloud ! Running the Hadoop ecosystem in the public cloud means running Hadoop clusters on hardware offered by a cloud service provider. Contrast this with running Hadoop clusters on our own hardware, known as on-premises clusters or “on-prem”. But installing […]
Ten Fascinating Things from Google Cloud Next 2017 ! In San Francisco last week at Pier 48, Google Cloud Platform (GCP) executives held a user conference to introduce products and services they hope will make the case for choosing Google in the cloud. I missed it this year, but thanks to the internet, where […]
Cloud computing is all the buzz and hype, and it has become a mandate for IT and data professionals. But before we can decide on any cloud model, we need to determine which cloud service model is ideal for our business. That will help us cut through all the […]
Top 10 Cloud Computing Worst Practices !
High-Level Framework of Big Data Graph Databases! In the Big Data world, it was very clear that storing and processing connected data was the first challenge. The first idea was to replace the tabular SQL semantics with a graph-centric model. But graphs are new to big […]
Comparing Architecture Characteristics in a Big Data Context! In this blog we’ll explore the differences between microservices and SOA in terms of the defining characteristics of each architecture pattern. In the Big Data world, Apache Hadoop has come a long way in its relatively short lifespan. From its beginnings as a reliable storage pool with integrated batch […]
Requirement: take a backup of our cluster data for disaster recovery. Approach: we are going to use the Glacier storage service provided by AWS. About Glacier storage: Glacier is designed to address the shortcomings of a number of traditional archive solutions, such as tape and disk archiving, none of which is completely satisfactory. Glacier leverages the […]
Intra Cluster copying using DistCp. Step 1: get the NameNode information for both clusters using the command hdfs getconf -namenodes. Step 2: verify access to HDFS on both clusters using the commands hdfs dfs -ls hdfs://Namenode1:8020/data/file.txt and hdfs dfs -ls hdfs://Namenode2:8020/data/. Once successful, move to Step 3 […]
Here, let us see what kinds of data organizations want to ingest into Hadoop for business or analytics insights. Basically, large volumes of data and unstructured data are strong candidates for Hadoop. Clickstream data: clickstream data is the stream of clicks someone performs when visiting a website. This information can be used for […]
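As a minimal illustration of what makes clickstream data analyzable, here is a hedged Python sketch. The record layout (session id, timestamp, URL) and both helper functions are assumptions for illustration, not anything from the original post; real clickstream logs vary widely by site.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream records: (session_id, timestamp, url).
clicks = [
    ("s1", "2017-03-01T10:00:00", "/home"),
    ("s1", "2017-03-01T10:00:05", "/products"),
    ("s2", "2017-03-01T10:01:00", "/home"),
    ("s1", "2017-03-01T10:02:00", "/checkout"),
]

def clicks_per_session(records):
    """Count how many pages each visitor session touched."""
    return dict(Counter(session_id for session_id, _, _ in records))

def pages_by_session(records):
    """Reconstruct the ordered click path of each session."""
    paths = defaultdict(list)
    for session_id, _, url in sorted(records, key=lambda r: r[1]):
        paths[session_id].append(url)
    return dict(paths)

print(clicks_per_session(clicks))  # {'s1': 3, 's2': 1}
print(pages_by_session(clicks))    # {'s1': ['/home', '/products', '/checkout'], 's2': ['/home']}
```

At Hadoop scale the same grouping would typically be expressed as a MapReduce, Hive, or Spark job rather than in-memory Python, but the per-session aggregation logic is the same.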
Self-Learn Yourself Scala in 21 Blogs – #7. Missed the previous blogs? Have a quick look at Self-Learn Yourself Scala in 21 Blogs (#1, #2, #3, #4, #5, #6). In this blog, let’s understand evaluation strategies in Scala programming. There are two common evaluation strategies in Scala: call by value and call by name. The […]
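Scala marks call-by-name parameters with a `=> T` type, and the compiler wraps the argument expression automatically. As a rough cross-language sketch of the difference (this is Python using zero-argument lambdas as explicit thunks, not Scala code from the blog series), the key observable effect is how many times the argument expression gets evaluated:

```python
calls = []

def expensive():
    """Stand-in for a costly argument expression; records each evaluation."""
    calls.append("evaluated")
    return 21

def by_value(x):
    # Call by value: the argument was already evaluated, exactly once,
    # before the function body runs.
    return x + x

def by_name(thunk):
    # Call by name: the argument expression is re-evaluated on every use.
    return thunk() + thunk()

calls.clear()
print(by_value(expensive()), len(calls))          # 42 1
calls.clear()
print(by_name(lambda: expensive()), len(calls))   # 42 2
```

In Scala, `def byName(x: => Int) = x + x` plays the role of the thunk-taking function, with no explicit lambda needed at the call site.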
The 9 Key Steps to Implement Big Data DevOps ! Per the Wikipedia definition: DevOps (a clipped compound of “development” and “operations”) is a culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes. Per Gene Kim (author of The […]
Top 11 Apache Hadoop YARN Frameworks. Part of the core Hadoop project, YARN is the architectural center of Hadoop that allows multiple data processing engines, such as interactive SQL, real-time streaming, data science and batch processing, to handle data stored in a single platform, unlocking an entirely new approach to analytics. YARN is the foundation […]
Top 12 excuses for why our big data isn’t paying off...
The Bot 101 [ Part 4 ] Dear Bot community members, thanks...
The List of 10+ Bot Platforms for Developers and Architects!...
Top 150 Big Data & Cloud Computing Terminologies for...