Top 10 Reasons to Run Hadoop in the Public Cloud! Running the Hadoop ecosystem in the public cloud means running Hadoop clusters on hardware offered by a cloud service provider. This practice is becoming business as usual, compared with running Hadoop clusters on our own hardware, known as on-premises or “on-prem” clusters. But installing […]
What is Beyond Classic Hadoop? Is it Spark and Flink? In this blog, we will explore two new big data companions to Hadoop: Apache Spark and Apache Flink. When we look at improvements over Hadoop’s parallel-processing MapReduce, speed is the very first focus. However, MapReduce was designed and developed for […]
Sqoop Use Cases Introduction: Sqoop was originally developed by Cloudera. You can import data from a relational database into HDFS, as well as export it back from HDFS to the relational database. Sqoop supports many RDBMSs, not limited to just MySQL; it also supports legacy systems such as DB2 on mainframes. Sqoop use cases: ELT: Extract, Load, Transform […]
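The import/export round trip described above can be sketched with two Sqoop invocations. This is a minimal sketch only: the hostname, database, table names, and credentials (`dbhost`, `sales`, `orders`, `etl_user`) are placeholder assumptions, not from the original post, and the commands assume a configured Hadoop cluster with the matching JDBC driver on the Sqoop classpath.

```shell
# Import a relational table into HDFS (placeholder connection details).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders

# Export processed results from HDFS back to a relational table.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/curated/orders_summary
```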
Inter-Cluster Copying Using DistCp Step 1: Get to know the namenode information of both clusters using the command hdfs getconf -namenodes Step 2: Verify accessibility to HDFS on both clusters using the commands hdfs dfs -ls hdfs://Namenode1:8020/data/file.txt hdfs dfs -ls hdfs://Namenode2:8020/data/ Once successful, move to Step 3 […]
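Once both namenodes are reachable, the copy itself is typically a single DistCp invocation. The following is a sketch based on the paths shown in the steps above; `Namenode1` and `Namenode2` are the placeholder hostnames from the excerpt, and the command assumes it is run on a node with access to both clusters.

```shell
# Copy a file from cluster 1 to cluster 2 using the namenodes verified above.
# DistCp runs as a MapReduce job, so YARN must be available on the source side.
hadoop distcp \
  hdfs://Namenode1:8020/data/file.txt \
  hdfs://Namenode2:8020/data/
```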
Here, let us see what kinds of data organizations want to ingest into Hadoop for their business or analytics insights. Basically, large volumes of data and unstructured data are strong candidates for Hadoop. Clickstream data: Clickstream data is the stream of clicks someone performs when visiting a website. This information can be used for […]
Today, emerging big data technology firms focus on helping enterprises build breakthrough software solutions powered by disruptive enterprise software trends like machine learning and data science, cyber-security, enterprise IoT, and cloud. Hadoop is one of the proven technologies in the big data space, but is it only Hadoop? No, we have many more technologies which […]
It’s very clear that every stakeholder, from business to IT teams, is gravitating toward Big Data, but the first and foremost challenge is finding the right tool fit. The power of open source frequently brings us additional, promising tools. Hopefully, this study of Hadoop distributions will help us with the first […]
The Data Lake vs. Data Warehouse in Big Data! Big Data use cases are evolving across verticals like insurance, healthcare, manufacturing, financial services, retail, and more. Customers are using Big Data to improve top- and bottom-line revenue with business value. In this data-driven era, enterprise readiness and data management […]
Team, tons and thousands of thanks for reading and engaging! This time, it is my pleasure to share with you all my learnings on data import and export with Hadoop’s file system, the core component that pumps data to databases, warehouses, analytics, and business. We titled it “Heart of the Hadoop is HDFS“. It’s no […]
Team, this time I go with the title “Top 3 Methods of Skipping Big Data’s Bad Data Using Hadoop!“, which describes how to weed corrupt records out of large data sets containing differently formatted data. While doing our analysis, if the corrupt records are a small percentage, we can ignore or […]
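One simple way to separate corrupt records from good ones, before the data ever reaches a MapReduce job, is a field-count filter. This is an illustrative sketch only, not one of the post’s three methods: the sample file, field count, and output paths are all made up for the example.

```shell
# Create a small sample file: the third line is a corrupt record
# (it does not have the expected three comma-separated fields).
printf '1,alice,30\n2,bob,25\nbad_record\n4,carol,41\n' > /tmp/sales_sample.csv

# Keep records with exactly 3 fields; route everything else to a bad-records file
# so it can be inspected later instead of silently dropped.
awk -F',' 'NF == 3 {print > "/tmp/clean_records.csv"; next}
           {print > "/tmp/bad_records.csv"}' /tmp/sales_sample.csv
```

The same idea scales up on a cluster: run the filter as a map-only step and send rejects to a quarantine directory, so the small percentage of bad data never skews the analysis.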
Team, thanks for reading and engaging! This time I plan to share with you my learnings on Hadoop schedulers, titled “Simplified Hadoop Schedulers Overview!“ By choosing a suitable scheduler, we can make response times faster for all smaller jobs, while all production jobs are guaranteed SLAs (Service […]
Hadoop compression techniques bring us benefits in Hadoop I/O operations, such as space savings and faster processing. We have many compression formats and algorithms, each with pros and cons. Nothing new is added here; it is just consolidated to have handy for production implementations. All techniques exhibit a space/time trade-off. We have options from […]
Big Data Meets Microsoft Azure! For Big Data & Cloud...
How to Ingest into HDFS in JSON Format Using Apache Sqoop?...
The 4 Key Concepts in the Anatomy of an Apache Spark Job!...
The 1-2-3-4-5-6-7-8-9 of Cognitive Computing! Dear Data...