BlinkDB, a project being developed at the University of California, Berkeley, where Spark also originated, is a massively parallel interactive query engine that processes tens of TB of data with response times of just the blink of an eye. BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by […]
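To make the accuracy-for-response-time trade-off concrete, here is a minimal sketch in plain Python. It is not BlinkDB's API or its actual sampling strategy; the function name `approximate_mean`, the synthetic data and the sample size are all assumptions for illustration. The idea it shows is only that answering from a small random sample returns an estimate plus an error margin far sooner than scanning everything.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a uniform random sample (hypothetical helper).

    Returns (estimate, margin), where margin is a rough 95% error bound
    based on the standard error of the sample mean.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # 1.96 is the normal-distribution quantile for a 95% interval.
    margin = 1.96 * statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, margin


# Synthetic "table column": 200,000 values, invented for this sketch.
population = [float(i % 1000) for i in range(200_000)]

# Touch only 5% of the rows to get an approximate answer.
estimate, margin = approximate_mean(population, sample_size=10_000)

# Full scan, for comparison only.
exact = statistics.fmean(population)

print(f"approx = {estimate:.1f} +/- {margin:.1f}, exact = {exact:.1f}")
```

A larger `sample_size` shrinks the margin but costs more time, which is the dial the user gets to turn.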
It’s very clear that every stakeholder, from business to IT teams, is gravitating towards Big Data, but the first and foremost challenge is finding the right tool fit. And the power of open source keeps bringing us more potential tools. I hope this study of Hadoop distributions will help us with the first […]
Apache Spark: 4+ years old; suited for sophisticated analytics at lightning speed; runs up to 100 times faster than Hadoop MapReduce in memory; runs up to 10 times faster on disk; supports in-memory processing; suited to interactive computing at blazing-fast speeds; supports developers with Java, Python & Scala APIs; runs on an existing Hadoop cluster; compatible with HDFS, HBase and any […]
Apache Spark is a fast and general engine for big data processing, with libraries for SQL, streaming and advanced analytics. The RDD, which stands for Resilient Distributed Dataset, is a great abstraction for data sets: an immutable collection of data. In Spark, all work is expressed as one of the following: creating new RDDs, transforming existing RDDs, or calling operations on RDDs (e.g. val […]
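The three kinds of work on an RDD (create, transform, call an operation) can be sketched without a cluster. The class below is a toy written in plain Python, not Spark's API: the name `ToyRDD` and its methods are assumptions made for illustration, and nothing here is distributed or fault tolerant. It only shows that the data stays immutable, that each transformation returns a new object, and that nothing is computed until an operation such as `collect` or `count` is called.

```python
class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark code)."""

    def __init__(self, source, steps=()):
        self._source = tuple(source)  # immutable snapshot of the input data
        self._steps = tuple(steps)    # recorded transformations, in order

    # Transformations: record the step and return a NEW ToyRDD.
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def _compute(self):
        # Replay the recorded steps lazily over the source data.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Operations: these actually trigger the computation.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


# 1. Create a new "RDD".
numbers = ToyRDD(range(1, 11))

# 2. Transform existing ones; no work happens yet.
squares = numbers.map(lambda x: x * x)
even_squares = squares.filter(lambda x: x % 2 == 0)

# 3. Call operations; the recorded steps run now.
print(even_squares.collect())
print(even_squares.count())
print(numbers.collect())  # the original is unchanged
```

In real Spark the recorded steps also let a lost partition be recomputed, which is where the "resilient" part of the name comes from.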
Spark SQL is a Spark interface for both structured and semi-structured data. It loads data from a variety of structured sources such as Hive tables, JSON and Parquet columnar storage. Spark SQL allows querying data using SQL, both inside and outside the Spark core engine. It provides robust integration between SQL and Python/Java/Scala code. Spark SQL […]
Spark Streaming is Spark's module for applications that benefit from acting on data as soon as it arrives from various sources, e.g. tracking page views in real time, training a machine learning model, or automatically detecting anomalies. Developers can use an API that is very similar to the one for batch jobs, so we can reuse the same API skills and […]
The Data Lake vs. Data Warehouse in Big Data! Big Data use cases are evolving across verticals such as Insurance, Healthcare, Manufacturing, Financial, Retail and more. Customers are using Big Data to improve top- and bottom-line results with business value. In this data-driven era, enterprise readiness and data management […]
Team, tons of thanks for reading and engaging! This time it is my pleasure to share with you all my learnings on data import and export for Hadoop’s file system, which is the core component that pumps data to databases, warehouses, analytics and the business. We titled it “Heart of the Hadoop is HDFS“. It’s no […]
Team, this time I go with the title “Top 3 methods of skipping big data’s bad data using Hadoop !“, which describes how to get corrupt records out of large data sets that hold data in different formats. While doing our analysis, if the corrupt records are a small percentage, we can ignore them or […]
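The "ignore them if they are a small percentage" idea can be sketched in plain Python. This is not one of the post's three Hadoop methods, which are cut off in the excerpt above. The function name `load_skipping_bad`, the JSON-lines input format and the 5% default threshold are all assumptions made for illustration.

```python
import json


def load_skipping_bad(lines, max_bad_fraction=0.05):
    """Parse JSON records, skipping corrupt ones (hypothetical helper).

    Returns (good_records, bad_count). Raises ValueError when the share
    of corrupt records exceeds max_bad_fraction, because skipping is
    only safe when bad data is a small part of the input.
    """
    good, bad = [], 0
    for line in lines:
        try:
            good.append(json.loads(line))
        except json.JSONDecodeError:
            bad += 1  # corrupt record: count it and move on
    total = len(good) + bad
    if total and bad / total > max_bad_fraction:
        raise ValueError(f"{bad} of {total} records are corrupt")
    return good, bad


# Invented sample input: three valid records and one corrupt line.
raw = ['{"id": 1}', '{"id": 2}', 'not json', '{"id": 3}']

# The threshold is relaxed here only because the sample is tiny.
records, skipped = load_skipping_bad(raw, max_bad_fraction=0.5)
print(len(records), "records kept,", skipped, "skipped")
```

Counting the skipped records, instead of dropping them silently, is what lets the job decide whether the bad share is still small enough to ignore.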
Team, thanks for reading & engaging! This time I plan to share with you my learning on Hadoop schedulers, titled “Simplified Hadoop Schedulers Overview !” By choosing a suitable scheduler, we can make response times faster for all the smaller jobs, while all the production jobs are guaranteed their SLAs (Service […]
Big Data gives new insights into what people do on their own, and on a massive scale. Thick Data reveals motivations, intent and emotions that might not be obvious from Big Data. This is not my own idea; I learnt it from the book Data-informed Product Design by Pamela Pavliscak. Tons of thanks to the O’Reilly team. Many thanks for your valuable time.
Pleased to share a single slide on Data Lake vs. Data Warehouse, which includes definitions, key properties, use cases & user groups. To conclude, for a data-driven enterprise, both the data warehouse and the data lake play a vital role. In a nutshell: Data Lake + Data Warehouse = Business Value! Thanks for your time (TREASURED) and engagement. As […]