Process Your Query in the Blink of an Eye
BlinkDB, a project being developed at the University of California, Berkeley, where Spark also originated, is a massively parallel, interactive query engine that processes tens of terabytes of data with response times of just a blink of an eye.
BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.
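To make the idea of answering from a sample with error bars concrete, here is a minimal Python sketch. It is not BlinkDB code: the function `approximate_mean`, the synthetic `sessions` data, and the use of a plain uniform sample with a normal-approximation 95% confidence interval are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin). With z = 1.96, estimate +/- margin is an
    approximate 95% confidence interval (normal approximation).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    margin = z * statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, margin

# Hypothetical data set: 200,000 session durations in seconds
data_rng = random.Random(42)
sessions = [data_rng.expovariate(1 / 300) for _ in range(200_000)]

# Scan only 5% of the rows and report the answer with an error bar
estimate, margin = approximate_mean(sessions, sample_size=10_000)
print(f"avg session time ~ {estimate:.1f} s +/- {margin:.1f} s")
```

The error bar shrinks roughly with the square root of the sample size, which is why a small sample can already give a usable answer.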
The two key ideas behind BlinkDB are:
- An adaptive optimization framework that builds and maintains a set of multi-dimensional samples from the original data over time.
- A dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy and/or response time requirements.
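The second idea can be sketched in a few lines of Python. This is a simplified illustration, not BlinkDB's actual algorithm: it assumes plain uniform samples instead of multi-dimensional ones, considers only an accuracy target (not a response time target), and the helpers `build_samples`, `estimate_with_error`, and `answer_query` are hypothetical names.

```python
import math
import random
import statistics

def build_samples(values, sizes, seed=0):
    """Pre-build uniform random samples of several sizes (hypothetical helper)."""
    rng = random.Random(seed)
    return {n: rng.sample(values, n) for n in sorted(sizes)}

def estimate_with_error(sample, z=1.96):
    """Return (mean, relative error) for a sample at roughly 95% confidence."""
    mean = statistics.fmean(sample)
    margin = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, margin / abs(mean)

def answer_query(samples, max_relative_error):
    """Answer from the smallest sample whose estimated error meets the target."""
    if not samples:
        raise ValueError("no samples available")
    for size in sorted(samples):
        mean, rel_err = estimate_with_error(samples[size])
        if rel_err <= max_relative_error:
            break
    # If no sample met the target, this is the result from the largest sample
    return size, mean, rel_err

# Hypothetical data set: 200,000 session durations in seconds
data_rng = random.Random(42)
sessions = [data_rng.expovariate(1 / 300) for _ in range(200_000)]
samples = build_samples(sessions, sizes=[100, 1_000, 10_000, 100_000])

size, mean, rel_err = answer_query(samples, max_relative_error=0.05)
print(f"used {size} rows: avg ~ {mean:.1f} s, relative error ~ {rel_err:.1%}")
```

A looser error target lets the query run on a smaller sample and finish sooner, while a tighter target pushes it toward a larger sample.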
BlinkDB has been benchmarked against Hive on Spark and Hive on MapReduce. In these results, BlinkDB ran 200x faster than Hive with an error of 2-10%, mainly because of its dynamic sample selection strategy.
Please comment below if you feel BlinkDB could replace all the query processing engines in the big data ecosystem.
For more information, please visit: http://blinkdb.org/