Why is Spark better than MapReduce?
As the size of the data grows, processing time on the usual engines grows with it, and new tools keep coming up to tackle this problem: first MapReduce, now Spark, and no doubt something replacing Spark in the near future. But sticking to our title: Spark processes data in-memory, while MapReduce pushes intermediate data to disk between stages, and this is one main reason, among many others, why Spark scores over MapReduce. Spark is also very easy to use, and by easy I mean really easy: unlike MapReduce, which has a lot of boilerplate work to be done, a simple piece of logic that requires 100+ lines of MapReduce code can be accomplished in fewer than 20 lines of Spark.
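To make that concrete, here is a minimal word-count sketch in Spark's Scala RDD API; the input path and the local[*] master are placeholders for illustration. The MapReduce equivalent needs a mapper class, a reducer class, and a driver just to do the same thing:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Local mode and a placeholder input path, purely for illustration
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("input.txt")   // read lines
      .flatMap(_.split("\\s+"))             // split lines into words
      .map(word => (word, 1))               // pair each word with a count of 1
      .reduceByKey(_ + _)                   // sum the counts per word

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```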
And it does not stop there: advanced workloads involving graph processing or machine learning are far easier to express in Spark than in MapReduce. Nowadays most machines ship with large amounts of memory, and arguably more than 80% of business scenarios fall into the category of data that fits in it, making Spark the natural match for them compared to MapReduce.
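As a hedged illustration of "machine learning in a few lines", here is a small k-means clustering sketch using Spark's MLlib RDD API; the toy points and cluster count are made up for the example:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("KMeansSketch").setMaster("local[*]"))

    // Toy 2-D points; real input would come from a file or table
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)
    )).cache() // k-means is iterative, so keep the data in memory

    val model = KMeans.train(points, 2, 20) // 2 clusters, up to 20 iterations
    model.clusterCenters.foreach(println)
    sc.stop()
  }
}
```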
Spark also has an advanced DAG (directed acyclic graph) execution engine that supports in-memory computing and iterative data flow, which is behind its claimed speed of up to 100x faster than MapReduce for in-memory workloads. What's more, in Spark you can write applications in various programming languages like Java, Python, Scala, and R; the Spark framework itself has been developed in Scala. Spark can also run against Hadoop-ecosystem data sources like Cassandra, HBase, Hive, Tachyon, etc.
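The in-memory point is easiest to see with an iterative job. In this sketch (the numbers and iteration count are arbitrary), cache() keeps the RDD in memory, so every pass after the first reads from RAM rather than recomputing it, which is exactly where MapReduce would be writing and rereading intermediate results on disk:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IterativeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("IterativeSketch").setMaster("local[*]"))

    // cache() pins the dataset in memory across iterations
    val nums = sc.parallelize(1 to 1000000).cache()

    var total = 0L
    for (i <- 1 to 5) {
      // each pass reuses the cached data instead of rebuilding it
      total += nums.map(_.toLong * i).reduce(_ + _)
    }
    println(s"total = $total")
    sc.stop()
  }
}
```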
With Hadoop MapReduce we had to use many complementary tools: Storm for streaming, Giraph for graphs, Hive for SQL, and so on. With Spark, all of this comes together as one generalized abstraction, bringing the big-data pipeline into a single unit: Spark Core along with the Streaming library, SQL processing, and even machine learning (MLlib) and graph computation (GraphX), making it much stronger.
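As a small sketch of that unification (the table name and toy rows are invented for the example, using the Spark 1.x SQLContext API), the same SparkContext that runs batch code like the examples above can also serve SQL queries, with streaming, MLlib, and GraphX plugging into the same engine:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object UnifiedSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("UnifiedSketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // The same RDDs feed SQL, streaming, MLlib, and GraphX: one engine
    val df = sc.parallelize(Seq(("alice", 34), ("bob", 45))).toDF("name", "age")
    df.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 40").show()
    sc.stop()
  }
}
```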
Just to compare the lines of code (LOC) of the equivalent components against Apache Spark:
MapReduce + Impala + Storm + Giraph ≈ 340,000 LOC
Spark (Spark Core + SQL + Streaming + GraphX) ≈ 80,000 LOC
This comparison suggests that the Spark framework itself has been designed with simplicity and performance in mind.
Thanks for Reading.