Order of birth: Hadoop belongs to the first generation of open-source big data processing platforms, while Spark belongs to the second. As the later generation, Spark comes out ahead of first-generation Hadoop in an overall evaluation.
Differences in computation: the underlying ideas of distributed computing in Spark and Hadoop are in fact extremely similar. Both follow the MapReduce distributed computing model, in which a computation is divided into two phases. In phase 1, map, each task pulls its share of the data from upstream, processes it, and shuffles the results to the downstream reduce tasks. In phase 2, reduce, each task reads the data delivered to it through the shuffle and aggregates it. Where Spark and Hadoop differ is in the concrete implementation: in Hadoop's MapReduce computing framework, one job is a single map-reduce process, whereas a single Spark job can cascade multiple map-reduce processes.
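The two phases and the shuffle between them can be sketched in a few lines of plain Python. This is only an in-memory illustration of the model, not the Hadoop or Spark API: the helper run_stage and the functions passed to it are hypothetical names chosen for this example, and nothing here is actually distributed.

```python
from collections import defaultdict


def run_stage(records, map_fn, reduce_fn):
    """One map-reduce round (hypothetical helper): map, shuffle by key, reduce."""
    # Map phase: each input record yields zero or more (key, value) pairs.
    mapped = [pair for record in records for pair in map_fn(record)]
    # Shuffle: bring all values that share a key together for one reducer.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: aggregate the values collected for each key.
    return [(key, reduce_fn(key, values)) for key, values in groups.items()]


def split_words(line):
    # Map function for round 1: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


def sum_values(key, values):
    # Reduce function for round 1: total occurrences of one word.
    return sum(values)


def count_as_key(pair):
    # Map function for round 2: re-key each (word, count) pair by its count.
    word, count = pair
    return [(count, word)]


def sorted_words(key, words):
    # Reduce function for round 2: list the words that share a count.
    return sorted(words)


lines = ["spark hadoop spark", "hadoop hive"]

# Round 1: word count.
word_counts = run_stage(lines, split_words, sum_values)
# Round 2: group words by how often they occur, fed by round 1's output.
words_by_count = run_stage(word_counts, count_as_key, sorted_words)
```

Here the second round consumes the output of the first directly. In the terms used above, Hadoop MapReduce would run each round as a separate job, while Spark can cascade both rounds inside one job.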
Differences in platform: Spark is a computing platform, whereas Hadoop is a composite platform that contains a computing engine (MapReduce), a distributed file storage system (HDFS), and a distributed computing resource scheduling system (YARN). A comparison between Spark and Hadoop is therefore mainly a comparison of the computing part. At the current stage of big data technology development, it is mainly Hadoop's computing part that is in decline, while Spark is at its peak: the related skills are in demand, and good job offers are easy to get.