Stage 1: Master the Scala language
1. The Spark framework is written in Scala, and the code is sophisticated and elegant. To become a Spark master you must read Spark's source code, and for that you must master Scala.
2. Although Spark applications can now be developed in several languages, such as Java and Python, the most natural and best-supported development API is still the Scala API, so you must master Scala to write complex, high-performance distributed Spark programs.
3. In particular, master Scala's traits, apply methods, functional programming, generics, and covariance and contravariance.
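The features listed in point 3 can be illustrated in a few lines of plain Scala. This is a minimal sketch (the `Describable` trait and `Box` class are invented for illustration): a trait with a concrete method, a companion object whose `apply` acts as a factory, and a covariance annotation.

```scala
// A trait can mix behavior into classes, like Spark's own Logging trait.
trait Describable {
  def name: String
  def describe: String = s"[$name]" // concrete method: traits carry behavior
}

// +A makes Box covariant: a Box[String] is usable where a Box[Any] is
// expected, the same idea behind Scala's List[+A] used throughout Spark.
class Box[+A](val value: A) extends Describable {
  def name: String = "Box"
}

// apply in the companion object lets callers write Box(42) instead of
// new Box(42), a pattern used pervasively in Spark's APIs.
object Box {
  def apply[A](value: A): Box[A] = new Box(value)
}

object ScalaFeaturesDemo {
  def main(args: Array[String]): Unit = {
    val b: Box[Any] = Box("hello")   // covariance: Box[String] <: Box[Any]
    // Functional programming: higher-order functions on immutable collections.
    val doubled = List(1, 2, 3).map(_ * 2)
    println(b.describe)              // prints "[Box]"
    println(doubled.mkString(","))   // prints "2,4,6"
  }
}
```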
Stage 2: Master the APIs that the Spark platform provides to developers
1. Master the RDD-oriented development model in Spark, and master the use of the various transformation and action operators.
2. Master wide dependencies, narrow dependencies, and the lineage mechanism in Spark.
3. Master how an RDD computation proceeds, such as how stages are divided, the basic process of submitting a Spark application to the cluster, and the basic working principles of the Worker nodes.
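The transformation/action model and the stage boundary introduced by a shuffle can be seen in a classic word count. This is a minimal sketch, not production code; it assumes spark-core is on the classpath and runs Spark in local mode:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Transformations are lazy; only the final action triggers a job.
object RddDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RddDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val lines = sc.parallelize(Seq("a b", "b c", "c a"))

    // Transformations (lazy): flatMap and map are narrow dependencies;
    // reduceByKey shuffles, i.e. a wide dependency that starts a new stage.
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Action: collect() is what actually triggers the computation.
    counts.collect().sorted.foreach(println) // (a,2) (b,2) (c,2)
    sc.stop()
  }
}
```

Tracing this job in the Spark UI shows exactly the stage split described in point 3: one stage up to the shuffle, a second stage after it.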
Stage 3: Go deep into the Spark kernel
This stage deepens your understanding of the Spark kernel mainly by reading the framework's source code:
1. Use the source code to master Spark's job submission process.
2. Use the source code to master task scheduling in a Spark cluster.
3. In particular, master every step of what happens inside the DAGScheduler, the TaskScheduler, and the Worker nodes.
Stage 4: Master the core frameworks built on top of Spark
Spark, as a leading platform of the cloud-computing and big-data era, has significant advantages in real-time stream processing, graph computation, machine learning, and NoSQL queries. Most of the time we use Spark through its frameworks, such as Shark and Spark Streaming:
1. Spark Streaming is an excellent real-time stream-processing framework; master its DStream abstraction, its transformations, and checkpointing.
2. For offline statistical analysis with Spark, version 1.0.0 introduced Spark SQL on the basis of Shark, significantly improving the efficiency of offline analysis; you need to focus on mastering it.
3. Also master Spark's machine learning library and GraphX.
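The DStream, transformation, and checkpoint concepts from point 1 fit together as sketched below. This assumes spark-streaming is on the classpath; the socket address localhost:9999 and the checkpoint path are placeholders for illustration (e.g. feed it with `nc -lk 9999`):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// A minimal Spark Streaming sketch: a DStream from a socket source,
// a transformation chain, and checkpointing for fault tolerance.
object StreamingDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingDemo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches
    ssc.checkpoint("/tmp/streaming-checkpoint")      // metadata/state recovery

    val lines = ssc.socketTextStream("localhost", 9999) // DStream[String]

    // Each DStream transformation is applied to every micro-batch's RDD.
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.print()          // output operation: triggers execution each batch
    ssc.start()
    ssc.awaitTermination()
  }
}
```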
Stage 5: Build a commercial-grade Spark project
Work through one complete, representative Spark project from beginning to end, covering its architectural design, the analysis of the technologies used, development and implementation, and operations and maintenance. Completely mastering every stage and detail will let you confidently face most Spark projects in the future.
Stage 6: Provide Spark solutions
1. Thoroughly master every detail of the Spark framework's source code.
2. Provide Spark solutions tailored to the needs of different business scenarios.
3. Carry out secondary development on top of the Spark framework according to actual needs, creating your own Spark framework.