What to learn about big data

To learn big data, work through the following stages:

Stage 1: Java SE Core Fundamentals

1. Develop an in-depth understanding of Java's object-oriented programming model

2. Master the commonly used core APIs

3. Use the collections framework, I/O streams, and exception handling proficiently

4. Develop with JDK 8 features such as lambdas and the Stream API

5. Use MySQL proficiently and master SQL syntax
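As a quick illustration of the Stage 1 skills, here is a minimal JDK 8 sketch combining the collections framework, lambdas, and the Stream API; the class name and sample data are invented for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StreamDemo {
    // Group framework names by their first letter using a JDK 8 collector.
    static Map<Character, List<String>> groupByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hadoop", "hive", "spark", "flink", "kafka", "hbase");
        System.out.println(groupByInitial(words).get('h')); // [hadoop, hive, hbase]
        // Lambda + stream pipeline: count names longer than four characters.
        long longNames = words.stream().filter(w -> w.length() > 4).count();
        System.out.println(longNames); // 5
    }
}
```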

Stage 2: Hadoop Ecosystem Architecture

1. Install and operate Linux systems

2. Write Shell scripts proficiently

3. Use development tools such as IntelliJ IDEA and Maven

4. Understand Hadoop's components, installation, and architecture, analyze its source code in depth, and use its APIs proficiently

5. Install and deploy Hive, understand its internal architecture, use it proficiently for development needs, and apply enterprise-level tuning

6. Understand Zookeeper's internal principles, its election mechanism, and its applications in the big data ecosystem

7. Understand Flume's architecture and principles, customize components, set up monitoring, and use Flume proficiently for development needs

8. Install and deploy Kafka and understand the framework's principles, focusing on Kafka's partition assignment strategies, data reliability, data consistency, out-of-order data, the zero-copy principle, efficient read/write mechanics, consumption strategies, rebalancing, and related topics

9. Coordinate Hadoop, Flume, Zookeeper, Kafka, DataX, Maxwell, and the many other frameworks in the Hadoop ecosystem to build a data collection system, mastering each framework's structure and enterprise-level tuning
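To make item 8 concrete, here is a minimal plain-Java sketch of Kafka's key-based partition assignment idea. It is not Kafka's actual implementation (the real default partitioner hashes the serialized key bytes with murmur2 and uses a sticky strategy for keyless records); `String.hashCode()` is used here only to illustrate why the same key always lands on the same partition:

```java
import java.util.concurrent.ThreadLocalRandom;

// Simplified sketch of Kafka-style key-based partition assignment.
// Real Kafka hashes serialized key bytes with murmur2; this toy version
// uses String.hashCode() purely to illustrate the idea.
public class PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        if (key == null) {
            // No key: pick any partition (real Kafka "sticks" to one per batch).
            return ThreadLocalRandom.current().nextInt(numPartitions);
        }
        // Mask off the sign bit so the result is non-negative, then take the modulus.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 3;
        // The same key always maps to the same partition,
        // which is what gives Kafka its per-key ordering guarantee.
        System.out.println(partitionFor("user-42", partitions)
                == partitionFor("user-42", partitions)); // true
    }
}
```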

Stage 3: Spark Ecosystem Architecture

1. Install and deploy Spark; use the Spark Core APIs proficiently; advance RDD programming skills; master accumulators and broadcast variables, both their use and their underlying principles; master Spark SQL programming and user-defined functions; study Spark's kernel source code in detail (deployment, startup, task scheduling, memory management, etc.); and apply Spark's enterprise-level tuning strategies

2. Install and deploy DolphinScheduler and use it proficiently to schedule and execute workflows

3. Understand data warehouse modeling theory, become fully familiar with the e-commerce industry's data analysis metric system, quickly master a range of big data technology frameworks, and understand the various data warehouse technology modules

4. Deploy and use HBase and Phoenix, with explanations of their architectural principles and enterprise-level optimization

5. Use the development tools Git and GitHub proficiently

6. Get started with Redis: basic configuration explained, plus proficient use of Jedis

7. Introduction to Elasticsearch installation, deployment, and tuning

8. Fully understand how to build and use a user profiling platform, the design thinking behind a user profiling system, the tag design process and its applications, and gain a preliminary understanding of machine learning algorithms

9. Hands-on project work. Practical projects designed along multiple dimensions and close to real big data processing scenarios, so that students grasp a broad range of big data solution requirements, participate in the full project lifecycle, quickly raise their practical skill level, strengthen their command of the commonly used frameworks, and rapidly accumulate hands-on experience
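The RDD programming in item 1 can be previewed without a cluster: the sketch below mirrors the classic Spark word count (`flatMap` → `map` → `reduceByKey`) using plain `java.util.stream`, so it is an illustration of the transformation idea, not Spark's actual API:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java illustration of the classic Spark RDD word count:
//   textFile -> flatMap(split) -> map(word, 1) -> reduceByKey(_ + _)
// java.util.stream stands in for the RDD API so the sketch runs locally.
public class WordCountSketch {
    static Map<String, Long> wordCount(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("\\s+"))) // flatMap: lines -> words
                .collect(Collectors.groupingBy(Function.identity(),  // like reduceByKey(_ + _)
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCount(Stream.of("spark spark flink", "spark kafka"));
        System.out.println(counts.get("spark")); // 3
    }
}
```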

Stage 4: Flink Ecosystem Architecture

1. Master Flink's basic architecture and its streaming data processing model; use Flink's various sources and sinks to process data; and use the basic APIs, the Window API, state, Flink SQL, and Flink CEP (complex event processing) proficiently

2. Use Flink to build a real-time data warehouse project, using the framework proficiently to analyze and compute a variety of metrics

3. Install, use, and tune ClickHouse

4. Hands-on project work. Practical projects designed along multiple dimensions and close to real big data processing scenarios, allowing students to grasp a broad range of big data solution requirements, participate in the full project lifecycle, quickly improve their practical skills, deepen their understanding of the commonly used frameworks, and rapidly accumulate hands-on experience

5. Optionally master recommendation and machine learning projects, becoming familiar with collaborative filtering algorithms, content-based recommendation algorithms, and similar techniques

6. Rebuild the e-commerce project using AliCloud's full suite of big data products, becoming familiar with AliCloud's solutions for offline data warehouses and real-time metrics
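The Window API mentioned in Stage 4, item 1 rests on a simple idea: an event with timestamp `ts` belongs to the tumbling window starting at `ts - (ts % windowSize)`. The plain-Java sketch below models just that bucketing; it is not Flink code (no watermarks or late-data handling), and the 10-second window size is an assumption for the example:

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of tumbling event-time windows, the core idea behind
// Flink's Window API. An event with timestamp ts falls into the window
// that starts at ts - (ts % windowSize).
public class TumblingWindowSketch {
    static long windowStart(long ts, long windowSize) {
        return ts - (ts % windowSize);
    }

    public static void main(String[] args) {
        long size = 10_000; // 10-second windows in millis (assumed for the example)
        long[] eventTimes = {1_000, 4_500, 12_000, 19_999, 20_000};
        // Count events per window, keyed by window start time.
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimes) {
            counts.merge(windowStart(ts, size), 1, Integer::sum);
        }
        System.out.println(counts); // {0=2, 10000=2, 20000=1}
    }
}
```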