In our previous article we gave a brief overview of the basic skills a big data practitioner needs. This article walks through the knowledge you need at each stage of learning big data.
Data storage stage: vendors such as Oracle and IBM offer courses on SQL and their database products. Learn the development tools used by the companies you are targeting, and you will generally be able to handle positions at this stage.
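As a minimal sketch of the SQL skills this stage calls for (the `orders` table and its data are invented for illustration), Python's standard-library `sqlite3` module is enough to practice with:

```python
import sqlite3

# In-memory database; the "orders" table and its rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (product, amount) VALUES (?, ?)",
    [("widget", 9.5), ("widget", 12.0), ("gadget", 20.0)],
)

# A typical aggregation query: total sales per product.
rows = conn.execute(
    "SELECT product, SUM(amount) FROM orders GROUP BY product ORDER BY product"
).fetchall()
print(rows)  # [('gadget', 20.0), ('widget', 21.5)]
```

The same queries carry over to Oracle or DB2 with only minor dialect differences, which is why SQL fundamentals transfer well between employers.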
Data mining, cleaning, and filtering: to become a big data engineer, you should study Java, Linux, SQL, and Hadoop, plus the surrounding ecosystem: the data serialization system Avro, the data warehouse Hive, the distributed database HBase, the distributed log-collection framework Flume, the distributed queuing system Kafka, the data-migration tool Sqoop, Pig for dataflow development, and Storm for real-time stream processing. Mastering these is enough to get started as a big data engineer; for a stronger starting point, it is worth learning Scala, Spark, and R early on, as these are among the more sought-after professional skills in enterprises today.
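To give a feel for the producer/consumer pattern that underlies a queuing system such as Kafka, here is a toy sketch using only Python's standard library (this is the pattern, not the Kafka API; real Kafka adds partitioning, replication, and durable offsets):

```python
import queue
import threading

# A toy stand-in for a Kafka-style topic: a producer appends messages,
# a consumer processes them in arrival order.
topic = queue.Queue()
consumed = []

def producer(messages):
    for msg in messages:
        topic.put(msg)
    topic.put(None)  # sentinel: no more messages

def consumer():
    while True:
        msg = topic.get()
        if msg is None:
            break
        consumed.append(msg.upper())  # "process" each record

t_prod = threading.Thread(target=producer, args=(["click", "view", "buy"],))
t_cons = threading.Thread(target=consumer)
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(consumed)  # ['CLICK', 'VIEW', 'BUY']
```

Decoupling producers from consumers this way is exactly why log-collection tools like Flume are often paired with Kafka in ingestion pipelines.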
Data analysis: on one hand, this means building an analysis framework, for example, settling on a line of analysis requires theoretical knowledge of marketing, management, and related fields; on the other hand, it means turning the conclusions of the analysis into recommendations that have practical guiding value.
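A tiny sketch of going from raw numbers to a recommendation (the conversion-rate figures below are invented for illustration), using only Python's standard library:

```python
import statistics

# Hypothetical daily conversion rates for two landing-page variants.
page_a = [0.031, 0.028, 0.035, 0.030, 0.033]
page_b = [0.042, 0.039, 0.044, 0.041, 0.040]

mean_a = statistics.mean(page_a)
mean_b = statistics.mean(page_b)

# The analysis conclusion becomes an actionable recommendation.
recommendation = "keep page B" if mean_b > mean_a else "keep page A"
print(round(mean_a, 4), round(mean_b, 4), recommendation)
```

In practice the statistics would be more careful (sample sizes, significance tests), but the shape of the work is the same: frame the question, compute, then recommend.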
Product adjustment: after the analysis, the conclusions are discussed with the boss and the PM; once the product updates are agreed on, they are handed to the programmers to implement (for FMCG businesses, this might mean adjusting how goods are arranged on the shelves).
Next, let's look at the technologies you need to master for big data.
Hadoop core
(1) Distributed Storage Cornerstone: HDFS
An introduction to HDFS with a hands-on demo, covering its components and working principles: data blocks, the NameNode, the DataNode, the data write and read paths, data replication, HA schemes, file formats, commonly used HDFS configuration settings, and Java API code demonstrations.
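To make the block and replication ideas concrete, here is a small conceptual sketch (block size, replication factor, and node names are shrunk and invented for demonstration; real HDFS defaults to 128 MB blocks and 3 replicas, and the NameNode's actual placement policy is rack-aware):

```python
# Sketch of how HDFS splits a file into fixed-size blocks and assigns
# each block to several DataNodes.
BLOCK_SIZE = 4          # bytes; tiny on purpose for the demo
REPLICATION = 2
DATANODES = ["dn1", "dn2", "dn3"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes=DATANODES, replication=REPLICATION):
    # Simple round-robin placement; real HDFS considers racks and node load.
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hdfs!")
print(blocks)                  # [b'hell', b'o hd', b'fs!']
print(place_replicas(blocks))  # {0: ['dn1', 'dn2'], 1: ['dn2', 'dn3'], 2: ['dn3', 'dn1']}
```

Because every block lives on multiple DataNodes, the loss of one node does not lose data, which is the core idea behind HDFS fault tolerance.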
(2) Distributed Computing Fundamentals: MapReduce
An introduction to MapReduce: the programming model, the Java API, worked programming examples, and MapReduce tuning.
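The programming model itself fits in a few lines. Below is the classic word-count example written as plain Python, without the Hadoop runtime, to show the three phases (map emits key-value pairs, shuffle groups them by key, reduce aggregates per key):

```python
from collections import defaultdict

# The MapReduce programming model in miniature.
def map_phase(line):
    for word in line.split():
        yield word, 1

def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    return key, sum(values)

lines = ["big data big ideas", "data pipelines"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'pipelines': 1}
```

Hadoop's Java API follows the same shape: you subclass `Mapper` and `Reducer`, and the framework performs the shuffle across the cluster for you.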
(3) Hadoop Cluster Resource Manager: YARN
The basic architecture of YARN, the resource scheduling process, scheduling algorithms, and running computation frameworks on YARN.
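As a conceptual sketch of the simplest scheduling policy (in the spirit of YARN's FIFO scheduler; capacities and application names here are invented, and real YARN also offers the Capacity and Fair schedulers), applications request containers and the ResourceManager grants them in arrival order while capacity lasts:

```python
from collections import deque

# Toy FIFO container scheduling: grant requests in arrival order
# until the cluster's container capacity is exhausted.
CLUSTER_CAPACITY = 8  # total containers available

def fifo_schedule(requests, capacity=CLUSTER_CAPACITY):
    pending = deque(requests)  # each entry: (app_name, containers_needed)
    granted = {}
    while pending and capacity > 0:
        app, needed = pending.popleft()
        given = min(needed, capacity)
        granted[app] = given
        capacity -= given
    return granted

allocation = fifo_schedule([("appA", 3), ("appB", 4), ("appC", 5)])
print(allocation)  # {'appA': 3, 'appB': 4, 'appC': 1}
```

The drawback this toy version makes visible, a late-arriving large job can starve, is exactly what the Capacity and Fair schedulers exist to address.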