Big data knowledge breaks down into three parts: big data fundamentals, big data platform knowledge, and big data scenario applications.
Big data fundamentals have three main components: mathematics, statistics, and computer science.
Big data platform knowledge is the basis of big data development and usually means building Hadoop and Spark platforms.
Currently, a big data engineer's monthly salary is easily more than 10,000 yuan; an engineer with several years of experience earns between 400,000 and 1,600,000 yuan a year, and top-tier big data talent can easily earn more than a million yuan a year.
What technologies do you need to master to work in big data?
1, Java programming
The Java language is the foundation: with it you can write web applications, desktop applications, distributed systems, embedded applications, and more. Java has many advantages, and its cross-platform capability has won it the favor of many engineers.
2, Linux basic operating commands
Big data development is generally carried out in a Linux environment. Big data engineers use commands mainly in three areas: viewing processes, including CPU and memory usage; troubleshooting and locating problems; and diagnosing the causes of system slowdowns.
3, Hadoop
Within Hadoop, the most heavily used pieces are HDFS clusters and the MapReduce framework. HDFS stores data and optimizes access to it, while MapReduce makes it easy for engineers to write distributed applications.
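As a sketch of the model (plain Java with no Hadoop dependency — the class and method names here are illustrative, not the Hadoop API), a word count can be expressed as a map phase that emits (word, 1) pairs and a reduce phase that sums the counts per word:

```java
import java.util.*;

// A minimal sketch of the MapReduce model in plain Java (no Hadoop
// dependency): the "map" phase emits (word, 1) pairs, the framework
// groups pairs by key (the shuffle), and the "reduce" phase sums the
// counts for each word.
public class WordCountSketch {

    // Map phase: split each input line into words and emit (word, 1).
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data big future", "data engineering");
        System.out.println(reduce(map(lines))); // {big=2, data=2, engineering=1, future=1}
    }
}
```

In real Hadoop MapReduce, the map and reduce functions run on different machines and the shuffle moves data between them over the network; the split into two pure functions is what makes that distribution possible.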
4, HBase
HBase supports random, real-time reads and writes of big data and is well suited to storing unstructured data; at its core it is a distributed, column-oriented database. As the database of the Hadoop ecosystem, its application, architecture, and advanced usage are very important in big data development.
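To make the column-oriented model concrete, here is a toy in-memory sketch (not the HBase client API — the class and column names are illustrative) of how a table maps sorted row keys to flexible "family:qualifier" columns:

```java
import java.util.*;

// A toy illustration of HBase's data model (NOT the HBase client API):
// a table maps a sorted row key to a set of "family:qualifier" columns,
// and rows need not share the same columns, which suits
// semi-structured and unstructured data.
public class ColumnStoreSketch {
    // rowKey -> (family:qualifier -> value); TreeMap keeps rows sorted by
    // key, as HBase does, which enables efficient range scans.
    private final NavigableMap<String, Map<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
    }

    String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Range scan over row keys, the access pattern HBase is optimized for.
    SortedMap<String, Map<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        ColumnStoreSketch table = new ColumnStoreSketch();
        table.put("user#001", "info:name", "Alice");
        table.put("user#001", "log:lastLogin", "2024-01-01");
        table.put("user#002", "info:name", "Bob"); // no log: columns, and that is fine
        System.out.println(table.get("user#001", "info:name"));
        System.out.println(table.scan("user#001", "user#002").keySet());
    }
}
```

The real HBase adds versioned cells, region splitting, and distribution across servers on top of this basic shape.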
5, Hive
Hive is a data warehouse tool for Hadoop that facilitates data aggregation and statistical analysis.
6, ZooKeeper
ZooKeeper is an important component of Hadoop and HBase that provides coordination for distributed applications. ZooKeeper's main functions are configuration maintenance, naming services, distributed synchronization, and group services.
7, Phoenix
Phoenix is an open-source SQL engine for HBase, written in Java.
8, Avro and Protobuf
Avro and Protobuf are data serialization systems suited to data storage; they offer rich data structure types and allow communication between many different languages.
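As an illustration, an Avro schema is itself plain JSON, which is part of why programs in many languages can exchange Avro data. A hypothetical schema for a page-view event (the record name and fields here are invented for this example) might look like:

```json
{
  "type": "record",
  "name": "PageView",
  "namespace": "com.example.events",
  "fields": [
    {"name": "url", "type": "string"},
    {"name": "userId", "type": ["null", "long"], "default": null},
    {"name": "timestamp", "type": "long"}
  ]
}
```

The `["null", "long"]` union is an example of the richer structure types mentioned above: it marks `userId` as optional in a way every Avro implementation understands.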
9, Cassandra
Apache Cassandra is a database that runs on servers or cloud infrastructure and provides a solid platform for data, with high performance and linear scalability.
Cassandra supports replication between data centers, offers low latency, and is resilient to power outages. Its data model provides column indexes, high-performance views, and built-in caching.
10, Kafka
Kafka is a distributed publish-subscribe messaging system with high throughput that can run as a cluster to deliver messages in real time; it is mainly used with Hadoop's parallel loading to unify online and offline message processing.
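The publish-subscribe model that Kafka implements at scale can be sketched in-process (this is NOT the Kafka client API — a real deployment adds brokers, partitions, persistence, and consumer groups; all names here are illustrative):

```java
import java.util.*;
import java.util.concurrent.*;

// A minimal in-memory sketch of the publish-subscribe model that Kafka
// implements at scale (NOT the Kafka client API): the broker keeps one
// queue per subscriber of a topic, so every subscriber sees every
// message published to that topic.
public class PubSubSketch {
    private final Map<String, List<BlockingQueue<String>>> topics = new ConcurrentHashMap<>();

    // Subscribe to a topic and get a queue to consume its messages from.
    BlockingQueue<String> subscribe(String topic) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        topics.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(queue);
        return queue;
    }

    // Publish a message: deliver it to every subscriber of the topic.
    void publish(String topic, String message) {
        for (BlockingQueue<String> queue : topics.getOrDefault(topic, List.of())) {
            queue.add(message);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PubSubSketch broker = new PubSubSketch();
        BlockingQueue<String> alerts = broker.subscribe("logs");
        BlockingQueue<String> archive = broker.subscribe("logs");
        broker.publish("logs", "disk 90% full");
        // Both subscribers receive the same message independently.
        System.out.println(alerts.take());
        System.out.println(archive.take());
    }
}
```

The key property shown — publishers and subscribers never reference each other, only the topic — is what lets Kafka decouple online producers from offline consumers.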
11, Spark
Spark is a fast, general-purpose computing engine designed for large-scale data processing. It provides a comprehensive, unified framework for big data processing needs across data sets and data sources of different natures. Big data development requires mastering Spark fundamentals, Spark jobs, Spark RDDs, Spark job deployment and resource allocation, Spark shuffle, Spark memory management, Spark broadcast variables, Spark SQL, Spark Streaming, Spark ML, and other related topics.
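Spark's RDD operations follow the same functional style as Java's Stream API, so the programming model — though none of Spark's distribution, fault tolerance, or cluster scheduling — can be previewed with a plain stream pipeline (the method name below is illustrative):

```java
import java.util.*;

// Spark's RDD operations (map, filter, reduce) follow the same functional
// style as Java's Stream API. This sketch mirrors an RDD pipeline with a
// plain parallel stream — it shows the programming model only, without
// Spark's distribution, fault tolerance, or cluster scheduling.
public class RddStyleSketch {
    static long sumOfEvenSquares(List<Integer> numbers) {
        return numbers.parallelStream()       // like sc.parallelize(numbers)
                .filter(n -> n % 2 == 0)      // like rdd.filter(...)
                .mapToLong(n -> (long) n * n) // like rdd.map(...)
                .sum();                       // like rdd.reduce(...)
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvenSquares(List.of(1, 2, 3, 4, 5))); // 2*2 + 4*4 = 20
    }
}
```

In Spark the same chain of transformations is built lazily and only executed, across a cluster, when an action such as `reduce` or `collect` is called.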
12, Flume
Flume is a massive log processing system with high availability, high reliability, and distributed features; it can collect, aggregate, and transmit logs. Flume supports customized data senders for collecting data, and it can also perform simple processing on the data before writing it to a data receiver.
Beyond the big data skills mentioned here, if you want a longer and smoother career, you need to keep cultivating your own skills.