What do you mainly study in a Big Data major?
Learning big data development covers three main areas: big data fundamentals, big data platform knowledge, and big data scenario applications.

Big data fundamentals cover three main subjects: mathematics, statistics, and computer science.

Big data platform knowledge is the basis of big data development; in practice it usually means building Hadoop and Spark platforms.

Currently, a big data engineer's monthly salary easily exceeds 10,000 yuan, and an engineer with several years of experience earns between 400,000 and 1,600,000 yuan, while top-tier big data specialists easily earn more than a million a year.

What technologies do you need to master to work in big data?

1, Java programming

Java is the foundation. You can use it to write web applications, desktop applications, distributed systems, embedded applications, and more. Java has many advantages, and its cross-platform capability in particular has won the favor of many engineers.

2, Basic Linux commands

Big data development is generally carried out in a Linux environment. Big data engineers use commands mainly for three things: viewing processes, including their CPU and memory usage; troubleshooting and locating problems; and finding the cause of system slowdowns.

3, Hadoop

The most widely used parts of Hadoop are HDFS clusters and the MapReduce framework. HDFS stores data and optimizes the access process.

MapReduce makes it easier for engineers to write applications that process data in parallel.
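To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is plain Python, not the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration, and a real job would run these steps across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data big", "data platform"]))
```

The engineer only writes the map and reduce functions; in a real framework, the grouping step and the distribution of work are handled for you.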

4, HBase

HBase supports random, real-time reads and writes on big data and is better suited to storing unstructured data. At its core, Apache HBase is a distributed, column-oriented database. HBase serves as Hadoop's database, so its applications, architecture, and advanced usage are very important for big data development.
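The column-oriented model can be pictured as a nested map: row key, then a "family:qualifier" column name, then timestamped versions of the value. The sketch below is a toy in-memory stand-in written under that assumption; `TinyColumnStore` and its methods are hypothetical names and do not mirror the HBase client API.

```python
import time
from collections import defaultdict


class TinyColumnStore:
    """Toy table: row key -> "family:qualifier" -> list of (timestamp, value)."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp=None):
        """Write one cell version; rows may have completely different columns."""
        ts = time.time_ns() if timestamp is None else timestamp
        self._rows[row_key][column].append((ts, value))

    def get(self, row_key, column):
        """Random read: return the newest value of one cell, or None if absent."""
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def row(self, row_key):
        """Return the newest value of every column stored for one row."""
        columns = self._rows.get(row_key, {})
        return {
            column: max(versions, key=lambda version: version[0])[1]
            for column, versions in columns.items()
        }


table = TinyColumnStore()
table.put("user1", "info:name", "Ann", timestamp=1)
table.put("user1", "info:name", "Anna", timestamp=2)
print(table.get("user1", "info:name"))
```

Because each row carries its own set of columns, data without a fixed schema fits naturally, which is the property the paragraph above refers to.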

5, Hive

Hive is a data warehouse tool for Hadoop that facilitates data aggregation and statistical analysis.
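Hive exposes this kind of aggregation through a SQL-like query language. As a rough illustration, the sketch below runs a GROUP BY aggregation with Python's built-in `sqlite3` module standing in for Hive; the `sales` table, its columns, and the `sales_by_region` helper are assumptions made up for this example, and a real Hive query would run over files stored in Hadoop rather than an in-memory database.

```python
import sqlite3


def sales_by_region(rows):
    """Aggregate (region, amount) rows into (region, order_count, total_amount)."""
    conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in for Hive here
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, COUNT(*), SUM(amount) "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


print(sales_by_region([("north", 10.0), ("south", 5.0), ("north", 2.5)]))
```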

6, ZooKeeper

ZooKeeper is an important component of Hadoop and HBase that coordinates distributed applications. Its main functions are configuration maintenance, naming service, distributed synchronization, and group services.
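Configuration maintenance can be sketched as a tree of paths holding values, where a client can ask to be notified once when a value changes. The class below is a single-process toy under that assumption; `TinyCoordinator` and its methods are hypothetical and are not the ZooKeeper client API, and the real service replicates this state across several servers.

```python
class TinyCoordinator:
    """Toy path -> value store with one-shot change notifications."""

    def __init__(self):
        self._nodes = {}    # path -> current value
        self._watches = {}  # path -> callbacks to fire on the next change

    def set(self, path, value):
        """Store a value and fire (then discard) any watches on that path."""
        self._nodes[path] = value
        for callback in self._watches.pop(path, []):
            callback(path, value)

    def get(self, path, watch=None):
        """Read a value; optionally register a callback for the next change."""
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._nodes.get(path)

    def children(self, parent):
        """List the paths stored directly under a parent path."""
        prefix = parent.rstrip("/") + "/"
        return sorted(
            path for path in self._nodes
            if path.startswith(prefix) and "/" not in path[len(prefix):]
        )


coordinator = TinyCoordinator()
coordinator.set("/config/db_url", "jdbc:old")
coordinator.get("/config/db_url", watch=lambda path, value: print("changed:", path, value))
coordinator.set("/config/db_url", "jdbc:new")
```

Every process that reads its settings this way sees the same value and learns about updates without polling, which is the point of keeping configuration in one coordinated place.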

7, Phoenix

Phoenix is an open source SQL engine written in Java.

8, Avro and Protobuf

Avro and Protobuf are data serialization systems suited to data storage. They offer rich data structure types and allow communication between many different languages.
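Cross-language communication works because both sides agree on a byte-level format. One small building block of Protobuf's wire format is the base-128 varint, which stores small integers in fewer bytes. The sketch below implements only that encoding for non-negative integers; `encode_varint` and `decode_varint` are illustrative names, and real projects would use the generated Protobuf or Avro code rather than hand-written encoders.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (7 data bits per byte)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode one varint from the start of data; return (value, bytes_consumed)."""
    result = 0
    shift = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, index + 1
        shift += 7
    raise ValueError("truncated varint")


encoded = encode_varint(300)
print(encoded.hex(), decode_varint(encoded))
```

Any language that follows the same rule can read these bytes back, regardless of which language wrote them.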

9, Cassandra

Apache Cassandra is a database that runs on servers or cloud infrastructure and provides a solid platform for data, with high performance and linear scalability.

Cassandra supports replication between data centers, which gives low latency and lets it survive outages. Its data model offers column indexes, high-performance views, and built-in caching.

10, Kafka

Kafka is a high-throughput, distributed publish-subscribe messaging system that delivers real-time messages through a cluster. It mainly uses Hadoop's parallel loading mechanism to unify online and offline message processing.
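The publish-subscribe model can be sketched as an append-only log per topic, with each group of consumers tracking its own read position. The toy below is a single-process illustration under that assumption; `TinyBroker`, `publish`, and `poll` are hypothetical names rather than the Kafka client API, and the group names "online" and "offline" are only examples.

```python
from collections import defaultdict


class TinyBroker:
    """Toy broker: one append-only message list per topic, one offset per group."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic -> messages in arrival order
        self._offsets = defaultdict(int)  # (group, topic) -> next index to read

    def publish(self, topic, message):
        """Append a message to a topic and return its position in the log."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def poll(self, group, topic, max_messages=10):
        """Return the next unread messages for a group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


broker = TinyBroker()
broker.publish("clicks", "page=/home")
broker.publish("clicks", "page=/cart")
print(broker.poll("online", "clicks"))   # a real-time consumer reads both messages
print(broker.poll("offline", "clicks"))  # a batch consumer reads the same messages later
```

Because messages stay in the log and each group keeps its own offset, a real-time consumer and a batch consumer can read the same stream independently.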

11, Spark

Spark is a fast, general-purpose computing engine designed for large-scale data processing. It provides a comprehensive, unified framework for handling the big data processing needs of data sets and data sources of many different kinds. Big data development requires mastering Spark fundamentals, Spark jobs, Spark RDDs, job deployment and resource allocation, Spark shuffle, Spark memory management, broadcast variables, Spark SQL, Spark Streaming, Spark ML, and related topics.
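One idea behind the RDD is that transformations such as map and filter are only recorded, and work happens when an action such as collect or reduce asks for a result. The class below imitates that behavior in plain Python on a single machine; `TinyRDD` is a hypothetical name, and this is not the PySpark API, which also partitions data across a cluster.

```python
import functools


class TinyRDD:
    """Toy data set that records transformations and runs them only on an action."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = steps  # recorded ("map" | "filter", function) pairs

    def map(self, fn):
        """Transformation: record a per-element function, do no work yet."""
        return TinyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        """Transformation: record a per-element test, do no work yet."""
        return TinyRDD(self._source, self._steps + (("filter", predicate),))

    def _iterate(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    def collect(self):
        """Action: run the recorded steps and return all results as a list."""
        return list(self._iterate())

    def reduce(self, fn):
        """Action: run the recorded steps and fold the results into one value."""
        return functools.reduce(fn, self._iterate())


squares_of_evens = TinyRDD(range(1, 7)).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares_of_evens.collect())
```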

12, Flume

Flume is a highly available, highly reliable, distributed system for processing massive volumes of logs; it can collect, aggregate, and transmit them. Flume lets you customize the data senders that it collects data from, and it can also apply simple processing to the data before writing it to data receivers.
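The collect, process, and deliver flow can be sketched as a source that puts events on a buffer (Flume calls this a channel) and a sink that drains it. The functions below are a single-process toy under that assumption; `run_pipeline`, `drain`, and the `batch_size` parameter are made-up names, and a real Flume agent is set up through configuration rather than code like this.

```python
from collections import deque


def drain(channel, transform, sink):
    """Move every buffered event through the transform and into the sink."""
    while channel:
        event = transform(channel.popleft())
        if event is not None:  # a transform may drop an event by returning None
            sink.append(event)


def run_pipeline(source_lines, transform, sink, batch_size=2):
    """Buffer events from the source and flush them to the sink in batches."""
    channel = deque()
    for line in source_lines:
        channel.append(line)
        if len(channel) >= batch_size:
            drain(channel, transform, sink)
    drain(channel, transform, sink)  # flush whatever is left at the end


collected = []
run_pipeline(
    ["INFO start", "DEBUG noise", "ERROR disk full"],
    lambda line: None if line.startswith("DEBUG") else line.upper(),
    collected,
)
print(collected)
```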

The skills listed here are the ones the big data industry commonly asks for. If you want a longer and smoother career, you also need to keep building your own skills.