Big Data Collection and Management is a program of study that systematically helps companies master solutions to typical problems in big data applications, such as data management, system development, and massive data analysis and mining.
1. Current state of the industry: A growing number of industries are optimistic about big data applications. In the Internet industry, big data and related data analytics solutions have become standard at companies such as Baidu, Tencent, Taobao, and Sina. In traditional industries such as telecommunications, finance, and energy, more and more users are trying out big data solutions or considering how to use them to improve their business.
2. Curriculum: The Big Data program systematically helps companies master solutions to typical problems in big data applications at the three main levels of such applications (data management, system development, and massive data analysis and mining). Topics include the implementation and analysis of collaborative filtering algorithms, the running and study of classification algorithms, the construction and benchmarking of distributed Hadoop clusters, the construction and benchmarking of distributed HBase clusters, the implementation of MapReduce-based parallel algorithms, and the deployment of Hive together with the implementation of a data operation on it. The aim is to strengthen an enterprise's ability to solve practical problems.
3. Core technologies:
(1) Big data and the Hadoop ecosystem. A detailed introduction to, and analysis of, the principles and applications of the distributed file system HDFS, the cluster file system ClusterFS, and NoSQL database technology, as well as the distributed computing framework MapReduce, the distributed database HBase, and the distributed data warehouse Hive.
(2) Relational database technology. A detailed introduction to the principles of relational databases, covering the construction, management, development, and application of typical enterprise-level databases.
(3) Distributed data processing. A detailed introduction to, and analysis of, the Map/Reduce computing model and the principles and applications of Hadoop Map/Reduce.
(4) Massive data analysis and data mining. A detailed introduction to data mining techniques; data mining algorithms such as MinHash, Jaccard and cosine similarity, TF-IDF, and clustering algorithms; and specific applications of data mining techniques in industry.
(5) Internet of Things and big data. A detailed introduction to big data applications in the Internet of Things, the automatic interpretation of remote sensing images, and the querying, analysis, and mining of time series data.
(6) File system (HDFS). A detailed introduction to HDFS deployment and to the high-throughput data access that HDFS provides.
(7) NoSQL. A detailed introduction to the principles, architecture, and typical applications of NoSQL (non-relational) database systems.
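
Two of the ideas listed above, the Map/Reduce computing model from item (3) and the Jaccard and cosine similarity measures from item (4), can be illustrated with a minimal single-process sketch in plain Python. This is not Hadoop code and not the program's own teaching material: the function names (map_phase, shuffle, reduce_phase, word_count, jaccard, cosine) and the two sample documents are hypothetical, chosen only for illustration. A real Hadoop job would distribute the map and reduce steps across a cluster.

```python
from collections import Counter, defaultdict
import math


def map_phase(text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum the values collected for one word."""
    return word, sum(counts)


def word_count(docs):
    """Run map, shuffle and reduce in one process over a dict of documents."""
    pairs = [pair for text in docs.values() for pair in map_phase(text)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents, used only to exercise the functions.
docs = {
    "d1": "big data needs big clusters",
    "d2": "big data needs analysis",
}

counts = word_count(docs)                    # word totals across both documents
words_1 = docs["d1"].split()
words_2 = docs["d2"].split()
jac = jaccard(words_1, words_2)              # overlap of the two vocabularies
cos = cosine(Counter(words_1), Counter(words_2))  # similarity of term-count vectors

print(counts)
print(round(jac, 3), round(cos, 3))
```

The sketch keeps the three Map/Reduce stages as separate functions so that the data flow (emit pairs, group by key, aggregate per key) stays visible. The cosine function accepts any term weights, so TF-IDF weights could be passed in place of the raw counts used here.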