An Explanation of Key Big Data Technologies
Big data technology is the set of techniques for quickly extracting valuable information from many types of data. A large number of new technologies have emerged in the field, and they have become powerful tools for collecting, storing, processing, and presenting big data. The key processing technologies generally include big data collection, big data preprocessing, big data storage and management, big data analysis and mining, and big data presentation and application (big data retrieval, visualization, applications, security, and so on).
First, big data collection technology
Data collection refers to obtaining massive amounts of structured, semi-structured (also called weakly structured), and unstructured data from sources such as RFID data, sensor data, social network interaction data, and mobile Internet data. It is the foundation of the big data knowledge service model. The focus should be on achieving breakthroughs in distributed, high-speed, highly reliable data crawling or collection, high-speed full data mirroring, and other collection technologies; on achieving breakthroughs in high-speed data parsing, conversion, and loading and other integration technologies; and on designing quality assessment models and developing data quality technologies.
Big data collection is generally divided into the following layers.
Intelligent perception layer: this mainly includes the data sensing system, network communication system, sensing adaptation system, intelligent identification system, and hardware and software resource access system. Together they provide intelligent identification, positioning, tracking, access, transmission, signal conversion, monitoring, preliminary processing, and management of massive structured, semi-structured, and unstructured data. The focus must be on intelligent identification, sensing, adaptation, transmission, access, and related technologies for big data sources.
Basic support layer: this provides the virtual servers, the databases for structured, semi-structured, and unstructured data, the IoT network resources, and the other basic support environments that a big data service platform requires. The focus is on distributed virtual storage technology; visual interface technology for big data acquisition, storage, organization, analysis, and decision-making operations; big data network transmission and compression technology; and big data privacy protection technology.
Second, big data preprocessing technology
Preprocessing mainly covers parsing, extracting, and cleaning the received data.
1) Extraction: because the acquired data may have many different structures and types, the extraction process transforms this complex data into a single structure, or one that is easy to process, so that it can be analyzed and processed quickly.
2) Cleaning: not all big data is valuable. Some of it is not what we care about, and some of it is outright erroneous noise, so the data should be filtered and "denoised" to extract the valid data.
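The two preprocessing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two input formats (a JSON object and an "id,temp" CSV line), the field names, and the valid temperature range are all hypothetical and do not come from any particular system.

```python
import csv
import io
import json


def extract(raw: str) -> dict:
    """Convert one raw record (JSON object or 'id,temp' CSV line) into a uniform dict."""
    raw = raw.strip()
    if raw.startswith("{"):
        obj = json.loads(raw)
        return {"sensor_id": str(obj["id"]), "temperature": obj["temp"]}
    row = next(csv.reader(io.StringIO(raw)))
    return {"sensor_id": row[0], "temperature": row[1]}


def clean(records, low=-50.0, high=60.0):
    """Keep only records whose temperature is a number within [low, high]."""
    valid = []
    for rec in records:
        try:
            value = float(rec["temperature"])
        except (TypeError, ValueError):
            continue  # unparseable reading: treat as noise
        if low <= value <= high:
            valid.append({"sensor_id": rec["sensor_id"], "temperature": value})
    return valid


# Hypothetical mixed-format input: two good readings, two noisy ones.
raw_inputs = [
    '{"id": 1, "temp": 21.5}',
    "2,19.0",
    "3,not-a-number",
    '{"id": 4, "temp": 999}',
]
cleaned = clean(extract(r) for r in raw_inputs)
print(cleaned)
```

Extraction runs first so that cleaning only has to handle one record shape; the unparseable reading and the out-of-range reading are both dropped.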
Third, big data storage and management technology
Big data storage and management uses storage devices to hold the collected data, builds the corresponding databases, and manages and retrieves the data. The focus is on management and processing technology for complex structured, semi-structured, and unstructured big data. It mainly addresses key problems such as making big data storable, representable, processable, reliable, and efficiently transmittable. This involves developing reliable distributed file systems (DFS), energy-efficient storage, in-storage computation, big data deduplication, and efficient, low-cost big data storage technology; achieving breakthroughs in distributed non-relational big data management and processing, heterogeneous data fusion, data organization, and big data modeling; achieving breakthroughs in big data indexing; achieving breakthroughs in big data movement, backup, replication, and related technologies; and developing big data visualization technology.
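As one illustration of removing redundancy from stored data, the sketch below keeps each distinct block only once, identified by its SHA-256 digest. The class name, the fixed block size, and the file names are assumptions made for the example; production systems typically use far larger or variable-size blocks.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes
        self.files = {}   # file name -> ordered list of digests

    def put(self, name: str, data: bytes, block_size: int = 4) -> None:
        digests = []
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if unseen
            digests.append(digest)
        self.files[name] = digests

    def get(self, name: str) -> bytes:
        """Rebuild a file by concatenating its blocks in order."""
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
store.put("a.log", b"AAAABBBBAAAA")
store.put("b.log", b"BBBBCCCC")
print(len(store.blocks))  # five blocks were written, three distinct ones are stored
```

Each file is just a list of digests, so repeated content costs one reference rather than another copy.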
Developing new database technology is another focus. Databases are divided into relational databases, non-relational databases, and database caching systems. Non-relational databases mainly refer to NoSQL databases, which are divided into key-value databases, column-store databases, graph databases, document databases, and other types. Relational databases include traditional relational database systems and NewSQL databases.
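To make the four NoSQL categories more concrete, the sketch below shapes the same hypothetical user record for each data model using plain Python structures. It models the data layouts only, not any real database API, and it assumes that "column-store" here means the column-family (wide-column) layout.

```python
import json

# Key-value: the store sees only an opaque value behind one key.
key_value = {"user:1": json.dumps({"name": "Ana", "city": "Lima"})}

# Document: the value is a nested structure that can be queried by field.
documents = [{"_id": 1, "name": "Ana", "address": {"city": "Lima"}}]

# Column-family (wide-column): row key -> column family -> columns.
column_family = {"user:1": {"profile": {"name": "Ana"}, "geo": {"city": "Lima"}}}

# Graph: nodes plus explicit, labeled edges between them.
graph = {"nodes": {1: "Ana", 2: "Bo"}, "edges": [(1, "FOLLOWS", 2)]}


def city_from_key_value(key):
    # The application must decode the whole value to read one field.
    return json.loads(key_value[key])["city"]


def users_in_city(city):
    # A document model allows filtering on a nested field.
    return [d["name"] for d in documents if d["address"]["city"] == city]


def followers_of(node_id):
    # A graph model answers relationship questions by walking edges.
    return [graph["nodes"][src] for src, rel, dst in graph["edges"]
            if rel == "FOLLOWS" and dst == node_id]


print(city_from_key_value("user:1"), users_in_city("Lima"), followers_of(2))
```

The choice between models mostly comes down to the access pattern: lookup by key, query by field, reads of selected column groups, or traversal of relationships.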
Big data security technology also needs to be developed. This means improving data destruction, transparent encryption and decryption, distributed access control, data auditing, and related technologies, and achieving breakthroughs in privacy protection and inference control, data authenticity identification and forensics, data possession and integrity verification, and related technologies.
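The idea behind integrity verification can be shown with a keyed hash. The sketch below is a deliberately simplified stand-in: the data owner keeps a secret key and a tag, and later checks that stored data still matches the tag. Real data possession schemes usually rely on challenge-response protocols that avoid reading back the whole data set; the key and record contents here are hypothetical.

```python
import hashlib
import hmac


def make_tag(key: bytes, data: bytes) -> str:
    """Compute an HMAC-SHA256 tag that the data owner stores separately."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, tag: str) -> bool:
    """Return True only if the data still matches the stored tag."""
    return hmac.compare_digest(make_tag(key, data), tag)


secret = b"owner-secret-key"  # hypothetical key held by the data owner
original = b"meter=42;reading=17.3"
tag = make_tag(secret, original)

print(verify(secret, original, tag))                  # unchanged data
print(verify(secret, b"meter=42;reading=99.9", tag))  # tampered data
```

Using `hmac.compare_digest` rather than `==` avoids leaking information through comparison timing.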
Fourth, big data analysis and mining technology
Big data analysis technology involves improving existing data mining and machine learning technologies; developing new data mining technologies such as data network mining, specific group mining, and graph mining; achieving breakthroughs in big data fusion technologies such as object-based data linking and similarity joins; and achieving breakthroughs in domain-oriented big data mining technologies such as user interest analysis, network behavior analysis, and sentiment semantic analysis.
Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random real-world application data. Data mining involves many technical methods, which can be classified in several ways.
By mining task, it can be divided into classification or predictive model discovery, data summarization, clustering, association rule discovery, sequential pattern discovery, dependency or dependency model discovery, anomaly and trend discovery, and so on.
By mining object, it can be divided into relational databases, object-oriented databases, spatial databases, temporal databases, text data sources, multimedia databases, heterogeneous databases, legacy databases, and the World Wide Web.
By mining method, it can be roughly divided into machine learning methods, statistical methods, neural network methods, and database methods. Machine learning can be subdivided into inductive learning methods (decision trees, rule induction, etc.), example-based learning, genetic algorithms, and so on. Statistical methods can be subdivided into regression analysis (multiple regression, autoregression, etc.), discriminant analysis (Bayesian discriminant, Fisher discriminant, nonparametric discriminant, etc.), cluster analysis (hierarchical clustering, dynamic clustering, etc.), exploratory analysis (principal component analysis, correlation analysis, etc.), and so on. Neural network methods can be subdivided into feedforward neural networks (the BP algorithm, etc.), self-organizing neural networks (self-organizing feature maps, competitive learning, etc.), and so on. Database methods are mainly multidimensional data analysis or OLAP methods, along with attribute-oriented induction.
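As a small example of one mining task listed above, association rule discovery, the sketch below computes the two standard measures for a rule A -> B: support (the fraction of transactions containing both A and B) and confidence (support of A and B together divided by support of A). The shopping-basket data is invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # 2 of 4 baskets
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets
```

Algorithms such as Apriori build on these two measures, pruning itemsets whose support falls below a chosen threshold before generating rules.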
From the perspective of mining tasks and mining methods, the focus is on breakthroughs in the following areas:
1. Visualization and analysis. Data visualization is the most basic requirement for both ordinary users and data analysis experts. It lets the data speak for itself and lets users see the results directly.
2. Data mining algorithms. Visualization translates machine output into something people can read, while data mining is the machine's native language. Segmentation, clustering, outlier analysis, and many other algorithms let us refine the data and extract its value. These algorithms must cope with the volume of big data while also processing it at high speed.
3. Predictive analytics. Predictive analytics lets analysts make forward-looking judgments based on the results of visual analysis and data mining.
4. Semantic engines. Semantic engines need to be designed with enough artificial intelligence to actively extract information from data. The language processing techniques involved include machine translation, sentiment analysis, opinion analysis, intelligent input, question answering systems, and so on.
5. Data quality and data management. Data quality and management is a management best practice: processing data through standardized processes and tools ensures that the analysis results reach a predefined level of quality.
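The outlier analysis mentioned in the list above can be illustrated with a modified z-score based on the median absolute deviation (MAD). This is one common approach among many, not a method prescribed by this article; the 3.5 threshold is a widely used convention and the sample readings are invented.

```python
import statistics


def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread around the median: the score is undefined
    return [v for v in values
            if abs(0.6745 * (v - median) / mad) > threshold]


# Hypothetical sensor readings with one obvious isolated point.
readings = [10, 11, 9, 10, 12, 10, 95]
print(find_outliers(readings))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value distorts them far less.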
Fifth, big data presentation and application technology
Big data technology can uncover the information and knowledge hidden in massive amounts of data, provide a basis for human social and economic activity, improve operational efficiency across many fields, and greatly raise the degree of intensification of the whole economy. In China, big data will focus on three major application areas: business intelligence, government decision-making, and public services. Examples include business intelligence technology, government decision-making technology, telecommunications data processing and mining technology, power grid data processing and mining technology, meteorological information analysis technology, environmental monitoring technology, police cloud application systems (road monitoring, video surveillance, network monitoring, intelligent transportation, anti-telecommunications-fraud, command and dispatch, and other public security information systems), large-scale gene sequence analysis and comparison technology, Web information mining technology, multimedia data parallel processing technology, film and television production rendering technology, and cloud computing and massive data processing application technologies for various other industries.