Big data technology is the set of techniques for quickly extracting valuable information from many types of data. A large number of new technologies have emerged in the field, and they have become powerful tools for collecting, storing, processing, and presenting big data.
The key technologies of big data processing generally include big data acquisition, big data preprocessing, big data storage and management, big data analysis and mining, and big data presentation and application (big data retrieval, visualization, application, security, etc.).
I. Big Data Collection Technology
Here, data refers to the massive structured, semi-structured (also called weakly structured), and unstructured data obtained from sources such as RFID, sensors, social network interactions, and the mobile Internet; it is the foundation of the big data knowledge service model. The focus should be on breakthroughs in collection technologies such as distributed, high-speed, highly reliable data crawling or collection and high-speed full data mirroring; on breakthroughs in integration technologies such as high-speed data parsing, conversion, and loading; and on designing quality assessment models and developing data quality technologies.
Big data collection is generally divided into two layers.
Intelligent perception layer: This mainly includes the data sensing system, network communication system, sensing adaptation system, intelligent identification system, and hardware and software resource access system. Together they provide intelligent identification, positioning, tracking, access, transmission, signal conversion, monitoring, preliminary processing, and management of massive structured, semi-structured, and unstructured data. The focus is on mastering technologies for intelligent identification, sensing, adaptation, transmission, and access for big data sources.
Basic support layer: This provides the basic support environment required by the big data service platform, such as virtual servers, databases for structured, semi-structured, and unstructured data, and Internet of Things (IoT) network resources. The focus is on distributed virtual storage technology; visualization interface technology for big data acquisition, storage, organization, analysis, and decision-making operations; network transmission and compression technology for big data; and privacy protection technology for big data.
II. Big Data Preprocessing Technology
Preprocessing mainly covers parsing, extracting, and cleaning the received data.
1) Extraction: Because the acquired data may have a variety of structures and types, extraction transforms this complex data into a single structure, or one that is easy to process, so that it can be analyzed and processed quickly.
2) Cleaning: Not all big data is valuable. Some data is irrelevant to the task at hand, and some is simply erroneous noise, so the data must be filtered and "denoised" to extract the valid data.
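The two preprocessing steps can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the sensor_id/value schema, the two-field CSV layout, and the valid range of -50 to 150 are hypothetical and do not come from any particular system.

```python
import json

def extract(raw):
    """Extraction: map a raw record (JSON text, CSV text, or dict) onto one flat schema."""
    if isinstance(raw, dict):
        rec = raw
    elif raw.lstrip().startswith("{"):
        rec = json.loads(raw)
    else:
        # Assumed CSV layout with exactly two fields: sensor_id,value
        sensor_id, value = raw.split(",")
        rec = {"sensor_id": sensor_id, "value": value}
    return {"sensor_id": str(rec.get("sensor_id", "")).strip(), "value": rec.get("value")}

def clean(records, low=-50.0, high=150.0):
    """Cleaning: drop records whose value is missing, non-numeric, or outside [low, high]."""
    valid = []
    for rec in records:
        try:
            value = float(rec["value"])
        except (TypeError, ValueError):
            continue  # not a number: treat as noise
        if rec["sensor_id"] and low <= value <= high:
            valid.append({"sensor_id": rec["sensor_id"], "value": value})
    return valid

raw_inputs = [
    '{"sensor_id": "s1", "value": 21.5}',   # JSON text
    "s2, 19.0",                             # CSV text
    {"sensor_id": "s3", "value": "n/a"},    # already a dict, but not numeric
    "s4,9999",                              # numeric, but outside the assumed range
]
cleaned = clean(extract(r) for r in raw_inputs)
print(cleaned)
```

Only the s1 and s2 records survive: s3 is dropped because its value is not numeric, and s4 because it falls outside the assumed range.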
III. Big Data Storage and Management Technology
Big data storage and management uses storage devices to hold the collected data, builds the corresponding databases, and manages and accesses them. The focus is on technologies for managing and processing complex structured, semi-structured, and unstructured big data, addressing key problems such as making big data storable, representable, processable, reliable, and efficiently transmittable. This involves developing reliable distributed file systems (DFS), energy-efficient storage, in-storage computation, big data deduplication, and efficient, low-cost storage technologies; achieving breakthroughs in distributed non-relational big data management and processing, heterogeneous data fusion, data organization, and big data modeling; achieving breakthroughs in big data indexing; achieving breakthroughs in big data movement, backup, and replication; and developing big data visualization technology.
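As one illustration of de-redundancy (deduplication), the sketch below stores fixed-size blocks keyed by their SHA-256 digest, so identical blocks are kept only once. The class name DedupStore, the tiny block size, and the file names are hypothetical; production systems typically use much larger or variable-size chunks and persistent storage.

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store: identical blocks are stored once."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes
        self.files = {}   # file name -> ordered list of digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if unseen
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        # Reassemble the file from its block digests
        return b"".join(self.blocks[d] for d in self.files[name])

store = DedupStore(block_size=4)
store.put("a.log", b"AAAABBBBAAAA")
store.put("b.log", b"BBBBCCCC")
print(len(store.blocks), "unique blocks stored for 5 logical blocks")
```

The two files contain five logical blocks, but only three distinct ones (AAAA, BBBB, CCCC) are physically stored, and each file can still be reconstructed exactly.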
Development of new database technology. Databases are divided into relational databases, non-relational databases, and database caching systems. Non-relational databases mainly refer to NoSQL databases, which are further divided into key-value databases, column-store databases, graph databases, document databases, and other types. Relational databases include traditional relational database systems and NewSQL databases.
Development of big data security technology. Improve technologies such as data destruction, transparent encryption and decryption, distributed access control, and data auditing; achieve breakthroughs in technologies such as privacy protection and inference control, data authenticity identification and forensics, and integrity verification of held data.
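To make the four NoSQL categories more concrete, the sketch below expresses the same small dataset in each model using plain Python structures. This is only a conceptual illustration under assumed names (the users u1 and u2, the "follows" relation, and the helper functions are hypothetical); it does not reflect the API of any real database.

```python
# The same two users, with "u1 follows u2", in four simplified data models.

# Key-value: an opaque value looked up by a single key
key_value = {"user:u1": "Ada|London", "user:u2": "Bob|Paris"}

# Document: nested, self-describing fields inside one record
document = {
    "u1": {"name": "Ada", "city": "London", "follows": ["u2"]},
    "u2": {"name": "Bob", "city": "Paris", "follows": []},
}

# Column store: values grouped by column rather than by row
column_store = {
    "name": {"u1": "Ada", "u2": "Bob"},
    "city": {"u1": "London", "u2": "Paris"},
}

# Graph: relationships stored explicitly as labeled edges
graph_edges = [("u1", "follows", "u2")]

def cities(columns):
    """Column store: scan one column without touching the others."""
    return sorted(columns["city"].values())

def followers_of(edges, user):
    """Graph store: answer a relationship query by walking edges."""
    return [src for src, label, dst in edges if label == "follows" and dst == user]

print(key_value["user:u1"])
print(document["u1"]["follows"])
print(cities(column_store))
print(followers_of(graph_edges, "u2"))
```

The point of the comparison is that each model makes a different query cheap: lookup by key, retrieval of a whole nested record, a scan over one attribute, or a traversal of relationships.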
IV. Big Data Analysis and Mining Technology
Big data analysis technology. Improve existing data mining and machine learning techniques; develop new data mining techniques such as data network mining, specific group mining, and graph mining; achieve breakthroughs in big data fusion techniques such as object-based data joins and similarity joins; and achieve breakthroughs in domain-oriented big data mining techniques such as user interest analysis, network behavior analysis, and sentiment and semantic analysis.
Data mining is the process of extracting information and knowledge that is implicit, not known in advance, and potentially useful from large volumes of incomplete, noisy, fuzzy, and random real-world application data. Data mining involves many technical approaches, which can be classified in several ways. By mining task, it can be divided into classification or predictive model discovery, data summarization, clustering, association rule discovery, sequential pattern discovery, dependency or dependency model discovery, and anomaly and trend discovery. By mining object, it can be divided into relational databases, object-oriented databases, spatial databases, temporal databases, text data sources, multimedia databases, heterogeneous databases, legacy databases, and the World Wide Web. By mining method, it can be roughly divided into machine learning methods, statistical methods, neural network methods, and database methods. Machine learning methods can be subdivided into inductive learning (decision trees, rule induction, etc.), example-based learning, genetic algorithms, and so on. Statistical methods can be subdivided into regression analysis (multiple regression, autoregression, etc.), discriminant analysis (Bayesian discriminant, Fisher discriminant, nonparametric discriminant, etc.), cluster analysis (systematic or hierarchical clustering, dynamic clustering, etc.), exploratory analysis (principal component analysis, correlation analysis, etc.), and so on. Neural network methods can be subdivided into feedforward neural networks (the BP algorithm, etc.), self-organizing neural networks (self-organizing feature maps, competitive learning, etc.), and so on. Database methods are mainly multidimensional data analysis or OLAP methods, along with attribute-oriented induction.
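Of the mining tasks listed above, association rule discovery is easy to show in a few lines. The sketch below is a brute-force illustration limited to single-item rules of the form a -> b; the function name pair_rules, the thresholds, and the shopping-basket data are hypothetical, and real systems would use a scalable algorithm such as Apriori or FP-Growth instead.

```python
from itertools import permutations

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return rules (a -> b) between single items with their support and confidence."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n                                  # P(a and b)
        confidence = count_ab / count_a if count_a else 0.0     # P(b | a)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = pair_rules(transactions)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

With this data the only rule that passes both thresholds is butter -> bread: both items appear together in half of the transactions, and every transaction containing butter also contains bread.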
From the perspective of mining tasks and mining methods, the focus is on breakthroughs in the following areas:
1. Visualization and analysis. Data visualization is the most basic requirement for both ordinary users and data analysis experts. It lets the data speak for itself and lets users see the results directly.
2. Data mining algorithms. Visualization translates machine language for humans, whereas data mining is the machine's native language. Segmentation, clustering, outlier (isolated point) analysis, and many other algorithms let us refine the data and mine its value. These algorithms must cope with the volume of big data while still running at high speed.
3. Predictive analytics. Predictive analytics allows analysts to make forward-looking judgments based on the results of visual analysis and data mining.
4. Semantic engine. A semantic engine needs enough artificial intelligence to proactively extract information from data. The language processing technologies involved include machine translation, sentiment analysis, opinion analysis, intelligent input, and question answering systems.
5. Data quality and data management. Data quality and data management are management best practices; processing data through standardized processes and tools ensures analysis results of a predefined quality.
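The outlier (isolated point) analysis mentioned in item 2 can be sketched with the modified z-score, which is based on the median and the median absolute deviation (MAD). This is one common technique chosen here for illustration, not a method prescribed by the text above; the function name, the sample readings, and the conventional threshold of 3.5 are assumptions.

```python
import statistics

def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median and MAD) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    # 0.6745 scales the MAD so the score is comparable to a standard z-score
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

readings = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(readings)
print(outliers)
```

Because the median and MAD are barely affected by extreme values, the reading of 95 is flagged even though it would strongly distort a mean and standard deviation computed over the same data.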
V. Big Data Presentation and Application Technology
Big data technology can uncover the information and knowledge hidden in massive data and provide a basis for human social and economic activity, improving operational efficiency in every field and greatly increasing the intensification of the economy and society as a whole. In China, big data will focus on three major application areas: business intelligence, government decision-making, and public services. Examples include business intelligence technology, government decision-making technology, telecommunications data processing and mining technology, power grid data processing and mining technology, meteorological information analysis technology, environmental monitoring technology, police cloud application systems (road monitoring, video surveillance, network monitoring, intelligent transportation, anti-telecom-fraud, command and dispatch, and other public security information systems), large-scale gene sequence analysis and comparison technology, Web information mining technology, parallel multimedia data processing technology, film and television production rendering technology, and cloud computing and massive data processing technologies for various other industries.