Predicting the Future Trend of Big Data Development from Three Directions

Technological development generates a steady stream of new data around the world every day. Since the concept of big data was proposed, the technology has gradually grown into an industry, and one that investors continue to view favorably. So where is the big data industry headed? Three directions point to the future of big data technology:

(i) Social networks and the Internet of Things are expanding the technical channels for data collection

Years of industry informatization have left fields such as health care, transportation, and finance with large accumulations of internal data, which constitute the "stock" of big data resources. The growth of the mobile Internet and the Internet of Things (IoT) has greatly enriched the channels through which data can be collected: data from external social networks, wearable devices, the Internet of Vehicles, IoT sensors, and government public-information platforms will become the main body of incremental big data resources. The deep penetration of the mobile Internet already provides a rich data source for big data applications.

In addition, the rapidly developing IoT will become an increasingly important supplier of big data resources. Compared with the clutter and low value density of existing Internet data, data collected in a directed way through wearables, connected vehicles, and other collection terminals has higher utilization value. For example, after several years of development, smart bands, wristbands, and watches are maturing, and devices such as smart key rings, bicycles, and chopsticks keep emerging; abroad, Intel, Google, and Facebook, and at home, Baidu, JD.com, and Xiaomi are all laying out in this space.

Enterprises' internal data is still the main source of big data, but demand for external data is growing stronger. Currently, 32% of enterprises obtain data through external purchases, while only 18% use government open data. How to promote the construction of big data resources, improve data quality, and encourage cross-domain integration and circulation is one of the key issues in advancing big data applications.

Overall, all industries are committed to developing incremental resources: on top of exploiting stock resources, they are actively expanding technical channels for collecting emerging data. Social media, the IoT, and similar sources have greatly enriched the potential channels for data collection, and in principle data acquisition will become easier and easier.

(ii) Distributed storage and computing technologies have solidified the technical foundation of big data processing

Big data storage and computing technology is the foundation of the entire big data system.

In terms of storage, the Google File System (GFS), whose design Google published in 2003, and the subsequent Hadoop Distributed File System (HDFS) laid the foundation of big data storage technology.

Compared with traditional systems, GFS/HDFS co-locates compute and storage on the same physical nodes, avoiding the I/O throughput bottleneck that data-intensive computation tends to create; the file systems of such distributed storage systems also use a distributed architecture to achieve high concurrent access capacity.

In terms of computing, the MapReduce distributed parallel computing technology that Google publicized in 2004 is representative of the new generation of distributed computing. A MapReduce system is built from inexpensive commodity servers, and its total processing capacity can be expanded linearly by adding server nodes (scale-out), giving it huge advantages in cost and scalability.
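To make the programming model concrete, here is a minimal single-process sketch of the MapReduce idea, a word count in Python. The map step emits key-value pairs, the shuffle step groups them by key, and the reduce step aggregates each group; in a real cluster the framework runs these phases in parallel across many server nodes, which this sketch only simulates.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework would across nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values for each key."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data needs big storage", "data drives decisions"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 2, 'data': 2, 'needs': 1, 'storage': 1, 'drives': 1, 'decisions': 1}
```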

(iii) Deep neural networks and other emerging technologies are opening a new era of big data analytics

Big data analytics technology is generally divided into two categories: online analytical processing (OLAP, Online Analytical Processing) and data mining (Data Mining).

OLAP technology generally starts from a series of user hypotheses and performs interactive queries, correlations, and other operations on multi-dimensional data sets (typically via SQL statements) to verify those hypotheses; it represents a deductive-reasoning approach.
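As an illustration, the sketch below runs an OLAP-style aggregation over a small hypothetical sales table with pandas; the dimensions (region, quarter, product) and figures are invented for the example. It slices a multi-dimensional data set to check a user hypothesis, the same deductive pattern a SQL GROUP BY would express.

```python
import pandas as pd

# Hypothetical multi-dimensional data set: sales facts with three dimensions.
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "product": ["A", "A", "B", "B"],
    "revenue": [100, 120, 80, 95],
})

# OLAP-style "slice and dice": aggregate revenue by region and quarter
# to test the hypothesis that northern sales grow quarter over quarter.
cube = sales.pivot_table(values="revenue", index="region",
                         columns="quarter", aggfunc="sum")
print(cube)
```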

Data mining techniques actively search massive data for models, automatically uncovering the patterns (Pattern) hidden in the data; they represent an inductive approach.

Traditional data mining algorithms mainly include the following three; a combined code sketch follows the list:

(1) Clustering, also known as cluster analysis, is a statistical method for studying classification problems (of samples or indicators): it divides a set of data into several categories according to their similarities and differences. Data within the same category are highly similar, while data in different categories show little similarity or correlation. An enterprise can use clustering algorithms to segment its customers: when customer groups have no clearly defined behavioral characteristics, it clusters customer data along different dimensions, then extracts and analyzes the features of each cluster, so as to capture customer characteristics and recommend suitable products and services.

(2) Classification is similar to clustering but has a different purpose. It can start from the models that clustering produces, or learn from empirical data what a set of data objects have in common, and divide the data into different classes; its purpose is to map data items to given categories through a classification model. A representative algorithm is CART (Classification and Regression Tree). An enterprise can categorize business data about users, products, services, and so on, build classification models, and then predictively assign new data to the existing classes. Classification algorithms are relatively mature and accurate, offering strong predictive power for precise customer targeting, marketing, and service, and helping enterprises make decisions.

(3) Regression reflects the characteristics of data attribute values, expressing the mapping between attribute values through a function so that their relationships become apparent at a glance. It can be applied to forecasting data series and studying correlations. An enterprise can use a regression model to analyze and forecast market sales and adjust its strategy in time; in risk prevention, anti-fraud, and similar areas, regression models can also provide early warning.
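Here is a combined sketch of the three techniques using scikit-learn. The data is randomly generated purely for illustration; KMeans stands in for clustering, a decision tree for CART-style classification, and LinearRegression for regression.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# (1) Clustering: group hypothetical customers by two behavioral features.
customers = rng.random((100, 2))
segments = KMeans(n_clusters=3, n_init=10).fit_predict(customers)

# (2) Classification: a CART-style decision tree trained on labeled examples,
# then used to assign new records to the existing classes.
labels = (customers[:, 0] + customers[:, 1] > 1.0).astype(int)
tree = DecisionTreeClassifier(max_depth=3).fit(customers, labels)
predicted = tree.predict(rng.random((5, 2)))

# (3) Regression: fit a linear relationship between an attribute and a target
# (e.g. marketing spend vs. sales), then extrapolate to a new value.
spend = rng.random((50, 1)) * 10
revenue = 3.0 * spend[:, 0] + rng.normal(0, 0.5, 50)
model = LinearRegression().fit(spend, revenue)

print(segments[:5], predicted, model.predict([[12.0]]))
```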

Traditional methods, whether OLAP or data mining, struggle to cope with the challenges of big data. The first problem is low execution efficiency: traditional data mining techniques are built on centralized software architectures that are hard to parallelize, so they are inefficient on data beyond the terabyte scale. The second is that analysis accuracy is hard to improve as data volume grows, and unstructured data is especially difficult to handle.

Of all of humanity's digital data, only a very small portion (about 1% of the total) of numerical data has been deeply analyzed and mined (e.g., regression, classification, clustering); large Internet enterprises have done shallow analysis (e.g., sorting) of semi-structured data such as web-page indexes and social data; and unstructured data such as voice, images, and video, which accounts for nearly 60% of the total, has so far been difficult to analyze effectively.

So big data analytics technology needs breakthroughs in two directions. The first is efficient in-depth analysis of the huge volume of structured and semi-structured data to mine hidden knowledge, such as understanding and identifying semantics, sentiment, and intent from web pages composed of natural-language text. The second is analysis of unstructured data: transforming massive, complex, multi-source voice, image, and video data into machine-recognizable information with clear semantics, and then extracting useful knowledge from it.

At present, big data analytics represented by emerging technologies such as deep neural networks has made considerable progress.

The neural network is an advanced artificial-intelligence technique. With its adaptive processing, distributed storage, and high fault tolerance, it is well suited to handling non-linear problems and knowledge or data that is fuzzy, incomplete, or imprecise, which makes it a good fit for big data mining.

Typical neural network models fall into three categories. The first is feed-forward networks used for classification, prediction, and pattern recognition, represented mainly by functional networks and perceptrons. The second is feedback networks used for associative memory and optimization, represented by Hopfield's discrete and continuous models. The third is self-organizing maps used for clustering, represented by the ART model. However, although there are many neural network models and algorithms, there is no uniform rule for which to use in domain-specific data mining, and it is difficult for people to interpret the network's learning and decision-making process.
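As a concrete instance of the first category, here is a minimal feed-forward model: a single-layer perceptron trained with the classic perceptron learning rule on a toy linearly separable problem (logical OR). This is a teaching sketch, not a production network.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Single-layer perceptron: weighted sum, step activation, error-driven updates."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            err = target - pred
            w += lr * err * xi  # nudge weights toward the target output
            b += lr * err
    return w, b

# Toy linearly separable problem: logical OR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
w, b = train_perceptron(X, y)
print([1 if xi @ w + b > 0 else 0 for xi in X])  # [0, 1, 1, 1]
```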

With the increasing integration of the Internet and traditional industries, mining and analyzing web data has become an important part of demand analysis and market forecasting.

Web data mining is a comprehensive technique for discovering the hidden mapping from inputs to outputs in document structure and usage data.

The most researched and applied algorithm at present is PageRank. PageRank, a core element of Google's algorithm, was granted a U.S. patent in September 2001 and is named after Google co-founder Larry Page. It measures a website's value by the quantity and quality of its external and internal links. The idea was inspired by the phenomenon in academic research that the more often a paper is cited, the more authoritative and higher-quality it is generally judged to be.
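The core of PageRank can be expressed as a short power iteration. The sketch below implements the standard damped formulation over a hypothetical four-page link graph; the page names and the damping factor d = 0.85 are illustrative, and real search engines must also handle dangling pages and vastly larger graphs.

```python
import numpy as np

def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank over an adjacency list {page: [outlinks]}."""
    pages = sorted(links)
    idx = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    # Column-stochastic transition matrix: M[j, i] = 1/outdegree(i) if i links to j.
    # (Assumes every page has at least one outlink, i.e. no dangling nodes.)
    M = np.zeros((n, n))
    for src, outs in links.items():
        for dst in outs:
            M[idx[dst], idx[src]] = 1.0 / len(outs)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * M @ r
    return dict(zip(pages, r))

# Hypothetical four-page web graph.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(web))  # C ranks highest: it receives the most link weight.
```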

It should be noted that data mining and analysis carry strong industry and enterprise characteristics: apart from the most basic analysis tools, there is a lack of targeted, general-purpose modeling and analysis tools. Industries and enterprises need to build specific data models around their own business, and the ability to build data analysis models is becoming the key to winning the big data competition.