What is the relationship between big data and data mining?

Data mining is a rapidly developing interdisciplinary field built on database theory, machine learning, artificial intelligence, and modern statistics, and it has applications in many domains. It involves many algorithms: some, such as neural networks and decision trees, come from machine learning, while others, such as support vector machines, classification and regression trees, and correlation analysis, are grounded in statistical learning theory. Data mining is defined as finding meaningful patterns or knowledge in large amounts of data.

Big data has three important characteristics: large data volume, complex structure, and a fast rate of data updates. Several developments are accelerating the automatic collection and storage of data: the growth of Web technology, the automatic retention of data generated by web users, continuous data collection by sensors, and the rise of the mobile Internet. As a result, the amount of data worldwide keeps expanding, and storing and processing it exceeds the capacity of a single computer, whether a minicomputer or a mainframe. This poses a challenge for data mining, because data mining has generally been implemented on a single minicomputer or mainframe, even when that machine performs parallel computation. Google proposed a distributed file system (the Google File System), which helped give rise to the later concepts of cloud storage and cloud computing.

Big data has to be split into small units for computation, and the partial results are then combined into a final result. This is known as the MapReduce framework. The computation carried out on each individual machine still relies on data mining techniques. The difference is that some traditional data mining techniques cannot easily be embedded in the MapReduce framework, so some algorithms need to be adapted.
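The split-then-combine idea, and the reason some algorithms need adapting, can be illustrated with a small sketch. It runs on one machine and only imitates the map and reduce steps; the function names (`split_into_chunks`, `map_chunk`, `combine`, `mapreduce_mean`) are hypothetical and not part of any real MapReduce library. The example computes a mean. Averaging the per-chunk means would give a wrong answer when chunks differ in size, so the map step is adapted to emit a (sum, count) pair instead.

```python
from functools import reduce


def split_into_chunks(values, chunk_size):
    """Partition the data into small units, as a cluster would across machines."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def map_chunk(chunk):
    """Map step: summarize one chunk as a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def combine(left, right):
    """Reduce step: merge two partial (sum, count) summaries."""
    return (left[0] + right[0], left[1] + right[1])


def mapreduce_mean(values, chunk_size):
    """Compute the mean by mapping over chunks and reducing the partial results."""
    partials = [map_chunk(chunk) for chunk in split_into_chunks(values, chunk_size)]
    total, count = reduce(combine, partials, (0, 0))
    if count == 0:
        raise ValueError("cannot compute the mean of empty data")
    return total / count


data = [2, 4, 6, 8, 10, 12, 14]
chunks = split_into_chunks(data, 3)        # [[2, 4, 6], [8, 10, 12], [14]]
correct_mean = mapreduce_mean(data, 3)     # 56 / 7 = 8.0

# Naive adaptation: average the chunk means. The last chunk has only one
# element, so it is over-weighted and the result is not the true mean.
naive_mean = sum(sum(c) / len(c) for c in chunks) / len(chunks)
```

Here `correct_mean` is 8.0, while `naive_mean` is (4 + 10 + 14) / 3, roughly 9.33. The reduce step works because merging (sum, count) pairs gives the same result regardless of how the data was partitioned, a property that many traditional algorithms lack until they are reformulated.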

In addition, the growing capacity to process big data poses new challenges for statistics. Statistical theory is usually built on samples, whereas in the big data era it may be possible to obtain the entire population rather than a sample drawn from it without replacement.
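The contrast between a sample and the whole can be made concrete with a minimal sketch using only the Python standard library. The population here is synthetic (10,000 values drawn from a normal distribution with an assumed mean of 50 and standard deviation of 10), and the sample size of 100 is an arbitrary choice for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic "whole": in practice this would be the full data set.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Draw 100 distinct positions, i.e. a sample without replacement.
sample_indices = rng.sample(range(len(population)), 100)
sample = [population[i] for i in sample_indices]

# With the whole population available, the mean is computed, not estimated.
population_mean = statistics.fmean(population)

# With only a sample, the mean is an estimate that carries uncertainty.
sample_mean = statistics.fmean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean: {population_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f} (standard error {standard_error:.3f})")
```

Classical inference is concerned with the gap between `sample_mean` and `population_mean`. When the full data set can be processed directly, that gap disappears for the data at hand, which is one reason the usual sample-based reasoning needs to be reconsidered.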