2020 Intermediate Economist Economic Basics Preparation Knowledge Points: Data Mining
Data Mining
The related concepts of data mining are as follows:
1. Meaning: Data mining is the process of extracting hidden but potentially valuable information and knowledge from large amounts of incomplete, noisy, fuzzy, and random real-world application data. This definition has the following implications:
(1) The data source must be real, massive, and noisy.
(2) The knowledge discovered is knowledge the user is interested in.
(3) The knowledge discovered is acceptable, understandable, and applicable.
(4) It does not require discovering universally applicable knowledge; it only needs to support the discovery of knowledge for a specific problem.
2. Starting point and core tasks: Data mining starts from solving practical problems; the core task is to explore data relationships and characteristics.
3. Types
(1) Guided learning or supervised learning
Supervised learning learns and models the target concept by exploring the data, building a model that effectively explains the target from the observed variables.
(2) Unguided learning or unsupervised learning
Unsupervised learning has no explicit label variable expressing the target concept; its main task is to explore the intrinsic connections and structure within the data.
4. Commonly used algorithms
(1) Classification
1) Meaning: Determine which predefined category a target object belongs to, in order to meet potential future prediction needs. Classification is a form of supervised learning: it builds a classification model from training data whose categories are already known.
2) Practical application: distinguishing spam in an email system, identifying risky customers among loan applicants, etc.
3) Commonly used methods: decision tree classification, Bayesian classification, association-based classification, support vector machines, neural networks, etc.
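To make Bayesian classification concrete, here is a minimal sketch of a naive Bayes spam filter in Python, tied to the spam example above. It assumes each message has already been split into words; the function names `train_naive_bayes` and `classify` and the four training messages are hypothetical toy data, not taken from any real system. Training data with known categories are used to count words per class, and a new message is assigned to the class with the higher posterior score.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Estimate class priors and per-class word counts from labeled, tokenized documents."""
    labels = list(labels)
    priors = {c: labels.count(c) / len(labels) for c in set(labels)}
    word_counts = {c: Counter() for c in priors}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    vocab = {w for doc in docs for w in doc}
    return priors, word_counts, vocab

def classify(model, doc):
    """Return the class with the highest log-posterior, using add-one (Laplace) smoothing."""
    priors, word_counts, vocab = model
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in doc:
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy training set: messages whose categories are already known.
training = [
    ("win money now", "spam"),
    ("cheap prize win", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
docs = [text.split() for text, _ in training]
labels = [label for _, label in training]
model = train_naive_bayes(docs, labels)

print(classify(model, "win cheap money".split()))   # spam
print(classify(model, "meeting tomorrow".split()))  # ham
```

Log-probabilities are summed instead of multiplying raw probabilities so that long messages do not underflow to zero.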
(2) Cluster analysis
1) Meaning: Divide a set of data into several categories according to their similarities and differences, so that data within the same class are as similar as possible, data in different classes are as dissimilar as possible, and correlation across classes is as low as possible. Clustering is a form of unsupervised learning: the classes are not known in advance, and cluster analysis learns from observation to determine the relationships among the data.
2) Practical application: customer segmentation, text classification, structural grouping, behavior tracking, and similar problems.
3) Commonly used methods: partition-based methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
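As an illustration of a partition-based method, the sketch below implements k-means, one common partitioning algorithm, in plain Python. The customer data points (for example, visits per month and average spend) and the function name `kmeans` are hypothetical; no category labels are supplied, which is what makes this unsupervised. Each point is repeatedly assigned to its nearest centroid, and each centroid is moved to the mean of its members until nothing changes.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Partition points into k clusters by alternating assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (visits per month, average spend in hundreds).
points = [(1, 2), (1.5, 1.8), (1, 1), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
for members in clusters:
    print(sorted(members))
```

Here the two groups of low-activity and high-activity customers emerge without any labels; k-means requires choosing k in advance and uses `math.dist`, which needs Python 3.8 or later.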
(3) Association analysis
1) Meaning: Association analysis mines the associations and correlations that occur repeatedly in a data set, so that the appearance of one data item can be used to predict the appearance of other data items.
2) Practical application: the beer-and-diapers case. Data mining found that men who bought beer in large supermarkets often bought children's diapers at the same time. Based on this finding, the supermarket placed beer and diapers together, and sales of both products rose significantly.
3) Commonly used methods: shopping basket (market basket) analysis, whose purpose is to discover association rules between different products in transaction data so that marketers can formulate better marketing strategies.
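A minimal sketch of shopping basket analysis in Python, restricted to rules between pairs of items, is shown below. The five baskets are hypothetical toy data echoing the beer-and-diapers case, and the helper names are illustrative. Support is the share of baskets containing an itemset; confidence of a rule X -> Y is support(X and Y) / support(X). Real tools use algorithms such as Apriori to scale this idea to larger itemsets; this brute-force version only checks pairs.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions (each a set) that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules x -> y between single items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support(transactions, {a, b})
        if s < min_support:
            continue  # the pair is too rare to be interesting
        for x, y in ((a, b), (b, a)):
            c = confidence(transactions, {x}, {y})
            if c >= min_confidence:
                rules.append((x, y, s, c))
    return rules

# Hypothetical toy transactions: each set is one customer's basket.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"bread", "milk"},
]
for x, y, s, c in pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
# beer -> diapers: support=0.40, confidence=0.67
# diapers -> beer: support=0.40, confidence=0.67
```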
(4) Trend and evolution analysis
Trend and evolution analysis covers data change trends, sequential pattern analysis, periodicity analysis, similarity analysis, and so on. Statistical regression methods are often used to analyze such problems.
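As a small example of applying regression to trend analysis, the sketch below fits a straight-line trend y = a + b·t to a time series by ordinary least squares and extrapolates it one period ahead. The monthly sales figures and the function names are hypothetical; a linear trend is only one possible model and will not capture seasonal or periodic patterns.

```python
def linear_trend(values):
    """Fit y = a + b*t by ordinary least squares, with time index t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend line steps_ahead periods past the last observation."""
    intercept, slope = linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)

sales = [10, 12, 14, 16, 18]  # hypothetical monthly sales
print(linear_trend(sales))  # (10.0, 2.0)
print(forecast(sales))      # 20.0
```

On Python 3.10 and later, `statistics.linear_regression(x, y)` computes the same slope and intercept from the standard library.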