Data analysis is the process of extracting valuable information from data. It involves a variety of data-processing and categorization steps, and only by mastering the right classification methods and processing modes can you get twice the result with half the effort. Below are the nine essential modes of data-analysis thinking introduced by the data analysts at Beijing North Blue Bird:
1. Classification
Classification is a basic mode of data analysis. According to its characteristics, data can be divided into different types of objects, which can then be analyzed further to dig deeper into the essence of things.
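As a minimal sketch of the idea, the toy example below trains a decision-tree classifier on invented pet measurements; scikit-learn and all of the data here are illustrative assumptions, not something named in the article:

```python
# Hypothetical classification sketch; library choice and data are assumptions.
from sklearn.tree import DecisionTreeClassifier

# Invented samples: [height_cm, weight_kg] labeled "cat" or "dog".
X = [[25, 4.0], [30, 5.0], [60, 25.0], [55, 20.0], [28, 4.5], [65, 30.0]]
y = ["cat", "cat", "dog", "dog", "cat", "dog"]

model = DecisionTreeClassifier().fit(X, y)

# Classify two unseen objects by the characteristics learned above.
print(model.predict([[27, 4.2], [58, 22.0]]))  # expected: ['cat' 'dog']
```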
2. Regression
Regression is a widely used statistical analysis method. By specifying the dependent and independent variables, you can establish a regression model for the quantitative relationship between them, solve for the model's parameters from measured data, and then evaluate whether the model fits the measured data well. If it does, the model can be used to make further predictions from the independent variables.
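A minimal sketch of that workflow, fitting a straight line by least squares, checking the fit with R², and only then predicting; NumPy and the measured values are illustrative assumptions:

```python
import numpy as np

# Hypothetical measurements: independent variable x, dependent variable y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Establish the model y = a*x + b and solve its parameters by least squares.
a, b = np.polyfit(x, y, 1)

# Evaluate the fit (R^2) before trusting the model for prediction.
y_hat = a * x + b
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"a={a:.3f}, b={b:.3f}, R^2={r2:.4f}")

# If the fit is good, predict from a new value of the independent variable.
print("prediction at x=6:", a * 6 + b)
```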
3. Clustering
Clustering is a classification-like method that divides data into aggregate classes according to its intrinsic properties: the elements within each aggregate class should share characteristics as much as possible, while the characteristics of different aggregate classes should differ as much as possible. Unlike classification analysis, the classes here are unknown in advance, which is why cluster analysis is also called undirected or unsupervised learning.
Data clustering is a technique for analyzing static data and is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics.
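For example, a short clustering sketch with k-means, which groups unlabeled points purely by their positions; scikit-learn and the point set are assumptions made for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented 2-D points that form two loose groups; no labels are given.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.3, 7.7], [7.9, 8.2]])

# k-means discovers k=2 aggregate classes from the data's own structure.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels)  # e.g. [0 0 0 1 1 1] -- discovered cluster ids, not known classes
```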
4. Similarity Matching
Similarity matching is a method of calculating the degree of similarity between two pieces of data, usually measured as a percentage. Similarity-matching algorithms are used in many computational scenarios, in areas such as data cleansing, correcting user input errors, recommendation systems, plagiarism detection systems, automated scoring systems, web search, and DNA sequence matching.
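A tiny sketch using Python's standard-library difflib to score two strings as a percentage, the kind of check a data-cleansing or error-correction step might run; the example strings are invented:

```python
from difflib import SequenceMatcher

def similarity_pct(a: str, b: str) -> float:
    """Percentage similarity between two strings (0-100)."""
    return SequenceMatcher(None, a, b).ratio() * 100

# Typical cleansing / error-correction scenario: match a typo to a known value.
print(f"{similarity_pct('recieve', 'receive'):.1f}%")  # high despite the typo
print(f"{similarity_pct('recieve', 'banana'):.1f}%")   # low: unrelated strings
```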
5. Frequent itemset
A frequent itemset is a set of items that occur together frequently in instances, such as beer and diapers. The Apriori algorithm is a frequent-itemset algorithm for mining association rules; its core idea is to mine frequent itemsets through two phases, candidate-set generation and downward-closure pruning, and it is now widely used in fields such as business and network security.
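The sketch below walks through a level-by-level Apriori pass over an invented basket dataset; it is a simplified illustration of the candidate-generation and downward-closure idea, not a production implementation:

```python
# Simplified Apriori pass; real implementations add more pruning,
# but both phases are visible here. All data is invented.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "chips"},
]
min_support = 2  # an itemset must appear in at least 2 baskets

def support(itemset):
    """Count the transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions)

# Level 1: frequent single items.
items = {i for t in transactions for i in t}
frequent = [frozenset([i]) for i in sorted(items) if support({i}) >= min_support]

# Higher levels: candidates come only from unions of frequent smaller sets
# (downward closure: a superset of an infrequent itemset cannot be frequent).
k = 2
while frequent:
    print(f"frequent {k - 1}-itemsets:", [sorted(s) for s in frequent])
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= min_support]
    k += 1
```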
6. Statistical description
Statistical description uses suitable statistical indicators and indicator systems, chosen according to the characteristics of the data, to express the information the data feeds back. It is the basic processing step of data analysis. Its main methods include calculating indicators such as the mean and variance, and representing the data's distribution pattern graphically.
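A minimal sketch with Python's standard-library statistics module, computing the usual descriptive indicators for an invented sample:

```python
import statistics

# Hypothetical sample of measurements.
sample = [12, 15, 11, 19, 14, 15, 13]

# Basic descriptive indicators: central tendency and dispersion.
print("mean:    ", statistics.mean(sample))
print("median:  ", statistics.median(sample))
print("variance:", statistics.variance(sample))  # sample variance
print("stdev:   ", statistics.stdev(sample))
```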
7. Link prediction
Link prediction is a method of predicting the links that should exist between data points. It can be divided into prediction based on node attributes and prediction based on network structure. Attribute-based link prediction analyzes node attributes and the relational information between nodes, using node information, knowledge sets, similarity between nodes, and other methods to uncover the hidden relationships between nodes. Compared with node attributes, network-structure data is easier to obtain, and a major viewpoint in the field of complex networks holds that the traits of individuals in a network matter less than the relationships between individuals; link prediction based on network structure has therefore received increasing attention.
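As a sketch of the structure-based approach, the snippet below scores unlinked node pairs in an invented toy network by counting common neighbors, one of the simplest structural predictors; the graph and the scoring choice are illustrative assumptions:

```python
from itertools import combinations

# Invented undirected graph as an adjacency mapping.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

# Score every pair that is NOT yet linked by its number of common neighbors;
# higher scores suggest links that "should" exist.
scores = {
    (u, v): len(graph[u] & graph[v])
    for u, v in combinations(graph, 2)
    if v not in graph[u]
}
for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, "common neighbors:", s)  # ('a', 'd') ranks highest here
```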
8. Data Compression
Data compression is a technical method for reducing the amount of data, either to shrink storage space and improve transmission, storage, and processing efficiency without losing useful information, or to reorganize the data according to a certain algorithm so as to reduce its redundancy and storage footprint. Data compression is divided into lossy compression and lossless compression.
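A short lossless-compression sketch with Python's standard-library zlib; the redundant sample text is invented, and the final assertion shows that no useful information is lost:

```python
import zlib

# Highly redundant invented data compresses well.
data = b"data compression reduces redundancy " * 50
packed = zlib.compress(data, 9)  # 9 = maximum compression level
print(len(data), "->", len(packed), "bytes")

# Lossless: decompression recovers the original bytes exactly.
assert zlib.decompress(packed) == data
```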
9. Causal analysis
Causal analysis is a method of prediction that uses the cause-and-effect relationships behind the development and change of things. When causal analysis is used for market forecasting, regression analysis is the main tool; econometric models and input-output analysis are also commonly used methods.
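As a minimal sketch of regression-based causal forecasting, the snippet below assumes, purely for illustration, that advertising spend drives sales, fits that relationship with the same least-squares machinery shown under Regression, and forecasts sales from a planned spend; all figures are invented:

```python
import numpy as np

# Invented cause-and-effect data: ad spend (cause) vs. units sold (effect).
spend = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
sales = np.array([120.0, 195.0, 310.0, 390.0, 480.0])

# Fit the assumed causal relationship sales = slope * spend + intercept.
slope, intercept = np.polyfit(spend, sales, 1)

# Forecast the effect from a planned level of the cause.
planned = 60.0
print(f"forecast at spend={planned:.0f}: {slope * planned + intercept:.0f} units")
```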