We are entering the era of big data. With so much data available, how should we analyze it? What are the common methods of data analysis?
One, descriptive statistics
Descriptive statistics covers the methods used to summarize data and reveal the characteristics of its distribution. It mainly includes frequency analysis, central tendency analysis, dispersion analysis, analysis of the shape of the distribution, and some basic statistical graphics.
1, missing value handling: commonly used methods are deleting the affected records, filling with the mean, and the decision tree method.
2, normality check: many statistical methods require the values to follow, or approximately follow, a normal distribution, so normality should be checked before the analysis. Commonly used methods: the nonparametric K-S (Kolmogorov-Smirnov) test, P-P plots, Q-Q plots, the W (Shapiro-Wilk) test, and the method of moments.
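The two preparation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values are made up, missing entries are assumed to be marked with None, and the helper names fill_missing_with_mean and skewness_and_excess_kurtosis are hypothetical. The shape check uses moment-based skewness and excess kurtosis, which are both close to 0 for normally distributed data; it is a rough screen, not a formal test.

```python
import statistics


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def skewness_and_excess_kurtosis(values):
    """Return moment-based skewness and excess kurtosis of the values."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Made-up sample with two missing entries (assumption: None marks a gap).
raw = [4.1, 5.0, None, 5.6, 4.8, None, 5.2, 4.9]
filled = fill_missing_with_mean(raw)

skew, kurt = skewness_and_excess_kurtosis(filled)
print("filled:", filled)
print("median:", statistics.median(filled))
print("sample standard deviation:", statistics.stdev(filled))
print("skewness:", skew, "excess kurtosis:", kurt)
```

Note that mean filling shrinks the spread of the data, so the dispersion and shape figures computed afterwards should be read with that in mind.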
Two, regression analysis
Regression analysis is one of the most widely used data analysis methods. It uses observed data to establish an appropriate dependence relationship between variables and to analyze the underlying patterns in the data.
1. Simple linear regression analysis
Conditions for use: only one independent variable x is related to the dependent variable y; x and y must be continuous variables, and the dependent variable y or its residuals must follow a normal distribution.
2. Multiple linear regression analysis
Conditions for use: used to analyze the relationship between multiple independent variables x and the dependent variable y; x and y must be continuous variables, and the dependent variable y or its residuals must follow a normal distribution.
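As a minimal sketch of the one-predictor case, the least-squares line can be computed directly from the sample means. The data values and the function name fit_simple_linear are assumptions made for illustration. The multiple-predictor case follows the same idea but requires solving a system of equations, which is not shown here.

```python
import statistics


def fit_simple_linear(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that lie roughly on a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_linear(xs, ys)

# Residuals are the quantities whose normality the conditions above refer to.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print("slope:", slope, "intercept:", intercept)
print("residuals:", residuals)
```

With an intercept in the model, the least-squares residuals always sum to zero, so it is their spread and shape that need checking, not their mean.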
3. Logistic regression analysis
A linear regression model requires the dependent variable to be continuous and normally distributed, and the independent variables to be linearly related to it. A logistic regression model places no requirement on the distribution of the variables and is generally used when the dependent variable is discrete.
4. Other regression methods: nonlinear regression, ordinal regression, Probit regression, weighted regression, etc.
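To make the logistic regression case concrete, here is a small sketch with a binary (0/1) dependent variable and one predictor. It fits the model by plain gradient ascent on the log-likelihood, which is one simple way to do it and is chosen here only to keep the example self-contained. The data, the function names, the learning rate and the iteration count are all illustrative assumptions.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the average log-likelihood with respect to b0 and b1.
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 += learning_rate * grad0
        b1 += learning_rate * grad1
    return b0, b1


# Made-up data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
print("b0:", b0, "b1:", b1)
print("P(pass | 1 hour): ", sigmoid(b0 + b1 * 1.0))
print("P(pass | 4 hours):", sigmoid(b0 + b1 * 4.0))
```

The output is a probability for a discrete outcome, not a predicted continuous value, which is why no normality requirement is placed on the dependent variable.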
Three, analysis of variance
Conditions for use: the samples must be mutually independent random samples, each drawn from a normally distributed population, and the population variances must be equal.
1. One-way ANOVA: the experiment has only one influencing factor, or it has several but only the relationship between one factor and the response variable is analyzed.
2. Multi-factor ANOVA with interaction: the experiment has multiple influencing factors; the relationship between these factors and the response variable is analyzed while also considering the interactions among the factors.
3. Multi-factor ANOVA without interaction: the relationship between multiple influencing factors and the response variable is analyzed, but the factors do not interact or their interaction is ignored.
4. Analysis of covariance: traditional analysis of variance has the obvious disadvantage that it cannot control certain random factors present in the analysis, which reduces the accuracy of the results. Analysis of covariance corrects the main effects of the analysis of variance after removing the influence of covariates; it is an analysis method that combines linear regression with analysis of variance.
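The simplest case in this section, one-way ANOVA, comes down to comparing the variation between group means with the variation inside the groups. The sketch below computes the F statistic by hand for three made-up groups; the function name one_way_anova_f and the data are assumptions for illustration. Turning F into a p-value requires the F distribution, for which a statistics library (for example scipy.stats.f_oneway in SciPy) would normally be used.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)      # number of groups (levels of the factor)
    n = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.fmean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - statistics.fmean(group)) ** 2 for group in groups for v in group
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up measurements for three levels of a single factor.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print("F:", f_stat, "df:", (df_between, df_within))
```

A large F means the group means differ by more than the within-group scatter would explain, which is evidence that the factor affects the response variable.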