Data Mining - EDA (Exploratory Data Analysis)

Exploratory data analysis (EDA) means exploring data that already exists with as few a priori assumptions as possible, using plotting, tabulation, curve fitting, and summary statistics to uncover the structure and patterns in the data. It is especially effective in the big data era, when we face all kinds of messy "dirty data" and often do not know where to start in understanding the data set at hand.

Discrete: the variable takes separate, countable values and is often treated as categorical data; examples include the number of students in a class, the result of a dice roll, gender, and race.

Continuous: the variable can take any value within a range, and such variables are generally ordered; examples include height (any value within the range of human heights), the length of a leaf, and the weight of a dog.
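
To make the distinction concrete, here is a minimal sketch of a rule of thumb for labelling a column. The function name variable_kind and the sample columns are hypothetical, and the rule itself (labels and integers count as discrete, other real numbers as continuous) is only an assumption for illustration; real data often needs a manual check.

```python
from numbers import Integral, Real

def variable_kind(values):
    """Heuristically label a column as 'discrete' or 'continuous'."""
    # Labels (e.g. gender) and booleans are treated as discrete.
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in values):
        return "discrete"
    # Integer-only columns (counts, dice rolls) are treated as discrete.
    if all(isinstance(v, Integral) for v in values):
        return "discrete"
    # Anything else is a real-valued measurement.
    return "continuous"

# Hypothetical sample columns, made up for this example.
dice_rolls = [1, 6, 3, 3, 2]
heights_cm = [172.4, 165.0, 180.2]
genders = ["F", "M", "F"]

for name, column in [("dice_rolls", dice_rolls),
                     ("heights_cm", heights_cm),
                     ("genders", genders)]:
    print(name, "->", variable_kind(column))
```

A rule like this can misfire, for example on heights that were rounded to whole centimetres, so it is best used as a first pass only.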

The main goals of EDA are:

1. Gaining as much insight into the data as possible

2. Uncovering the underlying structure

3. Extracting important variables

4. Detecting and handling outliers

5. Testing underlying assumptions

6. Developing a preliminary model

7. Determining the optimal factor settings
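
As an illustration of the outlier item in the list above, the sketch below flags values that fall outside 1.5 times the interquartile range (Tukey's rule). This is one common convention, not the only one; the function name iqr_outliers, the multiplier k and the sample values are assumptions chosen for the example.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical measurements with one suspicious value.
sample = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(sample)
print(flagged)
```

Whether a flagged value should actually be dropped is a separate decision: it may be a recording error, or it may be the most interesting point in the data.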

EDA helps answer questions such as:

1. What are the typical values (mean, median)?

2. What is the uncertainty of a typical value?

3. What is a good distributional fit for a set of data?

4. What are the quantiles of the data?

5. Does an engineering modification have an effect?

6. Does a factor have an effect?

7. What are the most important factors?

8. Are measurements from different laboratories equivalent?

9. What is the best function for relating a response variable to a set of factor variables?

10. What are the best factor settings?

11. Can we separate signal from noise in time-dependent data?

12. Can we extract any structure from multivariate data?

13. Are there outliers in the data?
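
The first few questions in the list above (typical value, its uncertainty, quantiles) can be sketched with Python's standard statistics module. The function name summarize and the leaf-length values are made up for illustration, and using the standard error of the mean as the "uncertainty" is one assumption among several reasonable choices.

```python
import math
import statistics

def summarize(data):
    """Return typical values, their uncertainty, and the quartiles of data."""
    n = len(data)
    sd = statistics.stdev(data)  # sample standard deviation
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # Standard error of the mean: a simple uncertainty estimate.
        "sem": sd / math.sqrt(n),
        # Three cut points that split the data into four equal-sized groups.
        "quartiles": statistics.quantiles(data, n=4),
    }

# Hypothetical leaf lengths in centimetres.
leaf_lengths_cm = [4.1, 3.8, 5.0, 4.4, 4.7, 3.9, 4.6, 4.2]
summary = summarize(leaf_lengths_cm)
for key, value in summary.items():
    print(key, value)
```

Comparing the mean with the median is a quick first check: a large gap between them suggests skew or outliers.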


A closing note: I wrote this up after my first contact with data mining. It is my first time writing a web article, so the layout is a bit messy (embarrassing). I hope to learn a lot and make good friends in this data mining course organized by Datawhale.
