The main idea of data cleaning is to "clean up" the data by filling in missing values, smoothing noisy data, smoothing or removing outliers, and resolving inconsistencies. If users consider the data dirty, they are unlikely to trust mining results based on it; in other words, the output is regarded as unreliable.
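As a rough illustration of the cleaning steps described above, the sketch below fills missing values with the median and clips outliers to the interquartile-range fences. The function name `clean_column`, the sample `ages` list, and the choice of median filling and the 1.5 x IQR rule are assumptions made for this example; other filling and outlier strategies are equally valid.

```python
from statistics import median, quantiles


def clean_column(values):
    """Fill missing entries (None) with the median, then clip outliers.

    Assumption for this sketch: outliers are values outside the
    1.5 * IQR fences, and they are clipped rather than deleted.
    """
    present = [v for v in values if v is not None]
    fill = median(present)
    filled = [fill if v is None else v for v in values]

    # Quartiles of the filled column define the clipping fences.
    q1, _, q3 = quantiles(filled, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [min(max(v, low), high) for v in filled]


# Hypothetical column with one missing value and one implausible age.
ages = [23, 25, None, 24, 26, 120, 22]
cleaned = clean_column(ages)
print(cleaned)
```

Clipping keeps the record count unchanged, which is convenient when other columns of the same record are still usable; deleting the record is the alternative when the whole row is suspect.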
2. Data integration
Most data analysis tasks involve data integration. Data integration combines data from multiple sources into a consistent data store, such as a data warehouse. These sources may include multiple databases, data cubes, or flat files.
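A minimal sketch of this idea, assuming two hypothetical sources: rows already loaded from a database (`db_rows`) and a flat file given as CSV text (`csv_text`). The field names `customer_id`, `cust_id`, `name`, and `total` are invented for the example; the point is that differently named keys are mapped onto one consistent schema.

```python
import csv
import io

# Hypothetical source 1: rows loaded from a database table.
db_rows = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]

# Hypothetical source 2: a flat file that names the key column differently.
csv_text = "cust_id,total\n1,30.5\n3,12.0\n"


def integrate(db_rows, csv_text):
    """Merge both sources into one list of records with a shared schema."""
    merged = {
        row["customer_id"]: {
            "customer_id": row["customer_id"],
            "name": row["name"],
            "total": None,
        }
        for row in db_rows
    }
    for rec in csv.DictReader(io.StringIO(csv_text)):
        key = int(rec["cust_id"])  # map the file's key onto the shared key
        entry = merged.setdefault(
            key, {"customer_id": key, "name": None, "total": None}
        )
        entry["total"] = float(rec["total"])
    return [merged[k] for k in sorted(merged)]


store = integrate(db_rows, csv_text)
for record in store:
    print(record)
```

Records that appear in only one source keep `None` in the fields the other source would have supplied, so the gaps stay visible for a later cleaning step.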
3. Data reduction
Data reduction techniques obtain a reduced representation of the data set that is much smaller yet closely preserves the integrity of the original data. Mining the reduced data set is therefore more efficient and produces the same (or almost the same) analysis results.
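One simple form of data reduction is random sampling. The sketch below is only an illustration with synthetic data: the distribution, the sizes, and the fixed seed are assumptions chosen for the example. It compares the mean of the full data set with the mean of a sample one hundredth of its size.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic "full" data set (assumption: 100,000 roughly normal values).
full = [rng.gauss(50, 10) for _ in range(100_000)]

# Reduced representation: a simple random sample without replacement.
reduced = rng.sample(full, 1_000)

full_mean = mean(full)
reduced_mean = mean(reduced)
print(f"full mean:    {full_mean:.2f}")
print(f"reduced mean: {reduced_mean:.2f}")
```

Sampling is only one option; aggregation, dimensionality reduction, and compression follow the same principle of trading a small loss of detail for a much smaller data set.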
4. Data transformation
Data transformation includes normalization, discretization, and sparsification of the data so that it is in a form suitable for mining.
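Two of these transformations can be sketched briefly: min-max normalization, which rescales values to the range [0, 1], and equal-width discretization, which maps each value to a bin index. The function names and the `incomes` sample are hypothetical, and equal-width binning is just one of several discretization schemes.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value the index of its equal-width bin (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so cap the index.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical attribute values.
incomes = [20, 35, 50, 80, 120]
scaled = min_max_normalize(incomes)
bins = equal_width_bins(incomes, 4)
print(scaled)
print(bins)
```

Normalization keeps attributes with large ranges from dominating distance-based methods, while discretization replaces raw numbers with a small set of interval labels.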