(1) Structured data, simply put, is data held in a database. It is easiest to understand through typical scenarios: enterprise ERP and financial systems, hospital information system (HIS) databases, campus one-card systems in education, government administrative approval systems, and other core databases. What storage solutions do these applications need? Broadly, they need high-speed storage, data backup, data sharing, and data disaster recovery.
(2) An unstructured database is a database whose field lengths are variable and in which each field of a record may be made up of repeatable or non-repeatable subfields. It can handle structured data (such as numbers and symbols), and it is better suited to unstructured data (such as full text, images, sound, film and television, and hypermedia).
(3) Data cleaning is the final procedure for finding and correcting identifiable errors in a data file. It includes checking data consistency and handling invalid values and missing values. Unlike questionnaire review, data cleaning after entry is generally done by computer rather than by hand.
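The handling of invalid and missing values described in (3) can be illustrated with a minimal sketch. The records, the field names (id, age, city), and the 0 to 120 age range are assumptions made up for this example, not part of the original text; a real cleaning pass would take its rules from the business domain.

```python
from typing import Optional

# Hypothetical records as they might look right after data entry.
RAW = [
    {"id": "1", "age": "34", "city": "Paris"},
    {"id": "2", "age": "-5", "city": "Lyon"},    # invalid value (negative age)
    {"id": "3", "age": "", "city": "  nice "},   # missing value, untidy text
]


def clean_age(value: str) -> Optional[int]:
    """Return the age as an int, or None if it is missing or out of range."""
    try:
        age = int(value.strip())
    except ValueError:
        # Empty or non-numeric input is treated as a missing value.
        return None
    # Assumed validity rule: ages outside 0..120 are treated as invalid.
    return age if 0 <= age <= 120 else None


def clean_record(record: dict) -> dict:
    """Apply the predefined cleaning rules to one record."""
    return {
        "id": record["id"].strip(),
        "age": clean_age(record["age"]),
        # Normalize whitespace and capitalization so values are consistent.
        "city": record["city"].strip().title(),
    }


cleaned = [clean_record(r) for r in RAW]
for row in cleaned:
    print(row)
```

Here invalid and missing ages are both mapped to None so that later steps can decide whether to drop, impute, or flag them; that choice is one possible design, not the only one.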
Principle of data cleaning
Data cleaning, simply put, removes errors and inconsistencies from the data source. It uses techniques such as mathematical statistics, data mining, or predefined cleaning rules to detect and eliminate erroneous, incomplete, and duplicated data, and so improves data quality. Developing the business knowledge and the cleaning rules depends to a considerable extent on the accumulated experience and overall judgment of the auditors. Auditors should therefore evaluate the quality of audit data against the following criteria.
(i) Accuracy: the degree of agreement between the data values and the values assumed to be correct.
(ii) Completeness: the degree to which no values are missing from attributes that require values.
(iii) Consistency: the degree to which data satisfy a set of constraints.
(iv) Uniqueness: the degree to which each data record (and code value) occurs only once.
(v) Validity: the degree to which the data are maintained with sufficient rigor to meet the acceptance requirements of the classification guidelines.
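Some of the criteria above can be expressed as simple ratios. The following is a minimal sketch, not a standard audit procedure: the function names, the sample rows, the trusted reference values, and the rule that an amount must be present and non-negative are all assumptions chosen for illustration. Validity is left out because it depends on the specific classification guidelines in use.

```python
from typing import Callable

# Hypothetical audit rows; ids, amounts, and currencies are made up.
ROWS = [
    {"id": "A1", "amount": 120.0, "currency": "EUR"},
    {"id": "A2", "amount": None, "currency": "EUR"},   # missing amount
    {"id": "A2", "amount": -40.0, "currency": "USD"},  # duplicate id, negative amount
    {"id": "A3", "amount": 75.5, "currency": ""},      # missing currency
]


def completeness(records: list, field: str) -> float:
    """Share of records whose field is neither None nor an empty string."""
    if not records:
        return 1.0
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)


def uniqueness(records: list, key: str) -> float:
    """Number of distinct key values divided by the number of records."""
    if not records:
        return 1.0
    return len({r.get(key) for r in records}) / len(records)


def consistency(records: list, constraint: Callable[[dict], bool]) -> float:
    """Share of records that satisfy the given constraint."""
    if not records:
        return 1.0
    return sum(1 for r in records if constraint(r)) / len(records)


def accuracy(records: list, field: str, reference: dict) -> float:
    """Share of records, among those with a trusted reference value keyed
    by id, whose field equals that reference value."""
    checked = [r for r in records if r["id"] in reference]
    if not checked:
        return 1.0
    return sum(1 for r in checked if r[field] == reference[r["id"]]) / len(checked)


def non_negative_amount(record: dict) -> bool:
    """Assumed constraint: the amount must be present and not negative."""
    return record["amount"] is not None and record["amount"] >= 0


# Values assumed to be correct, e.g. taken from source documents.
TRUSTED = {"A1": 120.0, "A3": 80.0}

report = {
    "completeness(amount)": completeness(ROWS, "amount"),
    "uniqueness(id)": uniqueness(ROWS, "id"),
    "consistency(amount >= 0)": consistency(ROWS, non_negative_amount),
    "accuracy(amount)": accuracy(ROWS, "amount", TRUSTED),
}
for name, score in report.items():
    print(f"{name}: {score:.2f}")
```

Each score lies between 0 and 1, with 1 meaning that no problem of that kind was found. What threshold counts as acceptable is a judgment left to the auditor.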