Data quality management refers to the series of management activities, such as identifying, detecting, measuring, providing early warning of, and handling the various data quality problems that may occur throughout the data life cycle, which spans data generation, acquisition, storage, sharing, maintenance, and application.
The purpose of data quality management is to provide the enterprise with a solid and reliable data foundation by improving the completeness, accuracy, and authenticity of its data, enhancing the value of the data so that it plays a positive and effective role in day-to-day operations, precision marketing, management decision-making, risk management, and other areas.
Evaluation dimensions of data quality
How do we judge whether data quality is good or poor? Along which dimensions can data quality be assessed? In practice, it is generally evaluated through data quality assessment dimensions. These dimensions are characteristics of data quality; they provide a way to measure it and a standard against which to manage it. In a specific data quality project, the dimensions most applicable to the business requirements are selected and measured in order to evaluate the quality of the data.
In GB/T 36344 (Information Technology - Data Quality Evaluation Indicators), the Standardization Administration of China specifies a framework of data quality evaluation indicators:
Normality: the extent to which data conforms to data standards, data models, business rules, metadata, or authoritative reference data.
Completeness: the degree to which data elements are assigned values as required by the data rules.
Accuracy: the degree to which the data accurately represents the true value of the real entity (actual object) it describes.
Consistency: the degree to which data does not contradict data used in other specific contexts.
Timeliness: the degree to which the data remains correct as time passes.
Accessibility: the degree to which data can be accessed.
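To make these dimensions concrete, the following is a minimal sketch of how two of them, completeness and normativity, might be measured over a small table. The column names, rules, and sample data are assumptions made for the example and are not taken from GB/T 36344 itself.

```python
# A minimal sketch of measuring two quality dimensions with pandas.
# The columns and the phone-number rule are hypothetical examples.
import pandas as pd

df = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003", None],
    "phone":       ["13800000000", "138-0000", None, "13900000000"],
})

# Completeness: share of mandatory fields that are actually populated.
mandatory = ["customer_id", "phone"]
completeness = df[mandatory].notna().mean().mean()

# Normativity: share of populated values that match the agreed format,
# here assumed to be an 11-digit phone number.
normativity = df["phone"].dropna().str.fullmatch(r"\d{11}").mean()

print(f"completeness = {completeness:.0%}, normativity = {normativity:.0%}")
```

Accuracy and consistency checks follow the same pattern: compare the stored values against an authoritative reference or against the same data held in another system, and report the matching ratio.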
The International Data Management Association (DAMA International) puts forward its own data quality assessment framework in the DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK).
There are certain differences between the national standard and international practice in data quality assessment indicators, so enterprises should build an appropriate data quality assessment system, with its dimensions and indicators, based on their own business practice and their internal management requirements.
Data quality is a key factor in the success of a company's business.
Causes of data quality problems
The consequences of data quality problems are obvious, so what are their root causes? The main factors affecting data quality fall into technical, business, and management aspects, and the causes are analyzed below from these three perspectives.
Technical aspects
Data quality problems exist in the data source. For example, some data are collected from production systems where they are already duplicated, incomplete, or inaccurate, and the collection process does not clean the data to deal with these problems; this situation is quite common (a minimal cleaning sketch is given after this list of causes).
Quality problems in the data collection process. For example, collection parameters or process settings are incorrect, or the data collection interface is inefficient, resulting in collection failures, data loss, or failed data mapping and conversion.
Problems in the data transmission process. For example, defects in the data interface itself, misconfigured parameters, or an unreliable network can all cause data quality problems during transmission.
Problems in the data loading process. For example, misconfigured data cleaning, conversion, or loading rules.
Quality problems in data storage. For example, unreasonable storage design, limited storage capacity, or manual back-end adjustment of data can lead to data loss, invalid or distorted data, and duplicate records.
Data silos among business systems. Stovepipe-style construction leads to serious data inconsistency between systems.
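As an illustration of the source-side cleaning that a collection process can perform to catch such problems early, the sketch below deduplicates incoming records and routes incomplete ones to a repair queue. The record structure, key fields, and required fields are assumptions made for the example.

```python
# A minimal sketch of cleaning during collection, assuming records arrive
# as dicts; the field names used here are hypothetical.
def clean_collected(records, key_fields=("order_id",), required=("order_id", "amount")):
    seen, clean, rejected = set(), [], []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key in seen:
            rejected.append((rec, "duplicate"))      # already collected
        elif any(rec.get(f) in (None, "") for f in required):
            rejected.append((rec, "incomplete"))     # send back for repair
        else:
            seen.add(key)
            clean.append(rec)
    return clean, rejected

clean, rejected = clean_collected([
    {"order_id": "A1", "amount": 100},
    {"order_id": "A1", "amount": 100},    # duplicate of the first record
    {"order_id": "A2", "amount": None},   # incomplete record
])
print(len(clean), len(rejected))  # -> 1 2
```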
Business aspects
Data entry on the business side is not standardized, leading to common entry problems such as inconsistent letter case, full-width versus half-width characters, and units. When business users enter data, the system does not embed the relevant data-checking rules, so entries are shaped by human factors; for example, a contract amount of 100,000 yuan may be recorded as plain digits, in units of ten thousand, or in Chinese capital numerals, all representing the same amount. The quality of manually entered data is closely tied to the people who record it: when they work rigorously and carefully, data quality is relatively good, and when they do not, it is poor.
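As an illustration, a minimal sketch of canonicalizing a few of the amount formats mentioned above is given below; the accepted formats and the helper name are assumptions for the example, not a prescribed standard.

```python
# A minimal sketch of normalizing inconsistent amount entries; the formats
# handled here (plain digits, "<n>万" i.e. units of 10,000, full-width
# characters) are hypothetical examples of the problems described above.
import unicodedata

def normalize_amount(raw: str) -> float:
    s = unicodedata.normalize("NFKC", raw)           # full-width -> half-width
    s = s.strip().lower().replace(",", "").replace("元", "")
    if s.endswith("万"):                              # amount given in units of 10,000
        return float(s[:-1]) * 10_000
    return float(s)

for raw in ["100000", "１０００００", "10万元", " 100,000元 "]:
    print(repr(raw), "->", normalize_amount(raw))     # all -> 100000.0
```

The real fix, of course, is to prevent such variants at entry time, which is the subject of the prevention measures discussed later.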
Management aspects
At the management level, the enterprise does not recognize the importance of data quality; it emphasizes systems over data and assumes that systems are omnipotent, so any data stored in them must naturally be of high quality.
There is no clear data management system within the enterprise and no corresponding management department, so when data quality problems arise, no one can be held responsible for them.
Data entry specifications are not uniform, so when business departments handle the same business under inconsistent specifications, human factors produce conflicting or contradictory data.
There is a lack of top-down data planning: no appropriate data quality management goals are set, and no data quality policies, management systems, or assessment mechanisms are formulated.
There is no effective mechanism for handling data quality problems: from discovery through assignment, handling, and optimization there is no unified process or system support, so data quality problems cannot be managed and assessed in a closed loop.
Data quality management solutions
In response to the above analysis of the causes of data quality problems from the technical, business, and management perspectives, data quality monitoring needs to be carried out so that data quality is continuously improved in three areas: prevention and control before the event, process monitoring during the event, and supervision and management after the event.
Prevention and control before the event
Establish data standards covering all business topics within the enterprise, and standardize indicator definitions, indicator calibers, and the entry specifications for each business field. For manually entered data, use closed input controls wherever possible, such as drop-down menus, radio buttons and checkboxes, date/time pickers, and tag selectors (with support for custom types); for the parts that must remain free-form, perform the necessary validation at entry time. In addition, for data quality problems caused by the systems themselves, a data standards system needs to be established: production systems that can be modified are transformed under the guidance of the data standards, while systems that cannot be modified are handled through technical means such as cleaning and conversion. Controlling data quality at the point where the data is generated is by far the most efficient approach.
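The following is a minimal sketch of what such entry-time validation might look like for a contract form: closed-option fields are checked against an allowed list, and the fields that must remain free-form are validated immediately. The field names, allowed values, and patterns are assumptions made for the example rather than a prescribed standard.

```python
# A minimal sketch of entry-time validation; field names and rules are
# hypothetical stand-ins for rules derived from the enterprise data standards.
import re
from datetime import datetime

ALLOWED_CURRENCIES = {"CNY", "USD", "EUR"}   # closed option list from the standard

def validate_entry(form: dict) -> list:
    errors = []
    if form.get("currency") not in ALLOWED_CURRENCIES:
        errors.append("currency must be one of the standard codes")
    if not re.fullmatch(r"\d+(\.\d{1,2})?", str(form.get("amount", ""))):
        errors.append("amount must be numeric with at most two decimals")
    try:
        datetime.strptime(form.get("signed_on", ""), "%Y-%m-%d")
    except ValueError:
        errors.append("signed_on must be an ISO date (YYYY-MM-DD)")
    return errors   # an empty list means the entry passes

print(validate_entry({"currency": "RMB", "amount": "10万", "signed_on": "2023/01/01"}))
# -> three error messages, so the form is rejected before it reaches the database
```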
Establishing an internal data accountability system and a data quality management department, and developing data quality monitoring processes and assessment methods, also help strengthen this ex-ante prevention and control mechanism for data quality.
Process monitoring during the event
In-process data quality control means monitoring and dealing with data quality while the data is being maintained and used. By establishing a process-oriented control system for data quality, every link is controlled: data creation, change, collection, processing, loading, application, and so on. During this process, the relevant modules of a data quality management tool can monitor data quality at each node of the data flow, raise real-time early warnings, and control quality from the data source, combining automated system verification with manual review. The tool can also embed the processes and approval flows of the enterprise's data quality problem handling mechanism, effectively assisting with and monitoring data quality.
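As a rough illustration of this kind of in-process monitoring, the sketch below has each node of the data flow report a quality metric and raises an early warning when a threshold is breached. The node names, metrics, thresholds, and the alert callback are assumptions; a real data quality management tool would supply its own monitoring modules and approval flows.

```python
# A minimal sketch of per-node quality monitoring with threshold alerts.
# Nodes, metrics, and thresholds are hypothetical examples.
def check_node(node, metric, value, threshold, alert=print):
    passed = value >= threshold
    if not passed:
        # In practice this would open a ticket in the problem-handling
        # workflow (discovery -> assignment -> processing -> verification).
        alert(f"[DQ WARNING] {node}: {metric} = {value:.1%} below {threshold:.1%}")
    return passed

results = [
    check_node("collection", "record completeness", 0.997, 0.99),
    check_node("loading", "rows loaded / rows extracted", 0.92, 0.99),
]
print("all nodes passed:", all(results))   # -> False, the loading node raised a warning
```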
Supervision and management after the event
For data already stored in the data warehouse whose quality issues need to be identified, data quality control tools come into play. When the data warehouse or data middle platform is built, the names, formats, and precision of key fields are unified according to the data standards to remove ambiguity from the data. Based on those standards, appropriate rule models are built in the data quality management tool; for imported historical data, the rule models can be run to find data quality issues, which are then tracked throughout their life cycle on the platform.
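As a sketch of what running such a rule model over imported historical data might look like, the example below expresses each rule as a query that counts violations, using sqlite3 purely as a stand-in for the warehouse. The table, columns, and rules are assumptions made for the example.

```python
# A minimal sketch of an after-the-event rule model; the dw_orders table
# and the three rules are hypothetical examples of standards-based checks.
import sqlite3

RULES = {
    "customer_id is never null":
        "SELECT COUNT(*) FROM dw_orders WHERE customer_id IS NULL",
    "amount is non-negative":
        "SELECT COUNT(*) FROM dw_orders WHERE amount < 0",
    "order_id is unique":
        "SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM dw_orders",
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dw_orders (order_id TEXT, customer_id TEXT, amount REAL);
    INSERT INTO dw_orders VALUES ('A1', 'C1', 100), ('A1', NULL, -5);
""")

for rule, sql in RULES.items():
    violations = conn.execute(sql).fetchone()[0]
    print(rule, "->", "PASS" if violations == 0 else f"FAIL ({violations} violations)")
```

Each failed rule then becomes a data quality issue that can be assigned, processed, and verified through the problem-handling process described earlier.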
Conclusion
Data quality management is an important part of enterprise data governance, and all enterprise data governance work revolves around the goal of improving data quality. To do data quality management well, grasp the key factors affecting data quality, set up quality management points or quality control points, and start from the source of the data so that data quality problems are solved at their root.
Data quality is an urgent issue for many organizations, and it is time to govern their data. Improving data quality is not a quick fix; a single round of data rectification will not solve every data quality problem. Existing data can be verified and cleaned with data quality management tools, but beyond that a sound data quality control system needs to be established around data standards and data quality requirements, with monitoring at every link, regular inspection of data quality, targeted solutions and improvements, and continuous improvement of data quality.