Data governance is the process of bringing data from disorder to order across data planning, data collection, data storage management, and data application; it is also the process of building a standardized workflow. Based on the characteristics of each stage, the data governance process can be summarized in four words: "plan", "acquire", "store", and "use".
1. Plan: mapping out business processes and planning data resources.
For an enterprise, the real-time data generated each day can exceed the terabyte level. What data needs to be collected from users? Where should so much data be kept, and how should it be stored?
These questions need to be settled in advance, together with a set of processes that takes the data from disorder to order. This requires cross-departmental cooperation among front-end and back-end developers, data engineers, data analysts, project managers, and other roles.
2. Acquire: ETL extraction, deduplication, desensitization (data masking), transformation, association, and removal of abnormal values.
The front-end and back-end teams hand the collected data over to the data department, which uses ETL tools to extract, transform, and load it from the source systems into the destination, so that scattered and messy data is stored centrally.
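The acquisition steps listed above can be illustrated with a minimal sketch. It is not a real ETL tool: the record fields, the `USERS` lookup table, the `MAX_AMOUNT` rule, and the function names are all hypothetical, chosen only to show deduplication, desensitization, conversion, association, and abnormal-value removal in one pass.

```python
# Hypothetical raw events as they might arrive from front-end and back-end systems.
RAW_EVENTS = [
    {"event_id": "e1", "user_id": "u1", "phone": "13800001111", "amount": "25.50"},
    {"event_id": "e1", "user_id": "u1", "phone": "13800001111", "amount": "25.50"},  # duplicate
    {"event_id": "e2", "user_id": "u2", "phone": "13900002222", "amount": "40.00"},
    {"event_id": "e3", "user_id": "u1", "phone": "13800001111", "amount": "999999.00"},  # abnormal
]

# Hypothetical user table used for the association step.
USERS = {"u1": {"region": "north"}, "u2": {"region": "south"}}

# Assumed business rule for what counts as an abnormal value.
MAX_AMOUNT = 10_000.0


def mask_phone(phone: str) -> str:
    """Desensitize a phone number, leaving at most the last four digits visible."""
    if len(phone) <= 4:
        return "*" * len(phone)
    return "*" * (len(phone) - 4) + phone[-4:]


def run_etl(events, users):
    """Extract, clean, and transform events into a list ready for loading."""
    seen_ids = set()
    cleaned = []
    for event in events:
        # De-duplication: skip any event_id that has already been processed.
        if event["event_id"] in seen_ids:
            continue
        seen_ids.add(event["event_id"])

        # Conversion: amounts arrive as strings and are stored as floats.
        amount = float(event["amount"])

        # Abnormal-value removal: drop amounts outside the assumed valid range.
        if not 0 <= amount <= MAX_AMOUNT:
            continue

        # Association: attach the user's region from the lookup table.
        region = users.get(event["user_id"], {}).get("region", "unknown")

        cleaned.append({
            "event_id": event["event_id"],
            "user_id": event["user_id"],
            "phone": mask_phone(event["phone"]),  # desensitization
            "amount": amount,
            "region": region,
        })
    return cleaned


cleaned_events = run_etl(RAW_EVENTS, USERS)
```

In practice each step would usually be a separate, configurable stage in the ETL tool, and the rule for abnormal values would come from the planning stage rather than a hard-coded constant.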
3. Storage: high-performance storage and management of big data.
Where should so much business data be stored? This calls for a high-performance big data storage system that classifies the data and places each class in the appropriate database, making subsequent management and use as convenient as possible.
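As a rough illustration of placing classified data in its corresponding store, the sketch below routes records by category. The category names, the store names, and the `staging_area` fallback are assumptions made for this example; in-memory lists stand in for real databases.

```python
from collections import defaultdict

# Hypothetical mapping from data category to the kind of store assumed to suit it.
STORE_FOR_CATEGORY = {
    "transaction": "relational_db",
    "clickstream": "columnar_warehouse",
    "document": "object_storage",
}


def route_records(records):
    """Group records by the store their category maps to.

    Records with a missing or unknown category go to a staging area
    so that they can be classified later instead of being lost.
    """
    stores = defaultdict(list)
    for record in records:
        target = STORE_FOR_CATEGORY.get(record.get("category"), "staging_area")
        stores[target].append(record)
    return dict(stores)


routed = route_records([
    {"id": 1, "category": "transaction"},
    {"id": 2, "category": "clickstream"},
    {"id": 3},  # no category yet
])
```

Keeping the mapping in one table means that a new category or store can be added without touching the routing logic.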
4. Use: instant queries, report monitoring, intelligent analysis, and predictive modeling.
The ultimate purpose of data is to support business decision-making, and the earlier stages all pave the way for the final querying, analysis, and monitoring.
This stage is the home ground of data analysts, who can use the standardized data for real-time queries, build metric and reporting systems, analyze business problems, and even build predictive models.
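A minimal sketch of the query and monitoring side, using Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, the sample rows, and the alert threshold are hypothetical; a production system would query the governed data store instead.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_day TEXT, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("2024-01-01", "north", 25.5),
            ("2024-01-01", "south", 40.0),
            ("2024-01-02", "north", 10.0),
        ],
    )
    return conn


def daily_revenue(conn):
    """Instant query: total order amount per day, as a {day: total} mapping."""
    rows = conn.execute(
        "SELECT order_day, SUM(amount) FROM orders "
        "GROUP BY order_day ORDER BY order_day"
    ).fetchall()
    return dict(rows)


def days_below(revenue, threshold):
    """Report monitoring: list the days whose revenue is under the threshold."""
    return [day for day, total in revenue.items() if total < threshold]


conn = build_demo_db()
revenue = daily_revenue(conn)
alerts = days_below(revenue, threshold=20.0)
conn.close()
```

The same per-day metric could feed a report or serve as an input feature for a predictive model; the sketch stops at the query and a simple threshold check.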