What process should be followed for big data processing?

Data governance is the process of bringing data from disorder to order across its entire lifecycle, from data planning and collection through storage and management to application. It is also the process of building standardized procedures.

Based on the characteristics of each stage, the data governance process can be summed up in four words: "planning", "collection", "storage", and "use".

1. Planning: sorting out business processes and planning data resources

For an enterprise, real-time data can exceed the terabyte level every day. Which data needs to be collected from users? Where should so much data be stored, and in what form?

These questions need to be planned for in advance, with a defined process for moving from disorder to order. That process requires cross-departmental collaboration involving front-end and back-end developers, data engineers, data analysts, project managers, and other roles.

2. Collection: ETL extraction, de-duplication, desensitization, transformation, correlation, outlier removal

The front-end and back-end teams deliver the collected data to the data department. Using ETL tools, the data department extracts the data from its sources, transforms it, and loads it into the destination, so that scattered and fragmented data ends up in centralized storage.
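The collection steps named in this section can be sketched as a small pipeline. This is only an illustrative sketch using the Python standard library: the record fields (`user_id`, `phone`, `amount`), the masking rule, and the median-based outlier threshold are all assumptions made for the example, not details from the article, and the correlation (joining) step is omitted.

```python
import statistics

# Hypothetical raw records, standing in for data sent by front-end and back-end systems.
RAW_EVENTS = [
    {"user_id": "u1", "phone": "13800000001", "amount": "120.5"},
    {"user_id": "u1", "phone": "13800000001", "amount": "120.5"},  # exact duplicate
    {"user_id": "u2", "phone": "13800000002", "amount": "98.0"},
    {"user_id": "u3", "phone": "13800000003", "amount": "101.0"},
    {"user_id": "u4", "phone": "13800000004", "amount": "110.0"},
    {"user_id": "u5", "phone": "13800000005", "amount": "99999.0"},  # outlier
]


def extract(source):
    """Extract: read records from the source (here, an in-memory list)."""
    return [dict(record) for record in source]


def deduplicate(records):
    """Drop records whose fields are all identical to an earlier record."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def desensitize(record):
    """Mask the middle digits of the phone number (assumed masking rule)."""
    masked = dict(record)
    phone = record["phone"]
    masked["phone"] = phone[:3] + "****" + phone[-4:]
    return masked


def transform(record):
    """Transform: convert the amount from text to a number."""
    converted = dict(record)
    converted["amount"] = float(record["amount"])
    return converted


def remove_outliers(records, max_ratio=10.0):
    """Drop records whose amount exceeds max_ratio times the median (assumed rule)."""
    median = statistics.median(r["amount"] for r in records)
    return [r for r in records if r["amount"] <= median * max_ratio]


def load(records, destination):
    """Load: append cleaned records to the central store (here, a list)."""
    destination.extend(records)
    return destination


def run_etl(source):
    records = deduplicate(extract(source))
    records = [transform(desensitize(r)) for r in records]
    records = remove_outliers(records)
    return load(records, [])


warehouse = run_etl(RAW_EVENTS)
for row in warehouse:
    print(row)
```

In a real deployment each function would be replaced by the corresponding capability of the chosen ETL tool, and the destination would be a central data store rather than a list.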

3. Storage: High-performance storage and management of big data

Where should so much business data be kept? This calls for a high-performance big data storage system in which data is sorted into the appropriate database, making subsequent management and use as convenient as possible.

4. Use: instant query, report monitoring, intelligent analysis, model prediction

The ultimate purpose of data is to support business decision-making; all the preceding stages lay the groundwork for the final querying, analysis, and monitoring.

This stage is the home turf of data analysts, who use the standardized data to run instant queries, build indicator and reporting systems, analyze business problems, and even make model-based predictions.
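As a rough illustration of an instant query feeding an indicator report, the sketch below uses an in-memory SQLite database from the Python standard library. The `orders` table, its columns, and the two indicators (daily active buyers and daily revenue) are hypothetical choices for the example; a production system would query the big data store described earlier.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, already-cleaned orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_day TEXT, user_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("2024-01-01", "u1", 100.0),
            ("2024-01-01", "u2", 50.0),
            ("2024-01-01", "u1", 25.0),
            ("2024-01-02", "u3", 80.0),
        ],
    )
    conn.commit()
    return conn


def daily_indicators(conn):
    """Return (day, distinct buyers, total revenue) for each day, ordered by day."""
    query = """
        SELECT order_day,
               COUNT(DISTINCT user_id) AS buyers,
               SUM(amount) AS revenue
        FROM orders
        GROUP BY order_day
        ORDER BY order_day
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
for day, buyers, revenue in daily_indicators(conn):
    print(day, buyers, revenue)
```

The same grouped query could be scheduled to populate a report or a monitoring dashboard, which is why standardized, cleaned data matters at this stage.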