1 Improve efficiency through data preprocessing, because of the preprocessing, so there will be redundant data
2 If you do not layer and the business rules of the business system changes, it will affect the entire process of data cleansing, the workload is huge
3 through the management of the layered to achieve step-by-step completion of the work, so that the processing of each layer of the logic is simple
Standard data warehouse tiering: ods (temporary storage layer), pdw (data warehouse layer), mid (data mart layer), app (application layer)
ods: the historical storage layer, which is isomorphic to the source system data, and the data granularity of this layer is the most fine-grained, this layer of tables are divided into two kinds of data, one is to store data that needs to be loaded at the current time, and the other is to store data after processing. for storing data after processing.
pdw: data warehouse layer, its data is clean data, it is consistent and accurate, that is, cleaned data, its data generally follow the third paradigm of the database, the data granularity is the same as the granularity of the ods, it will save all the historical data in the bi system
mid: data bazaar layer, it is subject-oriented organization of the data, usually the star and snowflake Data, from the data granularity will be, it is light summary level data, there is no longer a breakdown of the data, in terms of breadth, it contains all the business quantities. From an analytical point of view, it is probably the last few years
app: application layer, data granularity highly aggregated, but not necessarily cover all business data, just a subset of the mid layer data.
The purpose of the data warehouse is to build an analytics-oriented integrated data environment to provide decision support for the enterprise. The context of the data warehouse can also be understood as: data source, data warehouse, data application
Data warehouse can be understood as a platform for the management of intermediate integrated data
etl (extract extra, transform transfer, load load) is the pipeline of the data warehouse, which can also be considered as the blood of the data warehouse.
Data warehouse storage does not need to store all the raw data, because, for example, you store lengthy text data is completely unnecessary, but need to store the details of the data, because the demand is variable, and data warehouse is imported data must be sorted out and transformed so that it is theme-oriented, because the foreground database data is based on the otp operation of the organization of the optimization of the data, which may not be suitable for doing the analysis, the analysis. Theme-oriented organization is conducive to analysis.
Multidimensional data model is to say that you can multi-dimensional cross-query and segmentation, the application is generally based on online analytical process (online analytical process OLAP), for specific needs of the group's data mart will be based on the construction of a multi-dimensional data model
And the report display is to aggregate data and multi-dimensional analysis of the data displayed in the report to provide simple and intuitive data. Provide simple and intuitive data.
Metadata, also known as interpretive data, or data dictionary, records the definition of the models in the data warehouse, the mapping relationships between the various levels, and monitors the state of the data in the data warehouse and the status of the etl tasks. Metadata is generally stored and managed centrally through a metadata repository.