The data in the data center is not simply accumulated. Merely piling up the raw data generated by various systems leads to very high usage costs: such data can only be used by the few departments with a strong data-technology foundation, and names and metric definitions (calibers) often differ between systems, so the enterprise's data as a whole cannot truly be put to use. The data center's data system starts from the raw data of the whole domain and applies standardized definitions and layered modeling. The end result of building the data system is a complete, standardized, and accurate data system that can readily support data applications.
The unified data warehouse layer takes the business perspective: independent of the processes of individual business systems, it reorganizes data from the viewpoint of business integrity. Its goal is to build an enterprise data system that covers the entire domain and the full history, from which the enterprise's business operation status at any point in time can be restored. As long as this goal is achieved, any modeling method is acceptable, including normal-form (paradigm) modeling, dimensional modeling, and entity modeling.
Features:
Concept:
Dimensions are the basis of dimensional modeling, and the core task is to determine the dimension attributes. Dimension attributes are the basic source of query constraints (SQL WHERE conditions), groupings (SQL GROUP BY clauses), and report labels;
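As a minimal sketch of how dimension attributes serve as query constraints, grouping keys, and report labels, the following uses Python's built-in sqlite3 module. The table names (dim_product, fact_sales), columns, and rows are hypothetical and only for illustration.

```python
import sqlite3

# Hypothetical star-schema fragment: names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT, brand TEXT);
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'phone', 'A'), (2, 'phone', 'B'), (3, 'laptop', 'A');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 70.0), (3, 300.0);
""")

# 'brand' acts as a query constraint (WHERE); 'category' acts as the grouping
# key (GROUP BY) and also supplies the row label of the resulting report.
rows = conn.execute("""
SELECT d.category, SUM(f.amount) AS total_amount
FROM fact_sales AS f
JOIN dim_product AS d ON d.product_id = f.product_id
WHERE d.brand = 'A'
GROUP BY d.category
ORDER BY d.category
""").fetchall()

for category, total in rows:
    print(category, total)
```

Both the filter and the grouping come from the dimension table; the fact table only contributes the measure being summed.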
A dimension table is relatively wide: it is a flat, denormalized table that contains a large number of fine-grained text attributes; the dimension table design process is as follows:
To build labels, you must first know which type of object to build labels for, that is, determine the object.
Based on experience building label systems across multiple industries, objects can be divided into three categories: "people", "things", and "relationships". "People" includes natural persons, groups of natural persons, legal persons, and groups of legal persons, such as consumers, consumer associations, e-commerce companies, and e-commerce enterprise federations. These are subjects that can initiate actions on their own. "Things" includes items, objects, and collections of items, such as commodities and warehouses; these are the objects that actions are applied to. "Relationships" refers to a behavior, association, or connection between people and things, people and people, or things and things at a particular point in time. It covers behavioral relationships, belonging relationships, thinking relationships, and other strong or weak relationships, such as shopping, shipping, chatting, and supervising. With this method of object identification, everything in the real world, both things and relationships, can be mapped one by one to the corresponding object category.
The core purpose of the category system: to help users quickly find and manage data/tags
Root directory: people, things, relationships
The essence of tags: a tag is a measurement and description of an entity object in the objective world, and is the product of rigorous logical analysis and processing aimed at guiding the application value of data. Data is valuable only when it is converted into tags that help improve the business; otherwise it is a data burden. The core link that the big data industry has long been trying to work out is the commercial realization of data;
Labeling: The process of refining data and converting it into labels is called labeling
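A minimal sketch of labeling, assuming hypothetical raw order records and an illustrative rule: the label name (purchase frequency), its values ("high"/"low"), and the threshold are assumptions made for the example, not taken from the text.

```python
from collections import Counter

# Hypothetical raw order records: (customer_id, order_id).
raw_orders = [
    ("c1", "o1"), ("c1", "o2"), ("c1", "o3"),
    ("c2", "o4"),
]

def label_purchase_frequency(orders, high_threshold=3):
    """Refine raw orders into one purchase-frequency label per customer.

    The label values and the threshold are illustrative assumptions.
    """
    counts = Counter(customer_id for customer_id, _ in orders)
    return {
        customer_id: ("high" if n >= high_threshold else "low")
        for customer_id, n in counts.items()
    }

labels = label_purchase_frequency(raw_orders)
print(labels)
```

The raw rows describe events; the resulting label describes the object (the customer) in a form that business users can act on directly.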
The two major prerequisites of label design:
Tags must come from business needs: they should reflect business value, help business personnel make business judgments, or creatively open up new business scenarios;
It is necessary to verify that the labels refined and sorted out from business needs are feasible in terms of data, that is, whether raw data exists that can be processed into those labels. Labels should not be defined arbitrarily, with no data to ground them;
Some easily confused concepts in label design:
Tag root directory: the object of the tag (person, thing, relationship)
Tag category: a breakdown of the object by angle, level, or process
Tag: field-level description of the specific attributes, characteristics, information, and content of the object
Tag value: specific value of the object's attributes, characteristics, information, and content
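The four concepts above can be sketched as a small data structure. This is only an illustration: the class names, the "membership" category, the "membership level" tag, and the sample value are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class RootDirectory(Enum):
    """Tag root directory: the kind of object a tag describes."""
    PERSON = "person"
    THING = "thing"
    RELATIONSHIP = "relationship"

@dataclass(frozen=True)
class Tag:
    """A field-level description of one attribute of an object."""
    root: RootDirectory   # tag root directory
    category: str         # tag category (angle, level, or process)
    name: str             # the tag itself

@dataclass(frozen=True)
class TagValue:
    """The concrete value of a tag for one specific object."""
    object_id: str
    tag: Tag
    value: str

def describe(tv: TagValue) -> str:
    return f"{tv.tag.root.value} / {tv.tag.category} / {tv.tag.name} = {tv.value}"

# Hypothetical example: one tag and its value for one consumer.
level_tag = Tag(RootDirectory.PERSON, "membership", "membership level")
record = TagValue("consumer_001", level_tag, "gold")
print(describe(record))
```

Root directory, category, and tag are metadata shared by all objects of that kind; only the tag value varies per object.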
Label design content, two categories are as follows:
Ways to organize the label fusion table:
Vertical table: similar to a key-value (K-V) table; each row is one label of an object, e.g., ID, label name, label value
Horizontal table: an ordinary two-dimensional table; each row represents an object and contains multiple labels
Comparison between vertical and horizontal tables:
Model stability: the vertical table is relatively stable, since adding a new tag only requires adding records without modifying the model structure; the horizontal table is less stable, since merely adding or modifying tag metadata requires modifying the model;
Ease of use: horizontal tables are easier to understand, and most data processing technologies are oriented to two-dimensional tables, so they are highly usable; vertical tables suit single-value queries, but complex calculations on them are inconvenient and usability is poor;
Performance: adding a label to a horizontal table only adds a column, so the number of rows stays equal to the number of objects and performance is relatively good; each label added to a vertical table adds one row for every object, which makes the table hard to process;
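The trade-off between the two layouts can be sketched in plain Python. The object IDs, label names, and values below are hypothetical; the point is only that a new label adds rows to the vertical table but a column to the horizontal one.

```python
# Vertical (K-V) table: one row per (object, label) pair.
vertical = [
    ("u1", "city", "Hangzhou"),
    ("u1", "level", "gold"),
    ("u2", "city", "Beijing"),
]

def to_horizontal(rows):
    """Pivot vertical rows into one record per object, one column per label."""
    columns = sorted({name for _, name, _ in rows})
    table = {}
    for object_id, name, value in rows:
        # Every record gets all columns; labels an object lacks stay None.
        record = table.setdefault(object_id, dict.fromkeys(columns))
        record[name] = value
    return columns, table

columns, horizontal = to_horizontal(vertical)

# Adding labels to the vertical table only appends rows ...
vertical.append(("u2", "level", "silver"))
vertical.append(("u1", "age_band", "25-34"))

# ... whereas the horizontal table gains a column (a schema change),
# while its row count stays equal to the number of objects.
new_columns, new_horizontal = to_horizontal(vertical)
print(new_columns, len(new_horizontal))
```

In a real warehouse the vertical table grows by one row per object for each new label, which is the processing cost noted above, while the horizontal table needs a column added to its model.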
The application data layer is a simple data assembly layer built on top of the unified data warehouse layer and the label data layer. Unlike a data mart, it is not built independently for one specific business: it is constructed and refined for multiple similar business scenarios at the enterprise level, while retaining the flexibility and responsiveness of a data mart. There are no highly standardized construction standards for it.