Current location - Loan Platform Complete Network - Big data management - Data Warehouse Modeling - Dimensions vs Relationships
Data Warehouse Modeling - Dimensions vs Relationships

Data management has been evolving from early spreadsheets, spiderweb systems to architectural data warehouses. Development to date to dimensional modeling and relational modeling, and with the development of the Internet, data from GB to PB framing, enterprise business iterative update is also changing rapidly, the preference for dimensional models gradually have a unified Internet number of warehouse modeling standards trend.

The number warehouse model is not divided into high and low, are a perspective to observe reality. Dimensional models take the transactions/realities that occur between entities and entities as the entry point, while relational modeling organizes data in terms of the relationships between entities and entities. In the current environment, the Internet prefers dimensional modeling, while traditional industries are more along the lines of relational modeling.

Individuals have experienced financial, Internet number warehouse construction, there are a number of 0 to 1 project experience, for the number warehouse construction is still in the continuous learning. If there are any errors, please point out more exchanges.

The star model, formed with the fact table as the core and multiple dimension tables as arms, is a typical implementation of dimensional modeling.

The fact table records the measurable events that occur in the business process, such as the amount of consumption in the order, the amount of discounts or the amount of inventory, etc. The fact table occupies the main storage in the actual business, such as the order table; and the dimension table is the textual environment related to the business process metrics, describing the "who, what, where, when, how, why", commonly used dimension table, which is the textual environment. "Commonly used dimension tables are date, product, user, address and so on. Generally dimension tables will be redundant information , there are more than 100 columns of dimension tables , such irregularity brings data organization on the simple .

Relational modeling, known as the "entity-relationship" model, exists in a "standardized" way that emphasizes non-redundancy between data to meet the 3NF. During the construction process, data is standardized down to the level of detail, for example, under the topic of users, there will be users with names, users with addresses, and users with other users. For example, under the topic of user, there will be user and name, user and age, user and residential address, etc. In the traditional industry, the mature relationship modeling has ls-ldm model, for the financial industry to form 10 themes.

Dimensional modeling : Data construction from the actual needs, generally oriented to the department/business to form an independent data mart, such an approach brings distinctive features, efficient. However, because of the demand-based start, often lead to frequent demand iteration brings higher maintenance costs, once the business process adjustment, the model may be the risk of starting over.

Relational Modeling : Enterprise-oriented modeling with strong abstraction. The construction of non-redundant data in a 3NF way makes the model highly flexible, but it is not as efficient as dimensional modeling because it is not directly demand-oriented. In addition to enterprise-oriented construction, the cycle is much longer compared to dimensional modeling, but there is an advantage: enterprise data integration is easier.

Within the enterprise, these two modeling approaches often co-exist, the construction of the basic data warehouse using relational modeling, technical elegance in exchange for data streamlining, to ensure a high degree of abstraction, a high degree of consistency, the requirements of the business stability; up the dimensional modeling is more appropriate, biased towards directly facing the business, relying on the redundancy of the data to bring the availability of the query to ensure efficiency. Both advantages complement each other

In the big data environment, data storage and development has changed a lot, once the dimensional modeling and relational modeling in the current scenario have their own shortcomings. How can data warehousing evolve and mature in a big data environment? Inmon et al. proposed the data vault model

data valult is a detail-oriented, historically traceable, and uniquely linked set of normalized tables that can be given to support one or more business functional areas; it is a centered, radial model that is designed to focus on the integration patterns of business keys. The data vault has three basic entities (structures)

In terms of modeling style, it uses a hybrid of a third paradigm approach and a dimensional modeling approach, with a unique combination of the two to meet business needs.