As management information systems (MIS) come into large-scale use across enterprise departments, enterprise data management keeps raising new requirements. Enterprises now need more than traditional online transaction processing (OLTP). They increasingly need to draw on the rich information resources accumulated in their various application systems, and on data obtained from outside the enterprise, as a basis for decisions.
1. What is a data warehouse
A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data that supports decision-making in business management. It provides users with current and historical data for decision support, which is difficult or impossible to obtain from traditional operational databases.
Subject-oriented means that the data in the warehouse is organized by subject area. A subject is an abstract concept that refers to a key aspect users care about when they use the warehouse to make decisions, and a subject usually spans several operational information systems. Integrated means that the warehouse data is obtained from the original, dispersed database data by extraction and cleaning, followed by systematic processing, aggregation and organization. Inconsistencies in the source data must be eliminated so that the information in the warehouse is consistent, global information about the entire enterprise.
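To make the "integrated" property concrete, here is a minimal Python sketch, assuming two hypothetical source systems (a CRM and a billing system) that encode the same customer differently. The field names, code tables and date formats are illustrative assumptions, not taken from any real system; the point is that cleaning maps every source onto one enterprise-wide representation before records are merged.

```python
from datetime import date, datetime

# Hypothetical records for the same customer held by two source systems,
# each with its own encoding for the customer ID, gender and date of birth.
crm_rows = [{"cust_id": "C001", "gender": "M", "dob": "1985-03-02"}]
billing_rows = [{"customer": "c001", "sex": "1", "birth": "02/03/1985"}]

# Assumed mapping from each source's gender codes to one enterprise-wide code.
GENDER_MAP = {"M": "male", "F": "female", "1": "male", "2": "female"}

def from_crm(row):
    """Clean a CRM row into the unified warehouse representation."""
    return {
        "customer_id": row["cust_id"].upper(),
        "gender": GENDER_MAP[row["gender"]],
        "birth_date": date.fromisoformat(row["dob"]),
    }

def from_billing(row):
    """Clean a billing row; this source is assumed to store dates as DD/MM/YYYY."""
    return {
        "customer_id": row["customer"].upper(),
        "gender": GENDER_MAP[row["sex"]],
        "birth_date": datetime.strptime(row["birth"], "%d/%m/%Y").date(),
    }

def integrate(*sources):
    """Merge cleaned records by customer_id and report any that still disagree."""
    merged, conflicts = {}, []
    for rows, cleaner in sources:
        for row in rows:
            rec = cleaner(row)
            key = rec["customer_id"]
            if key in merged and merged[key] != rec:
                conflicts.append(key)
            merged.setdefault(key, rec)
    return merged, conflicts

merged, conflicts = integrate((crm_rows, from_crm), (billing_rows, from_billing))
```

After cleaning, both sources describe the same customer identically, so the merge produces one record and no conflicts.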
The data warehouse architecture consists of five parts: data sources, data conversion, the data warehouse, data marts and the user layer. Data sources include internal business data, legacy data, data from other business systems, related web data and so on. Data conversion is a key link in building the warehouse: it extracts, transforms and loads data from a variety of complex sources, tracks and monitors data quality, and extracts and creates metadata. The data warehouse itself organizes, stores and manages the various kinds of data. A data mart is a data warehouse subsystem designed separately for a particular business; that is, developers customize dedicated warehouse subsystems for different user groups within the enterprise. The user layer is the application part that faces specific users. It provides functions for accessing and retrieving data in the warehouse or data marts, together with a set of analysis and reporting tools that help users perform online analysis, data mining and similar tasks on the warehouse or data marts.
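The flow from data sources through data conversion into the warehouse, and from there into a data mart, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the source names, the record layout and the rule that a non-numeric amount is rejected are all hypothetical, and a real conversion layer would use an ETL tool and persistent storage rather than in-memory lists.

```python
from collections import Counter

# Hypothetical source rows: (region, department, amount as text).
SOURCES = {
    "erp": [("north", "sales", "120.5"), ("south", "sales", "80")],
    "legacy": [("north", "finance", "n/a"), ("south", "finance", "45")],
}

def convert(sources):
    """Extract, transform and load; track rejected rows and record simple metadata."""
    warehouse, rejects = [], Counter()
    for name, rows in sources.items():
        for region, dept, amount in rows:
            try:
                warehouse.append({"source": name, "region": region,
                                  "dept": dept, "amount": float(amount)})
            except ValueError:
                rejects[name] += 1  # data-quality tracking: unparseable amount
    metadata = {"rows_loaded": len(warehouse), "rejects": dict(rejects)}
    return warehouse, metadata

def build_mart(warehouse, dept):
    """A data mart: the subset of warehouse data that one user group needs."""
    return [row for row in warehouse if row["dept"] == dept]

warehouse, metadata = convert(SOURCES)
sales_mart = build_mart(warehouse, "sales")
```

Here `metadata` reports three loaded rows and one rejected legacy row, and `sales_mart` holds only the two sales-department rows.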
2. Data warehouse construction methods
2.1 Ordinary data warehouse construction methods. When building an ordinary data warehouse, an enterprise should plan the whole system with a variety of factors in mind and implement the project in stages, step by step. Each stage builds on the results of the previous one and incorporates further business systems, gradually establishing a complete data warehouse system with comprehensive, well-developed subjects that suits the departments and sub-units that use it, so that the investment pays off as soon as possible.
During construction, fuzzy mathematics can be used to give the data in the warehouse a semantic representation, which enriches the means of data processing and improves analysis and processing capability. A data warehouse is generally built incrementally: data marts are built first and are finally integrated into a complete data warehouse. Modeling at the conceptual, logical and physical layers determines each data mart's subject areas and its online analytical processing (OLAP). The following models are commonly used when constructing a data warehouse:
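The text does not say how fuzzy mathematics is applied, so the following is only one illustrative possibility: fuzzy membership functions that attach linguistic labels such as "low", "medium" and "high" to numeric warehouse values, so that a query like "regions with high sales" returns a degree of membership instead of a hard yes/no. The breakpoints (50, 100, 150), the sales figures and the 0.5 threshold are assumptions chosen for the example.

```python
# Membership functions for monthly sales (in thousands); breakpoints are assumed.
def low(x):
    """1 up to 50, falling linearly to 0 at 100."""
    return max(0.0, min(1.0, (100 - x) / 50))

def medium(x):
    """Triangle: 0 at 50, peak 1 at 100, 0 at 150."""
    return max(0.0, 1 - abs(x - 100) / 50)

def high(x):
    """0 up to 100, rising linearly to 1 at 150."""
    return max(0.0, min(1.0, (x - 100) / 50))

def fuzzify(x):
    """Degree to which x belongs to each linguistic term."""
    return {"low": low(x), "medium": medium(x), "high": high(x)}

def fuzzy_select(data, term, threshold=0.5):
    """Return members whose membership in `term` is at least `threshold`."""
    result = {}
    for key, value in data.items():
        degree = fuzzify(value)[term]
        if degree >= threshold:
            result[key] = degree
    return result

# Hypothetical monthly sales per region.
sales = {"north": 160, "south": 80, "east": 40}
```

For example, `fuzzify(80)` places the south region partly in "medium" (0.6) and partly in "low" (0.4), while `fuzzy_select(sales, "high")` returns only the north region.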
2.1.1 Star model: The star model is the most commonly used structure for implementing a data warehouse design. It organizes the warehouse into an integrated system that provides users with objects for analysis. At the core of the model is the fact table, surrounded by dimension tables. Each dimension table is connected to the central fact table, so the different dimension tables are related to one another only through the fact table.
2.1.2 Snowflake model: The snowflake model further normalizes the dimension tables of the star model. It is an extension of the star model in which each dimension can link outward to several more detailed category tables. In practice, user needs vary and the data may come from multiple fact tables, so several fact tables can coexist and be related to one another through shared dimension tables. This is the galaxy model, also known as a fact constellation.
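The three schema shapes can be sketched with Python's built-in `sqlite3` module. The table and column names below are hypothetical: `fact_sales` sits at the centre of a star, `dim_product` is snowflaked into a separate `dim_category` table, and a second fact table, `fact_inventory`, shares the same dimensions to form a small galaxy (fact constellation).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake: the product dimension is normalized into a separate category table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT,
                           category_key INTEGER REFERENCES dim_category);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Star: the central fact table references each dimension table.
CREATE TABLE fact_sales   (date_key INTEGER REFERENCES dim_date,
                           product_key INTEGER REFERENCES dim_product,
                           amount REAL);
-- Galaxy: a second fact table shares the same dimension tables.
CREATE TABLE fact_inventory (date_key INTEGER REFERENCES dim_date,
                             product_key INTEGER REFERENCES dim_product,
                             on_hand INTEGER);
INSERT INTO dim_category VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO dim_product  VALUES (10, 'Tea', 1), (11, 'Coffee', 1), (20, 'Chips', 2);
INSERT INTO dim_date     VALUES (202401, 2024, 1), (202402, 2024, 2);
INSERT INTO fact_sales   VALUES (202401, 10, 5.0), (202401, 20, 3.0),
                                (202402, 11, 7.5), (202402, 10, 2.5);
INSERT INTO fact_inventory VALUES (202402, 10, 40), (202402, 11, 12), (202402, 20, 30);
""")

# Star/snowflake query: dimensions meet only through the fact table.
sales_by_category = conn.execute("""
    SELECT c.category_name, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p  ON f.product_key = p.product_key
    JOIN dim_category c ON p.category_key = c.category_key
    JOIN dim_date d     ON f.date_key = d.date_key
    GROUP BY c.category_name, d.month
    ORDER BY c.category_name, d.month
""").fetchall()

# Galaxy query: both fact tables are related through the shared dimensions.
# Each fact table has at most one row per (date, product) here, so a direct join is safe.
drill_across = conn.execute("""
    SELECT p.product_name, s.amount, i.on_hand
    FROM fact_sales s
    JOIN fact_inventory i ON s.date_key = i.date_key AND s.product_key = i.product_key
    JOIN dim_product p    ON s.product_key = p.product_key
    WHERE s.date_key = 202402
    ORDER BY p.product_name
""").fetchall()
```

Normalizing `dim_category` avoids repeating category attributes on every product row, at the cost of one extra join per query; that is the usual trade-off between the snowflake and star shapes.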
2.1.3 Coexistence of atomic and aggregate data: Keep atomic (detail-level) data and aggregate data side by side in the model, and store atomic data at the finest granularity possible.
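A minimal sketch of keeping atomic and aggregate data side by side, assuming hypothetical sale-line records. The monthly summary is derived from the atomic rows, so routine questions can be answered from it quickly, while questions below monthly granularity still go to the atomic data.

```python
from collections import defaultdict
from datetime import date

# Atomic (finest-grain) facts: one row per sale line. Hypothetical data.
atomic_sales = [
    (date(2024, 1, 3), "north", 120.0),
    (date(2024, 1, 17), "north", 80.0),
    (date(2024, 1, 20), "south", 50.0),
    (date(2024, 2, 2), "north", 30.0),
]

def build_monthly_aggregate(rows):
    """Derive a (year, month, region) summary table from the atomic rows."""
    agg = defaultdict(float)
    for day, region, amount in rows:
        agg[(day.year, day.month, region)] += amount
    return dict(agg)

def daily_detail(rows, region, year, month):
    """Questions the aggregate cannot answer fall back to the atomic data."""
    return [(d, a) for d, r, a in rows
            if r == region and (d.year, d.month) == (year, month)]

monthly = build_monthly_aggregate(atomic_sales)
```

Because the aggregate is always rebuilt from the atomic rows, the two levels stay consistent: their totals must match.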
2.1.4 Setting up surrogate keys: A surrogate key (also called a proxy key) is a field in a dimension table that has no business meaning; it is simply a sequence number generated by the loading program when data is loaded into the data warehouse.
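A minimal Python sketch of surrogate-key assignment during loading. The class name `ProductDimension` and the business identifiers are hypothetical. The key itself is just the next integer in a sequence and carries no business meaning, while a lookup table maps each business (natural) key to its surrogate so that repeated loads reuse the same key.

```python
import itertools

class ProductDimension:
    """Hypothetical product dimension that assigns surrogate keys on load."""

    def __init__(self):
        self._keys = itertools.count(1)  # meaningless sequence numbers
        self.key_by_natural_id = {}      # business key -> surrogate key
        self.rows = {}                   # surrogate key -> dimension attributes

    def load(self, natural_id, name):
        """Return the surrogate key for natural_id, creating one if it is new."""
        key = self.key_by_natural_id.get(natural_id)
        if key is None:
            key = next(self._keys)
            self.key_by_natural_id[natural_id] = key
            self.rows[key] = {"natural_id": natural_id, "name": name}
        return key

dim = ProductDimension()
# Fact rows store only the surrogate key, never the source system's business key.
fact_rows = [
    {"product_key": dim.load("SKU-A17", "Tea"), "amount": 5.0},
    {"product_key": dim.load("ERP2:9931", "Coffee"), "amount": 7.5},
    {"product_key": dim.load("SKU-A17", "Tea"), "amount": 2.5},
]
```

Because fact rows reference only the surrogate key, a change in how a source system formats its product codes affects the lookup table rather than the fact table.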
2.2 Spatial data warehouse construction methods. As GIS (geographic information systems) have come into wide use across industries, the early transaction-oriented spatial database information systems can no longer meet demand, and information systems have begun to shift from management toward decision-making. The spatial data warehouse was proposed as a spatial information integration system to meet this new demand, and it is particularly important in geographic information decision support systems.
A spatial data warehouse has the general characteristics of an ordinary data warehouse but also some characteristics of its own, and it is not simply a collection of spatial databases. Compared with a spatial database, it differs in several ways. First, besides databases it also supports data files, text files, applications and many other data sources, and its data include temporal data, spatial data, attribute data, heterogeneous data and other kinds. Second, it also contains data-processing rules, algorithms and the like. Third, its data is the result of processing, integrating and otherwise transforming the original data, so it represents value-added, unified data. In addition, the spatial data warehouse introduces a longitudinal notion of time: data is managed by time, and information can be sliced at different time scales, from an instant, to an interval, to the entire time span. The spatial data warehouse relies on a time-dimensioned data structure that can be divided into different temporal granularity levels as needed in order to support complex trend analyses. Naturally, it also contains data organized along spatial dimensions. Because a spatial data warehouse differs from an ordinary data warehouse and is an entirely different concept from a spatial database, a typical spatial data warehouse architecture is divided into four functional modules: source data, data transformation tools, the spatial data warehouse and client-side analysis tools. Source data refers not only to ordinary spatial databases but also to files, web pages, knowledge bases, legacy systems and other data sources.
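The idea of dividing the time dimension into granularity levels can be illustrated with a small roll-up in Python. The observations, the station name and the choice of averaging as the aggregate are assumptions for the example; in a spatial data warehouse such levels would normally be attributes of a time dimension table.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical timestamped observations of one spatial object (e.g. a gauging station).
observations = [
    (datetime(2023, 6, 1, 8), "station_1", 2.0),
    (datetime(2023, 6, 1, 20), "station_1", 4.0),
    (datetime(2023, 6, 15, 9), "station_1", 3.0),
    (datetime(2023, 7, 2, 9), "station_1", 5.0),
    (datetime(2024, 7, 2, 9), "station_1", 7.0),
]

# Each granularity level maps a timestamp to the key of its time bucket.
GRANULARITY = {
    "day": lambda t: (t.year, t.month, t.day),
    "month": lambda t: (t.year, t.month),
    "year": lambda t: (t.year,),
}

def roll_up(rows, level):
    """Average the measure per (time bucket, location) at the given level."""
    bucket = GRANULARITY[level]
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, location, value in rows:
        key = (bucket(ts), location)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Comparing `roll_up(observations, "year")` across years gives a coarse trend, while the "day" and "month" levels expose finer variation in the same data.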
The data transformation tools have the same extraction and conversion functions as those of an ordinary data warehouse, plus transformations unique to spatial data, such as spatial transformation. A spatial data warehouse organizes and displays data in three-dimensional and multidimensional ways, but its most fundamental dimensions are space and time, which are the basis for reflecting dynamic changes in the objective world; the most critical issue in spatial data warehouse technology is how data is organized along the time and spatial dimensions. Spatial data warehousing has become a GIS (geographic information system) research hot spot both in China and abroad and has made considerable progress. To integrate spatial information into an enterprise's existing data warehouse without major changes to the original system, three modes are generally used to build an enterprise spatial data warehouse: (1) introduce spatial information as a spatial dimension in the multidimensional model; (2) introduce spatial information as a subject of analysis; (3) include spatial information in both dimensions and measures. Because spatial measures (for example, regions merged from many spatial objects) are expensive to compute and store, it is impractical to precompute and store all of them. A spatial index tree (e.g., an R-tree) is therefore generally used to build grouping hierarchies over objects at the finest spatial granularity, and these serve as the hierarchy of the spatial dimension; each spatial dimension requires its own spatial index tree.
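The R-tree-based spatial hierarchy can be sketched as follows. This is a simplified, assumed implementation that uses Sort-Tile-Recursive (STR) bulk loading rather than the insertion algorithm of the original R-tree. Each node stores the minimum bounding rectangle (MBR) of its children and a precomputed SUM of a numeric measure, so coarser spatial levels can be answered from node summaries instead of materializing every exact spatial measure. The store names, coordinates and node capacity are hypothetical.

```python
import math

def mbr(rects):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def _center(entry, axis):
    """Center of an entry's rectangle along axis 0 (x) or 1 (y)."""
    rect = entry[0]
    return (rect[axis] + rect[axis + 2]) / 2

def str_pack(entries, capacity):
    """One level of STR packing: sort by x into vertical slices, then by y within each."""
    n = len(entries)
    num_nodes = math.ceil(n / capacity)
    slice_size = math.ceil(math.sqrt(num_nodes)) * capacity
    by_x = sorted(entries, key=lambda e: _center(e, 0))
    groups = []
    for i in range(0, n, slice_size):
        strip = sorted(by_x[i:i + slice_size], key=lambda e: _center(e, 1))
        for j in range(0, len(strip), capacity):
            groups.append(strip[j:j + capacity])
    return groups

def build_hierarchy(objects, capacity=2):
    """Build R-tree levels bottom-up; each node keeps its MBR and a SUM measure."""
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    # Leaf entries: point objects at the finest spatial granularity.
    entries = [((x, y, x, y), {"name": name, "measure": value})
               for name, x, y, value in objects]
    levels = []
    while len(entries) > 1:
        parents = []
        for group in str_pack(entries, capacity):
            parents.append((
                mbr([rect for rect, _ in group]),
                {"name": "+".join(info["name"] for _, info in group),
                 "measure": sum(info["measure"] for _, info in group)},
            ))
        levels.append(parents)
        entries = parents
    return levels

# Hypothetical stores: (name, x, y, sales).
stores = [("A", 0, 0, 10), ("B", 1, 0, 20), ("C", 10, 10, 5), ("D", 11, 10, 15)]
levels = build_hierarchy(stores)
```

With these four stores, the first level groups the two nearby pairs (A+B and C+D) and the root covers all four with a total measure of 50; each level of the tree acts as one level of the spatial dimension's hierarchy.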
3. Conclusion
In short, construction is the key to data warehouse technology, which is a comprehensive set of technologies and solutions for managing and using data. With spatial data warehouses now widely applied in GIS in particular, data warehousing has become a new growth area in the database market and an important part of the next generation of information systems.