Multidimensional big data extraction
First, introduction

The concept of online analytical processing (OLAP) was first put forward in 1993 by E. F. Codd, the father of the relational database. OLAP is online data access and analysis aimed at specific problems. By giving fast, stable, consistent, and interactive access to the many possible views of the information (dimensional data), it allows management decision makers to observe the data in depth. The goal of OLAP is to meet the query and reporting requirements of decision support in a multidimensional environment. Its technical core is the concept of the "dimension", so OLAP can also be described as a collection of multidimensional data analysis tools.

Second, the multidimensional data structure of OLAP

Data in a multidimensional space is distributed sparsely and unevenly: where events actually occur, the data clusters and its density is very high. Developers of OLAP systems therefore have to solve the problems of data sparsity and data aggregation in the multidimensional data space. There are several ways to structure multidimensional data.

(A) Hypercube structure

A hypercube structure describes an object with three or more dimensions, all of which are mutually perpendicular (orthogonal). The measured values of the data appear at the intersections of the dimensions, and all parts of the data space have the same dimension attributes.

This structure can be used both in multidimensional databases and in OLAP systems built on relational databases, and its main feature is that it simplifies operations for end users. A variant of the hypercube structure is the shrinking hypercube structure, which has higher data density and fewer dimensions, and to which additional analysis dimensions can be added.
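As a minimal sketch of the hypercube idea, the cube below is stored as a Python dictionary keyed by one member from each dimension, so that each measure sits at an intersection of dimension members. The dimension names, members, and values are hypothetical and chosen only for illustration; storing only populated cells is one possible way to cope with sparsity, not a method prescribed by the text.

```python
from itertools import product

# Hypothetical dimensions and members, for illustration only.
dimensions = {
    "product": ["loan", "deposit"],
    "region": ["north", "south"],
    "quarter": ["Q1", "Q2"],
}

# Sparse storage: only cells that actually hold a measure are kept.
cells = {
    ("loan", "north", "Q1"): 120.0,
    ("loan", "south", "Q2"): 80.0,
    ("deposit", "north", "Q2"): 45.0,
}


def measure(product_member, region_member, quarter_member):
    """Return the measure at the intersection of three dimension members."""
    return cells.get((product_member, region_member, quarter_member))


# Every combination of members is one cell of the hypercube.
all_cells = list(product(*dimensions.values()))
density = len(cells) / len(all_cells)

print(measure("loan", "north", "Q1"))
print(f"{len(cells)} of {len(all_cells)} cells populated, density {density:.3f}")
```

Because every cell is addressed by the same three dimensions, all parts of the data space share the same dimension attributes, which is the property described above.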

(B) Multicube structure

In the multicube structure, a large data structure is divided into several smaller multidimensional structures. Each of these is built on a subset of the dimensions of the large structure, chosen for a specific application; in other words, the hypercube is turned into a set of subcubes. This gives the structure strong flexibility and improves the efficiency of data analysis.

Generally speaking, the multicube structure is more flexible, while the hypercube structure is easier to understand. The hypercube structure can provide high-level reports and multidimensional views. The multicube structure rotates views well and is flexible; it is also a more effective way to store sparse matrices and can reduce the amount of calculation. Complex systems and pre-built general-purpose applications therefore tend to use the multicube structure, because the data structure can be better adjusted to the needs of common applications.
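The division into subcubes can be sketched as a projection of one large fact set onto a subset of its dimensions. The example below is illustrative only: the dimension names, the data, and the `subcube` helper are hypothetical, and summing the measure over the dropped dimensions is an assumption, since the text does not say how subcubes are populated.

```python
from collections import defaultdict

# Hypothetical four-dimensional fact set, for illustration only.
DIMS = ("product", "region", "quarter", "channel")

facts = {
    ("loan", "north", "Q1", "web"): 120.0,
    ("loan", "north", "Q1", "branch"): 30.0,
    ("loan", "south", "Q2", "web"): 80.0,
    ("deposit", "north", "Q2", "branch"): 45.0,
}


def subcube(facts, dims, keep):
    """Project facts onto the dimensions in `keep`, summing the measure."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(float)
    for key, value in facts.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


# Two application-specific subcubes built from the same facts.
by_product_region = subcube(facts, DIMS, ("product", "region"))
by_quarter_channel = subcube(facts, DIMS, ("quarter", "channel"))

print(by_product_region)
print(by_quarter_channel)
```

Each subcube has fewer dimensions than the full fact set, so it has far fewer possible cells to leave empty, which is one way to read the claim that the multicube structure stores sparse data more effectively.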

Many products combine the two structures. Their physical data structure is a multicube, but they use the hypercube structure for calculation, which combines the simplicity of the hypercube structure with the flexible storage characteristics of the multicube structure.

Third, OLAP multidimensional data analysis

Multidimensional data analysis means slicing, dicing, rotating, and drilling into data that is organized in multidimensional form, so that end users can observe the data in the data warehouse from multiple angles and sides and gain a deep understanding of the information it contains. Multidimensional analysis matches the way people naturally think about data. Its basic operations are as follows.

(1) slice

Definition 1: The action of selecting a single dimension member on one dimension of a multidimensional array is called slicing. That is, in a multidimensional array (dimension 1, dimension 2, …, dimension n, variable), select dimension i and fix it to one of its members (call it "dimension member vi"), which gives a subset of the multidimensional array (dimension 1, …, dimension member vi, …, dimension n, variable).

According to definition 1, a slice always has one dimension fewer than the original array. The slice obtained is therefore not necessarily a two-dimensional "plane"; its dimensionality depends on that of the original multidimensional data, which makes this definition of a slice hard to picture. There is also a more intuitive definition.
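Definition 1 can be sketched as follows: fixing one dimension to a single member removes that dimension from the result. The `slice_cube` helper, the dimension names, and the data are hypothetical and serve only as an illustration.

```python
# Hypothetical three-dimensional cube, for illustration only.
DIMS = ("product", "region", "quarter")

cells = {
    ("loan", "north", "Q1"): 120.0,
    ("loan", "south", "Q1"): 70.0,
    ("loan", "north", "Q2"): 90.0,
    ("deposit", "north", "Q1"): 45.0,
}


def slice_cube(dims, cells, dim, member):
    """Fix `dim` to `member` and drop that dimension from the result."""
    i = dims.index(dim)
    new_dims = dims[:i] + dims[i + 1:]
    new_cells = {
        key[:i] + key[i + 1:]: value
        for key, value in cells.items()
        if key[i] == member
    }
    return new_dims, new_cells


# Slicing a 3-D cube gives a 2-D result; slicing again gives a 1-D result.
q1_dims, q1_cells = slice_cube(DIMS, cells, "quarter", "Q1")
north_dims, north_cells = slice_cube(q1_dims, q1_cells, "region", "north")

print(q1_dims, q1_cells)
print(north_dims, north_cells)
```

The second call shows why a definition-1 slice is not necessarily a plane: the result simply has one dimension fewer than whatever it was applied to.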

Definition 2: The action of selecting a two-dimensional subset of a multidimensional array is called slicing. That is, in a multidimensional array (dimension 1, dimension 2, …, dimension n, variable), select two dimensions, dimension i and dimension j. Taking an interval or an arbitrary set of dimension members on these two dimensions, and a single dimension member on every other dimension, gives a two-dimensional subset of the array on dimensions i and j. This subset is called the slice of the multidimensional array on dimension i and dimension j, written (dimension i, dimension j, variable).

According to definition 2, no matter how many dimensions there are, the result of slicing is always a two-dimensional "plane". Put another way, slicing selects a single dimension member on each of the other dimensions, and selects an interval of dimension members or arbitrary dimension members on the two chosen dimensions. Two things follow from definition 2:

1. The slice of a multidimensional array is ultimately determined by the member values chosen on the dimensions other than the two that span the plane of the slice.

2. A dimension is an angle from which data is observed, so the effect of slicing is to set aside some observation angles and let people concentrate on the data in two dimensions, because people's spatial imagination is limited. For a data space with many dimensions, slicing the data is therefore very useful. The two definitions of a slice can also be related to each other: for an n-dimensional array, applying n-2 slices according to definition 1 gives a result that corresponds to a slice according to definition 2.
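Definition 2 can be sketched in the same style: two dimensions are kept, optionally restricted to chosen members, and every other dimension is fixed to one member. The `slice_2d` helper and the four-dimensional data are hypothetical and only illustrate the definition.

```python
# Hypothetical four-dimensional cube, for illustration only.
DIMS = ("product", "region", "quarter", "channel")

cells = {
    ("loan", "north", "Q1", "web"): 120.0,
    ("loan", "south", "Q1", "web"): 70.0,
    ("deposit", "north", "Q1", "web"): 45.0,
    ("loan", "north", "Q2", "web"): 90.0,
    ("loan", "north", "Q1", "branch"): 30.0,
}


def slice_2d(dims, cells, row_dim, col_dim, fixed, rows=None, cols=None):
    """Return the plane on (row_dim, col_dim).

    `fixed` maps every other dimension to exactly one member.
    `rows` and `cols` optionally restrict the members kept on the plane.
    """
    if set(fixed) != set(dims) - {row_dim, col_dim}:
        raise ValueError("every other dimension needs exactly one member")
    r, c = dims.index(row_dim), dims.index(col_dim)
    plane = {}
    for key, value in cells.items():
        if any(key[dims.index(d)] != m for d, m in fixed.items()):
            continue
        if rows is not None and key[r] not in rows:
            continue
        if cols is not None and key[c] not in cols:
            continue
        plane[(key[r], key[c])] = value
    return plane


plane = slice_2d(DIMS, cells, "product", "region",
                 fixed={"quarter": "Q1", "channel": "web"})
loans_only = slice_2d(DIMS, cells, "product", "region",
                      fixed={"quarter": "Q1", "channel": "web"},
                      rows={"loan"})

print(plane)
print(loans_only)
```

Here the array has n = 4 dimensions and two of them are fixed, which matches the observation that n-2 slices in the sense of definition 1 lead to one slice in the sense of definition 2.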

(2) dicing

Definition 1: The action of selecting the dimension members within an interval on a certain dimension of a multidimensional array is called dicing, that is, restricting the range of values of the array on that dimension. Clearly, if the interval contains only one dimension member, the result is a slice.

Definition 2: The action of selecting a three-dimensional subset of a multidimensional array is called dicing. That is, in a multidimensional array (dimension 1, dimension 2, …, dimension n, variable), select three dimensions: dimension i, dimension j, and dimension r. Taking an interval or an arbitrary set of dimension members on these three dimensions, and a single dimension member on every other dimension, gives a three-dimensional subset of the multidimensional array on dimensions i, j, and r.
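A rough sketch of dicing, under the assumption that an interval can be written as an explicit collection of members: each selected dimension is restricted to the given members, and the number of dimensions stays the same. The `dice` helper and the data are hypothetical.

```python
# Hypothetical three-dimensional cube, for illustration only.
DIMS = ("product", "region", "quarter")

cells = {
    ("loan", "north", "Q1"): 120.0,
    ("loan", "north", "Q2"): 90.0,
    ("loan", "south", "Q3"): 60.0,
    ("deposit", "north", "Q3"): 45.0,
    ("deposit", "south", "Q4"): 20.0,
}


def dice(dims, cells, selection):
    """Keep cells whose members fall inside `selection` on each listed dimension.

    `selection` maps a dimension name to the collection of members to keep.
    Dimensions that are not listed are left unrestricted.
    """
    allowed = {dims.index(d): set(members) for d, members in selection.items()}
    return {
        key: value
        for key, value in cells.items()
        if all(key[i] in members for i, members in allowed.items())
    }


# Restrict one dimension to the interval Q2..Q3 (definition 1).
mid_year = dice(DIMS, cells, {"quarter": ["Q2", "Q3"]})

# Restrict several dimensions at once.
mid_year_north = dice(DIMS, cells, {"quarter": ["Q2", "Q3"], "region": ["north"]})

print(mid_year)
print(mid_year_north)
```

Passing a single member for a dimension, as with `"region": ["north"]`, selects the same cells as a slice on that dimension, although this sketch keeps the dimension in the key rather than dropping it.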

(3) Rotation

Rotation changes the orientation of the dimensions in a report or page display. For example, a rotation may exchange rows and columns, move a row dimension into the column dimension, or exchange a dimension shown on the page with one that is off the page (making it a new row or column).
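Rotation can be sketched as a reordering of the dimensions of a cube without changing any measure. The `rotate` helper below is hypothetical; it treats the first two dimensions as rows and columns and any remaining dimension as the page, which is an assumption made only for this illustration.

```python
# Hypothetical cube: rows = product, columns = region, page = quarter.
DIMS = ("product", "region", "quarter")

cells = {
    ("loan", "north", "Q1"): 120.0,
    ("loan", "south", "Q1"): 70.0,
    ("deposit", "north", "Q2"): 45.0,
}


def rotate(dims, cells, new_order):
    """Reorder the dimensions of a cube; the measures are unchanged."""
    if sorted(new_order) != sorted(dims):
        raise ValueError("new_order must be a permutation of dims")
    perm = [dims.index(d) for d in new_order]
    rotated = {tuple(key[i] for i in perm): value for key, value in cells.items()}
    return tuple(new_order), rotated


# Exchange rows and columns.
swapped_dims, swapped = rotate(DIMS, cells, ("region", "product", "quarter"))

# Bring the page dimension (quarter) onto the rows.
paged_dims, paged = rotate(DIMS, cells, ("quarter", "product", "region"))

print(swapped_dims, swapped)
print(paged_dims, paged)
```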

(4) Drilling

Drilling lets users reach more detailed data by navigating through the multiple levels of data in the data warehouse, and drilling is generally downward. Most OLAP tools allow users to drill down to a data layer with finer detail in the data set, while more complete tools allow users to drill anywhere: in addition to ordinary downward drilling, this includes upward drilling and cross drilling.
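Downward and upward drilling can be sketched with a simple hierarchy on one dimension. The month-to-quarter hierarchy, the data, and the `roll_up` and `drill_down` helpers are hypothetical, and summing is assumed as the aggregation used when drilling upward.

```python
from collections import defaultdict

# Hypothetical hierarchy on the time dimension: month -> quarter.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# Detailed data at the month level, keyed by (product, month).
monthly = {
    ("loan", "Jan"): 40.0,
    ("loan", "Feb"): 50.0,
    ("loan", "Mar"): 30.0,
    ("loan", "Apr"): 90.0,
}


def roll_up(cells, mapping):
    """Drill up: aggregate month-level cells to the quarter level by summing."""
    out = defaultdict(float)
    for (product, month), value in cells.items():
        out[(product, mapping[month])] += value
    return dict(out)


def drill_down(cells, mapping, product, quarter):
    """Drill down: return the month-level detail behind one quarter-level cell."""
    return {
        month: value
        for (p, month), value in cells.items()
        if p == product and mapping[month] == quarter
    }


quarterly = roll_up(monthly, MONTH_TO_QUARTER)
q1_detail = drill_down(monthly, MONTH_TO_QUARTER, "loan", "Q1")

print(quarterly)
print(q1_detail)
```

The detail returned by `drill_down` sums to the corresponding cell of `quarterly`, so moving between the two levels is consistent in both directions.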

(5) Multi-view mode

For the same information, a graphical display sometimes offers a clarity that a plain data table cannot. An OLAP system should therefore be able to display data in various formats, so that users can find the best angle from which to observe the data.

Fourth, conclusion.

OLAP has developed rapidly alongside the data warehouse. The data warehouse focuses on storing and managing decision-oriented data, while OLAP focuses on analyzing the data in the data warehouse and turning it into information that supports decision making. An important feature of OLAP is multidimensional data analysis, which complements the multidimensional organization of data in the data warehouse and helps solve complex problems in data processing.