Who knows how to visualize big data?

Big data refers to a collection of data that cannot be captured, managed, and processed within a reasonable timeframe using conventional software tools. It is a massive, fast-growing, and diverse information asset that requires new processing models in order to deliver stronger decision-making, insight-discovery, and process-optimization capabilities.

Why did big data come about, and why use it? Here is a plain-language explanation:

In the beginning, when there was very little data, spreadsheet tools and relational databases such as MySQL (two-dimensional tables, with data inserted row by row) were enough to solve the data storage problem.

But with the rapid development of the Internet, the proliferation of products and users has generated massive amounts of data. With long-term development in mind, companies analyze product- and user-related native data, event-tracking (buried-point) data, and so on. Traditional relational databases can no longer meet these needs; such data can only be stored in distributed databases (HBase, Hive, and the like), which form clusters and spread computation across multiple hosts.

Understanding data visualization

Once the data exists, analysis becomes the most critical link. With massive datasets, having users inspect records one by one is not feasible; graphical presentation is an effective way to solve this. A small amount of data can be charted with spreadsheet tools, but big data analysis relies on specialized visualization tools. Common ones include Tableau, BDP, Davinci, and QuickBI, among others.

Most commercial data visualization tools, although powerful in computation and charting, cannot render newly generated data in real time; data mostly arrives by push within a fixed range, and it sometimes needs secondary processing to fit the product's rules (commercial products favor general applicability and cannot adapt to every company's data specifications).

Beyond that, with many charting plug-ins now open source (e.g., ECharts, Google Charts) and the industry's growing concern for data security, more and more companies are deploying private, in-house data visualization systems.

Data Visualization Implementation

The architecture of a data visualization product (system) is divided into three layers: the data storage layer, the data computation layer, and the data presentation layer.

1. Data Storage Layer

The data storage layer was mentioned at the beginning. In a data visualization product (system), it supports visualizing both conventional data (MySQL, CSV, etc.) and big data (Hive, HBase, etc.) to cover analysts' day-to-day qualitative and quantitative analysis.
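To make this concrete, here is a minimal Python sketch of the storage layer's "connect to anything" idea, reading from conventional sources (MySQL, CSV) and a big data source (Hive). It assumes pandas, SQLAlchemy, and PyHive are installed; the connection strings and table names are made up for illustration.

```python
import pandas as pd
from sqlalchemy import create_engine
from pyhive import hive

# Conventional source: a MySQL table (connection string is illustrative).
mysql_engine = create_engine("mysql+pymysql://user:password@localhost:3306/appdb")
users = pd.read_sql("SELECT user_id, signup_date FROM users", mysql_engine)

# Conventional source: a local CSV export (file name is illustrative).
events = pd.read_csv("daily_events.csv", parse_dates=["event_time"])

# Big data source: Hive on the cluster (host is illustrative).
hive_conn = hive.Connection(host="hive-gateway.internal", port=10000)
activity = pd.read_sql("SELECT user_id, dt FROM dw.user_activity", hive_conn)
```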

With data security in mind, storage is also combined with permission management, so that people in different roles can access only the data designated for them (a topic for a future post).
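As a toy illustration of this idea (not any particular product's implementation), a permission check can be as simple as a mapping from roles to the tables they may read; the roles and table names below are hypothetical:

```python
# Hypothetical role -> allowed-tables mapping.
ROLE_PERMISSIONS = {
    "analyst":    {"dw.user_activity", "dw.retention_daily"},
    "operations": {"dw.retention_daily"},
}

def check_access(role: str, table: str) -> None:
    """Raise if the given role is not allowed to query the table."""
    if table not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"role '{role}' may not read '{table}'")

check_access("analyst", "dw.user_activity")      # passes silently
# check_access("operations", "dw.user_activity") # would raise PermissionError
```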

2. Data Calculation Layer

The computation here is not the usual aggregation, sorting, and grouping. Before explaining it, let's walk through the data analysis workflow:

1. Product/operations staff raise a data requirement, such as "APP one-week retention";

2. The analyst confirms the requirement, clarifying which fields are needed and how they will be analyzed;

3. Data warehouse engineers provide consolidated tables (the data model: an intermediate table synthesized by joining multiple tables; see the sketch after this list);

4. The analyst performs visual analysis on top of the data model.
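To make step 3 concrete, here is a hedged pandas sketch of how raw tables might be joined into an intermediate "data model" table and then used for next-day retention; all table and column names are invented for illustration:

```python
import pandas as pd

# Raw tables: first-seen date per user, and a daily activity log.
signups = pd.DataFrame({"user_id": [1, 2, 3],
                        "signup_date": pd.to_datetime(["2021-01-01"] * 3)})
activity = pd.DataFrame({"user_id": [1, 2, 1],
                         "active_date": pd.to_datetime(
                             ["2021-01-02", "2021-01-02", "2021-01-03"])})

# Join into an intermediate table: one row per (user, active day, day offset).
model = activity.merge(signups, on="user_id")
model["day_n"] = (model["active_date"] - model["signup_date"]).dt.days

# Next-day retention = share of the cohort active on day 1.
cohort_size = signups["user_id"].nunique()
day1_retained = model.loc[model["day_n"] == 1, "user_id"].nunique()
print(f"next-day retention: {day1_retained / cohort_size:.0%}")  # 67%
```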

The data models provided by the warehouse are mainly divided into incremental and full-volume tables, and neither can be analyzed directly over a longer date range without extra computation. As an example, suppose data is generated on both January 1 and January 2. An incremental table stores each day's new data in its own partition (the January 2 partition holds only January 2's data), while a full-volume table's January 2 partition holds all data accumulated up to January 2.

Back to the "APP one-week retention" example: analysts have many tasks every day, and many of the basic computations (such as each day's next-day retention) can be automated by machines. This relies on the scheduling function (which you can think of as a tool that automatically runs your formulas on a timer).
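As a rough sketch of what such a scheduled job might look like (the compute_retention and save_metric helpers are hypothetical stand-ins, and the scheduler itself, cron or Airflow for instance, is not shown):

```python
from datetime import date, timedelta

def compute_retention(cohort_day: date, check_day: date) -> float:
    """Hypothetical stand-in for the join-based computation sketched above."""
    return 0.67  # placeholder value, for illustration only

def save_metric(name: str, day: date, value: float) -> None:
    """Hypothetical writer that would persist the result to the warehouse."""
    print(f"{day} {name} = {value:.0%}")

def daily_retention_job(run_day: date) -> None:
    # Users first seen two days ago: did they come back the next day?
    cohort_day = run_day - timedelta(days=2)
    check_day = run_day - timedelta(days=1)
    save_metric("next_day_retention", cohort_day,
                compute_retention(cohort_day, check_day))

# A scheduler would call this once per day, e.g. via an (illustrative)
# crontab line:  0 2 * * * python retention_job.py
daily_retention_job(date.today())
```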

From the above, we can see that multi-table joins and scheduled computation are the main functions of the computation layer.

3. Data Presentation Layer

The data presentation layer is divided into two parts:

One part serves the viewers of visualizations, including product managers, operations staff, executives, and so on. Based on the demand side's requirements, data is presented in suitable chart types: line charts for trends, tables for data details, funnel charts for retention, and so on.
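As a toy illustration of this "pick the chart from the question" rule (the mapping and the matplotlib rendering below are illustrative conventions with sample data, not a fixed standard):

```python
import matplotlib.pyplot as plt

# Illustrative mapping from the kind of question to a chart type.
CHART_FOR = {"trend": "line", "detail": "table", "retention": "funnel"}

days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
dau = [1200, 1350, 1280, 1500, 1620]  # sample data

if CHART_FOR["trend"] == "line":  # a trend question gets a line chart
    plt.plot(days, dau, marker="o")
    plt.title("Daily active users (sample data)")
    plt.savefig("dau_trend.png")
```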

The other part serves the chart makers, who are mainly analysts. The goal is to let analysts replace as much hand-written SQL as possible with visual operations. In common visualization tools, you can quickly drag fields from the data model into dimensions/metrics (which you can think of as the X and Y axes).
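Behind the scenes, dragging fields into dimensions/metrics amounts to generating a grouped query for the analyst. Here is a simplified sketch of the idea, with hypothetical field and table names:

```python
def build_query(table: str, dimensions: list[str], metrics: dict[str, str]) -> str:
    """Turn a dimensions/metrics selection into a GROUP BY query."""
    select_dims = ", ".join(dimensions)
    select_mets = ", ".join(f"{fn}({col}) AS {fn}_{col}"
                            for col, fn in metrics.items())
    return (f"SELECT {select_dims}, {select_mets} "
            f"FROM {table} GROUP BY {select_dims}")

# Dragging "dt" into dimensions (X axis) and count(user_id) into metrics (Y axis):
print(build_query("dw.user_activity",
                  dimensions=["dt"],
                  metrics={"user_id": "count"}))
# SELECT dt, count(user_id) AS count_user_id FROM dw.user_activity GROUP BY dt
```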

From this walkthrough of the visualization product's (system's) structure, it is easy to see that implementing data visualization involves: connecting data (storage), building the data model (computation), and making charts (presentation).

So how do you build a big data visualization system? As the Samsonite Magic Big Data Analytics Platform puts it, correct and appropriate visualization makes storytelling simple; it bridges the gaps between languages and cultures that complex, dull datasets leave behind. So don't just show data; tell stories with it.