Data Analysis Architecture and Methods

I. Data analysis in the past

In today's enterprises of every kind, the data analysis role has become common and well recognized. Its core task is usually to support operations and marketing: analyzing the company's internal data and its customer data to produce a quantitative picture of past performance and of customers' behavioral trends and characteristics.

Viewed from a more macro perspective, every data analyst understands that the goal of the role is to discover underlying patterns in data and thereby help predict the future, which is exactly the goal of data mining. So why, when most companies already have data analysis positions, is the concept of data mining still brought up again and again? To answer that, we need to look at what traditional data analysis has failed to do.

1. Data dispersion

In most companies the data analysis position is attached to a single business unit as a support role; only a few companies set up data analysis as an independent department. The difference matters: in the former arrangement, analysis is limited to the unit's own output indicators. The complaints department only looks at complaint-handling data, the sales department only looks at sales-process data, and as soon as indicators from several departments need to be analyzed together, this organizational structure becomes a serious obstacle. Each department controls the export of its own indicators, and cooperating with other departments does not affect its own performance targets, so cross-departmental data collection is usually very inefficient. Yet the key to data analysis is bringing together more data and more dimensions in order to find patterns. As a result, past data analysis mostly consisted of basic comparative analysis and Pareto analysis, with little use of algorithms to mine the data, because the fewer the indicators and dimensions, the worse any algorithm will perform.

2. Missing indicator dimensions

In the past, digital management inside the enterprise was reflected mainly in day-to-day operations and maintenance. Customer data has been collected for a long time, and CRM systems have existed for years, yet the dimensions of customer data have always been very incomplete. The reason is that data obtained in these ways covers only the period between the start and the end of an interaction between the customer and the enterprise, which is only a tiny part of the customer's daily life. A customer's behavior on microblogs or WeChat, the topics and brands they follow, their personality traits and so on are never captured. A customer's real characteristics and habits therefore cannot be known from interactions with the enterprise alone, and it is difficult to mine effective conclusions.

3. Limited use of algorithms

Under the constraints above, it is easy to see that data analysts inevitably made little use of algorithms. Algorithmic analysis depends on a large number of indicators, many dimensions, and a large volume of data; without these three conditions it is hard for algorithms to deliver value. With algorithms ruled out, analysts could only apply the simplest methods to a limited amount of data and reach simple, easily understood conclusions, and the value of that to the enterprise is easy to imagine.

4. Weak data analysis systems

Most data analysis today is done in Excel, and some analysts can use R, SPSS or similar software. But once the data volume reaches the terabyte or petabyte level, these tools spend a great deal of time on computation, and the time the underlying database systems need just to export the data is also considerable. For large data volumes, conventional systems therefore struggle to meet the analysis requirements.

II. The technological revolution and data mining

Thanks to the Internet's growing influence on everyday life, data is growing explosively. Today people spend nearly half of their day on the Internet. On the one hand, these online interactions can be captured and recorded; on the other hand, because people use fragmented slices of time, customers interact with enterprises more and more frequently, which further guarantees the richness of customer data. At the same time, with the support of big data technologies, today's systems make efficient analysis of these massive data volumes possible.

Data analysts can therefore start to use more abstract algorithms to perform richer analysis of the data. Data analysis has officially entered the 2.0 era, the era of data mining.

III. The data processing process

Data analysis is also a data processing process, made up of three key links: data collection, selection of the analysis method, and selection of the analysis theme. These three links form a pyramid, with data collection at the base and theme selection at the top.

IV. Data collection

Data collection is the link that determines how data is recorded. Two principles need to be emphasized here: full volume rather than sampling, and multi-dimensional rather than one-dimensional. Today's technological revolution and Data Analytics 2.0 largely revolve around these two points.

1. Full volume rather than sampling

Constrained by system computation speed and data export speed, analysts in companies without big data systems can rarely collect and analyze the complete, full volume of data. In the future this will no longer be a problem.

2. Multi-dimensional rather than one-dimensional

The other aspect lies in the dimensionality of the data, as mentioned earlier. In short, customer behavior should be refined comprehensively along the lines of 5W1H: record what time, what place, what person, for what reason, and what was done throughout the interaction, and then subdivide each block. Time can be broken down into start time, end time, interruption time and cycle interval; place into city, district, climate and other geographic features, and channel; person into accounts registered across channels, family members, salary, and personal life stage; reason into hobbies, life events, and level of need; and the activity itself into theme, steps, quality, and efficiency. These subdivided dimensions increase the diversity of the analysis and make it possible to dig out patterns. A sketch of what such a record might look like follows below.
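As a minimal illustration, the sketch below models a single interaction record along these 5W1H dimensions in Python. The field names and sample values are hypothetical, not taken from any real system; an actual warehouse would model these dimensions as its own tables.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical schema for one customer interaction record, covering the
# 5W1H dimensions discussed above. Field names are illustrative only.
@dataclass
class InteractionRecord:
    # When: start/end allow duration, interruptions and cycle intervals to be derived
    start_time: datetime
    end_time: datetime
    # Where: geography plus the channel the interaction came through
    city: str
    district: str
    channel: str                      # e.g. "app", "web", "call_center"
    # Who: links to accounts and household / life-stage attributes
    customer_id: str
    life_stage: Optional[str] = None
    # Why: inferred interest, life event or need driving the interaction
    reason: Optional[str] = None
    # What: the subject and outcome of the interaction
    topic: Optional[str] = None
    outcome: Optional[str] = None

record = InteractionRecord(
    start_time=datetime(2024, 5, 1, 9, 30),
    end_time=datetime(2024, 5, 1, 9, 42),
    city="Shanghai", district="Pudong", channel="app",
    customer_id="C-1024", life_stage="new_parent",
    reason="life_event", topic="insurance_quote", outcome="quote_saved",
)
print(record)
```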

V. Data analysis method selection

The analysis method is the link that determines how data is combined so that patterns become visible. Fundamentally, the task of data analysis is to abstract data into conclusions that carry business meaning. Raw data by itself is meaningless, and looking at it directly reveals no patterns; only by processing the data with analytical methods can people see the patterns hidden behind it.

Method selection is the core of the entire data processing process. Judged by the complexity of the methods, I divide them into three levels: conventional analysis methods, statistical analysis methods, and self-built models. I make this distinction based on two considerations: the degree of abstraction and the degree of customization.

The degree of abstraction refers to how much processing the data needs. Some data needs no processing at all: presented directly in graphical form, it already shows the business meaning people need to see. Other business questions cannot be answered by simply charting the raw data; a data model has to be built that recombines one indicator across several dimensions, or several indicators together, and ultimately produces new data. The result of that abstraction is the business conclusion that business people need. On this principle, methods can be divided into conventional and non-conventional analysis methods.

The other consideration is the degree of customization. Mathematics has developed over a long time, and a number of classic analytical methods have been distilled that generalize across many analytical purposes and many kinds of business conclusions; these are the general-purpose methods. Some business needs, however, are genuinely rare, and the analysis they require cannot be built entirely on general methods, so an independent method has to be formed, that is, dedicated mathematical modeling. The models produced this way are customized for one specific business topic and cannot be applied to others; they are highly customized. On this principle, the non-conventional methods are subdivided into statistical analysis methods and self-built models.

1. Conventional analysis methods

Conventional analysis methods do no abstract processing of the data; they mainly present the raw data directly and are mostly used for fixed indicators and periodic analysis themes. The business meaning is presented directly from the raw data, chiefly through trend analysis and percentage analysis, which correspond to two categories: period-over-period comparison and Pareto analysis. Period-over-period comparison presents the difference between the current period and the previous one, such as the growth trend of sales volume, while Pareto analysis presents the ranking of the elements of a single dimension, for example ranking the current period's sales growth across cities and regions and identifying which of them contribute the top 80% of the growth. Conventional analysis has become the most basic approach and will not be described in detail here; a small sketch of the Pareto idea follows below.
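Purely as an illustration, the following sketch ranks invented per-city sales growth figures with pandas and picks out the cities that together account for roughly the first 80% of total growth. The city names and numbers are made up.

```python
import pandas as pd

# Made-up per-city sales growth; in practice this would come from the warehouse.
growth = pd.Series(
    {"Beijing": 120, "Shanghai": 95, "Guangzhou": 60, "Shenzhen": 55,
     "Chengdu": 30, "Wuhan": 20, "Xi'an": 10, "Others": 10},
    name="sales_growth",
)

ranked = growth.sort_values(ascending=False)
cumulative_share = ranked.cumsum() / ranked.sum()

# Keep each city whose running total, before adding it, is still below 80%,
# i.e. the cities up to and including the one that crosses the threshold.
top_contributors = ranked[cumulative_share.shift(fill_value=0) < 0.8]

print(pd.DataFrame({"growth": ranked, "cumulative_share": cumulative_share.round(2)}))
print("Top ~80% contributors:", list(top_contributors.index))
```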

2. Statistical analysis methods

Statistical analysis methods infer future trends from the patterns in past data, and they can be divided by the way those patterns are summarized. By principle they fall mostly into the following categories: guided (supervised) learning algorithms, which work toward a target conclusion; unguided (unsupervised) learning algorithms, which have no target conclusion; and regression analysis.

A guided learning algorithm simply means that the historical data already contains a target conclusion, and the algorithm analyzes how the variables behave when that conclusion is reached. For example, suppose we want to determine what indicator levels identify a person as having heart disease. We feed the system the indicator data of a large number of heart disease patients and of healthy people without heart disease; the target conclusion is whether the person has heart disease, and the variables are the indicator values. From this data the system computes a function that describes the relationship between the various indicators and the final heart disease outcome, that is, the critical values at which a person is judged to have heart disease, so that when a new patient arrives later, the judgment can be made from those critical values. The function here is the algorithm itself, and there are many kinds of algorithmic logic, including the common naive Bayes classifier, decision trees, random forests and support vector machines; interested readers can look up how each of them works. A toy sketch of the idea follows below.
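As a hedged illustration of this supervised setup, the sketch below trains a decision tree on synthetic data. The indicator columns (blood pressure, cholesterol, age) and the labeling rule are fabricated purely so the example has something to learn, and scikit-learn is assumed to be available; it is not the author's actual method.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic "patient indicators"; columns and the label rule are invented.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(130, 20, n),   # resting blood pressure
    rng.normal(210, 35, n),   # cholesterol
    rng.integers(30, 80, n),  # age
])
# Fabricated rule so the data has learnable structure: label = "has heart disease"
y = ((X[:, 0] > 140) & (X[:, 1] > 220)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The fitted tree encodes the "critical values" of each indicator;
# new patients can then be scored against those thresholds.
print("test accuracy:", model.score(X_test, y_test))
```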

Unguided learning algorithms, by contrast, have no given target conclusion; they group all the data by the similarity of its attributes to form clusters. Take the classic beer and diapers analysis: the business wants to understand what sells well together with beer, so all purchase records are fed in and the system calculates the degree of association, or distance, between beer and every other product, that is, what else the people who bought beer also bought. The output is a set of candidate results, such as diapers, beef, yogurt or peanuts, each of which can be a clustering result. Because there is no target conclusion, these results are only references, and it is then up to the merchandising staff to try the various groupings and see how much they improve sales. Here the measure of association or distance between each product and beer is the algorithm itself, and again there are many kinds of logic, including association rules such as Apriori, clustering algorithms, and so on. A small co-purchase sketch follows below.
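As a simple stand-in for a full association-rule miner such as Apriori, the sketch below just counts, over a handful of invented baskets, how often each item appears together with beer; the baskets and items are made up for illustration.

```python
from collections import Counter

# Invented market baskets; each set is one purchase.
baskets = [
    {"beer", "diapers", "peanuts"},
    {"beer", "diapers"},
    {"beer", "yogurt"},
    {"beef", "yogurt"},
    {"beer", "peanuts", "beef"},
]

# Restrict to baskets containing beer, then count the co-purchased items.
beer_baskets = [b for b in baskets if "beer" in b]
co_occurrence = Counter(item for b in beer_baskets for item in b if item != "beer")

# Share of beer baskets that also contain each item (a crude "confidence").
for item, count in co_occurrence.most_common():
    print(f"beer -> {item}: {count / len(beer_baskets):.0%}")
```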

There is also the large category of regression analysis, which simply means finding how several independent variables can be added, subtracted, multiplied and divided to yield the dependent variable, so that the future value of the dependent variable can be projected. For example, we may want to know whether indicators such as campaign coverage, product price, customer salary level and customer activity are related to purchase volume, and if they are, whether an equation can be given so that plugging these indicators in produces the purchase volume. This is where regression analysis is needed: by feeding these indicators and the purchase volume into the system, it can work out which indicators have an effect on purchase volume and, for those that do, how each should be weighted to arrive at it. Regression analysis includes algorithms such as linear and non-linear regression. A toy regression sketch follows below.
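The sketch below fits an ordinary linear regression on synthetic data to show the shape of this workflow; the indicator names, the generating coefficients and the noise are all invented, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic indicators and purchase volume; all figures are made up.
rng = np.random.default_rng(1)
n = 200
coverage = rng.uniform(0, 1, n)        # campaign coverage rate
price = rng.uniform(50, 150, n)        # product price
salary = rng.normal(8000, 2000, n)     # customer salary level
purchases = 20 * coverage - 0.1 * price + 0.002 * salary + rng.normal(0, 2, n)

X = np.column_stack([coverage, price, salary])
model = LinearRegression().fit(X, purchases)

# Fitted coefficients indicate how much each indicator moves purchase volume.
print("coefficients:", model.coef_.round(3))
print("R^2:", round(model.score(X, purchases), 3))
```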

There are many other statistical analysis methods, but the categories above cover most of what is used today. Within each category there are many different algorithms, and this is the part analysts need to master in greater depth.

3. Self-built models

Self-built models are the most advanced analysis methods and the ones with the greatest mining value. In today's financial sector the industry has even coined a name for the people who do this, quants, a group that relies on mathematical models to analyze the financial markets. The algorithms used in statistical analysis methods have their limitations: although they generalize across many scenarios, they are imprecise, and in guided and unguided learning the conclusions are mostly approximate. In a field like finance, where every cent matters, such algorithms clearly cannot meet the demand for accuracy, so mathematicians build dedicated models that take whatever data is obtainable and turn it into investment recommendations. Among the statistical methods, regression analysis comes closest to a mathematical model, but the complexity of its formula is limited, whereas a self-built model is completely free and can combine indicators in any way needed to ensure the validity of the final conclusion.

VI. Data analysis theme selection

Beyond the analysis method comes its application to business needs. The themes that business analysis can touch are almost unlimited, from the conversion rate of customers taking part in an activity, to how long customers are retained, to the timeliness and accuracy of internal processes, and each theme has its own requirements for indicators, dimensions and analysis methods. In my personal experience, the main analysis themes revolve around three perspectives: marketing, operations, and the customer.

1. Marketing / operations analysis

Marketing and operations analysis looks at both the process and the final result. Marketing analysis covers the process from the launch of a campaign to the customer's purchase; operations analysis covers the process from the customer's first use of the service to the moment they stop using it. The former leans toward analyzing trends in customer behavior and the behavioral differences between different types of customers, while the latter leans toward analyzing the timeliness and efficiency of service during the process and the differences in service demand among different types of customers.

Where conventional methods are used for these themes, period-over-period comparison and Pareto analysis present simple patterns of change and the major customer types. With statistical methods, marketing analysis can use guided learning algorithms to derive the differences in customer characteristics between marketing successes and failures, while operations analysis can use unguided learning algorithms to find which kinds of customers have a prominent need for which services. In addition, both marketing and operations analysis can use regression to determine which performance indicators have a direct impact on purchasing and on satisfaction. These deeper analyses help guide marketing and operations staff to carry out their work better.

2. Customer analysis

Besides correlating marketing and operations data, analyzing customer characteristics on their own is also of great value. This analysis relies mainly on the guided and unguided learning algorithms among the statistical methods. On the one hand, for high-value customers, guided learning can reveal which characteristics drive customer value up or down, giving the enterprise guidance for targeting customers. On the other hand, for the customer base as a whole, unguided learning can reveal what kinds of groups the customers fall into, so that focus discussions and scenario observation can be run for each group, the differences in their needs can be dug out, and precise marketing services can be provided to each of them.

Together these steps make up the complete process of an enterprise's data analysis, or data mining, work. It is clear that whether in data collection, analysis methods or analysis themes, big data and the Internet will bring substantial improvements, and data analysts will become the key business support staff of the next stage. In other words, in every field, large numbers of data analysts, quants or growth hackers, will emerge to drive the development of the enterprise.