Data sense is inseparable from data analysis thinking
This article covers some basic knowledge of data analysis, because whether you are a developer, analyst, product manager, or operations specialist, mastering the basic concepts of data analysis is an essential work skill in the digital era. What we often call "data sense" can also be summarized as "data analysis thinking".

This article includes the following four parts:

1. Why data analysis is important

2. What are the common analytical methods

3. Some data-driven methodologies

4. Skills progression of data analysts

|0x00 Why data analysis is important

At the first level, statistics remains the core methodology of data analysis.

Let's look at the definition of data analysis: "the process of studying data in detail and summarizing it in order to extract useful information and form conclusions". Data analysis is a statistics-based discipline that provides rigorous analysis methods and tools for social science problems. Although the emergence of big data technology has greatly expanded the boundaries of statistical research, it has not changed the basic statistical idea of inferring the characteristics of an overall distribution through random sampling, and most basic statistical methods, such as causal inference, the sufficiency principle, and data summarization, have even been strengthened by the popularity of big data technology. With this augmentation, many important socio-economic and psychological variables have become measurable, such as residents' happiness and investor sentiment, and the development of real-time technology has even made real-time prediction possible.

At the second level, data analytics guides the development of the business.

To quote the management guru Peter Drucker: "You cannot improve it if you cannot measure it." Only after we find the key measurements of business development, the "North Star metrics", can we optimize our business. There is also a widely circulated line from Avinash Kaushik, one of the promoters of Google Analytics: "All data in aggregate is crap. Segment or die." Aggregate data can hide a lot of problems, and only through drill-down analysis of the data can you get to the real reason behind a trend and understand how to optimize the North Star metric. With the demographic dividend of the Internet gradually disappearing, in-depth understanding and analysis of business data is the only way to maintain high-quality business growth.
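As a minimal sketch of the "segment or die" idea, the pandas snippet below shows how a flat-looking aggregate metric can hide a collapsing segment; the table, weeks, channels, and counts are all hypothetical, made up purely for illustration.

```python
import pandas as pd

# Hypothetical weekly orders by channel; names and numbers are made up.
df = pd.DataFrame({
    "week":    ["W1", "W1", "W2", "W2"],
    "channel": ["search", "social", "search", "social"],
    "orders":  [1000, 200, 1150, 40],
})

# Aggregate view: 1200 vs 1190 orders, the trend looks almost flat.
print(df.groupby("week")["orders"].sum())

# Segmented view: search grew while social collapsed from 200 to 40.
print(df.pivot_table(index="channel", columns="week", values="orders"))
```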

In summary, data analysis is still very important, and if you want to understand how your work generates value, knowledge of data analysis is the essential "data sense" for data practitioners.

|0x01 What are the common analytical methods

The job of a data analyst requires the ability to analyze and solve problems in a structured, systematic way, and we need to draw on some common analytical methods to quickly locate the root cause of a problem.

Analytical methods fall into two parts: macro strategic analysis and micro data analysis.

Macro strategic analysis mainly includes:

PEST analysis, which studies four aspects, politics (Politics), economy (Economy), society (Society), and technology (Technology), to analyze the macro environment in which a business operates;

SWOT analysis, which studies strengths (Strengths), weaknesses (Weaknesses), opportunities (Opportunities), and threats (Threats) to dynamically analyze an enterprise's internal and external competitive situation;

Porter's Five Forces model, which analyzes a company's competitive strategy through five forces: the competitiveness of existing competitors in the industry, the entry ability of potential competitors, the substitution ability of substitutes, the bargaining power of suppliers, and the bargaining power of buyers.

Although macro analysis is too big a topic for our daily work, it is genuinely helpful for analyzing policies, regulations, risks, and other considerations in specific industries such as insurance, healthcare, online education, mutual funds, and logistics.

Next, let's move on to the more common methods of micro data analysis. Here is a list of a few common methods, each illustrated with a small case.

The first method is hypothesis testing.

Hypothesis testing, also known as statistical hypothesis testing, is a statistical inference method used to determine whether differences between samples, or between a sample and the population, are caused by sampling error or reflect essential differences. It is mainly divided into three steps: 1. formulate hypotheses; 2. collect evidence; 3. draw conclusions.

Hypothesis testing relies mainly on logical reasoning to analyze the causes of a problem, so it is often used in attribution analysis.

For example, if our North Star metric is down, we need to find the corresponding cause. Initially there are three possibilities: a user problem, a product problem, or a competitor problem.

From these three aspects, we can put forward three hypotheses:

If users are the problem, we can analyze it from the business chain diagram, or break it down with multi-dimensional analysis;

If the product is the problem, we can study recently launched product features to see whether they meet users' needs;

If competitors are the problem, we can use external market information to investigate, for example, whether a competitor is subsidizing promotions on a large scale.

After reaching an initial conclusion, the analysis usually continues: ask a few more whys, and keep using data to verify the reasons until the root cause of the problem is found.
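As a minimal sketch of the three steps, the snippet below tests the "user problem" hypothesis with a chi-square test of independence from SciPy; the conversion counts are hypothetical, invented for illustration.

```python
from scipy import stats

# Step 1, formulate a hypothesis: "new users convert worse than old users".
# Step 2, collect evidence: hypothetical converted / not-converted counts.
#                converted  not converted
contingency = [[ 320,       1680],   # old users
               [ 180,       1820]]   # new users

# Step 3, draw a conclusion with a chi-square test of independence.
chi2, p_value, dof, expected = stats.chi2_contingency(contingency)
print(f"p-value = {p_value:.4f}")

# A small p-value (commonly < 0.05) suggests the gap is not just sampling
# error, so we keep drilling into the new-user experience; otherwise we
# move on to the next hypothesis (product, competitors).
if p_value < 0.05:
    print("Reject H0: conversion rates genuinely differ between groups.")
else:
    print("Cannot reject H0: the gap may be sampling noise.")
```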

The second method is logic tree analysis.

Logic tree analysis is relatively easy to understand: a complex problem is split into several simple problems, which then branch out step by step like the trunk of a tree; by solving the individual sub-problems, we arrive at the answer to the overall problem.

For example, to analyze why profit growth is slow, we can split the problem into three dimensions, revenue, cost, and gross profit, and then analyze each dimension in turn.

Revenue involves issues such as customer volume, customer quality, payment rate, and willingness to pay; cost involves advertising spend, labor costs, promotion strategy, and so on; gross profit involves warehousing and distribution, channel quality, and the like. Finally, by aggregating the sub-problems, we derive the real reason.

A logic tree follows three basic principles:

Elementalization: summarize the problem into its constituent elements;

Framing: organize the elements into a framework, following the principle of "no overlap, no omission" (mutually exclusive, collectively exhaustive);

Relevance: the elements within the framework maintain the necessary interrelationships, simple but not isolated.
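A logic tree can be sketched as a plain nested data structure. The snippet below encodes the profit example from above and walks it trunk-first; the node names simply mirror the dimensions listed in the text.

```python
# A logic tree as nested dicts: each node splits a question into
# sub-questions that are mutually exclusive and collectively exhaustive.
logic_tree = {
    "Why is profit growth slow?": {
        "Revenue": ["customer volume", "customer quality",
                    "payment rate", "willingness to pay"],
        "Cost": ["advertising spend", "labor costs", "promotion strategy"],
        "Gross profit": ["warehousing and distribution", "channel quality"],
    }
}

def walk(node, depth=0):
    """Print the tree trunk-first, expanding each branch in turn."""
    if isinstance(node, dict):
        for question, branch in node.items():
            print("  " * depth + question)
            walk(branch, depth + 1)
    else:  # leaf list: the simple sub-problems we analyze one by one
        for leaf in node:
            print("  " * depth + "- " + leaf)

walk(logic_tree)
```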

The third method is cohort analysis (also called group analysis).

Cohort analysis divides data into groups according to a certain characteristic, such as time or interest, and locates problems by comparing the data differences between groups.

Cohort analysis is useful for analyzing different stages of the product lifecycle, for example how effective a new release is: divide users into cohorts by time, then compare retention rates between cohorts to analyze why users stay or leave.

As an example, users of a video platform need to pay for a VIP subscription to watch platform-exclusive TV series, but they can cancel the subscription in any month; users who cancel are churned users. To analyze why users churn, we can use cohort analysis.

Plot each cohort's data as a line, with time on the horizontal axis and retention on the vertical axis, then compare the lines across cohorts. We can usually see quite easily that retention rates differ greatly at different times, generally for the following reasons:

The product recently launched certain new features, but these new features do not suit new users;

The market has recently changed, for example, a competitor has launched large-scale subsidized promotions.

Together with the hypothesis testing mentioned earlier, we can further analyze the root cause of the problem, giving us a fixed analysis recipe: 1. cohort analysis, to find the cohort with the lower retention rate; 2. hypothesis testing, to raise questions and verify why that retention rate is so low.
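As a minimal sketch of step 1, the following pandas snippet computes a retention table by cohort from a hypothetical activity log (the user IDs and months are made up); each row of the result is one of the lines we would plot.

```python
import pandas as pd

# Hypothetical activity log: one row per user per active month.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "month":   pd.to_datetime(["2024-01", "2024-02", "2024-03",
                               "2024-01", "2024-02",
                               "2024-02", "2024-03", "2024-04",
                               "2024-02"]),
})

# Each user's cohort is their first active month.
events["cohort"] = events.groupby("user_id")["month"].transform("min")
events["age"] = ((events["month"].dt.year - events["cohort"].dt.year) * 12
                 + events["month"].dt.month - events["cohort"].dt.month)

# Rows: cohorts; columns: months since signup; values: retained users.
counts = events.pivot_table(index="cohort", columns="age",
                            values="user_id", aggfunc="nunique")

# Divide by cohort size (age 0) to get the retention curves we plot.
retention = counts.div(counts[0], axis=0)
print(retention)
```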

By combining different strategies, we have formed our own analytic methodology.

Of course, there are many other ways to analyze data, and they need to be accumulated and refined bit by bit through daily study and practice.

|0x02 Some data-driven methodologies

In simple terms, being data-driven means using data to analyze the causes of problems in already-digitized businesses, such as e-commerce and video, and proposing optimized solutions to drive business growth or product iteration. This is how the Internet industry sustains growth; it is a working method data practitioners need to master, and an important standard for evaluating someone's ability.

A data-driven approach usually consists of the following processes:

Qualitatively analyzing data to identify problems;

Quantitatively analyzing data to determine impact;

Researching companies, competitors, and industry practices;

Estimating the effect of solving a problem;

Designing the appropriate experimental mechanism;

AB testing to draw experimental conclusions;

Going live and tracking subsequent changes to the strategy.

Some data analysis knowledge is needed here, namely qualitative and quantitative analysis and AB testing; the other parts are usually implemented by the corresponding engineering teams.

Qualitative analysis studies the "quality" of the research object, analyzing its internal laws; quantitative analysis studies the object in terms of quantities, describing interactions and development trends.

For example, suppose that through the data we found a "low conversion rate from order to payment" problem in an e-commerce scenario and need to analyze it. Using the grouping + funnel method, we found that some of the products have this problem; then, by sampling and inspecting the data, we analyzed the cause, probably false prices. This is qualitative analysis. After locating the cause, we select a sample of products and manually assess the proportion with false prices, to infer the overall scope of the impact. This is quantitative analysis.
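A minimal sketch of the "grouping + funnel" view, with hypothetical categories and counts: computing order-to-payment conversion per group makes the problematic segment stand out, and that is where we would then sample orders by hand.

```python
import pandas as pd

# Hypothetical order funnel: per product category, how many users
# placed an order and how many went on to pay.
funnel = pd.DataFrame({
    "category": ["electronics", "apparel", "groceries"],
    "ordered":  [5000, 8000, 12000],
    "paid":     [4100, 3200, 10900],
})

# Order-to-payment conversion per group; the laggard (apparel here)
# is where we sample orders manually to check causes like false prices.
funnel["conversion"] = funnel["paid"] / funnel["ordered"]
print(funnel.sort_values("conversion"))
```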

Next, we design some strategies. To verify the impact of these strategies on the "low conversion rate from order to payment" problem, we need to run experimental comparisons.

An AB experiment takes a randomly split population and, over the same time period, applies two or more solutions to the same problem, measuring an experimental group against a control group on a small set of identical metrics to see which solution performs better. Doing so, of course, requires a sufficient sample size, but that is usually not difficult for already-digitized Internet businesses.
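A minimal sketch of how the experimental conclusion might be drawn, assuming hypothetical conversion counts for the two groups: a hand-rolled two-proportion z-test using only SciPy's normal distribution.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical results: control vs experiment order-to-payment conversions.
paid_a, n_a = 4100, 5000   # control group
paid_b, n_b = 4270, 5000   # experimental group with the new strategy

p_a, p_b = paid_a / n_a, paid_b / n_b
p_pool = (paid_a + paid_b) / (n_a + n_b)

# Two-proportion z-test under the null "both variants convert equally".
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))

print(f"control={p_a:.3f}, experiment={p_b:.3f}, z={z:.2f}, p={p_value:.4f}")
# If p is below the agreed threshold (commonly 0.05), the lift is
# unlikely to be sampling noise and the strategy can go live.
```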

By analyzing the comparative effectiveness data from the AB experiment, we can see whether our strategy delivers the expected positive results; if so, we can go live. After going live, we analyze the quantitative data again to see how well the problem was solved.

These are some of the general approaches to being data-driven.

|0xFF Skill progression for data analysts

Data analysts also need to understand algorithms.

Analysts, like developers, are often divided into "forward" and "backward" roles. The "forward" role works close to the business, finding problems in the business and looking for corresponding optimization points; the "backward" role focuses more on implementation, optimizing algorithms or testing methods, more like a backend role, but more intelligent.

Although statistics provides us with very good analytical methods, not all of the world's problems can be covered by statistics, and analysts in many directions still need to master algorithms to cope with the demands of the work.

For example, the most typical problem is "matching supply and demand", because quantitative change leads to qualitative change.

Throughout the history of Internet development, whether in B2C, C2C, B2B, or B2B2C, we have built accurate profiling systems, covering not only user profiles but also supplier profiles, to deliver personalized experiences and do a better job of matching supply and demand. Later, this set of mechanisms spread to other areas: the personalized recommendation of videos and the dispatching of ride-hailing are both part of supply-demand matching.

But how do we do matching and recall among millions or even billions of goods? How do we find leads in massive data? How do we work out which population is our target population, recommend the information flow to the most suitable people, and measure the effect of all this? Many schemes need to be weighed comprehensively, and in the end we either form rules through statistics-based data analysis or rely on algorithms to mine features to reach the goal.

Large companies, with abundant resources, usually pursue the two in parallel, which from one point of view also draws a strict boundary of responsibility between data analysis and data algorithms; in small and medium-sized enterprises with limited resources, analysts may end up doing the algorithm work as well.

Similarly, areas such as risk control and knowledge graphs require, in addition to human coverage, the intervention of machines to optimize results.

In fact, the growth of a data analyst is more like a marathon: there is a great deal of knowledge to absorb, so you must allocate your time and energy sensibly and frequently remind yourself of the core goal, so as not to fall behind over the long run. Analysis is just a skill; to make it a lifelong career, you need to stay close to real scenarios and the company's development, and make correspondingly sound decisions.