The history of the era of big data
In the era of big data, technology has wandered to the edge of religion.

Everyone is talking about "big data" these days, but where does big data actually exist, and what does it actually affect? The general public is inevitably swept up in the wave, which clouds both their sight and their thinking. At exactly such a moment, I feel it is particularly important to keep a sense of awe, think clearly, and recognize the limitations of "big data".

Permeating every moment: big data is everywhere

Big data may be one of the most talked-about topics today. The examples range widely: ranking cities by a "romance index" derived from the sales ratio of flowers to condoms; finding that Xinjiang leads the inland regions in bikini sales; contributing to energy conservation and emission reduction; the German national team using big data technology to collect player information for the World Cup; working out landing batches and fighter models within one minute from the landing signals at enemy airfields; and the film "Her", which won Best Original Screenplay at the 86th Academy Awards, in which the protagonist and an artificial intelligence system grow steadily closer until they fall in love. Big data has given people endless room for imagination.

As Jack Ma (Ma Yun) put it, mankind is moving from the IT era to the DT (data technology) era. Che Pinjue, Chairman of the Data Committee of Alibaba Group, also emphasized two points in his book "Big Data". First, big data completely eliminates "sample bias": "Samples are different from big data. Big data trusts the full data set, not samples; its results come from analysis, not from sampling." Second, correlation analysis in the era of big data can create previously unimaginable scenarios. In the extreme case, accumulated online data can form an individual's "online personality", which influences and even controls that person's offline behavior.

Arrogance is a sin; keep a sense of awe

The prospect of big data is so beautiful that it leaves one speechless. However, arrogance is a sin. The "fruit of wisdom" gave mankind wisdom, but it also left those who departed the Garden of Eden unable to shake off the original sin of arrogance. From the Tower of Babel to attempts to "establish a paradise on earth", human beings who have lost their sense of awe have often done great harm to themselves. In the era of big data, we should likewise keep a sense of awe and bear the following three points in mind.

First, sample bias has always existed, and big data has not surpassed statistics.

What is sample bias? The best example comes from World War II. In its simplified version, the Royal Air Force, suffering under fierce German air defense fire, wanted to reduce its aircraft loss rate by strengthening aircraft armor. Because of weight limits, however, only some parts of each aircraft could be reinforced, so the air force turned to statisticians. After carefully examining the bullet holes on the planes that had made it back to base, the experts reached an unexpected conclusion: add armor to the parts that showed no bullet holes. Faced with doubt, the statistician answered in one sentence: the planes that were hit in those parts never came back. Statistics, in other words, has always been a craft, and practicing it without real skill can be fatal.
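The statistician's reasoning can be illustrated with a small simulation. This is only a sketch under invented assumptions: the section names, the lethality figures, and the function `simulate_returning_hits` are hypothetical and are not taken from any historical record. Every simulated plane takes exactly one hit in a uniformly chosen section, and only the survivors are available for inspection.

```python
import random

def simulate_returning_hits(n_planes=10000, seed=0):
    """Count hits per section on planes that returned and on planes that were lost."""
    rng = random.Random(seed)
    # Hypothetical chance that a single hit in each section brings the plane down.
    lethality = {"engine": 0.8, "cockpit": 0.6, "fuselage": 0.1, "wings": 0.1}
    sections = list(lethality)
    returned = {s: 0 for s in sections}
    lost = {s: 0 for s in sections}
    for _ in range(n_planes):
        section = rng.choice(sections)      # hits are spread evenly over sections
        if rng.random() < lethality[section]:
            lost[section] += 1              # never observed by the analyst
        else:
            returned[section] += 1          # the only data the analyst sees
    return returned, lost

returned, lost = simulate_returning_hits()
print("hits seen on returning planes:", returned)
print("hits on planes that were lost:", lost)
```

Although hits are spread evenly in this toy model, the returning planes show far fewer engine hits than fuselage hits, simply because engine hits rarely come home. Reading the survivors' data as if it were the full data set would point the armor at the wrong place.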

Statistics is essentially a theoretical system that infers the whole from a part and predicts the future from the past. Its biggest weakness is that, when the whole is inferred from a part, sample bias can invalidate the conclusion. Has the era of big data really reached a heaven in which sample bias no longer exists? The answer is obviously no.

At the level of observed phenomena, data and application scenarios remain seriously disconnected even in the era of big data. Take the ratio of flowers to condoms on Valentine's Day. For reasons everyone understands, a great deal of condom purchasing happens offline, where online systems cannot capture it. Because of limits in technical means or in the business model itself, the data collected by an online system is only part of the complete scene, not all of it. Bikini sales in Xinjiang are another example. In Xinjiang, bikini sales are concentrated online, because there are few or no traditional offline channels; in other provinces, bikinis are sold mainly offline, with online sales accounting for only 8% to 10%. A data analyst who fails to recognize this will draw the wrong conclusion. Moreover, in Xinjiang, Taobao/Tmall's online sales basically represent all real online sales, whereas in first-tier cities such as Beijing, Shanghai, and Guangzhou, JD.com's online sales are already comparable to those of Taobao/Tmall, so looking only at Alibaba's data seriously underestimates real sales.
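The channel-coverage problem can be made concrete with a few lines of arithmetic. The figures below are invented purely for illustration (only the roughly 10% online share for the second region echoes the text); the `Region` class and its fields are hypothetical names, not a description of any real data set.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    true_total: int        # hypothetical units sold across all channels
    online_share: float    # fraction of sales that happen online
    platform_share: float  # fraction of online sales on the one observed platform

    def observed(self) -> float:
        """Units visible to an analyst who only has the one platform's data."""
        return self.true_total * self.online_share * self.platform_share

regions = [
    # Few offline channels, one dominant platform: most sales are visible.
    Region("inland region", true_total=1000, online_share=0.90, platform_share=0.95),
    # Mostly offline, and online sales split between competing platforms.
    Region("first-tier city", true_total=5000, online_share=0.10, platform_share=0.50),
]

rank_by_observed = sorted(regions, key=lambda r: r.observed(), reverse=True)
rank_by_truth = sorted(regions, key=lambda r: r.true_total, reverse=True)

print("ranking from platform data:", [r.name for r in rank_by_observed])
print("ranking from true totals:  ", [r.name for r in rank_by_truth])
```

With these made-up numbers the platform data ranks the inland region first even though the first-tier city sells five times as much overall. The ranking reflects where the platform can see, not where the sales are.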

At the level of theory, the disconnect between data and application scenarios is sample bias in essence. For technical reasons or because of competing interests, the data collected in the era of big data cannot cover every aspect of an application scenario; what is obtained is still a part, not the whole. Finally, from a philosophical point of view, sample bias will persist even if technology someday solves the disconnect between data and scenarios and a perfect business model lets competitors share data with one another. The core reason is that, although human beings may be able to understand all the laws of the objective world, that world is not static but constantly in motion. Past data cannot be guaranteed to reflect how the objective world will develop in the future. Behaving like the man in the fable, who marked the side of his moving boat to find the sword he had dropped in the river, does not work. Seen this way, a "black swan" event is sample bias in essence. No matter how advanced the technology or how sophisticated the business model, this problem will not be solved. Therefore, even in the era of big data, people should keep a sense of awe. In this era, technology has indeed wandered to the edge of religion.

Second, big data conclusions are statistical conclusions about the whole, not about individuals.

Any analysis or conclusion based on statistics applies to the population as a whole. Asimov expounded this view perfectly in "Foundation". Hari Seldon took the inhabitants of 20 million planets in the Milky Way as his research object, established psychohistory, and predicted that the Galactic Empire would pass through 30,000 years of dark barbarism before a second Galactic Empire emerged. But the theory cannot predict individuals, so it could not foresee the appearance of the Mule, a mutant. Had the Second Foundation not existed, the whole revival plan would have slipped out of control. The book "Out of Control" describes a similar phenomenon: the behavior of a school of fish in the deep sea is, on the whole, easy to predict, while the behavior of a single fish is irregular and unpredictable.

Taobao/Tmall's "thousands of people, thousands of faces" project is an important attempt of the big data era. Its core is to show each Taobao/Tmall customer personalized search results based on big data. The core details of the project are unknown to outsiders, but reasonable inferences can be made from theory. First, the data Taobao/Tmall collects cannot be the so-called "full data"; under present conditions, much of the key data about customers' purchasing interests cannot be collected. Second, even if the model's accuracy reached 99%, on a platform with hundreds of millions of users the remaining 1% means that millions of customers, close to 10 million at the upper end of that scale, would have a poor experience. For these reasons, the degree of personalization in "thousands of people, thousands of faces" has to be kept within reasonable bounds; otherwise, the fuller the ideal, the leaner the reality.
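The scale argument is simple arithmetic, sketched below. The user counts are illustrative picks within "hundreds of millions", and `mis_served_users` is a hypothetical helper name; the only figure taken from the text is the 99% accuracy.

```python
def mis_served_users(n_users: int, accuracy: float) -> int:
    """Expected number of users for whom the model gets it wrong."""
    return round(n_users * (1 - accuracy))

# Illustrative platform sizes, all within "hundreds of millions" of users.
for n_users in (100_000_000, 500_000_000, 900_000_000):
    print(f"{n_users:>11,} users at 99% accuracy -> "
          f"{mis_served_users(n_users, 0.99):,} poorly served")
```

Under these assumptions the count runs from 1 million to 9 million people, so a figure near 10 million corresponds to a user base approaching one billion. A population-level accuracy that looks excellent still leaves a city-sized group of individuals for whom the prediction fails.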

Third, correlation is never causation, and here the traps are as numerous as the opportunities.

Correlation analysis is a sharp tool for data analysis, and it is also where problems are most easily introduced. Correlation is not causation. Statistics show that when ice cream sales rise, the number of drownings rises quickly too, and the two are strongly positively correlated. So does eating ice cream cause people to drown? Obviously not; hot weather increases both ice cream consumption and the time people spend in the water. A more convincing example is that, over a certain period, statistics showed a strong positive correlation between the price of liquor and the income of priests. Does that mean the clergy all live by the saying "wine and meat pass through the gut, but the Buddha stays in the heart"? No; the real reason is that inflation raised both liquor prices and priests' incomes.

In the era of big data, confusing correlation with causation may cause far more problems than before. Data is extremely abundant and computing power extremely strong, so we can find correlations that could not be found in the past. This is what makes the era exciting. At the same time, it becomes harder to tell correlation from causation, and a wrong judgment can cause great problems. Consider the credit scoring model and automated lending that Ali Small Loan is currently proud of. Suppose the correlations behind the current credit model stop holding, just as liquor prices would no longer be strongly related to priests' incomes once inflation stayed stable for a long time. The borrowers selected by the existing model would then carry extremely high real credit risk, and the consequences would be unimaginable. This analysis is purely theoretical and is not aimed at any specific project. As big data technology advances, however, distinguishing correlation from causation will become more and more difficult, and the risks will grow higher and higher.
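The ice cream example can be reproduced with synthetic data. The sketch below is an illustration under invented assumptions: the coefficients, noise levels, and variable names are made up, and temperature is by construction the only thing driving both series. It compares the raw correlation with the correlation that remains after the linear effect of temperature is removed from each series (a simple partial correlation).

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(42)
# Hypothetical daily data: temperature drives both series; neither drives the other.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [10 * t + rng.gauss(0, 40) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

raw_r = pearson(ice_cream, drownings)
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))
print(f"raw correlation:                  {raw_r:.2f}")
print(f"after controlling for temperature: {partial_r:.2f}")
```

The raw correlation is strong, yet it falls to roughly zero once temperature is accounted for. The check only works because the confounder was measured; in real data the hidden common cause, like inflation in the liquor example, is often not in the data set at all.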

The most comprehensible thing about this world is that it is incomprehensible; the most incomprehensible thing about it is that it can be comprehended. In the era of big data, we need to keep a sense of awe. Arrogance is a sin.
