Recently, a blog that posts daily, uncannily accurate predictions of U.S. and European epidemic data has gone viral online.
How uncanny is it? Here are a few examples:
- For 10 consecutive days starting March 27, the blog's predictions of U.S. infections were more than 90 percent accurate, reaching nearly 100 percent accuracy on April 4.
- On March 31, the blog predicted that the U.S. outbreak would drop off a cliff within 8 to 10 days, once the number of people tested exceeded 2 million. Seven days later, on April 6, the U.S. figures did drop off a cliff, with the daily increase falling from 12.43% to 8.13%. The post drew a huge response, with more than 1.34 million readers.
- Since March 27, the blog's daily predictions of infections in Europe have averaged 97 percent accuracy, including nearly 100 percent in the first five days of April.
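The article never says how "accuracy" is calculated. A common convention, assumed here, is one minus the absolute percentage error; the daily increase quoted above (12.43% to 8.13%) is likewise assumed to be day-over-day growth in cumulative cases. A minimal Python sketch of both definitions, using made-up numbers rather than the blog's figures:

```python
def prediction_accuracy(predicted: float, actual: float) -> float:
    """Accuracy as 1 - |predicted - actual| / actual (an assumed definition), floored at 0."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return max(0.0, 1.0 - abs(predicted - actual) / actual)


def daily_growth_rate(today_total: float, yesterday_total: float) -> float:
    """Day-over-day growth of a cumulative count, as a fraction."""
    if yesterday_total <= 0:
        raise ValueError("yesterday_total must be positive")
    return (today_total - yesterday_total) / yesterday_total


if __name__ == "__main__":
    # Illustrative numbers only, not figures from the blog.
    print(f"{prediction_accuracy(predicted=95_000, actual=100_000):.0%}")  # 95%
    print(f"{daily_growth_rate(today_total=108_130, yesterday_total=100_000):.2%}")  # 8.13%
```

Under this definition, a forecast that overshoots by more than 100 percent simply scores zero rather than going negative.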
Li Zhibin's predictions of the number of infections in the United States were 90 percent accurate
In response, some netizens commented: "Oh great god, the virus is definitely listening to you."
It is worth remembering that the COVID-19 outbreak is a major global public health event involving many complex factors, such as politics, economics, and geography. Predicting specific case counts sounds like fantasy, and any accuracy would seem to owe more to metaphysics than to method. So the blogger who achieved the results above can fairly be called a modern-day divine calculator.
So, how did this divine calculator come to be?
Tsinghua University graduate + 8 years of experience in market forecasting
The blogger behind the blog, the "god" himself, is Li Zhibin.
Li Zhibin studied in the Department of Computer Science at Tsinghua University from 1980 to 1985. From 1985 to 1994 he studied and worked at the Chinese Academy of Sciences, becoming an associate researcher at thirty and serving as director of the product department and as assistant director. He moved to New Zealand in 1994 and later settled in Hong Kong, where he still lives. He is now general manager of Hong Kong Zhijia Logistics Software Company Limited and Hong Kong I Ching Technology Limited.
Screenshot of Li Zhibin's blog
Of Li's two companies, the former mainly develops logistics systems. The latter, which has ties to the Chinese University of Hong Kong, focuses on market demand forecasting: it provides businesses with data analysis and forecasts of product demand, price fluctuations, and related factors in a given region over the next three to six months.
Li said he entered the field of data analysis and forecasting in 2012, and because the I Ching company has ties to the Chinese University of Hong Kong, he also picked up a great deal from its professors along the way.
On the technical side, Li's training in Tsinghua's Department of Computer Science gave him a complete body of knowledge in software modeling and big data analysis. At the same time, Tsinghua's scientific culture taught him to focus on data, evidence, and examples rather than on conclusions.
All of this adds up to Li's sensitivity to data.
When Wuhan began reporting cases and suspected novel coronavirus patients also appeared in Hong Kong, Li, a long-time Hong Kong resident, became quite vigilant. By January 7, 2020, the Hong Kong Special Administrative Region Government had declared novel coronavirus pneumonia a statutory notifiable infectious disease and begun publishing outbreak data, and Li started tracking data related to the disease.
From then on, Li got up every morning to collect data, starting with Wuhan, Hubei, and Hong Kong, then other parts of the mainland. In late January he began collecting overseas data as well, organizing everything into Excel sheets. At the same time, he used his professional knowledge to model the data, analyzing the official figures and cross-checking them against data reported in the news.
Initially, Li shared data and ideas only with his Tsinghua classmates, but he later began spending 30 minutes a day writing posts for his Sina blog. It has since become a daily habit.
Beyond collecting, organizing, and analyzing the daily data, Li has also been drawing on his expertise to build a data model, continually supplementing and validating its parameters so that it performs as expected.
On March 27, once the data model had stabilized, Li gave his first predictions for U.S. infections; on March 28, he added predictions for European infections.
Li's predictions of the number of infections in Europe had an average accuracy of 97 percent
His predictions covered not only the number of infections but also the growth rate of infections, the timing of the peak, the total number of infections, the total number of deaths, and the fatality rate. The number of infections, however, was the main metric he used to measure the accuracy of his predictions.
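The article lists these metrics without giving formulas. The sketch below uses common conventions, assumed for illustration: daily new cases are differences of the cumulative series, the peak is the day with the most new cases, and the fatality rate is the crude ratio of deaths to confirmed cases. None of this is Li's own code.

```python
def new_cases(cumulative: list[int]) -> list[int]:
    """Daily new cases from a cumulative series (the first day has no prior value)."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def peak_day(cumulative: list[int]) -> int:
    """Index, in the cumulative series, of the day with the most new cases."""
    daily = new_cases(cumulative)
    if not daily:
        raise ValueError("need at least two days of data")
    # daily[i] is the change from day i to day i + 1, hence the + 1
    return daily.index(max(daily)) + 1


def case_fatality_rate(deaths: int, confirmed: int) -> float:
    """Deaths divided by confirmed cases (a crude, unadjusted fatality rate)."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return deaths / confirmed
```

For a hypothetical series `[10, 20, 40, 50, 55]`, the daily new cases are `[10, 20, 10, 5]`, so the peak falls on day index 2.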
Even Li Zhibin himself didn't expect his predictions to be that accurate.
But Li emphasized that no one can predict the future with 100 percent accuracy, which is why it is important to make rolling predictions.
He said: "Forecasting is a dynamic process. Many immediate measures, events, and other unexpected factors cannot be predicted, so these events, decisions, and other factors have to be fed back into the prediction model as parameter adjustments to make it run more accurately. My prediction model and its parameters are also being continuously improved."
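Li's actual model has hundreds of parameters and is not public. As a much simpler stand-in, the sketch below shows only the rolling idea he describes: each day the growth rate is re-fitted from the most recent observations, so new data feed back into the next forecast. The geometric-mean growth fit and the `window` parameter are assumptions chosen for illustration.

```python
from math import prod


def fit_growth(cumulative: list[float], window: int = 3) -> float:
    """Average daily growth factor over the last `window` days (geometric mean)."""
    recent = cumulative[-(window + 1):]
    if len(recent) < 2:
        raise ValueError("need at least two observations")
    factors = [b / a for a, b in zip(recent, recent[1:])]
    return prod(factors) ** (1 / len(factors))


def rolling_forecast(cumulative: list[float], window: int = 3) -> list[float]:
    """For each day from index `window` on, predict the next day's total using
    only the data available on that day, then roll forward one day."""
    predictions = []
    for t in range(window, len(cumulative)):
        history = cumulative[: t + 1]  # data known on day t
        growth = fit_growth(history, window)  # re-fit with the newest data
        predictions.append(history[-1] * growth)  # forecast for day t + 1
    return predictions
```

With a series that doubles every day, `rolling_forecast([1, 2, 4, 8, 16], window=3)` yields forecasts of about 16 and 32; when real data bend away from the trend, the next re-fit absorbs the change.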
The best software cannot predict with 100% accuracy
Li's predictions rest on two core elements: data and the prediction model.
The first is the credibility of the data. In the interview, Li said he began collecting data every day in January. At first only Wuhan and Hong Kong had data; now he collects data every day from hundreds of countries and regions.
Li emphasized that data collection and analysis must include screening for data conflicts. Especially when official reports contain a large volume of data, there are many ways, news reports among them, to check for conflicts between data from different regions; the more conflict points there are, the lower the credibility of the data.
At the same time, when judging how trustworthy data are, it is important to look at how quickly they are released: the more frequently data are released, the more credible they are. Some countries in South Asia and Southeast Asia release data less often and more slowly, so their credibility is discounted.
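Li does not describe how he turns conflicts and release speed into a credibility setting. The sketch below is one hypothetical scoring scheme: a conflict point is any official/news pair that disagrees by more than a relative `tolerance`, and a region that releases data daily gets full marks for timeliness. The class name, the 5 percent tolerance, and the multiplicative formula are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SourcePair:
    """One metric reported by two sources for the same region and day (hypothetical)."""
    official: float
    news: float


def count_conflicts(pairs: list[SourcePair], tolerance: float = 0.05) -> int:
    """Count pairs whose relative disagreement exceeds `tolerance`."""
    conflicts = 0
    for p in pairs:
        baseline = max(abs(p.official), abs(p.news), 1.0)
        if abs(p.official - p.news) / baseline > tolerance:
            conflicts += 1
    return conflicts


def credibility(pairs: list[SourcePair], releases_per_week: float,
                tolerance: float = 0.05) -> float:
    """Score in [0, 1] that falls as conflict points rise and as releases slow down."""
    if not pairs:
        raise ValueError("need at least one comparison")
    conflict_share = count_conflicts(pairs, tolerance) / len(pairs)
    release_factor = min(releases_per_week / 7.0, 1.0)  # daily release = full marks
    return (1.0 - conflict_share) * release_factor
```

Multiplying the two factors means a region must both agree with cross-checks and publish promptly to score well; a weighted sum would be an equally defensible choice.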
Outbreak data from the CDC's official website
In addition, news data can be used for comparison when judging credibility. For example, if the ratio of doctors to patients is relatively stable, the number of medical staff reported in the news can be used to back-calculate the number of patients, Li told Lei Feng Network.
He said that in fact all data may contain some human or statistical error, and no region's data are 100 percent credible. Relatively speaking, however, U.S. data have few conflicts and therefore higher credibility. European data rank second to the U.S. in credibility because of the imbalance between Western and Eastern Europe, so he takes an average. Data from India, Southeast Asia, Japan, and other regions, by contrast, suffer from slow release and more conflict points, which lowers the credibility he assigns them.
By the end of February, using the earlier domestic data as the basis for modeling and validation, Li began forecasting epidemic data for two regions, the United States and Europe. On top of the data, he built a prediction model. It is in fact an extremely complex model, with hundreds of parameters in all, 20 to 30 of them important, divided into the following three categories:
The first category covers epidemic parameters for each region, country, or city: the number of confirmed cases, the population, the daily number of new confirmed cases, the number of suspected cases, the daily number of tests, the number of deaths, the number of recoveries, the number of outpatients, and the number of hospital admissions.
The second category relates to the characteristics of the region, city, or country: city type, population density, temperature, weather, the percentage of residents over 60, the average age, and the city's level of urban development.
The third category relates to resources and governing capacity: medical resources, the number of hospital beds, social organization capacity, information transparency, management style, and so on.
Li said that in practice he generally first collects data in Excel, then imports it into a back-end database, then runs his self-developed modeling software to produce three conclusions, and finally applies human judgment to the results. He emphasized that many parameters, such as social sentiment, cannot be quantified, which is why human involvement is needed.
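The back-calculation Li describes is simple proportional arithmetic. A minimal sketch, assuming the patients-per-staff ratio is estimated from earlier periods in which both figures were reported; the function names and numbers are hypothetical:

```python
def ratio_from_history(staff: list[int], patients: list[int]) -> float:
    """Estimate patients per medical staff member from periods where both were reported."""
    total_staff = sum(staff)
    if total_staff == 0:
        raise ValueError("no staff data")
    return sum(patients) / total_staff


def estimate_patients(medical_staff: int, patients_per_staff: float) -> float:
    """Back-calculate patients from reported medical staff, assuming a stable ratio."""
    if medical_staff < 0 or patients_per_staff <= 0:
        raise ValueError("invalid inputs")
    return medical_staff * patients_per_staff
```

If past reports show 300 staff caring for 900 patients (a ratio of 3.0) and a news story mentions 500 staff, the implied patient count is about 1,500; a large gap between that estimate and the official figure would count as a conflict point.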
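To make the three parameter categories concrete, here is one hypothetical way to organize a region's inputs as Python dataclasses. The field names mirror the lists above; the units, the numeric "scores" for hard-to-quantify items, and the helper method are assumptions, not Li's schema.

```python
from dataclasses import dataclass


@dataclass
class EpidemicParams:
    """Category 1: epidemic figures for a region, country, or city."""
    confirmed: int
    population: int
    new_confirmed_daily: int
    suspected: int
    tests_daily: int
    deaths: int
    recovered: int
    outpatients: int
    hospitalized: int


@dataclass
class RegionTraits:
    """Category 2: characteristics of the region, city, or country."""
    city_type: str
    population_density: float  # people per square kilometre
    temperature_c: float
    weather: str
    share_over_60: float  # fraction of residents aged over 60
    average_age: float
    urban_development: str


@dataclass
class GovernanceParams:
    """Category 3: resources and governing capacity (several are subjective scores)."""
    medical_resources: float  # e.g. a normalized index
    hospital_beds: int
    social_organization: float
    information_transparency: float
    management_style: str


@dataclass
class RegionInput:
    """All model inputs for one region, grouped by category."""
    name: str
    epidemic: EpidemicParams
    traits: RegionTraits
    governance: GovernanceParams

    def confirmed_per_100k(self) -> float:
        """Confirmed cases per 100,000 residents."""
        return self.epidemic.confirmed / self.epidemic.population * 100_000
```

Grouping by category keeps the quantifiable epidemic figures separate from the judgment-based governance scores, which is where human adjustment would most likely enter.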
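The workflow Li outlines (spreadsheet, then database, then model, then human review) can be sketched with Python's standard library. This assumes the Excel sheet is exported as CSV with hypothetical columns `region,date,confirmed`, uses SQLite as the back-end database, and flags any model output that jumps by more than an arbitrary 50 percent for manual judgment. His real software and thresholds are not described.

```python
import csv
import io
import sqlite3


def load_csv_into_db(conn: sqlite3.Connection, csv_text: str) -> int:
    """Import spreadsheet rows (region,date,confirmed) into SQLite; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases ("
        "region TEXT, date TEXT, confirmed INTEGER, PRIMARY KEY (region, date))"
    )
    rows = [
        (r["region"], r["date"], int(r["confirmed"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # Re-importing a corrected sheet overwrites earlier values for the same day.
    conn.executemany("INSERT OR REPLACE INTO cases VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def latest_series(conn: sqlite3.Connection, region: str) -> list[int]:
    """Return a region's cumulative confirmed counts in date order, ready for a model."""
    cur = conn.execute(
        "SELECT confirmed FROM cases WHERE region = ? ORDER BY date", (region,)
    )
    return [row[0] for row in cur.fetchall()]


def needs_human_review(prediction: float, last_value: float, max_jump: float = 0.5) -> bool:
    """Flag a model output for manual judgment if it moves more than `max_jump` (relative)."""
    return last_value > 0 and abs(prediction - last_value) / last_value > max_jump
```

ISO-formatted dates (`2020-04-01`) sort correctly as text, which is why `ORDER BY date` works on a `TEXT` column here.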
He also said: even the best software cannot predict 100 percent accurately.
When a big ship and a small ship meet an iceberg at the same time
Li Zhibin, a Tsinghua graduate, brings foresight and thinking that go beyond data analysis.
For example, Li began his modeling with domestic data, which not only shaped his modeling process significantly but also led him to some observations. The day before Wuhan was locked down, he shared two ideas in the group chat of his Tsinghua class of 1980:
First, Wuhan should be locked down immediately, because the data were rising at a frightening rate;
Second, 20 to 30 grid-style field hospitals should quickly be set up in Hubei province, especially in the Wuhan area, to serve as quarantine and treatment centers. These field hospitals later became known as square-cabin (Fangcang) hospitals. Because the epidemic was growing so rapidly, isolating patients was a more critical prevention and control measure than treatment.
These ideas sparked a lot of discussion in the group. There were questions and objections, of course, but more than that, classmates actively joined in, put forward many better ideas and suggestions, and benefited a great deal. The ideas later proved to be on point and were borne out by subsequent official measures, such as the field hospitals, which they anticipated by two weeks.
Besides these suggestions, Li also found during data analysis and modeling that cities that become outbreak hotspots tend to share several characteristics:
- Older cities;
- Humid climates;
- Temperatures of 5 to 15 degrees;
- Ageing sewer systems;
- Higher proportions of elderly people.
Notably, outbreak cities in different countries, such as Wuhan, China; Daegu, South Korea; Milan, Italy; Tehran, Iran; and New York, U.S., all roughly fit these characteristics.
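As a rough illustration of checking a city against the five characteristics, the sketch below counts how many a city profile matches. Only the 5 to 15 degree range comes from the article (read here as degrees Celsius); the thresholds for "older", "humid", and "higher proportion of elderly" are illustrative assumptions, and the profiles used in any example are hypothetical, not data for the cities named above.

```python
from dataclasses import dataclass


@dataclass
class CityProfile:
    """Hypothetical city attributes used to test the five characteristics."""
    city_age_years: int
    relative_humidity: float  # percent
    mean_temp_c: float
    ageing_sewers: bool
    share_over_60: float  # fraction of the population


# Assumed thresholds; only the 5-15 degree range comes from the article.
OLD_CITY_YEARS = 100
HUMID_RH = 70.0
ELDERLY_SHARE = 0.20


def outbreak_risk_features(c: CityProfile) -> list[str]:
    """List which of the five characteristics a city matches."""
    matched = []
    if c.city_age_years >= OLD_CITY_YEARS:
        matched.append("older city")
    if c.relative_humidity >= HUMID_RH:
        matched.append("humid climate")
    if 5.0 <= c.mean_temp_c <= 15.0:
        matched.append("temperature 5-15 C")
    if c.ageing_sewers:
        matched.append("ageing sewers")
    if c.share_over_60 >= ELDERLY_SHARE:
        matched.append("more elderly residents")
    return matched
```

Returning the matched labels rather than a single score keeps the check transparent, which suits features that Li himself describes as partly subjective guesses.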
As for where these characteristics come from, Li emphasized that they mix in his own subjective but reasonable guesses; however, they were also checked against a series of results before finally being reflected in his predictions.
He added that the parameters also cover how society is organized, the mode of management, and the transparency of social information, so he presents his predictions as pessimistic and optimistic scenarios.
Going by the pessimistic predictions Li gave on April 4, his overall prediction of U.S. infections was 96 percent accurate.
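One simple way to express pessimistic and optimistic scenarios, assumed here for illustration, is to compound the current cumulative count forward at a base daily growth rate plus or minus a `spread` that stands in for uncertainty about policy and transparency. Li's model derives its scenarios from many more parameters; this only shows the shape of the idea.

```python
def project(current: float, daily_growth: float, days: int) -> list[float]:
    """Compound a cumulative count forward at a constant daily growth rate."""
    values = []
    total = current
    for _ in range(days):
        total *= 1.0 + daily_growth
        values.append(total)
    return values


def scenarios(current: float, base_growth: float, spread: float,
              days: int) -> dict[str, list[float]]:
    """Optimistic and pessimistic paths around a base daily growth rate."""
    return {
        "optimistic": project(current, max(base_growth - spread, 0.0), days),
        "pessimistic": project(current, base_growth + spread, days),
    }
```

For example, `scenarios(1000, 0.10, 0.02, 2)` compounds at 8 percent and 12 percent a day, bracketing the 10 percent base case.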
Li's prediction of the number of infections in the U.S. was 96 percent accurate
In the exclusive interview, however, Li stressed that despite the human involvement, data are absolutely essential to decision-making. Even setting the epidemic aside, he said, data account for essentially 100 percent of what matters in everyday decision-making. The data must be not only true but also comprehensive and transparent, and even if humans step in later in the process, their judgments must rest on the data, which are the foundation of any decision.
So how far can data-driven decision-making reach?
Li believes that even a mass public health event such as the COVID-19 outbreak, which is highly contingent and involves complex social factors such as politics and economics, can be predicted.
He said that, as with infectious diseases, such events follow specific patterns of development and laws of chance. We may not be able to grasp those laws with 100 percent accuracy, but within a certain probability we can still make judgments and decisions, provided, of course, that we have a huge amount of valid data.
This led Li to an interesting analogy:
When a big ship and a small ship suddenly encounter an iceberg, both must turn; but relatively speaking, the big ship's fate is clearly more predictable. The small ship can change course at once, but the large ship is so massive that it has inertia, and therefore a greater likelihood of hitting the iceberg. This inertia is the law, and the ship's size is the amount of data.
The larger the volume of data, the more accurate the data, and the more transparent the relevant information, the easier it is to predict such mass events when they occur, and the more accurate the predictions will be, Li concluded.
For more, see the original article, "The accuracy rate was once 100 percent! Tsinghua alumnus's god-like prediction of the U.S. epidemic."
Source: Deep Space Games