Constraints facing the development of big data in China
1. A shortage of high-quality, usable data

In the past few years, data-trading organizations have mushroomed, and "data monetization" has become a new revenue stream for many traditional enterprises that have accumulated data. At present, demand for big data in China is dominated by Internet companies and covers a wide range of uses: riding the O2O trend, the large Internet vendors are trying to bring in external data to support services such as finance, lifestyle, voice, travel, health and education.

Within specific fields and industries, however, China has generally not formed a complete chain of data collection, processing, analysis and application. A large number of data sources have never been activated, and most data owners have no path for externalizing the value of their data. For example, various medical and health applications have collected large amounts of data but, unlike some of their overseas counterparts, have not sold it to pharmaceutical companies. Compared with other countries, applications in government, public **** services and agriculture are basically missing, and the telecommunications and banking industries have even less contact with external data.

In addition, data trading itself contains a paradox. As a commodity, data is peculiar: it is not consumed by use, so after I use it others can still use it, and it can be sold on the market many times over. This creates a problem: once you put a dataset up for sale, its value, economically speaking, trends toward zero, because whoever buys it from you can resell it to others at a lower price. In theory, then, data trading is not viable.

After the concept of big data caught fire, many organizations came to feel that stored data was treasure, and so accumulated large amounts of fragmented data without knowing what role it could ultimately play. In working with organizations that genuinely want to do something with their data, we have found that even the most authoritative data holders, such as government agencies, suffer from a great deal of missing data, erroneous data and noise.

We often say that big data calls for big-data methods, small data calls for small-data methods, and that perfect data will never arrive. But what problem does this lead to? In actual project delivery, our data scientists have to spend a great deal of time on data cleansing, which squanders already scarce data talent.
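A minimal sketch of the cleansing work described above, using hypothetical field names and records: deduplication, dropping records with missing or implausible values, and flagging empty fields. Real pipelines are far larger, but the shape is the same.

```python
# Hypothetical raw records with the typical defects: duplicates,
# missing values, and obviously wrong entries (noise).
raw = [
    {"id": "001", "age": 34,   "city": "Beijing"},
    {"id": "001", "age": 34,   "city": "Beijing"},   # duplicate record
    {"id": "002", "age": None, "city": "Shanghai"},  # missing value
    {"id": "003", "age": 430,  "city": ""},          # implausible age, empty city
    {"id": "004", "age": 28,   "city": "Chengdu"},
]

def clean(records):
    seen, out = set(), []
    for r in records:
        if r["id"] in seen:
            continue                                  # drop duplicate ids
        seen.add(r["id"])
        if r["age"] is None or not (0 < r["age"] < 120):
            continue                                  # drop missing/implausible ages
        out.append(dict(r, city=r["city"] or "unknown"))  # mark empty fields
    return out

cleaned = clean(raw)
print(len(cleaned))  # rows surviving the cleanse
```

Even this toy version shows why cleansing dominates project time: every rule encodes a judgment (what counts as implausible? drop or impute?) that a data scientist must make per dataset.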

In theory China has a great deal of data, but data from different departments sits in different places and in different formats. Integrating data across departments within the government is already a major headache, let alone opening data up at scale. At the same time, data openness faces a serious privacy problem, and desensitization alone is far from enough; privacy is a bottomless pit. For example, given three months of a person's Alipay records, we can easily see that he bought a bottle of water at the convenience store by his door today, bought a sofa on Taobao yesterday, and makes a large payment every three months. From this we can easily infer that he has just moved to a new rented apartment, and we can map out his spending habits. The data is nominally fully desensitized, with no name and no ID number, but that in no way prevents an algorithm from sketching a complete portrait of the person.
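The inference described above can be sketched in a few lines. This is a toy illustration with invented transactions, not Alipay's data or any real record format: the records carry no name and no ID, yet a recurring large quarterly payment plus category frequencies already yield a profile.

```python
from collections import Counter
from datetime import date

# "Desensitized" records: (date, merchant_category, amount) — no name, no ID.
transactions = [
    (date(2023, 1, 5),  "rent",        9000),
    (date(2023, 4, 6),  "rent",        9000),
    (date(2023, 7, 5),  "rent",        9000),
    (date(2023, 1, 12), "convenience", 3),
    (date(2023, 2, 20), "furniture",   2200),
    (date(2023, 2, 21), "convenience", 3),
]

def profile(txns):
    """Infer habits from anonymized transactions alone."""
    by_cat = Counter(cat for _, cat, _ in txns)
    # A large payment recurring roughly every 90 days suggests quarterly rent.
    big = sorted(d for d, _, amt in txns if amt > 5000)
    gaps = [(b - a).days for a, b in zip(big, big[1:])]
    quarterly = bool(gaps) and all(80 <= g <= 100 for g in gaps)
    return {"frequent_category": by_cat.most_common(1)[0][0],
            "likely_renter_paying_quarterly": quarterly}

print(profile(transactions))
```

Stripping identifiers removes nothing essential here: the behavioral pattern itself is the quasi-identifier, which is why desensitization alone is a bottomless pit.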

2. A huge gap remains between the technology and the business

Even at this stage of the big data industry's development, there is still a huge gap between the technology and the business. First, there is data-analysis technology itself. Data-source companies trying to monetize their data have tried all sorts of approaches, even building their own analysis teams. But data analysis is technical work in which a 1% error can greatly affect market share; the industry has specialized, and monetization still requires professional data analysts.

The concept of big data is hot, more and more companies are entering the field with all kinds of products, and data modeling looks like something anyone can attempt. But today's analysis techniques, methods, models and algorithms have improved enormously and bear no resemblance to those of the 1960s and 1970s; shipping a few SaaS or RaaS products does not make a company a big data company. Such products may find a hot market in the short term, but in the long run that road leads nowhere. For the development of the big data industry, technology is the real starting point, and raising the technical barriers to entry is especially important.

Secondly, China's data has its own characteristics. In the financial industry, for example, most banks currently use risk scorecards: experts define the risk variables, scoring rests on qualitative judgment, and the scorecard is tuned by checking risk after the fact, so its early-warning capability is poor. Although the central bank's credit center and a few technically leading domestic banks use risk scoring models, the methodology is relatively dated. The FICO-style score used by the central bank, for instance, is built on the logistic regression algorithm of the 1980s, which suits linear data, whereas real problems, and credit risk assessment in particular, tend to be non-linear. Moreover, the FICO model offers no scenario breakdown for specific Chinese businesses, and its modeling logic does not fully fit China's actual conditions, leading to limited accuracy and weak risk early warning. Against this background, the PBOC Credit Center cooperated for the first time with a domestic big data company. In this cooperation, CIPLIN Technology applied leading big data modeling and analysis techniques, using algorithms such as decision trees, random forests, AdaBoost, GBDT and SVM, to accurately predict default risk through digital interpretation of, and deep insight into, credit reports, producing guidance for loan approval and post-loan management; the new model's discrimination between good and bad accounts was far above the industry average. This cooperation shows that China's big data challenges call for solutions adapted to national conditions and for local technical talent, which poses a new question to our market.

3. Scarcity of talent

The biggest advantage in China's big data development is the size of its market; the biggest disadvantage is precisely the severe shortage of corresponding talent. First, in the international market we must compete with foreign companies for talent, and the foreign big data industry is just as hot. Whether at home or abroad, competing with companies for people is difficult: even one of the world's best universities, Princeton in the United States, struggles to find mathematicians, because good people are easily poached by large firms, and every year excellent data-analytics talent is lured away by industry. So the difficulty of finding talent is not just talk but an urgent problem. Big data is cross-disciplinary, spanning statistics, management, programming and other fields, with complex knowledge points and a lack of systematic learning materials.