Reform of the big data chain
In recent years, big data has become a defining topic of the times, much like cloud computing. How did big data come into being? Where are the business opportunities, and where are the research opportunities? What kind of future will this concept give rise to? Yesterday I attended a small seminar at Garage Coffee where these questions were discussed, and what follows is a summary based on my own understanding.
First of all, how did big data come into being?
1) A great deal of information from the physical world has been digitized. For example, the Good Doctor network, mentioned by Mr. Liu Jiang, digitizes information about doctors and outpatient visits. There are many other cases: Sina Weibo digitizes the kind of chatter that used to happen in a teahouse (information from weak relationships), and other services digitize chats with friends (information from strong relationships). Video surveillance cameras digitize images.
2) Social networks emerged. In the Yahoo era, users mostly performed read operations, and only Yahoo's editors did any writing. In the Web 2.0 era, the number of users grew enormously and users began to submit records of their own behavior voluntarily. Then came the social era and the mobile era. With the arrival of large numbers of mobile devices, users not only submit their own behavior actively but also interact with their social circles in real time, which generates a large amount of data that spreads very quickly.
3) Data is now being kept. One guest pointed out that the San Francisco Bridge has preserved a hundred years of historical data, and that this long time span is itself a source of value. In the early days many websites paid little attention to data, because saving it was costly and storage equipment was expensive. Times have changed: storage has become cheaper, users value their own data, and the value of data in general is recognized. As a result, more and more data is being saved continuously.
Second, what is the difference between big data and large-scale data? Before the term big data appeared, academic circles called it super data. What separates big data from large-scale data? I think that in English "large" refers only to volume, while "big" also carries a sense of weight and value. So my view is as follows.
1) Big data is not, in the first place, a matter of piling up quantity; it has a strong correlation structure. Suppose a dataset records the height of every large tree in the world every year. Such data is worth very little, because it is simply piled up. If instead the data records, for each tree, its location, climate conditions, species, age, and the surrounding animal and plant ecology, then the data is structured. Structured data has great research value first, and great commercial value after that. Take Taobao's data as an example: if a transaction record contained only the buyer, seller, item, price, and similar fields, its commercial value would be very limited. Taobao's data also includes the social relationships between buyers and their behavior before and after a purchase, which makes it very valuable. Therefore only multidimensional, structured data can be called big data and has real value; otherwise it can only be called large-scale data.
2) The scale of big data must be very large, even larger than that of large-scale data. Building prediction models requires a great deal of data and training corpus. If the amount of data is not large enough, many mining tasks, such as click-through rate prediction, are difficult to do. The most straightforward example: if you know a user's long-term location history, online behavior, and read and write operations, you can predict that person's behavior quite accurately, and all kinds of recommendation work can be done with high accuracy.
Finally, where are the opportunities in big data? Where are the opportunities for small companies?
Looking at the whole industrial chain around data, I think there are the following opportunities.
1) Obtaining data. This opportunity basically belongs to large enterprises such as Sina Weibo, and the right to use large volumes of transaction data basically belongs to enterprises such as JD.COM and Taobao. Small businesses have almost no chance of obtaining this kind of user data independently.
2) Collecting data. Gathering together the data of the major vendors, the major Weibo services, and government departments, for example, is a great opportunity. At large scale, however, this work requires cooperation among the government, intermediary bodies, and enterprises. At small scale it may be done by an alliance or a non-governmental organization, such as the China Rock Climbing Alliance.
3) Storing data. Once data has been collected, the immediate problem is storage, and the cost of storage is extremely high. The original data cannot be deleted and must be kept. Companies that supply storage equipment and companies that take on the storage role therefore have huge market opportunities, but these opportunities do not belong to small companies or early-stage entrepreneurs.
4) Computing on data. Once data has been stored, how to distribute it becomes a major problem. Various APIs and open platforms deliver data for subsequent mining and analysis. This also requires large capital investment and is not suitable for small companies.
5) Mining and analyzing data. Data has to yield value-added services; otherwise it has no value, and being big does not make it any less worthless. Data analysis and mining is therefore of great value, and this opportunity belongs to small companies and small teams.
6) Using and consuming data. Once the data has been mined and analyzed well, the results need to be applied in specific settings to generate returns. Companies that do data mining and analysis must find paying customers for their results, and those customers are definitely not small companies.
The future form of big data, that is, the structure of its industrial chain, is bound to be layered and vast, with value realized at every level. Each level is an important part of the ecosystem and holds great opportunities and challenges. What we can do is work hard at the part that suits us.
Editor: Kong Weiwei