At present, the commercialization of artificial intelligence has reached a relatively mature stage in terms of computing power, algorithms and technology. To be deployed in real scenarios and solve industry-specific pain points, AI needs large amounts of relevant labeled and processed data to support algorithm training; it can fairly be said that data determines how far AI can be put into practice. China's AI industry is currently developing well, and the data labeling industry, which is closely tied to AI, has grown rapidly along with it.
Data determines how far AI can be deployed, and basic data services are an important part of the commercialization process
The AI industry chain has three layers: the foundation layer, the technology layer and the application layer. The foundation layer is the basis of the AI industry; the technology layer is its core; and the application layer consists of hardware and software products or solutions built to meet the needs of specific application scenarios.
AI basic data services are services such as data collection, cleaning, information extraction and labeling provided for AI algorithm training and optimization, with collection and labeling as the main focus. Data labeling in particular supplies AI companies with large volumes of labeled data for machine training and learning, which underpins the effectiveness of algorithm models.
AI companies and technology companies account for the main share of demand, and the three major stages of AI application generate differentiated demand for data labeling services
On the demand side, AI data labeling customers fall into four categories: AI companies, technology companies, scientific research institutes and industry enterprises. AI companies and technology companies account for the main share. AI companies focus more on specific types of basic data services such as vision and speech, while technology companies build their overall AI strength on the strengths of their groups, so their different departments generate demand for multiple types of data. Scientific research institutions account for a relatively small share of demand.
In addition, traditional industry enterprises such as automobile manufacturers, mobile phone brands and security equipment manufacturers have begun to generate demand for AI basic data as they extend technology around their own businesses. This demand is gradually growing in volume and is expected to open up more market space in the future.
Viewed by stage, enterprises applying AI algorithms go through three stages: research and development, training, and deployment, and each stage has differentiated needs for data labeling services.
R&D demand is the data needed to develop new algorithms and extend existing ones. Volumes are generally large: in the early stage, standard dataset products are used for training, while the middle and late stages require professional, customized data collection and labeling services;
Training demand is labeled data used to optimize the accuracy and other capabilities of existing algorithms. It is the main demand in the market, focuses on customized services, and comes with higher accuracy requirements for the algorithm;
Deployment demand comes from business needs in deployment scenarios, where algorithms are more mature and the data collected and labeled must closely match the specific business, such as paint-identification data for aircraft maintenance. These customers place strong requirements on labeling capability and on the supplier's service awareness, including proactively proposing optimization suggestions.
China's AI industry approaches 200 billion yuan in scale, and technology companies' investment in AI algorithm R&D is expected to exceed 37 billion yuan
In July 2017, the State Council issued the "New Generation Artificial Intelligence Development Plan", raising artificial intelligence to the level of national strategy. Benefiting from strong national policy support, as well as the drive of capital and talent, China's AI industry is among the world leaders. According to Sullivan's statistical forecast, the market size of China's artificial intelligence industry in 2020 would be about 185.82 billion yuan.
In 2019, Chinese technology companies invested about 400.5 billion yuan in technology research and development, of which AI algorithm R&D accounted for 9.3%, or more than 37 billion yuan, most of it from Internet technology companies. The main AI algorithm application areas, computer vision, speech recognition/speech synthesis and natural language processing, accounted for 22.5%, 2.3% and 7.1% respectively. Computer vision therefore took the largest share of algorithm R&D investment among the three, which is positively correlated with the number of vision-related startups, industrial demand and policy orientation. Computer vision remains the most representative AI application technology in China.
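As a quick consistency check on the figures above (arithmetic only, using the 400.5 billion yuan total and the 9.3% share stated in the text), the AI algorithm R&D investment works out as:

```latex
400.5 \times 9.3\% = 400.5 \times 0.093 \approx 37.25 \ \text{billion yuan}
```

This agrees with the "more than 37 billion yuan" figure quoted for 2019.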
AI drives rapid growth of the data labeling industry, mainly in image and speech data
As mentioned earlier, China's AI industry is in full swing: deployment has accelerated greatly and application scenarios are becoming more widespread. As the upstream of the AI industry, the data labeling industry has seen explosive growth in just a few years. According to iResearch data, the market size of the data labeling industry was 3.09 billion yuan in 2019 and exceeded 3.6 billion yuan in 2020, and it is expected to exceed 10 billion yuan in 2025, indicating that China's data labeling industry is in a stage of high-speed development.
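The iResearch figures above imply growth rates that can be checked by hand. The sketch below treats 3.09, 3.6 and 10 billion yuan as point values; since the 2020 and 2025 figures are stated as lower bounds ("exceeds"), the resulting rates are approximations derived here, not figures published by iResearch:

```latex
\frac{3.6}{3.09} - 1 \approx 16.5\% \quad \text{(2019 to 2020)}

\text{CAGR}_{2020\text{-}2025} \approx \left(\frac{10}{3.6}\right)^{1/5} - 1 \approx 22.7\%
```

A compound annual growth rate of roughly 23% over five years is what turns a market of about 3.6 billion yuan into one of about 10 billion yuan.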
By data type, China's AI data annotation market is dominated by annotation services for speech, images and NLP. As the R&D investment figures in the previous section show, computer vision and speech recognition/speech synthesis are the main R&D fields, so image and speech labeling account for most of the demand. In 2019, image, speech and NLP data accounted for 49.7%, 39.1% and 11.2% of demand, respectively.
First-tier and new first-tier cities show strong data labeling demand, with Beijing ranked first
In terms of the regional distribution of companies with data labeling demand, as of December 2020 the top five cities by number of data labeling enterprises were Beijing, Shanghai, Chengdu, Shenzhen and Hangzhou, with 185, 84, 68, 63 and 46 enterprises respectively. Compared with April 2020, the number of enterprises increased in Beijing, Shanghai, Chengdu and Shenzhen and decreased in Hangzhou.
By type, most companies have multiple needs, for example different kinds of speech for audio labeling and different labeling methods for images. Among companies with data labeling needs, Beijing is far ahead, accounting for about 30% of national demand, followed by Shanghai, Shenzhen, Hangzhou and Guangzhou. The proportion of each type of annotation in the top cities is as follows:
Customized demand has become mainstream, and the data service market is entering a phase of normalized demand
Training deep learning algorithms under supervised learning relies heavily on manually annotated data. In recent years, the AI industry has kept optimizing its algorithms by increasing the number of layers in deep neural networks and by training on large datasets to improve accuracy; ImageNet, which open-sourced more than 14 million training images in more than 1,000 categories, played an important role in this. The drive to keep improving algorithm accuracy and maintain a market edge generated a large demand for labeled data.
Today, after years of refinement, the algorithm models of AI companies have reached a relatively mature stage, and as the AI industry commercializes, demand for more forward-looking dataset products and highly customized data services has become mainstream.
It is understood that a newly developed computer vision algorithm currently requires tens of thousands to hundreds of thousands of labeled images for training; developing a new feature requires nearly 10,000 images; routine optimization of an algorithm still needs thousands of images; and an algorithm used in smart city applications has a stable demand for hundreds of thousands of images every year. In speech, the leading companies have cumulatively used more than one million hours of labeled data, and their annual demand is still growing at 20%-30%.
Moreover, with the spread of IoT devices, voice interaction scenarios are becoming richer, new scenarios and new demand parties appear every year, and the demand for labeled data keeps growing. Overall, as AI commercialization advances, demand for AI data annotation services is becoming normalized: the existing market has more stable sources of demand, while the incremental market has broader room for growth as application scenarios multiply and new types of algorithms emerge.
For more data, please refer to the Prospect Industry Research Institute's China Data Labeling Industry Market Outlook and Investment Strategy Planning Analysis Report.