The Right and Wrong of Data Crawling: Technology Neutral but Malicious Crawling Frequent, Where is the Infringement Boundary?

From Sina Weibo suing Maimai for improperly obtaining user data, to the data battle between LinkedIn and hiQ Labs, a steady stream of judicial cases has pushed the issue of data crawling into the spotlight.

On October 23rd, the third installment of the Yangtze River Delta Data Compliance Forum, a seminar on the legal regulation of data crawlers, was held in Shanghai. Legal experts, judicial officials and business representatives discussed the impact of crawler technology on the digital industry, as well as the legal boundaries of, and regulations on, crawling other parties' data.

In the era of big data, as the value of data becomes ever more apparent, data crawlers are being applied ever more widely. Several experts at the meeting noted that crawler technology itself is neutral, but it is almost always applied with a purpose, so one must consider whether both the crawling behavior and the subsequent use of the data are justified.

"Fierce" web crawlers increase the burden of website operations

From a technical perspective, a crawler is a program that simulates a person browsing the web or using an app in order to capture web information efficiently. Not everyone welcomes this technology.

Liu Yuchen, head of digitalization at L'Oréal China, said at the seminar that most websites reject crawler access, both to protect business interests and to safeguard their own operational security. Automated, continuous, high-frequency access by crawlers can cause a web server's load to soar, leaving some small and medium-sized platforms at risk of their websites failing to open, pages loading slowly, or even outright paralysis. As a result, "website operators often suffer when faced with 'ferocious' web crawlers."
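The server-load problem Liu describes is exactly what a well-behaved crawler avoids by throttling its own requests. As a minimal sketch in Python (the class name, interval, and the commented `fetch` function are illustrative, not from any particular crawler framework), a per-host rate limiter could look like this:

```python
import time

class Throttle:
    """Enforce a minimum delay between successive requests,
    so a crawler does not flood a site's server."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval  # seconds between requests
        self.last_request = 0.0           # monotonic timestamp of last request

    def wait(self) -> float:
        """Sleep if the last request was too recent; return seconds slept."""
        now = time.monotonic()
        remaining = self.min_interval - (now - self.last_request)
        slept = 0.0
        if remaining > 0:
            time.sleep(remaining)
            slept = remaining
        self.last_request = time.monotonic()
        return slept

throttle = Throttle(min_interval=1.0)
# A crawl loop would pause before every request, e.g.:
# for url in urls:
#     throttle.wait()   # keeps the rate at most one request per second
#     fetch(url)        # hypothetical fetch function
```

Even a one-second interval like this keeps a single crawler well below the request rates that strain small sites; the "ferocious" crawlers complained about at the seminar are precisely those that skip such pauses.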

While websites can adopt strategies or technical means to prevent their data from being crawled, crawlers have countermeasures of their own, known as anti-anti-crawler techniques. According to Liu Yuchen, crawling and anti-crawling technologies iterate against each other; whether a site can be crawled is rarely the question, and what matters is whether the crawler is willing to do it and how difficult it is. Generally, the larger the company behind an app or website, the more anti-crawling mechanisms it deploys and the harder it is to crawl.

Zeng Xiang, head of legal affairs at Little Red Book, observed that malicious crawler cases most often occur on content platforms and e-commerce platforms. On content platforms, the data most frequently crawled includes videos, pictures, text and user-behavior data; in e-commerce, it is merchant and commodity information.

"Generally speaking, content platforms stipulate that the intellectual property rights in the relevant content belong to the publisher, or jointly to the publisher and the platform. Crawling without consent is therefore suspected of infringing intellectual property rights," Zeng Xiang said. Platforms invest in stimulating creators' creativity; if someone uses crawler technology to easily obtain that content and then plagiarize or adapt it, the platform's interests are damaged.

When it comes to web crawlers, the Robots protocol is an unavoidable topic. Formally known as the Robots Exclusion Protocol, it lets a website explicitly declare which of its pages may be crawled by search engines and which may not. The protocol is also known as the "gentleman's agreement" of the search field.
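In practice, the protocol is a plain-text `robots.txt` file at a site's root, and checking it before crawling is straightforward; Python's standard library ships a parser for it. A minimal sketch, using a hypothetical `robots.txt` and a hypothetical user-agent name for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the parsed robots rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)

# A "gentleman" crawler consults the rules before every request:
# may_fetch("https://example.com/index.html")  -> True  (no rule matches)
# may_fetch("https://example.com/private/a")   -> False (under /private/)
```

Note that nothing in the protocol technically prevents fetching a disallowed page; whether a crawler honors the sign on the door is exactly the compliance question the seminar participants discuss below.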

Judge Xu Hongtao of the Intellectual Property Division of the Shanghai Pudong Court described it this way: a crawler is a visitor, and the Robots protocol is a "do not enter" sign hanging on the door of a house. A gentleman who approaches the door and sees the sign will stop, but a wrongdoer may still break in.

Combing through the relevant case law, Xu Hongtao pointed out that the Robots protocol is a rule generally followed by the Internet industry; if a search engine captures a site's content in violation of its Robots protocol, this may be found to violate business ethics and constitute unfair competition. However, the Robots protocol only resolves the antecedent question of whether the crawling behavior itself is proper; it does not answer whether the data is used properly after being crawled.

He further analyzed that in their decisions courts tend to treat crawler technology as neutral and to respect the Robots protocol settings chosen by websites. If a crawler forcibly crawls in violation of the Robots protocol, that may weigh negatively in the judgment of legitimacy. That said, while the Robots protocol bears on the legitimacy of the behavior, it is not the only criterion: even crawling that complies with the Robots protocol may be judged improper because of how the data is later used.

It is worth noting that parties defending their crawling behavior often argue that Robots protocol restrictions on crawling impede the free flow of data.

According to Xu Hongtao, in the context of "interconnection", "order" and "flow" are equally important. This involves judging the appropriate degree of "interconnection" and data sharing, and considering whether the Robots protocol strategies adopted by various Internet operators might give rise to data silos.

To determine the legitimacy of crawler behavior, multiple factors need to be considered

At the seminar, Zhang Yong, a professor at the East China University of Political Science and Law, categorized the harmful behaviors of data crawlers.

He said that, viewed by data type, data crawling may infringe rights and interests in computer system security, personal information, copyright, state secrets, trade secrets and the order of market competition. Viewed by crawling method, it may endanger the security of computer information systems, illegally obtain citizens' personal information, illegally obtain trade secrets, or circumvent technical measures protecting copyright. Viewed by crawling result, it raises problems of unfair competition, copyright infringement and violation of personality rights.

As data becomes a factor of production, data crawling technology is being applied more and more widely, and disputes and controversies are multiplying. Existing case law offers some answers on how to judge the legitimacy of crawler behavior.

On September 14th of this year, the Hangzhou Internet Court announced its decision in an unfair competition case involving the crawling of WeChat public platform data; the defendant was ordered to stop the crawling and to compensate WeChat for losses of 600,000 yuan.

The court found that the defendant had violated the principle of good faith by using, without authorization, data that the plaintiff had lawfully aggregated with users' consent and that carried commercial value. The crawled data was sufficient to substantially replace some of the products or services provided by other operators, damaging the market order of fair competition and constituting unfair competition.

In this case, the court also analyzed the legitimacy of the crawling behavior from the perspective of the "superposition of ternary goals".

Xu Hongtao mentioned, for example, that the legitimacy of non-search-engine crawlers depends on whether the defendant respects the Robots protocol of the crawled website, whether it circumvents the crawled website's technical measures, whether it adequately protects the security of user data, and how it weighs creativity against the public interest.

He noted that if data is crawled at the cost of jeopardizing user data security, and the crawler creates no new, higher-quality resources but merely burdens others' servers, the behavior is likely to be evaluated negatively in terms of legitimacy.