The legal risks of using crawler technology come mainly from the following:
(1) Violating the will of the crawled party, for example by circumventing or forcibly breaking through the anti-crawling measures set by the website;
(2) Using the crawler in a way that actually interferes with the normal operation of the visited website;
(3) Capturing specific types of information protected by law. The third risk arises mainly from circumventing anti-crawling measures to grab information that has not been published on the internet.
A: Crawling that abides by the robots protocol (robots.txt) is not illegal.
A: Append /robots.txt to the website's domain name and open the file at that address.
For example, for TikTok: /robots.txt.
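As a minimal sketch of that lookup rule, the helper below derives the robots.txt address from any page URL on a site. The function name `robots_txt_url` and the host `www.example.com` are assumptions made for illustration; they do not come from the original text.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Keep only the scheme and host; robots.txt sits at the root of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only to demonstrate the helper.
print(robots_txt_url("https://www.example.com/some/page.html?id=1"))
```

The path, query string, and fragment of the page URL are discarded on purpose, since the file is looked up per site rather than per page.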
User-agent: the robot or robots (such as "Googlebot") to which the following rules apply.
Disallow: the page that you want to prevent the robot from accessing (use multiple Disallow lines as needed).
Block the whole website: Disallow: /
Block a directory and everything in it: Disallow: /private_directory/
Block a single page: Disallow: /private_file.html
Block all pages and/or directories whose path starts with /private: Disallow: /private
Allow: pages that robots are permitted to access and that do not need to be blocked.
Noindex: pages that you want search engines to keep out of their index (or to de-index if they have been indexed before). At the time of writing this was supported by Google, but not by Yahoo or Live Search; support in other search engines is unknown.
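The directives above can be checked programmatically. The sketch below uses `urllib.robotparser` from Python's standard library; the rule set, the host `www.example.com`, and the helper name `load_rules` are hypothetical and only reuse the paths from the list above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration, reusing the paths from the list above.
# Allow is listed first because Python's parser applies the first matching rule.
RULES = """\
User-agent: *
Allow: /private_directory/public.html
Disallow: /private_directory/
Disallow: /private_file.html
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = load_rules(RULES)
BASE = "https://www.example.com"  # assumed host, not from the original text

for path in (
    "/index.html",
    "/private_file.html",
    "/private_directory/secret.html",
    "/private_directory/public.html",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", BASE + path) else "blocked"
    print(f"{path}: {verdict}")
```

A crawler would typically run a check like `can_fetch` before requesting each URL and skip any URL that is reported as blocked.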
For example, to let robots crawl everything: /tutorial/Zhang Zhan/2017/061771/
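The example in the original text is cut off, so the following is not the author's example. It is a minimal sketch based on a widely used convention: a Disallow line with an empty value lets robots access everything, while `Disallow: /` blocks the whole site. The constant names, the helper `can_crawl`, and the host are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule sets: an empty Disallow value permits everything,
# while "Disallow: /" blocks every path on the site.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "https://www.example.com/any/page.html"  # hypothetical URL
print("allow-all rules:", can_crawl(ALLOW_ALL, URL))
print("block-all rules:", can_crawl(BLOCK_ALL, URL))
```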
Reference: /article/2 172053.html