For example, suppose you work as a copy editor and need to gather large numbers of manuscripts, but your efficiency is low. One of the biggest reasons is that much of your time goes into collecting material. If you keep browsing manually as before, you either stay up all night working overtime or ask other people to help you, and both options are clearly inconvenient. In situations like this, a web crawler becomes very important.
With the advent of the big data era, web crawlers will play an increasingly important role on the Internet. The Internet holds massive amounts of data, and how to automatically and efficiently obtain the information we are interested in and put it to use is an important question. Crawler technology was born to solve exactly this problem.
The information we are interested in falls into different types. If we are building a search engine, the information of interest is as many high-quality web pages as possible across the Internet. If instead we want data from a particular vertical domain, or have a specific retrieval need, then the information of interest is whatever matches that need, and we must filter out useless information along the way. The former kind of program is called a general-purpose web crawler; the latter, a focused web crawler. A minimal sketch of the focused approach follows.
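To make the distinction concrete, here is a minimal sketch of a focused crawler in Python, using only the standard library. The seed URL, the keyword list, and the page limit are all hypothetical choices for illustration. A general-purpose crawler would keep every page it fetches; the focused version below enqueues every discovered link but only stores pages whose text matches the topic keywords.

```python
# A minimal focused-crawler sketch (illustrative; seed URL and keywords
# below are hypothetical placeholders, not from any real project).
import urllib.request
from urllib.parse import urljoin
from html.parser import HTMLParser
from collections import deque


class LinkAndTextParser(HTMLParser):
    """Collects hyperlinks and visible text from a single HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl_focused(seed_url, keywords, max_pages=20):
    """Breadth-first crawl that only keeps pages relevant to `keywords`."""
    queue = deque([seed_url])
    seen = {seed_url}
    relevant = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable or malformed pages
        fetched += 1
        parser = LinkAndTextParser()
        parser.feed(html)
        text = " ".join(parser.text_parts).lower()
        # The focusing step: keep the page only if it mentions our topic.
        if any(kw in text for kw in keywords):
            relevant.append(url)
        # Enqueue newly discovered absolute links for later visits.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return relevant


if __name__ == "__main__":
    # Hypothetical seed and topic; substitute a real site and keywords.
    pages = crawl_focused("https://example.com", ["crawler", "spider"])
    print(pages)
```

Dropping the keyword test (so that every fetched page is stored) turns this same loop into the skeleton of a general-purpose crawler, which is exactly the difference between the two types described above.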