Proxy IP acquisition interface: for ordinary (free) proxy IPs, use the ProxyGetter interface to fetch the latest proxy IPs from the proxy source sites. For paid proxy IPs, the provider generally supplies an extraction API, which comes with certain restrictions, such as how many IPs can be extracted per call and how many seconds must pass between extractions.
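The extraction limits described above can be enforced on the client side before any request is sent. The following is a minimal sketch, not any provider's actual API: `RateLimitedExtractor`, `fake_provider`, and the parameter names `max_per_call` and `min_interval` are hypothetical, and the real HTTP call to the provider is replaced by a stand-in function.

```python
import time
from typing import Callable, List, Optional


class RateLimitedExtractor:
    """Wraps a paid provider's extraction call and respects its limits."""

    def __init__(self, fetch: Callable[[int], List[str]], max_per_call: int,
                 min_interval: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch                # hypothetical call: count -> ["ip:port", ...]
        self.max_per_call = max_per_call  # assumed limit on IPs per extraction
        self.min_interval = min_interval  # assumed minimum seconds between extractions
        self.clock = clock                # injectable so the timing can be tested
        self._last_call: Optional[float] = None

    def extract(self, wanted: int) -> List[str]:
        """Return up to `wanted` proxies, or [] if it is too soon to call again."""
        now = self.clock()
        too_soon = (self._last_call is not None
                    and now - self._last_call < self.min_interval)
        if too_soon:
            return []
        self._last_call = now
        # Never ask for more than the provider allows in one call.
        return self.fetch(min(wanted, self.max_per_call))


def fake_provider(count: int) -> List[str]:
    # Stand-in for the real HTTP request to the provider's extraction API.
    return [f"10.0.0.{i}:8080" for i in range(1, count + 1)]
```

A caller would construct one extractor per provider, for example `RateLimitedExtractor(fake_provider, max_per_call=3, min_interval=5.0)`, and treat an empty result as "try again later" rather than as an error.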
Proxy IP database: used to store the proxy IPs obtained on the dynamic VPS. SSDB is the recommended choice. Its performance is excellent and roughly on par with Redis. Redis is memory-based, so capacity is its weakness and the cost of memory is high. SSDB addresses this weakness by storing data on disk, using Google's high-performance storage engine LevelDB, which makes it suitable for large data volumes while keeping performance close to the Redis level.
Proxy IP inspection program: proxy IPs are time-limited and become invalid once they expire, so their validity needs to be checked. Set up a scheduled inspection program that checks each proxy IP and removes invalid and high-latency IPs. When the number of IPs in the pool falls below a certain threshold, it raises a warning and obtains new IPs through the proxy IP acquisition interface.
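One inspection pass as described above might look like the following sketch. It assumes the pool is a plain dict mapping each proxy to its last measured latency; `inspect_pool`, `probe`, `fetch_new`, `max_latency`, and `min_size` are hypothetical names, and the actual network check is left to the caller-supplied `probe` function.

```python
import logging
from typing import Callable, Dict, List, Optional


def inspect_pool(pool: Dict[str, float],
                 probe: Callable[[str], Optional[float]],
                 fetch_new: Callable[[int], List[str]],
                 max_latency: float = 2.0,
                 min_size: int = 5) -> List[str]:
    """Run one inspection pass over `pool` (proxy -> latency in seconds).

    Returns the proxies removed during this pass.
    """
    removed = []
    for proxy in list(pool):        # copy the keys: the dict changes in the loop
        latency = probe(proxy)      # None means the proxy did not respond
        if latency is None or latency > max_latency:
            del pool[proxy]         # drop invalid or high-latency proxies
            removed.append(proxy)
        else:
            pool[proxy] = latency   # remember the latest measurement

    shortfall = min_size - len(pool)
    if shortfall > 0:
        logging.warning("proxy pool has %d entries, below threshold %d",
                        len(pool), min_size)
        for proxy in fetch_new(shortfall):
            pool.setdefault(proxy, 0.0)   # latency unknown until the next pass
    return removed
```

In a deployment this function would be called on a timer, for example from a loop with `time.sleep` or from a cron job, with the interval chosen to match how quickly the proxies expire.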
Proxy IP pool external interface: besides the proxy IP pool obtained from the proxy dial-up servers, an external interface also needs to be designed, through which the crawler can call up IPs from the pool. The pool's functionality is relatively simple, so Flask is sufficient to implement it. The interface can provide get/delete/refresh endpoints so that the crawler can use the pool directly.
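The get/delete/refresh interface can be sketched as follows. To keep the example free of third-party dependencies it is written as a plain WSGI callable rather than with Flask; each branch corresponds to what would be one Flask view function. The paths `/get`, `/delete`, `/refresh`, the `proxy` query parameter, and the `make_app` and `fetch_new` names are all assumptions made for illustration.

```python
import json
import random
from urllib.parse import parse_qs


def make_app(pool, fetch_new):
    """Build a WSGI app exposing /get, /delete and /refresh over `pool` (a set)."""

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        query = parse_qs(environ.get("QUERY_STRING", ""))
        status, body = "200 OK", {}

        if path == "/get":
            # Hand out one random proxy, or null when the pool is empty.
            body = {"proxy": random.choice(sorted(pool)) if pool else None}
        elif path == "/delete":
            # The crawler reports a proxy that stopped working.
            proxy = query.get("proxy", [""])[0]
            pool.discard(proxy)
            body = {"deleted": proxy, "size": len(pool)}
        elif path == "/refresh":
            # Pull a fresh batch from the acquisition side.
            pool.update(fetch_new())
            body = {"size": len(pool)}
        else:
            status, body = "404 Not Found", {"error": "unknown path"}

        payload = json.dumps(body).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(payload)))])
        return [payload]

    return app
```

For local experiments the app can be served with the standard library, for example `wsgiref.simple_server.make_server("127.0.0.1", 5010, make_app(pool, fetch_new)).serve_forever()`, after which a crawler would request `/get` before each fetch and call `/delete?proxy=...` when a proxy fails.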