Python is well suited to working with big data.
Why use Python for big data?
According to the Wikipedia introduction to big data, two steps are needed for big data to become an information asset: acquiring the data and processing the data.
Where does the data come from?
When it comes to acquiring data, data mining is the first choice for many companies and individuals. Most of them cannot generate that much data themselves, so they have to mine relevant data from the Internet.
Web crawling is a traditional strength of Python. Popular tools such as the crawler framework Scrapy, the HTTP toolkit urllib2 (urllib.request in Python 3), the HTML parser BeautifulSoup, and the XML parser lxml are each capable libraries in their own right.
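As a rough illustration of the parsing step a crawler performs, the sketch below extracts links from a page. It uses only the standard library's html.parser as a stand-in for BeautifulSoup or lxml, so it is not how those libraries are called. The class LinkExtractor, the function extract_links, the sample HTML, and the example.com URLs are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links a crawler follows.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in html, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content; a real crawler would download this.
SAMPLE_HTML = """
<html><body>
  <a href="/news/1.html">First story</a>
  <a href="https://example.org/about">About</a>
  <a name="anchor-without-href">No link</a>
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML, "https://example.com/index.html"):
        print(link)
```

A real crawler would add each extracted link to a queue of pages to visit, which is the kind of bookkeeping a framework such as Scrapy handles for you.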
Of course, a web crawler does more than open web pages and parse HTML. An efficient crawler has to support a large number of flexible concurrent operations, and it often needs to fetch thousands or even tens of thousands of pages at the same time. The traditional thread pool approach wastes a lot of resources: once the number of threads reaches the thousands, most system resources are spent on thread scheduling.
Because Python supports coroutines, it has many concurrency libraries built on them, such as Gevent and Eventlet, as well as distributed task frameworks such as Celery. ZeroMQ, which is considered more efficient than AMQP, also offered a Python binding early on. With this support for high concurrency, web crawlers can truly reach big data scale.
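To make the coroutine idea concrete, here is a minimal sketch of fetching thousands of pages concurrently on a single thread. It uses the standard library's asyncio rather than Gevent or Eventlet, whose APIs differ, and it does not touch the network: fetch_page is a hypothetical placeholder that simulates latency with a short sleep. The names crawl and run_crawl, the concurrency limit, and the example.com URLs are likewise assumptions made for illustration.

```python
import asyncio


async def fetch_page(url, delay=0.01):
    """Hypothetical stand-in for an HTTP request; it only sleeps."""
    await asyncio.sleep(delay)  # yields control while "waiting on the network"
    return url, f"<html>content of {url}</html>"


async def crawl(urls, max_concurrency=1000):
    """Fetch all urls concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        # The semaphore caps how many fetches run at once.
        async with semaphore:
            return await fetch_page(url)

    # gather returns results in the same order as the input urls.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


def run_crawl(page_count=5000):
    """Build page_count made-up URLs and crawl them on one thread."""
    urls = [f"https://example.com/page/{i}" for i in range(page_count)]
    return asyncio.run(crawl(urls))


if __name__ == "__main__":
    pages = run_crawl()
    print(len(pages), "pages fetched")
```

Each pending fetch here is a lightweight coroutine rather than an operating system thread, so thousands of them can wait at once without the scheduling overhead of a large thread pool.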
Data processing:
Once you have big data, it still has to be processed to find the data that is useful to you. In data processing, Python is also one of the languages data scientists prefer. This is because Python is itself an engineering language: the algorithms that data scientists implement in Python can be used directly in products, which helps big data startups save costs.