Is Python good for big data volumes?

Python can handle big data, but it is not necessarily the best choice for it. It is well suited to ordinary data processing, less so to processing very large data volumes. At that scale you need a parallel or distributed architecture, for example running Python on Hadoop or building a distributed processing framework yourself.

Python's strength lies not in runtime efficiency but in development efficiency and maintainability. Choosing the right tool for a particular problem is itself a technical skill.
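As a rough illustration of the runtime side of that trade-off, the sketch below times the same dot product written two ways: with an explicit Python-level loop, and with the loop pushed into C-implemented built-ins. The function names and the input size are hypothetical choices for this example, and the timings depend entirely on the machine and interpreter, so no numbers are claimed here.

```python
import operator
import timeit


def dot_loop(xs, ys):
    """Dot product with an explicit loop executed by the interpreter."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_builtin(xs, ys):
    """Same result, but iteration happens inside C-implemented built-ins."""
    return sum(map(operator.mul, xs, ys))


# Illustrative input size; adjust to taste.
xs = list(range(100_000))
ys = list(range(100_000))

loop_time = timeit.timeit(lambda: dot_loop(xs, ys), number=10)
builtin_time = timeit.timeit(lambda: dot_builtin(xs, ys), number=10)
print(f"explicit loop: {loop_time:.3f}s, built-ins: {builtin_time:.3f}s")
```

Both versions are only a few lines long, which reflects the development-efficiency argument; how much time the second version saves is something to measure rather than assume.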

Python's advantages in data processing (not big data processing):

1. Development is exceptionally fast and requires little code.

2. It has rich data processing packages; regular expressions, HTML parsing, and XML parsing are all very easy to use.

3. Built-in types are very cheap to use and need no extra setup (in Java or C++, even using a map takes real effort).

4. In most companies, a very large share of data processing work does not involve very large data at all.

5. Truly huge data cannot be handled by a language alone; it needs a data processing framework (Hadoop, MPI). Although they are niche, Python does have frameworks for big data, and some existing frameworks also support Python.
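Points 3 and 5 can be sketched together. The example below is a minimal word count written in the style that Hadoop Streaming expects, where a mapper emits tab-separated key/value records and a reducer receives them sorted by key. The function names `mapper` and `reducer` and the sample input are assumptions made for illustration, and the map, sort, and reduce steps are simulated locally rather than run on a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; records must already be sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)


# Local simulation of map -> shuffle/sort -> reduce on hypothetical input.
sample = ["big data big tools", "Python data tools"]
counts = dict(reducer(sorted(mapper(sample))))
print(counts)
```

In a real streaming job the two functions would read from standard input and write to standard output in separate scripts, with the framework doing the sorting between them; the logic itself stays this short because plain strings, generators, and dicts need no extra setup.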

Extensions:

Disadvantages of Python for processing big data:

1. Python threads are subject to the GIL (global interpreter lock), so multithreaded code effectively runs on only one core and wastes a multi-core server. This is fatal in one common scenario: a huge piece of data (e.g. a big dict) has to be shared among concurrent units.

Multiple processes put pressure on memory because each one needs its own copy of the data; multithreading can share the data but gains no parallelism because of the GIL; and writing a separate process dedicated to maintaining reads and writes of the data is both inefficient and cumbersome.

2. Python's execution efficiency is not high, and this shows when processing big data. PyPy (a Python interpreter with a JIT compiler, which can be thought of as a way to speed up execution of the scripting language) can improve speed considerably, but its support for many of Python's classic C-extension packages, such as numpy, has historically been limited.

3. The vast majority of large companies use Java for big data processing, so both the ecosystem and the accumulated experience are much stronger there.
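The GIL limitation from point 1 can be observed with a small experiment. The sketch below runs the same CPU-bound task sequentially and then on four threads; `count_primes`, the workload size, and the thread count are hypothetical choices for this example. On a standard GIL-enabled CPython build the two timings are typically close to each other, but the exact figures vary by machine and interpreter, so none are stated.

```python
import threading
import time


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    """Run every task one after another on the calling thread."""
    return [count_primes(limit) for limit in limits]


def run_threaded(limits):
    """Run one thread per task; results are stored by task index."""
    results = [None] * len(limits)

    def worker(index, limit):
        results[index] = count_primes(limit)

    threads = [
        threading.Thread(target=worker, args=(i, limit))
        for i, limit in enumerate(limits)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


limits = [20_000] * 4  # illustrative workload

start = time.perf_counter()
sequential = run_sequential(limits)
sequential_time = time.perf_counter() - start

start = time.perf_counter()
threaded = run_threaded(limits)
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.3f}s, threaded: {threaded_time:.3f}s")
```

The threads do share the `results` list without any copying, which is the convenient half of the story; the missing half is parallel execution of the Python bytecode itself.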

Baidu Encyclopedia-Python