Which is better for data analysis: Python or R?
In 2012 we said that R was the mainstream in academia, but Python is now slowly taking over R's place there. I don't know whether that is because of the arrival of the big data era.

Python is faster than R. Python can process gigabyte-scale data directly; R cannot. Before R can analyze such data, the big data first has to be reduced to small data in a database (for example with GROUP BY) and only then handed over to R. As a result, R cannot analyze behavioral details directly; it can only analyze aggregated statistics. So the saying "Python = R + SQL/Hive" is not unreasonable.
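As a rough illustration of the "R + SQL" point, the sketch below streams a file in chunks, keeps a running GROUP BY style total, and still reaches the row-level detail in the same tool. The event log, its column names, and the helper spend_per_user are made up for the example; a real multi-gigabyte file would be read from disk rather than from a string.

```python
import io

import pandas as pd

# Hypothetical event log; in practice this would be a large CSV file on disk.
RAW = """user_id,action,amount
u1,view,0
u1,buy,30
u2,view,0
u2,view,0
u3,buy,12
u1,buy,8
"""


def spend_per_user(source, chunksize=2):
    """Read the source chunk by chunk and keep a running per-user total."""
    totals = pd.Series(dtype="float64")
    for chunk in pd.read_csv(source, chunksize=chunksize):
        part = chunk.groupby("user_id")["amount"].sum()
        # Align on user_id; users missing on one side count as 0.
        totals = totals.add(part, fill_value=0)
    return totals.sort_index()


# The "SQL" half: an aggregate computed without loading everything at once.
totals = spend_per_user(io.StringIO(RAW))

# The "detail" half: individual behavior records, here every purchase by u1.
events = pd.read_csv(io.StringIO(RAW))
u1_purchases = events[(events["user_id"] == "u1") & (events["action"] == "buy")]
```

The tiny chunksize is only there to force several chunks on six rows; for real files it would be set to whatever fits comfortably in memory.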

One of Python's most obvious advantages is its role as a glue language, a point many books also make. Underlying algorithms written in C and wrapped in a Python package run very efficiently (in Orange Canvas, from Python's data mining package Orange, a decision tree analysis of 500,000 users returns results in 10 seconds!). Nothing is absolute, however: if R code is vectorized well (which is a little difficult), both the speed and the conciseness of the R program improve significantly.
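To make the "C wrapped in a Python package" idea concrete, here is a minimal sketch that uses NumPy (chosen for illustration; the text above names Orange, not NumPy). The same sum of squares is computed once with an interpreted Python loop and once with a single vectorized call that runs in compiled code. The function names are hypothetical, and no timing figures are claimed.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python version: one interpreted multiply-add per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Vectorized version: the loop runs inside NumPy's compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


data = list(range(1000))
loop_result = sum_of_squares_loop(data)
vec_result = sum_of_squares_vectorized(data)
```

Both calls return the same number; the vectorized form is also shorter, which mirrors the remark about well-vectorized R code.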

R's advantage is its wide range of ready-to-call statistical functions, especially for time series analysis: both classical and cutting-edge methods are directly available as packages.

In contrast, Python was previously quite anemic in this area. But now Python has pandas. pandas provides a standard set of tools and algorithms for processing time series. As a result, you can efficiently process very large time series, easily slice/chunk, aggregate, resample regular/irregular time series, and more. As you may have guessed, most of these tools are particularly useful for financial and economic data, but you can certainly use them to analyze server log data as well. And so, in recent years, Python has had ever-improving libraries (mainly pandas) that have made it a great alternative for data processing tasks.
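A small sketch of the time series operations mentioned above (slicing, aggregating, and resampling regular and irregular series). The hourly readings are invented for the example.

```python
import pandas as pd

# Hypothetical hourly readings over two days: 48 points with values 0..47.
index = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
series = pd.Series(range(48), index=index, dtype="float64")

# Slice: select one whole day with a partial date string.
second_day = series.loc["2024-01-02"]

# Resample a regular series: hourly values aggregated to daily means.
daily_mean = series.resample("D").mean()

# Resample an irregular series: a few unevenly spaced points, summed per day.
irregular = series.iloc[[0, 5, 30]]
irregular_daily = irregular.resample("D").sum()
```

The same calls apply unchanged to financial ticks or server log timestamps, as long as the index is a DatetimeIndex.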

I have run a few experiments:

1. I implemented a statistical method in Python, using ctypes and multiprocessing. Later, for a method-comparison project, I went back to R and found that some Bioconductor packages already run in parallel by default. (That package was still very slow, though: it suddenly used up all the threads, leaving the whole computer unusable, and even browsing web pages became very laggy.)

2. I used pandas in Python for some data wrangling, similar to database work: looking up and matching records back and forth across two or three tables. It felt very convenient. R can do these jobs too, but I would guess it is slower; after all, there were hundreds of thousands of rows.

3. Plotting with Python's matplotlib. Plotting with pyplot is very different from R: R draws one thing per command, piece by piece, while pyplot prepares everything and then renders it all at once. Color selection in pyplot is a bit awkward: there are relatively few default colors, and beyond those you can use HTML colors, but their names are too long. The pyplot legend is much better than R's because it is semi-automatic. A pyplot figure can be freely panned and zoomed after drawing and then saved as an image, which is better than R.
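Experiments 2 and 3 can be sketched together: match records across three tables with pandas, then draw the result with pyplot, let the legend place itself, and save the figure as an image. The tables, column names, and values are invented for illustration, and the Agg backend is assumed only so the sketch runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, assumed so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical tables standing in for the "two or three tables" being matched.
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "user_id": [1, 1, 3],
    "product_id": [100, 101, 100],
})
products = pd.DataFrame({"product_id": [100, 101], "price": [5.0, 7.5]})

# Database-style lookups: attach the user name and product price to each order.
joined = (
    orders.merge(users, on="user_id", how="left")
          .merge(products, on="product_id", how="left")
)
spend = joined.groupby("name")["price"].sum()

# Users that never appear in the orders table.
matched = users.merge(orders, on="user_id", how="left", indicator=True)
no_orders = matched.loc[matched["_merge"] == "left_only", "name"].tolist()

# pyplot: describe the whole figure first, then render it in one step.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(list(spend.index), spend.values, color="#1f77b4", label="total spend")
ax.axhline(spend.mean(), color="tab:orange", label="mean spend")
ax.set_xlabel("user")
ax.set_ylabel("spend")
legend = ax.legend(loc="best")  # entries come from the labels given above

# Save the finished figure as a PNG (to memory here; a file path also works).
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
plt.close(fig)
png_bytes = buffer.getvalue()
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Hex codes such as "#1f77b4" and the short "tab:" names are two ways around the long HTML color names mentioned in experiment 3.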

Overall, Python is a well-balanced language that is good at everything: whether it is calling other languages, connecting to data sources, reading data, operating the system, or doing regular expressions and text processing, Python has a clear advantage.
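As one small instance of the regular expression and text processing strength, the sketch below parses a few server log lines using only the standard library. The log lines, the pattern, and the helper parse are assumptions made for the example (a common access-log layout), not something taken from a specific system.

```python
import re
from collections import Counter

# Hypothetical access-log lines in a common web server layout.
LOG_LINES = [
    '203.0.113.5 - - [12/Mar/2024:10:01:22 +0000] "GET /index.html HTTP/1.1" 200 512',
    '203.0.113.9 - - [12/Mar/2024:10:01:25 +0000] "POST /login HTTP/1.1" 302 -',
    '203.0.113.5 - - [12/Mar/2024:10:02:01 +0000] "GET /missing HTTP/1.1" 404 128',
    'malformed line',
]

LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse(lines):
    """Return (records, skipped): parsed fields per line and a bad-line count."""
    records, skipped = [], 0
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            skipped += 1
            continue
        records.append(match.groupdict())
    return records, skipped


records, skipped = parse(LOG_LINES)
status_counts = Counter(r["status"] for r in records)
hits_per_ip = Counter(r["ip"] for r in records)
```

The resulting list of dictionaries can be passed straight to pandas for the kind of analysis described earlier.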

R, by contrast, is more prominent in statistics. But data analysis is not just statistics: it also covers data collection, data processing, data sampling, data clustering, more complex data mining algorithms, data modeling, and so on. Once the data exceeds 100 MB, R has difficulty with these tasks, whereas Python is basically up to them.

Combined with its strength in general-purpose programming, Python can serve as the one language needed to build data-centric applications.

But there is no such thing as the best software or language in the world, and few people can exploit a single language to its full potential. In particular, many people learned R early on and are now reluctant to change tools at all. For those who are willing to keep learning and applying what they learn, combining R and Python is a good approach.