have stronger decision-making, insight discovery and process optimization capabilities.
First of all, big data is a broad concept that is now applied in many fields, such as the Internet, advertising, finance, energy, and transportation. Python is a programming language that can be used to process and analyze the data generated in all of these fields. Beginners often ask which language is the best. In fact, there is no best programming language, only the one best suited to a particular scenario. Forums are full of people shouting slogans: "PHP is the best language in the world", "Java is the best language in the universe", "Life is short, I use Python", and so on.
Objectively speaking, if you want to work in big data development, you will probably end up using both Python and Java. Learn one first: programming languages share most of their core ideas, so once you know one, you can pick up the other quickly. Don't get hung up on which language to learn. The important thing is to take action and learn one programming language first, because you will likely have to learn new ones later anyway. For example, Spark is very popular right now, and working with it may mean learning Scala.
A forum post once compared Python, Java and C++ with an analogy I find very vivid. Python is a bicycle: you can hop on and ride it right away, but its top speed is only about a hundred kilometers per hour. Java is a large transport aircraft: it is very big, and it flies faster and faster. C++ is a missile: once you press the launch button it whooshes away and can reach several times the speed of sound. In other words, Python is the easiest to get started with but has the worst performance. C++ has the highest performance, but mastering it is like controlling a missile: costly and difficult. Java sits in between and stands out, and you will find that the vast majority of frameworks in the big data ecosystem are written in Java or run on top of the JVM.
Internet companies typically work like this: if the amount of data to process is large, they first run one or more passes over it with Hadoop or Spark and save the processed results. If the amount of data is small and the task involves data mining or machine learning, they tend to use Python, because Python's machine learning libraries offer more algorithms and are more mature. Hadoop and Spark have their own machine learning libraries, such as Mahout for Hadoop and Spark MLlib, but these currently offer fewer algorithms, although they will improve over time. So whether to use Spark or Python depends on the amount of data and the complexity of the business problem.
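To make the second half of that workflow concrete, here is a minimal sketch in Python. It assumes a Spark or Hadoop job has already reduced the raw data to a small CSV of daily totals. The column names, the numbers, and the helper functions `load_rows` and `fit_line` are all made up for illustration, and the "machine learning" is just a hand-written least-squares line fit using the standard library. In practice you would more likely reach for a dedicated library.

```python
import csv
import io
from statistics import mean

# Hypothetical pre-aggregated output of a Spark or Hadoop job (made-up numbers).
AGGREGATED_CSV = """day,ad_impressions,clicks
1,1000,52
2,2000,98
3,3000,151
4,4000,205
"""


def load_rows(text):
    """Parse the saved results into (impressions, clicks) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["ad_impressions"]), float(r["clicks"])) for r in reader]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


points = load_rows(AGGREGATED_CSV)
slope, intercept = fit_line(points)
print(f"estimated clicks per impression: {slope:.4f}")
```

The point of the split is that the heavy lifting happens once on the cluster, while the small result set is cheap to explore and model interactively in Python.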
For big data processing and analysis, Python is the more broadly applicable choice, so I recommend learning Python first. Much of the technology in the big data ecosystem now works with Python, and once you know it, other languages are easy to pick up later.