Python language: Python combines R's speed of analysis and its ability to handle complex data with more pragmatic, general-purpose language features. It has quickly become mainstream, especially in recent years, and is simpler and more intuitive to use. In data processing there is usually a trade-off between scale and complexity; Python sits at a reasonable compromise between the two, which makes it a fairly good data processing tool.
Java language: Java does not offer the same visualization capabilities as Python and R, nor is it the best tool for statistical modeling. But if you need to move beyond prototyping and build a large system, Java is the most basic choice.
Hadoop and Hive: Java-based big data tools emerged to meet the need to process very large volumes of data. Hadoop is a Java-based framework for batch data processing and has been central to that development. Compared with other processing tools, Hadoop is much slower, but it is extremely accurate, is widely used for back-end database analysis, and pairs well with Hive, which provides SQL-like querying on top of Hadoop.
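To make the batch model concrete, here is a minimal sketch in plain Python of the map, shuffle, and reduce phases that Hadoop's MapReduce model is built around. It does not use Hadoop itself: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, and a real job would distribute these steps across a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the whole batch: no result exists until all input has been read."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data tools", "big data systems"]  # hypothetical input
    print(word_count(sample))
```

The point of the sketch is the shape of the computation: the result only appears once the entire input has passed through all three phases, which is one reason batch jobs trade latency for completeness.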
Scala: Another JVM-based language, similar to Java. It is a newer tool for anyone who wants to do large-scale machine learning or build advanced algorithms. Scala is expressive and is well suited to building reliable systems.
Kafka and Storm: Kafka is a very fast messaging system for real-time data, but its speed is also its drawback: when operating in real time it is prone to errors, and it sometimes misses things. Kafka itself is written in Scala and Java. Storm is another framework that has grown considerably in popularity for stream processing.
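In contrast to batch processing, a stream processor handles each record as it arrives instead of waiting for the full data set. The sketch below mimics that idea in plain Python with a generator. It does not use the Kafka or Storm APIs; `running_counts` and the sample events are hypothetical names chosen only for illustration.

```python
from collections import Counter


def running_counts(events):
    """Consume events one at a time and yield the updated count after each."""
    counts = Counter()
    for event in events:
        counts[event] += 1
        # A result is available immediately, before the stream has ended.
        yield event, counts[event]


if __name__ == "__main__":
    stream = iter(["click", "view", "click"])  # hypothetical event stream
    for event, count in running_counts(stream):
        print(event, count)
```

Because each result is produced as soon as its event is seen, latency stays low, but any event that is dropped before it reaches the consumer is simply never counted.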