In the big data industry, the main stages of work are: big data acquisition, big data preprocessing, big data storage and management, big data analysis and mining, and big data presentation and application (big data retrieval, big data visualization, big data applications, big data security, etc.). Simply put, every one of these stages comes down to data.

I. Big Data Acquisition and Preprocessing

Big data acquisition is generally divided into two layers. The big data intelligent sensing layer mainly consists of a data sensing system, a network communication system, a sensor adaptation system, an intelligent identification system, and a software and hardware resource access system. It performs intelligent identification, positioning, tracking, access, transmission, signal conversion, monitoring, preliminary processing, and management of massive amounts of structured, semi-structured, and unstructured data. The basic support layer provides virtual servers, databases for structured, semi-structured, and unstructured data, and IoT resources.

Big data preprocessing performs preliminary identification, extraction, cleaning, and similar operations on the received data.

Common related technologies: Flume NG is a real-time log collection system that supports customizing various data senders in a logging system to collect data, and it can also perform simple processing on that data. Logstash is an open-source, server-side data processing pipeline that can ingest data from multiple sources at the same time, transform it, and then send it to a "stash" (a repository).
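The collect, transform, and send flow described for Logstash, combined with the identification, extraction, and cleaning steps of preprocessing, can be illustrated with a minimal Python sketch. This is not Flume or Logstash code: the two sources, the field names (`user`, `action`, `ms`), and the validation rules are hypothetical, and the "repository" is just an in-memory list.

```python
import json
from typing import Iterable, Iterator

# Hypothetical input sources; in practice these might be log files or queues.
web_logs = ['{"user": "u1", "action": "click", "ms": "120"}',
            '{"user": "u2", "action": "view", "ms": "abc"}',  # unusable duration
            'not json at all']                                 # malformed line
app_logs = ['{"user": "u1", "action": "CLICK", "ms": "95"}',
            '{"user": "", "action": "view", "ms": "40"}']      # missing user


def collect(*sources: Iterable[str]) -> Iterator[str]:
    """Input stage: read raw records from several sources in turn."""
    for source in sources:
        yield from source


def clean(raw: Iterable[str]) -> Iterator[dict]:
    """Preprocessing stage: parse, validate, and normalize each record."""
    for line in raw:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # extraction failed: skip unparseable lines
        if not isinstance(record, dict) or not record.get("user"):
            continue  # identification failed: no user id
        try:
            duration = int(record["ms"])
        except (KeyError, TypeError, ValueError):
            continue  # cleaning: drop records whose duration is not an integer
        yield {"user": record["user"],
               # Normalize case so "CLICK" and "click" count as the same action.
               "action": str(record.get("action", "")).lower(),
               "ms": duration}


def ship(records: Iterable[dict], store: list) -> None:
    """Output stage: send cleaned records to the repository (a list here)."""
    store.extend(records)


repository: list = []
ship(clean(collect(web_logs, app_logs)), repository)
print(repository)
```

Generators keep each stage lazy, so records stream from source to repository one at a time rather than being loaded all at once, which loosely mirrors how such pipelines handle continuous log streams.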
Sqoop is a tool for transferring data between relational databases and Hadoop: it can import data from a relational database into Hadoop, and it can export data in Hadoop back into a relational database. ZooKeeper is a distributed, open-source coordination service for distributed applications that provides data synchronization services.
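Sqoop itself is run from the command line against a real database and Hadoop cluster, so the round trip it performs is sketched here in plain Python instead. The standard-library `sqlite3` module stands in for the relational database, and a comma-delimited text string stands in for a file in HDFS. The table names and the helper functions `import_table` and `export_text` are hypothetical; this shows the idea of import and export, not Sqoop's interface.

```python
import csv
import io
import sqlite3

# A throwaway in-memory database stands in for a source such as MySQL.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "Beijing", 30.0), (2, "Shanghai", 12.5), (3, "Beijing", 7.5)])


def import_table(conn: sqlite3.Connection, table: str) -> str:
    """'Import': dump a table to comma-delimited text, one row per line.

    The table name is interpolated into SQL, so it must come from trusted code.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY id"):
        writer.writerow(row)
    return buf.getvalue()


def export_text(text: str, conn: sqlite3.Connection, table: str) -> None:
    """'Export': parse delimited text and insert the rows into a table."""
    rows = [(int(i), city, float(amount))
            for i, city, amount in csv.reader(io.StringIO(text))]
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)


hdfs_file = import_table(src, "orders")  # stand-in for a file in HDFS
print(hdfs_file)

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE orders_copy (id INTEGER PRIMARY KEY, city TEXT, amount REAL)")
export_text(hdfs_file, dst, "orders_copy")
print(dst.execute("SELECT COUNT(*), SUM(amount) FROM orders_copy").fetchone())
```

Because the text format carries no column types, the export side has to convert each field back (`int`, `float`) before inserting; a real transfer tool handles that mapping from the target table's schema.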
Mathematical knowledge

Mathematical knowledge is fundamental for data analysts. For junior data analysts, understanding the basics of descriptive statistics and having some ability to calculate with formulas is sufficient; understanding common statistical modeling algorithms is a plus. For senior data analysts, knowledge of statistical modeling is a necessary skill, and some understanding of linear algebra (mainly matrix computation) is also desirable. Data mining engineers, in addition to statistics, need to use all kinds of algorithms skillfully, so their math requirements are the highest.

Analysis tools

For junior data analysts, proficiency with Excel is a must: pivot tables and formulas must be used skillfully, and VBA is a plus. They should also learn a statistical analysis tool; SPSS is a good one to start with. For senior data analysts, using analysis tools is a core competence: VBA is basically a must, they should be skilled in at least one of SPSS, SAS, or R, and other analysis tools (e.g., Matlab) depend on the situation. For data mining engineers... well, being able to use Excel is enough; most of their work is done by writing code.

Programming language

For junior data analysts, being able to write SQL queries, plus Hadoop and Hive queries if needed, is basically enough. For senior data analysts, learning Python in addition to SQL is very necessary, because it makes obtaining and processing data far more efficient. Other programming languages are of course also an option. Data mining engineers must be familiar with Hadoop and with at least one of Python, Java, or C++, and must be able to use the Shell... In a word, programming is definitely the most essential skill of a data mining engineer.

Business understanding

It is no exaggeration to say that business understanding is the foundation of all of a data analyst's work.
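As a rough illustration of the junior-level toolkit described above under mathematical knowledge, analysis tools, and programming language (descriptive statistics, SQL queries, and pivot tables), here is a minimal sketch using only the Python standard library. The sales records are invented for illustration, and an in-memory SQLite database stands in for whatever database or Hive warehouse an analyst actually queries.

```python
import sqlite3
import statistics
from collections import defaultdict

# Hypothetical sales records: (region, month, revenue).
sales = [("North", "Jan", 120), ("North", "Feb", 150), ("South", "Jan", 90),
         ("South", "Feb", 110), ("North", "Jan", 30)]

# 1. Descriptive statistics on revenue (stdev is the sample standard deviation).
revenue = [r for _, _, r in sales]
summary = {
    "mean": statistics.mean(revenue),
    "median": statistics.median(revenue),
    "stdev": round(statistics.stdev(revenue), 2),
}
print(summary)

# 2. The same data aggregated with a SQL query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
by_region = conn.execute(
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(by_region)

# 3. A pivot table: rows = region, columns = month, cells = summed revenue.
pivot = defaultdict(lambda: defaultdict(int))
for region, month, rev in sales:
    pivot[region][month] += rev
for region in sorted(pivot):
    print(region, dict(pivot[region]))
```

The pivot step does by hand what an Excel pivot table does interactively: group by two keys and sum the values in each cell.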
The data collection plan, the choice of metrics, and even the insight behind the final conclusions all depend on the analyst's understanding of the business itself. Junior data analysts mainly extract data, make simple charts, and draw a few insights, so a basic understanding of the business is enough. Senior data analysts need a deeper understanding of the business so that they can distill useful viewpoints from the data that actually help the business. For data mining engineers, a basic understanding of the business is enough; their focus should remain on their own technical skills.

Logical thinking

I said little about this ability in my previous article, so this time I will single it out for a few words. For junior data analysts, logical thinking shows mainly in the analysis process: every step has a purpose, and they know which methods to use and which goal to reach. For senior data analysts, logical thinking shows mainly in building a complete and effective analysis framework: understanding how the objects of analysis relate to one another, and being clear about what causes each metric to change and what impact those changes will have on the business. For data mining engineers, logical thinking shows not only in business-related analysis but also in algorithmic logic, program logic, and so on, so their requirements for logical thinking are also the highest.

Data visualization

Data visualization sounds sophisticated, but it actually covers a wide range: putting data charts into a PPT also counts as data visualization, so I think this is an ability everyone needs. For junior data analysts, being able to use Excel and PPT to make basic charts and reports that show the data clearly is enough to meet the goal.
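The text names Excel and PPT as the tools for basic charts. Purely to illustrate the underlying idea, scaling values into bars so that comparisons are visible at a glance, here is a hypothetical Python sketch that renders a horizontal bar chart as plain text. The function name `text_bar_chart` and the monthly figures are invented for the example; a real report would use Excel, PPT, or a plotting library.

```python
def text_bar_chart(data: dict[str, float], width: int = 20) -> str:
    """Render a horizontal bar chart as text, scaling the largest value to `width`."""
    if not data:
        return ""
    peak = max(data.values())
    label_w = max(len(label) for label in data)  # align all bars after the labels
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value; guard against a zero peak.
        length = round(value / peak * width) if peak > 0 else 0
        lines.append(f"{label:<{label_w}} | {'#' * length} {value:g}")
    return "\n".join(lines)


monthly_orders = {"Jan": 120, "Feb": 90, "Mar": 150}
print(text_bar_chart(monthly_orders))
```

Printing the raw value next to each bar keeps the chart readable even when two bars round to similar lengths.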
Senior data analysts need to explore better visualization methods, use more effective visualization tools, and, depending on actual needs, produce visualizations, whether simple or complex, that suit their audience. Data mining engineers need to know some data visualization tools and be able to make complex charts when required, but they usually do not need to worry much about making them look polished.

Coordination and communication

For junior data analysts, understanding the business, finding data, and explaining reports all involve dealing with people from different departments, so communication skills are important. Senior data analysts start to lead projects independently or to collaborate with product teams, so besides communication skills they also need some project coordination skills. Data mining engineers communicate mostly with technical colleagues and relatively little with the business side, so their requirements for communication and coordination are relatively low.

Fast learning

Whichever direction of data analysis you pursue, at the junior or senior level, you need the ability to learn fast: business logic, industry knowledge, technical tools, analysis frameworks... The field of data analysis has endless content, so you need a mindset that never stops learning.