A big data processing pipeline typically consists of five stages: data collection, data preprocessing, data warehousing, data analysis, and data presentation.
1. Data collection: the industry uses the term in two ways. One usage calls the creation of data in the first place (web servers printing logs, custom log-collection code, and so on) data collection; the other calls the process of moving already-generated data to a designated location with tools such as Flume data collection. A minimal custom-collection sketch follows this list.
2. Data preprocessing: a MapReduce program cleans the collected raw log data (formatting, filtering out dirty records, and so on) and organizes it into clickstream-model data; see the streaming-mapper sketch below.
3. Data warehousing: the preprocessed data is loaded into the corresponding database and tables in the Hive warehouse; see the loading sketch below.
4. Data analysis: the core of the project: ETL analysis statements are developed against the requirements to produce the various statistical results; see the query sketch below.
5. Data presentation: the analysis results are visualized, usually as charts; see the charting sketch below.
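For the first interpretation of collection ("data from scratch" plus custom log collection), here is a minimal sketch. It assumes a local Nginx-style access log at /var/log/nginx/access.log and a staging directory /data/collected (both hypothetical paths); it tails the log and appends new lines to a dated file that a tool such as Flume could later ship to HDFS. In a real deployment, Flume's exec or spooling-directory sources typically replace hand-written collectors like this.

```python
import os
import time
from datetime import date

# Hypothetical paths: a web server access log and a local staging directory
# from which Flume (or a similar tool) could later ship files onward.
SOURCE_LOG = "/var/log/nginx/access.log"
STAGING_DIR = "/data/collected"

def tail_and_collect():
    """Follow the access log and append new lines to a per-day staging file."""
    os.makedirs(STAGING_DIR, exist_ok=True)
    with open(SOURCE_LOG, "r") as src:
        src.seek(0, os.SEEK_END)          # start at the end: only collect new events
        while True:
            line = src.readline()
            if not line:
                time.sleep(0.5)           # nothing new yet; poll again shortly
                continue
            out_path = os.path.join(STAGING_DIR, f"access-{date.today()}.log")
            with open(out_path, "a") as dst:
                dst.write(line)

if __name__ == "__main__":
    tail_and_collect()
```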
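For preprocessing, when the cleaning logic is simple it can be expressed as a Hadoop Streaming mapper rather than a full Java MapReduce job. The sketch below assumes a combined-log-format input line and a couple of example cleaning rules (the field layout, regex, and rules are illustrative assumptions, not the project's actual logic); it drops malformed or failed requests and emits a tab-separated clickstream-style record.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper sketch: clean raw access-log lines into
clickstream-style records. The field layout below is an assumption."""
import re
import sys

# Assumed combined log format: ip - - [time] "METHOD url proto" status bytes ...
LOG_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\S+)'
)

for raw_line in sys.stdin:
    match = LOG_PATTERN.match(raw_line)
    if not match:
        continue                      # filter out dirty / malformed records
    ip, timestamp, method, url, status, size = match.groups()
    if status.startswith("4") or status.startswith("5"):
        continue                      # drop failed requests (example cleaning rule)
    size = "0" if size == "-" else size
    # Emit a tab-separated clickstream-style record for the reducer / Hive load.
    print("\t".join([ip, timestamp, method, url, status, size]))
```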
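Warehousing the cleaned output is then a matter of defining a Hive table and loading the preprocessed files into it. The sketch below drives the Hive CLI's -e option from Python; the weblog database, clickstream table, column names, and HDFS path are placeholders chosen for illustration.

```python
import subprocess

# Hypothetical names: adjust the database, table, and HDFS path to the project.
HQL = """
CREATE DATABASE IF NOT EXISTS weblog;
CREATE TABLE IF NOT EXISTS weblog.clickstream (
    ip STRING, ts STRING, method STRING, url STRING, status INT, bytes BIGINT
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t';

LOAD DATA INPATH '/data/preprocessed/2024-01-01'
INTO TABLE weblog.clickstream PARTITION (dt='2024-01-01');
"""

# Run the statements through the Hive CLI; hive -e executes a query string.
subprocess.run(["hive", "-e", HQL], check=True)
```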
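The analysis step reduces to HiveQL over the warehoused table. The example below computes one possible statistical result, daily page views (PV) and unique visitors (UV), reusing the hypothetical weblog.clickstream table from the previous sketch.

```python
import subprocess

# Example ETL/analysis statement: daily PV (page views) and UV (unique visitors)
# written into a results table. Names follow the earlier, hypothetical schema.
ANALYSIS_HQL = """
CREATE TABLE IF NOT EXISTS weblog.daily_traffic (dt STRING, pv BIGINT, uv BIGINT);

INSERT OVERWRITE TABLE weblog.daily_traffic
SELECT dt, COUNT(*) AS pv, COUNT(DISTINCT ip) AS uv
FROM weblog.clickstream
GROUP BY dt;
"""

subprocess.run(["hive", "-e", ANALYSIS_HQL], check=True)
```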
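Presentation usually means exporting the statistical results and charting them. The sketch below assumes the daily_traffic results have been exported to a local CSV file named daily_traffic.csv with columns dt, pv, uv (a hypothetical export) and draws a grouped bar chart with matplotlib.

```python
import csv
import matplotlib.pyplot as plt

# Hypothetical export of the analysis results: one dt,pv,uv row per day.
dates, pvs, uvs = [], [], []
with open("daily_traffic.csv", newline="") as f:
    for row in csv.DictReader(f):
        dates.append(row["dt"])
        pvs.append(int(row["pv"]))
        uvs.append(int(row["uv"]))

# Plot page views and unique visitors side by side for each day.
x = range(len(dates))
plt.bar([i - 0.2 for i in x], pvs, width=0.4, label="PV")
plt.bar([i + 0.2 for i in x], uvs, width=0.4, label="UV")
plt.xticks(list(x), dates, rotation=45)
plt.ylabel("count")
plt.legend()
plt.tight_layout()
plt.savefig("daily_traffic.png")
```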