Common big data collection tools include Flume, Kafka, Logstash, Fluentd, and Sqoop.
1. Flume
Apache Flume is a distributed, reliable, and highly available system for efficiently collecting, aggregating, and moving large volumes of log data. Flume supports a wide variety of data sources, including Avro, Thrift, JMS, and Netcat, and can deliver data to a variety of sinks, such as HDFS, HBase, and Elasticsearch.
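A Flume agent is wired together in a properties file that names its sources, channels, and sinks. The following is a minimal sketch; the agent name `a1`, the port, and the HDFS path are placeholders, not values from this article.

```properties
# Single agent "a1": a Netcat source feeding an HDFS sink via a memory channel.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for lines of text on a local TCP port (port is illustrative).
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events into date-partitioned HDFS directories (path is illustrative).
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream

# Wire the source and sink to the channel.
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Such a file would typically be started with something like `flume-ng agent --name a1 --conf-file flume.conf`.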
2. Kafka
Apache Kafka is a distributed event streaming platform offering high throughput, low latency, and horizontal scalability. It is well suited to high-volume, real-time streaming scenarios such as log aggregation and monitoring-metrics collection.
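The basic Kafka workflow for log collection can be sketched with the command-line tools that ship with Kafka. These commands assume a broker running at `localhost:9092`; the topic name and message are illustrative.

```
# Create a topic for log events.
kafka-topics.sh --create --topic app-logs \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 1

# Produce a test message from stdin.
echo "level=INFO msg=service started" | kafka-console-producer.sh \
  --topic app-logs --bootstrap-server localhost:9092

# Consume the topic from the beginning.
kafka-console-consumer.sh --topic app-logs \
  --bootstrap-server localhost:9092 --from-beginning
```

In production, applications would use a client library (producers on the collection side, consumers on the processing side) rather than the console tools, but the topic/partition model is the same.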
3. Logstash
Logstash is a tool for collecting, filtering, and forwarding logs and events. It supports a wide range of input sources, filters, and output plug-ins, so pipelines can be adapted flexibly to different scenarios. Logstash is commonly paired with Kibana, the visualization tool in the Elastic Stack, which makes it easy to analyze and display the collected data.
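A Logstash pipeline is declared as an input/filter/output configuration. The sketch below tails an application log, parses each line with a grok pattern, and indexes the result into Elasticsearch; the file path, pattern, host, and index name are all assumptions for illustration.

```
input {
  file {
    path => "/var/log/app/app.log"
    start_position => "beginning"
  }
}

filter {
  # Parse "2024-01-01T00:00:00 INFO message text" style lines into fields.
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```

A pipeline file like this would be run with `logstash -f pipeline.conf`, after which the indexed data can be explored in Kibana.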
4. Fluentd
Fluentd is an open-source data collector that supports many data sources and output destinations. It is designed to be simple, lightweight, high-performance, and scalable, and its plug-in mechanism makes its functionality easy to extend.
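Fluentd is configured with `<source>` and `<match>` blocks in `fluent.conf`. The fragment below is a minimal sketch that tails a log file and routes matching events to stdout; the paths and the tag `app.log` are placeholders.

```
# Input: tail a log file, remembering the read position across restarts.
<source>
  @type tail
  path /var/log/app/app.log
  pos_file /var/log/fluentd/app.log.pos
  tag app.log
  <parse>
    @type none
  </parse>
</source>

# Output: print every event whose tag starts with "app." to stdout.
<match app.**>
  @type stdout
</match>
```

In a real deployment the `stdout` output would be swapped for a plug-in such as Elasticsearch or S3, which is exactly where Fluentd's plug-in mechanism comes in.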
5. Sqoop
Apache Sqoop is a tool for transferring data between Apache Hadoop and relational databases, with support for databases such as MySQL, Oracle, and PostgreSQL. Sqoop can import data from a relational database into Hadoop for analysis and processing, and export results back again.
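A typical Sqoop import runs as parallel map tasks that copy one table into HDFS. The command below is a sketch; the JDBC host, database, table, username, and target directory are all hypothetical.

```
# Import the "orders" table from MySQL into HDFS with 4 parallel map tasks.
# -P prompts for the password instead of putting it on the command line.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/sales/orders \
  --num-mappers 4
```

The `--num-mappers` setting controls how many parallel slices Sqoop splits the table into, which is the main lever for import throughput.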