What are the main problems that Hadoop solves

Hadoop implements a distributed file system designed to run on inexpensive commodity hardware. It provides high-throughput access to application data, which makes it well suited to applications with very large data sets.

Hadoop is widely used in big data processing applications thanks to its natural strengths in data extraction, transformation, and loading (ETL). Its distributed architecture places the processing engine as close to the storage as possible, which makes it a good fit for batch operations such as ETL, where the results of the batch run can be written directly back to storage.

Hadoop's MapReduce functionality splits a job into individual tasks, sends the fragments (Map) to multiple nodes, and later combines their outputs (Reduce) into a single dataset that can be loaded into a data warehouse.
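The classic word-count job illustrates this Map/Reduce split. The sketch below is a minimal example in Java, assuming the Hadoop client libraries (org.apache.hadoop:hadoop-client) are on the classpath and that input and output HDFS paths are passed on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: each node receives a split of the input and emits (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: the framework groups the pairs by word and sums the counts
  // into a single consolidated result per key.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    // args[0] = input directory in HDFS, args[1] = output directory.
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```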

Extended Information

Hadoop consists of many elements. At the bottom is HDFS, which stores files across all the storage nodes in the Hadoop cluster. Above HDFS sits the MapReduce engine, which consists of a JobTracker and TaskTrackers.

Together, HDFS (the distributed file system at the core of the Hadoop computing platform), MapReduce processing, the Hive data warehouse tool, and the HBase distributed database cover essentially the entire technical core of the Hadoop distributed platform.

To external clients, HDFS looks like a traditional hierarchical file system: files can be created, deleted, moved, renamed, and so on. The architecture of HDFS, however, is built on a specific set of node roles, which follows from its own design.
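A small sketch of these familiar file operations through the HDFS Java client API is shown below. The cluster URI and paths are assumptions for illustration only.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Connect to the cluster; the NameNode address is a hypothetical example.
    FileSystem fs = FileSystem.get(new URI("hdfs://namenode:8020"), conf);

    Path file = new Path("/tmp/example.txt");

    // Create a file and write to it, just as with a local hierarchical file system.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeUTF("hello hdfs");
    }

    // Rename (move) the file, then delete it.
    Path renamed = new Path("/tmp/example-renamed.txt");
    fs.rename(file, renamed);
    fs.delete(renamed, false);  // false = non-recursive delete

    fs.close();
  }
}
```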

These nodes include the NameNode (only one per cluster), which provides metadata services within HDFS, and the DataNodes, which provide the storage blocks for HDFS.
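The NameNode/DataNode split is visible from the client side: a metadata query returns the blocks that make up a file and the DataNodes that hold them. The sketch below shows this, assuming fs.defaultFS is set in the loaded configuration and using a hypothetical file path.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationSketch {
  public static void main(String[] args) throws Exception {
    // Uses fs.defaultFS from core-site.xml on the classpath.
    FileSystem fs = FileSystem.get(new Configuration());

    // Hypothetical file path for illustration.
    FileStatus status = fs.getFileStatus(new Path("/data/input.txt"));

    // Metadata served by the NameNode: which blocks make up the file
    // and which DataNodes store each block.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          block.getOffset(), block.getLength(), String.join(",", block.getHosts()));
    }
    fs.close();
  }
}
```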

Source: Baidu Encyclopedia - Hadoop