The differences between Hadoop's three modes of operation, with a detailed explanation of their configuration

When developing on Hadoop, it is easy to confuse its modes of operation and fail to tell them apart. This causes a lot of confusion in day-to-day development, especially because each kind of cluster uses different configuration files. Keeping the modes of operation and the role of each configuration file clearly in mind makes everyday work go much more smoothly.

Hadoop is configured through XML files, and the four most common configuration files are:

The core-site.xml file configures common properties shared by all Hadoop components.

The hdfs-site.xml file configures HDFS properties.

The mapred-site.xml file configures MapReduce properties.

The yarn-site.xml file configures YARN properties.
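All four files share the same layout: a <configuration> root holding <property> elements, each with a <name> and a <value>. The minimal Python sketch below renders and parses that layout. The helper names build_site_xml and parse_site_xml are hypothetical and are not part of Hadoop. The property values in the test are only illustrations.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties: dict[str, str]) -> str:
    """Render name/value pairs in the layout used by Hadoop *-site.xml files."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print with indentation (Python 3.9+)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


def parse_site_xml(text: str) -> dict[str, str]:
    """Read the name/value pairs back out of a *-site.xml document."""
    root = ET.fromstring(text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
```

The parser is handy for quickly checking what a site file actually sets before starting a cluster.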

By default, all four configuration files live in the etc/hadoop subdirectory of the Hadoop installation directory. When building a cluster, however, we can also copy the etc/hadoop directory and its files to another location if needed. This separates the configuration files from the installation files and makes them easier to manage.

Note: If you copy the etc/hadoop directory and its files to another location, you need to set the HADOOP_CONF_DIR environment variable to point to the new directory.
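The lookup order can be sketched as follows. HADOOP_CONF_DIR wins when it is set. Otherwise the configuration is taken from etc/hadoop under the installation directory, which this sketch assumes is given by HADOOP_HOME. The function names are hypothetical, and the sketch only mirrors the rule described above. It is not Hadoop's own launcher logic.

```python
import os
from pathlib import Path

# The four site files described above.
SITE_FILES = ("core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml")


def resolve_conf_dir(env=None) -> Path:
    """Return the directory the site files are read from (hypothetical helper)."""
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR")
    if conf_dir:
        # Configuration was copied elsewhere and the variable points to it.
        return Path(conf_dir)
    home = env.get("HADOOP_HOME")  # assumption: installation directory
    if not home:
        raise RuntimeError("neither HADOOP_CONF_DIR nor HADOOP_HOME is set")
    # Default location inside the installation directory.
    return Path(home) / "etc" / "hadoop"


def site_file_paths(env=None) -> list[Path]:
    """Full paths of the four site files in the resolved directory."""
    conf_dir = resolve_conf_dir(env)
    return [conf_dir / name for name in SITE_FILES]
```

In a shell this corresponds to something like `export HADOOP_CONF_DIR=/path/to/conf` before running Hadoop commands, where the path is whatever directory you copied etc/hadoop to.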

1. Local (standalone) mode

No daemons are required, and everything runs in a single JVM. Debugging MapReduce programs in local mode is efficient and convenient, so this mode is usually used for debugging while learning or during development.
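Local mode is what Hadoop falls back to when nothing is overridden. The shipped defaults are fs.defaultFS = file:/// (the local file system) and mapreduce.framework.name = local (the local job runner). The sketch below encodes that check over a plain dict of properties. The is_local_mode helper is hypothetical and does not read real Hadoop files.

```python
# Default values that put Hadoop in local (standalone) mode.
LOCAL_MODE = {
    "fs.defaultFS": "file:///",
    "mapreduce.framework.name": "local",
}


def is_local_mode(conf: dict[str, str]) -> bool:
    """True if the given properties (missing keys = defaults) describe local mode."""
    fs = conf.get("fs.defaultFS", "file:///")
    framework = conf.get("mapreduce.framework.name", "local")
    return fs.startswith("file:") and framework == "local"
```

An empty configuration therefore already counts as local mode, which is why no extra setup is needed to start debugging.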

2. Pseudo-distributed mode

The Hadoop daemons run on the local machine and simulate a small cluster. In other words, a complete Hadoop cluster is configured on a single machine, so pseudo-distributed mode is a special case of fully distributed mode.
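The property values below are a minimal subset of those used in the pseudo-distributed section of the Apache single-node setup guide. HDFS is served from localhost, there is one replica because there is only one DataNode, and MapReduce runs on YARN. Depending on the Hadoop version, further properties may be needed, for example mapreduce.application.classpath. The render_conf and write_conf helpers are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# One dict per site file; values follow the single-node (pseudo-distributed) setup.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},  # only one DataNode exists
    "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
    "yarn-site.xml": {"yarn.nodemanager.aux-services": "mapreduce_shuffle"},
}


def render_conf(files: dict[str, dict[str, str]]) -> dict[str, str]:
    """Turn {filename: {name: value}} into {filename: XML text}."""
    rendered = {}
    for filename, props in files.items():
        root = ET.Element("configuration")
        for name, value in props.items():
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = value
        rendered[filename] = ET.tostring(root, encoding="unicode")
    return rendered


def write_conf(conf_dir, files: dict[str, dict[str, str]]) -> None:
    """Write each rendered file into conf_dir, creating the directory if needed."""
    conf_dir = Path(conf_dir)
    conf_dir.mkdir(parents=True, exist_ok=True)
    for filename, text in render_conf(files).items():
        xml = '<?xml version="1.0"?>\n' + text + "\n"
        (conf_dir / filename).write_text(xml, encoding="utf-8")
```

A typical use would be `write_conf(os.environ["HADOOP_CONF_DIR"], PSEUDO_DISTRIBUTED)`. The single-node guide then formats the file system with `hdfs namenode -format` before the daemons are started.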

3. Fully distributed mode

The Hadoop daemons run across a cluster of machines. This is the mode behind the various cloud platforms we commonly see, and it is mainly used in large-scale production environments.
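Compared with the pseudo-distributed setup, the main change is that the addresses point to a real master host instead of localhost. The replication factor can also rise to the HDFS default of 3, and the worker hosts are listed in the workers file (named slaves in Hadoop 2). In the sketch below, the hostnames master and worker1 to worker3, the port 9000, and the fully_distributed helper are all hypothetical. The replication check is this sketch's own sanity check. HDFS itself would simply leave blocks under-replicated.

```python
def fully_distributed(master: str, workers: list[str], replication: int = 3):
    """Return ({site file: {name: value}}, workers-file text) for a small cluster."""
    if not workers:
        raise ValueError("at least one worker host is required")
    if replication > len(workers):
        # With fewer DataNodes than replicas, blocks would stay under-replicated.
        raise ValueError("dfs.replication exceeds the number of DataNodes")
    files = {
        "core-site.xml": {"fs.defaultFS": f"hdfs://{master}:9000"},
        "hdfs-site.xml": {"dfs.replication": str(replication)},
        "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
        "yarn-site.xml": {
            "yarn.resourcemanager.hostname": master,
            "yarn.nodemanager.aux-services": "mapreduce_shuffle",
        },
    }
    # One worker hostname per line, as in etc/hadoop/workers.
    workers_file = "\n".join(workers) + "\n"
    return files, workers_file
```

The same files are usually copied to every node so that all daemons agree on where the NameNode and ResourceManager live.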

Note: The distributed modes rely on daemons. Before you can use a distributed Hadoop deployment, you must first start several background service processes, for example with start-dfs.sh and start-yarn.sh. Local mode does not need any of these daemons.

Note: In local mode, Hadoop uses the local file system and the local MapReduce runner. In the distributed modes, jobs use HDFS and YARN, so their daemons must be started first.
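A quick way to confirm the daemons are up is the JDK's jps tool, which lists running Java processes as "<pid> <name>". start-dfs.sh starts the NameNode, DataNode and SecondaryNameNode, and start-yarn.sh starts the ResourceManager and NodeManager. The sketch below compares jps output against what each mode needs on a single machine. The helper names are hypothetical. In fully distributed mode the daemons are spread across hosts, so one host's jps output is not enough for that mode.

```python
# Daemons started by each script on a pseudo-distributed (single-machine) setup.
START_SCRIPTS = {
    "start-dfs.sh": ("NameNode", "DataNode", "SecondaryNameNode"),
    "start-yarn.sh": ("ResourceManager", "NodeManager"),
}


def expected_daemons(mode: str) -> set[str]:
    if mode == "local":
        return set()  # single JVM, nothing to start
    if mode == "pseudo-distributed":
        return {d for daemons in START_SCRIPTS.values() for d in daemons}
    raise ValueError(f"unsupported mode for a single-host check: {mode}")


def parse_jps(output: str) -> set[str]:
    """Extract process names from jps output lines like '12345 NameNode'."""
    names = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(mode: str, jps_output: str) -> set[str]:
    return expected_daemons(mode) - parse_jps(jps_output)
```

For example, if only start-dfs.sh has been run, the check reports the YARN daemons as missing.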