How to optimize the running efficiency of MapReduce jobs

The optimization of MapReduce program mainly focuses on two aspects: one is the optimization of computing performance; the other is the optimization of IO operations.

In practice this comes down to the following aspects:

1. Task scheduling

a. Try to select idle nodes for computation

b. Try to assign the task to the machine where the InputSplit is located
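The two scheduling preferences above can be sketched in a few lines. This is only an illustration, not Hadoop's actual scheduler: the function `pick_node`, the host names, and the `free_slots` bookkeeping are all hypothetical.

```python
from typing import Dict, List, Optional


def pick_node(split_hosts: List[str], free_slots: Dict[str, int]) -> Optional[str]:
    """Pick a node for a map task: an idle data-local node first, else the idlest node."""
    # Prefer nodes that already hold the InputSplit and still have a free slot.
    local = [h for h in split_hosts if free_slots.get(h, 0) > 0]
    if local:
        return max(local, key=lambda h: free_slots[h])
    # Otherwise fall back to whichever node has the most free slots.
    idle = [h for h, n in free_slots.items() if n > 0]
    if idle:
        return max(idle, key=lambda h: free_slots[h])
    return None  # the cluster is full, so the task has to wait


free = {"node1": 0, "node2": 2, "node3": 4}
print(pick_node(["node1", "node2"], free))  # node2 (local and idle)
print(pick_node(["node1"], free))           # node3 (no idle local node)
```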

2. Data preprocessing and InputSplit size

Try to process a small number of large files rather than a large number of small files. To that end, you can preprocess the data once before the job and merge the small files.

If you would rather not merge the data yourself, you can use the CombineFileInputFormat class, which packs several small files into one InputSplit. Please refer to the Hadoop API documentation for specific usage.
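To show why merging helps, here is a minimal sketch of packing small files into fewer splits. It is a simplified greedy grouping, not the real CombineFileInputFormat algorithm; `pack_files`, the file names, and the sizes are made up for the example.

```python
from typing import List, Tuple


def pack_files(files: List[Tuple[str, int]], max_split_bytes: int) -> List[List[str]]:
    """Greedily group (name, size) pairs into splits of at most max_split_bytes."""
    splits: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Start a new split when adding this file would exceed the limit.
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


MB = 1024 * 1024
small_files = [(f"part-{i}", 10 * MB) for i in range(10)]
print(len(pack_files(small_files, 128 * MB)))  # 1 split instead of 10
print(len(pack_files(small_files, 32 * MB)))   # 4 splits (3 + 3 + 3 + 1 files)
```

Each split becomes one map task, so ten 10 MB files cost one task here rather than ten.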

3. Number of Map and Reduce tasks

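Choose the number of Map tasks with the running time of each map in mind: if every map finishes within a few seconds, task startup overhead dominates and fewer, larger maps are better. The number of Reduce tasks is set relative to the cluster's task slots (nodes multiplied by slots per node), usually 0.95 or 1.75 times that total. With 0.95, all reduces can start as soon as the maps finish; with 1.75, the faster nodes finish a first round of reduces and pick up a second, which balances the load better.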
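As a rough illustration of the 0.95 and 1.75 multipliers, the arithmetic looks like this. The helper name `suggested_reducers`, the cluster size, and rounding to the nearest integer are assumptions for the example.

```python
def suggested_reducers(nodes: int, slots_per_node: int, factor: float = 0.95) -> int:
    """Scale the cluster's total slot count by the chosen factor (at least 1)."""
    total_slots = nodes * slots_per_node
    return max(1, round(total_slots * factor))


# Hypothetical cluster: 10 nodes with 4 slots each, 40 slots in total.
print(suggested_reducers(10, 4, 0.95))  # 38, a single wave of reduces
print(suggested_reducers(10, 4, 1.75))  # 70, roughly two waves
```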

4. Use a combiner

A combiner merges the map output locally before it is sent to the reducers, which can greatly reduce network traffic. Please refer to the Hadoop documentation for details.
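The effect of local merging can be seen in a small word-count simulation. This is plain Python standing in for a map task, not Hadoop code; `map_words` and `combine` are illustrative names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def map_words(line: str) -> List[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Combine step: sum the counts per word on the map side."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


raw = map_words("a b a c a b")
combined = combine(raw)
print(len(raw), len(combined))  # 6 3
print(combined)                 # [('a', 3), ('b', 2), ('c', 1)]
```

Only three records cross the network instead of six. This works because summing is associative and commutative, so applying it early does not change the final result.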

5. Compression

You can compress intermediate data, such as the map output, to reduce network consumption and disk IO.
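A quick way to get a feel for the savings is to compress a sample of intermediate records. The sketch below uses Python's zlib as a stand-in codec; the record layout and the helper `compress_map_output` are assumptions. In Hadoop itself, map output compression is switched on through job configuration (for example the `mapreduce.map.output.compress` property in Hadoop 2 and later).

```python
import zlib
from typing import List, Tuple


def compress_map_output(records: List[Tuple[str, int]]) -> Tuple[bytes, bytes]:
    """Serialize records as tab-separated lines and return (raw, compressed) bytes."""
    payload = "\n".join(f"{key}\t{value}" for key, value in records).encode("utf-8")
    return payload, zlib.compress(payload)


# Repetitive keys, as is typical of word-count style map output.
records = [(f"word{i % 50}", 1) for i in range(10000)]
raw, packed = compress_map_output(records)
print(len(raw), len(packed))  # the compressed size is much smaller
assert zlib.decompress(packed) == raw  # the data survives the round trip
```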

6. Custom comparator

You can define your own data types and comparators to control how keys are sorted and grouped, which makes more complex processing possible.
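As one example of what a custom ordering can do, the sketch below sorts records by key ascending and then by value descending, similar in spirit to a secondary sort. It uses Python's `functools.cmp_to_key` rather than Hadoop's comparator classes, and `compare_records` is a hypothetical name.

```python
from functools import cmp_to_key
from typing import Tuple


def compare_records(a: Tuple[str, int], b: Tuple[str, int]) -> int:
    """Order by key ascending, then by value descending."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1
    if a[1] != b[1]:
        return -1 if a[1] > b[1] else 1
    return 0


records = [("b", 1), ("a", 2), ("a", 9), ("b", 7)]
print(sorted(records, key=cmp_to_key(compare_records)))
# [('a', 9), ('a', 2), ('b', 7), ('b', 1)]
```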
