Which version of Hadoop should you start learning?
Here is an overview of the major Hadoop versions and distributions:

1. Apache Hadoop 2.0, the community edition, which consists of the following modules:

Hadoop Common, a set of common utilities that supports the other Hadoop modules;

Hadoop Distributed File System (HDFS), a distributed file system providing high-throughput access to application data;

Hadoop YARN, a framework for job scheduling and cluster resource management;

Hadoop MapReduce, a YARN-based system for parallel processing of large data sets.
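To make the MapReduce module concrete, here is a minimal sketch of the map → shuffle → reduce model behind it. This is plain Python for illustration only, not actual Hadoop code (real Hadoop jobs are typically written in Java against the MapReduce API); the function names and sample input are invented for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in an input line."""
    return [(word, 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework does between mappers and reducers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the grouped counts for each word."""
    return {word: sum(values) for word, values in groups.items()}

# Toy input standing in for lines of a file stored on HDFS.
lines = ["big data on hadoop", "hadoop processes big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts["hadoop"])  # 2
```

In a real cluster, the map and reduce phases run as distributed tasks scheduled by YARN, and the shuffle moves data between nodes; the word-count logic, however, is exactly this simple.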

In addition to the community edition, Hadoop is currently available in distributions from numerous vendors.

2. Cloudera: the most mature distribution, with the most deployment cases; it provides powerful deployment, management, and monitoring tools. Cloudera developed and contributes to the Impala project, which can process big data in real time.

3. Hortonworks: the only provider of a 100% open-source Apache Hadoop distribution. Hortonworks was the first provider to use the metadata service features of Apache HCatalog, and its Stinger initiative greatly optimized the Hive project. Hortonworks also provides a very good, easy-to-use sandbox, and has developed and contributed many enhancements to the core trunk that enable Apache Hadoop to run natively on Microsoft platforms, including Windows Server and Windows Azure.

4. MapR: It takes a number of different approaches compared to its competitors, most notably support for a native UNIX filesystem rather than HDFS (using non-open-source components) in order to get better performance and ease of use: you can use native UNIX commands instead of Hadoop commands. In addition, MapR differentiates itself with high-availability features such as snapshots, mirroring, and stateful failover. The company also leads the Apache Drill project, an open-source reimplementation of Google's Dremel that executes SQL-like queries on Hadoop data to provide real-time processing.

5. Amazon Elastic MapReduce (EMR): Unlike the other providers, this is a hosted solution that runs on a web-scale infrastructure consisting of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). In addition to the Amazon distribution, it is also possible to use MapR on EMR, with temporary clusters being the primary use case. If you need to process big data on a one-time or infrequent basis, EMR may save you significant money. However, there are disadvantages: by default it includes only the Pig and Hive projects from the Hadoop ecosystem and omits many others. Also, EMR is highly optimized to work with data in S3, which has high latency and does not keep data on your compute nodes, so file I/O on EMR is much slower and has higher latency than on your own Hadoop cluster or a private EC2 cluster.