The Hadoop ecosystem framework includes the following major components; components other than these are not considered part of the Hadoop ecosystem.
1) HDFS: a distributed file system that provides highly available access to application data (a minimal Java access sketch appears after this list).
2) MapReduce: a programming model for processing large data sets in parallel (a word-count sketch in Java follows the list).
3) HBase: a scalable, distributed database that supports structured data storage for large tables. It is a column-oriented NoSQL database built on top of HDFS for reading and writing large amounts of data quickly (see the client-API sketch after this list).
4) Hive: a data warehouse infrastructure built on Hadoop. It provides a range of tools for data Extract, Transform, Load (ETL) and a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language, called HQL, which allows developers unfamiliar with MapReduce to write data queries that are then translated into MapReduce jobs on Hadoop (see the JDBC sketch after this list).
5) Mahout: an extensible machine learning and data mining library. It provides MapReduce-based implementations of many algorithms, including clustering, regression, and statistical modeling.
6) Pig: a high-level data flow language and execution framework for parallel computation that abstracts away the complexity of MapReduce programming. The Pig platform includes a runtime environment and a scripting language, Pig Latin, for analyzing Hadoop datasets; its compiler translates Pig Latin scripts into sequences of MapReduce programs.
7) ZooKeeper: a high-performance coordination service for distributed applications. It provides consistency services for distributed applications, such as configuration maintenance, naming services, distributed synchronization, and group services (see the client sketch after this list).
8) Ambari: a Web-based tool for provisioning, managing, and monitoring Hadoop clusters, with support for HDFS, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also provides a visual dashboard for viewing cluster health and lets users inspect MapReduce, Pig, and Hive applications to diagnose their performance characteristics.
9) Sqoop: a connectivity tool for moving data between relational databases, data warehouses, and Hadoop. Sqoop uses the database schema to describe the data being imported or exported, and relies on MapReduce for parallel operation and fault tolerance.
10) Flume: a distributed, reliable, and efficient service for collecting and aggregating big data and transferring large amounts of data from individual machines into HDFS. It is based on a simple, flexible architecture for streaming data and uses a simple, extensible data model to move data from many machines across an enterprise into Hadoop.
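The following is a minimal Java sketch of writing and then reading a file on HDFS through Hadoop's FileSystem API. The NameNode address (hdfs://namenode:9000) and the path /user/demo/hello.txt are assumptions made for illustration; in a real cluster the address comes from core-site.xml.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally taken from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/demo/hello.txt");

            // Write a small file; HDFS replicates its blocks across DataNodes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }

            // Read the file back.
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }
        }
    }
}
```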
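As an illustration of the MapReduce programming model, the sketch below is the classic word-count job written against Hadoop's Java MapReduce API: the mapper emits (word, 1) pairs and the reducer sums the counts for each word. Input and output HDFS directories are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```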
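The next sketch performs a single write and read through the HBase Java client API. The ZooKeeper quorum address, the table name "users", and the column family "info" are assumptions; the table is assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical ZooKeeper quorum that the HBase client uses to locate the cluster.
        conf.set("hbase.zookeeper.quorum", "zk-host");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             // Assumes a table "users" with a column family "info" already exists.
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Write one cell: row key "row1", column info:name, value "alice".
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            // Read the cell back by row key.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        }
    }
}
```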
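The sketch below queries Hive through its HiveServer2 JDBC interface; the server address, the "visits" table, and its columns are hypothetical. The caller only writes SQL-like HQL text, and Hive compiles the statement into MapReduce jobs behind the scenes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (the hive-jdbc jar must be on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Hypothetical HiveServer2 JDBC URL: host, port, and database are assumptions.
        String url = "jdbc:hive2://hiveserver:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             // An HQL query; Hive translates it into one or more MapReduce jobs.
             ResultSet rs = stmt.executeQuery(
                     "SELECT country, COUNT(*) AS cnt FROM visits GROUP BY country")) {
            while (rs.next()) {
                System.out.println(rs.getString("country") + "\t" + rs.getLong("cnt"));
            }
        }
    }
}
```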
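Finally, a small sketch of the ZooKeeper Java client for the configuration-maintenance use case mentioned above: one process stores a shared setting under a znode, and any other process in the cluster can read the same, consistent value. The ensemble address zk-host:2181 and the znode name are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkConfigExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);

        // Connect to a (hypothetical) ZooKeeper ensemble; the watcher fires once connected.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 5000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Store a shared configuration value under a single znode (name is hypothetical).
        String path = "/app-config";
        if (zk.exists(path, false) == null) {
            zk.create(path, "batch.size=500".getBytes(StandardCharsets.UTF_8),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Any node in the cluster can read back the same, consistent value.
        byte[] data = zk.getData(path, false, null);
        System.out.println(new String(data, StandardCharsets.UTF_8));

        zk.close();
    }
}
```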