The Apache Hadoop system
The Apache Hadoop system is the open-source software most commonly associated with Big Data. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It works as a framework that allows large volumes of data to be processed across clusters of computers using simple programming models.
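The "simple programming model" most closely tied to Hadoop is MapReduce: the programmer supplies a map function and a reduce function, and the framework groups the intermediate results (the shuffle) and distributes the work. The following is a minimal single-process sketch in plain Python of the classic word-count example. It does not use Hadoop's actual API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value by its key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data needs big clusters", "Hadoop processes big data"]
    print(word_count(docs))
```

In a real cluster, many map tasks and reduce tasks would run on different nodes at once; the user-written logic would stay as small as `map_phase` and `reduce_phase` here.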
Hadoop is written in Java and allows computation tasks to be split into separate processes and distributed across the nodes of a cluster of interconnected computers so that they can work in parallel. Because it can run on thousands of ordinary computers, it also makes better financial sense: a cluster of standard, commodity servers can be used instead of a single state-of-the-art machine.
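The idea of fragmenting a job so that the pieces run at the same time can be sketched as a split, compute and combine pattern. In this hypothetical example, a list of numeric records is cut into chunks, each chunk produces a local sum and count, and the partial results are merged into a global mean. A thread pool from Python's standard library stands in for the cluster's nodes; in CPython, threads do not provide true CPU parallelism, so this illustrates only the structure of a distributed job, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_into_chunks(records: Sequence[float], num_chunks: int) -> List[Sequence[float]]:
    """Fragment the input into roughly equal, contiguous pieces (about one per worker)."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def partial_stats(chunk: Sequence[float]) -> Tuple[float, int]:
    """Work each worker does independently: a local sum and a local count."""
    return sum(chunk), len(chunk)


def parallel_mean(records: Sequence[float], workers: int = 4) -> float:
    """Compute a mean by processing chunks concurrently and combining partial results."""
    if not records:
        raise ValueError("cannot compute the mean of an empty dataset")
    chunks = split_into_chunks(records, workers)
    # Each thread plays the role of one node and handles its own chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    # Only the small partial results are combined, not the raw data.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(parallel_mean([float(x) for x in range(1, 101)], workers=4))
```

Sending back only a sum and a count per chunk, rather than the records themselves, reflects the general aim of keeping data movement between nodes small.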
Rather than relying on the hardware to ensure high availability, Apache Hadoop is designed to detect and handle failures at the application layer.
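Handling faults in software typically means that when a task fails, the framework runs it again on another machine. The sketch below assumes a hypothetical `run_with_retries` scheduler that moves a failed task to the next node in a list and gives up after a fixed number of attempts. The default of four attempts is chosen to mirror MapReduce's per-task attempt limit (`mapreduce.map.maxattempts`); everything else, including the node names and `flaky_task`, is a simplification rather than Hadoop's actual scheduler.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TaskFailed(Exception):
    """Raised when a task has failed on every attempt."""


def run_with_retries(task: Callable[[str], T], nodes: List[str], max_attempts: int = 4) -> T:
    """Run `task` on a node; if it raises, reschedule it on the next node in the list.

    Failure handling lives here, in software, instead of depending on reliable hardware.
    """
    errors: List[str] = []
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # try a different node on each attempt
        try:
            return task(node)
        except Exception as exc:  # treat any error as a failed task attempt
            errors.append(f"{node}: {exc}")
    raise TaskFailed("; ".join(errors))


def flaky_task(node: str) -> str:
    """Hypothetical task that always fails on one faulty machine."""
    if node == "node-1":
        raise RuntimeError("disk error")
    return f"done on {node}"


if __name__ == "__main__":
    print(run_with_retries(flaky_task, ["node-1", "node-2", "node-3"]))
```

The job as a whole succeeds even though one node failed, because the failure was detected and worked around by the software.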
Hadoop is a very extensive software package, which is why it is sometimes referred to as the Hadoop ecosystem. Alongside its central components (Core Hadoop), it includes a wide variety of extensions (such as Pig, Chukwa, Oozie and ZooKeeper) that add many extra functions to the framework and help it handle large datasets.
Core Hadoop is the basis of the Hadoop ecosystem. The project includes the following modules:
- Hadoop Common: the common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data.
- Hadoop YARN: a framework for job scheduling and cluster resource management.
- Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.
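HDFS stores each file as a sequence of fixed-size blocks and keeps several copies (replicas) of every block on different DataNodes, so the loss of a single machine does not mean the loss of data. The following sketch computes how a file would be split and assigns replicas round-robin. The 128 MB block size and replication factor of 3 match the defaults of `dfs.blocksize` and `dfs.replication` in Hadoop 2 and later; the round-robin placement and the function names are hypothetical simplifications of HDFS's rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default dfs.blocksize in Hadoop 2 and later
REPLICATION = 3                 # the default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes in round-robin order.

    Simplified: real HDFS also takes racks into account when choosing nodes.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        placement[block] = [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    print([size // (1024 * 1024) for size in blocks])
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Because every block lives on three different nodes, MapReduce tasks can often be scheduled on a node that already holds a copy of the data they need.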
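The first version of Hadoop comprised the basic Hadoop Common module, HDFS and a MapReduce engine that handled both data processing and cluster resource management. From Hadoop 2 onward, the resource-management and job-scheduling role of that engine was taken over by YARN (Yet Another Resource Negotiator), a technology for managing clusters of interconnected computers that was initially also called MapReduce 2.0 (MRv2); MapReduce itself now runs as one of the applications on top of YARN.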