What is Hadoop Distributed File System (HDFS)?

The Hadoop Distributed File System (HDFS) is a system that is designed to store very large data sets in an extremely reliable environment. The system can also stream these data sets at high bandwidth to user applications. The HDFS consists of a large cluster of servers that can host directly attached storage and also execute user application tasks. The flexible system can accommodate the growing needs of any business as it distributes storage and computation across multiple servers.

Hadoop offers a critical feature that’s very important for businesses. It can bring about partitioning of data and computation across numerous hosts. A Hadoop cluster can scale storage and computation capacity and I/O bandwidth by merely adding commodity servers.

The key features of HDFS are:

• It is best suited for distributed storage and processing.

• It provides a command interface to interact with HDFS.

• The built-in servers help users to check the status of cluster without any effort.

• You can get streaming access to file system data.

• Provides authentications and file permissions

HDFS has elements called namenode and datanode.

The namenode is the commodity hardware and includes the GNU/Linux operating system and the namenode software. It helps manage the file system name space and also regulates client’s access to files. It also executes file system operations such as renaming‚ closing and opening of files and directories.

The datanode is a commodity hardware powered by the GNU/Linux operating system and datanode software. For every node in the cluster, there will be a datanode. These nodes are used for data storage management. Datanodes can do read-write functions on the file systems and also carry out block creation‚ replication and deletion functions.

HDFS comes with mechanisms for fast and automated fault detection and recovery. When the computation occurs close to the data‚ tasks requested can be done quickly and efficiently.