Wednesday, 27 March 2019

Hadoop : Part - 3


Checkpoint node

The Checkpoint node stores its latest checkpoint in a directory that has the same structure as the NameNode's directory. It creates checkpoints of the namespace at regular intervals by downloading the fsimage and edits files from the NameNode and merging them locally. The merged image is then uploaded back to the active NameNode.

Backup node

The Backup node maintains an up-to-date, in-memory copy of the file system namespace that is always kept in sync with the active NameNode.

Overwriting replication factor in HDFS

The replication factor in HDFS can be modified or overwritten in 2 ways:
  • Using the Hadoop FS Shell, the replication factor can be changed on a per-file basis with the command: $ hadoop fs -setrep -w 2 /my/test_file (test_file is the file whose replication factor will be set to 2; -w makes the command wait until the replication completes)
  • Using the Hadoop FS Shell, the replication factor of all files under a given directory can be changed with the command: $ hadoop fs -setrep -w 5 /my/test_dir (test_dir is the directory, and all the files in it will have their replication factor set to 5)
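The two commands above differ only in the factor and the path, so a small wrapper can help avoid typos. The sketch below is only an illustration: setrep_cmd is a hypothetical helper name (not part of Hadoop), and it only prints the command instead of running it, so it can be tried on a machine without a cluster.

```shell
# Hypothetical helper (not part of Hadoop): validate the arguments and
# print the setrep command that would be run.
setrep_cmd() {
  factor="$1"
  path="$2"

  # The replication factor must be a positive integer.
  case "$factor" in
    ''|*[!0-9]*|0)
      echo "replication factor must be a positive integer" >&2
      return 1
      ;;
  esac

  # A target file or directory is required.
  if [ -z "$path" ]; then
    echo "usage: setrep_cmd FACTOR PATH" >&2
    return 1
  fi

  # -w asks the FS shell to wait until the new replication is reached.
  printf 'hadoop fs -setrep -w %s %s\n' "$factor" "$path"
}
```

For example, setrep_cmd 2 /my/test_file prints hadoop fs -setrep -w 2 /my/test_file, which can then be reviewed and run against the cluster.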

Edge nodes

Edge nodes, or gateway nodes, are the interface between the Hadoop cluster and the external network. They are used for running cluster administration tools and client applications.

InputFormats in Hadoop

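  • TextInputFormat (the default; each line is a record, with the byte offset as key and the line as value)
  • KeyValueTextInputFormat (each line is split into a key and a value at a separator, a tab by default)
  • SequenceFileInputFormat (reads Hadoop sequence files, which store binary key/value pairs)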

Rack

A rack is a collection of around 40-50 machines that are all connected to the same network switch. If that switch goes down, every machine in the rack goes out of service, and we say that the rack is down.

Rack awareness

In HDFS, the rack describes the physical location of a DataNode. The NameNode obtains the rack ID of each DataNode, and the process of selecting closer DataNodes based on this rack information is known as Rack Awareness.
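One common way for the NameNode to learn rack IDs is a topology script, configured through the net.topology.script.file.name property: Hadoop calls the script with one or more addresses and reads one rack path per address from its output. The following is a minimal sketch of such a script. The subnets and rack names are made up for illustration, and the function names are only this example's own.

```shell
# Map one address to a rack path.
# The subnets and rack names below are hypothetical.
rack_for_host() {
  case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;
    10.1.2.*) echo "/dc1/rack2" ;;
    *)        echo "/default-rack" ;;   # fallback for unknown hosts
  esac
}

# Hadoop may pass several addresses in a single call,
# so print one rack path per argument, in order.
resolve_racks() {
  for host in "$@"; do
    rack_for_host "$host"
  done
}

# When used as a topology script, resolve the script's own arguments.
resolve_racks "$@"
```

With this mapping, 10.1.2.7 resolves to /dc1/rack2, and any address outside the two listed subnets falls back to /default-rack.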

Replica Placement Policy

The contents of a file are divided into data blocks. For each data block, the client consults the NameNode, which allocates 3 DataNodes to hold its replicas (with the default replication factor of 3). Two copies of each block are stored in one rack and the third copy in a different rack, so the block survives even if a whole rack goes down. This is generally referred to as the Replica Placement Policy.
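To make the "two copies in one rack, one in another" rule concrete, here is a small sketch that picks 3 DataNodes for a single block. It assumes the ordering described in the Hadoop documentation for the default policy (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack) and that the client runs on a DataNode. Real HDFS chooses among candidates randomly and also considers node load; this sketch simply takes the first match. place_replicas and the node names are hypothetical.

```shell
# Sketch of rack-aware placement for one block, replication factor 3.
# Usage: place_replicas WRITER NODE:RACK [NODE:RACK ...]
place_replicas() {
  writer="$1"; shift
  writer_rack=""; second=""; second_rack=""; third=""

  # Replica 1: the writer's own node (assumes the client runs on a DataNode).
  for pair in "$@"; do
    if [ "${pair%%:*}" = "$writer" ]; then
      writer_rack="${pair#*:}"
    fi
  done
  if [ -z "$writer_rack" ]; then
    echo "writer $writer is not a known DataNode" >&2
    return 1
  fi

  # Replica 2: the first listed node on a different (remote) rack.
  for pair in "$@"; do
    if [ "${pair#*:}" != "$writer_rack" ]; then
      second="${pair%%:*}"
      second_rack="${pair#*:}"
      break
    fi
  done
  if [ -z "$second" ]; then
    echo "no node available on a second rack" >&2
    return 1
  fi

  # Replica 3: a different node on the same remote rack as replica 2.
  for pair in "$@"; do
    if [ "${pair#*:}" = "$second_rack" ] && [ "${pair%%:*}" != "$second" ]; then
      third="${pair%%:*}"
      break
    fi
  done
  if [ -z "$third" ]; then
    echo "remote rack needs at least two nodes" >&2
    return 1
  fi

  echo "$writer $second $third"
}
```

For example, place_replicas dn1 dn1:/rack1 dn2:/rack1 dn3:/rack2 dn4:/rack2 prints dn1 dn3 dn4: one copy on /rack1 and two copies on /rack2.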


In the middle of difficulty lies opportunity.. 
