Limit the Storage contributed by Data Node to Name Node (Hadoop Cluster)

HDFS (Hadoop Distributed File System) Architecture

Use case: We have to restrict the amount of storage contributed by the Data Node to the Name Node.

But why?

Simply so that we can use the leftover storage for other purposes.

So, we will use the Partitioning Concept to achieve our purpose.

Pre-Requisite:

Setup Hadoop Cluster first (Name Node/Master Node and Data Node/Slave Nodes)

Steps to be followed:

  1. Create and attach a Virtual Hard Disk to the Slave Node.
  2. Create partition in the created Hard Disk.
  3. Format the created partition. (Formatting creates the inode table, which stores the metadata of the files kept on that partition, such as their names, sizes, and permissions.)
  4. Mount the created Partition with the Directory which is to be shared with the Name Node.

Create and Attach Virtual Hard Disk

Make sure your Slave Node machine is in the powered-off state.

  1. Go to the Settings of the Data Node Virtual Machine.
  2. Click (+) icon in the storage settings.

3. Click on create.

4. Click on Next

Select VDI (VirtualBox Disk Image)

5. Again, click Next

Select Dynamically Allocated

6. Click Next

Choose the highlighted Disk

7. Click on ‘Choose’.

Now, you can see one more storage device in the settings. It has been created and attached to your Data Node VM.

8. Now, Start the Data Node. Go to Terminal

9. Check if the newly created Virtual Hard Disk is attached or not using the following command:

fdisk -l

You can see here /dev/sdb of 8 GiB

10. Now, Create Partition in the Virtual Hard Disk

But why do we need to create a partition? Because you can’t use the storage device directly; you have to create a partition on it in order to use it.

Run ‘ fdisk /dev/sdb ’ command :

a). Type ‘ n ’ for New Partition

b). Type ‘ p ’ for Primary Partition (default)

c). Type Partition Number: 1 (default)

d). First Sector: 2048 (the default; with 512-byte sectors, this is a 1 MiB offset)

e). Last Sector Size: +2G

f). Type ‘ w ’ to Save this Partition

Steps to Create a Partition
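The same interactive fdisk dialogue can also be scripted. A minimal sketch, assuming /dev/sdb is the new, empty disk (double-check with `fdisk -l` first, since fdisk will destroy data if pointed at the wrong disk):

```shell
# Feed fdisk its answers non-interactively, one answer per line:
# n = new partition, p = primary, 1 = partition number,
# (blank) = default first sector (2048), +2G = size, w = write
printf 'n\np\n1\n\n+2G\nw\n' | fdisk /dev/sdb
```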

11. Run “ fdisk -l /dev/sdb ” to check the details of the partition.

12. Now, let udev finish processing the newly created partition. udev creates the device file (/dev/sdb1) when the kernel announces the new partition; `udevadm settle` waits until that work is done.

Run the following:

udevadm settle

13. Format the partition. But why do we format? Formatting creates the inode table, which holds the metadata about the files stored on the partition. It’s like an index table.

Run the following:

mkfs.ext4 /dev/sdb1

ext4 is one type of Linux filesystem; there are other types as well, such as ext3 and xfs.

14. Create a directory in the root storage. And then mount the folder with the newly created partition.

Run the following:

mkdir /datanode_partition

mount /dev/sdb1 /datanode_partition

Now the directory is mounted on the newly created partition.
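Note that a mount made this way does not survive a reboot. A hedged sketch of making it permanent via /etc/fstab, assuming the device and mount point created above:

```shell
# Persist the mount across reboots (assumes /dev/sdb1 and
# /datanode_partition from the previous steps)
echo '/dev/sdb1  /datanode_partition  ext4  defaults  0 0' >> /etc/fstab
mount -a   # re-reads fstab; an error here means the entry is wrong
```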

15. Now, go to /etc/hadoop directory and open hdfs-site.xml.

And in the value tag, add the name of the directory which you have mounted with the newly created partition.
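For reference, the relevant property is dfs.data.dir on Hadoop 1.x (the release that uses hadoop-daemon.sh as shown here); on Hadoop 2.x and later it is dfs.datanode.data.dir. A sketch of the entry, assuming the /datanode_partition directory created above:

```xml
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/datanode_partition</value>
  </property>
</configuration>
```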

16. Start the Data Node now using

hadoop-daemon.sh start datanode (Make sure Name Node is running)

17. Run

hadoop dfsadmin -report (This will show the Data Nodes connected to the Name Node, their IP addresses, the storage they contribute, etc.)

So, now you can see that 1.91 GB is shared with the Name Node
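If you only want the capacity figures rather than the full report, you can filter the output (assuming the standard report format; on newer Hadoop releases the command is `hdfs dfsadmin -report`):

```shell
# Show each Data Node's identity and its configured capacity only
hadoop dfsadmin -report | grep -E 'Name:|Configured Capacity'
```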

Summary:

So, we have contributed only limited storage to the Name Node by creating a new virtual hard disk and creating a partition on it. The directory to be shared is then mounted on that partition, and that directory is what the Data Node contributes to the Name Node.

Thank you

Hope you liked it!!

Data Science, Big Data, Cloud Computing