Make the Hadoop Cluster Elastic


🔅Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

Have you ever wondered how we can dynamically increase the storage that a DataNode contributes to the NameNode?

This is a critical use case in industry. Sometimes the storage contributed by a DataNode gets full, but we still have to keep the DataNode ON so that its data stays available.

So, in this practical, we will instead increase the contributed storage dynamically. The best part is that the DataNode does not even need to be stopped.

The concept used to solve this use case is Logical Volume Management (LVM).

LVM is a tool for logical volume management, which includes allocating disks, striping, mirroring, and resizing logical volumes. With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes. LVM physical volumes can be placed on other block devices, which might span two or more disks.

Some Terminology:

1. Physical Volume:

Physical block devices or other disk-like devices (for example, other devices created by device mapper, like RAID arrays) are used by LVM as the raw building material for higher levels of abstraction. Physical volumes are regular storage devices. LVM writes a header to the device to allocate it for management.

2. Volume Group:

LVM combines physical volumes into storage pools known as volume groups. Volume groups abstract the characteristics of the underlying devices and function as a unified logical device with combined storage capacity of the component physical volumes.

3. Logical Volume:

A volume group can be sliced up into any number of logical volumes. Logical volumes are functionally equivalent to partitions on a physical disk, but with much more flexibility. Logical volumes are the primary component that users and applications will interact with.


1. First, attach any number of hard disks to the DataNode. All of them will be combined into one Logical Volume.

2. Check all the available disks using:

fdisk -l

3. Create a physical volume on each of the disks. All of them will later be combined into one Logical Volume.

pvcreate /dev/sdb

pvdisplay /dev/sdb

pvcreate /dev/sdc

pvdisplay /dev/sdc
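Since step 3 repeats the same two commands for every disk, it can also be written as a loop. This is a minimal sketch, not the author's script: `create_pvs` and the `run` wrapper are hypothetical helpers, and by default the commands are only printed; set `DRY_RUN=0` (as root) to actually execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: turn every disk given as an argument into a physical volume.
create_pvs() {
  local disk
  for disk in "$@"; do
    run pvcreate "$disk"   # write the LVM header to the disk
    run pvdisplay "$disk"  # confirm it is now a physical volume
  done
}

# Example (prints the commands only):
# create_pvs /dev/sdb /dev/sdc
```

With `DRY_RUN=0 create_pvs /dev/sdb /dev/sdc` the same commands as in step 3 are executed.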

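4. Now that we have created the physical volumes, we need to create a Volume Group. We have named ours hadoop-elasticity and built it from /dev/sdb and /dev/sdc:

vgcreate <vg_name> <disk1> <disk2> ...

For this example, that is vgcreate hadoop-elasticity /dev/sdb /dev/sdc. This Volume Group combines the physical volumes into one pool of storage.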
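Step 4 plus a quick check can be wrapped the same way. Again, this is a sketch with hypothetical helper names (`create_vg`, `run`) that only prints the commands unless `DRY_RUN=0` is set; `vgs` is the standard LVM command that prints a one-line summary per Volume Group.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: create volume group $1 from the physical volumes that follow.
create_vg() {
  local vg="$1"
  shift
  run vgcreate "$vg" "$@"
  run vgs "$vg"   # summary: number of PVs and LVs, total size, free space
}

# Example: create_vg hadoop-elasticity /dev/sdb /dev/sdc
```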

5. Create a Logical Volume.

Specify the size of the Logical Volume, its name, and the name of the Volume Group:

lvcreate --size 45G --name <lv_name> <vg_name>

6. Display information about the created Logical Volume:

lvdisplay <vg_name>/<lv_name>

7. Format the Logical Volume with an ext4 filesystem:

mkfs.ext4 /dev/<vg_name>/<lv_name>
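Steps 5 to 7 always go together on first setup, so here they are as one function. `create_lv_with_fs` and `run` are hypothetical names, `dn-lv` below is a made-up Logical Volume name, and nothing runs unless `DRY_RUN=0` is set. Keep in mind that `mkfs.ext4` erases anything already on the device, so it belongs only to this first-time setup, never to the later extension.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: create LV $2 of size $3 in VG $1 and put ext4 on it.
create_lv_with_fs() {
  local vg="$1" lv="$2" size="$3"
  run lvcreate --size "$size" --name "$lv" "$vg"
  run lvdisplay "$vg/$lv"
  run mkfs.ext4 "/dev/$vg/$lv"   # destroys anything already on the new LV
}

# Example (dn-lv is a hypothetical LV name):
# create_lv_with_fs hadoop-elasticity dn-lv 45G
```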

8. Mount the Logical Volume on the directory that the DataNode shares with the NameNode (its data directory):

mount /dev/<vg_name>/<lv_name> <directory_name>
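One detail worth checking before step 8: mounting over a directory hides whatever it already contains, so any blocks the DataNode had already written there would disappear from view until the volume is unmounted. A minimal sketch, assuming the mount point is the directory set as the DataNode's data directory (`dfs.datanode.data.dir` in hdfs-site.xml, `dfs.data.dir` in older releases); `mount_lv` and `run` are hypothetical helper names, and commands are only printed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: mount /dev/$1/$2 on directory $3, refusing non-empty targets.
mount_lv() {
  local vg="$1" lv="$2" dir="$3"
  # Refuse to mount over a directory that already has files in it.
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
    echo "refusing: $dir is not empty" >&2
    return 1
  fi
  run mkdir -p "$dir"
  run mount "/dev/$vg/$lv" "$dir"
  run df -h "$dir"   # the size shown should match the Logical Volume
}
```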

9. Display information about the Volume Group:

vgdisplay <vg_name>

Notice that 45 GB is allocated and 15 GB is free.
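The free space shown here is what step 11 can later hand to the DataNode, so a small check before extending can save a failed `lvextend`. A sketch, assuming the standard `vgs` options `--noheadings`, `--units b`, `--nosuffix` and `-o vg_free` (bytes avoid any rounding); `vg_free_bytes` and `has_room` are hypothetical helper names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: free space in volume group $1, in bytes.
vg_free_bytes() {
  vgs --noheadings --units b --nosuffix -o vg_free "$1" | tr -d ' '
}

# Hypothetical helper: succeeds if $1 bytes of free space can hold a growth of $2 GiB.
has_room() {
  awk -v free="$1" -v want_gib="$2" \
    'BEGIN { exit !(free + 0 >= want_gib * 1024 * 1024 * 1024) }'
}
```

For example, `has_room "$(vg_free_bytes hadoop-elasticity)" 5 && echo "ok to extend"` succeeds only when at least 5 GiB is still free in the Volume Group.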

10. Check the report of the Hadoop cluster:

hadoop dfsadmin -report

Notice that about 44 GB is now contributed, slightly less than the 45 GB volume because the filesystem itself uses part of the space.
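To track the contributed capacity from a script instead of reading the whole report, the first `Configured Capacity:` line (the cluster-wide summary, before the per-DataNode sections) can be pulled out with awk. A sketch, assuming the report prints lines of the form `Configured Capacity: <bytes> (<human-readable size>)`; `configured_capacity_bytes` and `bytes_to_gib` are hypothetical helper names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read a dfsadmin report on stdin, print the first
# (cluster-wide) configured capacity in bytes. Reads all input, so the
# producer of the pipe is never cut off early.
configured_capacity_bytes() {
  awk '!found && /^Configured Capacity:/ { print $3; found = 1 }'
}

# Hypothetical helper: convert bytes to GiB with two decimals.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Example:
# bytes_to_gib "$(hadoop dfsadmin -report | configured_capacity_bytes)"
```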

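11. In the future, if you would like to extend or reduce the contributed storage, you can. To extend it, just make sure there is free space available in the Volume Group. (Reducing is also possible, but an ext4 filesystem can only be shrunk while it is unmounted.)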

lvextend --size <size> /dev/<vg_name>/<lv_name>

Now the Logical Volume's allocated size is 50 GB.

12. The filesystem does not grow by itself, and reformatting would erase all the data stored on it. Instead, we grow the existing ext4 filesystem into the extended space with resize2fs, which can do this while the filesystem is mounted:

resize2fs /dev/<vg_name>/<lv_name>

13. Now, check the report again: the contributed size is around 50 GB.
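Steps 11 to 13 can be collected into one small script, so that growing a DataNode's storage is a single command that leaves the DataNode running. A sketch, not the author's tooling: `grow_datanode_storage` and `run` are hypothetical helpers, `dn-lv` is a made-up LV name, commands are only printed unless `DRY_RUN=0`, and the size uses LVM syntax where a leading `+` means "grow by" (so `+5G` takes a 45G volume to 50G).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: grow LV $2 in VG $1 by/to size $3, then grow its ext4 filesystem online.
grow_datanode_storage() {
  local vg="$1" lv="$2" size="$3"
  local dev="/dev/$vg/$lv"
  run lvextend --size "$size" "$dev"  # enlarge the Logical Volume
  run resize2fs "$dev"                # grow the mounted filesystem to fill it; data is kept
  run hadoop dfsadmin -report         # check the capacity the NameNode now sees
}

# Example (dn-lv is a hypothetical LV name):
# grow_datanode_storage hadoop-elasticity dn-lv +5G
```

As a design alternative, `lvextend` also accepts `--resizefs` (`-r`), which resizes the filesystem in the same step; the separate `resize2fs` call is kept here to mirror steps 11 and 12.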

Thank you

Hope you liked it!!
