Difference Between Hadoop 2 and Hadoop 3


Hadoop 3 was released in 2017 and brings new features that address drawbacks of Hadoop 2. In this article, we look at the major and minor differences between Hadoop 2 and Hadoop 3.


There are no changes in licensing: both Hadoop 2 and Hadoop 3 are open source and released under the Apache License 2.0, so anyone can use them free of charge.

Java Version

The minimum supported Java version for Hadoop 2 is Java 7.

The minimum supported Java version for Hadoop 3 is Java 8.
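To check a machine against these minimums, here is a small Python sketch that parses a Java version string (as printed by `java -version`) and compares its major version with the minimum stated above. `MIN_JAVA`, `java_major` and `meets_minimum` are hypothetical helpers for illustration, not part of Hadoop. Java 8 and earlier report versions in the older `1.x` form (for example `1.8.0_292`), while Java 9 and later put the major version first.

```python
import re

# Hypothetical lookup table: minimum Java major version per Hadoop major
# version, taken from the figures stated in this article.
MIN_JAVA = {2: 7, 3: 8}


def java_major(version: str) -> int:
    """Return the Java major version from a version string.

    Java 8 and earlier use the legacy "1.x" scheme (e.g. "1.8.0_292");
    Java 9+ report the major version first (e.g. "11.0.2").
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised Java version: {version!r}")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy "1.x" scheme
    return first


def meets_minimum(hadoop_major: int, java_version: str) -> bool:
    """True if the Java version is at least the minimum for that Hadoop line."""
    return java_major(java_version) >= MIN_JAVA[hadoop_major]


print(meets_minimum(2, "1.7.0_80"))   # True
print(meets_minimum(3, "1.7.0_80"))   # False
print(meets_minimum(3, "1.8.0_292"))  # True
```

This only checks the minimum; it does not tell you whether a newer JDK is officially supported by a given Hadoop release, so consult the release notes for that.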

Programming Language Support

Both Hadoop versions support the major programming languages, such as Java, Python, R and Scala (Java natively, other languages through tools such as Hadoop Streaming).

Hardware and cost

Compared with Hadoop 2, Hadoop 3 needs less disk hardware and therefore costs less, because of the change in how it provides fault tolerance. With erasure coding, Hadoop 3 needs about half the raw disk space to store the same data (1.5 times the data size instead of 3 times).

Fault tolerance

Hadoop 2 relies on replication for fault tolerance. Hadoop 3 adds erasure coding, which can be enabled per directory as an alternative to replication (replication remains the default).
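The idea behind erasure coding is that lost data can be recomputed from parity instead of being read from a full copy. The Python sketch below uses simple XOR parity, which corresponds to a 2 data + 1 parity layout (Hadoop 3 ships an `XOR-2-1-1024k` policy, but its default policy is the Reed-Solomon `RS-6-3-1024k`). It is a simplified illustration, not the HDFS codec: XOR parity can rebuild only one lost cell, while Reed-Solomon can survive as many lost cells as it has parity cells.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_cells: list[bytes]) -> bytes:
    """Compute a single parity cell as the XOR of all data cells."""
    return reduce(xor_bytes, data_cells)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data cell from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


# Two data cells plus one parity cell (a 2+1 layout).
cells = [b"HADOOP", b"HDFS3!"]
parity = encode(cells)

# Simulate losing the first cell, then reconstruct it.
rebuilt = recover([cells[1]], parity)
print(rebuilt)  # b'HADOOP'
```

Because XOR is its own inverse, XOR-ing the parity with every surviving cell cancels them out and leaves exactly the missing cell.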

Data Balancing

Hadoop 2 uses the HDFS balancer to distribute data across DataNodes, but it cannot balance data between the disks inside a single DataNode.

Hadoop 3 introduces the HDFS disk balancer, which spreads data evenly across the disks within a DataNode (for example, after a new, empty disk has been added).
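The core idea of intra-node balancing can be shown with a little arithmetic: bring every disk to the same used-to-capacity ratio as the DataNode as a whole. The Python sketch below is a simplified illustration under that assumption, not the actual planner (the real tool is driven by `hdfs diskbalancer` with its `-plan`, `-execute` and `-query` options). The `Disk` class, `balance_plan` function and mount paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    used_gb: float
    capacity_gb: float


def balance_plan(disks: list[Disk]) -> dict[str, float]:
    """Return the GB each disk should gain (+) or shed (-) so that every
    disk reaches the same used/capacity ratio as the whole DataNode."""
    total_used = sum(d.used_gb for d in disks)
    total_capacity = sum(d.capacity_gb for d in disks)
    ideal_ratio = total_used / total_capacity
    return {d.name: ideal_ratio * d.capacity_gb - d.used_gb for d in disks}


# Hypothetical DataNode where a newly added disk is still almost empty.
disks = [
    Disk("/data1", used_gb=900, capacity_gb=1000),
    Disk("/data2", used_gb=850, capacity_gb=1000),
    Disk("/data3", used_gb=50, capacity_gb=1000),  # newly added disk
]
for name, delta in balance_plan(disks).items():
    print(f"{name}: {delta:+.0f} GB")
```

Here the node is 60% full overall, so `/data1` and `/data2` should shed 300 GB and 250 GB, and `/data3` should receive 550 GB; the moves always sum to zero because data stays on the same node.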

Storage Schema

Hadoop 2 uses replication.

Hadoop 3 uses erasure coding, alongside replication.

Storage Overhead

With the default replication factor of 3, storing a 6-block file in HDFS consumes 18 blocks of disk space (the 6 original blocks plus 12 replicas), which is a 200% storage overhead in Hadoop 2.

Erasure coding has only a 50% storage overhead with the default RS(6,3) Reed-Solomon policy, because it stores parity cells instead of full copies. For example, a file with six blocks consumes only nine blocks of disk space (6 data, 3 parity).
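The overhead figures above follow from simple arithmetic. The Python sketch below reproduces them, assuming 3-way replication and an RS(6,3) layout by default; the function names are hypothetical, and the erasure-coding helper only handles files that fill whole stripes (a multiple of the data-unit count).

```python
def replication_overhead(data_blocks: int, replication: int = 3) -> tuple[int, float]:
    """Return (total blocks stored, overhead %) for n-way replication."""
    total = data_blocks * replication
    return total, (total - data_blocks) * 100 / data_blocks


def ec_overhead(data_blocks: int, data_units: int = 6,
                parity_units: int = 3) -> tuple[int, float]:
    """Return (total blocks stored, overhead %) for an RS(data, parity) layout.

    Assumes data_blocks is a whole number of stripes (a multiple of data_units).
    """
    if data_blocks % data_units != 0:
        raise ValueError("sketch assumes whole stripes only")
    stripes = data_blocks // data_units
    total = data_blocks + stripes * parity_units
    return total, (total - data_blocks) * 100 / data_blocks


print(replication_overhead(6))  # (18, 200.0)
print(ec_overhead(6))           # (9, 50.0)
```

Changing the parameters shows the trade-off between policies, for example `ec_overhead(10, 10, 4)` gives a 40% overhead for an RS(10,4) layout.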

Default Port Ranges

In Hadoop 2, several default Hadoop service ports fall within the Linux ephemeral port range, so at startup a service can fail to bind if another process has already been assigned that port.

In Hadoop 3.0, these default ports were moved out of the ephemeral range.
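The Python sketch below lists some well-known HDFS default ports that changed between the two versions and checks each against the ephemeral range. It assumes the common default range on recent Linux kernels (32768 to 60999); check `/proc/sys/net/ipv4/ip_local_port_range` on your own system, and treat `PORT_CHANGES` and `in_ephemeral_range` as illustrative helpers rather than a complete port list.

```python
# Assumed default Linux ephemeral range on recent kernels; the actual
# value on a host is in /proc/sys/net/ipv4/ip_local_port_range.
EPHEMERAL_LOW, EPHEMERAL_HIGH = 32768, 60999

# Some HDFS default ports: (Hadoop 2, Hadoop 3).
PORT_CHANGES = {
    "NameNode HTTP UI": (50070, 9870),
    "NameNode HTTPS UI": (50470, 9871),
    "Secondary NameNode HTTP": (50090, 9868),
    "DataNode HTTP": (50075, 9864),
    "DataNode data transfer": (50010, 9866),
    "DataNode IPC": (50020, 9867),
}


def in_ephemeral_range(port: int, low: int = EPHEMERAL_LOW,
                       high: int = EPHEMERAL_HIGH) -> bool:
    """True if the port could be handed out as a temporary client port."""
    return low <= port <= high


for service, (old, new) in PORT_CHANGES.items():
    print(f"{service}: {old} (ephemeral={in_ephemeral_range(old)}) -> "
          f"{new} (ephemeral={in_ephemeral_range(new)})")
```

Every Hadoop 2 port in this list falls inside the assumed range, and every Hadoop 3 replacement falls outside it.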

Multiple Standby NameNodes

Hadoop 2 supports only one standby NameNode, while Hadoop 3 supports multiple standby NameNodes.
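In an HA setup, the NameNodes of a nameservice are listed in `hdfs-site.xml` through the `dfs.ha.namenodes.<nameservice>` property, with one `dfs.namenode.rpc-address.<nameservice>.<namenode id>` entry per NameNode; with Hadoop 3 that list can contain more than two IDs. The Python sketch below generates those entries for a hypothetical nameservice `mycluster` with hosts `master1` to `master3`. It is only a fragment: a working HA cluster also needs settings such as shared edits (JournalNodes), a client failover proxy provider and fencing, which are omitted here.

```python
import xml.etree.ElementTree as ET


def ha_properties(nameservice: str, namenodes: dict[str, str],
                  rpc_port: int = 8020) -> dict[str, str]:
    """Build the NameNode-listing HA properties for one nameservice.

    `namenodes` maps a NameNode ID (e.g. "nn1") to its host name.
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
    }
    for nn_id, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:{rpc_port}"
    return props


def to_xml(props: dict[str, str]) -> str:
    """Render properties in the <configuration><property> layout of hdfs-site.xml."""
    config = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


# Hypothetical cluster: one active plus two standby NameNodes.
props = ha_properties("mycluster", {"nn1": "master1", "nn2": "master2", "nn3": "master3"})
print(to_xml(props))
```

Which of the three NameNodes is active is decided at runtime by the failover mechanism, not by the order of the IDs.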

Hadoop 2 and Hadoop 3 differences

| Features | Hadoop 2 | Hadoop 3 |
|---|---|---|
| License | Open source, Apache 2.0 | Open source, Apache 2.0 |
| Initial release year | 2013 | 2017 |
| Use category | Data processing engine | Data processing engine |
| Real-time analysis | Not supported | Not supported |
| Interactivity | No | No |
| Level of abstraction | Low | Low |
| Easy to learn and use | Yes | Yes |
| Operating systems | Windows, Linux, macOS | Windows, Linux, macOS |
| Speed depends on | Disk | Disk |
| Security techniques | ACLs & Kerberos | ACLs & Kerberos |
| Fault tolerance techniques | Replication | Erasure coding |
| YARN Timeline Service version | 1.x | 2 |
| NameNodes | Single active NameNode and single standby NameNode | Single active NameNode and multiple standby NameNodes |
| DataNodes | Up to 10,000 DataNodes per cluster | More than 10,000 DataNodes per cluster |

Hardware and cost difference between Hadoop 2 and Hadoop 3

| Features | Hadoop 2 | Hadoop 3 |
|---|---|---|
| Disk cost | High | Somewhat lower than Hadoop 2 |
| Memory (RAM) cost | Same | Same |
| Total cost | High | Somewhat lower than Hadoop 2 |

Prerequisites Difference Between Hadoop 2 and Hadoop 3

| Prerequisites | Hadoop 2 | Hadoop 3 |
|---|---|---|
| Minimum Java version required | Java 7 | Java 8 |
| Minimum Linux version required | Red Hat 6 / CentOS 6 | Red Hat 7 / CentOS 7 |

Supported File systems Difference Between Hadoop 2 and Hadoop 3

| File Systems | Hadoop 2 | Hadoop 3 |
|---|---|---|
| Local file systems | Yes | Yes |
| Amazon S3 | Yes | Yes |
| Azure Storage | Yes | Yes |
| Distributed file system (HDFS) | Yes | Yes |
| Microsoft Azure Data Lake | No | Yes |
| Aliyun Object Storage System | No | Yes |

Programming Languages Difference Between Hadoop 2 and Hadoop 3

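| Programming Languages | Hadoop 2 | Hadoop 3 |
|---|---|---|
| Java | Yes | Yes |
| Python | Yes | Yes |
| R | Yes | Yes |
| Scala | Yes | Yes |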


In this post, we discussed the major and minor differences between Hadoop 2 and Hadoop 3. If we missed any difference, please let us know in the comments below. Please share our article and support hadoopcdp.com.


