Hadoop 3 was released in 2017 and adds new features that address drawbacks in Hadoop 2. In this article we look at the major and minor differences between Hadoop 2 and Hadoop 3.
There is no change in licensing: both Hadoop 2 and Hadoop 3 are released under the Apache License 2.0, and both versions are open source. Anyone can use them free of cost.
The minimum supported Java version for Hadoop 2 is Java 7.
The minimum supported Java version for Hadoop 3 is Java 8.
Support of Programming languages
Both Hadoop versions support all major programming languages, such as Java, Python, R, and Scala.
Hardware and cost
Compared with Hadoop 2, hardware cost is lower for Hadoop 3 because of a change in how fault tolerance is provided: Hadoop 3 needs less disk space to store the same data.
Hadoop 2 uses replication for fault tolerance, whereas Hadoop 3 can use erasure coding for fault tolerance.
Hadoop 2 uses the HDFS balancer to distribute data between DataNodes, but it does not balance data between the disks inside a single DataNode.
Hadoop 3 introduces the HDFS disk balancer, which spreads the storage load across the disks within a DataNode.
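To make the idea of balancing within a DataNode concrete, here is a minimal sketch in Python. It is not the algorithm HDFS uses. The function name `plan_intra_node_moves` and the disk figures are hypothetical. It only computes how much data each disk would have to give up or receive so that all disks in one node end up at the same utilization. In a real cluster this work is driven through the `hdfs diskbalancer` command.

```python
def plan_intra_node_moves(disks):
    """Illustrative only: compute per-disk data movement inside one DataNode.

    disks maps a disk name to (used_gb, capacity_gb).
    Returns a dict of disk name -> GB to move; a positive value means data
    should move off that disk, a negative value means it should receive data.
    """
    total_used = sum(used for used, _ in disks.values())
    total_capacity = sum(capacity for _, capacity in disks.values())
    # Every disk should end up at the node-wide average utilization.
    target_utilization = total_used / total_capacity
    return {
        name: used - target_utilization * capacity
        for name, (used, capacity) in disks.items()
    }


# Hypothetical DataNode with one nearly full disk and one nearly empty disk.
moves = plan_intra_node_moves({"disk1": (900, 1000), "disk2": (100, 1000)})
print(moves)
```

With these made-up numbers, the sketch suggests moving 400 GB from `disk1` to `disk2`, which leaves both disks half full.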
Hadoop 2 uses replication.
Hadoop 3 can use erasure coding.
With replication (default factor 3), a file of 6 blocks copied into HDFS occupies 18 blocks including the replicas, so Hadoop 2 has a 200% storage overhead.
Erasure coding creates only a 50% storage overhead by using parity cells. For example, a file with six blocks consumes only nine blocks of disk space (6 data, 3 parity).
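The arithmetic above can be checked with a short Python sketch. The function names are hypothetical. The erasure coding function assumes a layout of 6 data units plus 3 parity units and counts whole blocks, which simplifies how HDFS actually stripes data into cells.

```python
from typing import Tuple


def replication_overhead(data_blocks: int, replication_factor: int = 3) -> Tuple[int, float]:
    """Return (total blocks stored, overhead in percent) for plain replication."""
    total = data_blocks * replication_factor
    return total, 100.0 * (total - data_blocks) / data_blocks


def erasure_coding_overhead(
    data_blocks: int, data_units: int = 6, parity_units: int = 3
) -> Tuple[int, float]:
    """Return (total blocks stored, overhead in percent) for erasure coding.

    Simplification: each group of up to `data_units` data blocks is assumed
    to get `parity_units` parity blocks.
    """
    groups = -(-data_blocks // data_units)  # ceiling division
    total = data_blocks + groups * parity_units
    return total, 100.0 * (total - data_blocks) / data_blocks


print(replication_overhead(6))     # (18, 200.0)
print(erasure_coding_overhead(6))  # (9, 50.0)
```

For the six-block file from the example, replication stores 18 blocks and erasure coding stores 9, matching the 200% and 50% figures.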
Default Port Ranges
In Hadoop 2, the default ports of several Hadoop services fall within the Linux ephemeral port range, so at startup they can fail to bind if another application is already using the same port.
In Hadoop 3.0 these ports have been moved out of the ephemeral range.
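The port change can be illustrated with a small Python check. The table lists a few well-known default ports as documented in the Hadoop 3.0 release notes; verify them against your own distribution. The ephemeral range of 32768 to 60999 is an assumption based on the default of recent Linux kernels; your system's actual range is in `/proc/sys/net/ipv4/ip_local_port_range`.

```python
# Assumed default Linux ephemeral port range (recent kernels).
EPHEMERAL_RANGE = (32768, 60999)

# service name: (Hadoop 2 default port, Hadoop 3 default port)
DEFAULT_PORTS = {
    "NameNode HTTP UI": (50070, 9870),
    "Secondary NameNode HTTP UI": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode IPC": (50020, 9867),
    "DataNode HTTP UI": (50075, 9864),
}


def in_ephemeral_range(port, port_range=EPHEMERAL_RANGE):
    """Return True if the port lies inside the given ephemeral range."""
    low, high = port_range
    return low <= port <= high


for service, (hadoop2_port, hadoop3_port) in DEFAULT_PORTS.items():
    print(
        f"{service}: Hadoop 2 port {hadoop2_port} ephemeral={in_ephemeral_range(hadoop2_port)}, "
        f"Hadoop 3 port {hadoop3_port} ephemeral={in_ephemeral_range(hadoop3_port)}"
    )
```

Under these assumptions, every Hadoop 2 port listed falls inside the ephemeral range and every Hadoop 3 port falls outside it.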
Multiple Standby NameNodes
Hadoop 2 supports only one standby NameNode, whereas Hadoop 3 introduces support for multiple standby NameNodes.
Hadoop 2 and Hadoop 3 differences
|Features||Hadoop 2||Hadoop 3|
|License||Open source, Apache 2.0||Open source, Apache 2.0|
|Initial Release Year||2013||2017|
|Use Category||Data Processing Engine||Data Processing Engine|
|Real Time Analysis||Not Supported||Not Supported|
|Level of Abstraction||Low||Low|
|Ease to Learn and Use||Yes||Yes|
|Speed Depends on||Disk||Disk|
|Security Techniques||ACLs & Kerberos||ACLs & Kerberos|
|Fault Tolerance Techniques||Replication||Replication or Erasure Coding|
|Name Nodes||Single Active NameNode and Single Standby NameNode||Single Active NameNode and Multiple Standby NameNodes|
|Datanodes||Up to 10,000 DataNodes in a cluster||More than 10,000 DataNodes in a cluster|
Hardware and cost difference between Hadoop 2 and Hadoop 3
|Features||Hadoop 2||Hadoop 3|
|Disk Cost||High||Somewhat lower than Hadoop 2|
|Memory Cost (RAM)||Same||Same|
|Total Cost||High||Somewhat lower than Hadoop 2|
Prerequisites Difference Between Hadoop 2 and Hadoop 3
|Prerequisites||Hadoop 2||Hadoop 3|
|Minimum Java Version Required||Java 7||Java 8|
|Minimum Linux Version Required||Red Hat 6/CentOS 6||Red Hat 7/CentOS 7|
Supported File systems Difference Between Hadoop 2 and Hadoop 3
|File Systems||Hadoop 2||Hadoop 3|
|Local File systems||Yes||Yes|
|Distributed File system||Yes||Yes|
|Microsoft Azure Data Lake||No||Yes|
|Aliyun Object Storage System||No||Yes|
Programming Languages Difference Between Hadoop 2 and Hadoop 3
|Program Languages||Hadoop 2||Hadoop 3|
In this post we discussed the major and minor differences between Hadoop 2 and Hadoop 3. If we forgot or missed any difference, please let us know in the comments below. Please share our article and support hadoopcdp.com