Hadoop Smoke Test Script For Hortonworks


As admins, we perform many daily activities on our Hadoop clusters. Some are minor configuration changes, and some are major changes like patching or upgrades. For minor changes, we can simply go to Ambari and check the respective service health checks.

What about major changes? How can we be sure a service is running as expected?

One option is to run smoke tests on the respective services.

What is smoke test?

A smoke test is a preliminary test that checks whether the basic functionality of a service is working or not.

Here we have written automation scripts for the HDFS and MapReduce services.

Hadoop Smoke Test 1 (RandomWrite Test)

This smoke test generates random data and writes it to DFS as sequence files. All values for running the job are configurable; by default, RandomWriter generates 10 GB of data per host. Please check the configuration properties below before making any changes.

Name                               Default Value   Description
test.randomwriter.maps_per_host    10              Number of maps per host
test.randomwrite.bytes_per_map     1073741824      Number of bytes written per map
test.randomwrite.min_key           10              Minimum size of the key in bytes
test.randomwrite.max_key           1000            Maximum size of the key in bytes
test.randomwrite.min_value         0               Minimum size of the value
test.randomwrite.max_value         20000           Maximum size of the value
test.randomwrite.total_bytes       10000000        Total number of bytes to write
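A quick sanity check of the 10 GB default: the volume written per host is maps_per_host multiplied by bytes_per_map, which with the defaults above works out as:

```shell
# Sanity-check the RandomWriter defaults: 10 maps/host * 1 GiB/map.
maps_per_host=10
bytes_per_map=1073741824   # 1 GiB
total=$((maps_per_host * bytes_per_map))
echo "$total"              # 10737418240 bytes, i.e. ~10 GiB per host
```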

RandomWriter

bin/hadoop jar hadoop*examples*.jar randomwriter <output dir> [<configuration parameters>]

For example:

bin/hadoop jar hadoop*examples*.jar randomwriter /output_dir -D test.randomwrite.min_value=0 -D test.randomwrite.total_bytes=20000
  • Based on your Hadoop version, adjust the bin path accordingly.
  • -D passes runtime parameters to the command.

Hadoop Smoke Test 2 & 3 (TeraGen & TeraSort Test)

The TeraSort benchmark consists of the three parts below.

  • TeraGen – generates the random input data.
  • TeraSort – runs a MapReduce sort job with a custom partitioner; its input is the TeraGen output directory.
  • TeraValidate – creates a map task for each file in the TeraSort output directory and checks that the keys are in sorted order (no key is less than the previous one); its input is the TeraSort output directory.

TeraGen

bin/hadoop jar hadoop*examples*.jar teragen <number of 100-byte rows> <output dir>

TeraSort

bin/hadoop jar hadoop*examples*.jar terasort <TeraGen output dir> <output dir>

TeraValidate

bin/hadoop jar hadoop*examples*.jar teravalidate <terasort output dir> <output dir>
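The three commands above can be chained into a single wrapper, since each stage consumes the previous stage's output directory. A minimal sketch, assuming HDP-style client paths (adjust HADOOP and EXAMPLES_JAR for your cluster); the function name and the base-directory layout are hypothetical:

```shell
#!/bin/bash
# Sketch: run TeraGen -> TeraSort -> TeraValidate as one suite.
# HADOOP / EXAMPLES_JAR defaults below are assumptions for an HDP layout.
HADOOP=${HADOOP:-/usr/hdp/current/hadoop-client/bin/hadoop}
EXAMPLES_JAR=${EXAMPLES_JAR:-/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar}

run_terasort_suite () {
  local rows=$1 base=$2
  # Each stage feeds the next: teragen output -> terasort input, etc.
  "$HADOOP" jar "$EXAMPLES_JAR" teragen      "$rows" "$base/teragenout"                  || return 1
  "$HADOOP" jar "$EXAMPLES_JAR" terasort     "$base/teragenout"  "$base/terasortout"     || return 1
  "$HADOOP" jar "$EXAMPLES_JAR" teravalidate "$base/terasortout" "$base/teravalidateout" || return 1
  echo "TeraSort suite passed"
}
```

Usage would be along the lines of `run_terasort_suite 10000 /user/hdfs/smoketest`, with the row count kept small for a smoke run.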

Hadoop Smoke Test 4 & 5 (TestDFSIO -write & -read)

TestDFSIO is a benchmark of HDFS read and write performance. It is also useful for understanding performance bottlenecks in network I/O.
The default directory for the test is /benchmarks/TestDFSIO.

bin/hadoop jar hadoop*jobclient*.jar TestDFSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes]

It consists of three parts.

  • Write
  • Read
  • Clean

Write

It writes its files to the /benchmarks/TestDFSIO path in HDFS. Files left over from an earlier run are overwritten. To write the result summary to a custom file, use the -resFile parameter.

bin/hadoop jar hadoop*jobclient*.jar TestDFSIO -write [-nrFiles 10 -fileSize 1000]

Read

It reads the files written earlier by the write command.

bin/hadoop jar hadoop*jobclient*.jar TestDFSIO -read [-nrFiles 10 -fileSize 1000]

Clean

Once the test activity is complete, the command below cleans up the test data.

bin/hadoop jar hadoop*jobclient*.jar TestDFSIO -clean
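TestDFSIO also appends a summary to a local results file (TestDFSIO_results.log by default, or the name given with -resFile). A minimal sketch for pulling the throughput figures out of that file; the sample labels assumed here ("Throughput mb/sec", "Average IO rate mb/sec") follow the usual summary format, but verify against your Hadoop version:

```shell
#!/bin/bash
# Sketch: extract throughput lines from a TestDFSIO results file.
# Assumes the standard "Throughput mb/sec:" style summary labels.
parse_dfsio () {
  local results_file=$1
  awk -F': *' '/Throughput mb\/sec|Average IO rate mb\/sec/ {
    gsub(/^ +/, "", $1)             # trim leading spaces from the label
    printf "%s = %s\n", $1, $2
  }' "$results_file"
}
```

Comparing these numbers between the pre-upgrade and post-upgrade runs gives a rough regression signal beyond a simple pass/fail.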

Please find below the script for the HDFS and MapReduce smoke tests. This script validates the functionality of the Hadoop HDFS and MapReduce services before and after major changes (upgrades and patching).

#!/bin/bash

load_hdfs_keytab ()
{
  cluster=${C_NAME}
  kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs
  output_code=$?
  if [ $output_code != 0 ];
  then
    echo "Keytab initialization failed with error code $output_code"
    exit $output_code
  fi
}

smoke_test1 ()
{
  echo "Processing Hadoop SmokeTest, RandomWrite Test"
  output_folder=$OUT_DIR/$TEST_NAME/smoke_test1;
  mkdir -p $output_folder;
  echo "Started Executing the test:"
  echo "$HADOOP jar $MR_EXAMPLE_JAR randomwriter -Dtest.randomwrite.total_bytes=10000000 $TEST_NAME 2>$output_folder/error.log 1>$output_folder/output.log"
  $HADOOP jar $MR_EXAMPLE_JAR randomwriter -Dtest.randomwrite.total_bytes=10000000 $TEST_NAME 2>$output_folder/error.log 1>$output_folder/output.log
  output_code=$?
  if [ $output_code != 0 ];
  then
    echo "RandomWrite smoke test failed."
    exit 1
  fi
  echo "RandomWrite smoke test passed."
}

smoke_test2 ()
{
  #TERAGEN
  echo "Processing Hadoop SmokeTest, TeraGen Test"
  output_folder=$OUT_DIR/$TEST_NAME/smoke_test2
  mkdir -p $output_folder;
  echo "Started Executing the test:"
  echo "$HADOOP jar $MR_EXAMPLE_JAR teragen 10000 $TEST_NAME/teragenout &>$output_folder/output"
  $HADOOP jar $MR_EXAMPLE_JAR teragen 10000 $TEST_NAME/teragenout &>$output_folder/output
  hadoop fs -ls /user/hdfs/$TEST_NAME/teragenout/
  output_code=$?
  if [ $output_code != 0 ];
  then
    echo "TeraGen output directory was not created"
    exit 1
  fi
  echo "TeraGen smoke test passed."
}

smoke_test3 ()
{
  #TERASORT
  echo "Processing Hadoop SmokeTest, TeraSort Test"
  output_folder=$OUT_DIR/$TEST_NAME/smoke_test3
  mkdir -p $output_folder;
  echo "Started Executing the test:"
  echo "$HADOOP jar $MR_EXAMPLE_JAR terasort $TEST_NAME/teragenout $TEST_NAME/terasortout &>$output_folder/output"
  $HADOOP jar $MR_EXAMPLE_JAR terasort $TEST_NAME/teragenout $TEST_NAME/terasortout &>$output_folder/output
  output_code=$?
  if [ $output_code != 0 ];
  then
    echo "Failed in processing the Terasort"
    exit 1
  fi
  echo "TeraSort smoke test passed."
}

smoke_test4 ()
{
  #TestDFSIO Write
  echo "Processing Hadoop SmokeTest, TestDFSIO Write Test"
  output_folder=$OUT_DIR/$TEST_NAME/smoke_test4
  mkdir -p $output_folder;
  echo "Started Executing the test:"
  echo "$HADOOP jar $MR_JOB_CLIENT_JAR TestDFSIO -Dmapred.output.compress=False -write &>$output_folder/output"
  $HADOOP jar $MR_JOB_CLIENT_JAR TestDFSIO -Dmapred.output.compress=False -write &>$output_folder/output
  output_code=$?
  if [ $output_code != 0 ];
  then
    echo "Failed in processing DFSIO Write"
    exit 1
  fi
  echo "TestDFSIO write test passed."
}


smoke_test5 ()
{
  #TestDFSIO Read
  echo "Processing Hadoop SmokeTest, TestDFSIO Read Test"
  output_folder=$OUT_DIR/$TEST_NAME/smoke_test5
  mkdir -p $output_folder;
  echo "Started Executing the test:"
  echo "$HADOOP jar $MR_JOB_CLIENT_JAR TestDFSIO -Dmapred.output.compress=False -read &>$output_folder/output"
  $HADOOP jar $MR_JOB_CLIENT_JAR TestDFSIO -Dmapred.output.compress=False -read &>$output_folder/output
  output_code=$?
  if [ $output_code != 0 ];
  then
    echo "Failed in processing DFSIO Read"
    exit 1
  fi
  echo "TestDFSIO read test passed."
  
  #echo "Cleaning the test result data"
  #hadoop jar $MR_JOB_CLIENT_JAR TestDFSIO -clean
}

display_help ()
{
  echo "Usage: $0 cluster test_name"
  echo "cluster   - cluster name"
  echo "test_name - e.g. preupgrade, postupgrade"
}

############################
# Main code
############################

if [ $# -lt 2 ];
then
  display_help;
  exit 1;
fi

C_NAME=$1
TEST_NAME=$2
OUT_DIR=$(pwd)/output
HADOOP=/usr/hdp/current/hadoop-client/bin/hadoop
MR_EXAMPLE_JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
MR_JOB_CLIENT_JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.jar
mkdir -p $OUT_DIR

load_hdfs_keytab;
smoke_test1;
smoke_test2;
smoke_test3;
smoke_test4;
smoke_test5;

For non-Kerberized clusters, remove the load_hdfs_keytab function (and its call) from the script.
For Kerberized clusters, if you are using a custom user (xxxx-hdfs), update the load_hdfs_keytab function to use that username instead of the hdfs user.
