Running the SLive Test on WANdisco Distro

The SLive test is a stress test that uses the MapReduce paradigm to simulate distributed operations and load on the NameNode. It was designed by Konstantin Shvachko and introduced into the Apache Hadoop project in 2010 by him and others. It is now one of the many stress tests we run here at WANdisco when testing our distribution, WANdisco Distro (WDD).

You can read the original paper about how this test works here:
https://issues.apache.org/jira/secure/attachment/12448004/SLiveTest.pdf
You can view the associated Apache JIRA for the introduction of this test here:
https://issues.apache.org/jira/browse/HDFS-708

This blog post provides a short tutorial on how to run the SLive test on your own Hadoop 2 cluster running YARN / MapReduce. Before we begin, please make sure you are logged in as the ‘hdfs’ user:

su - hdfs

The first order of business is to become familiar with the parameters supported by the stress test.

Operation distribution parameters (each value is a percentage of all operations):
-create <num> -delete <num> -rename <num> -read <num>  -append <num> -ls <num> -mkdir <num>

Stress test property parameters:
-blockSize <min,max> -readSize <min,max> -writeSize <min,max> -files <total>

The first set of parameters controls the operation mix: what percentage of all operations should be of each kind. For example, to simulate a pure create-and-delete scenario with no other operations, you would run the test with -create 50 -delete 50 (or any other percentages that add up to 100). Set the others in that first set to 0, or simply omit them and the test will set them to 0 automatically.
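Because the percentages are expected to add up to 100, it can be handy to check a planned mix before submitting a job. The following is only a minimal sketch; check_op_mix is a hypothetical helper name and is not part of SLive.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLive): verify that a planned
# operation mix adds up to 100 percent before running the test.
check_op_mix() {
    total=0
    for pct in "$@"; do
        total=$((total + pct))
    done
    if [ "$total" -ne 100 ]; then
        echo "operation percentages add up to $total, expected 100" >&2
        return 1
    fi
    echo "operation mix OK"
}
```

For the create and delete scenario described above, you would call it as: check_op_mix 50 50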

The second set of parameters controls properties that apply to the entire test, such as how many files to create and the smallest and largest size each block in a file may have. For the most part they can be ignored, except for “-blockSize”. Using the default block size, which is 64 megabytes, may cause your run of the SLive test to take longer. To keep this tutorial quick, we will use a small block size. Please note that block sizes must be multiples of 512 bytes. We will use 4096 bytes in this tutorial.
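The 512-byte rule is easy to check up front. Here is a small sketch, assuming you simply want to validate a candidate value before passing it to “-blockSize”; is_valid_block_size is a hypothetical helper name, not something SLive provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLive): succeed only if the given
# size in bytes is a positive multiple of 512.
is_valid_block_size() {
    size=$1
    [ "$size" -gt 0 ] && [ $((size % 512)) -eq 0 ]
}
```

For example, is_valid_block_size 4096 succeeds, while is_valid_block_size 1000 fails because 1000 is not a multiple of 512.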

There are other parameters available, but they are not needed for a basic understanding and run of this stress test. If you are curious about them, you can refer to the document linked at the top of this entry, or you can run:

hadoop org.apache.hadoop.fs.slive.SliveTest --help

The second step is to understand how to run the test. Although it is advised NOT to do this just yet, you can run the test immediately with default parameters by executing the following command:

hadoop org.apache.hadoop.fs.slive.SliveTest

However, since there is no initial data within the cluster yet, you would notice that most, if not all, of the operations in the report are failures. Run the following to initialize the cluster with 10,000 files, all with a tiny 4096-byte block size, in order to achieve a quick run of the SLive test:

hadoop org.apache.hadoop.fs.slive.SliveTest -create 100 -delete 0 -rename 0 -read 0 -append 0 -ls 0 -mkdir 0 -blockSize 4096,4096 -files 10000

On a cluster with 1 NameNode and 3 DataNodes, running this command should take no longer than about 3 to 4 minutes. If it is taking too long, you can try re-running with a lower “-files” value and/or a smaller “-blockSize” value.

After you have initialized the cluster with data, you will need to delete the output directory that your previous SLive test run had created:

hadoop fs -rm -r /test/slive/slive/output

You will need to do this after every SLive run; otherwise your next attempt will fail, telling you that the output directory for your requested run already exists.
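If you plan to run the test repeatedly, the cleanup step can be folded into a small wrapper. This is a sketch only: run_slive and OUTPUT_DIR are hypothetical names, the path is the one used in this tutorial, and the wrapper uses "hadoop fs -rm -r", the Hadoop 2 spelling of the older "-rmr".

```shell
#!/bin/sh
# Sketch of a wrapper (hypothetical, not part of SLive) that clears the
# previous report directory and then starts a new run.
# OUTPUT_DIR is the path used in this tutorial; adjust it if needed.
OUTPUT_DIR=/test/slive/slive/output

run_slive() {
    # 'hadoop fs -test -d' exits 0 when the directory exists.
    if hadoop fs -test -d "$OUTPUT_DIR"; then
        hadoop fs -rm -r "$OUTPUT_DIR" || return 1
    fi
    # Pass all remaining arguments straight through to the test.
    hadoop org.apache.hadoop.fs.slive.SliveTest "$@"
}
```

You would then start a run with, for example: run_slive -blockSize 4096,4096 -files 10000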

You can now run the default test, which performs an equal distribution of creates, deletes, reads, and other operations across the cluster:

hadoop org.apache.hadoop.fs.slive.SliveTest

Or you can specify the parameters of your own choosing and customize your own load to stress test with! That is the purpose of the test, after all. Enjoy!
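As one way to experiment with custom loads, the sketch below turns a list of op=percent pairs into the matching flags and prints the resulting command instead of running it. The helper name slive_command is hypothetical, and the block size and file count are simply the values used in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLive): build an SLive command line
# from "op=percent" pairs and print it, without executing anything.
slive_command() {
    args=""
    for pair in "$@"; do
        op=${pair%%=*}     # text before the first '='
        pct=${pair#*=}     # text after the first '='
        args="$args -$op $pct"
    done
    echo "hadoop org.apache.hadoop.fs.slive.SliveTest$args -blockSize 4096,4096 -files 10000"
}
```

For a read-heavy mix, slive_command read=70 ls=20 create=10 prints a command with -read 70 -ls 20 -create 10; operations you leave out are set to 0 by the test, as described earlier.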

Here are the results obtained from our own in-house run of the SLive test for you to compare your own results with. I ran the following command after initialization:

hadoop org.apache.hadoop.fs.slive.SliveTest -blockSize 4096,4096 -files 10000

And I got the following results:

13/02/11 11:00:36 INFO slive.SliveTest: Reporting on job:
13/02/11 11:00:36 INFO slive.SliveTest: Writing report using contents of /test/slive/slive/output
13/02/11 11:00:36 INFO slive.SliveTest: Report results being placed to logging output and to file /home/hdfs/part-0000
13/02/11 11:00:36 INFO slive.ReportWriter: Basic report for operation type AppendOp
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "bytes_written" = 4317184
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "failures" = 1
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "files_not_found" = 365
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "milliseconds_taken" = 59813
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "op_count" = 1420
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "successes" = 1054
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "bytes_written" = 0.067 MB/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "op_count" = 23.741 operations/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "successes" = 17.622 successes/sec
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Basic report for operation type CreateOp
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "bytes_written" = 1490944
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "failures" = 1056
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "milliseconds_taken" = 19029
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "op_count" = 1420
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "successes" = 364
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "bytes_written" = 0.053 MB/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "op_count" = 74.623 operations/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "successes" = 19.129 successes/sec
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Basic report for operation type DeleteOp
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "failures" = 365
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "milliseconds_taken" = 4905
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "op_count" = 1420
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "successes" = 1055
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "op_count" = 289.501 operations/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "successes" = 215.087 successes/sec
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Basic report for operation type ListOp
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "dir_entries" = 1167
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "files_not_found" = 1145
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "milliseconds_taken" = 536
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "op_count" = 1420
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "successes" = 275
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "dir_entries" = 2177.239 directory entries/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "op_count" = 2649.254 operations/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "successes" = 513.06 successes/sec
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Basic report for operation type MkdirOp
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "milliseconds_taken" = 5631
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "op_count" = 1420
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "successes" = 1420
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "op_count" = 252.175 operations/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "successes" = 252.175 successes/sec
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Basic report for operation type ReadOp
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "bad_files" = 1
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "bytes_read" = 25437917184
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "chunks_unverified" = 0
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "chunks_verified" = 3188125200
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "files_not_found" = 342
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "milliseconds_taken" = 268754
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "op_count" = 1420
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "successes" = 1077
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "bytes_read" = 90.265 MB/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "op_count" = 5.284 operations/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "successes" = 4.007 successes/sec
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Basic report for operation type RenameOp
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "failures" = 1165
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "milliseconds_taken" = 1130
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "op_count" = 1420
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "successes" = 255
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "op_count" = 1256.637 operations/sec
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "successes" = 225.664 successes/sec
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Basic report for operation type SliveMapper
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "milliseconds_taken" = 765862
13/02/11 11:00:36 INFO slive.ReportWriter: Measurement "op_count" = 9940
13/02/11 11:00:36 INFO slive.ReportWriter: Rate for measurement "op_count" = 12.979 operations/sec
13/02/11 11:00:36 INFO slive.ReportWriter: -------------
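
When comparing your own numbers against these, it helps to know how the counters relate to the rates. The op_count and successes rates in the report above are consistent with the count divided by the elapsed time in seconds; for AppendOp, 1420 operations in 59813 milliseconds gives the reported 23.741 operations/sec. The sketch below reproduces that arithmetic. It is an assumption based on these figures, not a statement of how ReportWriter is implemented, and rate_per_sec is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: recompute an operations-per-second rate from the
# raw counters in the report, as count / (milliseconds_taken / 1000),
# printed to three decimal places.
rate_per_sec() {
    count=$1
    millis=$2
    awk -v c="$count" -v ms="$millis" 'BEGIN { printf "%.3f\n", c / (ms / 1000) }'
}
```

For example, rate_per_sec 1420 59813 matches the AppendOp op_count rate, and rate_per_sec 1054 59813 matches its successes rate.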

About Plamen Jeliazkov