I will propose a basic benchmarking toolkit for Linux. This is a
preliminary version of a comprehensive Linux Benchmarking
Toolkit, to be expanded and improved. Take it for what it's
worth, i.e. as a proposal. If you don't think it is a valid test
suite, feel free to email me your criticism and I will be glad to
make the changes and improve it if I can. Before getting into an
argument, however, read this HOWTO and the mentioned references:
informed criticism is welcome, empty criticism is not.
This is just common sense:
- It should not take a whole day to run. When it comes to
comparative benchmarking (various runs), nobody wants to spend
days trying to figure out the fastest setup for a given system.
Ideally, the entire benchmark set should take about 15 minutes to
complete on an average machine.
- All source code for the software used must be freely
available on the Net, for obvious reasons.
- Benchmarks should provide simple figures reflecting the
measured performance.
- There should be a mix of synthetic benchmarks and application
benchmarks (with separate results, of course).
- Each synthetic benchmark should exercise a particular
subsystem to its maximum capacity.
- Results of synthetic benchmarks should not be
averaged into a single figure of merit (that defeats the whole
idea behind synthetic benchmarks, with considerable loss of
information).
- Application benchmarks should consist of commonly executed
tasks on Linux systems.
I have selected five different benchmark suites, trying as much
as possible to avoid overlap in the tests:
1. Kernel 2.0.0 (default configuration) compilation using gcc.
2. Whetstone version 10/03/97 (latest version by Roy
Longbottom).
3. xbench-0.2 (with fast execution parameters).
4. UnixBench benchmarks version 4.01 (partial results).
5. BYTE Magazine's BYTEmark benchmarks beta release 2 (partial
results).
For tests 4 and 5, "(partial results)" means that not all results
produced by these benchmarks are considered.
- Kernel 2.0.0 compilation: 5 - 30 minutes, depending on the
real performance of your system.
- Whetstone: 100 seconds.
- Xbench-0.2: < 1 hour.
- UnixBench benchmarks version 4.01: approx. 15 minutes.
- BYTE Magazine's BYTEmark benchmarks: approx. 10 minutes.
Kernel 2.0.0 compilation:
- What: it is the only application benchmark in the LBT.
- The code is widely available (i.e. I finally found some use
for my old Linux CD-ROMs).
- Most linuxers recompile the kernel quite often, so it is a
significant measure of overall performance.
- The kernel is large and gcc uses a large chunk of memory:
this attenuates the L2 cache size bias seen with small tests.
- It does frequent I/O to disk.
- Test procedure: get a pristine 2.0.0 source and compile with
default options (make config, press Enter repeatedly). The
reported time should be the time spent on compilation only, i.e.
from the moment you type make zImage, not including make dep
or make clean.
Note that the default target architecture for the kernel is the
i386, so if compiled on another architecture, gcc too should be
set to cross-compile, with i386 as the target architecture.
- Results: compilation time in minutes and seconds (please
don't report fractions of seconds).
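
The timing step above can be scripted. What follows is only a
minimal sketch in Python: the helper names time_command and
format_elapsed are made up for this example, and the source
directory shown in the comment is an assumption, not part of the
LBT procedure. It measures wall-clock time around a single
command and rounds the result to whole seconds.

```python
import subprocess
import time


def time_command(argv, cwd=None):
    """Run argv and return the wall-clock time it took, in seconds."""
    start = time.monotonic()
    subprocess.run(argv, cwd=cwd, check=True)
    return time.monotonic() - start


def format_elapsed(seconds):
    """Round to whole seconds and format as minutes and seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes} min {secs:02d} s"


# Hypothetical usage, after make config, make dep and make clean
# have already been run (the path is an assumption):
#
#   elapsed = time_command(["make", "zImage"], cwd="/usr/src/linux")
#   print(format_elapsed(elapsed))
```

Only the make zImage step is wrapped, so the configuration and
dependency steps stay outside the reported time.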
Whetstone:
- What: measures pure floating point performance with a
short, tight loop. The source (in C) is quite readable and it
is very easy to see which floating-point operations are
involved.
- Shortest test in the LBT :-).
- It's an "Old Classic" test: comparable figures are available,
its flaws and shortcomings are well known.
- Test procedure: the newest C source should be obtained from
Aburto's site. Compile and run in double precision mode. Specify
gcc and -O2 as precompiler and precompiler options, and define
POSIX 1 to specify machine type.
- Results: a floating-point performance figure in MWIPS.
Xbench-0.2:
- What: measures X server performance.
- The xStones measure provided by xbench is a weighted average
of several tests indexed to an old Sun station with a
single-bit-depth display. Hmmm... it is questionable as a test of
modern X servers, but it's still the best tool I have found.
- Test procedure: compile with -O2. We specify a few options
for a shorter run:
./xbench -timegoal 3 > results/name_of_your_linux_box.out
To get the xStones rating, we must run an awk script; the
simplest way is to type make summary.ms. Check the summary.ms
file: the xStones rating for your system is in the last column of
the line with the machine name you specified during the test.
- Results: an X performance figure in xStones.
- Note: this test, as it stands, is outdated. It should be
re-coded.
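
Picking the rating out of summary.ms can also be done with a few
lines of Python. This is a rough sketch under one assumption that
is not confirmed here: that the columns of the relevant line are
separated by whitespace. The function name xstones_for is made up
for the example.

```python
def xstones_for(summary_text, machine):
    """Return the last column of the first line naming the machine.

    Assumes whitespace-separated columns; returns None if no line
    contains the machine name as a separate field.
    """
    for line in summary_text.splitlines():
        fields = line.split()
        if len(fields) > 1 and machine in fields:
            return fields[-1]
    return None


# Hypothetical usage:
#
#   with open("summary.ms") as f:
#       print(xstones_for(f.read(), "name_of_your_linux_box"))
```

The value is returned as text rather than a number, since the
exact layout of the file is not checked here.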
UnixBench version 4.01:
- What: measures overall Unix performance. This test will
exercise the file I/O and kernel multitasking performance.
- I have discarded all arithmetic test results, keeping only
the system-related test results.
- Test procedure: make with -O2. Execute with ./Run -1 (run
each test once). You will find the results in the
./results/report file. Calculate the geometric mean of the EXECL
THROUGHPUT, FILECOPY 1, 2, 3, PIPE THROUGHPUT, PIPE-BASED CONTEXT
SWITCHING, PROCESS CREATION, SHELL SCRIPTS and SYSTEM CALL
OVERHEAD indexes.
- Results: a system index.
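
The geometric mean of n indexes is the n-th root of their
product. As a minimal sketch, assuming the nine index values have
been copied by hand from ./results/report into a dictionary (the
names SYSTEM_TESTS and system_index are made up for this example,
and no attempt is made to parse the report file):

```python
import math

# The nine system-related tests kept for the LBT system index.
SYSTEM_TESTS = (
    "EXECL THROUGHPUT",
    "FILECOPY 1",
    "FILECOPY 2",
    "FILECOPY 3",
    "PIPE THROUGHPUT",
    "PIPE-BASED CONTEXT SWITCHING",
    "PROCESS CREATION",
    "SHELL SCRIPTS",
    "SYSTEM CALL OVERHEAD",
)


def system_index(indexes):
    """Geometric mean of the nine system-related UnixBench indexes."""
    missing = [name for name in SYSTEM_TESTS if name not in indexes]
    if missing:
        raise KeyError(f"missing index values: {missing}")
    values = [float(indexes[name]) for name in SYSTEM_TESTS]
    if any(v <= 0 for v in values):
        raise ValueError("index values must be positive")
    # Mean of the logarithms avoids overflow of the raw product.
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

Any arithmetic test results present in the dictionary are simply
ignored, in keeping with the selection described above.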
BYTE Magazine's BYTEmark benchmarks:
- What: provides a good measure of CPU performance. Here
is an excerpt from the documentation: "These benchmarks are
meant to expose the theoretical upper limit of the CPU, FPU,
and memory architecture of a system. They cannot measure video,
disk, or network throughput (those are the domains of a
different set of benchmarks). You should, therefore, use the
results of these tests as part, not all, of any evaluation of a
system."
- I have discarded the FPU test results since the Whetstone
test is just as representative of FPU performance.
- I have split the integer tests in two groups: those more
representative of memory-cache-CPU performance and the CPU
integer tests.
- Test procedure: make with -O2. Run the test with
./nbench > myresults.dat or similar. Then, from
myresults.dat, calculate the geometric mean of the STRING SORT,
ASSIGNMENT and BITFIELD test indexes; this is the memory
index. Calculate the geometric mean of the NUMERIC SORT, IDEA,
HUFFMAN and FP EMULATION test indexes; this is the integer
index.
- Results: a memory index and an integer index calculated
as explained above.
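
The two indexes can be computed as follows. This is a sketch
only: it assumes the per-test index values have been copied by
hand from myresults.dat into a dictionary, and the function name
bytemark_indexes is made up for the example. It relies on
statistics.geometric_mean from the Python standard library
(Python 3.8 or later).

```python
from statistics import geometric_mean

# Grouping of the BYTEmark integer tests used by the LBT.
MEMORY_TESTS = ("STRING SORT", "ASSIGNMENT", "BITFIELD")
INTEGER_TESTS = ("NUMERIC SORT", "IDEA", "HUFFMAN", "FP EMULATION")


def bytemark_indexes(results):
    """Return (memory index, integer index) from per-test indexes."""
    memory = geometric_mean([results[name] for name in MEMORY_TESTS])
    integer = geometric_mean([results[name] for name in INTEGER_TESTS])
    return memory, integer
```

FPU test results, if present in the dictionary, play no part in
either figure.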
The ideal benchmark suite would run in a few minutes, with
synthetic benchmarks testing every subsystem separately and
application benchmarks providing results for different
applications. It would also automatically generate a complete
report and possibly email the report to a central database on
the Web.
We are not really interested in portability here, but it should
at least run on all recent (> 2.0.0) versions and flavours
(i386, Alpha, Sparc...) of Linux.
If anybody has any idea about benchmarking network performance in
a simple, easy and reliable way, with a short (less than 30
minutes to set up and run) test, please contact me.
Besides the tests, the benchmarking procedure would not be
complete without a form describing the setup, so here it is
(following the guidelines from comp.benchmarks.faq):
LINUX BENCHMARKING TOOLKIT REPORT FORM
CPU
===
Vendor:
Model:
Core clock:
Motherboard vendor:
Mbd. model:
Mbd. chipset:
Bus type:
Bus clock:
Cache total:
Cache type/speed:
SMP (number of processors):
RAM
====
Total:
Type:
Speed:
Disk
====
Vendor:
Model:
Size:
Interface:
Driver/Settings:
Video board
===========
Vendor:
Model:
Bus:
Video RAM type:
Video RAM total:
X server vendor:
X server version:
X server chipset choice:
Resolution/vert. refresh rate:
Color depth:
Kernel
======
Version:
Swap size:
gcc
===
Version:
Options:
libc version:
Test notes
==========
RESULTS
=======
Linux kernel 2.0.0 Compilation Time: (minutes and seconds)
Whetstones: results are in MWIPS.
Xbench: results are in xStones.
Unixbench Benchmarks 4.01 system INDEX:
BYTEmark integer INDEX:
BYTEmark memory INDEX:
Comments*
=========
* This field is included for possible interpretations of the results, and as
such, it is optional. It could be the most significant part of your report,
though, especially if you are doing comparative benchmarking.
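
A few of the software fields in the form can be filled in
automatically. The following Python sketch is an illustration
only and rests on assumptions that may not hold on every system:
that gcc is on the PATH and accepts -dumpversion, and that
/proc/meminfo has a SwapTotal line (as on current kernels; older
kernels used a different layout). The function names are made up
for this example, and anything that cannot be determined is
reported as "unknown" so it can be completed by hand.

```python
import platform
import subprocess


def gcc_version():
    """Return the gcc version string, or "unknown" if gcc cannot be run."""
    try:
        out = subprocess.run(["gcc", "-dumpversion"],
                             capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return out.stdout.strip() or "unknown"


def swap_total(meminfo_path="/proc/meminfo"):
    """Return the SwapTotal value from meminfo, or "unknown"."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("SwapTotal:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return "unknown"


def software_section():
    """Build the Kernel and gcc parts of the report form as text."""
    return "\n".join([
        "Kernel",
        "======",
        f"Version: {platform.release()}",
        f"Swap size: {swap_total()}",
        "gcc",
        "===",
        f"Version: {gcc_version()}",
    ])
```

The hardware fields (CPU, motherboard, disk, video board) are
left out on purpose, since they are best copied from the vendor
documentation.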
Testing network performance is a challenging task since it
involves at least two machines, a server and a client machine,
hence twice the time to set up, many more variables to control,
and so on. On an Ethernet network, I guess your best bet would be
the ttcp package. (to be expanded)
SMP tests are another challenge, and any benchmark specifically
designed for SMP testing will have a hard time proving itself
valid in real-life settings, since algorithms that can take
advantage of SMP are hard to come by. It seems later versions of
the Linux kernel (> 2.1.30 or around that) will do
"fine-grained" multiprocessing, but I have no more information
than that for the moment.
According to David Niemi, "... shell8 [part of the
Unixbench 4.01 benchmarks] does a good job at comparing similar
hardware/OS in SMP and UP modes."