Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules

Reinhold P. Weicker
Siemens AG, E STE 35
Postfach 3240
D-8520 Erlangen
Germany (West)

The Dhrystone benchmark program [1] has become a popular benchmark for
CPU/compiler performance measurement, in particular in the area of
minicomputers, workstations, PCs and microprocessors. It apparently
satisfies a need for an easy-to-use integer benchmark: it gives a first
performance indication that is more meaningful than MIPS numbers,
which, in their literal meaning (million instructions per second),
cannot be compared across different instruction sets (e.g. RISC vs.
CISC). With the increasing use of the benchmark, it seems necessary to
reconsider it and to check whether it can still fulfill this function.
Version 2 of Dhrystone is the result of such a re-evaluation; it has
been made for two reasons:

o Dhrystone has been published in Ada [1], and versions in Ada, Pascal
  and C have been distributed by Reinhold Weicker via floppy disk.
  However, the version used most often for benchmarking has been the
  one made by Rick Richardson through a separate translation from the
  Ada version into C; this is the version that was distributed via the
  UNIX network Usenet [2].

  There is an obvious need for a common C version of Dhrystone, since C
  is at present the most popular system programming language for the
  class of systems (microcomputers, minicomputers, workstations) where
  Dhrystone is used most. There should be, as far as possible, only
  one C version of Dhrystone, so that results can be compared without
  restrictions. In the past, the C versions distributed by Rick
  Richardson (Version 1.1) and by Reinhold Weicker had small (though
  not significant) differences.

  Together with the new C version, the Ada and Pascal versions have
  been updated as well.

o As far as possible without changes to the Dhrystone statistics,
  optimizing compilers should be prevented from removing significant
  statements. It has turned out in the past that optimizing compilers
  suppressed code generation for too many statements (by "dead code
  removal" or "dead variable elimination"). This has led to the danger
  that benchmarking results obtained by a naive application of
  Dhrystone - without inspection of the code that was generated - could
  become meaningless.

The overall policy for version 2 has been that the distribution of
statements, operand types and operand locality described in [1] should
remain unchanged as much as possible. (Very few changes were
necessary; their impact should be negligible.) Also, the order of
statements should remain unchanged. Although I am aware of some
critical remarks on the benchmark - I agree with several of them - and
know some suggestions for improvement, I didn't want to change the
benchmark into something different from what has become known as
"Dhrystone"; the confusion generated by such a change would probably
outweigh the benefits. If I were to write a new benchmark program, I
wouldn't give it the name "Dhrystone", since this name denotes the
program published in [1]. However, I do recognize the need for a larger
number of representative programs that can be used as benchmarks;
users should always be encouraged to use more than just one benchmark.

The new versions (version 2.1 for C, Pascal and Ada) will be
distributed as widely as possible. (Version 2.1 differs from version
2.0, distributed via the UNIX network Usenet in March 1988, only in a
few corrections for minor deficiencies found by users of version 2.0.)
Readers who want to use the benchmark for their own measurements can
obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX
format) from the author.

In general, version 2 follows - in the parts that are significant for
performance measurement, i.e. within the measurement loop - the
published (Ada) version and the C versions previously distributed.
Where the versions distributed by Rick Richardson [2] and Reinhold
Weicker have been different, it follows the version distributed by
Reinhold Weicker. (However, the differences have been so small that
their impact on execution time has in all likelihood been negligible.)
The initialization and UNIX instrumentation part - which had been
omitted in [1] - mostly follows the ideas of Rick Richardson [2].
However, changes in the initialization part and in the printing of
the result have no impact on performance measurement, since they are
outside the measurement loop. As a concession to older compilers,
names in the C version have been made unique within the first 8
characters.

The original publication of Dhrystone did not contain any statements
for time measurement, since they are necessarily system-dependent.
However, it turned out that it is not enough just to enclose the main
procedure of Dhrystone in a loop and to measure the execution time. If
the variables that are computed are not used somehow, there is the
danger that the compiler considers them "dead variables" and
suppresses code generation for some of the statements. Therefore, in
version 2 all variables of "main" are printed at the end of the
program. This also permits a plausibility check for correct execution
of the benchmark.

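The following C fragment is a minimal sketch of why the final values are printed. It is not taken from the Dhrystone sources: the function name run_and_report and the loop body are hypothetical, and only the variable names echo the benchmark. Once the values leave the program through printf, the code that computes them can no longer be discarded as dead.

```c
#include <stdio.h>

/* Sketch only: a stand-in for the Dhrystone measurement loop.  The
   variable names echo the benchmark, but the computation is invented. */
int run_and_report(int number_of_runs)
{
    int int_1_loc = 0;
    int int_2_loc = 0;

    for (int run_index = 1; run_index <= number_of_runs; ++run_index) {
        int_1_loc = 2;
        int_2_loc = int_1_loc * 3 + run_index % 7;
    }

    /* Printing the final values gives them an observable use.  Without
       this, a compiler could treat both variables as dead and omit the
       loop body entirely.  (It may still simplify the loop, e.g. compute
       only the values of the last iteration.) */
    printf("Int_1_Loc: %d (should be: 2)\n", int_1_loc);
    printf("Int_2_Loc: %d\n", int_2_loc);
    return int_2_loc;
}
```

Calling run_and_report(10), for example, prints both values and returns 9. As the comment notes, printing does not force every iteration to be executed as written; it only keeps the computation from disappearing altogether.
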
At several places in the benchmark, code has been added, but only in
branches that are not executed. The intention is to prevent optimizing
compilers from moving code out of the measurement loop, or from
removing code altogether. Statements that are executed have been
changed in very few places only. In these cases, only the role of some
operands has been changed, and it was made sure that the numbers
defining the "Dhrystone distribution" (distribution of statements,
operand types and locality) still hold as far as possible. Except
with sophisticated optimizing compilers, execution times for version
2.1 should be the same as for previous versions.

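The non-executed-branch technique can be illustrated with a hedged C sketch. The names guarded_loop and never_true are hypothetical; the flag stands in for a condition the compiler cannot evaluate at compile time (in Dhrystone itself, the result of a function in another compilation unit). Because the never-taken branch may overwrite str_2_loc, the constant string assignment at the top of the loop is no longer loop-invariant.

```c
#include <string.h>

/* Sketch (hypothetical names): str_2_loc is assigned the same constant
   on every iteration, so on its own the strcpy is loop-invariant and an
   optimizer could move it out of the loop.  The guarded branch is not
   taken when never_true is 0, but since the compiler cannot rule it
   out, str_2_loc may differ at the start of the next iteration, and
   simple loop-invariant code motion can no longer hoist the strcpy. */
int guarded_loop(int number_of_runs, int never_true, char str_2_loc[31])
{
    int int_glob = 0;

    for (int run_index = 1; run_index <= number_of_runs; ++run_index) {
        strcpy(str_2_loc, "DHRYSTONE PROGRAM, 2'ND STRING");
        if (never_true) {
            /* Not executed in a normal run; it exists only to make the
               values of str_2_loc and int_glob depend on the loop. */
            strcpy(str_2_loc, "DHRYSTONE PROGRAM, 3'RD STRING");
            int_glob = run_index;
        }
    }
    return int_glob;
}
```

In a normal run never_true is 0, so the guarded statements are never executed; what changes is only the compiler's view of which values the loop can produce.
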
Because of the self-imposed limitation that the order and distribution
of the executed statements should not be changed, there are still cases
where optimizing compilers may not generate code for some statements.
To a certain degree, this is unavoidable for small synthetic
benchmarks. Users of the benchmark are advised to check the code
listings to verify that code is generated for all statements of
Dhrystone.

Contrary to the suggestion in the published paper and its realization
in the previously distributed versions, no attempt has been made to
subtract the time for the measurement loop overhead. (This calculation
has proven difficult to implement correctly, and its omission makes
the program simpler.) However, since the loop check is now part of the
benchmark, this does have an impact - though a very minor one - on the
distribution statistics, which have been updated for this version.

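A minimal timing sketch in C, assuming ISO C clock() as the timer and a hypothetical sample_run() as a placeholder for one pass through the benchmark. In line with version 2, the measured interval includes the loop control itself, and no separately measured empty-loop time is subtracted; the result is simply the number of runs divided by the elapsed time.

```c
#include <time.h>

/* Hypothetical placeholder for one pass through the measurement loop. */
volatile long sample_counter = 0;

void sample_run(void)
{
    sample_counter++;
}

/* Returns runs per second, or 0.0 if the interval was too short to
   measure or the timer failed; the caller should then increase
   number_of_runs.  The loop overhead is deliberately included. */
double dhrystones_per_second(long number_of_runs, void (*one_run)(void))
{
    clock_t begin = clock();
    for (long run_index = 1; run_index <= number_of_runs; ++run_index)
        one_run();
    clock_t end = clock();

    if (begin == (clock_t)-1 || end == (clock_t)-1 || end <= begin)
        return 0.0;

    double seconds = (double)(end - begin) / CLOCKS_PER_SEC;
    return (double)number_of_runs / seconds;
}
```

Using processor time via clock() is only one possible choice; as noted above, time measurement is necessarily system-dependent.
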
This section describes all changes that affect the measurement loop
and are not just renamings of variables. All remarks refer to the C
version; the other language versions have been updated similarly.

In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:

o In procedure "main", three statements have been added in the
  non-executed "then" part of the statement
    if (Enum_Loc == Func_1 (Ch_Index, 'C'))
  They are
    strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
    Int_2_Loc = Run_Index;
    Int_Glob = Run_Index;
  The string assignment prevents movement of the preceding assignment
  to Str_2_Loc (5th statement of "main") out of the measurement loop.
  (This probably will not happen for the C version, but it did happen
  with another language and compiler.) The assignment to Int_2_Loc
  prevents value propagation for Int_2_Loc, and the assignment to
  Int_Glob makes the value of Int_Glob possibly dependent on the value
  of Run_Index.

o In the three arithmetic computations at the end of the measurement
  loop in "main", the roles of some variables have been exchanged, to
  prevent the division from simply cancelling out the multiplication
  as it did in [1]. A very smart compiler might have recognized this
  and suppressed code generation for the division.

o For Proc_2, no code has been changed, but the values of the actual
  parameter have changed due to changes in "main".

o In Proc_4, the second assignment has been changed from
    Bool_Loc = Bool_Loc | Bool_Glob;
  to
    Bool_Glob = Bool_Loc | Bool_Glob;
  It now assigns a value to a global variable instead of the local
  variable Bool_Loc; Bool_Loc would otherwise be a "dead variable"
  that is not used afterwards.

o In Func_1, the statement
    Ch_1_Glob = Ch_1_Loc;
  was added in the non-executed "else" part of the "if" statement, to
  prevent the suppression of code generation for the assignment to
  Ch_1_Loc.

o In Func_2, the second character comparison statement has been changed
  to
    if (Ch_Loc == 'R')
  ('R' instead of 'X') because a comparison with 'X' is implied in the
  preceding "if" statement.

  Also in Func_2, the statement
    Int_Glob = Int_Loc;
  has been added in the non-executed part of the last "if" statement,
  in order to prevent Int_Loc from becoming a dead variable.

o In Func_3, a non-executed "else" part has been added to the "if"
  statement. While the program would not be incorrect without this
  "else" part, it is considered bad programming practice if a function
  can be left without a return value.

  To compensate for this change, the (non-executed) "else" part in the
  "if" statement of Proc_3 was removed.

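The cancellation problem in the arithmetic computations can be illustrated with a small C sketch. The functions cancelling and non_cancelling are hypothetical and are not the benchmark's statements. When the same operand is used for the multiplication and the division, the division undoes the multiplication; since signed overflow (and division by zero) is undefined in C, a compiler is permitted to fold (a * b) / b into a. Exchanging operand roles so that the divisor is a different variable removes this shortcut.

```c
/* The division undoes the multiplication: a compiler may reduce the
   body to just "return a;" and generate no division at all. */
int cancelling(int a, int b)
{
    int product = a * b;
    return product / b;
}

/* Operand roles exchanged: the divisor c is not the multiplier b, so
   the division cannot be eliminated algebraically. */
int non_cancelling(int a, int b, int c)
{
    int product = a * b;
    return product / c;
}
```
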
The distribution statistics have been changed only by the addition of
the measurement loop iteration (1 additional statement, 4 additional
local integer operands) and by the change in Proc_4 (one operand
changed from local to global). The distribution statistics in the
comment headers have been updated accordingly.

The string operations (string assignment and string comparison) have
not been changed, to keep the program consistent with the original
version.

There has been some concern that the string operations are
over-represented in the program, and that execution time is dominated
by these operations. This was true in particular when optimizing
compilers removed too much code in the main part of the program; this
effect should have been mitigated in version 2.

It should be noted that this is a language-dependent issue: Dhrystone
was first published in Ada, and with Ada or Pascal semantics, the time
spent in the string operations is, at least in all implementations
known to me, considerably smaller. In Ada and Pascal, assignment and
comparison of strings are operators defined in the language, and the
upper bounds of the strings occurring in Dhrystone are part of the type
information known at compilation time. The compilers can therefore
generate efficient inline code. In C, string assignments and
comparisons are not part of the language, so the string operations
must be expressed in terms of the C library functions "strcpy" and
"strcmp". (ANSI C allows an implementation to use inline code for these
functions.) In addition to the overhead caused by additional function
calls, these functions are defined for null-terminated strings whose
length is not known at compilation time; the function has to check
every byte for the termination condition (the null byte).

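The difference between C's null-terminated strings and the fixed-length strings of Ada or Pascal can be sketched in C as follows. copy_null_terminated is a naive strcpy-style loop, not a real library implementation; Str_30_Fixed and copy_fixed are hypothetical and only model what a Pascal or Ada compiler knows about a 30-character string.

```c
#include <string.h>

/* Naive strcpy-style copy: the length is unknown, so every byte copied
   is also tested against the terminating null byte. */
char *copy_null_terminated(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Hypothetical fixed-length string type: its size is a compile-time
   constant, so a copy needs no termination test per byte. */
typedef struct {
    char chars[30];
} Str_30_Fixed;

void copy_fixed(Str_30_Fixed *dst, const Str_30_Fixed *src)
{
    /* Structure assignment: the compiler knows it is exactly 30 bytes
       and may expand it into a few word-sized moves. */
    *dst = *src;
}
```

An efficient C library will typically do much better than this byte loop, but it still has to locate the terminating null byte in some way.
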
Obviously, a C library that includes efficiently coded "strcpy" and
"strcmp" functions helps to obtain good Dhrystone results. However, I
don't think that this is unfair, since string functions occur quite
frequently in real programs (editors, command interpreters, etc.). If
the string functions are implemented efficiently, this helps real
programs as well as benchmark programs.

I admit that the string comparison in Dhrystone terminates later (after
scanning 20 characters) than most string comparisons in real programs.
For consistency with the original benchmark, I didn't change the
program despite this weakness.

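A small C sketch makes the 20-character figure concrete. compare_counting is a hypothetical strcmp-like helper that also reports how many character positions it examined. The two strings are assumed to be the benchmark's first and second string constants, which follow the same pattern as the "3'RD STRING" constant quoted in the changes to "main".

```c
#include <stddef.h>

/* Sketch: compares like strcmp and stores in *scanned the number of
   character positions examined, including the position where the
   decision was made. */
int compare_counting(const char *s1, const char *s2, size_t *scanned)
{
    size_t n = 0;
    while (s1[n] == s2[n] && s1[n] != '\0')
        ++n;
    *scanned = n + 1;
    return (unsigned char)s1[n] - (unsigned char)s2[n];
}
```

For these two strings the first 19 characters ("DHRYSTONE PROGRAM, ") are equal, and the decision falls at the 20th, where '1' and '2' differ.
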
When Dhrystone is used, the following "ground rules" apply:

o Separate compilation (Ada and C versions)

  As mentioned in [1], Dhrystone was written to reflect actual
  programming practice in systems programming. The division into
  several compilation units (5 in the Ada version, 2 in the C version)
  is intended, as is the distribution of inter-module and intra-module
  subprogram calls. Although on many systems there will be no
  difference in execution time compared with a Dhrystone version where
  all compilation units are merged into one file, the rule is that
  separate compilation should be used. The intention is that real
  programming practice, where programs consist of several
  independently compiled units, should be reflected. This also implies
  that the compiler, while compiling one unit, has no information about
  the use of variables, register allocation etc. occurring in other
  compilation units. Although in real life compilation units will
  probably be larger, the intention is that these effects of separate
  compilation are modeled in Dhrystone.

  A few language systems have post-linkage optimization available
  (e.g., final register allocation is performed after linkage). This
  is a borderline case: post-linkage optimization involves additional
  program preparation time (although not as much as compilation in one
  unit), which may prevent its general use in practical programming.
  Since it defeats the intentions given above, I think it should not
  be used for Dhrystone.

  Unfortunately, ISO/ANSI Pascal does not contain language features for
  separate compilation. Although most commercial Pascal compilers
  provide separate compilation in some way, we cannot use it for
  Dhrystone, since such a version would not be portable. Therefore, no
  attempt has been made to provide a Pascal version with several
  compilation units.

o No procedure merging

  Although Dhrystone contains some very short procedures where
  execution would benefit from procedure merging (inlining, macro
  expansion of procedures), procedure merging is not to be used. The
  reason is that the percentage of procedure and function calls is part
  of the "Dhrystone distribution" of statements contained in [1]. This
  restriction does not hold for the string functions of the C version,
  since ANSI C allows an implementation to use inline code for these
  functions.

o Other optimizations are allowed, but they should be indicated

  It is often hard to draw an exact line between "normal code
  generation" and "optimization" in compilers: some compilers perform
  operations by default that are invoked in other compilers only when
  optimization is explicitly requested. Also, we cannot prevent people
  from trying to achieve benchmark results that look as good as
  possible. Therefore, optimizations performed by compilers - other
  than those listed above - are not forbidden when Dhrystone execution
  times are measured. Dhrystone is not intended to be non-optimizable,
  but to be optimizable to a similar degree as normal programs. For
  example, there are several places in Dhrystone where performance
  benefits from optimizations like common subexpression elimination,
  value propagation etc., but normal programs usually also benefit from
  these optimizations. Therefore, no effort was made to artificially
  prevent such optimizations. However, measurement reports should
  indicate which compiler optimization levels have been used, and
  reporting results with different levels of compiler optimization for
  the same hardware is encouraged.

o Default results are those without "register" declarations (C version)

  When Dhrystone results are quoted without additional qualification,
  they should be understood as results obtained without use of the
  "register" attribute. Good compilers should be able to make good use
  of registers even without explicit register declarations ([3],
  p. 193).

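The "no procedure merging" rule can be illustrated with a C sketch of a procedure of the kind the rule refers to. The names proc_short and call_many are hypothetical, loosely modeled on the short procedures of the C version. Within a single file, as here, an optimizing compiler would normally be free to inline such a function; under the ground rules this is not done for Dhrystone measurements (with GCC, for example, -fno-inline turns off inlining of ordinary functions).

```c
/* A very short procedure: two additions, so the cost of the call and
   return is comparable to the work done in the body. */
void proc_short(int int_1_par_val, int int_2_par_val, int *int_par_ref)
{
    int int_loc = int_1_par_val + 2;
    *int_par_ref = int_2_par_val + int_loc;
}

/* Calls proc_short repeatedly.  Procedure merging would replace each
   call by the two additions and thereby change the proportion of
   procedure calls that the Dhrystone distribution prescribes. */
int call_many(int number_of_calls)
{
    int result = 0;
    for (int i = 0; i < number_of_calls; ++i)
        proc_short(i, 3, &result);
    return result;
}
```
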
Of course, for experimental purposes, post-linkage optimization,
procedure merging and/or compilation in one unit can be done to
determine their effects. However, Dhrystone numbers obtained under
these conditions should be explicitly marked as such; "normal"
Dhrystone results should be understood as results obtained following
the ground rules listed above.

In any case, for serious performance evaluation, users are advised to
ask for code listings and to check them carefully. In this way, when
results for different systems are compared, the reader can get a
feeling for how much of the performance difference is due to compiler
optimization and how much is due to hardware speed.

The C version 2.1 of Dhrystone has been developed in cooperation with
Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from
the "Version 1.1" he previously distributed over the UNIX network
Usenet. Through his activity on Usenet, Rick Richardson has made a
very valuable contribution to the dissemination of the benchmark. I
also thank Chaim Benedelac (National Semiconductor), David Ditzel
(SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael
Saavedra-Barrera (UC at Berkeley) for their help with comments on
earlier versions of the benchmark.

[1] Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming
    Benchmark. Communications of the ACM 27, 10 (Oct. 1984), 1013-1030.

[2] Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text).
    Informal distribution via Usenet; last version known to me:
    Sept. 21, 1987.

[3] Brian W. Kernighan and Dennis M. Ritchie: The C Programming
    Language. Prentice-Hall, Englewood Cliffs (NJ), 1978.