gcocco software, inc
Benchmark: SPECjbb™2000

FAQ:
  1. What is SPECjbb2000 anyway?
  2. What does SPECjbb2000 measure?
  3. Why is SPECjbb2000 important?
  4. Why did gcocco software publish multiple SPECjbb2000 runs on the
    identical hardware configuration?
  5. Why do most hardware vendors give only one performance number for
    a hardware configuration?
  6. What is a JVM?
  7. What significance are the differing levels of JVM code?

Machine Tested:

A.   Sun Microsystems - Enterprise 420R

  1. Results generated with "Generally Available" Java Virtual Machine (JVM) Code
  2. Results generated with "Beta" Java Virtual Machine (JVM) Code
  3. Performance Range Variability from 6,000 to 18,000 ops/sec!
  4. Performance Boost using Intimate Shared Memory (ISM) is at least 8%.
  5. Run to Run Reproducibility
  6. Tricks/Tips/Cautions
  7. Net/Net - The Performance you should expect!



Questions and Answers:
  1. What is SPECjbb2000 anyway?
    SPECjbb2000 is a server benchmark that is loosely based on the TPC-C online transaction processing environment benchmark. The SPEC website notes that the "jbb" in SPECjbb2000 stands for "Java Business Benchmark". The benchmark has been totally implemented in Java and has been simplified by running with all the data in memory and no dependency on network I/O. This could be viewed as the "processor only" portion of an OLTP benchmark. SPECjbb2000 is "relatively easy" to run, since it is self contained and has a friendly test harness. For more information visit the SPECjbb2000 homepage.


  2. What does SPECjbb2000 measure?
    SPECjbb2000 is a test of a system's memory hierarchy, operating system and Java Virtual Machine (JVM) software. SPECjbb2000 involves no disk I/O or network I/O, and is thus a good candidate for showing one of perhaps many upper bounds of a system's SMP scaling capability. In order to illustrate good scaling, we need to discuss what scaling is and is not.

    Scaling Graphs
    [Graphs: (a) uni-processor; (b) 4-way SMP - Good; (c) 4-way SMP - Bad]

    1. Let's start with graph (a), the uni-processor. By definition, it does not "scale" at all! As you add more workload to a "completely busy single CPU", you simply add more job management overhead to the system as the operating system's job scheduler attempts to time slice each ready process. It is the scheduler's job to "fairly" allocate the resources of the processor(s) to each of the competing processes. As each additional job is added to the system, each job takes slightly more system resources to run than it would on a system where it has no competition for the CPU.

      SPECjbb2000 is normally run with multiple CPUs active, but if it were run on a uniprocessor, the system performance graph found in a "SPEC full disclosure" would look like graph (a). It has its maximum performance at "1 warehouse" of work, and slowly gets worse (downward slope of the performance line) as more warehouses are added.

    2. Let's look at graph (b), the normal case you will see in the "SPEC full disclosure reports". This exhibits near-linear improvement in system utilization when the first 4 warehouses of work are added to the 4-processor system. In the ideal case, each of the 4 warehouses of work completes in the same elapsed time it would take for 1 warehouse to complete if it were on a single dedicated processor. There is of course some overhead, but if the operating system, JVM AND the application (in this case the application is SPECjbb2000) are well written, you do actually see "near-linear" performance.

    3. Graph (c) shows a "very poorly" scaling application, which you will never find in a "SPEC full disclosure" submitted by a hardware vendor. There is a massive amount of overhead somewhere, so it would be prudent to figure that out before hanging that laundry out in public! For an example of at least "poor" scaling, see the example submitted by gcocco software for the Sun Enterprise 420R with version 1.2.2 of Java.

      To sum it up, an SMP which "scales well" gains almost a full processor's power with the addition of each CPU applied to the given task (up to saturation). Thus if one takes a single processor application and rewrites it to make use of all "n" processors of an SMP, one expects "n" times greater performance (in theory at least). In practice, one gets something less than "perfect scaling". How well one does indicates how well the overall hardware, operating systems software and application run as a combined system. Problems in any of the components can spell disaster to scaling results, as can the latencies introduced by network or disk I/O. This I/O latency issue is why benchmarks that measure scaling generally will not be doing any I/O. SPECjbb2000 does not do I/O.
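
The "n times greater performance" idea can be expressed as a scaling efficiency: the measured speedup divided by the ideal speedup of "n". The following is only a minimal sketch; the function name and the throughput figures in the example are hypothetical and are not taken from any SPEC result.

```python
def scaling_efficiency(ops_one_cpu: float, ops_n_cpus: float, n_cpus: int) -> float:
    """Return measured speedup as a fraction of the ideal n-fold speedup.

    1.0 means "perfect scaling"; values well below 1.0 point to overhead
    somewhere in the hardware, operating system, JVM or application.
    """
    if n_cpus < 1 or ops_one_cpu <= 0:
        raise ValueError("need at least one CPU and a positive baseline")
    speedup = ops_n_cpus / ops_one_cpu
    return speedup / n_cpus


# Hypothetical example: 1,000 ops/sec on 1 CPU, 3,600 ops/sec on 4 CPUs.
efficiency = scaling_efficiency(1000.0, 3600.0, 4)
print(f"{efficiency:.2f}")  # 0.90, i.e. 90% of perfect 4-way scaling
```

A result near 1.0 corresponds to graph (b); a result far below 1.0 corresponds to graph (c).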

      As with any benchmark results, utmost care must be used in evaluating scaling benchmarks since they are extremely sensitive to the levels of O/S code, Java code and all patches that have been applied. This range of performance can be seen in the section below which describes a performance range of 6,000 to 18,000 ops/sec on the same machine.


  3. Why is SPECjbb2000 important?
    If you are using "Java based" software as a major middleware component in your business solution (custom or vendor provided), this benchmark should be useful and important to you. It is a robust test of the JVM and operating system under a considerable load. For more information about the benchmark visit the SPECjbb2000 homepage.


  4. Why did gcocco software publish multiple SPECjbb2000 runs on the identical hardware configuration?
    As a software consulting firm, gcocco software is interested in a deep understanding of what is truly affecting overall system performance. Such an understanding can only be arrived at by analyzing a range of datapoints. A few examples of such range questions follow:
    1. What happens to the performance if an important parameter like "heap size" is varied?
    2. What happens to the run if new "beta" code is used as a means of getting the best results for this benchmark?
    3. What happens if you use "aggressive" tuning parameters that you would never risk using in production?


  5. Why do most hardware vendors give only one performance number for a hardware configuration?
    Good Question! All vendors are very protective of their performance numbers. The vendor's primary goal is to have the fastest number in the list. Failing that, a vendor attempts to price their offering to be the best price/performance solution for a given class of machine, then markets it as such. One can easily see why only certain numbers are published. However, the most interesting thing to try and understand is why certain numbers aren't released by a vendor!



  6. What is a JVM?
    A JVM is an application environment which supports the "write once, run anywhere" Java code developed by Sun. This may be an interpretive environment or one of the more sophisticated JIT (Just In Time) compile environments, which take compiled Java bytecode and turn it into native executable code "on the fly". This is the heart and soul of what SPECjbb2000 is measuring for performance. For much more detailed information, see the Java Virtual Machine Specification, second edition, on the java.sun.com site.


  7. What significance are the differing levels of JVM code?
    Each new release of Java brings with it new functionality and hopefully better performance! This can be up to a factor of 3 performance improvement from one release to another on identical hardware. For the SPECjbb2000 benchmark to be successful, the designers had to limit the functionality of the Java API used to be compatible with all the JVMs that were available at the time the benchmark was created. Since then, new bells and whistles have been added, and by definition most are not being measured. If they are a direct language change or addition, the benchmark will not use them. If it is an "undercover change" like compiler optimization tricks and techniques, the benchmark will benefit from them.

    You will note that some vendors have tested results from several levels of the JVM on the same machine. In some cases there is significant performance change from one level of the JVM to the next, and in other cases the performance doesn't change very much!



A. Machine Tested: Sun Microsystems - Enterprise 420R

System       Sun Enterprise 420R
CPU          (4) - UltraSPARC II
Clock Speed  450 MHz
Memory       4 GB

All results quoted or displayed in this section have been reviewed and accepted by the OSG Java Subcommittee of the Standard Performance Evaluation Corporation (SPEC®).

A hyperlink is provided on each result to the "SPEC full disclosure report", which is located on the OSG JBB2000 Web-site.




a. Results with the "Generally Available (GA)" JVM Code:

Run  Tuning        Heap     CPUs  JVM        Accepted     Result  Report
ga0  "high opt"    2010 MB  4     1.2.2_07a  13-Dec-2001  6,153   SPEC full disclosure
ga1  "high opt"    512 MB   4     1.3.0_02   19-Apr-2001  8,850   SPEC full disclosure
ga2  "high opt"    1024 MB  4     1.3.0_02   31-May-2001  8,893   SPEC full disclosure
ga3  "high opt"    2010 MB  4     1.3.0_02   31-May-2001  9,069   SPEC full disclosure
ga4  "aggressive"  2010 MB  4     1.3.0_02   31-May-2001  9,351   SPEC full disclosure

Table 1. Effect of heap size and tuning options for Java 1.3.0 and Java 1.2.2. All runs were done on the same Enterprise 420R system with no change in physical configuration.

"Generally Available" code is that which has the full faith and support of the vendor company. This code is production ready and recommended by the vendor. Table 1 above, gives us insight into the questions posed above about why it is interesting to look at a range of results!

  1. Tuning: definitions.

    The first data column of Table 1 describes the type of tuning used for each row of the benchmark.  gcocco software describes "high opt" tuning (ga0, ga1, ga2 and ga3) as highly optimized for a production environment, but perhaps "normal" in a benchmarker's eye. In "high opt", one always turns on as many "helpful", reasonable-risk parameters as possible. Parameters which are described as "experimental" or "subject to change" are not used. The exact set of parameters that were used in all the runs discussed on this page can be found by viewing the "SPEC full disclosure report" link in the last column of each result table. There is a full disclosure for each benchmark discussed.

    Row (ga4) of Table 1 is called "aggressive", and makes use of some "tricks of the trade" specific to benchmarking on Solaris. The "aggressive" parameters for tuning memory are not for the faint of heart. The ones used for these runs are described in the Sun online manual: "autoup", "tune_t_fsflush" and "rechoose_interval". For some light reading, go ahead and research this set of "tunables" (sic) at Sun's website. Be sure to look up what Sun describes in their manual as "Interface Stability". (You'll have to page down a few pages on the preceding link.)

    OK, so now that you know everything there is to know about these parameters, why should you be concerned? You should be concerned because all three of the parameters listed above, which are used in all of Sun's published runs to date, are described in the Sun online manual as "Unstable"! OK, so "Unstable" in this sense means "subject to change", but nonetheless, you can be sure these are not parameters one should use casually (if at all)!

    Another important point to note is that line (ga4) in Table 1 above used the three "aggressive" parameters described above. Compared with the otherwise identical "high opt" run (ga3), that run yielded [(9351-9069)/9069] = 0.03 or a 3% performance gain. So, in this case, is the potential gain of 3% worth the risk of dealing with a changing, "unstable" interface? We would recommend that you use only parameters in production that are described as "stable" unless you have a very good reason to do otherwise.
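
The gain quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the helper name is our own, and the two results are the ga3 and ga4 rows of Table 1.

```python
def relative_gain(baseline_ops: float, tuned_ops: float) -> float:
    """Fractional improvement of the tuned run over the baseline run."""
    return (tuned_ops - baseline_ops) / baseline_ops


GA3_HIGH_OPT = 9069    # ops/sec, "high opt", 2010 MB heap
GA4_AGGRESSIVE = 9351  # ops/sec, "aggressive", 2010 MB heap

gain = relative_gain(GA3_HIGH_OPT, GA4_AGGRESSIVE)
print(f"{gain:.1%}")  # 3.1%
```

Dividing by the baseline (ga3) rather than by the tuned result keeps the figure comparable with the other gains quoted on this page.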

  2. Heap: The effect of heap size for the SPECjbb2000 workload

    A heap is a chunk of virtual memory allocated to a program for its local storage needs. How efficiently the program (in this case the JVM) manipulates that storage has a huge effect on the program's efficiency and scalability. Three of the runs (ga1, ga2, ga3) are identical except for the heap size, which varies from a modest 512 MB to almost 2 GB (the maximum allowed for this 32-bit Java Virtual Machine). Although most would expect the performance to be dramatically better with much more heap, that is not the case: [(9069-8850)/8850] = 0.02 or about 2%.

    So, is it worth it to dedicate 4 times as much memory to one application for the heap and only gain about 2% in performance? We don't think so. Therefore, in this case, we would recommend running with a 512 MB heap and saving some money on real memory!
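
To make the trade-off concrete, the heap sweep can be tabulated against the smallest heap. This is a sketch only; the data are the ga1, ga2 and ga3 rows of Table 1, and the variable names are our own.

```python
# (heap size in MB, result in ops/sec) for runs ga1, ga2 and ga3 of Table 1.
HEAP_SWEEP = [(512, 8850), (1024, 8893), (2010, 9069)]

base_heap, base_ops = HEAP_SWEEP[0]
summary = []
for heap_mb, ops in HEAP_SWEEP:
    memory_ratio = heap_mb / base_heap     # how much more heap was dedicated
    gain = (ops - base_ops) / base_ops     # throughput gained for that heap
    summary.append((heap_mb, memory_ratio, gain))
    print(f"{heap_mb:>5} MB  {memory_ratio:.1f}x heap  {gain:+.1%} ops/sec")
```

On these figures, nearly four times the heap buys a throughput gain of about 2%.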

  3. JVM: Java Virtual Machine Level

    Java Virtual Machine (JVM) level 1.3.0_02 was the "then current", downloadable version of Sun's recommended "Generally Available" code. It was used for runs ga1-ga4 in Table 1.

  4. Accepted: SPECjbb2000 Committee acceptance date

    This is the date of the SPECjbb2000 review meeting for this benchmark run. All Results quoted or displayed in this section have been reviewed and accepted by the OSG Java Subcommittee of the Standard Performance Evaluation Corporation (SPEC®).

  5. Result: operations/second (ops/s)

    The only approved metric used to compare SPECjbb2000 results from vendor to vendor is operations/second!




b. Results generated with the "Beta" JVM

Run  Tuning                 Heap     CPUs  JVM          Accepted     Result  Report
ß1   "high opt"             1024 MB  4     1.3.1 beta   03-May-2001  11,157  SPEC full disclosure
ß1a  "high opt"             1024 MB  4     1.3.1_01     29-Nov-2001  11,672  SPEC full disclosure
ß2   "aggressive"           3786 MB  4     1.3.1 beta   03-May-2001  15,990  SPEC full disclosure
ß3   "aggressive" plus ISM  3786 MB  4     1.3.1 beta   n/a          fails   ism core dump
ß4   "high opt"             2010 MB  4     1.4.0-beta3  10-Jan-2002  18,489  SPEC full disclosure

Table 2. Effect of tuning options for Java 1.3.1 Beta and 1.4.0 Beta.

Now let's look at the heart of getting really good benchmark numbers. In this case, it was done using "Beta" code. Beta code is sometimes experimental and may eventually make it into a product, but it may not! In this case, as ß1a shows, the actual performance achieved was slightly better when the product finally came out.

  1. Tuning: definitions.

    In recommending the use of the Beta JVM, Sun also recommended parameter changes for us to try. We were asked to use the new experimental heap parameter for these "aggressively tuned" runs. This parameter is appropriately named the "-XX:+AggressiveHeap" parameter. This and other standard as well as non-standard parameters are described on the java.sun.com site. The "-XX" in the parameter name means the parameter is "Non-Standard" and "is subject to change in future releases". The "-XX:+AggressiveHeap" parameter allocates "almost" all of the available memory for the heap. The JVM then does "lazy" heap management to defer all work to a time outside of the benchmark measurement window. The result is certainly much faster than expected: [(15,990-8,893)/8,893] = 0.80 or 80%. If we use the newer version 1.4.0 beta, we see an even better speedup, which is [(18,489-8,893)/8,893] = 1.08 or 108% faster. If we were the hardware vendor for these runs, we would be quoting this as our "single number" indicating the performance of this machine!

    Now let's learn about the down-side. Read about the use of the Aggressive Heap Option on the Java site. Sun strongly cautions one about using this parameter except where the situation warrants and you have the skill to do so. It's explicitly not for the "casual user".
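
As an illustration of how the two tuning styles differ on the command line, here is a minimal sketch that assembles a JVM invocation. Only "-XX:+AggressiveHeap" comes from the discussion above; "-Xms" and "-Xmx" are the standard heap-size options. The main class "spec.jbb.JBBmain", the "-propfile SPECjbb.props" arguments and the choice of setting initial and maximum heap to the same value are assumptions made for illustration. The exact parameters for each run are in its SPEC full disclosure report.

```python
from typing import List, Optional, Sequence


def build_jvm_command(
    heap_mb: Optional[int] = None,
    aggressive_heap: bool = False,
    main_class: str = "spec.jbb.JBBmain",  # assumed benchmark entry point
    app_args: Sequence[str] = ("-propfile", "SPECjbb.props"),  # assumed
) -> List[str]:
    """Assemble an argument list for launching a benchmark run."""
    cmd = ["java"]
    if heap_mb is not None:
        # Assumption: fix initial and maximum heap to the same size.
        cmd += [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]
    if aggressive_heap:
        # Non-standard option: subject to change in future releases.
        cmd.append("-XX:+AggressiveHeap")
    cmd.append(main_class)
    cmd.extend(app_args)
    return cmd


high_opt = build_jvm_command(heap_mb=1024)
aggressive = build_jvm_command(aggressive_heap=True)
print(" ".join(high_opt))
print(" ".join(aggressive))
```

The list form can be handed to subprocess.run() unchanged, which avoids shell quoting problems.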

  2. Heap: actual heap size for the run

    This is always an important parameter. Compare the first "high opt" run (ß1) in Table 2 with run (ga2) from Table 1, which has the same heap size (1024 MB). The median size heap was chosen for comparison of runs with "GA" code to the runs with "Beta" code. The comparison was also done on runs with "high opt" tuning parameters, similar to what one would be running in a highly tuned "production" environment. The performance gain over the "stable" and supported release is a very nice improvement of about [(11,157-8,893)/8,893] = 0.25 or 25%. So far we like it!

    Now let's go back to the on-line manual, and look at this Intimate Shared Memory (ISM) option.

    Those of us who are OS people usually just call these "pinned" or "locked down" virtual pages. That means no one else can use this critical memory resource. Since no one else can "move" them, the operating system can skip lots of checks and gain on the order of a 5%-10% performance boost on the workload. This is worthwhile, and is sometimes done for servers which host performance critical workloads. However, if you can't or won't use ISM, remember to apply some performance degradation to any SPECjbb number you are looking at that Sun has produced.

    Run  Tuning                 Heap     CPUs  JVM    Accepted     Result   Report
    S1   "aggressive" plus ISM  3900 MB  8     1.3.1  05-Apr-2001  43,353   SPEC full disclosure
    S2   "aggressive" plus ISM  3900 MB  12    1.3.1  05-Apr-2001  62,463   SPEC full disclosure
    S3   "aggressive" plus ISM  3900 MB  24    1.3.1  05-Apr-2001  109,146  SPEC full disclosure

    Table 3. All of the published benchmarks by Sun use the parameters that have been discussed above.
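
One way to "apply some performance degradation" to the results in Table 3 is to divide each published number by one plus an assumed ISM boost. This is a rough sketch, not a measured correction: the 5% and 10% bounds are the range quoted above, and the assumption that the boost is the same on larger machines is ours.

```python
def without_ism(result_with_ism: float, ism_boost: float) -> float:
    """Estimate the result had ISM not been used, for an assumed boost.

    ism_boost is a fraction, e.g. 0.05 for a 5% boost.
    """
    return result_with_ism / (1.0 + ism_boost)


# CPUs -> published ops/sec (Table 3, all run "aggressive" plus ISM).
SUN_RESULTS = {8: 43353, 12: 62463, 24: 109146}

for cpus, ops in SUN_RESULTS.items():
    low = without_ism(ops, 0.10)   # assuming ISM was worth 10%
    high = without_ism(ops, 0.05)  # assuming ISM was worth 5%
    print(f"{cpus:>2} CPUs: {ops:,} published, roughly {low:,.0f} to {high:,.0f} without ISM")
```

The output is an estimate only; the true effect of ISM on those systems is not known from the data on this page.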

    Anyway, go back to the ISM on-line reference manual and study the part on "Problems and Dangers". Be sure to read the references for ISM, which is described by Sun as "not for the casual user"!

    Our experience with ISM on the E420R wasn't all that good. There was an embarrassing problem (for Sun) of an immediate core dump when trying to use this option. Basically, it appears that the combination of "Aggressive Heap" and "useISM" parameters was targeted for Sun's larger memory machines. As the truss data shows, it could never run on "our" 4 GB system and Sun couldn't tell us how to make it work. After ignoring the "useISM" parameter problem for almost a year (Sun Java "Review (ID:120953) -XX:+UseISM causes core dump in 1.3.1 RC1"), Sun assures us that this problem will not occur in the "released" version of Java 1.4.0.

    For the general case, we would never recommend using this "ISM" code in production based on Sun's own cautions. The question is if the dangers are worth the 7-10% improvement that "ISM" provides. We certainly don't think it is worth it.

  3. JVM: Java Virtual Machine Level

    Java Virtual Machine (JVM) Beta level "1.3.1 RC1" was the "then current (4/2001)", downloadable version of the latest and greatest "beta". We were personally asked by a Sun Performance Manager to switch to this level of code and make our runs on this level of JVM. It was used for the "1.3.1 beta" runs in Table 2, and should be quite similar to what Sun has used for its runs in Table 3.

  4. Accepted: Date SPECjbb2000 Committee accepted run.

    This is the date of the SPECjbb2000 review meeting for this benchmark run. All Results quoted or displayed in this section have been reviewed and accepted by the OSG Java Subcommittee of the Standard Performance Evaluation Corporation (SPEC®).

  5. Result: operations/second (ops/s)

    The only approved metric used to compare SPECjbb2000 results from vendor to vendor is operations/second!




c. Performance Range Variability from 6,000 to 18,000 ops/sec!

So, if the basic question is how fast is a certain piece of hardware, the data below presents a dilemma. If the same exact machine performs SPECjbb2000 from 6K to 18K ops/sec, which value should you use? The following table summarizes the 3 runs in question:

Run  Tuning      Heap     CPUs  JVM          Accepted     Result  Report
ga0  "high opt"  2010 MB  4     1.2.2_07a    13-Dec-2001  6,153   SPEC full disclosure
ß1a  "high opt"  1024 MB  4     1.3.1_01     29-Nov-2001  11,672  SPEC full disclosure
ß4   "high opt"  2010 MB  4     1.4.0-beta3  10-Jan-2002  18,489  SPEC full disclosure

Table 4. Summary of performance range of runs on identical hardware.

Table 4 above shows the gains that can be made through software improvements only on the same hardware. Tripling your performance on the same workload on the identical platform is impressive.

Basically, ga0 uses the old version 1.2.2 JVM which ships as the standard JVM on Solaris. If you don't do anything, this is what you get.

If you instead download the now (1/2002) current JVM, you will get results similar to ß1a.

If you are willing to live on the bleeding edge, you can pull down the version 1.4 beta (ß4). You will see performance like the 18K ops/sec result. This is of course not ready for prime time, but it has been shown in the past to be a good indicator of the performance you can get on the next release of the JVM code.
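
The "tripling" can be read directly off Table 4 by expressing each result as a multiple of the 1.2.2 baseline. This is a small sketch using the three results in the table; the names are our own.

```python
# Run label -> (JVM level, ops/sec), taken from Table 4.
TABLE_4 = {
    "ga0": ("1.2.2_07a", 6153),
    "ß1a": ("1.3.1_01", 11672),
    "ß4": ("1.4.0-beta3", 18489),
}

baseline_ops = TABLE_4["ga0"][1]
ratios = {run: ops / baseline_ops for run, (_jvm, ops) in TABLE_4.items()}

for run, (jvm, ops) in TABLE_4.items():
    print(f"{run:<4} {jvm:<12} {ops:>6,} ops/sec  {ratios[run]:.2f}x")
# ga0 is 1.00x, ß1a is 1.90x and ß4 is 3.00x the 1.2.2 result.
```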




d. Performance Boost using Intimate Shared Memory (ISM) is at least 8%.

Intimate Shared Memory (ISM) is one of the options that Sun uses to squeeze out the last few percent of the performance on its machines. While it is a "real" option and can be used, it is described as "dangerous" by Sun themselves (see discussion above).

So we set out to give you some data points on why they are using it in spite of their own best advice. The answer is that with the addition of this parameter, Sun can boost its results about 8-10%. This could mean the difference between claiming leadership performance and being second or third. So it is clear that Sun will use this for its benchmarks, but you have to factor this in to your interpretation of Sun's results.

Run    Tuning    Heap     CPUs  JVM       Run Date     Result  Report
256    high      256 MB   4     1.3.1_01  05-Dec-2001  10,846  full disclosure
i256   high,ism  256 MB   4     1.3.1_01  04-Dec-2001  11,711  full disclosure
512    high      512 MB   4     1.3.1_01  04-Dec-2001  11,412  full disclosure
i512   high,ism  512 MB   4     1.3.1_01  04-Dec-2001  12,392  full disclosure
1024   high      1024 MB  4     1.3.1_01  04-Dec-2001  11,674  full disclosure
i1024  high,ism  1024 MB  4     1.3.1_01  03-Dec-2001  12,432  full disclosure
2010   high      2010 MB  4     1.3.1_01  01-Dec-2001  11,846  full disclosure
i2010  high,ism  2010 MB  4     1.3.1_01  30-Nov-2001  12,816  full disclosure

Note: Results in the above table have been tested in accordance with SPEC run rules but have not been submitted for formal review.
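
The boost from ISM at each heap size can be computed by pairing each "high" run with its "high,ism" counterpart. This is a minimal sketch using only the figures in the table above; the names are our own. On these figures the gain works out to roughly 6.5% to 8.6%, depending on heap size.

```python
# Heap size in MB -> (ops/sec without ISM, ops/sec with ISM), from the table above.
ISM_PAIRS = {
    256: (10846, 11711),
    512: (11412, 12392),
    1024: (11674, 12432),
    2010: (11846, 12816),
}

ism_gain = {
    heap_mb: (with_ism - without) / without
    for heap_mb, (without, with_ism) in ISM_PAIRS.items()
}

for heap_mb, gain in ism_gain.items():
    print(f"{heap_mb:>5} MB heap: ISM gain {gain:.1%}")

print(f"smallest gain {min(ism_gain.values()):.1%}, largest gain {max(ism_gain.values()):.1%}")
```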




e. Run to Run Reproducibility

By their very nature, benchmarks push systems into a very extreme runtime condition. How well a system performs under such a stress-load is a testament to its design. We decided to regression test a larger sample of runs to see what sort of distribution of results we would achieve.

With time constraints a factor, we made 23 identical runs, and only 30% would be considered valid benchmark runs. This system was obviously running at the edge of or beyond its design envelope. This is usually how most if not all "one number benchmarking" takes place! Keep running until you get a really good one and submit it for review!




f. Tricks/Tips/Cautions:
  • No I/O during Benchmark
    This is important if you "actually" do I/O! For more background on the design of this benchmark read the SPECjbb2000 Whitepaper.
  • SMP "scaling" test
    SPECjbb2000 is useful as an indicator for scaling (of this workload). If you have to decide on an upgrade (for instance), all the data you need can be found in the full disclosures on the SPEC website. Symmetric Multi-Processor (SMP) issues generally come down to "scaling", so, if your workload is like SPECjbb2000, you have your indicator!
  • Potentially large variability in runs
    Software platforms for this class of machines are complex. The combinations of Hardware, Operating System and Java Version, as well as the fixes applied to these components, are mind-numbing. Care must be exercised by the system administrator or system evaluator since a large range of performance can be had on the SAME hardware (even if each combination of components is tuned to its best possible performance).



g. Net/Net - The Performance you should expect!

For production use, with the "Generally Available" code, you should expect the Sun Enterprise 420R to deliver about 6,150 ops/sec if you must use a JVM that is 1.2.2 compatible. Otherwise, you can now get about 8,900 ops/sec with version 1.3.0, 11,500 ops/sec with version 1.3.1 or 18,500 ops/sec on the latest version 1.4.0 (beta) code.


webmaster at gcocco dot com
Copyright © 1998 - 2006 gcocco software, inc. All Rights Reserved.
Page generated on: Saturday 30 September 2006 at 12:12:43 PM
URL: http://www.gcocco.com