SPECweb2009 Release 1.10 Run and Reporting Rules

Version 1.10, Last modified 2009-09-10

(To check for possible updates to this document, please see http://www.spec.org/web2009/docs/runrules.html)


1.0 Introduction
1.1 Philosophy
1.2 Fair Use of SPECweb2009 Results
1.3 Research and Academic Usage
1.4 Caveat
2.0 Running the SPECweb2009 Benchmark
2.1 Environment
2.1.1 Power and Temperature
2.1.2 Protocols
2.1.3 Testbed Configuration
2.1.4 System Under Test (SUT)
2.2 Measurement
2.2.1 Power Measurement
2.2.2 Load Generation
2.2.3 Benchmark Parameters
2.2.4 Running SPECweb2009 Workloads
2.3 Workload Filesets
2.3.1 Banking Fileset
2.3.2 Ecommerce Fileset
2.3.3 Support Site Fileset
2.4 Dynamic Request Processing
3.0 Reporting Results
3.0.1 Publication
3.1 Metrics and Reference Format
3.1.1 Categorization of Results
3.2 Testbed Configuration
3.2.1 SUT Hardware
3.2.2 SUT Software
3.2.2.1 SUT Software Tuning Allowances
3.2.2.2 SUT Software Tuning Limitations
3.2.3 Network Configuration
3.2.4 Clients
3.2.5 Backend Simulator (BeSim)
3.2.6 Measurement Devices
3.2.7 General Availability Dates
3.2.8 Rules on Community Supported Applications
3.2.9 Test Sponsor
3.2.10 Notes
3.3 Log File Review
4.0 Submission Requirements for SPECweb2009
5.0 The SPECweb2009 Benchmark Kit

1.0 Introduction


SPECweb2009 is the first web server benchmark for evaluating the power and performance of server class web serving computers. This document specifies the guidelines on how SPECweb2009 is to be run for measuring and publicly reporting power and performance results of servers. These rules abide by the norms laid down by SPEC in order to ensure that results generated with this benchmark are meaningful, comparable to other generated results, and repeatable, with documentation covering factors pertinent to reproducing the results. Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.

1.1 Philosophy

The general philosophy behind the rules of SPECweb2009 is to ensure that an independent party can reproduce the reported results.

The following attributes are expected:

The SPECweb2009 benchmark is based on the SPECweb2005 benchmark, with the addition of measuring power of web applications. The average power usage at the maximum load level is reported for all three original workloads. In addition, the power metric is based on running the Ecommerce workload at various load levels relative to the maximum load level.

The SPECweb2009 power workload is based on the methodology outlined by the SPECpower group.

Furthermore, SPEC expects that any public use of results from this benchmark suite shall be for System Under Test (SUT) configurations that are appropriate for public consumption and comparison. Thus, it is also expected that:

SPEC requires that any public use of results from this benchmark follow the SPEC OSG Fair Use Policy and those specific to this benchmark (see Fair Use section below).  In the case where it appears that these guidelines have not been adhered to, SPEC may investigate and request that the published material be corrected.

1.2 Fair Use of SPECweb2009 Results

The SPECweb2009 benchmark uses different metric names, depending on the type of pages served.

For PHP, the metrics will be known by the following names:
SPECweb2009_PHP_Peak, SPECweb2009_PHP_Power, SPECweb2009_PHP_Banking, SPECweb2009_PHP_Ecommerce, and SPECweb2009_PHP_Support.

For JSP, the metrics will be known by the following names:
SPECweb2009_JSP_Peak, SPECweb2009_JSP_Power, SPECweb2009_JSP_Banking, SPECweb2009_JSP_Ecommerce, and SPECweb2009_JSP_Support.

For ASP.NET, the metrics will be known by the following names:
SPECweb2009_ASPX_Peak, SPECweb2009_ASPX_Power, SPECweb2009_ASPX_Banking, SPECweb2009_ASPX_Ecommerce, and SPECweb2009_ASPX_Support.

In this section, these metrics are denoted as SPECweb2009_(JSP/PHP/ASPX)_*, where (JSP/PHP/ASPX) stands for either JSP, PHP or ASPX.

When public disclosures and competitive comparisons are made using SPECweb2009 results, the following benchmark specific rules apply:

  1. Results from a fully compliant run of the SPECweb2009 suite must be used when making competitive comparisons.  A fully compliant run consists of a valid run of each workload in the suite:  SPECweb2009_(JSP/PHP/ASPX)_Banking, SPECweb2009_(JSP/PHP/ASPX)_Ecommerce, SPECweb2009_(JSP/PHP/ASPX)_Support, SPECweb2009_(JSP/PHP/ASPX)_Power and the associated full disclosure report. 
  2. Only the following metrics and submetrics generated from a complete and compliant set of results for all four workloads may be used:  SPECweb2009_(JSP/PHP/ASPX)_Banking, SPECweb2009_(JSP/PHP/ASPX)_Ecommerce, SPECweb2009_(JSP/PHP/ASPX)_Support, SPECweb2009_(JSP/PHP/ASPX)_Power and the overall performance metric SPECweb2009_(JSP/PHP/ASPX)_Peak.
  3. The main metrics, SPECweb2009_(JSP/PHP/ASPX)_Power and SPECweb2009_(JSP/PHP/ASPX)_Peak, as well as the submetrics, SPECweb2009_(JSP/PHP/ASPX)_Banking, SPECweb2009_(JSP/PHP/ASPX)_Ecommerce and SPECweb2009_(JSP/PHP/ASPX)_Support, may only be published after at least one compliant result has been reviewed and accepted from that licensee test location (see section 3.0.1).
  4. Individual metrics (Simultaneous User Sessions and Power for Banking, Ecommerce and Support; and average simultaneous user sessions/watt or asus/watt for Power) may be used while comparing individual results (above limitations apply). These comparisons must include a reference to the SPEC publication.
  5. Median Aggregate QoS Compliance and/or Total Weighted Aggregate Byte Rate values may be used to distinguish between SPECweb2009 workload specific submetrics at the same value.
  6. Different scripting languages tend to have different workload characteristics and are based on technologies that are not directly comparable. Therefore, only comparisons between results using the same scripting language are allowed. For instance, comparison between results using JSP and the ones that use PHP or ASPX will not be allowed.
  7. Comparisons between SPECweb2009 and SPECweb2005 results are not allowed. Though the benchmark design for SPECweb2009 is fundamentally derived from that of SPECweb2005, the hardware and software used by submitters of SPECweb2009 are likely to differ drastically from those used in SPECweb2005 due to the addition of the new Power metric. Moreover, unlike SPECweb2005, SPECweb2009 does not use a reference platform to normalize the submetrics that feed into the main benchmark metric.
  8. Comparisons between details other than the primary metrics and submetrics should include both primary metrics (SPECweb2009_(JSP/PHP/ASPX)_Peak and SPECweb2009_(JSP/PHP/ASPX)_Power) in close proximity of the comparison.

"Close proximity" as used above is defined to mean in the same paragraph, in the same font style and size, and either within 100 words or on the same presentation slide. The following paragraphs are examples of acceptable language when publicly using SPECweb2009.

SPEC expects that the following template be used:

SPEC and SPECweb are registered trademarks of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of <date>. [The comparison presented is based on <basis for comparison>].  For the latest SPECweb2009 results visit http://www.spec.org/osg/web2009.
(Note: [...] above required only if selective comparisons are used.)

Example:

SPECweb2009 is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of September 12, 2009. The comparison presented is based on best performing 4-core Single Node Platform servers currently shipping by Vendor 1, Vendor 2 and Vendor 3. For the latest SPECweb2009 results visit http://www.spec.org/osg/web2009.

The rationale for the template is to provide fair comparisons, by ensuring that:

1.3 Research and Academic Usage

SPEC encourages use of the SPECweb2009 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of licensees submitting to the SPEC web site. For example, a research environment may use early prototype hardware or software that simply cannot be expected to function reliably for the length of time required for completing a compliant data point, or may use research hardware and/or software components that are not generally available. Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC respectfully suggests that following the rules will improve the clarity, reproducibility, and comparability of research results.

Where the rules cannot be followed, the deviations from the rules must be disclosed. SPEC requires these noncompliant results be clearly distinguished from results officially submitted to SPEC or those that may be published as valid SPECweb2009 results. For example, a research paper can use simultaneous sessions but may not refer to them as SPECweb2009 results if the results are not compliant.  

1.4 Caveat

SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECweb2009 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to this document and will rename the metrics.

Relevant standards are cited in these run rules as URL references, and are current as of the date of publication. Changes or updates to these referenced documents or URLs may necessitate repairs to the links and/or amendment of the run rules. The most current run rules will be available at the SPEC Web site at http://www.spec.org. SPEC will notify members and licensees whenever it makes changes to the suite.


2.0 Running the SPECweb2009 Benchmark

2.1 Environment

2.1.1 Power and Temperature

This section outlines some of the environmental and electrical requirements related to power measurement while running the SPECweb2009 benchmark.

Line Voltage Source

The preferred Line Voltage source used for measurements is the main AC power as provided by local utility companies. Power generated from other sources often has unwanted harmonics that many power analyzers cannot measure correctly, which would produce inaccurate results.

The usage of an uninterruptible power source (UPS) as the line voltage source is allowed, but the voltage output must be a pure sine-wave. For placement of the UPS, see SPECpower_ssj2008 Run and Reporting Rules section 2.13.1. This usage must be specified in the Notes section of the FDR.

If an unlisted AC line voltage source is used, a reference to the standard must be provided to SPEC. DC line voltage sources are currently not supported.

For situations in which the appropriate voltages are not provided by local utility companies (e.g. measuring a server in the United States which is configured for European markets, or measuring a server in a location where the local utility line voltage does not meet the required characteristics), an AC power source may be used, and the power source must be specified in the notes section of the disclosure report. In such situations the following requirements must be met, and the relevant measurements or power source specifications disclosed in the notes section of the disclosure report:

The intent is that the AC power source does not interfere with measurements such as power factor by trying to adjust its output power to improve the power factor of the load.

Environmental Conditions

SPEC requires that power measurements be taken in an environment representative of the majority of usage environments. The intent is to discourage extreme environments that may artificially impact power consumption or performance of the server.

SPECweb2009 requires the following environmental conditions to be met:

Power Analyzer Setup

The power analyzer must be located between the AC Line Voltage Source and the SUT. No other active components are allowed between the AC Line Voltage Source and the SUT.

Power analyzer configuration settings that are set by SPEC PTDaemon must not be manually overridden.

Power Analyzer Specifications

To ensure comparability and repeatability of power measurements, SPEC requires the following attributes for the power measurement device used during the benchmark. Please note that a power analyzer may meet these requirements when used in some power ranges but not in others, due to the dynamic nature of power analyzer Accuracy and Crest Factor. The use of a power analyzer’s auto-ranging function is discouraged.

Uncertainty and Crest Factor

For example:

An analyzer with a vendor-specified uncertainty of +/- 0.5% of reading +/- 4 digits, used in a test with a maximum wattage value of 200W, would have "overall" uncertainty of (0.5% * 200W) + 0.4W = 1.4W, and 1.4W / 200W = 0.7% at 200W.

An analyzer with a wattage range 20-400W, with a vendor-specified uncertainty of +/- 0.25% of range +/- 4 digits, used in a test with a maximum wattage value of 200W, would have "overall" uncertainty of (0.25% * 400W) + 0.4W = 1.4W, and 1.4W / 200W = 0.7% at 200W.
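The arithmetic in the two examples above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are hypothetical, and the value of one digit (0.1W) is an assumption inferred from the examples, where "4 digits" contributes 0.4W.

```python
def overall_uncertainty(max_watts, percent, basis_watts, digits,
                        digit_resolution_watts=0.1):
    """Return the overall uncertainty as a fraction of max_watts.

    percent      vendor-specified percentage (of reading or of range)
    basis_watts  the reading itself for a "% of reading" spec,
                 or the top of the range for a "% of range" spec
    digits       the "+/- N digits" part of the vendor specification
    digit_resolution_watts  assumed value of one digit (0.1W here)
    """
    absolute_watts = percent / 100.0 * basis_watts + digits * digit_resolution_watts
    return absolute_watts / max_watts


# First example: 0.5% of reading, 4 digits, 200W maximum reading.
reading_case = overall_uncertainty(200.0, 0.5, 200.0, 4)

# Second example: 0.25% of a 400W range, 4 digits, 200W maximum reading.
range_case = overall_uncertainty(200.0, 0.25, 400.0, 4)

print(f"{reading_case:.2%}")
print(f"{range_case:.2%}")
```

Both calls evaluate to 1.4W / 200W, matching the 0.7% stated in the examples.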

Temperature Sensor Specifications

Temperature must be measured no more than 50mm in front of (upwind of) the main airflow inlet of the SUT. To ensure comparability and repeatability of temperature measurements, SPEC requires the following attributes for the temperature measurement device used during the benchmark:

Supported and Compliant Devices

See Accepted Measurement Devices list (http://spec.org/power_ssj2008/docs/device-list.html) for a list of currently supported (by the benchmark software) and compliant (in specifications) power analyzers and temperature sensors.

2.1.2 Protocols

As the WWW is defined by its interoperable protocol definitions, SPECweb2009 requires adherence to the relevant protocol standards. It is expected that the Web server is HTTP 1.1 compliant. The benchmark environment shall be governed by the following standards:

To run SPECweb2009, in addition to all the above standards, SPEC requires the SUT to support SSLv3 as defined in the following:

Of the various ciphers supported in SSLv3, cipher SSL_RSA_WITH_RC4_128_MD5 is currently required for all workload components that use SSL.  It was selected as one of the most commonly used SSLv3 ciphers and allows results to be directly compared to each other. SSL_RSA_WITH_RC4_128_MD5 consists of:

A compliant result must use the cipher suite listed above, and must employ the 1024 bit key for RSA public key encryption, 128-bit key for RC4 bulk data encryption, and have a 128-bit output for the Message Authentication code.

For further explanation of these protocols, the following might be helpful:


The current text of all IETF RFC's may be obtained from: http://ietf.org/rfc.html

For every standard that a software product is marketed as adhering to, the product must have passed the relevant test suites used to ensure compliance with that standard. For example, in the case of JavaServer Pages (JSP), one must pass the published test suites from Sun.

2.1.3 Testbed Configuration

These requirements apply to all hardware and software components used in producing the benchmark result, including the System under Test (SUT), network, and clients.

2.1.4 System Under Test (SUT)

For a run to be valid, the following attributes must hold true in addition to the requirements listed under section 2.1.3 for the Testbed configuration:

2.2 Measurement

2.2.1 Power Measurement

The measurement of power should be in accordance with Section 2.1.1 and the SPECpower Methodology. The SPECweb2009 benchmark tool set provides the ability to automatically gather measurement data from supported power analyzers and temperature sensors and integrate that data into the benchmark result. SPEC requires that the analyzers and sensors used in a submission be supported by the measurement framework, and be compliant with the specifications in the following sections. The provided SPECweb2009 tools (or a newer version provided by SPECpower) must be used to run and produce measured SPECweb2009 results.

2.2.2 Load Generation

In the benchmark run, a number of simultaneous user sessions are requested. Typically, each user session would start with a single thread requesting a dynamically created file or page. Following the receipt of this file and the need to request multiple embedded files within the page, two threads corresponding to that user session actively make connections and request files on these connections. The number of threads making requests on behalf of a given user session is limited to two, in order to comply with the HTTP 1.1 recommendations.

The load generated is based on page requests, transition between pages and the static images accessed within each page, as defined in the SPECweb2009 Design Document.

The QoS requirements for each workload are defined in terms of two parameters, Time_Good and Time_Tolerable. QoS requirements are page based, and the Time_Good and Time_Tolerable values are defined separately for each workload (Time_Tolerable > Time_Good). For each page, 95% of the page requests (including all the embedded files within that page) are expected to be returned within Time_Good and 99% of the requests within Time_Tolerable. Very large static files (i.e. Support downloads) use specific byte rates as their QoS requirements.
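As an illustration of the page-based QoS rule, the following Python sketch checks one page type against the 95% and 99% thresholds. The function name is hypothetical, the threshold values of 2 and 4 seconds are placeholders rather than values taken from the benchmark, and treating "within" as inclusive is an assumption.

```python
def meets_page_qos(page_times, time_good, time_tolerable):
    """Return True if one page type meets the QoS rule.

    page_times holds the full page response times in seconds (dynamic
    page plus embedded files). At least 95% must fall within time_good
    and at least 99% within time_tolerable.
    """
    total = len(page_times)
    if total == 0:
        return False
    good = sum(1 for t in page_times if t <= time_good)
    tolerable = sum(1 for t in page_times if t <= time_tolerable)
    # Integer comparisons avoid floating point rounding at the thresholds.
    return good * 100 >= 95 * total and tolerable * 100 >= 99 * total


# Placeholder thresholds (seconds); the real values are workload specific.
TIME_GOOD, TIME_TOLERABLE = 2.0, 4.0

passing = [1.0] * 96 + [3.0] * 3 + [5.0]   # 96% good, 99% tolerable
failing = [1.0] * 94 + [3.0] * 6           # only 94% good

print(meets_page_qos(passing, TIME_GOOD, TIME_TOLERABLE))
print(meets_page_qos(failing, TIME_GOOD, TIME_TOLERABLE))
```

The first sample set satisfies both thresholds; the second fails because only 94% of its requests fall within Time_Good.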

The validation requirement for each workload is such that less than 1% of requests for any given page and less than 0.5% of all page requests in a given test iteration fail validation.
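The two validation thresholds can be expressed as a short check. The sketch below is illustrative only; the function name, the dictionary layout, and the sample counts are hypothetical and are not part of the benchmark harness.

```python
def passes_validation(requests_by_page, failures_by_page):
    """Apply both validation limits for one test iteration.

    requests_by_page maps page name to the number of requests issued.
    failures_by_page maps page name to the number that failed validation.
    Requires fewer than 1% failures for every page and fewer than 0.5%
    failures across all page requests.
    """
    total_requests = sum(requests_by_page.values())
    total_failures = sum(failures_by_page.get(p, 0) for p in requests_by_page)
    per_page_ok = all(
        count == 0 or failures_by_page.get(page, 0) * 100 < count
        for page, count in requests_by_page.items()
    )
    overall_ok = total_failures * 1000 < 5 * total_requests
    return per_page_ok and overall_ok


requests = {"login": 1000, "logout": 1000}

# 0.5% on one page and 0.25% overall: both limits are met.
print(passes_validation(requests, {"login": 5}))

# 0.9% on each page is under the per-page limit,
# but 0.9% overall exceeds the 0.5% limit.
print(passes_validation(requests, {"login": 9, "logout": 9}))
```

The second call shows why both limits matter: every page can stay under 1% while the iteration as a whole still fails the 0.5% limit.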

It is required in this benchmark that all user sessions be run at the HIGH-SPEED-INTERNET speed of 100,000 bytes/sec.

In addition, the URL retrievals (or operations) performed must also meet the following quality criteria:

Note: The Weighted Percentage Difference for any given workload page is calculated using the following formulas:

WPD = PageMix% * ETR

ETR = (#Sessions * RunTime) / (ThinkTime * %RwTT + AvgRspTime)


Where:

Workload Page Mix Percentage Table

Banking              Mix %   Ecommerce/Power   Mix %   Support        Mix %
Acct summary        15.11%   billing            3.37%   catalog       11.71%
add payee            1.12%   browse            11.75%   download       6.76%
bill pay            13.89%   browse product    10.03%   file          13.51%
bill pay status      2.23%   cart               5.30%   file catalog  22.52%
check detail html    8.45%   confirm            2.53%   home           8.11%
check image         16.89%   customize1        16.93%   product       24.78%
change profile       1.22%   customize2         8.95%   search        12.61%
Login               21.53%   customize3         6.16%
logout               6.16%   index             13.08%
payee info           0.80%   login              3.78%
Post check order     0.88%   product detail     8.02%
Post fund transfer   1.24%   search             6.55%
Post profile         0.88%   shipping           3.55%
quick pay            6.67%
request checks       1.22%
req xfer form        1.71%

The Workload Page Mix Percentages as well as QoS requirements for each page, must be met at every step for the Power run.
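As a sanity check on the page mix values listed above, the following Python sketch transcribes the percentages and confirms that each workload's mix totals 100%. The data structure and function name are hypothetical; only the page names and percentages come from the table.

```python
from decimal import Decimal

# Page mix percentages transcribed from the Workload Page Mix Percentage Table.
PAGE_MIX = {
    "Banking": {
        "Acct summary": "15.11", "add payee": "1.12", "bill pay": "13.89",
        "bill pay status": "2.23", "check detail html": "8.45",
        "check image": "16.89", "change profile": "1.22", "Login": "21.53",
        "logout": "6.16", "payee info": "0.80", "Post check order": "0.88",
        "Post fund transfer": "1.24", "Post profile": "0.88",
        "quick pay": "6.67", "request checks": "1.22", "req xfer form": "1.71",
    },
    "Ecommerce/Power": {
        "billing": "3.37", "browse": "11.75", "browse product": "10.03",
        "cart": "5.30", "confirm": "2.53", "customize1": "16.93",
        "customize2": "8.95", "customize3": "6.16", "index": "13.08",
        "login": "3.78", "product detail": "8.02", "search": "6.55",
        "shipping": "3.55",
    },
    "Support": {
        "catalog": "11.71", "download": "6.76", "file": "13.51",
        "file catalog": "22.52", "home": "8.11", "product": "24.78",
        "search": "12.61",
    },
}


def mix_total(workload):
    """Sum the page mix percentages for one workload using exact decimals."""
    return sum(Decimal(pct) for pct in PAGE_MIX[workload].values())


for name in PAGE_MIX:
    print(name, mix_total(name))
```

Decimal arithmetic is used so that the totals are exact rather than subject to binary floating point rounding.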

2.2.3 Benchmark Parameters

Workload-specific configuration files are supplied with the harness. All configurable parameters are listed in these files. For a run to be valid, all the parameters in the configuration files must be left at default values, except for the ones that are marked and listed clearly as "Configurable Workload Properties".

2.2.4 Running SPECweb2009 Workloads

SPECweb2009 contains three distinct workloads (Banking, Ecommerce, and Support) and the stepped run for the Power metric using the Ecommerce workload. The benchmarker may:

For a valid run, the following restrictions must be observed:

  1. A superset of all hardware components needed to run all the workloads must stay connected for the duration of the test and must be powered on at the beginning of each test. The application must be ready to perform operations throughout the duration of the test, including all four workloads.

  2. Unused hardware components must remain connected, but may be powered off or disabled by an automated daemon or method that resides on the SUT. If such components are powered on again, automation should be used to do so.

A valid run must comply with the following:

 

2.3 Workload Filesets

The particular files referenced shall be determined by the workload generation in the benchmark itself. A fileset for a workload consists of content that the dynamic scripts reference. This represents images, static content, and also "padding" to bring the dynamic page sizes in-line with that observed in real-world Web sites. All filesets are to be generated using the Wafgen fileset generator supplied with the benchmark tools. It is the responsibility of the benchmarker to ensure that these files are placed on the SUT so that they can be accessed properly by the benchmark. These files and only these files must be used as the target fileset. The benchmark performs internal validations to verify the expected results. No modification or bypassing of this validation is allowed.

Separate filesets are associated with the Banking, Ecommerce and Support workloads. The Power workload uses the same fileset as the Ecommerce workload. The SUT is required to be configured with the storage to contain all necessary software and logs for compliant runs of all four workloads. At a minimum, the system must also be configured to contain the largest of the three filesets (Banking, Ecommerce, and Support) such that each of the other two workload filesets can be mapped into the same storage footprint. If the system has not been configured with storage to hold the filesets for all three workloads concurrently, then the benchmarker must not add or remove storage hardware while switching workloads. The disclosure details must indicate whether the filesets were stored concurrently or remapped between workload runs.

2.3.1 Banking Fileset

For the Banking workload, we define two types of files:

1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The check images, whose number increases linearly with the number of simultaneous connections supported. For each connection supported, check images are maintained for 50 users, each in its own directory. For each user defined, 20 check images are maintained: 10 representing the fronts of the checks and the other 10 representing the backs.

The above assumes that under high load conditions in a banking environment, we would expect to see no more than 1% of the banking customers logged in at the same time.

2.3.2 Ecommerce Fileset

For the Ecommerce workload, two types of files are defined:

1. The embedded image files that do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The product images, which increase linearly with the number of simultaneous sessions requested. For each simultaneous session, 5 "product line" directories are created. Each product line directory contains images for 10 different "products". Each product has 3 different sizes, representing the various views of products that are often presented to users (i.e., thumbnails, medium-sized, and larger close-up views).

2.3.3 Support Site Fileset

For the support site workload, two types of files are defined:

1. The embedded image files that do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The file downloads, which increase linearly with the number of simultaneous sessions requested. The ratio of simultaneous sessions to download directories is 4:1. Each directory contains downloads for 5 different categories (e.g. flash BIOS upgrades, video card drivers, etc.). The file sizes were determined by analyzing the file sizes observed at various hardware vendors' support sites.
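The load-dependent sizing rules in sections 2.3.1 through 2.3.3 can be sketched as simple arithmetic. The function names below are hypothetical, and two points are assumptions rather than statements from these rules: that each Ecommerce product size corresponds to exactly one image file, and that the Support directory count is rounded up when the session count is not a multiple of 4. The actual filesets must be generated with Wafgen.

```python
import math


def banking_check_images(connections):
    """50 user directories per connection, 20 check images per user."""
    return connections * 50 * 20


def ecommerce_product_images(sessions):
    """5 product lines per session, 10 products each, 3 sizes per product.

    Assumes one image file per product size.
    """
    return sessions * 5 * 10 * 3


def support_download_dirs(sessions):
    """One download directory per 4 sessions (rounding up is an assumption)."""
    return math.ceil(sessions / 4)


print(banking_check_images(100))      # 100 * 50 * 20
print(ecommerce_product_images(100))  # 100 * 5 * 10 * 3
print(support_download_dirs(100))     # 100 / 4
```

These counts cover only the files that grow with load; the embedded image files are fixed in number and are described in the design document.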

2.4 Dynamic Request Processing

SPECweb2009 follows a page based model, identical to SPECweb2005. Each page is initiated by a dynamic GET or POST request, which runs a dynamic script on the server and returns a dynamically created Web page. Associated with each dynamic page is a set of static files or images, which the client requests right after receiving the dynamically created page. The page returned is marked as complete when all the associated images/static files for that page are fully received.

Only the dynamic scripts provided in the benchmark kit may be used for submissions/publications. The current release provides implementations in PHP, JSP and ASP.NET.

The pseudo code reference specifications are the standard definition of the functionality. Any dynamic implementation must follow the specification exactly.

For new dynamic implementations, the submitter must inform the subcommittee at least one month prior to the actual code submission.  All dynamic implementations submitted to SPEC must include a signed permission to use form and must be freely available for use by other members and licensees of the benchmark.  Once the code has been submitted, the subcommittee will then review the code for a period of four months.  Barring any issues with the implementation, the subcommittee will then incorporate the implementation into a new version of the benchmark.

Acceptance of any newly submitted dynamic code for future releases will include testing conformance to pseudo code as well as running of the code on other platforms by active members of the subcommittee. This will be done in order to ensure compliance with the letter and spirit of the benchmark, namely whether the scripts used to code the dynamic requests are representative of scripts commonly in use within the relevant customer base.  An acceptable scripting language must meet the following requirements:


3.0 Reporting Results

3.0.1 Publication

SPEC requires that each licensee test location (city, state/province and country) measure and submit a single compliant result for review, and have that result accepted, before publicly disclosing or representing as compliant any SPECweb2009 result. Only after acceptance of a compliant result from that test location by the subcommittee may the licensee publicly disclose any future SPECweb2009 result produced at that location in compliance with these run and reporting rules, without acceptance by the SPECweb subcommittee. The intent of this requirement is that the licensee test location demonstrates the ability to produce a compliant result before publicly disclosing additional results without review by the subcommittee.

SPEC encourages the submission of results for review by the relevant subcommittee and subsequent publication on SPEC's web site. Licensees who have met the requirements stated above may publish compliant results independently; however, any SPEC member may request a full disclosure report for that result and the test sponsor must comply within 10 business days. Issues raised concerning a result's compliance to the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.

3.1 Metrics and Reference Format

SPECweb2009 will have two main metrics:

SPECweb2009_(JSP/PHP/ASPX)_Peak represents the geometric mean of SPECweb2009_(JSP/PHP/ASPX)_Banking, SPECweb2009_(JSP/PHP/ASPX)_Ecommerce and SPECweb2009_(JSP/PHP/ASPX)_Support @ X watts, where X is the geometric mean of the average watts consumed while running each of these workloads.
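To illustrate how the Peak metric combines the three workloads, the following Python sketch computes both geometric means. All session and wattage figures are invented for the example, and the variable names are hypothetical; the official value is produced by the SPEC reporting tools.

```python
from statistics import geometric_mean

# Invented example values: maximum simultaneous sessions per workload
# and the average watts measured while running each workload.
sessions = {"Banking": 1000, "Ecommerce": 8000, "Support": 27000}
avg_watts = {"Banking": 200.0, "Ecommerce": 250.0, "Support": 320.0}

# Peak is the geometric mean of the three submetrics...
peak_sessions = geometric_mean(sessions.values())

# ...reported "@ X watts", where X is the geometric mean of the average watts.
peak_watts = geometric_mean(avg_watts.values())

print(f"Peak = {peak_sessions:.0f} @ {peak_watts:.1f} watts")
```

Because a geometric mean is used, a large score in one workload does not dominate the result the way it would in an arithmetic mean.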

SPECweb2009_(JSP/PHP/ASPX)_Power on the other hand represents the ratio of the sum of the number of sessions to the sum of the watts while running the Ecommerce workload at the six different load levels (100%, 80%, 60%, 40%, 20% and 0% relative to the maximum score attained in SPECweb2009_(JSP/PHP/ASPX)_Ecommerce).
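A minimal sketch of the Power metric calculation follows. The maximum session count and the wattage at each load level are invented for the example, and the names are hypothetical; the sketch also assumes that the wattage used for each level is the average power measured during that level.

```python
# Load levels relative to the maximum Ecommerce score, highest first.
LOAD_LEVELS = (1.0, 0.8, 0.6, 0.4, 0.2, 0.0)

# Invented example: maximum Ecommerce sessions and watts at each level.
max_sessions = 1000
watts_per_level = (300.0, 270.0, 240.0, 210.0, 180.0, 150.0)


def power_metric(max_sessions, watts_per_level, levels=LOAD_LEVELS):
    """Ratio of summed sessions to summed watts over all load levels."""
    total_sessions = sum(round(max_sessions * level) for level in levels)
    total_watts = sum(watts_per_level)
    return total_sessions / total_watts


score = power_metric(max_sessions, watts_per_level)
print(f"{score:.2f} sessions/watt")
```

Note that the 0% level contributes no sessions to the numerator while its idle power still counts in the denominator, so lower idle power raises the score.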

Other than these, the benchmark will also include submetrics SPECweb2009_(JSP/PHP/ASPX)_Banking, SPECweb2009_(JSP/PHP/ASPX)_Ecommerce, and SPECweb2009_(JSP/PHP/ASPX)_Support, each of which represents the maximum number of simultaneous sessions that the SUT can support while running the Banking, Ecommerce and Support workloads and meeting the QoS requirements for Time_Good and Time_Tolerable. The QoS requirements for each step of the Power run are exactly the same as those required for the Ecommerce run at full load.

Given that the benchmark supports three types of scripts (PHP, ASPX and JSP) and since workloads run with each script type are not comparable to the other, the metric names are accordingly distinct. When running with the PHP scripts, the main metrics are SPECweb2009_PHP_Peak and SPECweb2009_PHP_Power. The corresponding workload metric names for the PHP runs are SPECweb2009_PHP_Banking, SPECweb2009_PHP_Ecommerce and SPECweb2009_PHP_Support. When running with the JSP scripts, the main metrics are SPECweb2009_JSP_Peak and SPECweb2009_JSP_Power. The corresponding workload metrics are SPECweb2009_JSP_Banking, SPECweb2009_JSP_Ecommerce and SPECweb2009_JSP_Support. When running with the ASP.NET scripts, the main metrics are SPECweb2009_ASPX_Peak and SPECweb2009_ASPX_Power. The corresponding workload metrics are SPECweb2009_ASPX_Banking, SPECweb2009_ASPX_Ecommerce and SPECweb2009_ASPX_Support. Note that the metric and submetric names in the rest of this document will not include the script name at all places where the description is generic, taking the format of SPECweb2009_Type rather than SPECweb2009_Script_Type.

Runs for Banking, Ecommerce and Support include three iterations. Each iteration for the Banking, Ecommerce and Support runs consists of a minimum 3 minute thread ramp up, a minimum 5 minute warm up period, and a 30 minute measurement period (i.e. run time; which may be increased to ensure at least 100 requests for each page type are completed where the load is minimal).   There are also corresponding rampdown periods (3 minutes + 5 minutes) between iterations.
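The minimum elapsed time implied by these intervals can be estimated as follows. This is a rough sketch under one stated assumption: rampdown periods are counted only between iterations (two of them for a three-iteration run), as the text describes, and nothing is added after the final iteration. The names are hypothetical.

```python
RAMP_UP_MIN = 3        # minimum thread ramp up, minutes
WARM_UP_MIN = 5        # minimum warm up, minutes
MEASURE_MIN = 30       # measurement period (run time), minutes
RAMP_DOWN_MIN = 3 + 5  # rampdown between iterations, minutes
ITERATIONS = 3


def minimum_run_minutes(iterations=ITERATIONS, measure=MEASURE_MIN):
    """Lower bound on elapsed minutes for one workload run."""
    per_iteration = RAMP_UP_MIN + WARM_UP_MIN + measure
    gaps = max(iterations - 1, 0)  # rampdowns occur between iterations only
    return iterations * per_iteration + gaps * RAMP_DOWN_MIN


print(minimum_run_minutes())  # 3 * 38 + 2 * 8 minutes
```

The actual duration can be longer, since the ramp up and warm up values are minimums and the run time may be increased at minimal load.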

All intervals of complete runs for the Banking, Ecommerce and Support workloads are shown in Figure 1.

[Figure 1: SPECweb2009_phase_diagram_1.gif]

The intervals of a Power workload run are shown in Figure 2.

[Figure 2: SPECweb2009_phase_diagram_2.gif]

The metrics SPECweb2009_(JSP/PHP/ASPX)_Peak, SPECweb2009_(JSP/PHP/ASPX)_Power and individual workload metrics (SPECweb2009_(JSP/PHP/ASPX)_Banking, SPECweb2009_(JSP/PHP/ASPX)_Ecommerce, and SPECweb2009_(JSP/PHP/ASPX)_Support) may not be associated with any estimated results. This includes adding, multiplying or dividing measured results to create a derived metric for some other system configuration.

The report of results for the SPECweb2009 benchmark is generated in ASCII and HTML format by the provided SPEC tools. These tools may not be changed without prior SPEC approval. The tools perform error checking and will flag some error conditions as resulting in an "invalid run".  However, these automatic checks are only there for debugging convenience, and do not relieve the benchmarker of the responsibility to check the results and follow the run and reporting rules.

SPEC reviews and accepts for publication on SPEC's website only a complete and compliant set of results for all four workloads run and reported according to these rules. Any public disclosure of either the main metrics or the individual metrics should follow the formal review and acceptance process by SPEC. All public disclosures must adhere to the Fair Use Rules.

3.1.1 Categorization of Results

SPECweb2009 results will be categorized separately based on the script set used.
The current release supports PHP, JSP and ASP.NET scripts. Therefore, there will be three main categories of results, PHP, JSP and ASPX. In order to keep the results in the three categories separate and minimize confusion, the name of the script type will be appended to the name of each metric after the SPECweb2009. For example, in the PHP category, the main metrics will be known as SPECweb2009_PHP_Peak and SPECweb2009_PHP_Power; the workload submetrics will be labeled as SPECweb2009_PHP_Banking, SPECweb2009_PHP_Ecommerce and SPECweb2009_PHP_Support. Similarly, in the JSP category, the main metrics will be SPECweb2009_JSP_Peak and SPECweb2009_JSP_Power; and the workload submetrics will be labeled as SPECweb2009_JSP_Banking, SPECweb2009_JSP_Ecommerce and SPECweb2009_JSP_Support. Finally, in the ASPX category, the main metrics will be SPECweb2009_ASPX_Peak and SPECweb2009_ASPX_Power; and the workload submetrics will be labeled as SPECweb2009_ASPX_Banking, SPECweb2009_ASPX_Ecommerce and SPECweb2009_ASPX_Support.

The current release of the benchmark supports only single node results, because the current harness does not support the aggregation of power readings that multi-node platforms would require. Moreover, the methodology outlined here does not describe the details of multi-platform power or temperature measurements.

A Single Node Platform for SPECweb2009 consists of one or more processors executing a single instance of a first level supervisor software, i.e. an operating system or a hypervisor hosting one or more instances of the same guest operating system, where one or more instances of the same web server software are executed on the main operating system or the guest operating systems. Externally attached storage for software and filesets may be used; all other performance critical operations must be performed within the single server node. A single common set of NICs must be used across all 4 workloads to relay all HTTP and HTTPS traffic.


Example:

                                |
test harness (clients,switches)=|=Server NICs:Server Node:Storage
                                |



If a separate load balancing appliance is used, it must be included in the SUT's definition and the power measurements presented for the SUT must include the power to the load balancer.

 

3.2 Testbed Configuration

All system configuration information required to duplicate published performance results must be reported. Any software or hardware setting that differs from its default configuration, including details on network interfaces, must be reported.

3.2.1 SUT Hardware

No SUT hardware may be added or removed between workload runs or during a workload run.  All hardware must be powered up at the beginning of each workload run and be application accessible through the duration of the run. However, hardware may be reconfigured for each workload. The FDR must include the configuration and use of hardware for each workload.

The following SUT hardware components must be reported:

3.2.2 SUT Software

The following SUT software components must be reported:

3.2.2.1 SUT Software Tuning Allowances

The following SUT software tunings are acceptable:

3.2.2.2 SUT Software Tuning Limitations

The following SUT software tunings are not acceptable:

3.2.3 Network Configuration

A brief description of the network configuration used to achieve the benchmark results is required. The minimum information to be supplied is:

3.2.4 Clients

The following load generator hardware components must be reported:

3.2.5 Backend Simulator (BeSim)

The following BeSim hardware and software components must be reported:

Note: BeSim API code is provided as part of the SPECweb2009 kit, and can be compiled in several different ways: ISAPI, NSAPI, or FastCGI. For more information, please see the User's Guide.

3.2.6 Measurement Devices

The following properties must be reported:
Since auto-ranging is discouraged, if auto-ranging is used, an explanation of the reason must be provided. Also, the ranges used by the analyzer during auto-ranging must be readable by the SPEC PTDaemon in order to ensure that an uncertainty calculation can be made.

3.2.7 General Availability Dates

The dates of general customer availability (month and year) must be listed for the major components: hardware, HTTP server, and operating system. All system, hardware and software features are required to be generally available on or before the date of publication, or within 3 months of the date of publication (except where precluded by these rules, see section 3.2.8). When multiple components have different availability dates, the latest availability date must be listed.
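
As a rough illustration of the availability rule, the sketch below picks the latest component date and checks it against the 3-month window. It works at month granularity because dates are reported as month and year; how the window is counted at the day level is an assumption here, and all names are hypothetical.

```python
# Illustrative sketch: dates are (year, month) tuples, matching the
# month-and-year reporting format. Month-level counting is an assumption.

def month_index(year, month):
    """Map a (year, month) pair to a running month count."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return year * 12 + (month - 1)


def reported_availability(component_dates):
    """Return the latest availability date among the listed components."""
    return max(component_dates.values())


def within_availability_window(availability, publication, window_months=3):
    """True if availability is on or before publication, or within the window."""
    gap = month_index(*availability) - month_index(*publication)
    return gap <= window_months


if __name__ == "__main__":
    dates = {
        "hardware": (2009, 6),
        "HTTP server": (2009, 11),
        "operating system": (2009, 8),
    }
    latest = reported_availability(dates)
    print(latest, within_availability_window(latest, publication=(2009, 9)))
```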

Products are considered generally available if they are orderable by ordinary customers and ship within a reasonable time frame. This time frame is a function of the product size and classification, and common practice. The availability of support and documentation for the products must coincide with the release of the products.

Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years. The five-year limit is waived for hardware used in client and BeSim systems.

Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years. For support of products that use Open Source, the reader is referred to Section 3.2.8.
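
The age limits for older products can be summarized in a small check. This is a sketch under stated assumptions: the rules do not say which date "the last five years" or "the last three years" is measured from, so the reference date is left to the caller, and the function name and role labels are hypothetical.

```python
# Illustrative sketch: dates are (year, month) tuples. The choice of
# reference date (for example, the publication date) is an assumption.

def age_eligible(kind, ga, reference, still_supported, role="SUT"):
    """Check the vendor-support and age conditions for an older product.

    kind:      "hardware" or "software"
    ga:        original general availability date as (year, month)
    reference: date the age is measured against, as (year, month)
    role:      "SUT", "client" or "BeSim" (only relevant for hardware)
    """
    if not still_supported:
        return False
    if kind == "hardware":
        if role in ("client", "BeSim"):
            return True  # the five-year limit is waived for these systems
        limit_years = 5
    elif kind == "software":
        limit_years = 3
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    age_months = (reference[0] - ga[0]) * 12 + (reference[1] - ga[1])
    return age_months <= limit_years * 12


if __name__ == "__main__":
    print(age_eligible("hardware", (2004, 9), (2009, 9), still_supported=True))
    print(age_eligible("software", (2006, 8), (2009, 9), still_supported=True))
```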

In the disclosure, the benchmarker must identify any component that is no longer orderable by ordinary customers.

If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the release system. If the sponsor later finds any performance metric to be more than 5% lower than that reported for the pre-release system, then the sponsor shall submit a corrected test result.
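
The 5% condition can be expressed as a relative comparison. The sketch below assumes the threshold means a drop of more than 5% relative to the pre-release value; the function name and the sample values are hypothetical.

```python
# Illustrative sketch: the threshold is read as a relative drop of more
# than 5% from the pre-release value (an interpretation, not SPEC code).

def needs_resubmission(pre_release_value, release_value, threshold=0.05):
    """True if the release metric fell by more than the threshold."""
    if pre_release_value <= 0:
        raise ValueError("pre-release metric must be positive")
    drop = (pre_release_value - release_value) / pre_release_value
    return drop > threshold


if __name__ == "__main__":
    # Hypothetical metric values, for illustration only.
    print(needs_resubmission(1000, 940))  # 6% lower
    print(needs_resubmission(1000, 960))  # 4% lower
```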

3.2.8 Rules on Community Supported Applications

In addition to the requirements stated in the OSG Policy Document, the following guidelines will apply for a SPECweb2009 submission that relies on Community Supported Applications.

SPECweb2009 does permit Community Supported Applications outside of a commercial distribution or support contract which meet the following guidelines.

The following are the rules that govern the admissibility of any Community Supported Application in the context of a benchmark run or implementation.

  1. Open Source operating systems or hypervisors would still require a commercial distribution and support. The following rules do not apply to Operating Systems used in the publication.
  2. Only a "stable" release can be used in the benchmark environment; "non-stable" releases (alpha, beta, or release candidates) cannot be used. A stable release must be unmodified source code or binaries as downloaded from the Community Supported site. A "stable" release is one that is clearly denoted as a stable release or a release that is available and recommended for general use.  It must be a release that is not on the development fork, not designated as an alpha, beta, test, preliminary, pre-released, prototype, release-candidate, or any other terms that indicate that it may not be suitable for general use. The 3 month General Availability window (outlined in section 3.2.7 above) does not apply to Community Supported Applications, since volunteer resources make predictable future release dates unlikely.
  3. The initial "stable" release of the application must be a minimum of 12 months old.
    Reason: This helps ensure that the software has real application to the intended user base and is not a benchmark special that's put out with a benchmark result and only available for the first three months to meet SPEC's forward availability window.
  4. At least two additional stable releases (major, minor, or bug fix) must have been completed, announced and shipped beyond the initial stable release.
    Reason: This helps establish a track record for the project and shows that it is actively maintained.
  5. The application must use a standard open source license such as one of those listed at http://www.opensource.org/licenses/.
  6. The "stable" release used in the actual test run must be the current stable release at the time the test result is run or the prior "stable" release if the superseding/current "stable" release will be less than 3 months old at the time the result is made public.
  7. The "stable" release used in the actual test run must be no older than 18 months.  If there has not been a "stable" release within 18 months, then the open source project may no longer be active and as such may no longer meet these requirements.  An exception may be made for mature projects (see below).
  8. In rare cases, open source projects may reach maturity where the software requires little or no maintenance and there may no longer be active development.  If it can be demonstrated that the software is still in general use and recommended either by commercial organizations or active open source projects or user forums and the source code for the software is less than 20,000 lines, then a request can be made to the subcommittee to grant this software mature status.  This status may be reviewed semi-annually.  An example of a mature project would be the FastCGI library.
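
The date-based rules in the list above (rules 2 through 7, with the rule 8 exception) can be sketched as a checklist. This is an illustration, not a SPEC tool: the rules do not state which date the 12-month and 18-month ages are measured from, so the test date is assumed; ages are counted in whole months; and every name below is hypothetical.

```python
# Illustrative sketch only. Assumptions: ages are measured at the test
# date, counted in whole months; field and function names are hypothetical.
from dataclasses import dataclass
from datetime import date
from typing import Optional


def full_months_between(earlier, later):
    """Number of whole months elapsed from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


@dataclass
class CommunityApp:
    initial_stable_release: date
    additional_stable_releases: int
    uses_standard_oss_license: bool
    used_release_date: date
    used_release_is_stable: bool
    used_release_is_current: bool  # False means the prior stable release
    superseding_release_date: Optional[date] = None
    has_mature_status: bool = False  # granted by the subcommittee (rule 8)


def rule_violations(app, test_date, publication_date):
    """Return a list of strings naming each rule the application fails."""
    problems = []
    if not app.used_release_is_stable:
        problems.append("rule 2: release used is not a stable release")
    if full_months_between(app.initial_stable_release, test_date) < 12:
        problems.append("rule 3: initial stable release under 12 months old")
    if app.additional_stable_releases < 2:
        problems.append("rule 4: fewer than two additional stable releases")
    if not app.uses_standard_oss_license:
        problems.append("rule 5: no standard open source license")
    if not app.used_release_is_current:
        newer = app.superseding_release_date
        # The prior release is allowed only while the superseding release
        # is less than 3 months old at publication.
        if newer is None or full_months_between(newer, publication_date) >= 3:
            problems.append("rule 6: release used is not current enough")
    too_old = full_months_between(app.used_release_date, test_date) > 18
    if too_old and not app.has_mature_status:
        problems.append("rule 7: release used is older than 18 months")
    return problems


if __name__ == "__main__":
    app = CommunityApp(date(2007, 1, 15), 3, True, date(2009, 3, 1), True, True)
    print(rule_violations(app, date(2009, 8, 1), date(2009, 9, 10)))
```

An empty list means none of the checked rules is violated; it does not replace review by the subcommittee.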

3.2.9 Test Sponsor

The reporting page must list the date the test was performed (month and year), the organization which performed the test and is reporting the results, and the SPEC license number of that organization.

3.2.10 Notes

This section is used to document:

3.3 Log File Review

The following additional information may be required to be provided for SPEC's results review:

The submitter is required to keep the entire log file from both the SUT and the BeSim box, for each of the four workloads, for the duration of the review period.


4.0 Submission Requirements for SPECweb2009

Once you have a compliant run and wish to submit it to SPEC for review, you will need to provide the following:

Once you have the submission ready, place the combined raw file in a zip file and attach this zip file to an email to subweb2009@spec.org. Note, only one raw result file per zip file is allowed; however, multiple zip files can be attached to the email to the submission drop alias.
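
The one-raw-file-per-zip requirement can be met with a short packaging script. The sketch below uses Python's standard zipfile module; the function name, file names and directory layout are hypothetical, and sending the email is left out.

```python
# Illustrative sketch: writes one zip archive per raw result file.
# File names and paths are hypothetical examples.
import zipfile
from pathlib import Path


def package_raw_files(raw_files, out_dir):
    """Create one zip archive per raw file and return the archive paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for raw in map(Path, raw_files):
        archive = out_dir / (raw.name + ".zip")
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            # Store only the file name, not the full path, inside the archive.
            zf.write(raw, arcname=raw.name)
        archives.append(archive)
    return archives


if __name__ == "__main__":
    import sys
    for path in package_raw_files(sys.argv[1:], "submission"):
        print(path)
```

Each resulting archive can then be attached to the submission email, several per message if needed.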

Issues raised concerning a result's compliance to the run and reporting rules will be taken up by the relevant subcommittee.  


5.0 The SPECweb2009 Benchmark Kit

SPEC provides client driver software, which includes tools for running the benchmark and reporting its results.  This client driver is written in Java; precompiled class files are included in the jar files of the kit, so no build step is necessary. This software implements various checks for conformance with these run and reporting rules. Therefore, the SPEC software must be used, except that a necessary substitution of equivalent functionality (e.g. fileset generation) may be made with prior approval from SPEC. Any such substitution must be reviewed and deemed "performance-neutral" by the OSSC.

The kit also includes Java code for the file set generator (Wafgen) and C code for BeSim.

SPEC also provides server-side script code for each workload. In the current release, PHP, JSP and ASP.NET scripts are provided. These scripts have been tested for functionality and correctness on various operating systems and Web servers. Hence all submissions must use one of these script implementations. Any new dynamic script implementation will be evaluated by the subcommittee according to the acceptance process (see section 2.4 Dynamic Request Processing).

Once the code is accepted by the subcommittee, it will be made available on the SPEC Web site for any licensee to use in their tests/submissions. Upon approval, the new implementation will be made available in future releases of the benchmark and may not be used until after the release of the new version.

The kit also includes the PTDaemon used for power and temperature measurement.


Copyright © 2009 Standard Performance Evaluation Corporation.  All rights reserved.

Java® is a registered trademark of Sun Microsystems.