Metrics Gathered

WARNING: This page is still being actively developed.

This document contains information about the metrics gathered in Browsertime tests, as well as detailed information about how they are gathered.

Pageload Tests

For browsertime pageload tests, there is a limited set of metrics that we collect (which can easily be expanded). Currently we divide these into two sets of metrics: (i) visual metrics, and (ii) technical metrics. These are gathered through two types of tests called warm and cold pageload tests. We have combined these two into a single “Chimera” mode which you’ll find in the Treeherder tasks.

Below, you can find the process of how we run Warm, Cold, and Chimera pageload tests.

Warm Pageload

In this pageload test type, we open the browser, then repeatedly navigate to the same page without restarting the browser in between cycles.

  • A new, or conditioned, browser profile is created

  • The browser is started up

  • Post-startup browser settle pause of 30 seconds, or 1 second if using a conditioned profile

  • A new tab is opened

  • The test URL is loaded; measurements taken

  • The tab is reloaded X more times (for X replicates); measurements taken each time

NOTES:

  • The measurements from the first page-load are not included in overall results metrics because of first load noise; however, they are listed in the JSON artifacts

  • The bytecode cache gets populated on the first test cycle, and subsequent iterations will already have the cache built to reduce noise.
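The handling of the first page-load can be sketched as follows. This is a minimal illustration, not the harness's own code: the function name, the sample values, and the choice of the median as the summary statistic are all assumptions made for the example.

```python
from statistics import median


def summarize_warm(replicates):
    """Summarize warm pageload replicates, leaving out the first (noisy) load.

    `replicates` is a list of measurements in ms, in the order they were taken.
    The median is an assumed summary statistic, chosen only for illustration.
    """
    if len(replicates) < 2:
        raise ValueError("need the initial load plus at least one reload")
    # The first load stays in the raw data but is excluded from the summary.
    return median(replicates[1:])


# Hypothetical load times in ms; the first value is the initial page-load.
load_times = [1450, 980, 1010, 995, 1002]
print(summarize_warm(load_times))
```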

Cold Pageload

In this pageload test type, we open the browser, navigate to the page, then restart the browser before performing the next cycle.

  • A new, or conditioned, browser profile is created

  • The browser is started up

  • Post-startup browser settle pause of 30 seconds, or 1 second if using a conditioned profile

  • A new tab is opened

  • The test URL is loaded; measurements taken

  • The browser is shut down

  • Entire process is repeated for the remaining browser cycles

NOTE: The measurements from all browser cycles are used to calculate overall results.

Chimera Pageload

A new mode for pageload testing is called Chimera mode. It combines the warm and cold variants into a single test. This test mode is used in our Taskcluster tasks.

  • A new, or conditioned, browser profile is created

  • The browser is started up

  • Post-startup browser settle pause of 30 seconds, or 1 second if using a conditioned profile

  • A new tab is opened

  • The test URL is loaded; measurements taken for Cold pageload

  • Navigate to a secondary URL (to preserve the content process)

  • The test URL is loaded again; measurements taken for Warm pageload

  • The desktop browser is shut down

  • Entire process is repeated for the remaining browser cycles

NOTE: The bytecode cache mentioned in Warm pageloads still applies here.
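The ordering of the Chimera steps above can be sketched with a stand-in browser object. Everything here is hypothetical: `FakeBrowser`, its methods, the URLs, and the timings exist only for the example, and the profile creation, settle pause, and new-tab steps are omitted for brevity.

```python
class FakeBrowser:
    """Hypothetical stand-in for a browser driver; it only records steps."""

    def __init__(self):
        self.steps = []
        self._loads_this_session = 0

    def start(self):
        self.steps.append("start")

    def load(self, url):
        self.steps.append(f"load {url}")
        self._loads_this_session += 1
        # Made-up timings: the first load in a session is the slowest.
        return 1500 if self._loads_this_session == 1 else 900

    def shutdown(self):
        self.steps.append("shutdown")
        self._loads_this_session = 0


def run_chimera(browser, test_url, secondary_url, cycles):
    """Run `cycles` browser cycles, taking one cold and one warm measurement each."""
    cold, warm = [], []
    for _ in range(cycles):
        browser.start()
        cold.append(browser.load(test_url))   # first load: cold measurement
        browser.load(secondary_url)           # navigate away, result not recorded
        warm.append(browser.load(test_url))   # second load: warm measurement
        browser.shutdown()
    return {"cold": cold, "warm": warm}


results = run_chimera(FakeBrowser(), "https://example.com/", "https://example.org/", 3)
print(results)
```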

Technical Metrics

Technical metrics are values obtained directly from the browser. These include metrics such as First Paint and DOM Content Flushed.

Visual Metrics

Visual metrics can be obtained by running Raptor Browsertime with --browsertime-visualmetrics. This records a video of the page being loaded and then processes the video to build the metrics. The video is produced either with FFMPEG (with --browsertime-no-ffwindowrecorder) or with the Firefox Window Recorder (the default).

Benchmarks

Benchmarks gather their own custom metrics, unlike the pageload tests above. Please ping the owners of those benchmarks to determine what they mean and how they are produced, or reach out to the Performance Test and Tooling team in #perftest on Element.

Metric Definitions

The following section documents all available metrics that currently alert in Raptor Browsertime tests.

Contentful Speed Index

Similar to SpeedIndex, except that it uses the contentfulness of a frame to determine visual completeness, rather than relying on histogram differences. The contentfulness is determined by calculating the number of edges found in the frame. A lower number of edges detected gives a smaller contentfulness value (and a smaller visually complete value).
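To make the idea of contentfulness concrete, here is a minimal sketch that counts edge pixels in a grayscale frame. It assumes a simple neighbour-difference threshold as the edge detector, which is a stand-in and not necessarily what the real tooling uses; the function name, the threshold, and the sample frames are hypothetical.

```python
def contentfulness(frame, threshold=32):
    """Count pixels whose right or lower neighbour differs by more than `threshold`.

    `frame` is a list of rows of grayscale values (0-255). This crude edge
    count is an illustrative assumption, not the production edge detector.
    """
    rows = len(frame)
    cols = len(frame[0]) if rows else 0
    edges = 0
    for y in range(rows):
        for x in range(cols):
            right = x + 1 < cols and abs(frame[y][x] - frame[y][x + 1]) > threshold
            below = y + 1 < rows and abs(frame[y][x] - frame[y + 1][x]) > threshold
            if right or below:
                edges += 1
    return edges


# A blank white frame versus a frame containing a dark square.
blank = [[255] * 4 for _ in range(4)]
boxed = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 0, 0, 255],
    [255, 255, 255, 255],
]
print(contentfulness(blank), contentfulness(boxed))
```

The blank frame has no edges, so it scores lower than the frame with content, matching the description above.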

Estimated Frame Latency (Any)

Similar to estimatedFirstFrameLatency, except that it uses all identified frames during video playback, normalized to be an estimate of when the first frame was displayed by using the expected time offset from the video itself.

Estimated Frame Latency (First)

A metric used to denote the latency on displaying the first frame of a video. Calculated by using videos of the pageload from which key frames are identified by matching a given solid RGB color with fuzz.

First Paint

Denotes the first time the browser performs a paint that has content in it (in ms).

First Visual Change

The first visual change detected in the test (in ms).

Largest Contentful Paint

The time (in ms) at which the largest piece of content on the page was rendered/painted.

Last Visual Change

The last visual change detected in the test (in ms).

Load Time

The time it took for the page to complete loading (in ms).

Perceptual Speed Index

Similar to SpeedIndex, except that it uses the structural similarity index measure (SSIM) to determine visual completeness. This technique compares the luminance, contrast, and structure of two frames (a given frame vs. the final frame) to determine the completeness.
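As a rough illustration of the comparison, the sketch below computes a single global SSIM value over two flattened grayscale frames using the standard SSIM formula. Real implementations usually compute SSIM over local windows and average the results, so this simplified form, along with the function name and the sample frames, is an assumption made for the example.

```python
def global_ssim(frame_a, frame_b, dynamic_range=255):
    """Single-window SSIM over two equal-length lists of grayscale pixels."""
    n = len(frame_a)
    if n == 0 or n != len(frame_b):
        raise ValueError("frames must be non-empty and the same size")

    mean_a = sum(frame_a) / n
    mean_b = sum(frame_b) / n
    var_a = sum((a - mean_a) ** 2 for a in frame_a) / n
    var_b = sum((b - mean_b) ** 2 for b in frame_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(frame_a, frame_b)) / n

    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Hypothetical frames: an empty early frame and the final frame.
early_frame = [0, 0, 0, 0]
final_frame = [0, 255, 0, 255]
print(global_ssim(final_frame, final_frame))  # identical frames score about 1
print(global_ssim(early_frame, final_frame))  # dissimilar frames score lower
```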

Speed Index

A metric used to denote the speed at which a page loaded. Lower values indicate faster pageloads. Units are in (Visually-Complete x Milliseconds). Calculated by using videos of the pageload which provide a measure of visual completeness. Visual completeness is calculated by comparing the histogram of a given frame to the final frame of the pageload. The SpeedIndex is calculated as the area between the curves of a constant line at y=1, and the graph of the visual completeness from 0ms to when visual completeness reaches 100% (or hits the y=1 line).
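The area calculation described above can be sketched as follows. This assumes visual completeness is held constant between sampled frames (a step function) and is 0 before the first sample; the function name and the sample data are hypothetical.

```python
def speed_index(samples):
    """Area between y=1 and the visual completeness curve.

    `samples` is a list of (time_ms, completeness) pairs sorted by time,
    with completeness in the range [0, 1]. Completeness is assumed to stay
    constant between samples.
    """
    total = 0.0
    prev_time, prev_complete = 0, 0.0
    for time_ms, complete in samples:
        # Add the incomplete fraction held over the elapsed interval.
        total += (1.0 - prev_complete) * (time_ms - prev_time)
        prev_time, prev_complete = time_ms, complete
        if complete >= 1.0:
            break  # nothing accumulates after 100% completeness
    return total


# Hypothetical progress: 50% complete at 500 ms, 90% at 1000 ms, 100% at 1500 ms.
progress = [(0, 0.0), (500, 0.5), (1000, 0.9), (1500, 1.0)]
print(round(speed_index(progress)))
```

A page that reaches high completeness early accumulates less area, which is why lower values indicate faster pageloads.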

YouTube Playback Metrics

Metrics starting with VP9/H264 give the number of frames dropped and painted.

cpu Time

Specifies cumulative CPU usage in milliseconds across all threads of the process since the process start.