When you push a commit to mozilla-central or a related repository, it initiates a large chain of builds and tests across multiple types of infrastructure.  This document will help you understand all the pieces that comprise Mozilla's continuous integration systems.

TaskCluster and Treeherder

TaskCluster, Mozilla's continuous integration (CI) system, picks up changes pushed to Mercurial (Hg).  TaskCluster generates binary builds for Firefox and Firefox for Android across a variety of operating systems.  After the builds are completed, they are used to run a series of correctness and performance tests.

The results of TaskCluster jobs (both builds and tests) are displayed in Treeherder.  A group of individuals known as sheriffs constantly monitors Treeherder, looking for broken builds and tests.  The sheriffs' role is to "keep the tree green", or in other words, to keep the code in our repositories in a good state, to the extent that the state is reflected in the output shown on Treeherder.  When sheriffs see that a build or test has been broken, they are empowered to take one of several actions, including backing out the patch that caused the problem and closing the tree (i.e., preventing any additional commits).

Results in Treeherder are ordered by Mercurial pushes.  Each TaskCluster job is represented by a colored label; green means a job has succeeded, while other colors represent different kinds of problems.  The label text indicates the job type.  For a full list of job types, see the Help menu in Treeherder's upper-right corner.  Below is a list of the most common.

Builds

Builds that run in CI are clobber builds.  In a clobber build, the directory hierarchy, including the local source and object directories, is deleted if it exists from a previous build.
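
The clobber step described above can be pictured with a minimal Python sketch.  This is an illustration only, not the code TaskCluster runs; the function name clobber and the directory names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def clobber(*directories: Path):
    """Delete each directory left over from a previous build.

    Returns the list of directories that were actually removed.
    (Hypothetical helper for illustration; not part of any Mozilla tool.)
    """
    removed = []
    for directory in directories:
        if directory.exists():
            # Remove the whole tree so no stale artifacts survive.
            shutil.rmtree(directory)
            removed.append(directory)
    return removed
```

Calling clobber(Path("src"), Path("obj-dir")) before a build would leave nothing from an earlier run behind; directories that do not exist are skipped.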

Functional Tests

These jobs are scheduled after a build job has successfully produced a build.  They can sometimes run even when a build job fails, provided the failure occurred during 'make check'.

See the full list of tests at the Mozilla Automated Testing page.

Talos Performance Tests

All performance tests displayed in Treeherder are run using the Talos framework and are denoted by the letter T.  These jobs are scheduled at the same time as the correctness jobs.  Talos is used to execute several suites for desktop Firefox and Firefox for Android; these suites are denoted using lower-case letters, e.g., T(c d g1 o s tp).

For a list of tests, see the Mozilla Automated Testing page.

The Talos indicators in Treeherder appear green if the job completed successfully.  To see the performance data generated by a job, click on the job in Treeherder and then open the performance tab of the job details panel that pops up.

Each Talos suite contains a set of tests or pages, some of which in turn have sub-tests.  Each test is executed multiple times to produce a number of data replicates.  The Talos harness produces a single number per test (typically the median of all the replicates excluding the first 1-5), which is stored in Treeherder's database and is accessible via the Perfherder interface.
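
As a rough illustration of how replicates collapse into one number, the sketch below drops a configurable number of leading replicates and takes the median of the rest.  It is not the Talos harness's actual code; the function name summarize_replicates and the skip parameter are assumptions made for this example.

```python
from statistics import median


def summarize_replicates(replicates, skip=1):
    """Collapse one test's replicates into a single number.

    Drops the first `skip` replicates (typically slower warm-up runs)
    and returns the median of the remainder.
    (Hypothetical helper for illustration; not the Talos implementation.)
    """
    kept = replicates[skip:]
    if not kept:
        raise ValueError("no replicates left after skipping the first runs")
    return median(kept)


# The first replicate is an outlier and is excluded from the median.
print(summarize_replicates([900, 310, 300, 305, 320], skip=1))  # prints 307.5
```

Using the median rather than the mean keeps a single unusually slow run from shifting the reported value.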

Other Performance Systems

Autophone (Android)

Autophone is a test harness which runs a set of performance tests on a variety of real Android phones.  It reports to a custom dashboard known as phonedash.  Tests currently run are primarily startup tests.

Games Benchmarking (Firefox)

The games benchmarking harness (aka mozbench), currently under development, will allow a number of games-related benchmarks to be run against Firefox and Chrome.  Eventually, the system will likely be expanded with support for Android.

Post-Job Analysis and Alerts

Some analysis of test data occurs out-of-band after jobs complete.

Perfherder Alerts

We track changes in the results of Talos and other performance frameworks inside Perfherder, and try to alert automatically when there is a sustained change exceeding a certain magnitude (specified per test).  Performance sheriffs review the list of alerts on a regular basis and file bugs if appropriate.  You can view the current set of alerts on the Perfherder Alerts dashboard.
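
The idea of alerting on a sustained change above a per-test magnitude can be sketched as follows.  This is a simplified stand-in, not Perfherder's actual detection algorithm, which this document does not describe; the function name, the window size, and the percentage threshold are all assumptions chosen for illustration.

```python
from statistics import mean


def find_sustained_changes(values, window=3, threshold_pct=5.0):
    """Return indices where a series of per-push results appears to shift.

    At each index, the mean of the previous `window` values is compared with
    the mean of the next `window` values; the index is reported when the
    relative difference exceeds `threshold_pct` percent.
    (Illustrative sketch only; not Perfherder's algorithm.)
    """
    alerts = []
    for i in range(window, len(values) - window + 1):
        before = mean(values[i - window:i])
        after = mean(values[i:i + window])
        if before == 0:
            continue  # relative change is undefined for a zero baseline
        change_pct = abs(after - before) / abs(before) * 100.0
        if change_pct > threshold_pct:
            alerts.append(i)
    return alerts


# A 10% step at index 4, checked against an 8% threshold.
series = [100, 100, 100, 100, 110, 110, 110, 110]
print(find_sustained_changes(series, window=3, threshold_pct=8.0))  # prints [4]
```

Comparing window averages instead of adjacent points dampens run-to-run noise, and passing a different threshold_pct per test mirrors the per-test magnitude mentioned above.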

Intermittent failures

After functional tests complete, test log data are combined with Treeherder's failure classification data. The result is the Treeherder Intermittent Failures view. The "Orange Factor" is the average number of intermittent test failures that occur per push. The dashboard can be used to view the most frequent intermittent test failures, as well as to inspect historical trends.
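
The Orange Factor definition above amounts to a simple average, sketched here in Python.  The function name and the input format (one failure count per push) are assumptions for illustration; the real data comes from Treeherder's failure classifications.

```python
def orange_factor(failures_per_push):
    """Average number of intermittent test failures per push.

    `failures_per_push` holds one count per push.
    (Hypothetical helper for illustration; not Treeherder code.)
    """
    if not failures_per_push:
        return 0.0  # no pushes, so nothing to average
    return sum(failures_per_push) / len(failures_per_push)


# Four pushes with 0, 3, 1 and 4 intermittent failures.
print(orange_factor([0, 3, 1, 4]))  # prints 2.0
```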