This article covers important background information about power profiling, with an emphasis on Intel processors used in desktop and laptop machines. It serves as a starting point for anybody doing power profiling for the first time.
In physics, power is the rate of doing work. It is equivalent to an amount of energy consumed per unit time. In SI units, energy is measured in joules, and power is measured in watts, which are equivalent to joules per second.
Although power is an instantaneous concept, in practice it is measured non-instantaneously, i.e. by dividing an amount of energy by a non-infinitesimal time period. Strictly speaking, such a computation gives the average power, but this is often referred to simply as the power when the context makes this clear.
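The distinction can be written out explicitly. In this sketch the symbols are introduced here for illustration: $E(t)$ is the cumulative energy consumed up to time $t$, $P(t)$ is the instantaneous power, and $[t_0, t_1]$ is the measurement interval. The average power is the energy consumed over the interval divided by its length, which equals the time-average of the instantaneous power:

```latex
P(t) = \frac{dE}{dt},
\qquad
\bar{P} = \frac{E(t_1) - E(t_0)}{t_1 - t_0} = \frac{1}{t_1 - t_0}\int_{t_0}^{t_1} P(t)\,dt
```

For example, if an energy counter advances by 36 J over a 4 s window, the average power over that window is 36 J / 4 s = 9 W, even if the instantaneous power varied considerably within the window.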
In the context of computing, a fully-charged mobile device battery (as found in a laptop or smartphone) holds a certain amount of energy, and the speed at which that stored energy is depleted depends on the power consumption of the mobile device. That in turn depends on the software running on the device. Web browsers are popular applications and can be power-intensive, and therefore can significantly affect battery life. As a result, it is worth optimizing (i.e. reducing) the power consumption caused by Firefox and Firefox OS.
The following diagram (from the Intel Power Governor documentation) shows how machines using recent Intel processors are constructed.
The important points are as follows.
Intel processors have aggressive power-saving features. The first is the ability to switch frequently (thousands of times per second) between active and idle states, and there are actually several different kinds of idle states. These different states are called C-states. C0 is the active/busy state, where instructions are being executed. The other states have higher numbers and reflect increasingly deep idle states. The deeper an idle state is, the less power it uses, but the longer it takes to wake up from.
Note: the ACPI standard specifies four states, C0, C1, C2 and C3. Intel maps these to processor-specific states such as C0, C1, C2, C6 and C7, and many tools report C-states using the latter names. The exact relationship is confusing, and chapter 13 of the Intel optimization manual has more details. The important thing is that C0 is always the active state, and for the idle states a higher number always means less power consumption.
The other thing to note about C-states is that they apply both to cores and the entire package — i.e. if all cores are idle then the entire package can also become idle, which reduces power consumption even further.
The fraction of time that a package or core spends in an idle C-state is called the C-state residency. This is a misleading term — the active state, C0, is also a C-state — but one that is nonetheless common.
Intel processors have model-specific registers (MSRs) containing measurements of how much time is spent in different C-states, and tools such as powermetrics (Mac), powertop and turbostat (Linux) can expose this information.
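As a rough illustration of what such tools compute, the following Python sketch turns per-state time deltas into residency fractions. It assumes the counter deltas for one interval have already been read and converted to a common time unit; the dictionary layout and the function names `residencies` and `idle_residency` are hypothetical, and the actual MSR counters and their units are processor-specific.

```python
def residencies(deltas):
    """Map each C-state to the fraction of the interval spent in it.

    `deltas` is a hypothetical mapping such as {"C0": 12.0, "C1": 3.0, "C6": 85.0},
    giving the time spent in each state over one sampling interval (any common unit).
    """
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("interval has no recorded time")
    return {state: t / total for state, t in deltas.items()}


def idle_residency(deltas):
    """Fraction of the interval spent in any idle state, i.e. everything except C0."""
    return 1.0 - residencies(deltas).get("C0", 0.0)
```

For example, `idle_residency({"C0": 25, "C1": 25, "C6": 50})` returns 0.75: the core was busy for a quarter of the interval and idle for the rest, half of it in the deeper C6 state.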
A wakeup occurs when a core or package transitions from an idle state to the active state. This happens when the OS schedules a process to run due to some kind of event. Common causes of wakeups include scheduled timers going off and blocked I/O system calls receiving data. Maintaining high C-state residency is crucial to keep power consumption low, and so reducing wakeup frequency is one of the best ways to reduce power consumption.
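One common way to reduce wakeup frequency is to batch work so that several timers fire together. The following Python sketch is a simplified, hypothetical model of that idea (timer coalescing, which the text above does not itself describe): it counts the distinct instants at which at least one periodic timer fires during a window, optionally deferring each firing to the next multiple of an alignment interval. Real operating-system timer mechanisms are considerably more complex than this model.

```python
def wakeup_times(periods_ms, duration_ms, align_ms=None):
    """Return the set of distinct times (ms) at which at least one timer fires.

    Each timer fires every `period` ms starting at `period`. If `align_ms` is
    given, each firing is deferred to the next multiple of `align_ms`, so timers
    that fall in the same alignment slot share a single wakeup (hypothetical model).
    """
    times = set()
    for period in periods_ms:
        for t in range(period, duration_ms + 1, period):
            if align_ms:
                t = -(-t // align_ms) * align_ms  # round up to the next slot boundary
            if t <= duration_ms:  # only count wakeups inside the window
                times.add(t)
    return times
```

With timers of 30, 70 and 110 ms over one second, `len(wakeup_times([30, 70, 110], 1000))` is 48, while `len(wakeup_times([30, 70, 110], 1000, align_ms=100))` is 10. The trade-off is latency: each timer may now fire up to (just under) 100 ms later than requested.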
One consequence of the existence of C-states is that observations made during power profiling, even more than with other kinds of profiling, can disturb what is being observed. For example, the Gecko Profiler takes samples at 1000 Hz using a timer. Each of these samples can trigger a wakeup, which consumes power and obscures Firefox's natural wakeup patterns. For this reason, integrating power measurements into the Gecko Profiler is unlikely to be useful, and other power profiling tools typically use much lower sampling rates (e.g. 1 Hz).
Intel processors also support multiple P-states. P0 is the state where the processor is operating at maximum frequency and voltage, and higher-numbered P-states operate at a lower frequency and voltage to reduce power consumption. Processors can have dozens of P-states, but the transitions are controlled by the hardware and OS and so P-states are of less interest to application developers than C-states.
There are several kinds of power and power-related measurements. Some are global (whole-system) and some are per-process. The following sections list them from best to worst.
The best measurements are in joules and/or watts and are obtained by measuring the actual hardware in some fashion. These are global (whole-system) measurements that are affected by running programs but also by other things, such as (for laptops) how bright the monitor backlight is.
The next best measurements come from recent (Sandy Bridge and later) Intel processors that implement the RAPL (Running Average Power Limit) interface, which provides MSRs containing energy consumption estimates for up to four power planes or domains of a machine, as seen in the diagram above.
The following relationship holds: PP0 + PP1 <= PKG. DRAM is independent of the other three domains.
These values are computed using a power model that uses processor-internal counts as inputs, and they have been verified as being fairly accurate. They are also updated frequently, at approximately 1,000 Hz, though the variability in their update latency means that they are probably only accurate at lower frequencies, e.g. up to 20 Hz or so. See section 14.9 of Volume 3 of the Intel Software Developer's Manual for more details about RAPL.
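On Linux, RAPL energy counters are also exposed through the kernel's powercap sysfs interface. The following Python sketch assumes that interface is present and that zone `intel-rapl:0` is the package (PKG) domain; check the zone's `name` file, because zone numbering varies between machines. It converts two cumulative `energy_uj` readings into average power, allowing for the counter wrapping at `max_energy_range_uj`. On recent kernels, reading `energy_uj` may require root. The function names are hypothetical.

```python
import time
from pathlib import Path

# Assumption: Linux powercap sysfs; "intel-rapl:0" is typically package 0 (PKG).
RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before_uj, after_uj, max_range_uj):
    """Energy in microjoules between two readings, allowing for one counter wrap.

    Assumes at most one wraparound occurred between the readings, and ignores
    a possible off-by-one microjoule at the wrap point.
    """
    if after_uj >= before_uj:
        return after_uj - before_uj
    return (max_range_uj - before_uj) + after_uj


def average_power_w(before_uj, after_uj, max_range_uj, elapsed_s):
    """Average power in watts (joules per second) over the interval."""
    return energy_delta_uj(before_uj, after_uj, max_range_uj) / 1e6 / elapsed_s


def read_counter(name):
    """Read one integer attribute (e.g. "energy_uj") of the RAPL zone."""
    return int((RAPL_ZONE / name).read_text())


def measure_package_power(interval_s=1.0):
    """Sample the package energy counter twice and return average watts."""
    max_range = read_counter("max_energy_range_uj")
    before, t0 = read_counter("energy_uj"), time.monotonic()
    time.sleep(interval_s)
    after, t1 = read_counter("energy_uj"), time.monotonic()
    return average_power_w(before, after, max_range, t1 - t0)
```

On a suitable machine, `measure_package_power(1.0)` returns package power averaged over roughly one second. Given the update-latency caveat above, intervals much shorter than about 50 ms (20 Hz) are unlikely to be meaningful.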
Tools that can take RAPL readings include the following.
tools/power/rapl: all planes; Linux and Mac.
Of these, tools/power/rapl is generally the easiest and best to use because it reads all power planes, it's a command line utility, and it doesn't measure anything else.
The next best measurements are proxy measurements, i.e. measurements of things that affect power consumption such as CPU activity, GPU activity, wakeup frequency, C-state residency, disk activity, and network activity. Some of these are measured on a global basis, and some can be measured on a per-process basis. Some can also be measured via instrumentation within Firefox itself.
The correlation between each proxy measure and power consumption is hard to know and can vary greatly. When used carefully, however, they can still be useful. This is because they can often be measured in a more fine-grained fashion than power measurements and estimates, which is vital for gaining insight into how a program can reduce power consumption.
Most profiling tools provide at least some proxy measurements.
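As a small example of a per-process proxy measurement, the following Python sketch (Unix only) uses the standard `resource` module to record CPU time and voluntary context switches for the current process, and converts two snapshots into CPU utilization and switches per second. Treating voluntary context switches as a stand-in for wakeups is an assumption: they count the times the process gave up the CPU, which is related to, but not the same as, core wakeups. The function names are hypothetical.

```python
import resource
import time


def proxy_snapshot():
    """Capture wall-clock time, CPU time and voluntary context switches for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "wall_s": time.monotonic(),
        "cpu_s": usage.ru_utime + usage.ru_stime,  # user + system CPU seconds
        "voluntary_switches": usage.ru_nvcsw,  # times the process blocked or yielded
    }


def proxy_delta(before, after):
    """Convert two snapshots into rates over the interval between them."""
    wall = after["wall_s"] - before["wall_s"]
    if wall <= 0:
        raise ValueError("snapshots must be taken in order")
    return {
        "cpu_utilization": (after["cpu_s"] - before["cpu_s"]) / wall,
        "voluntary_switches_per_s": (after["voluntary_switches"] - before["voluntary_switches"]) / wall,
    }
```

A typical use is to take a snapshot, run the workload of interest, take another snapshot, and pass both to `proxy_delta`. As noted above, how these numbers relate to actual power consumption is hard to know, so they are best compared against each other rather than read as absolute costs.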
Hybrid proxy measurements are combinations of proxy measurements. The combinations are semi-arbitrary, they amplify the unreliability of proxy measurements, and unlike non-hybrid proxy measurements, they don't have a clear physical meaning. Avoid them.
The most notable example of a hybrid proxy measurement is the "Energy Impact" used by OS X's Activity Monitor.
Most power-related measurements are global or per-process. Such low-context measurements are typically good for understanding whether power consumption is good or bad, but in the latter case they often don't provide much insight into why the problem is occurring, which part of the code is at fault, or how it can be fixed. Nonetheless, they can still help improve understanding of a problem by using differential profiling.
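Differential profiling here means comparing the same measurement under two configurations, for example with and without a suspected change. The following Python sketch assumes repeated whole-system power readings in watts (from any of the tools above) have already been collected for each configuration; the function name and sample values are hypothetical.

```python
from statistics import mean, stdev


def compare(baseline_w, variant_w):
    """Summarize two sets of repeated power readings (watts) and their difference.

    Each list needs at least two readings so that the spread can be estimated.
    """
    b, v = mean(baseline_w), mean(variant_w)
    return {
        "baseline_mean_w": b,
        "variant_mean_w": v,
        "delta_w": v - b,
        "relative_change": (v - b) / b,
        "baseline_stdev_w": stdev(baseline_w),
        "variant_stdev_w": stdev(variant_w),
    }
```

For example, `compare([9.0, 10.0, 11.0], [11.0, 12.0, 13.0])` reports a 2 W (20%) increase with a run-to-run standard deviation of 1 W in each configuration. Repeating runs and checking the spread matters because global measurements are affected by everything else on the machine; a difference smaller than the run-to-run variation is weak evidence.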
A few power-related measurements can be obtained in a high-context fashion, e.g. with stack traces that clearly pinpoint specific parts of the code as being responsible.
This section aims to put together all the above information and provide a set of strategies for finding, diagnosing and fixing cases of high power consumption.
mach power on Mac, or Intel Power Gadget on Windows) for the comparisons. Avoid lower-quality measurements, especially Activity Monitor's "Energy Impact".
Chapter 13 of the Intel optimization manual has many details about optimizing for power consumption. Section 13.5 ("Tuning Software for Intelligent Power Consumption") in particular is worth reading.