.. SPDX-License-Identifier: GPL-2.0
.. include::

=======================================
Intel thermal throttle events reporting
=======================================

:Author: Srinivas Pandruvada

Introduction
------------

Intel processors have built-in automatic and adaptive thermal monitoring
mechanisms that force the processor to reduce its power consumption in
order to operate within predetermined temperature limits.

Refer to section "THERMAL MONITORING AND PROTECTION" in the "Intel® 64 and
IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C, & 3D):
System Programming Guide" for more details.

In general, there are two mechanisms to control the core temperature of the
processor. They are called "Thermal Monitor 1" (TM1) and "Thermal Monitor 2"
(TM2).

The status of the temperature sensor that triggers the thermal monitor
(TM1/TM2) is indicated through the "thermal status flag" and "thermal status
log flag" in MSR_IA32_THERM_STATUS for core level and
MSR_IA32_PACKAGE_THERM_STATUS for package level.

Thermal Status flag, bit 0: When set, indicates that the processor core
temperature is currently at the trip temperature of the thermal monitor and
that the processor power consumption is being reduced via either TM1 or TM2,
depending on which is enabled. When clear, the flag indicates that the core
temperature is below the thermal monitor trip temperature. This flag is
read-only.

Thermal Status Log flag, bit 1: When set, indicates that the thermal sensor
has tripped since the last power-up or reset, or since the last time that
software cleared this flag. This flag is a sticky bit; once set, it remains
set until cleared by software or until a power-up or reset of the processor.
The default state is clear.

It is possible that TM1/TM2 is not active when a user reads
MSR_IA32_THERM_STATUS or MSR_IA32_PACKAGE_THERM_STATUS. In this case, the
"Thermal Status flag" reads "0" and the "Thermal Status Log flag" is set to
show any previous TM1/TM2 activation. But since the log flag is a single
sticky bit that must be cleared by software, it cannot show the number of
TM1/TM2 activations.

Hence, Linux provides counters of how many times the "Thermal Status flag"
was set. It also reports how long the "Thermal Status flag" was active, in
milliseconds. Using these counters, users can check whether performance was
limited because of thermal events. It is recommended to read from sysfs
instead of reading the MSRs directly, because the "Thermal Status Log flag"
is reset by the driver to implement rate control.

Sysfs Interface
---------------

Thermal throttling events are presented for each CPU under
"/sys/devices/system/cpu/cpuX/thermal_throttle/", where "X" is the CPU
number.

All these counters are read-only. They cannot be reset to 0, so they can
potentially overflow after reaching the maximum 64-bit unsigned integer
value.

``core_throttle_count``
        Shows the number of times the "Thermal Status flag" changed from 0
        to 1 for this CPU since OS boot and thermal vector initialization.
        This is a 64-bit counter.

``package_throttle_count``
        Shows the number of times the "Thermal Status flag" changed from 0
        to 1 for the package containing this CPU since OS boot and thermal
        vector initialization. Package status is broadcast to all CPUs; all
        CPUs in the package increment this count. This is a 64-bit counter.

``core_throttle_max_time_ms``
        Shows the maximum amount of time for which the "Thermal Status flag"
        has been set to 1 for this CPU at the core level since OS boot and
        thermal vector initialization.

``package_throttle_max_time_ms``
        Shows the maximum amount of time for which the "Thermal Status flag"
        has been set to 1 for the package containing this CPU since OS boot
        and thermal vector initialization.

``core_throttle_total_time_ms``
        Shows the cumulative time for which the "Thermal Status flag" has
        been set to 1 for this CPU at the core level since OS boot and
        thermal vector initialization.

``package_throttle_total_time_ms``
        Shows the cumulative time for which the "Thermal Status flag" has
        been set to 1 for the package containing this CPU since OS boot and
        thermal vector initialization.
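Because the counters cannot be reset, checking whether a workload was
throttled usually means taking a snapshot before and after it and comparing
the two. The following user-space Python script is a minimal sketch of that
idea and is not part of the kernel. The helper names
``read_throttle_counters`` and ``throttle_delta`` are hypothetical. The
sketch assumes that each file holds a single decimal integer and that the
counters wrap modulo 2^64 on overflow.

.. code-block:: python

    from pathlib import Path

    # Counter file names as listed in the Sysfs Interface section.
    COUNTERS = (
        "core_throttle_count",
        "package_throttle_count",
        "core_throttle_max_time_ms",
        "package_throttle_max_time_ms",
        "core_throttle_total_time_ms",
        "package_throttle_total_time_ms",
    )


    def read_throttle_counters(cpu, root="/sys/devices/system/cpu"):
        """Return {counter name: value} for one CPU, skipping absent files."""
        base = Path(root) / f"cpu{cpu}" / "thermal_throttle"
        values = {}
        for name in COUNTERS:
            try:
                # Assumption: the file holds one decimal integer.
                values[name] = int((base / name).read_text().strip())
            except (OSError, ValueError):
                # The file may be missing or unreadable on this system.
                continue
        return values


    def throttle_delta(before, after):
        """Change in the count and total-time counters between snapshots.

        The maximum-time files are not cumulative, so they are left out.
        The modulo handles a wrap of the 64-bit unsigned counters
        (assumed wrap-around behaviour).
        """
        delta = {}
        for name, new in after.items():
            cumulative = name.endswith(("_count", "_total_time_ms"))
            if cumulative and name in before:
                delta[name] = (new - before[name]) % 2**64
        return delta


    if __name__ == "__main__":
        for name, value in read_throttle_counters(0).items():
            print(f"cpu0 {name}: {value}")

A non-zero entry in the result of ``throttle_delta()`` for snapshots taken
around a workload suggests that the "Thermal Status flag" was set at some
point while the workload ran.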