x86_adapt and Energy Efficiency settings

General overview

Intel and AMD processors provide registers that allow the BIOS and the operating system to tune processor-internal features and monitor system components. By default, however, these registers are not accessible from user space. To overcome this barrier, taurus hosts the x86_adapt software stack, which consists of two kernel modules, a library, and two command line tools. The kernel modules provide access to the registers so that processor-specific settings can be read and changed. The library hides the access to the kernel module devices behind a more readable API. The command line tools provide a low-barrier starting point for accessing such items.

Access rights and defaults

As a user, you have access to items of the CPUs that your SLURM step allocated. I.e., if you run
# run a sequential program and gain access to the respective x86_adapt CPU items
srun -n 1 ./cmd

you get access rights to the items of one arbitrary CPU, namely the one that executes your job. If your program spans multiple processes with one CPU each, each of these CPUs is accessible. Example:
# run an MPI program with 12 ranks and gain access to the respective x86_adapt CPU items
srun -n 12 ./mpi-cmd

To find out which CPU(s) execute a process, you can use the command line tool taskset or the library call sched_getaffinity.
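To pass the CPUs reported by taskset to x86a_read one at a time, the affinity list has to be expanded into single CPU numbers. The following is a minimal sketch, assuming that taskset -cp prints the affinity as a list such as 0-3,8 after the last colon (the util-linux format). expand_cpu_list is a hypothetical helper and not part of x86_adapt.

```shell
#!/bin/sh
# Expand a Linux CPU list such as "0-3,8" into one CPU number per line.
# expand_cpu_list is a hypothetical helper for illustration only.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # A single CPU ("8") has no upper bound; treat it as the range 8-8.
        last=${last:-$first}
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            echo "$cpu"
            cpu=$((cpu + 1))
        done
    done
}

# Usage inside a job step (needs taskset and x86a_read on the node):
#   cpus=$(taskset -cp $$ | sed 's/.*: *//')
#   for cpu in $(expand_cpu_list "$cpus"); do x86a_read -c "$cpu"; done
```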

If your program spans all CPUs of a processor (in Linux terms this is called a node), you will also get access to the node-wide items.
# run a sequential program but gain access to all CPU and node items of the host that executes the job
srun -n 1 -c 24 ./cmd

A note on CPUs and nodes:

In Linux, the term "CPU" refers to a logical CPU, i.e., a hardware thread or a single processor core if hyper-threading is not supported. The term "node" refers to a NUMA node. Items in x86_adapt can have two scopes: CPU and node. However, the registers that are used by the x86_adapt kernel module can have additional scopes (e.g., "This register exists once per processor core"). Items from such registers are mapped to scopes that are defined by x86_adapt.

A note on defaults:

After every SLURM job, the original settings of all items that could have been changed are restored. This means that you cannot use one job to set the items and a following one to measure performance. All settings have to be applied within the job step that performs the measurement.
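Since the defaults are restored afterwards, setting an item and running the measurement belong in the same job step. The following is a minimal sketch of a wrapper that does both. The names run_with_item and DRY_RUN are hypothetical, and ./cmd stands for your own program.

```shell
#!/bin/sh
# run_with_item ITEM VALUE CMD [ARGS...]   (hypothetical helper)
# Set one x86_adapt item, then run the workload in the same job step.
# With DRY_RUN set, the x86a_write call is only printed; CMD still runs.
run_with_item() {
    item=$1
    value=$2
    shift 2
    # Stop if the item could not be set, so nothing is measured with defaults.
    ${DRY_RUN:+echo} x86a_write -i "$item" -V "$value" || return 1
    "$@"
}

# Usage, saved e.g. as step.sh and started with: srun -n 1 -c 24 ./step.sh
#   run_with_item Intel_Energy_Perf_Bias 0 ./cmd
```

Starting the wrapper through a single srun call keeps the write and the workload inside one step, so the setting is still active while the program runs.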

Source

If you want to try the software on your own system, have a look at the GitHub repository.

Interesting items

Performance and system monitoring
Category: RAPL
Description: Allows users to read energy consumption values from processor-internal registers. More information can be found in the Intel manual in Chapter 14.9 and in the Intel datasheet. Energy counter values must be scaled with the energy unit, which is either encoded in the Intel_RAPL_Power_Unit register (for Pckg_Energy on SandyBridge and Haswell, RAM_Energy on SandyBridge) or specified in the documentation (15.3 µJ for RAM_Energy on Haswell).
Scope: Node
Items:
  • Intel_RAPL_Pckg_Energy
  • Intel_RAPL_RAM_Energy
  • Intel_RAPL_Power_Unit
  • Intel_PCU_POWER_SKU_UNIT
  • Intel_PCU_DRAM_ENERGY_STATUS
  • Intel_DRAM_ENERGY_STATUS_CH0
  • Intel_DRAM_ENERGY_STATUS_CH1
  • Intel_DRAM_ENERGY_STATUS_CH2
  • Intel_DRAM_ENERGY_STATUS_CH3

Category: Core frequencies
Description: Allows users to read the current frequencies or the frequencies over time.
Scope: CPU
Items:
  • Intel_PERF_STATUS_Current_PState (multiply by 100 to get the current frequency in MHz)
  • APERF (counter which increases with the actual clock rate)
  • MPERF (counter which increases with the reference clock rate)

Category: Uncore frequency (Haswell)
Description: Allows users to read the current uncore frequency.
Scope: Node
Items:
  • Intel_UNCORE_CURRENT_RATIO (multiply by 100 to get the current frequency in MHz)

Category: C-state residencies
Description: Allows users to read how many cycles a core/processor resided in a certain C-state. For more information on C-states, read this paper.
Scope: CPU/Node
Items:
  • Intel_CORE_C3_RESIDENCY
  • Intel_CORE_C6_RESIDENCY
  • Intel_CORE_C7_RESIDENCY
  • Intel_PKG_C2_RESIDENCY
  • Intel_PKG_C3_RESIDENCY
  • Intel_PKG_C6_RESIDENCY
  • Intel_PKG_C7_RESIDENCY

Category: Uncore performance counter
Description: Allows users to read information from uncore devices. For more information, check the Intel uncore guide.
Scope: Node
Items:
  • [...]PMON[...]

Category: L3 cache usage, a.k.a. Cache Monitoring Technology (CMT) (Haswell)
Description: Enables users to monitor the L3 cache usage per core. For more information, read the Intel manual, chapter 17.16. In the current version, you can set Intel_QM_EVTSEL_Event to 1 to start counting. RMIDs are not supported by this software. Sample the current value from Intel_QM_CTR and multiply it by 49152 to get the usage in bytes.
Scope: CPU
Items:
  • Intel_QM_EVTSEL_Event
  • Intel_QM_CTR
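
Several of the monitoring items above return raw values that have to be converted before they are meaningful. The following is a minimal sketch of these conversions, not a definitive implementation. All helper names are hypothetical. It assumes that the Intel_RAPL_Power_Unit item returns the raw register value, whose bits 12:8 hold the energy status unit ESU with an energy unit of 1/2^ESU joules (layout as given in the Intel manual), that the energy counters are 32 bits wide, and that the shell uses 64-bit arithmetic.

```shell
#!/bin/sh
# All helper names are hypothetical; inputs are raw readings, e.g. from x86a_read.

# rapl_energy_uj START END UNIT_REG
# Energy in microjoules between two samples of a 32-bit RAPL energy counter.
# UNIT_REG is the raw Intel_RAPL_Power_Unit value (assumed: ESU in bits 12:8).
rapl_energy_uj() {
    esu=$(( ($3 >> 8) & 0x1F ))
    # Mask to 32 bits so that a single counter wrap-around is handled.
    delta=$(( ($2 - $1) & 0xFFFFFFFF ))
    echo $(( ($delta * 1000000) >> $esu ))
}

# ratio_to_mhz RATIO
# P-state or uncore ratio (100 MHz steps) to MHz.
ratio_to_mhz() {
    echo $(( $1 * 100 ))
}

# cmt_bytes CTR
# L3 cache usage in bytes from an Intel_QM_CTR reading.
cmt_bytes() {
    echo $(( $1 * 49152 ))
}
```

For RAM_Energy on Haswell, where the unit is fixed at 15.3 µJ (1/2^16 J), the value 0x1000 can be passed as UNIT_REG, because it encodes ESU = 16 under the layout assumed above.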
Performance and energy efficiency settings
Preface: The energy efficiency features of Haswell processors are complex and not documented in detail in the respective processor manuals. Please refer to the paper An Energy Efficiency Feature Survey of the Intel Haswell Processor by Hackenberg et al. The authors describe most of the tricky architecture details, like P-state and C-state transition latencies, memory bandwidth scaling, internal mechanisms like EPB and UFS, and much more.
Category: Prefetcher settings
Description: Prefetchers are pieces of hardware that try to predict future memory accesses by recognizing patterns in previous memory accesses. They are enabled by default and can be disabled by writing "1" to the respective items.
Scope: CPU
Items:
  • Intel_HW_Prefetch_Disable
  • Intel_AL_Prefetch_Disable
  • Intel_DCU_Prefetch_Disable
  • Intel_DCU_IP_Prefetch_Disable

Category: Uncore frequency settings (Haswell)
Description: The uncore frequency of Haswell processors is set by the Power Control Unit based on several indicators (e.g. core frequencies, memory accesses). You can set the minimal and maximal allowed frequency by writing the respective items. The items represent a frequency in 100 MHz steps, e.g. 1 means 100 MHz, 12 means 1.2 GHz.
Scope: Node
Items:
  • Intel_UNCORE_MIN_RATIO
  • Intel_UNCORE_MAX_RATIO

Category: Core frequency settings
Description: (Changed on 2017-08-04) SLURM and x86_adapt do not work together with respect to resetting frequency decisions after a job has finished. Please use the interfaces provided by Linux, i.e.:

/sys/devices/system/cpu/cpu<CPU>/cpufreq/scaling_setspeed
/sys/devices/system/cpu/cpu<CPU>/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu<CPU>/cpufreq/scaling_min_freq
/sys/devices/system/cpu/cpu<CPU>/cpufreq/scaling_max_freq

Scope: CPU

Category: Software Controlled Clock Modulation
Description: Change the clock modulation setting of processor cores by writing to this setting. See Intel manual chapter 14.7.
Scope: CPU
Items:
  • Intel_Clock_Modulation

Category: Package Power Limit
Description: Set the upper bound for the average processor power consumption over a specific time period. See Intel manual chapter 14.9.3. The time and power units can be gathered with the item Intel_RAPL_Power_Unit.
Scope: Node
Items:
  • Intel_RAPL_PKG_POWER_LIMIT_[*]
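
As an illustration of the uncore frequency settings, the following minimal sketch converts a frequency range in MHz to the 100 MHz steps used by the items and writes both bounds for one node. The names mhz_to_ratio, set_uncore_range and DRY_RUN are hypothetical. The flag usage follows the OPTIONS description of x86a_write (-n as a switch, -c selecting the node) and should be checked with x86a_write -h before use.

```shell
#!/bin/sh
# mhz_to_ratio MHZ   (hypothetical helper)
# Convert a frequency in MHz to the 100 MHz steps used by the ratio items.
mhz_to_ratio() {
    echo $(( $1 / 100 ))
}

# set_uncore_range NODE MIN_MHZ MAX_MHZ   (hypothetical helper)
# Write the lower and upper uncore frequency bound of one node.
# With DRY_RUN set, the x86a_write calls are printed instead of executed.
set_uncore_range() {
    ${DRY_RUN:+echo} x86a_write -n -c "$1" -i Intel_UNCORE_MIN_RATIO \
        -V "$(mhz_to_ratio "$2")" &&
    ${DRY_RUN:+echo} x86a_write -n -c "$1" -i Intel_UNCORE_MAX_RATIO \
        -V "$(mhz_to_ratio "$3")"
}

# Usage inside a job step that has access to the node-wide items:
#   set_uncore_range 0 1200 3000
```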

Command line tools

There are two command line tools: one for reading items (x86a_read) and one for setting them (x86a_write).

NAME
x86a_read, x86a_write

SYNOPSIS
x86a_read [ x86a_read-ARGS ...]
x86a_write [ x86a_write-ARGS ...]

OUTPUT
x86a_read: Invoking the x86a_read tool prints a list of items:

Item 1: Intel_Clock_Modulation_Value
----------------
Item 2: Intel_DCU_Prefetch_Disable
----------------
Item 3: Intel_Enhanced_SpeedStep
----------------
Item 4: Intel_Target_PState
[...]
Afterwards, the tool prints the current settings of each item or the selected item as CSV. The item numbers may change when the kernel module is reloaded (e.g. at boot time).

x86a_write: (none)

OPTIONS
x86a_read:
-h --help: print this help
-H --hex: print readings in hexadecimal form
-n --node: print node options instead of CPU options
-c --cpu: Read item(s) from this CPU (default=all)
If -n is set, read item(s) from this node (default=all).
-i --item: Read this item (default=all)
-v --verbose: print more information

x86a_write:
-h --help: print this help
-H --hex: provide value in hexadecimal form
-n --node: set node options instead of CPU options
-c --cpu: Set item on this CPU (default=all)
If -n is set, set item on this node (default=all).
-i --item: Set this item
-V --value: set to this value
-v --verbose: print more information

(For both tools, the term 'node' refers to a node in the Linux context, i.e., a NUMA node, e.g., a physical processor.)
EXAMPLES
# prints all items and the readings for CPU 0
x86a_read -c 0
# prints all items and the readings for node 0
x86a_read -n 0 -c 0
# prints all items and the readings for node 0 in hexadecimal
x86a_read -n 0 -c 0 -H
# prints the item Intel_Clock_Modulation and its
# reading for CPU 0 in hexadecimal
x86a_read -i Intel_Clock_Modulation -c 0 -H
# sets the item "Intel_Energy_Perf_Bias" on all
# CPUs to 0
x86a_write -i Intel_Energy_Perf_Bias -V 0
# sets the item "Intel_Energy_Perf_Bias" on CPU 2
# to 0
x86a_write -c 2 -i Intel_Energy_Perf_Bias -V 0
# sets the item "Intel_Energy_Perf_Bias" on CPU 2
# to 15 (F in hexadecimal)
x86a_write -c 2 -i Intel_Energy_Perf_Bias -H -V F

Library

There is also a library to access the kernel module and the items. Please have a look at the header file (/usr/local/include/x86_adapt.h) or the man page (man x86_adapt.h).

Known issues

The memory controller fixed counter control has to be shifted by 19 bits: shift the desired register value right by 19 before writing it, and shift the value left by 19 after reading it. This is due to an internal read mask that avoids writing invalid settings. I.e., an x86_adapt setting of 1 is equal to a register setting of 0x80000.
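
The shift can be sketched with plain shell arithmetic. The two helper names below are hypothetical and only illustrate the conversion in both directions.

```shell
#!/bin/sh
# Hypothetical helpers for the memory controller fixed counter control item.

# reg_to_item REG
# Value to hand to x86_adapt for a desired register value.
reg_to_item() {
    echo $(( $1 >> 19 ))
}

# item_to_reg ITEM
# Register value (hexadecimal) that corresponds to an x86_adapt reading.
item_to_reg() {
    printf '0x%X\n' $(( $1 << 19 ))
}
```

For example, item_to_reg 1 prints 0x80000 and reg_to_item 0x80000 prints 1, matching the values given above.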

Citing

If you used this software for your scientific work, please cite the paper Integrating performance analysis and energy efficiency optimizations in a unified environment by Schöne and Molka.

-- Main.RobertSchoene - 2016-01-26