Understanding System Utilization
The following chapter introduces and demonstrates tools that help in understanding overall system utilisation.
Tools discussed in this posting
uptime – Print Load Average
The easiest way to get an overview of the following is the uptime command:
- how long the system has been running
- the current CPU load averages
- how many users are currently logged in
uptime – print CPU load averages
The numbers printed to the right of “load average: “ are the 1-, 5- and 15-minute load averages of the system. The load average numbers give a measure of the number of runnable threads and running threads. Therefore the number has to be put in relation with the number of active CPUs in a system. For example, a load average of three (3) on a single CPU system would indicate some CPU overloading, while the same load average on a thirty-two (32) way system would indicate an unloaded system.
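The relation between load average and CPU count described above can be sketched in a few lines of Python. This is only an illustration: the function names and the cut-off of 1.0 runnable thread per CPU are assumptions made for this example, not part of uptime itself.

```python
import os


def load_per_cpu(load_averages, ncpus):
    """Divide each load average by the number of active CPUs."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return tuple(load / ncpus for load in load_averages)


def describe(load, ncpus):
    """Rough reading of one load average.

    Assumption: more than 1.0 runnable or running thread per CPU
    is treated as "overloaded". This cut-off is illustrative only.
    """
    return "overloaded" if load / ncpus > 1.0 else "not overloaded"


if __name__ == "__main__":
    print(describe(3.0, 1))   # single-CPU example from the text
    print(describe(3.0, 32))  # 32-way example from the text
    try:
        # Live values, where the platform provides them.
        print(load_per_cpu(os.getloadavg(), os.cpu_count() or 1))
    except (AttributeError, OSError):
        print("load averages not available on this platform")
```

Normalising by CPU count makes the 1-, 5- and 15-minute values comparable across machines of different sizes.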
perfbar [Tools CD] - A lightweight CPU Meter
perfbar is a tool that displays a single bar graph that color codes system activity. The colors are as follows:
- Blue = system idle.
- Red = system time.
- Green = user time.
- Yellow = I/O wait (obsolete on Solaris 10 and later).
perfbar – sample output of a system with 16 CPU cores
perfbar was enhanced in version 1.2 to provide better support for servers with many CPUs through a multi-line visualisation. See below.
perfbar: Visualisation of a Sun T5240 System with 128 strands (execution units) without any load
perfbar can be called without specifying any command line arguments. perfbar provides a large number of options which can be viewed with the -h option:
$ perfbar -h
perfbar 1.2
maintained by Ralph Bogendoerfer
based on the original perfbar by:
Joe Eykholt, George Cameron, Jeff Bonwick, Bob Larson
Usage: perfbar [X-options] [tool-options]
supported X-options:
    -display <display> or -disp <display>
    -geometry <geometry> or -geo <geometry>
    -background <background> or -bg <background>
    -foreground <foreground> or -fg <foreground>
    -font <font> or -fn <font>
    -title <title> or -t <title>
    -iconic or -icon
    -decoration or -deco
supported tool-options:
    -h, -H, -? or -help: this help
    -v or -V: verbose
    -r or -rows: number of rows to display, default 1
    -bw or -barwidth: width of CPU bar, default 12
    -bh or -barheight: height of CPU bar, default 180
    -i or -idle: idle color, default blue
    -u or -user: user color, default green
    -s or -system: system color, default red
    -w or -wait: wait color, default yellow
    -int or -interval: interval for display updates (in ms), default 100
    -si or -statsint: interval for stats updates (in display intervals), default 1
    -avg or -smooth: number of values for average calculation, default 8
The tool also understands a number of keystrokes:
- Q or q: Quit.
- R or r: Resize. This changes the window to the default size according to the number of CPU bars, rows and the chosen bar width and height.
- Number keys 1 - 9: Display this number of rows.
- + and -: Increase or decrease the number of rows displayed.
The tool is currently available as a beta in version 1.2. This latest version is not yet part of the Performance Tools CD 3.0. The engineers from the Sun Solution Center in Langen/Germany made it available for free through:
cpubar [Tools CD] - A CPU Meter, showing Swap, and Run Queue
cpubar displays one bar-graph for each processor with the processor speed(s) displayed on top. Each bar-graph is divided into four areas (top to bottom):
- Blue - CPU is available.
- Yellow - CPU is waiting for one or more I/O to complete (N/A on Solaris 10 and later).
- Red - CPU is running in kernel space.
- Green - CPU is running in user space.
As with netbar and iobar, a red and a dashed black & white marker shows the maximum and average used ratios respectively.
The bar-graphs labeled 'r', 'b' and 'w' display the run, blocked and wait queues. A non-empty wait queue is usually a symptom of a previous persistent RAM shortage. The total number of processes is displayed on top of these three bars.
The bar-graph labeled 'p/s' displays the process creation rate per second.
The bar-graph labeled 'RAM' displays the RAM usage (red=kernel, yellow=user, blue=free); the total RAM is displayed on top.
The bar-graph labeled 'sr' displays the scan rate on a logarithmic scale (a high scan rate is usually a symptom of RAM shortage).
The bar-graph labeled 'SWAP' displays the swap (a.k.a. virtual memory) usage (red=used, yellow=reserved, blue=free); the total swap space is displayed on top.
cpubar – sample output
vmstat – System Glimpse
The vmstat tool provides a glimpse of the current system behavior in a one line summary including both CPU utilisation and saturation.
In its simplest form, the command vmstat <interval> (e.g. vmstat 5) will report one line of statistics every <interval> seconds. The first line can be ignored as it is the summary since boot. All other lines report statistics of samples taken every <interval> seconds. The underlying statistics collection mechanism is based on kstat (see kstat(1)).
Let's run two copies of a CPU intensive application (cc_usr) and look at the output of vmstat 5. First start two (2) instances of the cc_usr program.
two (2) instances of cc_usr started
Now let's run vmstat 5 and watch its output.
vmstat – vmstat 5 report
First observe the cpu:id column which represents the system idle time (here 0%). Then look at the kthr:r column which represents the total number of runnable threads on dispatcher queues (here 1).
From this simple experiment, one can conclude that the system idle time for the five second samples was always 0, indicating 100% utilisation. On the other hand, kthr:r was mostly one and sustained indicating a modest saturation for this single CPU system (remember we launched two (2) CPU intensive applications).
A couple of notes with regard to CPU utilisation:
- 100% utilisation may be fine for your system. Think about a high-performance computing job: the aim will be to maximise utilisation of the CPU.
- Values of kthr:r greater than zero indicate some CPU saturation (i.e. more jobs would like to run but cannot because no CPU was available). However, performance degradation should be gradual.
- The sampling interval is important. Too short an interval produces noisy samples, while too long an interval averages out short bursts of activity.
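The reading of vmstat output described above (ignore the since-boot line, then look at kthr:r and cpu:id) can be sketched as a small Python parser. The sample text is made up for illustration and is not output recorded from the test system. The sketch also assumes that kthr:r is the first column and cpu:id the last column of each data line.

```python
# Hypothetical vmstat 5 output, invented for this example.
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 1000000 500000 1 5 0 0 0 0 0 0 0 0 0 300 200 150 1 1 98
 1 0 0 1000000 500000 0 2 0 0 0 0 0 0 0 0 0 310 120 140 99 1 0
 1 0 0 1000000 500000 0 1 0 0 0 0 0 0 0 0 0 305 118 139 99 1 0
 2 0 0 1000000 500000 0 1 0 0 0 0 0 0 0 0 0 307 119 141 100 0 0
"""


def parse_vmstat(text):
    """Return (runnable, idle_pct) tuples, one per interval sample."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Header lines start with a non-numeric token ("kthr" or "r").
        if not fields or not fields[0].isdigit():
            continue
        # Assumption: kthr:r is the first field, cpu:id the last one.
        samples.append((int(fields[0]), int(fields[-1])))
    return samples[1:]  # drop the first data line (summary since boot)


def summarise(samples):
    """Count samples showing saturation (r > 0) and full utilisation (id == 0)."""
    return {
        "samples": len(samples),
        "saturated": sum(1 for runnable, _ in samples if runnable > 0),
        "fully_busy": sum(1 for _, idle in samples if idle == 0),
    }


if __name__ == "__main__":
    print(summarise(parse_vmstat(SAMPLE)))
```

Parsing by position rather than by header name avoids the ambiguity of the two sy columns in the vmstat header.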
vmstat reports some additional information that can be interesting such as:
Column | Comments
faults:in | Number of interrupts per second.
faults:sy | Number of system calls per second.
faults:cs | Number of context switches per second (both voluntary and involuntary).
cpu:us | Percent user time: time the CPUs spent processing user-mode threads.
cpu:sy | Percent system time: time the CPUs spent processing system calls on behalf of user-mode threads, plus the time spent processing kernel threads.
cpu:id | Percent of time the CPUs are waiting for runnable threads.
mpstat - Report per-Processor or per-Processor Set Statistics
The mpstat command reports processor statistics in tabular form. Each row of the table represents the activity of one processor. The first table summarizes all activity since boot. Each subsequent table summarizes activity for the preceding interval. The output table includes:
Column | Comments
CPU | Prints processor ID.
minf | Minor faults (per second).
mjf | Major faults (per second).
xcal | Inter-processor cross-calls (per second).
intr | Interrupts (per second).
ithr | Interrupts as threads (not counting clock interrupt) (per second).
csw | Context switches (per second).
icsw | Involuntary context switches (per second).
migr | Thread migrations (to another processor) (per second).
smtx | Spins on mutexes (lock not acquired on first try) (per second).
srw | Spins on readers/writer locks (lock not acquired on first try) (per second).
syscl | System calls (per second).
usr | Percent user time.
sys | Percent system time.
wt | Always 0.
idl | Percent idle time.
The reported statistics can be broken down into the following categories:
- Processor utilisation: see columns usr, sys and idl for a measure of CPU utilisation on each CPU.
- System call activity: see the syscl column for the number of system calls per second on each CPU.
- Scheduler activity: see columns csw and icsw. The closer the ratio icsw/csw comes to one (1), the more often threads are preempted, either by higher-priority threads or because their time quantum expired. The migr column displays the number of times the OS scheduler moves ready-to-run threads to an idle processor. If possible, the OS tries to keep a thread on the processor on which it last ran. If that processor is busy, the thread migrates.
- Locking activity: column smtx indicates the number of mutex contention events in the kernel. Column srw indicates the number of reader-writer lock contention events in the kernel.
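The icsw/csw ratio mentioned under scheduler activity can be computed per CPU as sketched below. The row values and the 0.8 threshold are hypothetical and chosen only to illustrate the calculation. mpstat itself does not report this ratio.

```python
def involuntary_ratio(csw, icsw):
    """Fraction of context switches that were involuntary (0.0 if csw is 0)."""
    return icsw / csw if csw else 0.0


def flag_preempted_cpus(rows, threshold=0.8):
    """Return the CPU ids whose icsw/csw ratio reaches the threshold.

    rows: iterable of (cpu_id, csw, icsw) taken from one mpstat table.
    threshold: assumed cut-off for "mostly preempted"; tune as needed.
    """
    return [cpu for cpu, csw, icsw in rows
            if involuntary_ratio(csw, icsw) >= threshold]


if __name__ == "__main__":
    # Hypothetical (CPU, csw, icsw) values, not measured data.
    rows = [(0, 200, 10), (1, 50, 48), (3, 40, 40)]
    print(flag_preempted_cpus(rows))
```

A ratio near one on a CPU suggests that the threads running there rarely give up the processor voluntarily, which is typical for CPU-bound work.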
Now, consider the following sixteen-way (16) system used for testing. This time four (4) instances of the cc_usr program were started and the output of vmstat 5 and mpstat 5 recorded.
The output below first shows the processor information, then the start of the four (4) copies of the program, and finally the output of vmstat 5.
vmstat – vmstat 5 output on sixteen way system
Rightly, vmstat reports a user time of 25% because one-fourth (¼) of the system is used (remember 4 programs started, 16 available CPUs, i.e. 4/16 or 25%).
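The arithmetic behind this figure can be written down as a small helper. It is a simplification that assumes each instance is single-threaded and fully CPU-bound, as cc_usr is described here. The function name is made up for this example.

```python
def expected_user_pct(busy_threads, ncpus):
    """Expected system-wide user time in percent.

    Assumption: every thread is CPU-bound and can keep exactly one CPU busy,
    so at most ncpus threads can run at the same time.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    running = min(busy_threads, ncpus)
    return 100.0 * running / ncpus


if __name__ == "__main__":
    print(expected_user_pct(4, 16))  # four instances on the 16-way system
    print(expected_user_pct(2, 1))   # two instances on the single-CPU system
```

The second call corresponds to the earlier single-CPU experiment, where the extra instance shows up in kthr:r rather than in the utilisation figure.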
Now let's look at the output of mpstat 5.
mpstat – mpstat 5 sample output on sixteen way system
In the above output (two sets of statistics), one can clearly identify the four running instances of cc_usr on CPUs 1, 3, 5 and 11. All these CPUs are reported with 100% user time.
vmstat – Monitoring paging Activity
The vmstat command can also be used to report on system paging activity with the -p option. Using this form of the command, one can quickly get a clear picture of whether the system is paging because of file I/O (OK) or paging because of physical memory shortage (BAD).
Use the command as follows: vmstat -p <interval in seconds>. The output format includes the following information:
Column | Description
swap | Available swap space in Kbytes.
free | Amount of free memory in Kbytes.
re | Page reclaims - number of page reclaims from the cache list (per second).
mf | Minor faults - number of pages attached to an address space (per second).
fr | Page frees in Kbytes per second.
de | Calculated anticipated short-term memory shortfall in Kbytes.
sr | Scan rate - number of pages scanned by the page scanner per second.
epi | Executable page-ins in Kbytes per second.
epo | Executable page-outs in Kbytes per second.
epf | Executable page-frees in Kbytes per second.
api | Anonymous page-ins in Kbytes per second.
apo | Anonymous page-outs in Kbytes per second.
apf | Anonymous page-frees in Kbytes per second.
fpi | File system page-ins in Kbytes per second.
fpo | File system page-outs in Kbytes per second.
fpf | File system page-frees in Kbytes per second.
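The distinction between paging caused by file I/O and paging caused by memory shortage can be sketched with the columns listed above. The rule used here (a non-zero scan rate or non-zero anonymous page-outs points to memory shortage) is a simplifying assumption for illustration, and the two data lines are invented, not recorded output.

```python
VMSTAT_P_COLUMNS = [
    "swap", "free", "re", "mf", "fr", "de", "sr",
    "epi", "epo", "epf", "api", "apo", "apf", "fpi", "fpo", "fpf",
]


def parse_row(line):
    """Turn one vmstat -p data line into a dict keyed by column name."""
    values = [int(v) for v in line.split()]
    if len(values) != len(VMSTAT_P_COLUMNS):
        raise ValueError("unexpected number of columns")
    return dict(zip(VMSTAT_P_COLUMNS, values))


def classify(row):
    """Very rough classification of one sample (heuristic, see lead-in)."""
    if row["sr"] > 0 or row["apo"] > 0:
        return "memory shortage"
    if row["fpi"] > 0 or row["fpo"] > 0:
        return "file I/O"
    return "quiet"


if __name__ == "__main__":
    # Hypothetical samples: file system reads only, then a scanning system.
    file_io = "1000000 500000 0 3 0 0 0 0 0 0 0 0 0 2048 0 0"
    shortage = "800000 20000 5 10 900 0 350 0 0 0 0 640 640 0 0 260"
    print(classify(parse_row(file_io)))
    print(classify(parse_row(shortage)))
```

Checking sr and apo before fpi and fpo means that a sample showing both file I/O and scanning is reported as memory shortage, the more serious of the two conditions.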
As an example of vmstat -p output, let's try the following command:
find / > /dev/null 2>&1
and then monitor paging activity with: vmstat -p 5
As can be seen from the output, the system is showing paging activity because of file system read I/O (column fpi).
vmstat – sample output reporting on paging activity
zonestat - [OpenSolaris.org] Monitoring Resource Consumption within Zones
Jeff Victor developed an Open Source Perl script to measure utilization within zones. The tool is freely available for download on the OpenSolaris.org project pages.
It may be called with the following syntax:
zonestat [-l] [interval [count]]
The output looks like:
|----Pool-----|------CPU-------|----------------Memory----------------|
|---|--Size---|-----Pset-------|---RAM---|---Shm---|---Lkd---|---VM---|
Zonename| IT| Max| Cur| Cap|Used|Shr|S%| Cap| Use| Cap| Use| Cap| Use| Cap| Use
-------------------------------------------------------------------------------
global 0D 66K 2 0.1 1 25 986M 139K 18E 2M 18E 754M
db01 0D 66K 2 0.1 2 50 1G 122M 536M 536M 0 1G 135M
web02 0D 66K 2 0.42 0.0 1 25 100M 11M 20M 20M 0 268M 8M
zonestat can also be used to monitor zone limits (caps).
Thomas Bastian was a coauthor of an earlier version of this document. The earlier version of this page was published in "The Developers Edge" in 2009.