Solaris Performance: Getting Started
These pages are the Swiss Army knife for everyone who needs to analyze and tune a Solaris system.
The intention is to present freely available tools and commands that help you make a quick assessment of the system.
This performance primer follows the standard process for analyzing a performance problem. You first have to understand the entire stack, from the hardware through the operating system to the application. This is the foundation which will allow you to narrow down your performance problem and fix it.
The nature of all performance problems is basically straightforward: an algorithm (application) with a given load is exhausting the resources underneath.
This primer allows everyone to determine the Solaris and hardware resources which become the bottleneck. You'll then have two options:
- provide more Solaris or hardware resources by reconfiguration, or
- change your application so that it becomes more efficient (example: caching) or teach it to use more resources (example: multithreading).
The Structure of the Primer
Automate Sampling with dimstat
The second chapter of the Solaris performance primer deals with a must-have utility: dimSTAT. dimSTAT is a freely available monitoring tool which will monitor entire data centers while you are gone fishing...
dimSTAT is a tool for general and/or detailed performance analysis and monitoring of Solaris and Linux systems. dimSTAT is a monitoring framework that offers flight-recorder type functionality. A central site can monitor a number of nodes for performance data and store the results for easy displaying and post-processing.
You can download the software and documentation from: http://dimitrik.free.fr/
dimSTAT - Installation
dimSTAT installation is straightforward. Get hold of the latest distribution files and then untar them into a directory of your choice:
cd /tmp
tar -xvf <path_to_dimstat>/dim_STAT-v81-sol86.tar
Then run the INSTALL.sh file with /bin/sh and follow the instructions:
root@soldevx> sh INSTALL.sh
===========================================
** Starting dim_STAT Server INSTALLATION **
===========================================
HOSTNAME: soldevx
IP: ::1
DOMAINE:
Is it correct? (y/n): n
** Hostname [soldevx]: localhost
** IP addres [::1]:
** Domainname []:
** Domainname []:
** Domainname []:
** Domainname []: .
**
** ATTENTION!
**
** On your host You have to assign a USER/GROUP pair as owner
** of all dim_STAT modules (default: dim/dim)
User: dim
Group: dim
Is it correct? (y/n): y
**
** WARNING!!!
**
** User dim (group dim) is not created on your host...
** You may do it now by yourself or let me do it during
** installation...
**
May I create this USER/GROUP on your host? (y/n): y
======================================
** dim_STAT Directory Configuration **
======================================
** WebX root directory (5MB):
=> /WebX
=> /opt/WebX
=> /etc/WebX
[/opt/WebX]:
** HOME directory for dim_STAT staff [/apps]: /export/home/dimstat
** TEMP directory : /opt/WebX
=> HOME directory : /export/home/dimstat
=> TEMP directory : /tmp
=> HTTP Server Port : 80
=> DataBase Server Port : 3306
=> Default STAT-service Port : 5000
Is it correct? (y/n): y
** WARNING!!!
** ALL DATA WILL BE DELETED IN: /export/home/dimstat/* !!!
** AS WELL /WebX, /etc/WebX, /opt/WebX !!!
Is it correct? (y/n): y
** Cleanup /export/home/dimstat
** Add User...
** WebX Setup...
** dim_STAT Server extract...
** HTTP Server Setup...
** Database Server Setup...
** ADMIN/Tools Setup...
** TEMP directory...
** Permissions...
** Crontab Setup...
Sun Microsystems Inc. SunOS 5.11 snv_79a January 2008
Warning - Invalid account: 'dim' not allowed to execute cronjobs
**
** INSTALLATION is finished!!!
** May I create now a dim_STAT-Server start/stop script in /etc/rc*.d? (y/n): y
** ===================================================================
**
** You can start dim_STAT-Server now from /export/home/dimstat/ADMIN:
**
** # cd /export/home/dimstat/ADMIN
** # ./dim_STAT-Server start
**
** and access homepage via Web browser - http://localhost:80
**
** To collect stats from any Solaris-SPARC/x86 or Linux-x86 machines
** just install & start on them [STAT-service] package...
**
** Enjoy! ;-)
**
** -Dimitri
** ===================================================================
root@soldevx>
After installation, please proceed with installation of the STAT service on the nodes of your choice:
cd dimSTAT
pkgadd -d dimSTAT-Solx86.pkg
Note: the dimSTAT STAT service needs to be installed on all nodes that you want to monitor and record for performance data.
dimSTAT – Configuration
The following steps will guide you through a simple dimSTAT configuration.
First, open a browser and navigate to http://localhost. You should see a similar screen:
Then click on the dim_STAT Main Page link (Welcome!). A screen similar to the following should now appear:
Let's start a new collect. Click on the “Start New Collect” link:
Enter the information to start a new collect on the host named localhost. Click the “Continue” button to move to the next screen:
Select the monitoring options of your choice (for example: vmstat, mpstat, iostat, netstat, etc...). Finally click the “Start STAT(s) Collect Now!!!” button to start monitoring.
The simple dimSTAT configuration is now complete. dimSTAT will record the selected data into a local database.
dimSTAT – Analysis
The following steps will guide you through a simple dimSTAT analysis session.
First, open a browser and navigate to http://localhost/. You should see a similar screen:
Next click on the “Welcome!” link to proceed to the next screen:
Click on the “Analyze” button to start analyzing recorded data. You should see a similar screen:
Select “Single-Host Analyze” and click on the “Analyze” button to proceed to the next screen:
Select the second line (with ID = 2) and click on the “VM stat” button to proceed to the next screen:
You should see a similar screen (top part).
Scroll to the bottom of the screen and select the tick boxes “CPU Usr%”, “CPU Sys%”, “CPU Idle%”. Then click on “Start” button to display the results.
Et voilà! dimSTAT displays a nice little graph that shows the percentage of usr, sys and idle time for the selected system.
Thomas Bastian was a coauthor of an earlier version of this document. The earlier version of this page was published in "The Developers Edge" in 2009.
Explore your Solaris System
Explore your System: Solaris System Inventory
A Solaris system may have a single CPU or hundreds of them, and a single disk or entire disk farms. Anyone who deals with performance has to know the quantitative aspects of the system. The commands listed here will answer these questions.
It's pivotal that you get an understanding of the components of the system you want to tune. Knowing the hardware components and the installed software allows you to understand the quantitative limits of the system.
Solaris offers a wealth of commands to identify the characteristics of the running system. The following chapter discusses commands that help administrators and software developers understand and document accurately the hardware and software specifications.
This document reflects the state of the art of spring 2010.
- SunOS 5.10 known as Solaris 10
- SunOS 5.11 (build snv_111b) known through the distribution OpenSolaris 2009/06
Both operating system versions are very similar. Commands which don't work in both versions are tagged as such.
uname - Printing Information about the Current System
The uname utility prints information about the current system on the standard output, including the operating system software revision level, the processor architecture and platform attributes.
The table below lists selected options to uname:
Option | Comments
-a | Prints basic information currently available from the system.
-s | Prints the name of the operating system.
-r | Prints the operating system release level.
-i | Prints the name of the platform.
-p | Prints the processor type or ISA [Instruction Set Architecture].
uname – selected options
uname – sample output
/etc/release – Detailed Information about the Operating System
The file /etc/release contains detailed information about the operating system. The content provided allows engineering or support staff to unambiguously identify the Solaris release running on the current system.
/etc/release file – sample output
showrev - Show Machine, Software and Patch Revision (Solaris 10 and older)
The showrev command shows machine, software revision and patch revision information. With no arguments, showrev shows the system revision information including hostname, hostid, release, kernel architecture, application architecture, hardware provider, domain and kernel version.
showrev – machine and software revision
To list patches installed on the current system, use the showrev command with the -p argument.
showrev -p – patch information
This command no longer exists in newer SunOS 5.11 builds. The new packaging system (IPS) comes with a completely new set of commands.
pkg - IPS Packages (SunOS 5.11 only!)
List the installed packages with the command pkg list. With the -a option (pkg list -a), all packages are listed, whether installed or not.
pkg list command
isainfo - describe instruction set architectures
The isainfo command describes instruction set architectures. The isainfo utility is used to identify various attributes of the instruction set architectures supported on the currently running system. It can answer whether 64-bit applications are supported, or if the running kernel uses 32-bit or 64-bit device drivers.
The table below lists selected options to isainfo:
Option | Comments
<none> | Prints the names of the native instruction sets for portable applications.
-n | Prints the name of the native instruction set used by portable applications.
-k | Prints the name of the instruction set(s) used by the operating system kernel components such as device drivers and STREAMS modules.
-b | Prints the number of bits in the address space of the native instruction set.
isainfo – selected options
isainfo – describe instruction set architectures
isalist - Display native Instruction Sets Executable on this Platform
The isalist command displays the native instruction sets executable on this platform. The names are space-separated and are ordered by performance, best first. Earlier-named instruction sets may contain more instructions than later-named instruction sets. A program that is compiled for an earlier-named instruction set will most likely run faster on this machine than the same program compiled for a later-named instruction set.
isalist – display native instruction sets
psrinfo - Display Information about Processors
The psrinfo command displays information about processors. Each physical processor may support multiple virtual processors. Each virtual processor is an entity with its own interrupt ID, capable of executing independent threads.
The table below lists selected options to psrinfo:
Option | Comments
<none> | Prints one line for each configured processor, displaying whether it is online, non-interruptible (designated by no-intr), spare, off-line, faulted or powered off, and when that status last changed.
-p | Prints the number of physical processors in a system.
-v | Verbose mode. Prints additional information about the specified processors, including: processor type, floating point unit type and clock speed. If any of this information cannot be determined, psrinfo displays unknown.
psrinfo – selected options
psrinfo – display processor information
prtdiag - Display System Diagnostic Information
The prtdiag command displays system diagnostic information. On Solaris 10 for x86/x64 systems, the command is only available with Solaris 10 01/06 or higher.
prtdiag – print system diagnostic information
prtconf - Print System Configuration
The prtconf command prints system configuration information. The output includes the total amount of memory, and the configuration of system peripherals formatted as a device tree.
prtconf – print system configuration
cpuinfo [Tools CD] – Display CPU Configuration
The cpuinfo utility prints detailed information about the CPU type and characteristics (number, type, clock and strands) of the running system.
cpuinfo – sample output
meminfo [Tools CD] - Display physical Memory, Swap Devices, Files
meminfo is a tool that displays the configuration of physical memory and of swap devices or files.
meminfo – sample output
Understanding System Utilization
The following chapter introduces and demonstrates tools that help in understanding overall system utilisation.
Tools discussed in this posting
uptime – Print Load Average
The easiest way to get an overview of
- how long a system has been running,
- the current CPU load averages and
- how many users are active
is the command uptime.
uptime – print CPU load averages
The numbers printed to the right of “load average:” are the 1-, 5- and 15-minute load averages of the system. The load average is a measure of the number of runnable and running threads. The number therefore has to be put in relation to the number of active CPUs in the system. For example, a load average of three (3) on a single CPU system would indicate some CPU overloading, while the same load average on a thirty-two (32) way system would indicate an unloaded system.
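The relation between load average and CPU count can be made explicit with a small helper. This is only a sketch: the function name load_per_cpu is hypothetical, and the two arguments are assumed to be taken by hand from the uptime output and from the processor count (for example the number of lines printed by psrinfo).

```shell
# Relate a load average to the number of CPUs.
# load_per_cpu is a hypothetical helper, not part of Solaris.
# Usage: load_per_cpu <load_average> <number_of_cpus>
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu <= 0) { print "invalid CPU count"; exit 1 }
        # A value well above 1 suggests more runnable threads than CPUs.
        printf "%.2f\n", load / ncpu
    }'
}
```

With the numbers from the example above, load_per_cpu 3 1 prints 3.00 (overloaded single CPU), while load_per_cpu 3 32 prints 0.09 (largely idle 32-way system).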
perfbar [Tools CD] - A lightweight CPU Meter
perfbar is a tool that displays a single bar graph that color codes system activity. The colors are as follows:
- Blue = system idle.
- Red = system time.
- Green = user time.
- Yellow = I/O activity (obsolete on Solaris 10 and later).
perfbar – sample output of a system with 16 CPU cores
perfbar has been enhanced in version 1.2 to provide better support for servers with many CPUs through a multi-line visualisation. See below:
perfbar: Visualisation of a Sun T5240 System with 128 strands (execution units) without any load
perfbar can be called without specifying any command line arguments. perfbar provides a large number of options which can be viewed with the -h option:
$ perfbar -h
perfbar 1.2
maintained by Ralph Bogendoerfer
based on the original perfbar by: Joe Eykholt, George Cameron, Jeff Bonwick, Bob Larson
Usage: perfbar [X-options] [tool-options]
supported X-options:
-display <display> or -disp <display>
-geometry <geometry> or -geo <geometry>
-background <background> or -bg <background>
-foreground <foreground> or -fg <foreground>
-font <font> or -fn <font>
-title <title> or -t <title>
-iconic or -icon
-decoration or -deco
supported tool-options:
-h, -H, -? or -help: this help
-v or -V: verbose
-r or -rows: number of rows to display, default 1
-bw or -barwidth: width of CPU bar, default 12
-bh or -barheight: height of CPU bar, default 180
-i or -idle: idle color, default blue
-u or -user: user color, default green
-s or -system: system color, default red
-w or -wait: wait color, default yellow
-int or -interval: interval for display updates (in ms),default 100
-si or -statsint: interval for stats updates (in display intervals), default 1
-avg or -smooth: number of values for average calculation, default 8
There are also a number of key strokes understood by the tool:
- Q or q: Quit
- R or r: Resize - this changes the window to the default size according to the number of CPU bars, rows and the chosen bar width and height.
- Number keys 1 - 9: Display this number of rows.
- + and -: Increase or decrease number of rows displayed.
The tool is currently available as a beta in version 1.2. This latest version is not yet part of the Performance Tools CD 3.0. The engineers from the Sun Solution Center in Langen/Germany made it available for free through:
cpubar [Tools CD] - A CPU Meter, showing Swap, and Run Queue
cpubar displays one bar-graph for each processor with the processor speed(s) displayed on top. Each bar-graph is divided in four areas (top to bottom):
- Blue - CPU is available.
- Yellow - CPU is waiting for one or more I/O to complete (N/A on Solaris 10 and later).
- Red - CPU is running in kernel space.
- Green - CPU is running in user space.
As with netbar and iobar, a red and a dashed black & white marker shows the maximum and average used ratios respectively.
The bar-graphs labeled 'r', 'b' and 'w' display the run, blocked and wait queues. A non-empty wait queue is usually a symptom of a previous persistent RAM shortage. The total number of processes is displayed on top of these three bars.
The bar-graph labeled 'p/s' displays the process creation rate per second.
The bar-graph labeled 'RAM' displays the RAM usage (red=kernel, yellow=user, blue=free); the total RAM is displayed on top.
The bar-graph labeled 'sr' displays the scan rate on a logarithmic scale (a high level of scans is usually a symptom of RAM shortage).
The bar-graph labeled 'SWAP' displays the SWAP (a.k.a. Virtual Memory) usage (red=used, yellow=reserved, blue=free); the total SWAP space is displayed on top.
cpubar – sample output
vmstat – System Glimpse
The vmstat tool provides a glimpse of the current system behavior in a one line summary including both CPU utilisation and saturation.
In its simplest form, the command vmstat <interval> (e.g. vmstat 5) will report one line of statistics every <interval> seconds. The first line can be ignored, as it is the summary since boot; all other lines report statistics of samples taken every <interval> seconds. The underlying statistics collection mechanism is based on kstat (see kstat(1)).
Let's run two copies of a CPU intensive application (cc_usr) and look at the output of vmstat 5. First start two (2) instances of the cc_usr program.
two (2) instances of cc_usr started
Now let's run vmstat 5 and watch its output.
vmstat – vmstat 5 report
First observe the cpu:id column which represents the system idle time (here 0%). Then look at the kthr:r column which represents the total number of runnable threads on dispatcher queues (here 1).
From this simple experiment, one can conclude that the system idle time for the five-second samples was always 0, indicating 100% utilisation. On the other hand, kthr:r was mostly one and sustained, indicating modest saturation for this single-CPU system (remember, we launched two (2) CPU-intensive applications).
A couple of notes with regard to CPU utilisation:
- 100% utilisation may be fine for your system. Think about a high-performance computing job: the aim will be to maximise utilisation of the CPU.
- Values of kthr:r greater than zero indicate some CPU saturation (i.e. more jobs would like to run but cannot because no CPU was available). However, performance degradation should be gradual.
- Sampling interval is important. Don't choose too small or too large intervals.
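The two figures used in the experiment above, utilisation (100 minus cpu:id) and saturation (kthr:r), can be pulled out of the vmstat output with a short filter. This is a minimal sketch: the function name vmstat_cpu_summary is hypothetical, and it assumes the layout described above, where r is the first column and id the last one.

```shell
# Summarise vmstat samples: utilisation (100 - id) and run queue (kthr:r).
# vmstat_cpu_summary is a hypothetical helper; it reads the output of
# "vmstat <interval>" on standard input.
vmstat_cpu_summary() {
    awk '
        $1 ~ /^[0-9]+$/ {          # data lines start with a number, headers do not
            n++
            if (n == 1) next       # first data line is the summary since boot
            printf "util=%d%% runq=%d\n", 100 - $NF, $1
        }'
}
```

A possible call is vmstat 5 | vmstat_cpu_summary, which prints one util/runq pair per sample.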
vmstat reports some additional information that can be interesting such as:
Column | Comments
in | Number of interrupts per second.
sys | Number of system calls per second.
cs | Number of context switches per second (both voluntary and involuntary).
us | Percent user time: time the CPUs spent processing user-mode threads.
sy | Percent system time: time the CPUs spent processing system calls on behalf of user-mode threads, plus the time spent processing kernel threads.
id | Percent of time the CPUs are waiting for runnable threads.
mpstat - Report per-Processor or per-Processor Set Statistics
The mpstat command reports processor statistics in tabular form. Each row of the table represents the activity of one processor. The first table summarizes all activity since boot. Each subsequent table summarizes activity for the preceding interval. The output table includes:
Column | Comments
CPU | Prints processor ID.
minf | Minor faults (per second).
mjf | Major faults (per second).
xcal | Inter-processor cross-calls (per second).
intr | Interrupts (per second).
ithr | Interrupts as threads (not counting clock interrupt) (per second).
csw | Context switches (per second).
icsw | Involuntary context switches (per second).
migr | Thread migrations (to another processor) (per second).
smtx | Spins on mutexes (lock not acquired on first try) (per second).
srw | Spins on readers/writer locks (lock not acquired on first try) (per second).
syscl | System calls (per second).
usr | Percent user time.
sys | Percent system time.
wt | Always 0.
idl | Percent idle time.
The reported statistics can be broken down into the following categories:
- Processor utilisation: see columns usr, sys and idl for a measure of CPU utilisation on each CPU.
- System call activity: see the syscl column for the number of system calls per second on each CPU.
- Scheduler activity: see columns csw and icsw. The closer the ratio icsw/csw comes to one (1), the more often threads are preempted, either by higher priority threads or because their time quantum expired. The column migr displays the number of times the OS scheduler moved ready-to-run threads to an idle processor. If possible, the OS tries to keep a thread on the processor on which it last ran. If that processor is busy, the thread migrates.
- Locking activity: column smtx indicates the number of mutex contention events in the kernel. Column srw indicates the number of reader-writer lock contention events in the kernel.
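The icsw/csw ratio mentioned under scheduler activity can be computed per CPU from the mpstat output. The following is a sketch under assumptions: the function name mpstat_preempt_ratio is hypothetical, and the columns are located through the header line (CPU ... csw icsw ...) as listed in the table above.

```shell
# Print the ratio of involuntary to total context switches per CPU.
# mpstat_preempt_ratio is a hypothetical helper; it reads the output of
# "mpstat <interval>" on standard input.
mpstat_preempt_ratio() {
    awk '
        $1 == "CPU" {                         # header line: remember column positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("csw" in col) && $(col["csw"]) > 0 { # skip CPUs without context switches
            printf "cpu=%s icsw/csw=%.2f\n", $1, $(col["icsw"]) / $(col["csw"])
        }'
}
```

A possible call is mpstat 5 | mpstat_preempt_ratio. Note that the first table printed by mpstat summarizes activity since boot, so the first set of ratios describes the whole uptime rather than the last interval.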
Now, consider the following sixteen-way (16) system used for testing. This time four (4) instances of the cc_usr program were started and the output of vmstat 5 and mpstat 5 recorded.
Below, observe the output of processor information. Then the starting of the four (4) copies of the program and last the output of vmstat 5.
vmstat – vmstat 5 output on sixteen way system
Rightly, vmstat reports a user time of 25% because one-fourth (¼) of the system is used (remember 4 programs started, 16 available CPUs, i.e. 4/16 or 25%).
Now let's look at the output of mpstat 5.
mpstat – mpstat 5 sample output on sixteen way system
In the above output (two sets of statistics), one can clearly identify the four running instances of cc_usr on CPUs 1, 3, 5 and 11. All these CPUs are reported with 100% user time.
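Spotting the fully loaded CPUs by eye becomes tedious on large systems. A filter along the following lines could list them instead. This is a sketch: the function name mpstat_busy_cpus and the default threshold of 90% user time are assumptions chosen for illustration.

```shell
# List the IDs of CPUs whose user time reaches a threshold (default 90%).
# mpstat_busy_cpus is a hypothetical helper; it reads the output of
# "mpstat <interval>" on standard input.
mpstat_busy_cpus() {
    awk -v min="${1:-90}" '
        $1 == "CPU" {                         # header line: remember column positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("usr" in col) && $(col["usr"]) + 0 >= min + 0 { print $1 }'
}
```

A possible call is mpstat 5 | mpstat_busy_cpus 95. Each set of statistics yields its own list of CPU IDs, so the same ID appears once per sample for as long as the CPU stays busy.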
vmstat – Monitoring paging Activity
The vmstat command can also be used to report on system paging activity with the -p option. Using this form of the command, one can quickly get a clear picture of whether the system is paging because of file I/O (OK) or paging because of physical memory shortage (BAD).
Use the command as follows: vmstat -p <interval in seconds>. The output format includes the following information:
Column | Description
swap | Available swap space in Kbytes.
free | Amount of free memory in Kbytes.
re | Page reclaims - number of page reclaims from the cache list (per second).
mf | Minor faults - number of pages attached to an address space (per second).
fr | Page frees in Kbytes per second.
de | Calculated anticipated short-term memory shortfall in Kbytes.
sr | Scan rate - number of pages scanned by the page scanner per second.
epi | Executable page-ins in Kbytes per second.
epo | Executable page-outs in Kbytes per second.
epf | Executable page-frees in Kbytes per second.
api | Anonymous page-ins in Kbytes per second.
apo | Anonymous page-outs in Kbytes per second.
apf | Anonymous page-frees in Kbytes per second.
fpi | File system page-ins in Kbytes per second.
fpo | File system page-outs in Kbytes per second.
fpf | File system page-frees in Kbytes per second.
As an example of vmstat -p output, let's run the following command:
find / > /dev/null 2>&1
and then monitor paging activity with: vmstat -p 5
As can be seen from the output, the system is showing paging activity because of file system read I/O (column fpi).
vmstat – sample output reporting on paging activity
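The distinction between file I/O paging and memory shortage can be sketched as a filter over the vmstat -p output. The function name vmstat_p_classify is hypothetical, and the rule used here is an assumption made for illustration: any non-zero scan rate (sr) is reported as memory pressure, otherwise non-zero file system page-ins (fpi) are reported as file I/O paging. Real systems may call for thresholds rather than a simple zero test.

```shell
# Classify each vmstat -p sample as memory pressure, file I/O paging or quiet.
# vmstat_p_classify is a hypothetical helper; it reads the output of
# "vmstat -p <interval>" on standard input.
vmstat_p_classify() {
    awk '
        $1 == "swap" {                        # column header: remember positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        !("sr" in col) || $1 !~ /^[0-9]+$/ { next }   # skip other header lines
        ++n == 1 { next }                     # first data line is the summary since boot
        $(col["sr"]) > 0  { print "memory pressure (sr=" $(col["sr"]) ")"; next }
        $(col["fpi"]) > 0 { print "file I/O paging (fpi=" $(col["fpi"]) ")"; next }
        { print "quiet" }'
}
```

A possible call is vmstat -p 5 | vmstat_p_classify while the find command from the example is running.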
zonestat - [OpenSolaris.org] Monitoring Resource Consumption within Zones
Jeff Victor developed an Open Source Perl script to measure utilization within zones. The tool is freely available for download on the OpenSolaris.org project pages.
It may be called with the following syntax:
zonestat [-l] [interval [count]]
The output looks like:
|----Pool-----|------CPU-------|----------------Memory----------------|
|---|--Size---|-----Pset-------|---RAM---|---Shm---|---Lkd---|---VM---|
Zonename| IT| Max| Cur| Cap|Used|Shr|S%| Cap| Use| Cap| Use| Cap| Use| Cap| Use
-------------------------------------------------------------------------------
global 0D 66K 2 0.1 1 25 986M 139K 18E 2M 18E 754M
db01 0D 66K 2 0.1 2 50 1G 122M 536M 536M 0 1G 135M
web02 0D 66K 2 0.42 0.0 1 25 100M 11M 20M 20M 0 268M 8M
zonestat can also be used to monitor zone limits (caps).
Process Introspection
The next step in a performance analysis is to figure out what the application is doing. Configuring an application is one thing. Checking whether the application actually picked up all of its configuration information is another. The tools below tell you what your application is doing.
Solaris provides a large collection of tools to list and control processes. For an overview and detailed description, please refer to the manual pages of proc(1). The following chapter introduces the most commonly used commands.
pgrep – Find Processes by Name and other Attributes
The pgrep command finds processes by name and other attributes. For that, the pgrep utility examines the active processes on the system and reports the process IDs of the processes whose attributes match the criteria specified on the command line. Each process ID is printed as a decimal value and is separated from the next ID by a delimiter string, which defaults to a newline.
pgrep – find processes by name and other attributes
pkill – Signal Processes by Name and other Attributes
The pkill command signals processes by name and other attributes. pkill functions identically to pgrep, except that each matching process is signaled as if by kill(1) instead of having its process ID printed. A signal name or number may be specified as the first command line option to pkill.
pkill – signal processes by name and other attributes
ptree - Print Process Trees
The ptree command prints the parent-child relationships of processes. It prints the process trees containing the specified pids or users, with child processes indented from their respective parent processes. An argument of all digits is taken to be a process ID; otherwise it is assumed to be a user login name. The default is all processes.
ptree – no options
sta [Tools CD] – Print Process Trees
The sta tool provides similar output to ptree. See example run below.
sta – sample output
pargs - Print Process Arguments, Environment, or auxiliary Vector
The pargs utility examines a target process or process core file and prints arguments, environment variables and values, or the process auxiliary vector.
pargs – sample output
pfiles – Report on open Files in Process
The pfiles command reports fstat(2) and fcntl(2) information for all open files in each process. In addition, a path to the file is reported if the information is available from /proc/pid/path. This is not necessarily the same name used to open the file. See proc(4) for more information.
pfiles – sample output
pstack – Print lwp/process Stack Trace
The pstack command prints a hex+symbolic stack trace for each process or specified lwps in each process.
Note: use jstack for Java processes.
pstack – sample output
jstack – Print Java Thread Stack Trace [see $JAVA_HOME/bin]
The jstack command prints Java stack traces of Java threads for a given Java process or core file or a remote debug server. For each Java frame, the full class name, method name, 'bci' (byte code index) and line number, if available, are printed.
jstack – sample output
pwdx – Print Process current Working Directory
The pwdx utility prints the current working directory of each process.
pwdx – sample output
pldd – Print Process dynamic Libraries
The pldd command lists the dynamic libraries linked into each process, including shared objects explicitly attached using dlopen(3C). See also ldd(1).
pldd – sample output
pmap - Display Information about the Address Space of a Process
The pmap utility prints information about the address space of a process. By default, pmap displays all of the mappings in the virtual address order they are mapped into the process. The mapping size, flags and mapped object name are shown.
pmap – default output
An extended output is available by adding the -x option (additional information about each mapping) and the -s option (additional HAT size information).
pmap – extended output
showmem [Tools CD] – Process private and shared Memory usage
The showmem utility wraps around pmap and ps to determine how much private and shared memory a process is using.
showmem – sample output
plimit - Get or set the Resource Limits of running Processes
In the first form, the plimit utility prints the resource limits of running processes.
plimit – displaying process resource limits
In the second form, the plimit utility sets the soft (current) limit and/or the hard (maximum) limit of the indicated resource(s) in the processes identified by the process-ID list, pid. As an example, let's limit the current (soft) core file size of the trashapplet process with PID 897 to five (5) MB, using the command: plimit -c 5m,unlimited 897.
plimit – setting the current (soft) core file limit to 5 MB
Process Monitoring with prstat
The following chapter takes a deeper look at the Solaris tool prstat(1), the all-round utility that helps to understand system utilisation.
prstat – The All-round Utility
One of the most important and widely used utilities found in Solaris is prstat (see prstat(1)). prstat gives fast answers to the following questions:
- How much is my system utilized in terms of CPU and memory?
- Which processes (or users, zones, projects, tasks) are utilizing my system?
- How are processes/threads using my system (user bound, I/O bound)?
In its simplest form, the command prstat <interval> (e.g. prstat 2) will examine all processes and report statistics sorted by CPU usage.
prstat – prstat 2 command reporting on all processes and sorting by CPU usage
As can be seen from the screen capture, processes are ordered from top (highest) to bottom (lowest) according to their current CPU usage (in %; 100% means all system CPUs are fully utilized). For each process in the list, the following information is printed:
- PID: the process ID of the process.
- USERNAME: the real user (login) name or real user ID.
- SIZE: the total virtual memory size of the process, including all mapped files and devices, in kilobytes (K), megabytes (M), or gigabytes (G).
- RSS: the resident set size of the process (RSS), in kilobytes (K), megabytes (M), or gigabytes (G).
- STATE: the state of the process (cpuN/sleep/wait/run/zombie/stop).
- PRI: the priority of the process. Larger numbers mean higher priority.
- NICE: nice value used in priority computation. Only processes in certain scheduling classes have a nice value.
- TIME: the cumulative execution time for the process.
- CPU: the percentage of recent CPU time used by the process. If executing in a non-global zone and the pools facility is active, the percentage will be that of the processors in the processor set in use by the pool to which the zone is bound.
- PROCESS: the name of the process (name of executed file).
- NLWP: the number of lwps in the process.
The <interval> argument given to prstat is the sampling/refresh interval in seconds.
Special Report – Sorting
The prstat output can be sorted by a criterion other than CPU usage. Use the option -s (descending) or -S (ascending) with the criterion of choice (e.g. prstat -s time 2):
Criteria | Comments
cpu | Sort by process CPU usage. This is the default.
pri | Sort by process priority.
rss | Sort by resident set size.
size | Sort by size of process image.
time | Sort by process execution time.
Special report – Continuous Mode
With the option -c to prstat, new reports are printed below previous ones instead of overprinting them. This is especially useful when gathering information to a file (e.g. prstat -c 2 > prstat.txt). The option -n <number of output lines> can be used to set the maximum length of a report.
prstat – continuous report sorted by ascending order of CPU usage
Special Report – by users
With the option -a, prstat prints a report about users in addition to the process report. With the option -t, it prints a total usage summary for each user.
prstat – prstat -a 2 reports by user
Special Report – by Zones
With the option -Z to prstat, additional reports about zones are printed.
Special Report – by Projects (see projects(1))
With the option -J to prstat, additional reports about projects are printed.
prstat – prstat -J 2 reports about projects
Special Report – by Tasks (see newtask(1))
With the option -T to prstat, additional reports about tasks are printed.
prstat – prstat -T 2 reports by tasks
Special Report – Microstate Accounting
Unlike other operating systems that gather CPU statistics every clock tick or every fixed time interval (typically every hundredth of a second), Solaris 10 incorporates a technology called microstate accounting that uses high-resolution timestamps to measure CPU statistics for every event, thus producing extremely accurate statistics.
The microstate accounting system maintains accurate time counters for threads as well as CPUs. Thread-based microstate accounting tracks several meaningful states per thread in addition to user and system time, which include trap time, lock time, sleep time and latency time. prstat reports the per-process (option -m) or per-thread (option -mL) microstates.
prstat – prstat -m 2 reports on process microstates
The screen output shown above displays microstates for the running system. Looking at the top line with PID 693, one can see that the process Xorg spends 1.8% of its time in user mode and sleeps for the rest of the time (98%).
prstat – prstat -mL 2:
The screen output shown above displays per-thread microstates for the running system. Looking at the line with PID 1311 (middle of the display), one can see the microstates for LWP #9 and LWP #8 of the process firefox-bin.
prstat Usage Scenario – CPU Latency
One important measure for CPU saturation is the latency (LAT column) output of prstat. Once again, let's start two (2) copies of our CPU intensive application.
prstat – observing latency with CPU intensive application
Now let's run prstat with microstate accounting reporting, i.e. prstat -m 2 and record the output:
prstat – prstat -m 2 output
Please observe the top two (2) lines of the output with PID 2223 and PID 2224. One can clearly see that both processes exhibit a high percentage of their time (50% and 52% respectively) in LAT microstate (CPU latency). The remaining time is spent in computation as expected (USR microstate). Clearly in this example, both CPU bound applications are fighting for the one CPU of the test system, resulting in high waiting times (latency) to gain access to a CPU.
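The check described above can be sketched as a small filter. This is a minimal sketch, not part of the original primer: the function name high_lat is hypothetical, and it assumes the usual prstat -m column order (PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP), so that LAT is the tenth field.

```shell
#!/bin/sh
# high_lat THRESHOLD  (hypothetical helper)
# Reads `prstat -mc` style text on stdin and prints "PID LAT PROCESS/NLWP"
# for every row whose LAT value is at or above THRESHOLD (percent).
# Assumed columns:
#   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
high_lat() {
    awk -v t="$1" '
        # Data rows start with a numeric PID; header and "Total:" lines do not.
        $1 ~ /^[0-9]+$/ && NF >= 15 && ($10 + 0) >= (t + 0) {
            print $1, $10, $15
        }'
}

# Typical use on a live system (not executed here):
#   prstat -mc 2 5 | high_lat 40
```

With a threshold of 40, the two CPU bound processes from the example (50% and 52% LAT) would be listed, while a mostly sleeping process would not.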
prstat Usage Scenario – High System Time
Let's run a system call intensive application and watch the output of prstat. First, start one instance of cc_sys:
prstat – system call intensive application
Then watch the prstat -m 2 output:
prstat – prstat -m 2 output for system call intensive application
Note the top line of the above output with PID 2310. One clearly identifies a high system time usage (61%) for the process cc_sys. Also notice the high ratio of ICX/VCX (277/22), which shows that the process is frequently involuntarily context switched off the CPU.
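To put a number on the ICX/VCX ratio, a minimal sketch of the arithmetic follows. The helper name cs_ratio is hypothetical; the inputs are the ICX and VCX values quoted above for PID 2310.

```shell
#!/bin/sh
# cs_ratio ICX VCX  (hypothetical helper)
# Prints the ratio of involuntary to voluntary context switches,
# rounded to one decimal place, or "n/a" when VCX is zero.
cs_ratio() {
    awk -v icx="$1" -v vcx="$2" 'BEGIN {
        if (vcx == 0) { print "n/a"; exit }
        printf "%.1f\n", icx / vcx
    }'
}

cs_ratio 277 22   # prints 12.6
```

In other words, the process is pushed off the CPU about twelve to thirteen times for every time it gives up the CPU voluntarily.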
prstat Usage Scenario – Excessive Locking
Frequently, poor scaling is observed for applications on multi-processor systems. One possible root cause is badly designed locking inside the application, resulting in a large amount of time spent waiting for synchronisation. The prstat column LCK reports the percentage of time spent waiting on user locks.
Let's look at an example with a sample program that implements a locking mechanism for a critical section using reader/writer locks. The program has four (4) threads looking for access to the shared critical region as readers while one thread accesses the critical section in writer mode. To exhibit the problem, the writer has been slowed down on purpose, so that it spends some time holding the critical section (effectively barring access for the readers).
First start the program such that the writer spends zero (0) microseconds in the critical region (ideal case).
cc_lck 0 – running in ideal conditions
Now let's observe the per-thread microstates. Use prstat -mL -p 2626 2 for this.
cc_lck 0 – prstat output
One can observe that all five (5) threads are fighting almost equally for computing resources. Since neither the readers nor the writer hold the critical section for long, there is no wait time registered.
Now let's restart the whole test with a writer wait time of ten (10) microseconds.
Again, let's observe the microstates. Use prstat -mL -p 2656 2 for this.
cc_lck 10 – prstat output
Now the picture looks different. The four (4) reader threads are spending 84% of their time waiting on the lock for the critical region. The writer (LWP #1) on the other hand, is spending most of its time sleeping (82%).
While in the case of this sample application, the locking problems are obvious when looking at the source code, prstat microstate accounting capabilities can help pin-point locking weaknesses in larger applications.
Thomas Bastian was a coauthor of an earlier version of this document. The earlier version of this page has been published in the "The Developers Edge" in 2009.
RE:LCK
I have a process with high LCK causing kernel contention in Solaris 10 (high smtx in mpstat, high sys/user ratio, high mutex_adenters in kstat -m cpu). I can only say that in this scenario, sys time is increased because the process cannot get a mutex and it either do active wait (penalty for cpu time on user account) or otherwise wait for interruption (then probably the penalty goes in context switching and kernel processing).
jose "at" irisel.com
Understanding IO
iobar [Tools CD] - Display I/O for Disk Devices Graphically
iobar displays two bar-graphs for each disk device. The read and write data rates are displayed in green in the left and right areas. The disk utilization is shown in the middle (red). At the bottom of the bars, input and output rates are displayed numerically; the value can be selected between last (green), average (white) and maximum (red) with the middle mouse button. The display mode can be toggled between logarithmic and linear with the left mouse button. In linear mode, scaling is automatic. All values are in bytes per second.
iobar – sample output
iotop [Tools CD] – Display iostat -x in a top-like Fashion
iotop is a binary that collects I/O statistics for disks, tapes, NFS-mounts, partitions (slices), SVM meta-devices and disk-paths. The display of those statistics can be filtered by device class or using regular expressions. Also the sorting order can be modified.
iotop – sample output
iostat – I/O Wizard
If you want to understand I/O behaviour on a running system, your first stop will be the command iostat. iostat gives fast answers to questions such as:
- How much I/O is there in terms of input/output operations per second (IOPS) and throughput (MB/second)?
- How busy are my I/O subsystems (latency and utilisation)?
In its simplest form, the command iostat -x <interval> (e.g. iostat -x 2) will examine all I/O channels and report statistics. See iostat -xc 2:
As can be seen from the screen capture, the iostat -x <interval> command will report device statistics every <interval> seconds. Every device is reported on a separate line, which includes the following information:
- device: device name.
- r/s: device reads per second, i.e. read IOPS.
- w/s: device writes per second, i.e. write IOPS.
- kr/s: kilobytes read per second.
- kw/s: kilobytes written per second.
- wait: average number of transactions waiting for service (queue length).
- actv: average number of transactions actively being serviced (removed from the queue but not yet completed). This is the number of I/O operations accepted, but not yet serviced, by the device.
- svc_t: average response time of transactions, in milliseconds. The svc_t output reports the overall response time, rather than the service time of a device. The overall time includes the time that transactions are in queue and the time that transactions are being serviced.
- %w: percent of time there are transactions waiting for service (queue non-empty).
- %b: percent of time the disk is busy (transactions in progress).
By adding the option -M to iostat, the report outputs megabytes instead of kilobytes.
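The columns listed above can be combined into a quick filter for busy devices. This is a minimal sketch under stated assumptions: the function name busy_disks is hypothetical, the input is iostat -x output in kilobytes (without -M), and the ten columns appear in the order given above. The average I/O size it prints is simply (kr/s + kw/s) divided by (r/s + w/s).

```shell
#!/bin/sh
# busy_disks THRESHOLD  (hypothetical helper)
# Reads `iostat -x` text on stdin and reports devices whose %b is at or
# above THRESHOLD, together with svc_t and the average I/O size in KB.
# Assumed columns: device r/s w/s kr/s kw/s wait actv svc_t %w %b
busy_disks() {
    awk -v t="$1" '
        # Data rows have ten fields and a numeric second field;
        # the banner and the column header are skipped.
        NF == 10 && $2 ~ /^[0-9.]+$/ && ($10 + 0) >= (t + 0) {
            iops = $2 + $3
            kb   = $4 + $5
            avg  = (iops > 0) ? kb / iops : 0
            printf "%s svc_t=%s busy=%s%% avg_io=%.1fKB\n", $1, $8, $10, avg
        }'
}

# Typical use on a live system (not executed here):
#   iostat -x 10 3 | busy_disks 50
```

A large average I/O size together with a high %b hints at sequential transfers, while a small one hints at random access.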
iostat Usage Scenario – Sequential I/O
Let's study the output of iostat when doing sequential I/O on the system. For that, become super-user and in a terminal window, start the command:
-
dd if=/dev/rdsk/c1d0s0 of=/dev/null bs=128k &
Then start the iostat command with iostat -xM 10 and watch the output. After a minute stop the iostat and dd processes.
As can be seen from the screen capture above, the disk in the test system can sustain a read throughput of just over 25 MB/second, with an average service time below 5 milliseconds.
iostat Usage Scenario – Random I/O
Let's study the output of iostat when doing random I/O on the system. For that, start the command:
-
find / >/dev/null 2>&1 &
Then start the iostat command with iostat -xM 10 and watch the output. After a minute stop the iostat and find processes.
iostat – random I/O
As can be seen from the screen capture above, the same disk in the test system delivers just under 1 MB/second on random I/O.
Properly sizing an I/O subsystem is not a trivial exercise. One has to take into consideration factors such as:
- Number of I/O operations per second (IOPS)
- Throughput in megabytes per second (MB/s)
- Service times (in milliseconds)
- I/O pattern (sequential or random)
- Availability of caching
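The first two factors are linked: throughput is roughly the number of operations per second times the I/O size. A minimal sketch of that arithmetic follows. The helper name iops_for is hypothetical, 1 MB is taken as 1024 KB, and the 8 KB I/O size used for the random case is an assumption for illustration, not a measured value.

```shell
#!/bin/sh
# iops_for MB_PER_SEC IO_SIZE_KB  (hypothetical helper)
# Prints the number of I/O operations per second needed to move
# MB_PER_SEC megabytes per second in requests of IO_SIZE_KB kilobytes.
iops_for() {
    awk -v mb="$1" -v kb="$2" 'BEGIN { printf "%d\n", (mb * 1024) / kb }'
}

iops_for 25 128   # sequential dd example, 128 KB reads: prints 200
iops_for 1 8      # random case with an assumed 8 KB I/O size: prints 128
```

The same disk therefore needs only a few hundred operations per second in both cases; it is the size of each operation that makes the throughput differ.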
zpool iostat: iostat for zfs pools
ZFS comes with its own version of iostat. It is built into the zpool command, since I/O is a feature of the pool. The behavior is very similar to iostat. There are, however, fewer options:
zpool iostat [-T u | d ] [-v] [pool] ... [interval [count]]
The -T option selects the timestamp format (u for Unix time, d for standard date format). The verbose option (-v) additionally shows the I/O for each vdev in the pool.
zpool iostat with a 10 second sample time
The verbose option is the one to use in more complex environments:
zpool iostat with verbose option and a 10 second sample time
iosnoop [DtraceToolkit] – Print Disk I/O Events
iosnoop is a program that prints disk I/O events as they happen, with useful details such as UID, PID, filename, command, etc. iosnoop is measuring disk events that have made it past system caches.
Let's study the output of iosnoop when doing random I/O on the system. For that, start the command:
-
find / >/dev/null 2>&1 &
Then start the iosnoop command and watch the output. After a minute stop the iosnoop and find processes.
iosnoop – sample output
iopattern [DtraceToolkit] – Print Disk I/O Pattern
iopattern prints details on the I/O access pattern for disks, such as percentage of events that were of a random or sequential nature. By default, totals for all disks are printed.
Let's study the output of iopattern when doing random I/O on the system. For that, start the command:
-
find / >/dev/null 2>&1 &
Then start the iopattern command and watch the output. After a minute stop the iopattern and find processes.
iopattern – sample output
iotop [DtraceToolkit] – Display top Disk I/O Events by Process
iotop prints details on the top I/O events by process.
Let's study the output of iotop when doing random I/O on the system. For that, start the command:
-
find / >/dev/null 2>&1 &
Then start the iotop command and watch the output. After a minute stop the iotop and find processes.
fsstat [Solaris 10+] – Report File System Statistics
fsstat reports kernel file operation activity by the file system type or by the path name, which is converted to a mount point. Please see the man page fsstat(1) for details on all options.
Let's study the output of fsstat when doing random I/O on the system. For that, start the command:
find / >/dev/null 2>&1 &
Then start the fsstat / 1 command and watch the output. After a minute stop the fsstat and find processes.
fsstat – sample output
Thomas Bastian was a coauthor of an earlier version of this document. The earlier version of this page has been published in the "The Developers Edge" in 2009.
Tracing
The following chapter introduces further tools and techniques for tracing at large.
DTrace is the Solaris framework that everyone wants to use for in-depth queries. The technology is, however, rather complex to learn.
I limited the number of DTrace scripts to the ones in the DTraceToolkit. The focus of this primer is to provide tools categorized by problem domain.
There are a number of visual tracing solutions built on top of DTrace:
- Project D-Light
- Chime
- Project fishworks as used in the Oracle ZFS appliance (not available on general purpose systems)
truss – First Stop Tool
truss is one of the most valuable tools in Solaris for understanding various issues with applications. truss can help understand which files are read and written, which system calls are called and much more. Although truss is very useful, one has to understand that it is also quite intrusive on the applications traced and can therefore influence performance and timing.
A standard usage scenario for truss is to get a summary of system call activity of a process over a given window of time.
truss – system call summary
As can be seen from the output above, the find process issues large amounts of fstat(2), lstat(2), getdents(2) and fchdir(2) system calls. The getdents(2) system call consumes roughly 45% of the total system time (0.149 seconds of 0.331 seconds of total system time).
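The share quoted above is the per-call system time divided by the total. A minimal sketch of that calculation follows. The function name syscall_share is hypothetical, and it assumes a summary in which each system call occupies one row of the form "name seconds calls [errors]". The total is computed from the rows themselves rather than read from a totals line.

```shell
#!/bin/sh
# syscall_share  (hypothetical helper)
# Reads a system call summary on stdin and prints each call's share
# of the summed system time.
# Assumed row layout: name seconds calls [errors]
syscall_share() {
    awk '
        # Keep rows whose second field is a decimal number and whose
        # third field is an integer; headers and totals are skipped.
        NF >= 3 && $2 ~ /^[0-9]*\.[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            n++; name[n] = $1; secs[n] = $2 + 0; total += $2
        }
        END {
            if (total > 0)
                for (i = 1; i <= n; i++)
                    printf "%s %.1f%%\n", name[i], 100 * secs[i] / total
        }'
}
```

For getdents(2), 0.149 seconds out of 0.331 seconds works out to about 45%, which matches the figure in the text.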
Another standard usage scenario of truss is to get a detailed view of the system calls issued by a given process.
truss – detailed system call activity
The output above shows truss giving out details about the system calls issued and their parameters. Further details can be obtained with the -v option. For example:
-
truss -v all -p <pid>
Yet another standard usage scenario is to restrict output of truss to certain system calls:
-
truss -t fstat -p <pid>
would limit the output to fstat(2) system call activity.
truss -t – sample output
Finally, combining the -t option with -v, one gets an output like this:
truss -t -v – sample output
plockstat – Report User-Level Lock Statistics
The plockstat utility gathers and displays user-level locking statistics. By default, plockstat monitors all lock contention events, gathers frequency and timing data about those events, and displays the data in decreasing frequency order, so that the most common events appear first. plockstat gathers data until the specified command completes or the process specified with the -p option completes. plockstat relies on DTrace to instrument a running process or a command it invokes to trace events of interest. This imposes a small but measurable performance overhead on the processes being observed. Users must have the dtrace_proc privilege and have permission to observe a particular process with plockstat.
Let's study the output of plockstat by running our sample reader/writer locking program cc_lck. First start cc_lck with the writer blocking for ten microseconds:
-
cc_lck 10
Then run the plockstat tool for ten seconds:
-
plockstat -A -e 10 -p <pid>
The output should be similar to the screen shot below. From the output, one can observe some contention on the reader/writer lock.
plockstat – sample output
pfilestat [DtraceToolkit] – Trace Time Spent in I/O
pfilestat prints I/O statistics for each file descriptor within a process. In particular, the time breakdown during read() and write() events is measured. This tool helps in understanding the impact of I/O on the process.
To study the output of pfilestat, let's start the following command as root:
-
dd if=/dev/rdsk/c1d0s0 of=/dev/null bs=1k &
Then in another window, let's start the pfilestat tool with the pid of the dd command as argument:
-
pfilestat <pid of dd command>
The output should be similar to the screen shot below:
pfilestat – sample output
pfilestat breaks down the process time into the percentage spent reading (read), writing (write), waiting for CPU (waitcpu), running on CPU (running), sleeping on read (sleep-r) and sleeping on write (sleep-w).
cputrack/cpustat – Monitor Process/System w/ CPU perf. counters
The cputrack utility allows CPU performance counters to be used to monitor the behavior of a process or family of processes running on the system. The cpustat utility allows CPU performance counters to be used to monitor the overall behavior of the CPUs in the system.
Using cputrack/cpustat requires intimate knowledge of the CPU and system under observation. Please consult the system/CPU documentation for details on the counters available. cpustat or cputrack with the -h option will list all available performance counters.
To observe the output of cputrack, let's run the tool with our sample program cc_usr.
Use the following command (all in one line):
-
cputrack -t -c pic0=FP_dispatched_fpu_ops,cmask0=0,umask0=0x7,pic1=FR_retired_x86_instr_w_excp_intr,cmask1=0 cc_usr
The output should look like this:
cputrack – sample output
In the above output, one can see that the cc_usr program executed roughly 600 million instructions per second with roughly 160 million floating point operations per second.
Thomas Bastian was a coauthor of an earlier version of this document. The earlier version of this page has been published in the "The Developers Edge" in 2009.
Understand the Network
The following chapter takes a deeper look at network utilisation.
netbar [Tools CD] - Display Network Traffic graphically
netbar displays two bar-graphs for each network interface.
- The left one is the input bandwidth, the right one the output bandwidth.
- The green area shows the used bandwidth and the blue area shows the available one.
On each bar-graph, a red marker shows the maximum bandwidth observed during the last period, and a dashed black & white marker shows the average bandwidth during the same period.
At the bottom of the bars, input and output rates are displayed numerically; the value can be selected between last (green), average (white) and maximum (red) with the middle mouse button. Between the bar-graphs, a white line displays the error rate while a red line displays the collision rate.
The display mode can be toggled between logarithmic and linear with the left mouse button. In linear mode, scaling is automatic.
A thin white line shows the reported maximum interface speed. If this line spans both bars, the interface is in full-duplex mode; if the line is limited to half of the bars, the interface is in half-duplex mode. All values are in bits per second.
netbar – sample output
netsum [Tools CD] – Displays Network Traffic
netsum is a netstat-like tool. Its output, however, is in kilobytes per second, packets per second, errors, collisions, and multicast.
netsum – sample output
nicstat [Tools CD] - Print Statistics for Network Interfaces
nicstat prints statistics for the network interfaces such as kilobytes per second read and written, packets per second read and written, average packet size, estimated utilisation in percent and interface saturation.
nicstat – sample output
netstat – Network Wizard
If you want to understand network behavior on a running system, your first stop may be the command netstat. netstat gives fast answers to questions such as:
- How many TCP/IP sockets are open on my system?
- Who communicates with whom? And with what parameters?
The netstat command has many options that will satisfy everyone's needs. Please refer to netstat(1) for details.
netstat Usage Scenario – List open Sockets
Often, one will want to look at the list of network sockets on a system. The netstat command delivers this type of information for the protocol TCP with the following command:
-
netstat -af inet -P tcp
If you are interested in the protocol UDP, replace tcp with udp, i.e.
-
netstat -af inet -P udp
As an example, let's run the following command and capture the output:
-
netstat -af inet -P tcp
netstat – sample output list TCP network sockets
The command outputs one line per socket in the system. The following information is included:
- Local Address: the local socket endpoint with interface and protocol port.
- Remote Address: the remote socket endpoint with interface and protocol port.
- Swind: sending window size in bytes.
- Send-Q: sending queue size in bytes.
- Rwind: receiving window size in bytes.
- Recv-Q: receiving queue size in bytes.
- State: protocol state (e.g. LISTEN, IDLE, TIME_WAIT, etc.).
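To answer the question of how many sockets are open, the State column can be tallied. This is a minimal sketch: the function name tcp_states is hypothetical, and it assumes that every socket row has exactly the seven fields listed above, with State as the last one.

```shell
#!/bin/sh
# tcp_states  (hypothetical helper)
# Reads `netstat -af inet -P tcp` text on stdin and prints the number of
# sockets per protocol state, most frequent state first.
# Assumed row layout:
#   Local-Address Remote-Address Swind Send-Q Rwind Recv-Q State
tcp_states() {
    awk '
        # Socket rows have seven fields with numeric window and queue sizes;
        # the title, the column header and the dashed ruler are skipped.
        NF == 7 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { count[$7]++ }
        END { for (s in count) print count[s], s }
    ' | LC_ALL=C sort -k1,1nr -k2,2
}

# Typical use on a live system (not executed here):
#   netstat -af inet -P tcp | tcp_states
```

A large and growing count for a single state, for example TIME_WAIT, is often the first hint worth following up with the other tools in this chapter.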
tcptop/tcptop_snv [DtraceToolkit] – Network “top”
tcptop (Solaris 10) and tcptop_snv (OpenSolaris) display top TCP network packets by process. To do so, the tool analyses TCP network packets and prints the responsible PID and UID, plus standard details such as IP address and port. The utility can help identify which processes are causing TCP traffic.
You can start the tool with the command tcptop on Solaris 10 and tcptop_snv on OpenSolaris. Let's study the output of tcptop_snv. For that, start tcptop_snv in one window and in another one, generate some network traffic with the command:
-
scp /kernel/genunix localhost:/tmp
The output should be similar to this screen:
tcptop_snv – sample output
tcpsnoop/tcpsnoop_snv [DtraceToolkit] – Network Snooping
tcpsnoop (Solaris 10) and tcpsnoop_snv (OpenSolaris) snoop TCP network packets by process. The tool operates in a similar way to tcptop and tcptop_snv; however, information is displayed continuously.
You can start the tool with the command tcpsnoop on Solaris 10 and tcpsnoop_snv on OpenSolaris. Let's study the output of tcpsnoop_snv. For that, start tcpsnoop_snv in one window and in another one, generate some network traffic with the command:
-
scp /kernel/genunix localhost:/tmp
The output should be similar to this screen:
tcpsnoop_snv – sample output
nfsstat – NFS statistics
nfsstat displays statistical information about the NFS and RPC (Remote Procedure Call) interfaces to the kernel. It can be used to view client and/or server side statistics broken down by NFS version (2, 3 or 4).
Thomas Bastian was a coauthor of an earlier version of this document. The earlier version of this page has been published in the "The Developers Edge" in 2009.