Verbose Data

Data is reported in this form when --verbose is used, or when at least one requested type of data has no brief form, such as any detail data, inodes, processes or slabs. Specifying some of the lustre output options with -O, such as B, D and M, will also force verbose format.

CPU, collectl -sc

# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER  NICE   SYS  WAIT   IRQ  SOFT STEAL  IDLE  INTR  CTXSW  PROC  RUNQ   RUN   AVG1  AVG5 AVG15
These are the percentages of time the system spends running in each of the modes, averaged across all CPUs. While the User and Sys modes are self-explanatory, the others may not be:

User Time spent in User mode, not including time spent in "nice" mode.
Nice Time spent in Nice mode, that is, running processes whose priority has been lowered by the nice command and which have the "N" status flag set when examined with "ps".
Sys This is time spent in "pure" system time.
Wait Also known as "iowait", this is the time the CPU was idle during an outstanding disk I/O request. This is not considered to be part of the total or system times reported in brief mode.
Irq Time spent processing interrupts and also considered to be part of the summary system time reported in "brief" mode.
Soft Time spent processing soft interrupts and also considered to be part of the summary system time reported in "brief" mode.
Steal Time spent in an involuntary wait state while the hypervisor was servicing another virtual processor.

The next set of fields applies to processes:

Proc Process creations/sec.
Runq Number of processes in the run queue.
Run Number of processes in the run state.
Avg1, Avg5, Avg15 Load average over the last 1, 5 and 15 minutes.
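
To make the arithmetic behind the percentage fields concrete, here is a minimal sketch in Python. It is not collectl's own code. It assumes two samples of cumulative per-mode tick counters, in the order found on the aggregate cpu line of /proc/stat; the function names and the sample numbers are made up for illustration. The brief_sys helper reflects the note above that Irq and Soft are counted as part of the system time shown in brief mode, while Wait is not.

```python
# Order assumed to match the aggregate "cpu" line of /proc/stat.
MODES = ("user", "nice", "sys", "idle", "wait", "irq", "soft", "steal")


def cpu_percentages(prev, curr):
    """Turn two samples of cumulative tick counters into per-mode percentages."""
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)
    if total == 0:
        # No ticks elapsed between samples, so there is nothing to report.
        return {mode: 0.0 for mode in MODES}
    return {mode: 100.0 * d / total for mode, d in zip(MODES, deltas)}


def brief_sys(pct):
    """System time as summarized in brief mode: Sys + Irq + Soft."""
    return pct["sys"] + pct["irq"] + pct["soft"]


# Hypothetical counter samples taken one interval apart.
prev = (1000, 10, 300, 8000, 50, 5, 5, 0)
curr = (1060, 10, 320, 8100, 70, 7, 5, 0)
pct = cpu_percentages(prev, curr)
```

Because the counters are summed over all CPUs before the division, the result is an average across CPUs rather than a per-CPU figure.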

Disks, collectl -sd

# DISK SUMMARY (/sec)
#Reads  R-Merged  R-KBytes  SizeKB  Writes  W-Merged  W-KBytes  SizeKB
Reads Number of reads/sec
R-Merged Read requests merged per second when being dequeued. These statistics are not available in older kernels which only record disk statistics in /proc/stat.
R-KBytes KB read/sec
SizeKB Average read size in KB
Writes Number of writes/sec
W-Merged Write requests merged per second when being dequeued.
W-KBytes KB written/sec
SizeKB Average write size in KB
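
The two SizeKB columns follow from the other fields: the average request size is the KB transferred divided by the number of requests over the same interval. A minimal Python sketch, assuming that relationship and using a made-up function name (collectl's own rounding may differ):

```python
def avg_size_kb(kbytes_per_sec, ops_per_sec):
    """Average I/O size in KB; an idle interval is reported as 0."""
    if ops_per_sec == 0:
        return 0.0
    return kbytes_per_sec / ops_per_sec


# Hypothetical interval: 64 reads/sec moving 2048 KB/sec.
read_size = avg_size_kb(2048, 64)
idle_size = avg_size_kb(0, 0)
```

Guarding the zero case matters in practice, since an idle disk reports 0 reads and 0 KB for the interval.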

Inodes/Filesystem, collectl -si

# INODE SUMMARY
#    Dentries      File Handles    Inodes
# Number  Unused   Alloc   % Max   Number
   40585   39442     576    0.17    38348
DCache
Number Number of entries in directory cache
Unused Number of unused entries in directory cache
Handles Number of allocated file handles
% Max Percentage of maximum available file handles
Inode Number of used inode handles
NOTE - as of this writing I'm baffled by the dentry unused field. No matter how many files and/or directories I create, this number goes up! Shouldn't it go down?
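
As an aside on the % Max column, it is simply the allocated file handles expressed as a percentage of the system-wide maximum. The Python sketch below assumes the three-number layout of /proc/sys/fs/file-nr (allocated, allocated but unused, maximum). Whether collectl reads exactly that file is an assumption here, and the maximum of 338000 is invented so that the result lines up with the sample output above.

```python
def file_handle_pct(file_nr_text):
    """Return (allocated handles, percent of maximum) from file-nr style text."""
    allocated, _unused, maximum = (int(v) for v in file_nr_text.split())
    pct = 100.0 * allocated / maximum if maximum else 0.0
    return allocated, round(pct, 2)


# Hypothetical contents of /proc/sys/fs/file-nr.
alloc, pct_max = file_handle_pct("576\t0\t338000\n")
```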

Infiniband, collectl -sx

# INFINIBAND SUMMARY (/sec)
# OpsIn  OpsOut   KB-In  KB-Out  IOSizeI IOSizeO  Errors
OpsIn Packets received/sec.
OpsOut Packets transmitted/sec.
KB-In KB received/sec.
KB-Out KB transmitted/sec.
IOSizeI Average incoming packet size in KB
IOSizeO Average outgoing packet size in KB
Errors Count of current errors. Since these are typically infrequent, reporting them as a rate would risk either not seeing them or having round-off hide their values.

Lustre

Lustre Client, collectl -sl

There are several formats here, controlled by the -O switch. Detail data is also available for these: specifying -sL breaks the data out by file system and -sLL breaks it out by OST.

# LUSTRE CLIENT SUMMARY
# Reads ReadKB  Writes WriteKB
Reads Reads/sec delivered to the client, not necessarily from the lustre storage servers.
ReadKB KB/sec delivered to the client.
Writes Writes/sec delivered to the storage servers.
WriteKB KB/sec delivered to the storage servers.
# LUSTRE CLIENT SUMMARY: METADATA
# Reads ReadKB  Writes WriteKB  Open Close GAttr SAttr  Seek Fsync DrtHit DrtMis
Reads Reads/sec delivered to the client, not necessarily from the lustre storage servers.
ReadKB KB/sec delivered to the client.
Writes Writes/sec delivered to the storage servers.
WriteKB KB/sec delivered to the storage servers.
Open File opens/sec
Close File closes/sec
GAttr getattrs/sec
SAttr setattrs/sec
Seek seeks/sec
Fsync fsyncs/sec
DrtHit dirty hits/sec
DrtMis dirty misses/sec
# LUSTRE CLIENT SUMMARY: READAHEAD
# Reads ReadKB  Writes WriteKB  Pend  Hits Misses NotCon MisWin LckFal  Discrd ZFile ZerWin RA2Eof HitMax
Reads Reads/sec delivered to the client, not necessarily from the lustre storage servers.
ReadKB KB/sec delivered to the client.
Writes Writes/sec delivered to the storage servers.
WriteKB KB/sec delivered to the storage servers.
Pend Pending issued pages
Hits Prefetch cache hits
Misses Prefetch cache misses
NotCon The current pages read that were not consecutive with the previous ones.
MisWin Miss inside window. The pages that were expected to be in the prefetch cache but weren't. They were probably reclaimed due to memory pressure.
LckFal Failed grab_cache_pages. Tried to prefetch a page but it was locked.
Discrd Read but discarded. Prefetched pages (not yet read by the application) have been discarded either because of memory pressure or lock revocation.
ZFile Zero length file.
ZerWin Zero size window.
RA2Eof Read ahead to end of file
HitMax Hit maximum readahead issue. The read-ahead window has grown to the maximum specified by max_read_ahead_mb.
# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
#Rds  RdK   1K   2K   ...  Wrts WrtK   1K   2K   ...
This display shows the distribution of RPC buffer sizes, in buckets measured in K-pages. You can find the page size for your system in the header (collectl --showheader).

Rds Reads/sec
RdK KBs read/sec
nK Number of pages of this size read
Wrts Writes/sec
WrtK KBs written/sec
nK Number of pages of this size written

Lustre Meta-Data Server, collectl -sl

# LUSTRE FILESYSTEM SUMMARY
#<------------- MDS --------------->
#CLOSE   GETATTR     REINT      SYNC
Close Number of file closes/sec.
Getattr Number of getattrs/sec.
Reint Reintegrated operations/sec, which are inode modifications and unlinks.
Sync Number of syncs/sec.

This display is very similar to the RPC buffers display in that the sizes of different size I/O requests are reported. In this case these are requests sent to the disk driver. Note that this report is only available for HP's SFS.

# LUSTRE DISK BLOCK LEVEL SUMMARY
#Rds  RdK 0.5K   1K   ...  Wrts WrtK 0.5K   1K   ...
Rds Reads/sec
RdK KBs read/sec
nK Number of blocks of this size read
Wrts Writes/sec
WrtK KBs written/sec
nK Number of blocks of this size written

Lustre Object Storage Server, collectl -sl

# LUSTRE FILESYSTEM SUMMARY
#<------------------- OST ------------------>
#READ OPS   READ KB      WRITE OPS   WRITE KB
Read Ops Reads/sec
Read KB KB/sec read
Write Ops Writes/sec
Write KB KB/sec written

Lustre Object Storage Server, collectl -sl -OB

# LUSTRE FILESYSTEM SUMMARY
#<--------read----------------writes-----------------
#Rds  RdK   1K   2K   ...  Wrts WrtK   1K   2K   ....
Rds Reads/sec
RdK KBs read/sec
nK Number of pages of this size read
Wrts Writes/sec
WrtK KBs written/sec
nK Number of pages of this size written

Lustre Object Storage Server, collectl -sl -OD

# LUSTRE DISK BLOCK LEVEL SUMMARY
#Rds  RdK 0.5K   1K   ...   Wrts WrtK 0.5K   1K   ...
Rds Reads/sec
RdK KBs read/sec
nK Number of blocks of this size read
Wrts Writes/sec
WrtK KBs written/sec
nK Number of blocks of this size written

Memory, collectl -sm

# MEMORY STATISTICS
#<------------------------Physical Memory-----------------------><-----------Swap----------><-Inactive->
#   TOTAL    USED    FREE    BUFF  CACHED    SLAB  MAPPED  COMMIT     TOTAL    USED    FREE     TOTAL     IN    OUT
Total Total physical memory
Used Used physical memory. This does not include memory used by the kernel itself.
Commit According to RedHat: "An estimate of how much RAM you would need to make a 99.99% guarantee that there never is OOM (out of memory) for this workload."
Swap Total Total Swap
Swap Used Used Swap
Swap Free Free Swap
Inactive Inactive pages. On earlier kernels this number is the sum of the clean, dirty and laundry pages.
Pages/Sec In Total number of pages read by block devices
Pages/Sec Out Total number of pages written by block devices

Network, collectl -sn

The entries for error counts are actually the total of several types of errors. To get individual error counts, you must report details on individual interfaces in plot format by specifying -P. Transmission errors are categorized by errors, dropped, fifo, collisions and carrier. Receive errors are broken out for errors, dropped, fifo and fragments.

# NETWORK SUMMARY (/sec)
#InPck  InErr OutPck OutErr   Mult   ICmp   OCmp    IKB    OKB  ISize  OSize
InPck Incoming packets/sec
InErr Incoming errors/sec
OutPck Outgoing packets/sec
OutErr Outgoing errors/sec
Mult Outgoing multicast packets/sec
ICmp Incoming compressed packets/sec
OCmp Outgoing compressed packets/sec
IKB Incoming KB/sec
OKB Outgoing KB/sec
ISize Average incoming packet size in bytes
OSize Average outgoing packet size in bytes
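
Since InErr and OutErr are totals of several error types, the relationship can be sketched as a simple sum over the categories listed above. This Python fragment is only an illustration: the dictionary keys mirror the category names in the text, and the function name and sample counts are made up.

```python
# Error categories as listed in the text for each direction.
RX_ERROR_KINDS = ("errors", "dropped", "fifo", "fragments")
TX_ERROR_KINDS = ("errors", "dropped", "fifo", "collisions", "carrier")


def summarize_errors(rx, tx):
    """Collapse per-type error counts into the InErr/OutErr summary totals."""
    in_err = sum(rx.get(kind, 0) for kind in RX_ERROR_KINDS)
    out_err = sum(tx.get(kind, 0) for kind in TX_ERROR_KINDS)
    return in_err, out_err


# Hypothetical per-interface counts for one interval.
rx_counts = {"errors": 1, "dropped": 2, "fifo": 0, "fragments": 3}
tx_counts = {"errors": 0, "dropped": 1, "fifo": 0, "collisions": 4, "carrier": 0}
in_err, out_err = summarize_errors(rx_counts, tx_counts)
```

A non-zero total therefore says only that something went wrong. Finding out which kind of error it was requires the per-interface detail in plot format.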

NFS, collectl -sf

These statistics will be reported for V3 servers by default but you can choose a different version and/or client data via -O. They correspond to the net, rpc and protocol specific sections of the nfsstat utility.

# NFS SERVER (/sec)
#<----------Network-------><----------RPC---------><---NFS V3--->
#PKTS   UDP   TCP  TCPCONN  CALLS  BADAUTH  BADCLNT   READ  WRITE
Pkts Total network packets, which is the sum of UDP and TCP
UDP Number of UDP packets/sec
TCP Number of TCP packets/sec
TCPConn Number of TCP connections/sec
Calls Number of RPC calls/sec
BadAuth Number of authentication failures/sec
BadClnt Number of unknown clients/sec
Read Number of reads/sec
Write Number of writes/sec

NFS, collectl -sf -OC

The data reported for clients is slightly different, specifically the retrans and authref fields.

# NFS CLIENT (/sec)
#<----------RPC---------><---NFS V3--->
#CALLS  RETRANS  AUTHREF    READ  WRITE
Calls Number of RPC calls/sec
Retrans Retransmitted calls
Authref Authentication failed
Read Number of reads/sec
Write Number of writes/sec

Slabs, collectl -sy

As of the 2.6.22 kernel, there is a new slab allocator, called SLUB, and since there is not a 1:1 mapping between what it reports and the older slab allocator, the format of this listing will depend on which allocator is being used. The following format is for the older allocator.

# SLAB SUMMARY
#<------------Objects------------><--------Slab Allocation-------><--Caches--->
#  InUse   Bytes    Alloc   Bytes   InUse   Bytes   Total   Bytes  InUse  Total
Objects
InUse Total number of objects that are currently in use.
Bytes Total size of all the objects in use.
Alloc Total number of objects that have been allocated but not necessarily in use.
Bytes Total size of all the allocated objects whether in use or not.
Slab Allocation
InUse Number of slabs that have at least one active object in them.
Bytes Total size of all the slabs.
Total Total number of slabs that have been allocated whether in use or not.
Bytes Total size of all the slabs that have been allocated whether in use or not.
Caches
InUse Not all caches are actually in use. This includes only those with non-zero counts.
Total This is the count of all caches, whether currently in use or not.
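
The old-allocator summary can be thought of as a roll-up over per-cache rows. The Python sketch below shows one plausible way to produce the ten columns; it is not collectl's implementation. The Cache fields are hypothetical names loosely modelled on /proc/slabinfo columns, the 4096-byte page size is an assumption, and "in use" for a cache is taken to mean a non-zero allocated object count.

```python
from dataclasses import dataclass


@dataclass
class Cache:
    # Hypothetical per-cache row; field names are illustrative only.
    name: str
    active_objs: int
    num_objs: int
    obj_size: int        # bytes per object
    active_slabs: int
    num_slabs: int
    pages_per_slab: int


def slab_summary(caches, page_size=4096):
    """Roll per-cache rows up into the Objects / Slab Allocation / Caches totals."""
    def slab_bytes(n, c):
        return n * c.pages_per_slab * page_size

    return {
        "obj_inuse": sum(c.active_objs for c in caches),
        "obj_inuse_bytes": sum(c.active_objs * c.obj_size for c in caches),
        "obj_alloc": sum(c.num_objs for c in caches),
        "obj_alloc_bytes": sum(c.num_objs * c.obj_size for c in caches),
        "slab_inuse": sum(c.active_slabs for c in caches),
        "slab_inuse_bytes": sum(slab_bytes(c.active_slabs, c) for c in caches),
        "slab_total": sum(c.num_slabs for c in caches),
        "slab_total_bytes": sum(slab_bytes(c.num_slabs, c) for c in caches),
        "caches_inuse": sum(1 for c in caches if c.num_objs > 0),
        "caches_total": len(caches),
    }


# Two made-up caches: one busy, one completely empty.
summary = slab_summary([
    Cache("cache_a", 10, 20, 64, 1, 2, 1),
    Cache("cache_b", 0, 0, 128, 0, 0, 1),
])
```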

This is the format for the new SLUB allocator:

# SLAB SUMMARY
#<---Objects---><-Slabs-><-----memory----->
# In Use   Avail  Number      Used    Total
One should note that this report summarizes those slabs being monitored. In general this represents all slabs, but if filtering is being used these numbers will only apply to those slabs that have matched the filter.

Objects
InUse The total number of objects that have been allocated to processes.
Avail The total number of objects that are available in the currently allocated slabs. This includes those that have already been allocated to processes.
Slabs
Number This is the number of individual slabs that have been allocated and are taking up physical memory.
Memory
Used Used memory corresponds to those objects that have been allocated to processes.
Total Total physical memory allocated to processes. When there is no filtering in effect, this number will be equal to the Slabs field reported by -sm.

Sockets, collectl -ss

# SOCKET STATISTICS
#      <-------------Tcp------------->   Udp   Raw   <---Frag-->
#Used  Inuse Orphan    Tw  Alloc   Mem  Inuse Inuse  Inuse   Mem
Used Total number of sockets allocated, which can include additional types such as domain sockets.
Tcp
Inuse Number of TCP connections in use
Orphan Number of TCP orphaned connections
Tw Number of connections in TIME_WAIT
Alloc TCP sockets allocated
Mem
Udp
Inuse Number of UDP connections in use
Raw
Inuse Number of RAW connections in use
Frag
Inuse
Mem
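
For orientation, these columns line up closely with the label/value pairs the kernel exposes in /proc/net/sockstat. The Python sketch below parses text in that layout into nested dictionaries. That collectl reads this particular file is an assumption, and the sample values are invented.

```python
# Made-up sample in the layout of /proc/net/sockstat.
SAMPLE = """\
sockets: used 285
TCP: inuse 8 orphan 0 tw 0 alloc 13 mem 2
UDP: inuse 4 mem 2
RAW: inuse 0
FRAG: inuse 0 memory 0
"""


def parse_sockstat(text):
    """Parse 'LABEL: key value key value ...' lines into {label: {key: int}}."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Fields alternate between a name and its integer value.
        stats[label.strip()] = {
            key: int(value) for key, value in zip(fields[0::2], fields[1::2])
        }
    return stats


sockstat = parse_sockstat(SAMPLE)
```

Unlike most of the other summaries, these are current counts rather than per-second rates, so no delta between samples is needed.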

TCP, collectl -st

# TCP SUMMARY (/sec)
# PureAcks HPAcks   Loss FTrans
PureAcks ACKs/sec that only contain acks (i.e. no data).
HPAcks Fast-path acks/sec.
Loss Packets/sec TCP thinks have been lost coming in.
FTrans Fast retransmissions/sec.
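
Like the other (/sec) summaries, these values are rates, which in general means taking the difference between two samples of an ever-increasing counter and dividing by the interval. A minimal Python sketch of that conversion, with made-up counter values keyed by the column names above (the function name is hypothetical, and this is not collectl's own code):

```python
def per_second(prev, curr, interval_secs):
    """Convert two cumulative counter samples into per-second rates."""
    if interval_secs <= 0:
        raise ValueError("interval must be positive")
    return {
        name: (curr[name] - prev[name]) / interval_secs
        for name in curr
        if name in prev
    }


# Hypothetical samples taken 10 seconds apart.
before = {"PureAcks": 1000, "HPAcks": 5000, "Loss": 3, "FTrans": 1}
after = {"PureAcks": 1050, "HPAcks": 5400, "Loss": 3, "FTrans": 2}
rates = per_second(before, after, 10)
```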