oclumon dumpnodeview

Use the oclumon dumpnodeview command to view log information from the system monitor service in the form of a node view.

A node view is a collection of all metrics collected by CHM for a node at a point in time. CHM attempts to collect metrics every five seconds on every node. Some metrics are static while other metrics are dynamic.

A node view consists of eight views when you display verbose output:

  • SYSTEM: Lists system metrics such as CPU COUNT, CPU USAGE, and MEM USAGE

  • TOP CONSUMERS: Lists the top consuming processes in the following format:

    metric_name: 'process_name(process_identifier) utilization'
    
  • PROCESSES: Lists process metrics such as PID, name, number of threads, memory usage, and number of file descriptors

  • DEVICES: Lists device metrics such as disk read and write rates, queue length, and wait time per I/O

  • NICS: Lists network interface card metrics such as network receive and send rates, effective bandwidth, and error rates

  • FILESYSTEMS: Lists file system metrics, such as total, used, and available space

  • PROTOCOL ERRORS: Lists any protocol errors

  • CPUS: Lists statistics for each CPU
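The TOP CONSUMERS format described in the list above can be parsed mechanically. The following is a minimal Python sketch, not part of oclumon: the function name parse_top_consumers and the regular expression are illustrative assumptions, and the sample text is copied from Example J-1. It assumes that the utilization is always numeric and that process names contain no single quotation marks.

```python
import re

# One entry looks like: metric_name: 'process_name(process_identifier) utilization'
# Assumption: utilization is numeric and process names contain no single quotes.
ENTRY = re.compile(r"(\w+):\s*'([^']+?)\((\d+)\)\s+([\d.]+)'")


def parse_top_consumers(text):
    """Return {metric_name: (process_name, pid, utilization)} for each entry found."""
    result = {}
    for metric, name, pid, value in ENTRY.findall(text):
        result[metric] = (name, int(pid), float(value))
    return result


# TOP CONSUMERS lines copied from Example J-1.
sample = (
    "topcpu: 'osysmond.bin(30981) 2.40' topprivmem: 'oraagent.bin(14599) 682496'\n"
    "topshm: 'ora_dbw2_oss_3(7049) 2156136' topfd: 'ocssd.bin(29986) 274'\n"
    "topthread: 'java(32255) 53'"
)

for metric, (name, pid, value) in parse_top_consumers(sample).items():
    print(metric, name, pid, value)
```

Entries that do not fit the assumed pattern are skipped rather than reported, which may or may not be what a monitoring script wants.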

You can generate a summary report that only contains the SYSTEM and TOP CONSUMERS views.

"Metric Descriptions" lists descriptions for all the metrics associated with each of the views in the preceding list.

Note:

Metrics displayed in the TOP CONSUMERS view are described in Table J-4.

Example J-1 shows a sample node view.

Syntax

oclumon dumpnodeview [[-allnodes] | [-n node1 node2 noden] [-last "duration"] | 
[-s "time_stamp" -e "time_stamp"] [-i interval] [-v]] [-h]

Parameters

Table J-2 oclumon dumpnodeview Command Parameters

Parameter Description
-allnodes

Use this option to dump the node views of all the nodes in the cluster.

-n node1 node2

Specify one node (or several nodes in a space-delimited list) for which you want to dump the node view.

-last "duration"

Use this option to retrieve the metrics collected over the most recent period. Specify the duration in HH24:MM:SS format surrounded by double quotation marks (""). For example:

"23:05:00"

-s "time_stamp" -e "time_stamp"

Use the -s option to specify a time stamp from which to start a range of queries and use the -e option to specify a time stamp to end the range of queries. Specify time in YYYY-MM-DD HH24:MM:SS format surrounded by double quotation marks (""). For example:

"2011-05-10 23:05:00"

Note: You must specify these two options together to obtain a range.

-i interval

Specify a collection interval, in five-second increments.

-v

Displays verbose node view output.

-h

Displays online help for the oclumon dumpnodeview command.
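The constraints in Table J-2 can be checked before the command is run. The following Python sketch is an illustration only: build_dumpnodeview_args is a hypothetical helper, and the mutual-exclusion rules (-allnodes versus -n, -last versus -s/-e) are one reading of the syntax line above, not confirmed behavior. Only options listed in the table are emitted.

```python
import re

DURATION = re.compile(r"^\d{2}:\d{2}:\d{2}$")                      # HH24:MM:SS
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")   # YYYY-MM-DD HH24:MM:SS


def build_dumpnodeview_args(nodes=None, allnodes=False, last=None,
                            start=None, end=None, interval=None, verbose=False):
    """Assemble an argument list for oclumon dumpnodeview (hypothetical helper)."""
    # Assumption from the syntax line: -allnodes and -n are alternatives.
    if allnodes and nodes:
        raise ValueError("use either -allnodes or -n, not both")
    # Stated in Table J-2: -s and -e must be specified together.
    if (start is None) != (end is None):
        raise ValueError("-s and -e must be specified together")
    # Assumption from the syntax line: -last and -s/-e are alternatives.
    if last is not None and start is not None:
        raise ValueError("use either -last or -s/-e, not both")
    if last is not None and not DURATION.match(last):
        raise ValueError("-last expects HH24:MM:SS")
    for stamp in (start, end):
        if stamp is not None and not TIMESTAMP.match(stamp):
            raise ValueError("time stamps expect YYYY-MM-DD HH24:MM:SS")
    # Stated in Table J-2: the interval is given in five-second increments.
    if interval is not None and (interval <= 0 or interval % 5 != 0):
        raise ValueError("-i expects a positive multiple of five seconds")

    args = ["oclumon", "dumpnodeview"]
    if allnodes:
        args.append("-allnodes")
    if nodes:
        args += ["-n", *nodes]
    if last is not None:
        args += ["-last", last]
    if start is not None:
        args += ["-s", start, "-e", end]
    if interval is not None:
        args += ["-i", str(interval)]
    if verbose:
        args.append("-v")
    return args


print(build_dumpnodeview_args(nodes=["node1", "node2", "node3"], last="12:00:00"))
```

The result is a list in which each value is a single argument, so the double quotation marks that a shell command line needs are not added here; the list is intended for something like subprocess.run(args) without a shell.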

Usage Notes

  • In certain circumstances, data can be delayed for some time before it is replayed by this command. For example, the crsctl stop cluster -all command can cause data delay. After running crsctl start cluster -all, it may take several minutes before oclumon dumpnodeview shows any data collected during the interval.

  • The default is to continuously dump node views. To stop continuous display, use Ctrl+C on Linux and Windows.

  • Both the local system monitor service (osysmond) and the cluster logger service (ologgerd) must be running to obtain node view dumps.

Examples

The following example dumps node views from node1, node2, and node3 collected over the last twelve hours:

$ oclumon dumpnodeview -n node1 node2 node3 -last "12:00:00"

The following example displays node views from all nodes collected over the last fifteen minutes at a 30-second interval:

$ oclumon dumpnodeview -allnodes -last "00:15:00" -i 30
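The values passed to -last, -s, and -e in examples like these can be generated rather than typed by hand. The Python sketch below shows one way to do so; format_duration and format_range are hypothetical helper names, and the restriction of durations to less than 24 hours is an assumption based on the HH24 field, not a documented limit.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH24:MM:SS


def format_duration(delta):
    """Format a timedelta as HH24:MM:SS for the -last option."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Assumption: HH24 implies an hour field of 00 through 23.
    if total < 0 or hours > 23:
        raise ValueError("duration must be between 00:00:00 and 23:59:59")
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def format_range(end, delta):
    """Return (-s value, -e value) for a window of length delta ending at end."""
    return (end - delta).strftime(TIMESTAMP_FORMAT), end.strftime(TIMESTAMP_FORMAT)


print(format_duration(timedelta(hours=12)))
print(format_range(datetime(2011, 5, 10, 23, 5, 0), timedelta(minutes=15)))
```

On a shell command line, each generated value still has to be surrounded by double quotation marks, as the parameter descriptions require.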

Metric Descriptions

The following tables describe the metrics in the views that make up a node view.

Table J-3 SYSTEM View Metric Descriptions

Metric Description
#pcpus

The number of physical CPUs

#vcpus

Number of logical compute units

cpuht

CPU hyperthreading enabled (Y) or disabled (N)

chipname

The name of the CPU vendor

cpu

Average CPU utilization per processing unit within the current sample interval (%).

cpuq

Number of processes waiting in the run queue within the current sample interval

physmemfree

Amount of free RAM (KB)

physmemtotal

Amount of total usable RAM (KB)

mcache

Amount of physical RAM used for file buffers plus the amount of physical RAM used as cache memory (KB)

On Windows systems, this is the number of bytes currently being used by the file system cache

Note: This metric is not available on Solaris.

swapfree

Amount of swap memory free (KB)

swaptotal

Total amount of physical swap memory (KB)

hugepagetotal

Total size of huge pages (KB)

Note: This metric is not available on Solaris or Windows systems.

hugepagefree

Free size of huge pages (KB)

Note: This metric is not available on Solaris or Windows systems.

hugepagesize

Smallest unit size of huge page

Note: This metric is not available on Solaris or Windows systems.

ior

Average total disk read rate within the current sample interval (KB per second)

iow

Average total disk write rate within the current sample interval (KB per second)

ios

Average time to serve an I/O request

swpin

Average swap in rate within the current sample interval (KB per second)

Note: This metric is not available on Windows systems.

swpout

Average swap out rate within the current sample interval (KB per second)

Note: This metric is not available on Windows systems.

pgin

Average page in rate within the current sample interval (pages per second)

pgout

Average page out rate within the current sample interval (pages per second)

netr

Average total network receive rate within the current sample interval (KB per second)

netw

Average total network send rate within the current sample interval (KB per second)

procs

Number of processes

procsoncpu

The current number of processes running on the CPU

rtprocs

Number of real-time processes

rtprocsoncpu

The current number of real-time processes running on the CPU

#fds

Number of open file descriptors, or number of open handles on Windows

#sysfdlimit

System limit on number of file descriptors

Note: This metric is not available on either Solaris or Windows systems.

#disks

Number of disks

#nics

Number of network interface cards

nicErrors

Average total network error rate within the current sample interval (errors per second)

Table J-4 PROCESSES View Metric Descriptions

Metric Description
name

The name of the process executable

pid

The process identifier assigned by the operating system

#procfdlimit

Limit on number of file descriptors for this process

Note: This metric is not available on Windows, AIX, and HP-UX systems.

cpuusage

Process CPU utilization (%)

Note: The utilization value can be up to 100 times the number of processing units.

privmem

Process private memory usage (KB)

shm

Process shared memory usage (KB)

Note: This metric is not available on Windows, Solaris, and AIX systems.

workingset

Working set of a program (KB)

Note: This metric is only available on Windows.

#fd

Number of file descriptors open by this process, or number of open handles by this process on Windows

#threads

Number of threads created by this process

priority

The process priority

nice

The nice value of the process

Note: This metric is not applicable to Windows systems.

state

The state of the process

Note: This metric is not applicable to Windows systems.
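The note on cpuusage in Table J-4 implies that a process value should be read against the number of processing units. The arithmetic is sketched below in Python. Treating #vcpus from the SYSTEM view as the number of processing units is an assumption, and both function names are hypothetical.

```python
def max_cpuusage(processing_units):
    """Upper bound on cpuusage: 100 times the number of processing units."""
    return 100 * processing_units


def normalized_cpuusage(cpuusage, processing_units):
    """Express a process's cpuusage as a share of total capacity (0 to 100)."""
    if processing_units <= 0:
        raise ValueError("processing_units must be positive")
    return cpuusage / processing_units


# Values from Example J-1: cpuusage 2.40 for osysmond.bin on a node with #vcpus 24.
# Assumption: #vcpus is the processing unit count that the note refers to.
print(max_cpuusage(24))
print(round(normalized_cpuusage(2.40, 24), 2))
```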

Table J-5 DEVICES View Metric Descriptions

Metric Description
ior

Average disk read rate within the current sample interval (KB per second)

iow

Average disk write rate within the current sample interval (KB per second)

ios

Average disk I/O operation rate within the current sample interval (I/O operations per second)

qlen

Number of I/O requests in wait state within the current sample interval

wait

Average wait time per I/O within the current sample interval (msec)

type

If applicable, identifies what the device is used for. Possible values are SWAP, SYS, OCR, ASM, and VOTING.

Table J-6 NICS View Metric Descriptions

Metric Description
netrr

Average network receive rate within the current sample interval (KB per second)

netwr

Average network send rate within the current sample interval (KB per second)

neteff

Average effective bandwidth within the current sample interval (KB per second)

nicerrors

Average error rate within the current sample interval (errors per second)

pktsin

Average incoming packet rate within the current sample interval (packets per second)

pktsout

Average outgoing packet rate within the current sample interval (packets per second)

errsin

Average error rate for incoming packets within the current sample interval (errors per second)

errsout

Average error rate for outgoing packets within the current sample interval (errors per second)

indiscarded

Average drop rate for incoming packets within the current sample interval (packets per second)

outdiscarded

Average drop rate for outgoing packets within the current sample interval (packets per second)

inunicast

Average packet receive rate for unicast within the current sample interval (packets per second)

type

Network type, either PUBLIC or PRIVATE

innonunicast

Average packet receive rate for multicast (packets per second)

latency

Estimated latency for this network interface card (msec)

Table J-7 FILESYSTEMS View Metric Descriptions

Metric Description
total

Total amount of space (KB)

mount

Mount point

type

File system type, whether local file system, NFS, or other

used

Amount of used space (KB)

available

Amount of available space (KB)

used%

Percentage of used space (%)

ifree%

Percentage of free file nodes (%)

Note: This metric is not available on Windows systems.

Table J-8 PROTOCOL ERRORS View Metric Descriptions

Metric Description
IPHdrErr

Number of input datagrams discarded due to errors in their IPv4 headers

IPAddrErr

Number of input datagrams discarded because the IPv4 address in their IPv4 header's destination field was not a valid address to be received at this entity

IPUnkProto

Number of locally-addressed datagrams received successfully but discarded because of an unknown or unsupported protocol

IPReasFail

Number of failures detected by the IPv4 reassembly algorithm

IPFragFail

Number of IPv4 datagrams discarded due to fragmentation failures

TCPFailedConn

Number of times that TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RCVD state, plus the number of times that TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state

TCPEstRst

Number of times that TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state

TCPRetraSeg

Total number of TCP segments retransmitted

UDPUnkPort

Total number of received UDP datagrams for which there was no application at the destination port

UDPRcvErr

Number of received UDP datagrams that could not be delivered for reasons other than the lack of an application at the destination port

Table J-9 CPUS View Metric Descriptions

Metric Description
cpuid

Virtual CPU

sys-usage

CPU usage in system space

user-usage

CPU usage in user space

nice

Nice value for a specific CPU

usage

CPU usage for a specific CPU

iowait

CPU wait time for I/O operations

Example J-1 Sample Node View

----------------------------------------
Node: rwsak10 Clock: '14-04-16 18.47.25 PST8PDT' SerialNo:155631
----------------------------------------

SYSTEM:
#pcpus: 2 #vcpus: 24 cpuht: Y chipname: Intel(R) cpu: 1.23 cpuq: 0
physmemfree: 8889492 physmemtotal: 74369536 mcache: 55081824 swapfree: 18480404
swaptotal: 18480408 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 132
iow: 236 ios: 23 swpin: 0 swpout: 0 pgin: 131 pgout: 235 netr: 72.404
netw: 97.511 procs: 969 procsoncpu: 6 rtprocs: 62 rtprocsoncpu N/A #fds: 32640
#sysfdlimit: 6815744 #disks: 9 #nics: 5 nicErrors: 0

TOP CONSUMERS:
topcpu: 'osysmond.bin(30981) 2.40' topprivmem: 'oraagent.bin(14599) 682496'
topshm: 'ora_dbw2_oss_3(7049) 2156136' topfd: 'ocssd.bin(29986) 274'
topthread: 'java(32255) 53'

CPUS:

cpu18: sys-2.93 user-2.15 nice-0.0 usage-5.8 iowait-0.0 steal-0.0
.
.
.

PROCESSES:

name: 'osysmond.bin' pid: 30891 #procfdlimit: 65536 cpuusage: 2.40 privmem: 35808
shm: 81964 #fd: 119 #threads: 13 priority: -100 nice: 0 state: S
.
.
.

DEVICES:

sdi ior: 0.000 iow: 0.000 ios: 0 qlen: 0 wait: 0 type: SYS
sda1 ior: 0.000 iow: 61.495 ios: 629 qlen: 0 wait: 0 type: SYS
.
.
.

NICS:

lo netrr: 39.935  netwr: 39.935  neteff: 79.869  nicerrors: 0 pktsin: 25
pktsout: 25  errsin: 0  errsout: 0  indiscarded: 0  outdiscarded: 0
inunicast: 25 innonunicast: 0  type: PUBLIC
eth0 netrr: 1.412  netwr: 0.527  neteff: 1.939  nicerrors: 0 pktsin: 15
pktsout: 4  errsin: 0  errsout: 0  indiscarded: 0  outdiscarded: 0
inunicast: 15  innonunicast: 0  type: PUBLIC  latency: <1

FILESYSTEMS:

mount: / type: rootfs total: 563657948 used: 78592012 available: 455971824
used%: 14 ifree%: 99 GRID_HOME
.
.
.

PROTOCOL ERRORS:

IPHdrErr: 0 IPAddrErr: 0 IPUnkProto: 0 IPReasFail: 0 IPFragFail: 0
TCPFailedConn: 5197 TCPEstRst: 717163 TCPRetraSeg: 592 UDPUnkPort: 103306
UDPRcvErr: 70
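A node view such as Example J-1 can be split into its views by looking for the uppercase heading lines. The Python sketch below is illustrative only: split_views and parse_pairs are hypothetical names, and the layout assumptions (a heading is an uppercase line ending in a colon, values contain no spaces) are inferred from this one sample rather than from a format specification.

```python
import re

# Assumption: a view heading is an uppercase line ending in a colon, e.g. "TOP CONSUMERS:".
SECTION = re.compile(r"^([A-Z][A-Z ]*):$")
# Assumption: metrics appear as "name: value" and values contain no spaces.
PAIR = re.compile(r"(#?\w+):\s*(\S+)")


def split_views(text):
    """Group the non-empty lines of a node view under their view heading."""
    views = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SECTION.match(line)
        if match:
            current = match.group(1)
            views[current] = []
        elif current is not None and line not in ("", "."):
            # Skip blank lines and the "." lines that mark elided output.
            views[current].append(line)
    return views


def parse_pairs(lines):
    """Parse "name: value" pairs from the lines of a view such as SYSTEM."""
    return dict(PAIR.findall(" ".join(lines)))


# Shortened excerpt of Example J-1.
sample = """SYSTEM:
#pcpus: 2 #vcpus: 24 cpuht: Y chipname: Intel(R) cpu: 1.23 cpuq: 0

CPUS:

cpu18: sys-2.93 user-2.15 nice-0.0 usage-5.8 iowait-0.0 steal-0.0
.
.
"""

views = split_views(sample)
print(parse_pairs(views["SYSTEM"]))
print(views["CPUS"])
```

Values are kept as strings because some fields in the sample are not numeric (for example cpuht: Y), and a field printed without a colon, such as rtprocsoncpu N/A, is not captured by this pattern.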