Wednesday 17 July 2013

Identifying High CPU Usage with vmstat

We can also easily detect when we are experiencing a busy CPU on the Oracle database server. Whenever the us (user) plus sy (system) times approach 100, the CPUs are operating at full capacity as shown in the next listing.
Also see tips for 100% CPU utilization.
Please note that it is not uncommon to see the CPU approach 100 percent even when the server is not overwhelmed with work. This is because the UNIX internal dispatchers will always attempt to keep the CPUs as busy as possible. This maximizes task throughput, but it can be misleading for a neophyte.
Remember, it is not a cause for concern when the user + system CPU values approach 100 percent. This just means that the CPUs are working to their full potential. The only metric that identifies a CPU bottleneck is when the run queue (r value) exceeds the number of CPUs on the server.
vmstat 5 1

kthr     memory             page              faults        cpu    
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 217485   386  0   0   0   4   14   0 202  300 210 20 75  3  2
Now let's take a look at measuring the memory utilization on the server.

Identifying RAM Memory Bottlenecks

Contention for random access memory (RAM) has always been a problem for Oracle. All database servers have a limited amount of available RAM, and this RAM must be shared between the Oracle database and all external sessions that connect to the server and consume RAM in their Program Global Area (PGA).

Understanding UNIX RAM memory paging

As we have noted in chapter 2, most Oracle DBA’s rely on the  pi column in vmstat to signal when the server is swapping RAM memory.  However, there is more to the story.
There are times when the pi column will be non-zero, even though there is no real RAM swapping.  To illustrate this let’s take a simple example. Suppose that we invoke a 20 Megabyte Oracle executable program, such as a Pro*C program. We don't need to load all 20MB of the executable into RAM all at once.  Rather, we just want to load those pieces of the executable code that require immediate execution. Hence, UNIX will use memory frames as necessary later and rely on the principle of spatial locality to minimize the amount of pages in our RAM working set.
To manage the memory segments, the UNIX kernel builds a memory map of the entire program when it starts. Included in this map is a note on whether the storage is “in memory” or “on swap disk.”  
As the program starts, it begins accessing some of its pages that have never been loaded into RAM memory.   Hence, you may see vmstat page-in’s when a large number of programs are starting and allocating their RAM memory.
During normal operation we may see various points in time when paging in happens a lot and this is not always a cause for concern.  Remember, a UNIX process may page-in when the UNIX program is starting or is accessing parts of its code that it had not used before.
Paging out (the po column in vmstat) happens frequently as UNIX prepares for the possibility of a page-in. With UNIX virtual memory we are always anticipating running out of RAM memory, and a page-out is a method for being ready for a subsequent page-in. Also, as UNIX processes end they call the free() system call to free the RAM pages so they can be used by new processes.

Internals of RAM memory paging

So if RAM paging in pi may be acceptable and paging out po may be  acceptable, how do we tell when the RAM on a server is overstressed and swapping? One answer is to correlate the UNIX scan rate with page-in operations.  When an Oracle server begins to run low on RAM memory, the page stealing daemon process awakens and UNIX begins to treat the RAM memory as a sharable resource, moving memory frames to the swap disk with paging operations.
The page-stealing daemon operates in two modes.  When RAM memory shortages are not critical, the daemon will steal small chunks of least-recently-used RAM memory from a program.  As RAM resource demands continue to increase, the page-stealing daemon escalates and begins to page-out entire programs RAM regions. In short, we cannot always tell if the page-in operations that we see are normal housekeeping or a serious memory shortage unless we correlate the activity of the page stealing daemon with the page-in output.
To aid in this, the vmstat utility gives the sr column to designate the memory page scan rate. If we see the scan rate rising steadily, we will have hit the page-stealing daemon’s first threshold, indicating that entire program’s RAM memory regions are being paged-out to the swap disk.  As the process continues, we will begin to see high page-in numbers as the entire process in paged-back into RAM memory.
Carefully review the list below from HP/UX vmstat.  The scan rate is the furthest right column, and here we see the value of sr rising steadily as the page stealing daemon prepares for a page-in.  As the sr value peaks, we see the page-in operation (pi) as the real RAM memory on the Oracle server is exceeded.
root> vmstat 2
         procs           memory                   page                       
    r     b     w      avm    free   re   at    pi   po    fr   de    sr   
    3     0     0   144020   12778   17    9     0   14    29    0     3 
    3     0     0   144020   12737   15    0     1   34     4    0     8 
    3     0     0   144020   12360    9    0     1   46     2    0    13 
    1     0     0   142084   12360    5    0     3   17     0    0    21 
    1     0     0   142084   12360    3    0     8    0     0    0     8 
    1     0     0   140900   12360    1    0    10    0     0    0     0 
    1     0     0   140900   12360    0    0     9    0     0    0     0 
    1     0     0   140900   12204    0    0     3    0     0    0     0 
    1     0     0   137654   12204    0    0     0    0     0    0     0 
However, before we go into detecting if your server memory is exceeded, we must first give you a tool for determining how much memory you have available on your server. Next is a command that you can issue to see how much RAM exists on your server.

Display RAM Size on DEC-UNIX

In DEC-UNIX, you can use the uerf command in conjunction with grep to display memory size. For example:
L 5-11                                                                                    
 uerf -r 300 | grep -i mem
Here, the output of the uerf command is piped to grep to filter out and display the segments relating to memory. The –i option causes grep to find both uppercase and lowercase strings. With respect to the example shown here, grep –i mem looks for both “MEM” and “mem.”

Display RAM Size on HP/UX

In HP/UX, the dmesg command can display memory information:
L 5-12
dmesg
Memory Information:
physical page size = 4096 bytes, logical page size = 4096 bytes
Physical: 5242880 Kbytes,lockable: 4051216 Kbytes,available: 4651796 Kbytes

Display RAM Size on AIX

In IBM's AIX dialect of UNIX, you must issue two separate commands. You start with the lsdev command followed by the lsattr command to display the amount of memory on a server. First, execute lsdev to list all devices. Then pipe that output through grep to filter out everything not related to memory. That will get you the name of the memory devices that are installed. For example:
L 5-13
>lsdev -C|grep mem

mem0       Available 00-00           Memory
Here you can see that mem0 is the name of the memory device. Now that you have the name, you can issue the lsattr –El command to see the amount of memory on the server. In the following example, the server has 3GB of RAM installed.
L 5-14
>lsattr -El mem0

size     3064 Total amount of physical memory …
You must issue the lsattr –El command separately for each memory device.

Display RAM Size on Solaris

The prtconf command can also be used on all Solaris servers to quickly see the amount of available memory:
L 5-15
>prtconf|grep -i mem

Memory size: 2048 Megabytes
    memory (driver not attached)
    virtual-memory (driver not attached)

Display RAM Size in Linux

In Linux, the free command can be used to quickly display the amount of RAM memory on the server:
L 5-16
>free
             total       used       free     shared    buffers     cached
Mem:       3728668     504688    3223980      41316     430072      29440
-/+ buffers/cache:      45176    3683492
Swap:       265032        608     264424

Using the Top Utility for Displaying RAM

While each dialect has unique mechanisms and commands to display RAM, most UNIX servers support the top utility. The top utility is invoked by issuing the top command 77777777from the UNIX prompt. This will display the total amount of RAM available, and will also show the usage of both RAM and virtual memory:
L 5-17
root> top

  9:43am  up 16 days, 22:33, 24 users,  load average: 0.00, 0.00, 0.00
123 processes: 122 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  0.0% user,  0.9% system,  0.0% nice, 99.0% idle
Mem:   257568K av, 244988K used, 12580K free, 88732K shrd,  179772K buff
Swap:  530104K av,   9972K used, 520132K free              15452K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
22417 oracle    14   0   904  904   668 R       0  0.9  0.3   0:00 top
    1 root       1   0   160  120    88 S       0  0.0  0.0   0:11 init
    2 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 kflushd
    3 root       0   0     0    0     0 SW      0  0.0  0.0   0:08 kupdate
    4 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 kpiod
In this example, we see from the top utility that we have 257,568 kilobytes of RAM on the server. The top utility is also great for seeing the top CPU consumer tasks on your server.

Using Glance to See Memory

In HP/UX and Solaris, you can enter the glance utility in order to see the amount of RAM available. The glance utility displays a screen showing CPU and memory utilization for the system as a whole, and for individual processes (see Figure 5-3). For more information on glance, look to the man pages on your UNIX server. Glance is started by entering glance from the UNIX prompt, and exited by entering ctrl-c.
Figure 5-25: A sample glance screen
Glance is a great tool for reactive tuning because it shows the current status of your Oracle server, along with consumption details for the CPU and RAM.

RAM Memory and the Swap Disk

Now that we know the amount of RAM on our server, we are ready to investigate the RAM and swap disk usage for the Oracle server. Whenever the memory demands of the server exceed that amount of RAM, the virtual memory facility is invoked. With virtual memory, segments of RAM are moved onto a special disk segment called the swap disk. The swap disk is a special segment of disk defined by the systems administrator to hold excess RAM memory contents. The virtual memory system commonly pages-out memory segments, and this is not an indicator of a memory problem. However, a page-in operation indicates that the server has exceeded the amount of available RAM and is recalling memory segments from the swap disk (see Figure 5-4).
Figure 5-26: RAM memory paging from the swap disk
Swapping (pi) slows down a server because it takes a large amount of time to copy memory segments from the swap disk back into RAM. On an Oracle database server, the solution to a page-in problem involves:
  • Smaller SGAReduce the demand for RAM by making the SGA smaller. The SGA can be made smaller by reducing the db_block_buffers on Oracle8i and earlier versions, db_cache_size, sga_max_size, db_xK_cache_size on Oracle9i or shared_pool_size or java_pool_size init.ora parameters.
  • More RAM memoryAdd additional RAM memory to the server. (Remember that some 32-bit versions of Oracle cannot use more than 1.7GB of RAM.)
  • Reduce RAM demandReduce RAM consumption of a database server by reducing the demands on Program Global Area (PGA) memory. Oracle parameters such as sort_area_size can greatly increase the amount of RAM allocated to each connected user's PGA.
A memory-bound database server always experiences paging from the swap disk. This is displayed in the vmstat utility as the po (page-out) and pi (page-in) columns of vmstat. Here we see that the database server is experiencing five page-in and nine page-out operations. You can tell that a server is having excessive memory requests by looking at page-in operations.
L 5-18
Root> vmstat 1 2

kthr     memory             page              faults        cpu    
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
0  0 218094   166   0  4   0   4   16   0 202  301 211 14 19 45 22
0  0 218094   166   0  5   9   4   14   0 202  301 211 14 19 45 22
In sum, page-out (po) operations are a normal part of virtual memory operation, but page-in (pi) operations indicate that the server has excessive RAM demands. Now let's take a look at a vmstat metric that is sometimes associated with a disk I/O bottleneck.

Detecting Potential I/O Bottlenecks in AIX

Using IBM's AIX dialect of UNIX, an I/O bound database server is usually evidenced by a high value in the wa (wait) column of vmstat. In the next listing, we see that 22 percent of the CPU time is being used waiting for database I/O.
L 5-19
root> vmstat
kthr     memory             page              faults        cpu    
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
0  0 217485   386  0   0   0   4   14   0 202  300 210 14 19 45 22
It is important to note that a high wait value does not always signal an I/O bottleneck. The CPU wait could be due to other factors, and this metric simply means that the CPU is waiting for external OS services.

Now that we understand the basics of the vmstat utility, let's explore an easy method for capturing vmstat data, storing it inside an Oracle table, and producing server performance reports.

No comments:

Post a Comment