We can also easily detect when we are experiencing a busy
CPU on the Oracle database server. Whenever the us (user) plus sy (system)
times approach 100, the CPUs are operating at full capacity as shown in the
next listing.
Also see tips for 100% CPU utilization.
Please note that it is not uncommon to see the CPU
approach 100 percent even when the server is not overwhelmed with work. This is
because the UNIX internal dispatchers will always attempt to keep the CPUs as
busy as possible. This maximizes task throughput, but it can be misleading for
a neophyte.
Remember, it is not a cause for concern when the user +
system CPU values approach 100 percent. This just means that the CPUs are
working to their full potential. The only metric that identifies a CPU
bottleneck is when the run queue (r value) exceeds the number of CPUs on the
server.
vmstat
5 1
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 217485 386 0 0 0 4 14 0 202 300 210 20 75 3 2
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 217485 386 0 0 0 4 14 0 202 300 210 20 75 3 2
Now let's take a look at measuring the memory utilization
on the server.
Identifying RAM Memory Bottlenecks
Contention for random access memory (RAM) has always been
a problem for Oracle. All database servers have a limited amount of available
RAM, and this RAM must be shared between the Oracle database and all external
sessions that connect to the server and consume RAM in their Program Global
Area (PGA).
Understanding UNIX RAM memory paging
As we have noted in chapter 2, most Oracle DBA’s rely on
the pi column in vmstat to signal when the server is
swapping RAM memory. However, there is more to the story.
There are times when the pi column will be non-zero, even
though there is no real RAM swapping. To illustrate this let’s take a
simple example. Suppose that we invoke a 20 Megabyte Oracle executable program,
such as a Pro*C program. We don't need to load all 20MB of the executable into
RAM all at once. Rather, we just want to load those pieces of the
executable code that require immediate execution. Hence, UNIX will use memory
frames as necessary later and rely on the principle of spatial locality to
minimize the amount of pages in our RAM working set.
To manage the memory segments, the UNIX kernel builds a
memory map of the entire program when it starts. Included in this map is a note
on whether the storage is “in memory” or “on swap disk.”
As the program starts, it begins accessing some of its
pages that have never been loaded into RAM memory. Hence, you may
see vmstat page-in’s when a large number of programs are starting and
allocating their RAM memory.
During normal operation we may see various points in time
when paging in happens a lot and this is not always a cause for concern.
Remember, a UNIX process may page-in when the UNIX program is starting or is
accessing parts of its code that it had not used before.
Paging out (the po column in vmstat) happens frequently as
UNIX prepares for the possibility of a page-in. With UNIX virtual memory we are
always anticipating running out of RAM memory, and a page-out is a method for
being ready for a subsequent page-in. Also, as UNIX processes end they call the
free() system call to free the RAM pages so they can be used by new processes.
Internals of RAM memory paging
So if RAM paging in pi may be acceptable and paging out po
may be acceptable, how do we tell when the RAM on a server is
overstressed and swapping? One answer is to correlate the UNIX scan rate with
page-in operations. When an Oracle server begins to run low on RAM
memory, the page stealing daemon process awakens and UNIX begins to treat the
RAM memory as a sharable resource, moving memory frames to the swap disk with
paging operations.
The page-stealing daemon operates in two modes. When
RAM memory shortages are not critical, the daemon will steal small chunks of
least-recently-used RAM memory from a program. As RAM resource demands
continue to increase, the page-stealing daemon escalates and begins to page-out
entire programs RAM regions. In short, we cannot always tell if the page-in
operations that we see are normal housekeeping or a serious memory shortage
unless we correlate the activity of the page stealing daemon with the page-in
output.
To aid in this, the vmstat utility gives the sr column to
designate the memory page scan rate. If we see the scan rate rising steadily,
we will have hit the page-stealing daemon’s first threshold, indicating that
entire program’s RAM memory regions are being paged-out to the swap disk.
As the process continues, we will begin to see high page-in numbers as the
entire process in paged-back into RAM memory.
Carefully review the list below from HP/UX vmstat.
The scan rate is the furthest right column, and here we see the value of sr rising
steadily as the page stealing daemon prepares for a page-in. As the sr
value peaks, we see the page-in operation (pi) as the real RAM memory on
the Oracle server is exceeded.
root>
vmstat 2
procs memory page
r b w avm free re at pi po fr de sr
3 0 0 144020 12778 17 9 0 14 29 0 3
3 0 0 144020 12737 15 0 1 34 4 0 8
3 0 0 144020 12360 9 0 1 46 2 0 13
1 0 0 142084 12360 5 0 3 17 0 0 21
1 0 0 142084 12360 3 0 8 0 0 0 8
1 0 0 140900 12360 1 0 10 0 0 0 0
1 0 0 140900 12360 0 0 9 0 0 0 0
1 0 0 140900 12204 0 0 3 0 0 0 0
1 0 0 137654 12204 0 0 0 0 0 0 0
procs memory page
r b w avm free re at pi po fr de sr
3 0 0 144020 12778 17 9 0 14 29 0 3
3 0 0 144020 12737 15 0 1 34 4 0 8
3 0 0 144020 12360 9 0 1 46 2 0 13
1 0 0 142084 12360 5 0 3 17 0 0 21
1 0 0 142084 12360 3 0 8 0 0 0 8
1 0 0 140900 12360 1 0 10 0 0 0 0
1 0 0 140900 12360 0 0 9 0 0 0 0
1 0 0 140900 12204 0 0 3 0 0 0 0
1 0 0 137654 12204 0 0 0 0 0 0 0
However, before we go into detecting if your server memory
is exceeded, we must first give you a tool for determining how much memory you
have available on your server. Next is a command that you can issue to see how
much RAM exists on your server.
Display RAM Size on DEC-UNIX
In DEC-UNIX, you can use the uerf command in
conjunction with grep to display memory size. For example:
L 5-11
uerf
-r 300 | grep -i mem
Here, the output of the uerf command is piped to
grep to filter out and display the segments relating to memory. The –i option
causes grep to find both uppercase and lowercase strings. With respect to the
example shown here, grep –i mem looks for both “MEM” and “mem.”
Display RAM Size on HP/UX
In HP/UX, the dmesg command can display memory
information:
L 5-12
dmesg
Memory Information:
physical page size = 4096 bytes, logical page size = 4096 bytes
Physical: 5242880 Kbytes,lockable: 4051216 Kbytes,available: 4651796 Kbytes
Memory Information:
physical page size = 4096 bytes, logical page size = 4096 bytes
Physical: 5242880 Kbytes,lockable: 4051216 Kbytes,available: 4651796 Kbytes
Display RAM Size on AIX
In IBM's AIX dialect of UNIX, you must issue two separate
commands. You start with the lsdev command followed by the lsattr
command to display the amount of memory on a server. First, execute lsdev
to list all devices. Then pipe that output through grep to filter out
everything not related to memory. That will get you the name of the memory
devices that are installed. For example:
L 5-13
>lsdev
-C|grep mem
mem0 Available 00-00 Memory
mem0 Available 00-00 Memory
Here you can see that mem0 is the name of the memory
device. Now that you have the name, you can issue the lsattr –El command
to see the amount of memory on the server. In the following example, the server
has 3GB of RAM installed.
L 5-14
>lsattr
-El mem0
size 3064 Total amount of physical memory …
size 3064 Total amount of physical memory …
You must issue the lsattr –El command separately
for each memory device.
Display RAM Size on Solaris
The prtconf command can also be used on all Solaris
servers to quickly see the amount of available memory:
L 5-15
>prtconf|grep
-i mem
Memory size: 2048 Megabytes
memory (driver not attached)
virtual-memory (driver not attached)
Memory size: 2048 Megabytes
memory (driver not attached)
virtual-memory (driver not attached)
Display RAM Size in Linux
In Linux, the free command can be used to quickly
display the amount of RAM memory on the server:
L 5-16
>free
total used free shared buffers cached
Mem: 3728668 504688 3223980 41316 430072 29440
-/+ buffers/cache: 45176 3683492
Swap: 265032 608 264424
total used free shared buffers cached
Mem: 3728668 504688 3223980 41316 430072 29440
-/+ buffers/cache: 45176 3683492
Swap: 265032 608 264424
Using the Top Utility for Displaying RAM
While each dialect has unique mechanisms and commands to
display RAM, most UNIX servers support the top utility. The top utility is
invoked by issuing the top command 77777777from the UNIX prompt. This
will display the total amount of RAM available, and will also show the usage of
both RAM and virtual memory:
L 5-17
root>
top
9:43am up 16 days, 22:33, 24 users, load average: 0.00, 0.00, 0.00
123 processes: 122 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 0.9% system, 0.0% nice, 99.0% idle
Mem: 257568K av, 244988K used, 12580K free, 88732K shrd, 179772K buff
Swap: 530104K av, 9972K used, 520132K free 15452K cached
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
22417 oracle 14 0 904 904 668 R 0 0.9 0.3 0:00 top
1 root 1 0 160 120 88 S 0 0.0 0.0 0:11 init
2 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kflushd
3 root 0 0 0 0 0 SW 0 0.0 0.0 0:08 kupdate
4 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kpiod
9:43am up 16 days, 22:33, 24 users, load average: 0.00, 0.00, 0.00
123 processes: 122 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 0.9% system, 0.0% nice, 99.0% idle
Mem: 257568K av, 244988K used, 12580K free, 88732K shrd, 179772K buff
Swap: 530104K av, 9972K used, 520132K free 15452K cached
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
22417 oracle 14 0 904 904 668 R 0 0.9 0.3 0:00 top
1 root 1 0 160 120 88 S 0 0.0 0.0 0:11 init
2 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kflushd
3 root 0 0 0 0 0 SW 0 0.0 0.0 0:08 kupdate
4 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kpiod
In this example, we see from the top utility that we have
257,568 kilobytes of RAM on the server. The top utility is also great for
seeing the top CPU consumer tasks on your server.
Using Glance to See Memory
In HP/UX and Solaris, you
can enter the glance utility in order to see the amount of RAM available. The
glance utility displays a screen showing CPU and memory utilization for the
system as a whole, and for individual processes (see Figure 5-3). For more
information on glance, look to the man pages on your UNIX server. Glance is
started by entering glance from the UNIX prompt, and exited by entering ctrl-c.
Figure 5-25: A sample glance screen
Glance is a great tool for reactive tuning because it
shows the current status of your Oracle server, along with consumption details
for the CPU and RAM.
RAM Memory and the Swap Disk
Now that we know the amount
of RAM on our server, we are ready to investigate the RAM and swap disk usage
for the Oracle server. Whenever the memory demands of the server exceed that
amount of RAM, the virtual memory facility is invoked. With virtual memory,
segments of RAM are moved onto a special disk segment called the swap disk.
The swap disk is a special segment of disk defined by the systems administrator
to hold excess RAM memory contents. The virtual memory system commonly
pages-out memory segments, and this is not an indicator of a memory problem.
However, a page-in operation indicates that the server has exceeded the amount
of available RAM and is recalling memory segments from the swap disk (see
Figure 5-4).
Figure 5-26: RAM memory paging from the swap disk
Swapping (pi) slows down a server because it takes a large
amount of time to copy memory segments from the swap disk back into RAM. On an
Oracle database server, the solution to a page-in problem involves:
- Smaller SGAReduce the demand for RAM by making the SGA smaller. The SGA can be made smaller by reducing the db_block_buffers on Oracle8i and earlier versions, db_cache_size, sga_max_size, db_xK_cache_size on Oracle9i or shared_pool_size or java_pool_size init.ora parameters.
- More RAM memoryAdd additional RAM memory to the server. (Remember that some 32-bit versions of Oracle cannot use more than 1.7GB of RAM.)
- Reduce RAM demandReduce RAM consumption of a database server by reducing the demands on Program Global Area (PGA) memory. Oracle parameters such as sort_area_size can greatly increase the amount of RAM allocated to each connected user's PGA.
A memory-bound database server always experiences paging
from the swap disk. This is displayed in the vmstat utility as the po
(page-out) and pi (page-in) columns of vmstat. Here we see that the database
server is experiencing five page-in and nine page-out operations. You can tell
that a server is having excessive memory requests by looking at page-in
operations.
L 5-18
Root>
vmstat 1 2
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 218094 166 0 4 0 4 16 0 202 301 211 14 19 45 22
0 0 218094 166 0 5 9 4 14 0 202 301 211 14 19 45 22
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 218094 166 0 4 0 4 16 0 202 301 211 14 19 45 22
0 0 218094 166 0 5 9 4 14 0 202 301 211 14 19 45 22
In sum, page-out (po) operations are a normal part of
virtual memory operation, but page-in (pi) operations indicate that the server
has excessive RAM demands. Now let's take a look at a vmstat metric that is
sometimes associated with a disk I/O bottleneck.
Detecting Potential I/O Bottlenecks in AIX
Using IBM's AIX dialect of UNIX, an I/O bound database
server is usually evidenced by a high value in the wa (wait) column of vmstat.
In the next listing, we see that 22 percent of the CPU time is being used
waiting for database I/O.
L 5-19
root>
vmstat
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 217485 386 0 0 0 4 14 0 202 300 210 14 19 45 22
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 217485 386 0 0 0 4 14 0 202 300 210 14 19 45 22
It is important to note that a high wait value does not
always signal an I/O bottleneck. The CPU wait could be due to other factors,
and this metric simply means that the CPU is waiting for external OS services.
Now that we understand the basics of the vmstat utility,
let's explore an easy method for capturing vmstat data, storing it inside an
Oracle table, and producing server performance reports.
No comments:
Post a Comment