常见latch的争用分析以及解决办法
一、buffer cache 中的CBC latch(cache buffer chains),这表名在buffer中查找数据块时发生了latch争用,则有两种可能,一种是系统的latch确实不够用,二是大量进程集中访问某些数据块(热点快)导致latch争用,现实中可能大多是这种情况;怎么判断到底是那种情况导致latch争用呢?我们可以查询v$latch_children视图,如果发现latch平均分布在各个子latch中说明latch不够可以增加cpu(增加CPU可以增加latch)或修改隐藏参数,如果分布不均匀说明出现热点块,解决热点块可以通过下面的两个方法:1、对这个热点快对象进行分区;2、增加热点快对象的pctfree,把块分散开;为什么分区和增加pctfree可以解决热点块的问题呢? 原因如下:对表分区:假如一个表中有city字段,某个块中有北京、天津、青岛三个city的数据,这时有三个session分别访问这个块中三个城市的数据行,哪么这个块就会被三个session访问,如果对这张表city列进行范围分区,哪么北京、天津、青岛的数据肯定分布在不同的分区(不同的数据块)中,这时三个session就会访问不同的数据块了,从而解决热点块的问题;
增加pctfree:假如原pctfree由10%增加到20%,哪么数据块中用来存放数据的空间就减少10%,原来存放在这10%中的数据就会放到其他的数据块中,这样假如原来的session访问这10%的数据此时就会访问其他的数据块,从而解决热点块的问题;
解决热点快的流程
1、查找v$system_event查看系统等待事件,这里假设发现cbc latch争用严重
2、查找v$latch_children找到那些子latch争用比较厉害
select addr,name,gets,spin_gets,sleeps,hash from v$latch where name = 'cache buffers chains' order by sleeps
3、查找热点块属于那个对象
select t1.object_name,t2.file#,t2.dbablk
from dba_objects t1,(select file#,dbablk,obj,addr,hladdr from x$bh where hladdr = '?') t2 where t1.object_id = t2.obj;//?就是上一步查到的addr
4、定位导致热点快的SQL语句
select sql_text,executions from v$sqlarea where sql_text like '%?%';//?就是上一步查询到的object_name
我们根据SQL语句的执行次数就应该能够判断出是那些SQL语句导致的热点快
模拟发生热点块时的检查步骤:
SQL> select rowid,ename from emp;
ROWID ENAME
------------------ ----------
AAACYVAAEAAAAA6AAA SMITH
AAACYVAAEAAAAA6AAB ALLEN
AAACYVAAEAAAAA6AAC WARD
AAACYVAAEAAAAA6AAD JONES
AAACYVAAEAAAAA6AAE MARTIN
AAACYVAAEAAAAA6AAF BLAKE
AAACYVAAEAAAAA6AAG CLARK
AAACYVAAEAAAAA6AAH SCOTT
AAACYVAAEAAAAA6AAI KING //查询这行数据
AAACYVAAEAAAAA6AAJ TURNER
AAACYVAAEAAAAA6AAK ADAMS
AAACYVAAEAAAAA6AAL JAMES
AAACYVAAEAAAAA6AAM FORD
AAACYVAAEAAAAA6AAN MILLER
14 rows selected.
--find.sql
declare
name varchar2(100);
begin
loop
select ename into name from emp where rowid = 'AAACYVAAEAAAAA6AAI';
end loop;
end;
/
exit;
--run.sql
sqlplus -s scott/tiger@PROD @find.sql &
sqlplus -s scott/tiger@PROD @find.sql &
sqlplus -s scott/tiger@PROD @find.sql
chmod +x run.sql
select event,TOTAL_WAITS,TIME_WAITED from v$system_event where wait_class!='Idle' order by TIME_WAITED;
EVENT TOTAL_WAITS TIME_WAITED
---------------------------------------------------------------- ----------- -----------
latch: cache buffers chains 99 164
运行一段时间后
select event,TOTAL_WAITS,TIME_WAITED from v$system_event where wait_class!='Idle' order by TIME_WAITED;
EVENT TOTAL_WAITS TIME_WAITED
---------------------------------------------------------------- ----------- -----------
latch: cache buffers chains 477 835 //latch争用比较严重
--查看子latch,得到地址
select addr,latch#,child#,name,hash,gets,misses,sleeps,spin_gets from v$latch_children where name = 'cache buffers chains' order by sleeps;
ADDR LATCH# CHILD# NAME HASH GETS MISSES SLEEPSSPIN_GETS
-------- ---------- ---------- ------------------------------ ---------- ---------- ---------- ---------- ----------
31D476B0 122 870 cache buffers chains 3563305585 6230630 63862 908 63095
--查看热点快属于那个对象
select t1.object_name from dba_objects t1,x$bh t2 where t1.object_id = t2.obj and t2.hladdr='31D476B0';
OBJECT_NAME
-----------------------
I_OBJ1
C_OBJ#_INTCOL#
I_H_OBJ#_COL#
SMON_SCN_TO_TIME
WRI$_ADV_TASKS_IDX_01
I_WRI$_OPTSTAT_H_ST
I_WRI$_OPTSTAT_H_ST
EMP
BIGTAB
BIGTAB
BIGTAB
BIGTAB
WRH$_LATCH
WRH$_WAITSTAT
WRH$_TABLESPACE_STAT
从上面的结果发现emp和bigtab两张表比较可疑,假如现在我们不知道是那张表发生热点快
--查询导致热点快的SQL
select sql_text,executions from v$sqlarea where upper(sql_text) like '%EMP%' or upper(sql_text) like '%BIGTAB%' order by executions;
SQL_TEXT EXECUTIONS
------------------------------------------------------------ ----------
declare name varchar2(100); begin loop select ename 0
into name from emp where rowid = 'AAACYVAAEAAAAA6AAI'; end
loop; end;
select sql_text,executions from v$sqlarea where upper(sql_te 2
xt) like '%EMP%' or upper(sql_text) like '%BIGTAB%' order by
executions
select value$ from props$ where name='DEFAULT_TEMP_TABLESPAC 3
E'
select sql_id,TEMPSEG_SIZE,MAX_MEM_USED,WORK_AREA_SIZE from 3
v$sql_workarea_active
select dbms_metadata.get_ddl('TABLESPACE','TEMP') from dual 3
select sql_id,MAX_TEMPSEG_SIZE,LAST_TEMPSEG_SIZE from v$sql_ 6
workarea order by MAX_TEMPSEG_SIZE,LAST_TEMPSEG_SIZE
select obj#, dataobj#, part#, hiboundlen, hiboundval, ts#, f 184
ile#, block#, pctfree$, pctused$, initrans, maxtrans, flags,
analyzetime, samplesize, rowcnt, blkcnt, empcnt, avgspc, ch
ncnt, avgrln, length(bhiboundval), bhiboundval from tabpart$
where bo# = :1 order by part#
select t.ts#,t.file#,t.block#,nvl(t.bobj#,0),nvl(t.tab#,0),t 16165
.intcols,nvl(t.clucols,0),t.audit$,t.flags,t.pctfree$,t.pctu
sed$,t.initrans,t.maxtrans,t.rowcnt,t.blkcnt,t.empcnt,t.avgs
pc,t.chncnt,t.avgrln,t.analyzetime,t.samplesize,t.cols,t.pro
perty,nvl(t.degree,1),nvl(t.instances,1),t.avgspc_flb,t.flbc
nt,t.kernelcols,nvl(t.trigflag, 0),nvl(t.spare1,0),nvl(t.spa
re2,0),t.spare4,t.spare6,ts.cachedblk,ts.cachehit,ts.logical
read from tab$ t, tab_stats$ ts where t.obj#= :1 and t.obj#
= ts.obj# (+)
SELECT ENAME FROM EMP WHERE ROWID = 'AAACYVAAEAAAAA6AAI' 13374107
至此发现是由于SELECT ENAME FROM EMP WHERE ROWID = 'AAACYVAAEAAAAA6AAI'这个SQL语句导致的CBC latch的争用,下一步就可以按照上面的步骤进行处理了;
二、shared pool latch: 从shared pool中找空闲空间时需要获取,排他latch,如果这个latch比较高,大部分原因是因为SQL没共享,这时一般情况下hard parse也相应比较高,解决方法是查询v$sql或v$sqlarea,查询是那些SQL。
模拟shared pool latch的测试
SQL> select count(*) from bigtab;
COUNT(*)
----------
1000000
SQL> desc bigtab;
Name Null? Type
----------------------------------------- -------- ----------------------------
ID NUMBER
NAME VARCHAR2(32)
--查看开始时的shared pool情况
select event,TOTAL_WAITS,TIME_WAITED from v$system_event where wait_class!='Idle' order by TIME_WAITED;
EVENT TOTAL_WAITS TIME_WAITED
---------------------------------------------------------------- ----------- -----------
latch: shared pool 318 161
--运行程序
declare
name varchar2(50);
sqlstr varchar2(100);
begin
for i in 1..1000000000 loop
sqlstr := 'select name from bigtab where id = ' || i;
execute immediate sqlstr into name; //未绑定变量,模拟硬解析
end loop;
end;
/
--再次查看shared pool情况,发现有争用但是不明显
select event,TOTAL_WAITS,TIME_WAITED from v$system_event where wait_class!='Idle' order by TIME_WAITED;
EVENT TOTAL_WAITS TIME_WAITED
---------------------------------------------------------------- ----------- -----------
latch: shared pool 324 162
--当前数据库设置,sga自动管理,可能是由于sga比较大,SQL有比较简单所以shared pool的争用不明显
SQL> show parameter shared_pool_size
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
shared_pool_size big integer 0
SQL> show parameter sga_target
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
sga_target big integer 300M
--换一种分析思路,如果shared pool争用比较厉害,那么一般是由于没有使用绑定变量导致大量硬解析造成的,这时我们查询系统硬解析的情况
SQL> select name,value from v$sysstat where name like '%parse%hard%';
NAME VALUE
------------------------------ ----------
parse count (hard) 142897
SQL> /
NAME VALUE
------------------------------ ----------
parse count (hard) 143182 //比较明显
SQL> /
NAME VALUE
------------------------------ ----------
parse count (hard) 143673 //比较明显
--查询是由于那些SQL语句导致的硬解析,从结果中发现有大量的select name from bigtab where id = ?的语句没有使用绑定变量,至此问题已经找到;
select sql_text,executions from v$sqlarea order by sql_text,executions desc;
SQL_TEXT EXECUTIONS
------------------------------------------------------------ ----------
......
select name from bigtab where id = 93510 1
select name from bigtab where id = 93511 1
select name from bigtab where id = 93512 1
select name from bigtab where id = 93513 1
select name from bigtab where id = 93514 1
select name from bigtab where id = 93515 1
.......
三、library cache latch:产生共享SQL和找共享SQL时需要获取,排他latch;
如果shared pool latch不高但是library cache latch很高时表示有很多相同的SQL要执行(软解析),需要到shared_pool的链中查找,这时查询v$sqlarea视图看那个SQL的executions比较大;
模拟library cache latch测试
--把session_cached_cursors设置为0,确保发生软解析
SQL> alter system set session_cached_cursors=0 scope=spfile;
System altered.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area314572800 bytes
Fixed Size 1219184 bytes
Variable Size 88081808 bytes
Database Buffers 218103808 bytes
Redo Buffers 7168000 bytes
Database mounted.
Database opened.
SQL> show parameter session_cached_cursors;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
session_cached_cursors integer 0
--表进行建立索引
SQL> select index_name from user_indexes where table_name = 'BIGTAB';
INDEX_NAME
------------------------------
IDX_BIGTAB_ID
--find.sql
declare
str varchar2(50);
n integer;
begin
for i in 1..10 loop
n := floor(dbms_random.value*1000000);
select name into str from bigtab where id = n;
end loop;
end;
/
exit;
--run.sql
sqlplus -s scott/tiger@prod @find.sql &
sqlplus -s scott/tiger@prod @find.sql &
sqlplus -s scott/tiger@prod @find.sql
chmod +x run.sql
--开始执行程序之前的library cache情况
SQL> select event,TOTAL_WAITS,TIME_WAITED from v$system_event where event like '%library cache%';
EVENT TOTAL_WAITS TIME_WAITED
---------------------------------------------------------------- ----------- -----------
library cache load lock 2 1
--执行程序
./run.sql
--一段时间后再次查询library cache争用情况
SQL> select event,TOTAL_WAITS,TIME_WAITED from v$system_event where event like '%library cache%';
EVENT TOTAL_WAITS TIME_WAITED
---------------------------------------------------------------- ----------- -----------
latch: library cache 1 3
library cache pin 3 11
library cache load lock 2 1
/
EVENT TOTAL_WAITS TIME_WAITED
---------------------------------------------------------------- ----------- -----------
latch: library cache 23 52//争用出现
latch: library cache pin 5 12
library cache pin 3 11
library cache load lock 2 1
--查询子latch的地址和详细的争用情况
SQL> select addr,latch#,child#,name,hash,gets,misses,sleeps,spin_gets from v$latch_children where name = 'library cache' order by sleeps;
ADDR LATCH# CHILD# NAME HASH GETS MISSES SLEEPSSPIN_GETS
-------- ---------- ---------- ------------------------------ ---------- ---------- ---------- ---------- ----------
313C3B3C 214 1 library cache 3055961779 24242 18 0 18
313C3A74 214 3 library cache 3055961779 41359 31 2 29
313C3AD8 214 2 library cache 3055961779 2707431 15096 79 15020 //争用显著
--查询执行次数多的SQL语句
select * from (select sql_text,executions from v$sqlarea order by executions desc) where rownum <= 10;
SQL_TEXT EXECUTIONS
------------------------------------------------------------ ----------
SELECT NAME FROM BIGTAB WHERE ID = :B1 2457736 //发现问题,实有与这个SQL引起的library cache latch争用
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, 1960
timestamp#, sample_size, minimum, maximum, distcnt, lowval,
hival, density, col#, spare1, spare2, avgcln from hist_head
$ where obj#=:1 and intcol#=:2
select /*+ rule */ bucket, endpoint, col#, epvalue from hist 744
grm$ where obj#=:1 and intcol#=:2 and row#=:3 order by bucke
t
select timestamp from fixed_obj$ where obj# = :1 597
select type#,blocks,extents,minexts,maxexts,extsize,extpct,u 555
ser#,iniexts,NVL(lists,65535),NVL(groups,65535),cachehint,hw
mincr, NVL(spare1,0),NVL(scanhint,0) from seg$ where ts#=:1
and file#=:2 and block#=:3
select intcol#,nvl(pos#,0),col#,nvl(spare1,0) from ccol$ whe 492
re con#=:1
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname, 324
o.subname,o.dataobj#,o.flags from obj$ o where o.obj#=:1
select obj#,type#,ctime,mtime,stime,status,dataobj#,flags,oi 320
d$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and
namespace=:3 and remoteowner is null and linkname is null a
nd subname is null
select pos#,intcol#,col#,spare1,bo#,spare2 from icol$ where 243
obj#=:1
select count(*) from sys.job$ where (next_date > sysdate) an 204
d (next_date < (sysdate+5/86400))
:):):):)不错,顶一个 上面是对三种latch的解决方案。
这三种latch是最常见的latch。
希望大家认真的阅读和理解,当然前提是大家对latch的工作机制以及两种pool的内存结构有比较清楚的理解。 不错,顶一个学习一下:)
页:
[1]