Monday, February 28, 2011

How can I confirm there was an I/O problem (2)?

In a previous post, I mentioned that I had tried to mine AWR data to confirm that an I/O issue occurred around 2:18 PM on Feb 18; however, I ran into some issues and instead found that 'sar' was helpful. Today I worked out the query against the AWR tables. The query is as follows:

rem  awr_io_file_smpl.sql  
rem   -- Adapted from the scripts written by Karl Arao: http://karlarao.wordpress.com
rem
rem  check if there are I/O problems from AWR. 
rem  normally atpr (average time per read) should be less than 20 ms
rem  if atpr > 100 ms, it clearly indicates that we have a problem
rem   
rem  note: the statistics are based on  (tm-interval, tm)
rem    


col atpr format 999,999
col atpw format 999,999

set echo off
-- set markup HTML ON
-- SPOOL ON ENTMAP ON PREFORMAT OFF
-- spool awr_io_file_smpl.html
spool awr_io_file_smpl.log
select * from (
SELECT 
--      s0.snap_id snap_id,
       to_char(s1.END_INTERVAL_TIME, 'YYMMDD HH24:MI') tm,
--      s0.instance_number inst,
       ROUND(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440 + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60 + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2) dur,
--      e.tsname,
      e.file# ,
--      SUBSTR(e.filename, 1, 52) filename ,
--      e.readtim         - NVL(b.readtim,0) readtim ,
      e.phyrds          - NVL(b.phyrds,0) reads ,
      DECODE ((e.phyrds - NVL(b.phyrds, 0)), 0, to_number(NULL), ((e.readtim - NVL(b.readtim,0)) 
                   / (e.phyrds - NVL(b.phyrds,0)))*10) atpr ,
--      (e.phyrds         - NVL(b.phyrds,0)) / ((ROUND(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440 + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60 + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2))*60) rps  ,
--      DECODE ((e.phyrds  - NVL(b.phyrds, 0)), 0, to_number(NULL), (e.phyblkrd - NVL(b.phyblkrd,0)) / (e.phyrds - NVL(b.phyrds,0)) ) bpr ,
--      e.writetim         - NVL(b.writetim,0) writetim ,
      e.phywrts          - NVL(b.phywrts,0) writes ,
      DECODE ((e.phywrts - NVL(b.phywrts, 0)), 0, to_number(NULL), ((e.writetim - NVL(b.writetim,0)) / (e.phywrts - NVL(b.phywrts,0)))*10) atpw ,
--      (e.phywrts         - NVL(b.phywrts,0)) / ((ROUND(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440 + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60 + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2))*60) wps ,
--      DECODE ((e.phywrts    - NVL(b.phywrts, 0)), 0, to_number(NULL), (e.phyblkwrt - NVL(b.phyblkwrt,0)) / (e.phywrts - NVL(b.phywrts,0)) ) bpw ,
--      e.wait_count          - NVL(b.wait_count,0) waits ,
--      DECODE ((e.wait_count - NVL(b.wait_count, 0)), 0, 0, ((e.time - NVL(b.time,0)) / (e.wait_count - NVL(b.wait_count,0)))*10) atpwt,
      (e.phyrds             - NVL(b.phyrds,0)) + (e.phywrts - NVL(b.phywrts,0)) ios,
      ((e.phyrds            - NVL(b.phyrds,0)) + (e.phywrts - NVL(b.phywrts,0))) / ((ROUND(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440 + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60 + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2))*60) iops
    FROM dba_hist_snapshot s0,
      dba_hist_snapshot s1,
      dba_hist_filestatxs e,
      dba_hist_filestatxs b
    WHERE s0.dbid =(select dbid from v$database)
    AND s1.dbid            = s0.dbid
    AND b.dbid(+)          = s0.dbid -- begin dbid
    AND e.dbid             = s0.dbid -- end dbid
    AND b.dbid             = e.dbid  -- remove oj
    AND s0.instance_number =dbms_utility.CURRENT_INSTANCE
    AND s1.instance_number = s0.instance_number
    AND b.instance_number(+) = s0.instance_number 
    AND e.instance_number    = s0.instance_number 
    AND b.instance_number    = e.instance_number 
    AND s1.snap_id           = s0.snap_id + 1
    AND b.snap_id(+)         = s0.snap_id      
    AND e.snap_id            = s0.snap_id + 1 
    AND b.tsname             = e.tsname      
    AND b.file#              = e.file#           
    AND b.creation_change#   = e.creation_change# 
    AND ((e.phyrds - NVL(b.phyrds,0)) + (e.phywrts - NVL(b.phywrts,0))) > 0
    AND s0.END_INTERVAL_TIME >= to_date('2011-02-18 13:00:00', 'YYYY-MM-DD HH24:MI:SS') 
    AND s0.END_INTERVAL_TIME <= to_date('2011-02-18 16:00:00', 'YYYY-MM-DD HH24:MI:SS') 
    order by s1.END_INTERVAL_TIME, e.tsname
--    and b.tsname='xxxx'
)
where atpr > 100 
;
spool off
set markup html off spool off
set termout on



The output of the above query is as follows:

TM                  DUR      FILE#      READS     ATPR     WRITES     ATPW        IOS       IOPS
------------ ---------- ---------- ---------- -------- ---------- -------- ---------- ----------
110218 14:00      15.04        268          8      124          1        0          9 .009973404
110218 14:00      15.04        283          7      141          1        0          8 .008865248
110218 14:00      15.04        281          8      118          1        0          9 .009973404
110218 14:00      15.04        296          8      116          1        0          9 .009973404
110218 14:00      15.04        273          8      123          1        0          9 .009973404
110218 14:00      15.04        293          8      123          1        0          9 .009973404
110218 14:00      15.04        288          8      125          1        0          9 .009973404
110218 14:00      15.04        276          8      120          1        0          9 .009973404
110218 14:00      15.04        286          7      133          1        0          8 .008865248
110218 14:00      15.04        278          8      123          1        0          9 .009973404
110218 14:00      15.04        308          8      125          1        0          9 .009973404
110218 14:00      15.04        271          8      116          1        0          9 .009973404
110218 14:00      15.04        248          7      139          1       10          8 .008865248
110218 14:30       14.6        502       3472      205        620      816       4092 4.67123288
110218 14:30       14.6        519        351    1,695         72    1,749        423 .482876712
110218 14:30       14.6        515        664      347         90        6        754 .860730594
110218 14:30       14.6        504       1174      214        114    1,107       1288 1.47031963
110218 14:30       14.6        448       8463      139        550      461       9013 10.2888128
110218 14:30       14.6        412       5772      131        361      353       6133 7.00114155
110218 14:30       14.6        462       5889      215        437      291       6326 7.22146119
110218 14:30       14.6        429      10276      313        356      360      10632 12.1369863
110218 14:30       14.6        427       4255      189        183      691       4438 5.06621005
110218 14:30       14.6        473       7355      298        865      149       8220 9.38356164
110218 14:30       14.6        513        867      148        218      587       1085 1.23858447
110218 14:30       14.6        479       5903      196        986      144       6889 7.86415525
110218 14:30       14.6        475       7702      263       1163      221       8865  10.119863
110218 14:30       14.6        471       9543      155        141      906       9684 11.0547945
110218 14:30       14.6        523       2187      378       1014      253       3201 3.65410959
110218 14:30       14.6        482       6674      150       1536      252       8210 9.37214612
110218 14:30       14.6        437      10793      271       1920      401      12713 14.5125571
110218 14:30       14.6        395       8062      151        555      233       8617 9.83675799
110218 14:30       14.6        464      19537      431       4212      144      23749 27.1107306
110218 14:30       14.6        491      14568      440       4247      220      18815 21.4783105
110218 14:30       14.6        444      20026      385       2541      212      22567 25.7614155
110218 14:30       14.6        481      15496      444       3763      287      19259 21.9851598
110218 14:30       14.6        421      14930      334       1642       85      16572 18.9178082
110218 14:30       14.6        514       5464      597       2542      360       8006 9.13926941
110218 14:30       14.6        435       3804      165        167      751       3971 4.53310502
110218 14:30       14.6        512       2065      237        474      276       2539 2.89840183
110218 14:30       14.6        498       4540      186        508      993       5048 5.76255708
110218 14:30       14.6        478       8505      159       1161      561       9666 11.0342466
110218 14:30       14.6        470       8134      182        454      283       8588 9.80365297
110218 14:30       14.6        460       4841      168        391      325       5232 5.97260274
110218 14:30       14.6        474       6486      130        779      166       7265   8.293379
110218 14:30       14.6        411       3368      291        186      678       3554 4.05707763
110218 14:30       14.6        521       2447      183        543      235       2990 3.41324201
110218 14:30       14.6        418       3837      124        128      991       3965 4.52625571
110218 14:30       14.6        100        160    1,370        670      213        830 .947488584
110218 14:30       14.6        139        408      273       1061      366       1469 1.67694064
110218 15:00      15.03         62          4      103          1       10          5 .005544467
110218 15:30      15.04         69          6      107          2        0          8 .008865248
110218 15:30      15.04         62          6      102          2        5          8 .008865248
110218 15:30      15.04         63          6      102          2        5          8 .008865248
110218 15:30      15.04         64          6      107          2        5          8 .008865248
110218 15:30      15.04         78          6      113          2        5          8 .008865248
110218 15:30      15.04         77          6      107          2       10          8 .008865248
110218 15:30      15.04         66          6      107          2        5          8 .008865248
110218 16:00      15.04        125          6      282          1      180          7 .007757092
110218 16:00      15.04        110          6      282          1       70          7 .007757092

59 rows selected.

From the above output, it can be seen that from about 14:15 to 14:30 many data files had ATPR > 100 ms. This clearly indicates there was an I/O problem during that period. (Note that READTIM in DBA_HIST_FILESTATXS is recorded in hundredths of a second, so the *10 in the query converts the per-read time to milliseconds.)
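
To see which tablespaces and file names those FILE# values belong to, a lookup along the following lines can be used (just a sketch; the file numbers listed are examples taken from the output above):

-- map suspect FILE# values to tablespace and file name
select file_id, tablespace_name, file_name
  from dba_data_files
 where file_id in (464, 481, 491, 444)
 order by tablespace_name, file_id;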

A simple concept map for some basic storage terms



Ref:
[1] NetApp University “Data ONTAP 7.3 Fundamentals” Version 5.0

[2] “Using Oracle 10g™ Database Automatic Storage Management with Network Appliance™ Storage” (2004 white paper, PDF)

Tuesday, February 22, 2011

Tip - List all of the tables in the Shareplex replication

Some time ago a co-worker showed me how to do this with a query. Today I needed to do it again, but I had forgotten the query and had to get the table list from the configuration file instead. I am blogging it here as a reference for the future:

--- for port 2200 ---
select owner, object_name, objid
from splex.shareplex_objmap s, dba_objects o
where s.objid = o.object_id 
order by owner, object_name;

--- for port 2201 ----
select owner, object_name, objid
from splex_cpf.shareplex_objmap s, dba_objects o
where s.objid = o.object_id 
order by owner, object_name;

How can I confirm there was an I/O problem?

Last Friday afternoon, our production support team (the team that manages application servers in my working environment) reported that around 2:18 PM there were application queuing and job time-out issues. Naturally, as a knee-jerk reaction, the problem was described as "the database is slow". Later, some "SCSI trans" errors were found, which indicated there was some I/O problem. So my question was: as a DBA, how can I confirm there was an I/O problem?

At first, I tried to look into the AWR data. I modified the awr_io_file.sql script by Karl Arao, hoping to report the average time per read for each data file during the problem period. However, the script ran for more than 20 minutes without producing results. I killed the job and decided to research this AWR-mining approach later. By the way, one drawback I am aware of is that even if I can get this data, it is averaged over 15 minutes (our AWR snapshot interval), and averaging could hide the real problem.
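
As a side note, the AWR snapshot interval and retention can be confirmed directly with a trivial query (shown here only as a sketch):

select snap_interval, retention from dba_hist_wr_control;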

After googling for a while, I found that the 'sar' utility on my Solaris platform can shed some light on I/O issues in the recent past.

If we do 'man sar', we can find the following description of the sar -f option:

sar [-aAbcdgkmpqruvwy] [-e time] [-f filename]  [-i sec]  [-s time]

sar  extracts  data  from  a  previously  recorded filename,
either the one specified by the -f option  or,  by  default,
the    standard    system    activity    daily   data   file
/var/adm/sa/sadd for the current day dd.  The  starting  and
ending  times  of the report can be bounded using the -e and
-s arguments with time specified in the  form  hh[:mm[:ss]].
The  -i  option selects records at sec second intervals.
Otherwise, all intervals found in the data file are reported.


This is the first time I have been aware that there are 'standard system activity daily data files'
under /var/adm/sa. As of now, I can see the following files:


myservnm:/var/adm/sa [orclsid] $ ls -l 
total 2177984
-rw-r--r--   1 sys      sys      241097472 Feb 18 23:55 sa18
-rw-r--r--   1 sys      sys      241097472 Feb 19 23:55 sa19
-rw-r--r--   1 sys      sys      241097472 Feb 20 23:55 sa20
-rw-r--r--   1 sys      sys      241097472 Feb 21 23:55 sa21
-rw-r--r--   1 sys      sys      78691536 Feb 22 07:45 sa22

So it looks like we have 5 days of data.

What I did last Friday afternoon was:

sar -A  -e 14:28  -s 14:10   > ~/sar1410_1428.txt

If you don't specify -f, it extracts from the current day's file, i.e. sa18 last Friday.
In the output file, we can see various information recorded at 5-minute intervals.

For example, CPU utilization:

SunOS myservnm 5.10 Generic_142900-05 sun4u    02/18/2011
 
14:10:02    %usr    %sys    %wio   %idle
14:15:03      17      14       0      69
14:20:04      10      11       0      78
14:25:05      20      18       0      62
 
Average       16      15       0      70


In the device activity section, I can see that at the 14:20 snapshot some lines have very high avserv times, while normally I see avserv < 30 ms on my system. For example:
14:10:02   device        %busy   avque   r+w/s  blks/s  avwait  avserv
 
14:15:03   sd7               1     0.0       1      12     0.0     9.0
           sd7,e             1     0.0       1      12     0.0     9.2
           sd8              32     0.3      46     426     0.0     7.4
....
14:20:04   sd7               1     0.0       1      16     0.0     9.1
           sd7,e             1     0.0       1      16     0.0     9.3
...          
           sd1133,e          1     0.0       1      26     0.0    12.9
           sd1134           44     0.8       3      73    78.6   238.1
           sd1134,e         44     0.8       3      73    79.1   239.6
           sd1135           42     0.4       1      20     0.0   485.3
           sd1135,e         42     0.4       1      20     0.0   494.7
           sd1136           42     0.4       0      25     0.0  1000.9
           sd1136,e         42     0.4       0      25     0.0  1042.2
           sd1142            1     0.0       1      40     0.0    13.0
...
           sd1204,e          4     0.0       3      81     0.0    16.0
           sd1205           43     0.4       1      22     0.0   440.3
           sd1205,e         43     0.4       1      22     0.0   448.0
           sd1206           43     0.8       2      60    85.3   254.3
           sd1206,e         43     0.8       2      60    85.9   256.1
           sd1207            1     0.0       1      25     0.0    12.7



This webpage describes the meaning of some sar options and report fields.
The article "Identifying disk I/O-bound systems" has information about interpreting sar reports.

I am happy to have another tool in my tool belt.

Tuesday, February 15, 2011

Tuning an inefficient SQL with semi-join

I found a SQL statement that looks like the following in a batch job:

SELECT    
 distinct
 C.INT_PSEFS_ID
,C.ACCOUNT_ID
,C.PSEFS_ID
,C.IS_PQ
,C.psefs_save_date
FROM     
   MYSCH.PSEFS_MSTR A
  ,MYSCH.PSEFS_DTLS B
  ,MYSCH.PSEFS_MSTR C
WHERE    
    A.INT_PSEFS_ID = B.INT_PSEFS_ID
  AND C.PSEFS_ID LIKE SUBSTR (A.PSEFS_ID, 1, 13)|| '%'
  AND A.IS_VALID = 'N' 
  AND B.PSEFS_STATUS_ID IN (4, 6)
  AND A.PSEFS_COMP_DATE  <= SYSDATE - 45
;
The execution plan is as follows:
---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    66T|  4861T|       |  3064G  (2)|999:59:59 |
|   1 |  HASH UNIQUE                 |                  |    66T|  4861T|    11P|  3064G  (2)|999:59:59 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PSEFS_MSTR       |  3287K|   122M|       |   149K  (1)| 00:29:49 |
|   3 |    NESTED LOOPS              |                  |    66T|  4861T|       |  3027G  (1)|999:59:59 |
|*  4 |     HASH JOIN                |                  |    20M|   794M|   550M|   839K  (3)| 02:47:49 |
|*  5 |      TABLE ACCESS FULL       | PSEFS_DTLS       |    26M|   250M|       |   612K  (2)| 02:02:29 |
|*  6 |      TABLE ACCESS FULL       | PSEFS_MSTR       |    20M|   600M|       |   156K  (6)| 00:31:17 |
|*  7 |     INDEX RANGE SCAN         | PSEFS_MSTR_IX5   |   591K|       |       |  5154   (1)| 00:01:02 |
---------------------------------------------------------------------------------------------------------


INT_PSEFS_ID is the primary key of A and a foreign key of B, so joining A to B on INT_PSEFS_ID generates a result set (call it R1) containing duplicated records of A. We then select rows of C (the same table as A) by joining it to R1 on an unusual join condition that uses the LIKE operator. To see the problem by way of example, suppose we have a row in C with PSEFS_ID = 'ABCDEFGHIJKLM_2'. In R1, we could have several rows with PSEFS_ID in ('ABCDEFGHIJKLM_3', 'ABCDEFGHIJKLM_4', 'ABCDEFGHIJKLM_4', 'ABCDEFGHIJKLM'). Because of the LIKE, the row in C matches all 4 of these rows in R1 (they share the same first 13 characters), generating 4 duplicate rows in the final result set that the DISTINCT then has to remove. I rewrote the SQL to use subqueries, enabling the more efficient semi-joins instead of normal joins:
select 
C.INT_PSEFS_ID
, C.ACCOUNT_ID
, C.PSEFS_ID, C.IS_PQ
, C.psefs_save_date
from MYSCH.PSEFS_MSTR C
where
   SUBSTR(C.PSEFS_ID,1,13) in
   (
   select  SUBSTR(A.PSEFS_ID,1,13)
   from MYSCH.PSEFS_MSTR  A
    where  A.IS_VALID = 'N'
     and A.PSEFS_COMP_DATE  <= SYSDATE - 45 
     and exists ( select 1 from MYSCH.PSEFS_DTLS B
     where A.int_psefs_id = B.int_psefs_id
       and B.PSEFS_STATUS_ID IN (4, 6) )
   ) ;

Now the execution plan looks like:
------------------------------------------------------------------------------------------------
| Id  | Operation              | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |               |    26M|  1182M|       |  1174K  (3)| 03:54:49 |
|*  1 |  HASH JOIN RIGHT SEMI  |               |    26M|  1182M|   387M|  1174K  (3)| 03:54:49 |
|   2 |   VIEW                 | VW_NSO_1      |    20M|   155M|       |   839K  (3)| 02:47:49 |
|*  3 |    HASH JOIN RIGHT SEMI|               |    20M|   794M|   550M|   839K  (3)| 02:47:49 |
|*  4 |     TABLE ACCESS FULL  | PSEFS_DTLS    |    26M|   250M|       |   612K  (2)| 02:02:29 |
|*  5 |     TABLE ACCESS FULL  | PSEFS_MSTR    |    20M|   600M|       |   156K  (6)| 00:31:17 |
|   6 |   TABLE ACCESS FULL    | PSEFS_MSTR    |    65M|  2445M|       |   153K  (4)| 00:30:45 |
------------------------------------------------------------------------------------------------
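
The LIKE-based join and the SUBSTR-based IN subquery are not guaranteed to be equivalent in every edge case (for example, PSEFS_ID values shorter than 13 characters, or prefixes containing LIKE wildcard characters such as '_'), so a cheap sanity check is to compare the two result sets with MINUS in both directions. The following is only a sketch of one direction, assembled from the two queries above:

-- rows returned by the original query but not by the rewrite; should return no rows
-- (run the reverse MINUS as well to check the other direction)
SELECT C.INT_PSEFS_ID, C.ACCOUNT_ID, C.PSEFS_ID, C.IS_PQ, C.psefs_save_date
FROM   MYSCH.PSEFS_MSTR A, MYSCH.PSEFS_DTLS B, MYSCH.PSEFS_MSTR C
WHERE  A.INT_PSEFS_ID = B.INT_PSEFS_ID
AND    C.PSEFS_ID LIKE SUBSTR(A.PSEFS_ID, 1, 13) || '%'
AND    A.IS_VALID = 'N'
AND    B.PSEFS_STATUS_ID IN (4, 6)
AND    A.PSEFS_COMP_DATE <= SYSDATE - 45
MINUS
SELECT C.INT_PSEFS_ID, C.ACCOUNT_ID, C.PSEFS_ID, C.IS_PQ, C.psefs_save_date
FROM   MYSCH.PSEFS_MSTR C
WHERE  SUBSTR(C.PSEFS_ID, 1, 13) IN
       ( SELECT SUBSTR(A.PSEFS_ID, 1, 13)
         FROM   MYSCH.PSEFS_MSTR A
         WHERE  A.IS_VALID = 'N'
         AND    A.PSEFS_COMP_DATE <= SYSDATE - 45
         AND    EXISTS ( SELECT 1 FROM MYSCH.PSEFS_DTLS B
                         WHERE  A.INT_PSEFS_ID = B.INT_PSEFS_ID
                         AND    B.PSEFS_STATUS_ID IN (4, 6) ) );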


Fig 1. Good query visual representation
           



Fig 2. Bad query visual representation


Note: though this SQL is actually rather simple, the visual approach advocated by Jonathan Lewis in this article really helped me do the analysis.

Monday, February 14, 2011

Open a HTML file from Cygwin command line

Cygwin gives me a UNIX-like environment inside my Windows XP environment. As a Cygwin user, I sometimes want to open an HTML file from the Cygwin command line. To achieve this, I first created the following file and put it under ~/bin:


---- start of ie.bat -----
@start "" /b "C:\Program Files\Internet Explorer\iexplore.exe" %*
---- end of ie.bat  ------


Secondly, in Cygwin, if we type pwd, we get the current path in UNIX style, i.e. with forward slashes, for example:
$ pwd
/cygdrive/c/Denis_Files/VZ2009

I need to convert it to a Windows path name, so I created a shell script as follows:

$ cat ~/bin/pwdw
pwd | sed s#/cygdrive/c#c:# |sed  -e 's#\/#\\#g'


Now if I type 'pwdw' in the Cygwin command window, I get:

$ pwdw
c:\Denis_Files\VZ2009

To open an HTML file, such as an AWR report, all I need to do is:

ie.bat 'c:\Denis_Files\VZ2009\awrrpt_20101003_1300_1330.html'


Just a small tip that makes my life easier :-).

Wednesday, February 09, 2011

Quick notes about troubleshooting a blocker issue involving an insert

I was called to help troubleshoot a production issue: an insert statement was blocking many others. However, when I checked the database at 7:30 PM, the symptoms had gone.

In retrospect, the problem was most likely due to a configuration issue in the context of dictionary-managed tablespaces (DMT). This database was upgraded from 8i to 10g a year ago, and a few tablespaces are still dictionary-managed.


Here is the AWR "Top 5 Timed Events" during the problem period:

Top 5 Timed Events (18:30 - 19:00)

Event                        Waits   Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
enq: TX - contention        44,494   130,783         2,939               59.7  Other
enq: ST - contention        27,122    79,719         2,939               36.4  Configuration
CPU time                               3,700                              1.7
read by other session      839,265     2,843             3                1.3  User I/O
db file scattered read     405,142     2,121             5                1.0  User I/O
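
The same two enqueue events can also be pulled straight out of AWR for a pair of snapshots. The sketch below assumes the snap_ids bounding the 18:30 - 19:00 window are known; the values 1234 and 1235 are placeholders:

select e.event_name,
       e.total_waits - b.total_waits                            waits,
       round((e.time_waited_micro - b.time_waited_micro)/1e6)   time_s
  from dba_hist_system_event b,
       dba_hist_system_event e
 where e.dbid            = b.dbid
   and e.instance_number = b.instance_number
   and e.event_name      = b.event_name
   and b.snap_id         = 1234   -- snapshot ending 18:30 (placeholder)
   and e.snap_id         = 1235   -- snapshot ending 19:00 (placeholder)
   and e.event_name in ('enq: TX - contention', 'enq: ST - contention');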


From http://www.saptechies.com/faq-oracle-enqueues/ we can read some information about Oracle enqueues.


System enqueues:

ST (space transaction enqueue): This enqueue is held during extent allocations and releases in dictionary-managed tablespaces.

TYPE = ST

Very large numbers of extents are allocated or deallocated.


You can solve the problem permanently by using LMTS instead of DMTS (refer to Notes 214995 and 659946). If you use DMTS, you must ensure that the extents of temporary segments, tables and indexes are sufficiently big so that you can avoid large quantities of extents being allocated and deallocated.
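
To find out which tablespaces are still dictionary-managed, a simple query like the following can be run (a sketch):

select tablespace_name, extent_management, allocation_type, contents
  from dba_tablespaces
 where extent_management = 'DICTIONARY'
 order by tablespace_name;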




I was able to find the blocker in the DBA_HIST_ACTIVE_SESS_HISTORY view. During 18:30 - 19:00, session 651 was blocking many other sessions, which were waiting on either enq: TX or enq: ST. For example:

SAMPLE_TIME SESSION_ID SQL_ID EVENT BLOCKING_SESSION

---------------------------------------- ---------- ------------- ------------------------------ ----------------
08-FEB-11 06.59.56.243 PM 242 6wyrqa50whrjg enq: TX - contention 651
08-FEB-11 06.59.56.243 PM 345 6wyrqa50whrjg enq: TX - contention 651
08-FEB-11 06.59.56.243 PM 414 6wyrqa50whrjg enq: TX - contention 651
08-FEB-11 06.59.56.243 PM 469 6wyrqa50whrjg enq: TX - contention 651
08-FEB-11 06.59.56.243 PM 591 6wyrqa50whrjg enq: TX - contention 651
08-FEB-11 06.59.56.243 PM 593 6wyrqa50whrjg enq: TX - contention 651
08-FEB-11 06.59.56.243 PM 594 6wyrqa50whrjg enq: TX - contention 651
08-FEB-11 06.59.56.243 PM 595 6wyrqa50whrjg enq: TX - contention 651
08-FEB-11 06.59.56.243 PM 596 6wyrqa50whrjg enq: TX - contention 651

...


SAMPLE_TIME SESSION_ID SQL_ID EVENT BLOCKING_SESSION
---------------------------------------- ---------- ------------- ------------------------------ ----------------
08-FEB-11 06.59.56.243 PM 1976 dpc1k1djg5gh0 enq: ST - contention 651
08-FEB-11 06.59.56.243 PM 49 dpc1k1djg5gh0 enq: ST - contention 651
08-FEB-11 06.59.56.243 PM 1841 dpc1k1djg5gh0 enq: ST - contention 651
08-FEB-11 06.59.56.243 PM 1763 dpc1k1djg5gh0 enq: ST - contention 651
08-FEB-11 06.59.56.243 PM 1693 dpc1k1djg5gh0 enq: ST - contention 651
08-FEB-11 06.59.56.243 PM 1660 dpc1k1djg5gh0 enq: ST - contention 651

....
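
A query along the following lines can pull these rows (a sketch, not necessarily the exact query I ran; the time window is chosen to match the problem period):

select sample_time, session_id, sql_id, event, blocking_session
  from dba_hist_active_sess_history
 where sample_time between timestamp '2011-02-08 18:30:00'
                       and timestamp '2011-02-08 19:00:00'
   and blocking_session is not null
 order by sample_time, session_id;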

Session 651, which was running the same insert, lasted about 1 hour, from about 6:17 PM to 7:15 PM:

SAMPLE_TIME SESSION_ID SQL_ID EVENT BLOCKING_SESSION
---------------------------------------- ---------- ------------- ------------------------------ ----------------
08-FEB-11 06.17.23.778 PM 651 6wyrqa50whrjg
08-FEB-11 06.17.33.998 PM 651 6wyrqa50whrjg
08-FEB-11 06.17.44.198 PM 651 6wyrqa50whrjg
08-FEB-11 06.17.54.508 PM 651 6wyrqa50whrjg
...

08-FEB-11 07.14.45.919 PM 651 6wyrqa50whrjg
08-FEB-11 07.14.56.049 PM 651 6wyrqa50whrjg
08-FEB-11 07.15.06.159 PM 651 6wyrqa50whrjg
-- end --



After this SQL (6wyrqa50whrjg is a simple insert into a table) completed, things became normal. I am still not sure why this insert took so long; I was unable to determine what wait event it was experiencing. Apparently, Oracle was allocating a new extent to the table segment, but why did that take about an hour? The table referenced in the SQL has two extents now: the first has 64,000 blocks and the second has 16,000 blocks.
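
The extent layout mentioned above can be checked with a query like this (a sketch; the owner and table name are placeholders):

select extent_id, tablespace_name, blocks, bytes
  from dba_extents
 where owner = 'MYSCH'            -- placeholder owner
   and segment_name = 'MY_TABLE'  -- placeholder table name
 order by extent_id;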

So the scenario appeared to be: a session wanted to insert a row into a table that resides in a DMT tablespace, but the table segment had run out of space, so Oracle first had to allocate a new extent to the segment. The session waited for this allocation to complete; it blocked all other sessions doing the same insert into the same table (which waited on enq: TX), and it also blocked sessions waiting on enq: ST that were doing DML against different tables residing in the same tablespace.

Anyway, we really need to migrate from DMT to LMT as soon as possible.