Friday, March 13, 2015

Just another example - Writing efficient SQL with analytic function

I have a monitoring job set up to alert me when the buffer gets per execution of a SQL above certain threshold. Today I received one as below:
=================================================
start :  13-MAR-15 02.00.23.885 PM
end   :  13-MAR-15 03.00.48.168 PM
snapid from     107851  to     107853
=================================================

********************************************************
list sql with buffer gets per execution > 100000
********************************************************


!!!! ##########  Expensive SQL found ####### !!!!                               
instance   : 1                                                                  
sql_id     : drdbm833ack3c                                                      
Buffer get : 260358                                                             
Execs      : 1                                                                  
BG/exec    : 260358                                                             
Gets/row   : 700                                                                
SQL TEXT   :                                                                    
SELECT p.rec_id, p.CREATE_DATE,                                                 
p.ordered_zzzyyy_type,p.qualified_zzzyyy_type,p.req_type,p.c_transid,p.status,p.
ERR_CODE,p.ERR_DESC,p.SVC_STATE,p.IS_DOWNGRADED,p.ETF_WAIVE_FLAG  FROM          
BL_XYZ_CHNG_PKG p WHERE p.req_type='R'  AND p.status || ''='IN'  AND            
p.jrs_indicator IN ('1','2','3')  and p.c_transid IN (SELECT MAX(C_TRANSID) FROM
BL_XYZ_CHNG_PKG  GROUP BY rec_id)                                               
 
...
After checking this sql, I've realized this may be a classical example where using analytic function can save resource.

 Original one -- Full table Scan on the same table twice 259k gets
SQL> SELECT p.rec_id,
  2         p.CREATE_DATE,
  3         p.ordered_zzzyyy_type,
  4         p.qualified_zzzyyy_type,
  5         p.req_type ,
  6         p.c_transid,p.status,
  7         p.ERR_CODE,p.ERR_DESC,p.SVC_STATE,p.IS_DOWNGRADED,p.ETF_WAIVE_FLAG
  8    FROM
  9         xyzu.BL_XYZ_CHNG_PKG p
 10  WHERE p.req_type='R'
 11     AND p.status || ''='IN'
 12     AND p.jrs_indicator IN ('1','2','3')
 13     and p.c_transid IN
 14          (SELECT MAX(C_TRANSID) FROM BL_XYZ_CHNG_PKG  GROUP BY rec_id)
 15  ;

242 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3808252951

------------------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |  1613 |   127K|       | 83112   (3)| 00:16:38 |
|*  1 |  HASH JOIN           |                 |  1613 |   127K|       | 83112   (3)| 00:16:38 |
|*  2 |   TABLE ACCESS FULL  | BL_XYZ_CHNG_PKG |  1613 |   107K|       | 25463   (3)| 00:05:06 |
|   3 |   VIEW               | VW_NSO_1        |  6443K|    79M|       | 57578   (2)| 00:11:31 |
|   4 |    HASH GROUP BY     |                 |  6443K|    86M|   148M| 57578   (2)| 00:11:31 |
|   5 |     TABLE ACCESS FULL| BL_XYZ_CHNG_PKG |  6459K|    86M|       | 25154   (2)| 00:05:02 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("P"."C_TRANSID"="$nso_col_1")
   2 - filter("P"."REQ_TYPE"='R' AND "P"."STATUS"||''='IN' AND ("P"."JRS_INDICATOR"='1'
              OR "P"."JRS_INDICATOR"='2' OR "P"."JRS_INDICATOR"='3'))


Statistics
----------------------------------------------------------
         95  recursive calls
          0  db block gets
     259618  consistent gets
     267154  physical reads
          0  redo size
      10348  bytes sent via SQL*Net to client
        261  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        242  rows processed

Rewrite with rank() over partition by  construct - one Full table scan with 129k

SQL> SELECT rec_id,
  2         CREATE_DATE,
  3         ordered_zzzyyy_type,
  4         qualified_zzzyyy_type,
  5         req_type ,
  6         c_transid,status,
  7         ERR_CODE,ERR_DESC,SVC_STATE,IS_DOWNGRADED,ETF_WAIVE_FLAG
  8  from
  9  (
 10     SELECT p.rec_id,
 11            p.CREATE_DATE,
 12            p.ordered_zzzyyy_type,
 13            p.qualified_zzzyyy_type,
 14            p.req_type ,
 15            p.c_transid,p.status,
 16            p.ERR_CODE,p.ERR_DESC,p.SVC_STATE,p.IS_DOWNGRADED,p.ETF_WAIVE_FLAG,
 17            rank() over ( partition by rec_id order by c_transid desc ) rank
 18       FROM
 19            xyzu.BL_XYZ_CHNG_PKG p
 20     WHERE p.req_type='R'
 21        AND p.status || ''='IN'
 22        AND p.jrs_indicator IN ('1','2','3')
 23  )  A
 24  where rank=1;

242 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3870177004

--------------------------------------------------------------------------------------------
| Id  | Operation                | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                 |  1613 |   546K| 25464   (3)| 00:05:06 |
|*  1 |  VIEW                    |                 |  1613 |   546K| 25464   (3)| 00:05:06 |
|*  2 |   WINDOW SORT PUSHED RANK|                 |  1613 |   107K| 25464   (3)| 00:05:06 |
|*  3 |    TABLE ACCESS FULL     | BL_XYZ_CHNG_PKG |  1613 |   107K| 25463   (3)| 00:05:06 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RANK"=1)
   2 - filter(RANK() OVER ( PARTITION BY "REC_ID" ORDER BY
              INTERNAL_FUNCTION("C_TRANSID") DESC )<=1)
   3 - filter("P"."REQ_TYPE"='R' AND "P"."STATUS"||''='IN' AND
              ("P"."JRS_INDICATOR"='1' OR "P"."JRS_INDICATOR"='2' OR "P"."JRS_INDICATOR"='3'))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     129827  consistent gets
     127794  physical reads
          0  redo size
      10184  bytes sent via SQL*Net to client
        261  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        242  rows processed

Friday, March 06, 2015

Killing DBSNMP sessions not easy

We had a performane issue for a prodcution database today. The CPU utilization went up. When I logged into the server and tried to log into the database as sysdba, I got ORA-00020 error:
$sqlplus "/ as sysdba"

SQL*Plus: Release 11.2.0.2.0 Production on Fri Mar 6 10:55:10 2015

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-00020: maximum number of processes (700) exceeded

Another DBA reported that the session count for DBSNMP user went up high. This user is from OEM. I stopped the OEM agent and he killed all DBSNMP sessions at database level. However, almost 1 hour later, we still saw high DBSNMP session count:
Username           COUNT(*)
---------------- ----------
DBSNMP                  352
XXXX100                   1
XXXXSELECT                1
XXXXUSER                232
IVIEW2XXXX               25
ORACLE                    5
XX09195                   1
XXX6621                   6
XXX3294                   1
XXDBMON                   1
                         36
note: some usernames are marked to protect ther innocence.

Of course, we can see all DBSNMP sessions marked as "KILLED"
SQL> select username, status from v$session where username='DBSNMP';

Username         Status
---------------- --------
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
DBSNMP           KILLED
...

So It appeared to be a good idea to kill the processes at OS level. I fired the following script I usually use to get the KILL command:
set pages 0
spool /tmp/kill_os

select 'kill -9 '|| p.spid
from v$process p , v$session s
where p.background is null
and s.paddr = p.addr (+)
and p.spid is not null
and s.username='DBSNMP'
order by p.spid
/
Guess what, to my surprise, it returned nothing !. I got the count for those "LOCAL=NO" processes as below:
$ps -ef |grep "LOCAL=NO" | wc -l

659

Now the problem was how could I get the PIDs out of those 659 processes that belong to DBSNMP ? What I did finally was:

1. load the PIDs of all those "LOCAL=NO" processes into a table db_admin.pid with one number column called ID:
ps -ef | grep LOCAL=NO | awk '{print "insert into db_admin.pid values(" $2 "); }'

2. Run the following query to generate the KILL commands

select 'kill -9 ' ||  id from db_admin.pid x 
where not exists (
select 1 
from v$session s, v$process p
where s.paddr = p.addr (+)
  and p.spid = x.id
)
order by id
/

This trick worked. To prevent future troubles, we've also set up a profile limit for the number of sessions the user DBSNMP can have. We still don't know what caused the session count high though.

Just Learned: About listener.ora

Recently I have been planning to upgrade a 10.2.0.4 Oracle database to 12.1.0.2. For that, I created a test db called minidb to play with upgrade steps. At the end, I need to bring up a listener for that db. This is what I've just learned: do you know we can start a listener without listener.ora? The listener.ora is a file that stores listener configuration information. Because the configuration parameters have default values, it is possible to start and use a listener with no configuration. The below screenshot shows I don't have a listener.ora file at the normal location ($ORACLE_HOME/nework/admin), and I don't have listener up and running:
exxxxxspd02 /apps/opt/oracle/product/12.1.0/network/admin [minidb] > ls -l
total 7
drwxr-xr-x   2 oracle   dba            6 Mar  6 08:41 samples
-rw-r--r--   1 oracle   dba          373 Mar 16  2014 shrept.lst
-rw-r--r--   1 oracle   dba          127 Mar  6 08:18 tnsnames.ora
exxxxxspd02 /apps/opt/oracle/product/12.1.0/network/admin [minidb] > lsnrctl status

LSNRCTL for Solaris: Version 12.1.0.2.0 - Production on 06-MAR-2015 08:42:20

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
TNS-12541: TNS:no listener
 TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Solaris Error: 146: Connection refused

Now I started up a listener:
exxxxxspd02 /apps/opt/oracle/product/12.1.0/network/admin [minidb] > lsnrctl start

LSNRCTL for Solaris: Version 12.1.0.2.0 - Production on 06-MAR-2015 08:44:59

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Starting /apps/opt/oracle/product/12.1.0/bin/tnslsnr: please wait...

TNSLSNR for Solaris: Version 12.1.0.2.0 - Production
Log messages written to /apps/opt/oracle/diag/tnslsnr/exxxxxspd02/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=exxxxxspd02)(PORT=1521)))

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Solaris: Version 12.1.0.2.0 - Production
Start Date                06-MAR-2015 08:44:59
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /apps/opt/oracle/diag/tnslsnr/exxxxxspd02/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=exxxxxspd02)(PORT=1521)))
The listener supports no services
The command completed successfully
Notice by default this listener is listening on port 1521 and supports no services. I can register my minidb with this local listener through a mechanism called "Dynamic Service Registration". For that to happen, I need to configure an initialization parameter called "LOCAL_LISTENER":
exxxxxspd02 /apps/opt/oracle/product/12.1.0/network/admin [minidb] > sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Fri Mar 6 08:50:13 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show parameter local_list

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string
SQL> alter system set local_listener='(ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))' scope=both;

System altered.

SQL> ho lsnrctl status

LSNRCTL for Solaris: Version 12.1.0.2.0 - Production on 06-MAR-2015 08:51:47

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Solaris: Version 12.1.0.2.0 - Production
Start Date                06-MAR-2015 08:44:59
Uptime                    0 days 0 hr. 6 min. 48 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /apps/opt/oracle/diag/tnslsnr/exxxxxspd02/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=exxxxxspd02)(PORT=1521)))
Services Summary...
Service "minidb" has 1 instance(s).
  Instance "minidb", status READY, has 1 handler(s) for this service...
The command completed successfully

What happened at background is that by default, the LREG background process can register service information with its local listener on the default local address of TCP/IP, port 1521.
exxxxxspd02 /apps/opt/oracle/product/12.1.0/network/admin [minidb] > ps -ef |grep lreg |grep -v grep
  oracle 28186     1   0 08:24:15 ?           0:00 ora_lreg_minidb
To ensure service registration works properly, the initialization parameter file should contain the following parameters:

  • SERVICE_NAMES for the database service name 
  •  INSTANCE_NAME for the instance name 
  •   LOCAL_LISTENER for the local listener 

Note the SERVICE_NAMES defaults to the global database name, a name comprising the DB_NAME and DB_DOMAIN parameters in the initialization parameter file. The value for the INSTANCE_NAME parameter defaults to the Oracle system identifier (SID).

Saturday, October 11, 2014

OEM 12c Database Monitoring Concepts and Framework

Prepared a concept map for me to understand OEM 12c monitoring concepts and framework:



Thursday, June 12, 2014

Using analytic function can improve query performance greatly

Let's say we have a table: t (x,y,z), how to output the rows of the table with max value of z based on y?

For example if we have (1,1,2) (1,1,3),(2,1,1), we should get (1,1,3).

Below is a test case that demonstrats two approaches: Approach 1 uses subquery resulting in 38 consistent gets; Approach 2 uses analytic approach resulting in 7 consistent gets
SQL>@test
SQL>set echo on
SQL>drop table t;

Table dropped.

SQL>create table t(x number, y number, z number);

Table created.

SQL>
SQL>insert into t values(1,1,2);

1 row created.

SQL>insert into t values(1,1,3);

1 row created.

SQL>insert into t values(2,1,1);

1 row created.

SQL>
SQL>
SQL>set autotrace on
SQL>-- approach 1
SQL>
SQL>select x, y, z
  2  from t
  3  where
  4   exists ( select 1 from
  5            ( select y, max(z) mz
  6              from t
  7               group by y
  8             ) zz
  9             where t.y=zz.y
 10               and t.z=zz.mz
 11          )
 12  ;

         X          Y          Z
---------- ---------- ----------
         1          1          3

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3359768323

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     3 |   117 |     5  (20)| 00:00:01 |
|*  1 |  FILTER              |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL  | T    |     3 |   117 |     2   (0)| 00:00:01 |
|*  3 |   FILTER             |      |       |       |            |          |
|   4 |    HASH GROUP BY     |      |     1 |    26 |     3  (34)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| T    |     1 |    26 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "T" "T" WHERE "Y"=:B1 GROUP
              BY "Y",:B2 HAVING MAX("Z")=:B3))
   3 - filter(MAX("Z")=:B1)
   5 - filter("Y"=:B1)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
         78  recursive calls
          0  db block gets
         38  consistent gets
          0  physical reads
          0  redo size
        270  bytes sent via SQL*Net to client
        247  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL>-- approach 2
SQL>
SQL>
SQL>select x,y,z from
  2  (
  3     select x,y, z, max(z) over (partition by y) mz
  4     from t
  5  )
  6  where z=mz
  7  ;

         X          Y          Z
---------- ---------- ----------
         1          1          3

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2206009079

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     3 |   156 |     3  (34)| 00:00:01 |
|*  1 |  VIEW               |      |     3 |   156 |     3  (34)| 00:00:01 |
|   2 |   WINDOW SORT       |      |     3 |   117 |     3  (34)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T    |     3 |   117 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("Z"="MZ")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        272  bytes sent via SQL*Net to client
        247  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

Approach 1 is essetially used in a query in one of our production databases that causes performance issue. I suggested we re-write the query using approach 2. My test results of the production-like query show that bad query needs 2M consistent gets to get 6382 rows out while the re-written query needs only about 6k.