Thursday, June 12, 2014

Using analytic function can improve query performance greatly

Let's say we have a table: t (x,y,z), how to output the rows of the table with max value of z based on y?

For example if we have (1,1,2) (1,1,3),(2,1,1), we should get (1,1,3).

Below is a test case that demonstrats two approaches: Approach 1 uses subquery resulting in 38 consistent gets; Approach 2 uses analytic approach resulting in 7 consistent gets
SQL>@test
SQL>set echo on
SQL>drop table t;

Table dropped.

SQL>create table t(x number, y number, z number);

Table created.

SQL>
SQL>insert into t values(1,1,2);

1 row created.

SQL>insert into t values(1,1,3);

1 row created.

SQL>insert into t values(2,1,1);

1 row created.

SQL>
SQL>
SQL>set autotrace on
SQL>-- approach 1
SQL>
SQL>select x, y, z
  2  from t
  3  where
  4   exists ( select 1 from
  5            ( select y, max(z) mz
  6              from t
  7               group by y
  8             ) zz
  9             where t.y=zz.y
 10               and t.z=zz.mz
 11          )
 12  ;

         X          Y          Z
---------- ---------- ----------
         1          1          3

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3359768323

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     3 |   117 |     5  (20)| 00:00:01 |
|*  1 |  FILTER              |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL  | T    |     3 |   117 |     2   (0)| 00:00:01 |
|*  3 |   FILTER             |      |       |       |            |          |
|   4 |    HASH GROUP BY     |      |     1 |    26 |     3  (34)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| T    |     1 |    26 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "T" "T" WHERE "Y"=:B1 GROUP
              BY "Y",:B2 HAVING MAX("Z")=:B3))
   3 - filter(MAX("Z")=:B1)
   5 - filter("Y"=:B1)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
         78  recursive calls
          0  db block gets
         38  consistent gets
          0  physical reads
          0  redo size
        270  bytes sent via SQL*Net to client
        247  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL>-- approach 2
SQL>
SQL>
SQL>select x,y,z from
  2  (
  3     select x,y, z, max(z) over (partition by y) mz
  4     from t
  5  )
  6  where z=mz
  7  ;

         X          Y          Z
---------- ---------- ----------
         1          1          3

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2206009079

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     3 |   156 |     3  (34)| 00:00:01 |
|*  1 |  VIEW               |      |     3 |   156 |     3  (34)| 00:00:01 |
|   2 |   WINDOW SORT       |      |     3 |   117 |     3  (34)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T    |     3 |   117 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("Z"="MZ")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        272  bytes sent via SQL*Net to client
        247  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

Approach 1 is essetially used in a query in one of our production databases that causes performance issue. I suggested we re-write the query using approach 2. My test results of the production-like query show that bad query needs 2M consistent gets to get 6382 rows out while the re-written query needs only about 6k.

No comments: