For example if we have (1,1,2) (1,1,3),(2,1,1), we should get (1,1,3).
Below is a test case that demonstrats two approaches: Approach 1 uses subquery resulting in 38 consistent gets; Approach 2 uses analytic approach resulting in 7 consistent gets
SQL>@test
SQL>set echo on
SQL>drop table t;
Table dropped.
SQL>create table t(x number, y number, z number);
Table created.
SQL>
SQL>insert into t values(1,1,2);
1 row created.
SQL>insert into t values(1,1,3);
1 row created.
SQL>insert into t values(2,1,1);
1 row created.
SQL>
SQL>
SQL>set autotrace on
SQL>-- approach 1
SQL>
SQL>select x, y, z
2 from t
3 where
4 exists ( select 1 from
5 ( select y, max(z) mz
6 from t
7 group by y
8 ) zz
9 where t.y=zz.y
10 and t.z=zz.mz
11 )
12 ;
X Y Z
---------- ---------- ----------
1 1 3
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3359768323
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 117 | 5 (20)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | T | 3 | 117 | 2 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
| 4 | HASH GROUP BY | | 1 | 26 | 3 (34)| 00:00:01 |
|* 5 | TABLE ACCESS FULL| T | 1 | 26 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ */ 0 FROM "T" "T" WHERE "Y"=:B1 GROUP
BY "Y",:B2 HAVING MAX("Z")=:B3))
3 - filter(MAX("Z")=:B1)
5 - filter("Y"=:B1)
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
78 recursive calls
0 db block gets
38 consistent gets
0 physical reads
0 redo size
270 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL>-- approach 2
SQL>
SQL>
SQL>select x,y,z from
2 (
3 select x,y, z, max(z) over (partition by y) mz
4 from t
5 )
6 where z=mz
7 ;
X Y Z
---------- ---------- ----------
1 1 3
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2206009079
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 156 | 3 (34)| 00:00:01 |
|* 1 | VIEW | | 3 | 156 | 3 (34)| 00:00:01 |
| 2 | WINDOW SORT | | 3 | 117 | 3 (34)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T | 3 | 117 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("Z"="MZ")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
272 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed
Approach 1 is essetially used in a query in one of our production databases that causes performance issue.
I suggested we re-write the query using approach 2. My test results of the production-like query show that
bad query needs 2M consistent gets to get 6382 rows out while the re-written query needs only about 6k.