Monday, May 25, 2009

Gather table column statistics

In a 9i production database, a table has a column that is subjected to have histogram. In the current script, there are two steps to gather the statistics of this table. The command looks similiar to the following:

exec dbms_stats.gather_table_stats('MY_SCHM','MY_TAB', -
estimate_percent =>20, -
cascade => TRUE,degree => 16);

exec dbms_stats.gather_table_stats('MY_SCHM','MY_TAB', -
method_opt => 'FOR COLUMNS SIZE auto MY_TAB_ID',-
degree => 8);

The problem of this approach is when the first gather_table_stats finishes, the cursors involving that table will be invalidated, and Oracle will not have correct statistics about that column untill the second gather_table_stats finishes. (Note: in 9i, the default value of method_opt parameter is 'FOR ALL COLUMNS SIZE 1', in 10g, it has been changed to 'FOR ALL COLUMNS SIZE AUTO')

Possible solutions:

(1) Keep the second step, change the first step to the use following command:
exec dbms_stats.gather_table_stats('MY_SCHM','MY_TAB', -
estimate_percent =>20, -
cascade => TRUE,degree => 16, -
no_invalidate=>true);

By specifying no_invalidate=>true, we tell Oracle not to invalidate the cursor currently in the shared pool.

(2) Combine the two steps into one, as shown here:

exec dbms_stats.gather_table_stats('MY_SCHM','MY_TAB', -
estimate_percent =>20, -
method_opt => 'FOR COLUMNS SIZE auto MY_TAB_ID, FOR ALL COLUMNS size 1',-
cascade => TRUE,degree => 16);


The following test demonstrated that we can have several 'FOR COLUMNS' for the method_opt parameter



SQL> create table t
2 as
3 select
4 level c1,
5 trunc(level/20) c2,
6 level+ 5 c3,
7 trunc(dbms_random.value(1, 100)) c4
8 from dual
9 connect by level <=5000; Table created. SQL>
SQL>
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(user,'T',
3 method_opt => 'FOR COLUMNS size 10 c1, FOR columns size 20 c2, FOR COLUMNS size 5 c3, FOR ALL COLUMNS size 1');
4 end;
5 /

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, column_name, num_distinct, num_buckets,
2 last_analyzed from user_tab_col_statistics where
3 table_name='T';

TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS LAST_ANALYZED
---------- ----------- ------------ ----------- -------------------
T C1 5000 10 2009-05-25 09:26:39
T C2 251 20 2009-05-25 09:26:39
T C3 5000 5 2009-05-25 09:26:39
T C4 99 1 2009-05-25 09:26:39

No comments: