Sunday, November 27, 2011

Reorganize Table and Index

I manage a database for a retail ordering application. Orders completed more than 90 days ago are subject to archiving. For various reasons, about two months ago there was a backlog of roughly 80 million rows awaiting archiving. Now that the backlog has been worked off, reorganizing the schema objects is desirable in order to reclaim space.

First of all, I need to estimate how much space can potentially be saved by the reorganization. For table segments, I use DBMS_SPACE.create_table_cost to estimate the size after reorganization. Given the row count, the average row size, and the target tablespace, the following script estimates the size of a table if it is moved to that tablespace.

--  script: tabsize_est.sql 
--  get the avg row size and row count from dba_tables
--  (populated by the statistics gathering job)
--  usage: @tabsize_est 'table_size' 100 10000000

set verify off feedback off
set SERVEROUTPUT ON
DECLARE
  l_used_bytes   NUMBER;
  l_alloc_bytes  NUMBER;
  l_message      VARCHAR2(100);
BEGIN
  -- Estimate the size of the table as if rebuilt on the target tablespace.
  l_message := '&1';
  DBMS_SPACE.create_table_cost (
            tablespace_name => 'TTQ_ORD_DATA_4',
            avg_row_size    => &2,
            row_count       => &3 ,
            pct_free        => 10 ,
            used_bytes      => l_used_bytes,
            alloc_bytes     => l_alloc_bytes);

  DBMS_OUTPUT.put_line (  l_message  || ' Mbytes_allocated= ' || l_alloc_bytes/1024/1024 || ' Mbytes');
END;
/


I use the following script to generate a command for each large table:

--  script: tabsize_est_gen.sql
--  purpose: generate a tabsize_est command for each table larger than 1 GB
select '@@tabsize_est ' ||''' ' || t.owner || ' ' || t.table_name || ' ' || t.tablespace_name || ' ' 
     || trunc(s.bytes/1024/1024)  || ''' '||  t.avg_row_len || ' ' || t.num_rows
from
dba_segments s,
dba_tables t
where s.owner=t.owner
and  s.segment_name=t.table_name
and t.owner in  ('TTQ_ORD')
and  s.bytes > 1000000000
/



The output from the above script looks like:

@@tabsize_est ' TTQ_ORD CUST_ORDER_ACTIVITY TTQ_ORD_DATA_2 2611' 139 15874503
@@tabsize_est ' TTQ_ORD CUST_HIGH_RISK_20110425 TTQ_ORD_DATA_2 1349' 82 14246227
@@tabsize_est ' TTQ_ORD ENCDEC_MIGRATION TTQ_ORD_DATA_4 1400' 29 2257790
@@tabsize_est ' TTQ_ORD ACCOUNT_COMMENTS TTQ_ORD_DATA_4 13600' 146 82486887
@@tabsize_est ' TTQ_ORD BUNDLE_RECON_TRACKING TTQ_ORD_DATA_3 7060' 957 6294040
@@tabsize_est ' TTQ_ORD ISP_MESSAGE_TRACKING TTQ_ORD_DATA_3 30480' 815 31089530
...

Then, executing the above commands, I obtain the current and estimated table segment sizes, as shown below:

TTQ_ORD CUST_ORDER_ACTIVITY TTQ_ORD_DATA_2 2611 Mbytes_allocated= 2600 Mbytes
TTQ_ORD CUST_HIGH_RISK_20110425 TTQ_ORD_DATA_2 1349 Mbytes_allocated= 1400 Mbytes
TTQ_ORD ENCDEC_MIGRATION TTQ_ORD_DATA_4 1400 Mbytes_allocated= 200 Mbytes
TTQ_ORD ACCOUNT_COMMENTS TTQ_ORD_DATA_4 13600 Mbytes_allocated= 13600 Mbytes
TTQ_ORD BUNDLE_RECON_TRACKING TTQ_ORD_DATA_3 7060 Mbytes_allocated= 7200 Mbytes
TTQ_ORD ISP_MESSAGE_TRACKING TTQ_ORD_DATA_3 30480 Mbytes_allocated= 30400 Mbytes
...

To estimate the size of an index segment after rebuild, I use a script called index_est_proc_2.sql by Jonathan Lewis (http://jonathanlewis.wordpress.com/index-sizing/).
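
Oracle also provides DBMS_SPACE.create_index_cost, which estimates an index's size from its DDL (the base table must exist and have statistics). Below is a minimal sketch along the same lines as tabsize_est.sql; the index name, column, and tablespace are made up for illustration:

set serveroutput on
DECLARE
  l_used_bytes   NUMBER;
  l_alloc_bytes  NUMBER;
BEGIN
  -- hypothetical index DDL; the estimate is derived from table statistics
  DBMS_SPACE.create_index_cost (
            ddl         => 'create index ttq_ord.cust_ord_act_ix on ' ||
                           'ttq_ord.cust_order_activity (order_id) ' ||
                           'tablespace ttq_ord_indx_4',
            used_bytes  => l_used_bytes,
            alloc_bytes => l_alloc_bytes);
  DBMS_OUTPUT.put_line ('Mbytes_allocated= ' || l_alloc_bytes/1024/1024 || ' Mbytes');
END;
/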

I use EXCEL to calculate the difference between the estimated size and the current size for each index and table, which gives me an estimate of the total space gain. Naturally, when I plan the reorganization, I start with the segments that promise the largest space gain.

The space gain estimated this way is at the tablespace level. To actually release the space at the OS level, the datafiles need to be shrunk. For this it is very useful to know which segments reside at the end of each datafile: those segments should be moved first to make the datafile shrinkable.

The following script lists the segments in each datafile, ordered by max(block_id). Segments that reside at the end of a datafile have a large max(block_id).


-- find segment at the end of the data files
-- so if this segment is removed, the data file could be resized down
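-- usage (assuming this is saved as shrinkdf.sql): @shrinkdf <tablespace>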

spool shrinkdf_&&tbs..log

col sum_mbytes format 999,999
col  diff format 999,999,999

select name from v$database;

break on file_id skip 1
select file_id, segment_name, sum_mbytes, count_blocks, max_block_id, 
max_block_id -  lead(max_block_id,1) over (partition by file_id order by max_block_id desc )  diff
from
(
select /*+ RULE */ 
file_id
, segment_name
, sum(bytes/1024/1024)  sum_mbytes
, count(block_id) count_blocks
, max(block_id) max_block_id
from dba_extents
where tablespace_name = upper('&&tbs') 
group by file_id, segment_name
order by 1, 5 desc,4
)
order by file_id, max_block_id desc
/

spool off

undef tbs


Sample output from the above script looks like:

FILE_ID SEGMENT_NAME                   SUM_MBYTES COUNT_BLOCKS MAX_BLOCK_ID         DIFF
---------- ------------------------------ ---------- ------------ ------------ ------------
       227 ORDER_TRANS                         1,818            9      1868809      128,000
           AUDIT_TRAIL                         1,010            5      1740809       25,600
           ORD_ADDRESS                           808            4      1715209       51,200
           MASTER_ACCOUNT                        202            1      1664009      179,200
           ORD_TRANS_DETAILED_STATUS             202            1      1484809       51,200
           ORDER_MASTER                          202            1      1433609      153,600
           MASTER_SERVICE                        202            1      1280009       25,600
           ACCOUNT_HISTORY                       202            1      1254409       25,600
           ACCOUNT_COMMENTS                      202            1      1228809       51,200
           ORD_ACCOUNT                           202            1      1177609       51,200
           SUPP_HISTORY                          202            1      1126409      102,400
           CREDIT_CHECK                          202            1      1024009      153,600
           ORD_CONTACT                           404            2       870409       25,600
           ORD_TRANS_NOTES_HISTORY               404            2       844809      102,400
           ORD_DSL                               404            2       742409



It is clear that for the datafile with file_id=227, ORDER_TRANS is at the end, followed by AUDIT_TRAIL.
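
To make that space releasable, the segment at the end of the file must be relocated first, for example with ALTER TABLE ... MOVE. A move marks the table's indexes UNUSABLE, so they have to be rebuilt afterwards. A minimal sketch (the target tablespace here is hypothetical):

-- relocate the segment found at the end of the datafile
alter table ttq_ord.order_trans move tablespace ttq_ord_data_4;

-- generate rebuild statements for the indexes invalidated by the move
select 'alter index ' || owner || '.' || index_name || ' rebuild;'
from dba_indexes
where table_owner = 'TTQ_ORD'
and table_name = 'ORDER_TRANS'
and status = 'UNUSABLE'
/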

Finally, the following script can be used to generate the resize statements:


rem  script: red_df.sql 
rem 
rem  Purpose: This gives the Total size of datafiles in a tablespace
rem         and also the size to which it can be reduced.
rem     It generates an rdf.log spool file with commands to resize datafiles
rem  Usage: @red_df.sql <TABLESPACE>
rem 
rem  Note: I got this script from a colleague, but I once saw a similar script on the AskTom website

set verify off

col tsname  format         a24    justify c heading 'Tablespace'
col nfrags  format     9,999,990  justify c heading 'Free|Frags'
col mxfrag  format   999,990.999  justify c heading 'Largest|Free Frag'
col totsiz  format 9,999,990.999  justify c heading 'Total|Mbytes'
col avasiz  format 9,999,990.999  justify c heading 'Available|Mbytes'
col pctusd  format       990.99   justify c heading 'Percent|Used'

set pagesize 200
set linesize 120

select
total.tablespace_name                       tsname,
count(free.bytes)                           nfrags,
round(nvl(max(free.bytes)/1048576,0),2)              mxfrag,
total.bytes/1048576                         totsiz,
round(nvl(sum(free.bytes)/1048576,0),2)              avasiz,
round((1-nvl(sum(free.bytes),0)/total.bytes)*100,2)  pctusd
from
(select tablespace_name, sum(bytes) bytes
from dba_data_files
where tablespace_name = upper('&1')
group by tablespace_name)  total,
dba_free_space  free
where
total.tablespace_name = free.tablespace_name(+)
and total.tablespace_name = upper('&1')
group by
total.tablespace_name,
total.bytes;


set verify off
set linesize 150
set echo off

column file_name format a60 word_wrapped
column smallest format 999,990 heading "Smallest|Size|Poss."
column currsize format 999,990 heading "Current|Size"
column savings format 999,990 heading "Poss.|Savings" 
break on report
compute sum of savings on report

column value new_val blksize
select value from v$parameter where name = 'db_block_size'
/

select file_name,
ceil( (nvl(hwm,1)*&&blksize)/1024/1024 ) smallest, 
ceil( blocks*&&blksize/1024/1024) currsize,
ceil( blocks*&&blksize/1024/1024) -
ceil( (nvl(hwm,1)*&&blksize)/1024/1024 ) savings
from dba_data_files a,
( select file_id, max(block_id+blocks-1) hwm 
from dba_extents
group by file_id ) b
where a.file_id = b.file_id(+)
and a.tablespace_name = upper('&1')
/

spool rdf.log
column cmd format a75 word_wrapped
set pagesize 4000

select 'alter database datafile '''||file_name||''' resize ' ||
ceil( (nvl(hwm,1)*&&blksize)/1024/1024 ) || 'm;' cmd 
from dba_data_files a,
( select file_id, max(block_id+blocks-1) hwm
from dba_extents
group by file_id ) b
where a.file_id = b.file_id(+)
and ceil( blocks*&&blksize/1024/1024) - 
ceil( (nvl(hwm,1)*&&blksize)/1024/1024 ) > 0
and a.tablespace_name = upper('&1')
/
spool off
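
Running red_df.sql for a tablespace writes the resize commands into rdf.log. For example (the datafile path and size below are hypothetical, just to show the shape of the output):

@red_df TTQ_ORD_DATA_2

-- rdf.log then contains statements such as:
-- alter database datafile '/u02/oradata/ttqord/ttq_ord_data_2_01.dbf' resize 4097m;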

Monday, November 07, 2011

Cloud Computing and DBA career

Recently, in reply to a question in the MITBBS database forum (in Chinese), "Will the DBA career go downhill, with a dim future, as Cloud Computing and the NoSQL movement grow?", I made the following comments:


My understanding as of today: NoSQL is a special-purpose database (or data model); it cannot replace the traditional general-purpose RDBMS. (The market for NoSQL is small.) It has little impact on the DBA career.


Cloud computing gives a company more choices for its IT infrastructure. Basically three: keep traditional in-house IT; outsource IT entirely (i.e., to a public cloud provider, pay-per-use); or consolidate in-house IT infrastructure on a private cloud; plus perhaps a fourth, some mix of the above. I think the demand for the DBA role (and likewise for system, network, and storage administrators) will be reduced when a company moves part or all of its IT to the cloud, whether public or private.


Demand for the traditional DBA role is not affected only by the emergence of cloud computing; it has also been eroded by the ever-increasing self-management and automation of the RDBMS (and in the USA the role is being outsourced to India, China, etc.).


Obviously, if there is a significant move to the cloud, there will be demand for "cloud system/support administrators". It should not be difficult for a true DBA (that is, one who is always willing to learn new things) to transition into a cloud administrator role.


After making the comments, I googled some more and found the following article interesting: "The tech jobs that the cloud will eliminate" (http://www.infoworld.com/d/adventures-in-it/tech-jobs-cloud-will-eliminate-008). There is also a two-part article describing cloud IT roles here: http://open.eucalyptus.com/learn/cloud-it-roles. It indicates that a DBA's skill set is extendable to the Cloud Data Architect role.

I wrote a Perl script that non-interactively searches dice.com for the count of job listings matching certain keywords, and I scheduled it to run every day. I hope this will give me some insight into the trend of the DBA job market and the impact of Cloud Computing.

For example, yesterday's output of the script looked like:

2011-11-06 07:59 System+Administrator 5688
2011-11-06 07:59 DBA 2799
2011-11-06 07:59 Oracle+DBA 1570
2011-11-06 07:59 Cloud+Administrator 277
2011-11-06 07:59 Cloud+DBA 74