Friday, May 10, 2013

A Case of Changing or Creating OSDBA and OSOPER Groups

Goal

Created an Unix user (e.g. tier1 ) in the Oracle database server that can perform some database administration tasks but cannot be used to view or modify any application data.


Current typical Unix/Liux Environment:

- OS user: "oracle" is the Oracle software owner and its primary group is "dba" (i.e. Oracle database files have ownership oracle:dba )

- Assuming OSDBA group is "dba" (i.e. every os user that belongs to "dba" group can login as sysdba and view any data)

Suggested changes


- add a Unix group "oper" if not exist
# /usr/sbin/groupadd -g 503 oper

- add a Unix group "dbax" if not exist
# /usr/sbin/groupadd -g 504 dbax

- Change OSDBA group to "dbax" and define OSOPER group in $ORACLE_HOME/rdbms/lib/config.c, edit and ensure the following:
#define SS_DBA_GRP "dbax"
#define SS_OPER_GRP "oper"
shutdown instance and relink:
  cd $ORACLE_HOME/rdbms/lib/
   mv config.o config.o.orig
    make -f ins_rdbms.mk ioracle

- assign "dbax" as secondary unix group to "oracle"
# /usr/sbin/usermod -g dba -G dbax oracle

- create tier1 user with priamry group as "dba" and secondry group as "oper" # /usr/sbin/useradd -g dba -G oper tier1

Test cases


Test case 1 - "oracle" user can login as sysdba as it belongs to OSDBA group:dbax

[oracle@localhost ~]$ id
uid=500(oracle) gid=54322(dba) groups=504(dbax),54322(dba)
[oracle@localhost ~]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.2.0 Production on Thu May 9 13:37:51 2013

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

SQL> conn / as sysdba
Connected.
SQL> shutdown 
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup 
ORACLE instance started.

Total System Global Area  456146944 bytes
Fixed Size                  1344840 bytes
Variable Size             390073016 bytes
Database Buffers           58720256 bytes
Redo Buffers                6008832 bytes
Database mounted.
Database opened.
SQL>

Test case 2 - "tier1" user can login as sysoper but not as sysdba

[tier1@localhost ~]$ id
uid=502(tier1) gid=54322(dba) groups=54322(dba),54324(oper)
[tier1@localhost ~]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.2.0 Production on Thu May 9 13:50:41 2013

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

SQL> conn / as sysdba
ERROR:
ORA-01031: insufficient privileges


SQL> conn / as sysoper
Connected to an idle instance.
SQL> startup 
ORACLE instance started.
Database mounted.
Database opened.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

Test case 3 - "tier1" and "oracle" belong to same primary OS group "dba", so "tier1" can operate on various Oracle files based on group privillege:

[tier1@localhost 2013_05_09]$ id
uid=502(tier1) gid=54322(dba) groups=54322(dba),54324(oper)
[tier1@localhost 2013_05_09]$ ls -lh o1_mf_1_401_8rr62tq0_.arc
-rw-rw---- 1 oracle dba 7.2M May  9 14:42 o1_mf_1_401_8rr62tq0_.arc
[tier1@localhost 2013_05_09]$ mv o1_mf_1_401_8rr62tq0_.arc /tmp
[tier1@localhost 2013_05_09]$ ls -lh o1_mf_1_401_8rr62tq0_.arc
ls: o1_mf_1_401_8rr62tq0_.arc: No such file or directory


Thursday, March 28, 2013

Problems of Temporary Tablespace Usage after a Recent Release

After a recent release, we’ve encountered two problems in a production database:  (1) Temporary tablespace usage high; (2) Instance shutdown when temporary tablespace space used up.
 
Problem 2 turns out to be an Oracle feature:

 

Root cause of crash appears to be "Bug" 7691270 - Crash the DB in case of write errors (rather than just offline files) [ID 7691270.8]

Datafile Write Errors (Oracle Database 11gR1 to 11gR2 Change):

Starting with the 11.2.0.2 patchset for Oracle Database 11g Release 2, a write error to any data file will cause the database to perform a shutdown abort of the instance.

In prior releases, I/O errors to datafiles not in the system tablespace would offline the respective datafiles when the database is in archivelog mode. This behavior is not always desirable. Some customers would prefer that the instance crash due to a datafile write error. A new hidden parameter called _datafile_write_errors_crash_instance has been introduced to control whether the instance should crash on a write error or if the datafiles should be taken offline on a write error.

If _datafile_write_errors_crash_instance = TRUE (default) then any write to a datafile which fails due to an IO error causes an instance crash.
If _datafile_write_errors_crash_instance = FALSE then the behavior reverts to the previous behavior (before this fix) such that a write error to a datafile offlines the file (provided the DB is in archivelog mode and the file is not in SYSTEM tablespace in which case the instance is aborted)

Best Practices: Monitor temporary tablespace usage with alerts, tune SQL to use less temporary space.

 
The increasing usage of temporary space after the release was due to the increasing usage of temporary lobs, which reside in the temporary tablespace. For example, there were 30+ queries using to_clob function in the release. Oracle won’t release temp lob space in the Weblogic connection unless we recycle the Weblogic app servers. There are several related MOS notes about this behavior:
 

  • How to Release the Temp LOB Space and Avoid Hitting ORA-1652 (Doc ID 802897.1)
  • How to Release Temporary LOB Segments without Closing the JDBC Connection (Doc ID 1384829.1)
  • Bug 5723140 - Temp LOB space not released after commit (Doc ID 5723140.8)
  • Temporary LOB space not released after commit: ora-1652 being hit (Doc ID 750209.1)


 

In the above notes, the fix to the temp lob space releasing problem is to set 60025 event. Below is a simple test to demonstrate this event works:







In Session 1, execute the following statement

select to_clob(text) from dba_source where rownum <100;



In session 2 monitoring session 1 temp space usage with:

SELECT   S.sid || ',' || S.serial# sid_serial,
         S.username,
         S.osuser,
         P.spid,
         S.module,
         S.machine,
         S.program,
         SUM (T.blocks) * TBS.block_size / 1024 / 1024 mb_used,
         T.tablespace,
         T.segtype,
         COUNT(*) sort_ops
FROM     v$sort_usage T, v$session S, dba_tablespaces TBS, v$process P
WHERE    T.session_addr = S.saddr
AND      S.paddr = P.addr
and    s.sid=2927   --- this is the sid of session 1
AND      T.tablespace = TBS.tablespace_name
GROUP BY S.sid, S.serial#, S.username, S.osuser, P.spid, S.module,
         S.machine, S.program, TBS.block_size, T.tablespace,T.segtype 
ORDER BY mb_used 
/


You will see even after session 1 finish and idle there, in session 2 you will have 1MB temp space allocated to it


Second test:

In session 1, before run that query do:
alter session set events '60025 trace name context forever';


Then in session 2, you will see after session 1 becomes idle, no temp space allocated.






We did not set this event in our production to fix the issue, instead, dev team change the code and avoid use temporary lob. To demonstrate difference of the temp space usage before and after release and after the fix, I queried dba_hist_active_sess_history view.  From 11g there is a field called TEMP_SPACE_ALLOCATED in this view. So it is possible to calculate the average temp space allocated per active session during a certain period of time:

 





===== Look at all active session between 16:00 ~ 16:59 in past 30 days in dba_hist_active_sess_history
select stime, size_m/c avg_mb_sess from
(
  select to_char(sample_time, 'YYYY-MM-DD HH24') stime,count(*) c, round(sum(TEMP_SPACE_ALLOCATED)/1024/1024) size_m
  from dba_hist_active_sess_history
  where to_char(sample_time, 'HH24') = 16 and sample_time > sysdate-30
  group by to_char(sample_time, 'YYYY-MM-DD HH24')
)
order by 1

STIME         AVG_MB_SESS
------------- -----------
2013-02-26 16   .00135459
2013-02-27 16  .002029264
2013-02-28 16  .000963298
2013-03-01 16  .000670166
2013-03-02 16  .001373417
2013-03-03 16   .00049838
2013-03-04 16  .000991616
2013-03-05 16  1.49810976
2013-03-06 16  1.12322684
2013-03-07 16  .000634719
2013-03-08 16  .001408054
2013-03-09 16  .081960706
2013-03-10 16  .000473485
2013-03-11 16  1.21929313
2013-03-12 16  .001113121
2013-03-13 16  .001084893
2013-03-14 16  .000864518
2013-03-15 16  1.70157371
2013-03-16 16  .114989041
2013-03-17 16  231.198656
2013-03-18 16  254.193626
2013-03-19 16  96.2858973
2013-03-20 16   93.731513
2013-03-21 16  135.284032
2013-03-22 16    1.462451
2013-03-23 16  .000423101
2013-03-24 16  .000462321
2013-03-25 16  .000395163
2013-03-26 16  16.1351924
2013-03-27 16  1.84635633



The release was happened on Saturday night of Mar 16 and Dev team fixed code on Mar 21 night completely. It can be seen, from Mar 17 till Mar 21, the average temp space allocated per active sessions were much higher compared to those before release and after the fix. During Mar 17 and Mar 21, what we did was that once the total temp space reached 150 GB, we recycled the app servers  in a controlled manner to release the temp space.

Friday, November 02, 2012

SQL Server: INSERT INTO ... SELECT vs SELECT ... INTO

In a forum discussion ( in Chinese), a general comment implies that 'SELECT ... INTO' is faster than 'INSERT INTO ..SELECT' to load data to a table because of the minimal logging. However, I checked BOL document and found that we can also achieve minimal logging with 'INSERT INTO ... SELECT' construct given the requirements met as described below:

Using INSERT INTO .. SELECT to Bulk Load Data with Minimal Logging

You can use INSERT INTO SELECT FROM to efficiently transfer a large number of rows from one table, such as a staging table, to another table with minimal logging. Minimal logging can improve the performance of the statement and reduce the possibility of the operation filling the available transaction log space during the transaction . Minimal logging for this statement has the following requirements:

- The recovery model of the database is set to simple or bulk-logged.
- The target table is an empty or nonempty heap.
- The target table is not used in replication.
- The TABLOCK hint is specified for the target table.


BOL also describes the logging behaviour of SELECT ... INTO clause:

The amount of logging for SELECT...INTO depends on the recovery model in effect for the database. Under the simple recovery model or bulk-logged recovery model, bulk operations are minimally logged. With minimal logging, using the SELECT... INTO statement can be more efficient than creating a table and then populating the table with an INSERT statement.

Seeing is believing. So I planed several test cases in my SQL Express 2012 instance. In these test cases, I used different methods to populate 1M rows from an existing table called big_table to a new table called my_table.

Test case 1 using "SELECT ... INTO"
use dbadmin
go

drop table my_table;
go
  
DBCC SHRINKFILE (N'DBAdmin_log' , 0, TRUNCATEONLY)
go

declare @size_s  float 
declare @size_e  float

SELECT  @size_s = cntr_value
FROM sys.dm_os_performance_counters 
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name = 'DBAdmin'

print 'start : ' + str(@size_s)

select  b.scattered_data, b.vc_small, b.id  into my_table
from big_table b 
 
SELECT  @size_e = cntr_value
FROM sys.dm_os_performance_counters 
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name = 'DBAdmin'

print 'end  : ' +  str(@size_e)

PRINT 'size used (KB) =' + convert(varchar(30), @size_e - @size_s) 


I configured the database DBADMIN to different recovery mode and run the above code, here are the output

/*  test run 1 - simple 

(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
start :        428

(1000000 row(s) affected)
end  :        996
size used (KB) =568
*/

/* test run 2 - bulk-logged 
(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
start :        450

(1000000 row(s) affected)
end  :       1078
size used (KB) =628

*/

/* test run 3  - full 

(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
start :        432

(1000000 row(s) affected)
end  :      33276
size used (KB) =32844

*/

It can be seen that with Simple or Bulk-logged mode, the transaction log used only 500-600 KB, while with Full recovery mode it is about 32 MB. After that, I have tested "INSERT INTO ... SELECT" with different recovery mode and also check the effect of 'TABLOCK' hint

Test case 2 using "INSERT INTO ... SELECT"
use dbadmin
go

drop table my_table;
go

create table my_table(
  A  int,
  B  varchar(10),
  C  int
  );

  
DBCC SHRINKFILE (N'DBAdmin_log' , 0, TRUNCATEONLY)
go

declare @size_s  float 
declare @size_e  float

SELECT  @size_s = cntr_value
FROM sys.dm_os_performance_counters 
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name = 'DBAdmin'

print 'start : ' + str(@size_s)

-- insert into my_table (A,B,C)
insert into my_table with (TABLOCK) (A,B,C) 
select  b.scattered_data, b.vc_small, b.id 
from big_table b 
 
SELECT  @size_e = cntr_value
FROM sys.dm_os_performance_counters 
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name = 'DBAdmin'

print 'end  : ' +  str(@size_e)

PRINT 'size used (KB) =' + convert(varchar(30), @size_e - @size_s)

Test results:
/* test run 1 - Simple
(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
start :        450

(1000000 row(s) affected)
end  :       1009
size used (KB) =559

*/

/* test run 2  - Bulk-logged
(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
start :        450

(1000000 row(s) affected)
end  :       1017
size used (KB) =567

*/

/*   test run 3 -- Full 
(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
start :        450

(1000000 row(s) affected)
end  :      33408
size used (KB) =32958

*/

/* test run 4  -- Simple but without TABLOCK hint (Full, Bulk-logged have similar values)

(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
start :        428

(1000000 row(s) affected)
end  :     125834
size used (KB) =125406
*/

In the first three test run, I used the "TABLOCK" hint. It can be seen that in terms of the amount of transaction log used, using "INSERT INTO ... SELECT" with "TABLOCK" hint is essentially same as using "SELECT ... INTO" regardless of recovery mode. Without the "TABLOCK" hint, regardless of the recovery mode, using "INSERT INTO ... SELECT" will generate same amount of transaction log and 4 times more than that in the case of FULL recovery mode with the hint ( i.e. 125406 KB vs 32958 KB in my test case). People think "INSERT INTO ... SELECT" is slow may be because they don't always remember to add this hint when appropriate.

Wednesday, October 31, 2012

Drop indexes that have never been used


Here is the plan:


1. Identify the indexes that are not appeared in the execution plan


create table index_drop_candidates
as
select owner, index_name
from dba_indexes
where owner='INDEXOWNER'
minus
(
select object_owner, object_name
from v$sql_plan
where object_type ='INDEX'
and object_owner='INDEXOWNER'
union
select object_owner, object_name
from dba_hist_sql_plan
where object_type ='INDEX'
and object_owner='INDEXOWNER'
);


2. Exclude those indexes used for constraints


delete from index_drop_candidates
where index_name in
(
select constraint_name from dba_constraints where owner='INDEXOWNER' and constraint_type in ('P', 'U', 'R') ) ;



3. Run the following delete statement manually or put in the cron from time to time for some time

delete from index_drop_candidates
where index_name in
(
select object_name
from v$sql_plan
where object_type ='INDEX'
and object_owner='INDEXOWNER'
union
select object_name
from dba_hist_sql_plan
where object_type ='INDEX'
and object_owner='INDEXOWNER'
);


4. After some time, change those indexes still in the index_drop_candidates table to be invisible to observe any performance impact and decide if drop them accordingly



Note: There are cases that indexes have never been used in the execution plan, but they are needed to help CBO to choose a better plan. If such indexes are dropped, performance could be negatively impacted.

Friday, October 19, 2012

Install Sample AdventureWorksLT2012 OLTP database on my SQL Server 2012 Express

I downloaded the mdf file from http://msftdbprodsamples.codeplex.com/releases/view/55330 and placed it at C:\mssql_data. The in SSMS window, I issued:
CREATE DATABASE AdventureWorks2012
ON (FILENAME = 'C:\mssql_data\AdventureWorksLT2012_Data.mdf')
FOR ATTACH_REBUILD_LOG ;

The message recieved are as follows:
File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\AdventureWorksLT2012_log.ldf" may be incorrect.
New log file 'C:\mssql_data\AdventureWorksLT2012_Data_log.ldf' was created.
Converting database 'AdventureWorks2012' from version 705 to the current version 706.
Database 'AdventureWorks2012' running the upgrade step from version 705 to version 706.

It should be noted that I have to change the file permission in order to install it sucessfully. The screen shot for the file permission is shown below:
The "full control" should be selected.