Monday, September 21, 2009

Use Ref Cursor in Java and Close it

In one of our production databases, I observed from v$open_cursor that some sessions from weblogic application server could have about 2000 opened cursor count for a particular ref cursor. Usually the weblogic servers are recycled every night. I can observe the count increasing from 0 to about 2000 before recycle each day.

This database has been crash every week from ORA-600 [17074]. In the trace file, the SQL of the process that crashes the database is shown to be that ref cursor. So the cursor was thought to be contributed into the crash at least. (Oracle Support suggests that the bug in the server combined with the condition in our code caused the crash). However, recently the database crash problem has mysteriously disappeared without any sound reasons.

Development team claimed that they close that cursor in their Java code, but I really doubted it as the count is still increasing every day. I am afraid that it is a bomb that will explode some day again.

I have used Java several years ago when I did some course works in school. However, since then, I have never touched Java. I assigned myself a task to simulate the cursor count increasing symptom. It took me some time to figure out how. I even asked that in some forum, but did not get what I want. I eventually found this AskTom post and this site helpful.

The test is done in the Oracle sample schema HR and in a 10G database.

First of all, I created a function in a package that returns a ref cursor. Given a Department ID, the function returns a ref cursor pointing to a list of last names and first names of the employees belonging to that department.

The function looks like:
function sfGetEmpDept
( pDeptID in EMPLOYEES.DEPARTMENT_ID%type)
return csGetResultSet is

csGetEmp csGetResultSet;

begin
open csGetEmp for
SELECT e.last_name, e.first_name, e.department_id
FROM employees e
WHERE e.department_id = pDeptID
ORDER BY e.last_name;

return csGetEmp;
end sfGetEmpDept;


Secondly, I figured out how to use this ref cursor in a Java program through JDeveloper. In the Java code, I called a method getEmpDept 10 times.

em.showOpenCnt( conn, "*** Before anything" );
for(int i = 10;  i <=100; i = i + 10 ) {        
emlist = em.getEmpDept(i);
System.out.println( "Dept ID: " + i +
" Number of employee is " + emlist.size()  );
}
em.showOpenCnt( conn, "*** After everything" );




There is a line in the method getEmpDept which controls whether close the result set or not:

rs.close();

I measure the open curosr count before and after call getEmpDept. The results are as expected:

1. Uncomment out rs.close(), i.e. we close the ref cursor after use it.

Get the Number of emplyee in a Department
*** Before anything
1 opened cursors current
-----------------------
Dept ID: 10 Number of employee is 1
Dept ID: 20 Number of employee is 2
Dept ID: 30 Number of employee is 6
Dept ID: 40 Number of employee is 1
Dept ID: 50 Number of employee is 45
Dept ID: 60 Number of employee is 5
Dept ID: 70 Number of employee is 1
Dept ID: 80 Number of employee is 34
Dept ID: 90 Number of employee is 3
Dept ID: 100 Number of employee is 6
*** After everything
1 opened cursors current

-----------------------
Process exited with exit code 0.

2. Comment out rs.close(), i.e. we don't close the ref cursor in every call
*** Before anything
1 opened cursors current
-----------------------
Dept ID: 10 Number of employee is 1
Dept ID: 20 Number of employee is 2
Dept ID: 30 Number of employee is 6
Dept ID: 40 Number of employee is 1
Dept ID: 50 Number of employee is 45
Dept ID: 60 Number of employee is 5
Dept ID: 70 Number of employee is 1
Dept ID: 80 Number of employee is 34
Dept ID: 90 Number of employee is 3
Dept ID: 100 Number of employee is 6
*** After everything
11 opened cursors current

-----------------------
Process exited with exit code 0.

It is obvious that without rs.close, every time we call the Java method, we will increase the count by 1. With this simulation, I feel a little bit confident to question the development team again as to whether that cursor is really closed

Here can be found the code for the ref cursor and Java class.

[Ed. Sep 28, 2009 -]:
Find another AskTom thread that is very related to this topic

[Updated, Feb 22, 2010 -]:
We had a near-crash siutation by same ORA-600 17074 last Monday night, database was completely hung. Development team eventually identified the problem in a packaged PL/SQL procedure, which returns three ref cursors when called by Java code , however developers only needs to use one of them in the application, so they use it and close that one, but leaving the other two open. After fix, i.e. close the other two also. I observed the max open cursor per session is not even exceeding  100.

Sunday, September 20, 2009

Rename and drop table columns

If we have a relatively big table and we want to modify a column from one type to another, i.e. number to varchar2(50), we may experience very long time. Note if we just change, for example, from varchar2(10) to varchar2(50), it will not be a problem, Oracle can finish in a split of second.

This was what we faced last night during the code release. The column was acutally added in the previous release, but for some reason it has not been used and is containing no data. The problem was that we would run out of the maintenance window if we were just waiting for it to complete (obsevered it would take 4-5 hours from v$session_longops view).

In this situation, the capability of renaming a column name comes to help. We renamed the column with a suffix '_old', and added a new column with the desired data type and name. It completed in seconds.

The syntax looks like:

SQL> alter table myschema.mytab rename column MYCOL to MYCOL_old;
SQL> alter table myschema.mytab add MYCOL varchar2(50);



It is noted in the Oracle online doc that when you rename a column, Oracle updates associated data dictionary tables to ensure that function-based indexes and check constraints remain valid.

So now comes to the question what is the best way to drop those *_old columns? - There are acutully 4 tables with 8 columns were addressed in this way.

Checking the online document again, there is a section in the Admin Guide (9i) that covers some options. I did not see we can do parallel drop of a column. I did notice there is an interesting option: checkpoint, which we may want to consider as our tables are very large. For example:

ALTER TABLE hr.admin_emp DROP UNUSED COLUMNS CHECKPOINT 250;


The doc describes:

" This option causes a checkpoint to be applied after processing the specified number of rows, in this case 250. Checkpointing cuts down on the amount of undo logs accumulated during the drop column operation to avoid a potential exhaustion of undo space."

Thursday, September 03, 2009

My Collection of DBA Scripts

In this post, I intend to list some scripts that I use often for daily DBA work and for trouble-shooting. I will update the list as necessary when new scripts come into play. I guess every DBA has his or her own handy scripts for work.

My List of DBA script:

event.sql

Show the overall picture of system activities by summarizing the session wait event count from v$session_wait. Used when doing health check or trouble-shooting. Check Tanel Poder's opinion about sampling v$session_wait.

get_ddl.sql

Generate DDL SQL statement for creating various objects such as tables, indexes, view etc. When want to know the constraints of a table or find the storage properties, I often find it pretty easy by obtaining the DDL of the table through this script. Also, when I find an index name looks like system-generated, I use this script to get the index DDL, which can tell me if the index is funtion-based and what kind of function it is.

login.sql

Used to give username@SID > as SQL* Plus prompte. Placed at the SQLPATH or working directory. Also give better AUTOTRACE explain plan output.

longops.sql

Display the progress of some long operations, such as table scan if it is long enough (> 6s). Some time it is useful to estimate how soon your SQL will finish.

perf_log_switch_history_daily.sql

Display number of log switch in every hour in a tabular format. Very useful to understand the workload distribution. From Jeff Hunter.
pxses.sql

I support a reporting database, on which parallel queries are frequently executed. This script displays the parallel execution server sessions, which is useful to verify if parallel execution happens and to check the degree of parallelism.

qlocks.sql

Display blocking and blocked session. An example of using this script can be found here.

sesevt.sql

Display a session information given wait event as input.

sessid.sql

Display a session information given SID as input.

sesusr.sql

Display a session information given USERNAME as input.

sqlsid.sql

Display the current running SQL of the session. Input session id (sid). Output the sql text, address, hash value (in 10g, can add sql_id)

sw.sql

Display wait event of a give session by SID. From Tanel Poder.

tabcols.sql

Display table column CBO statistics. Very useful when doing SQL tunning.

tabix.sql

List of the indexes of a table and show on which columns and in which order the indexes are.Very usefull when tune a SQL.

tbs.sql

Show a list of data files of a tablespace.

tf.sql

Show tablespace space usage. Used with tbs.sql when adding space or resize a data file of a tablespace.


xadr.sql

Obtain execution plan from the library cache. Good for 9i (The DB in my current working environment are mostly 9i). Need to input the address of the SQL (obtainable from v$sql or v$sqlarea) and child cursor number. I usually use sesusr.sql to find the SID of a user session given the user name, then use sqlsid.sql to find out the current running SQL, next using this script to find the execution plan.

Wednesday, September 02, 2009

Hit Bug 5888835 on a new Sun T5240 Server and somthing about dreadful latch

I copied a production database (9.2.0.8) to a new server at another data center. The database looked OK after open resetlogs; however, whenever I flushed the shared pool or shutdown the database, I would encounter the ORA-600 [504] error, i.e.

ORA-00600: internal error code, arguments: [504], [0x38006AC18], [160], [7], [shared pool], [2], [0], [0x38006AB28]


Per Oracle support, I hit a bug (Bug 5888835/5508574 ML428226.1 ), which applies to 9.2.0.8 to 10.2.0.3.

Based on ML 428226.1, there could be two causes of this bug:

(1) The latch directory size exceeds 255 when _kgl_latch_count > 31.
(2) Even when the _kgl_latch_count is equal to 0, if the cpu_count is >=32 the bug still applies.

This is due to the default value of _kgl_latch_count is calculated as next prime number after the value returned by CPU_COUNT. So, this bug could still apply if the cpu_count=32 as the _kgl_latch_count would be calculated to the next prime number that would be 37.


So this is about the dreadful latch, which remind me of an unpleasant interview experience, in which I was blind when was asked about this topic.

I did some searching and reading to understand further, below is a Q&A section about some basics:

Q: What is _kgl_latch_count?
A:
On this website ( http://www.ixora.com.au/q+a/library.htm), I found:

It sets the number of child library cache latches. The default is the least prime number greater than or equal to cpu_count. The maximum is 67. It can safely be increased to combat library cache latch contention, as long as you stick to prime numbers. However it is only effective if the activity across the existing child library cache latches is evenly distributed as shown in V$LATCH_CHILDREN


Q: What is parent latch and child latch?
A: A latch can be defined as a latch set. A latch set has a parent latch and several child latches. Parent latches are allocated statistically. Child latches are allocated dynamically. Both parent and child latches share the same latch name. The latches in a set are "chained" together to facilitate statistics gathering. Apparently, library cache latch is defined as a latch set.

Q: What is library cache ?
A:
The primary function of the library cache is to store shared cursors together with their parse trees and execution plans among other objects. The literary cache is structured as a hash table that is accessible through an array of hash buckets. Each hash bucket gives access to a linked list of library object handles. Library cache objects are made of several independent memory heaps and are accessed through the object handles.

Q: What is library cache latch?
A: When a library object is accessed, first, the lock is required on the handle, then the necessary object heaps are pinned. A library cache latch is acquired before the lock request and is released after it. Multiple library cache latch exist, and each one protects a range of hash buckets and the latch covers all associated heaps.

Some 'good' news about latch are also reached to me as the result of this searching and reading, the latch in Oracle is or will be gone! See:

Library cache latches gone in Oracle 11g

Library cache latch has gone?


OK, enough for latch for now. Hopefully, I could say something about latch if I would be interviewed again about this topic. Let's return to the bug, it can be fixed by:

1. set CPU_COUNT <=60.
I tested in this case, no ORA-600 error when shutdown or flush shared pool. Obviously we have 128 CPUs or Oracle thinks so, this setting will not take advantage of it.

2. Upgrade to at least 10.2.0.4

3. Apply the patch.
This was what we finally did. I tested that the patch worked as advertised.


As side notes, the following query can be used to check the value of _kgl_latch_count:

select a.ksppinm aa, b.ksppstvl bb
from x$ksppi a, x$ksppsv b
where a.indx=b.indx
and a.ksppinm like '%latch%count%';


If asked why we have 128 CPUs, I will refer to http://www.sun.com/servers/coolthreads/t5240/. It says this server can up to 128 compute threads. Oracle sees it as 128 CPUs.