Saturday, October 08, 2011

SQL Server - Troubleshooting Deadlock

There is a thread  in the SQLTeam forum that describes a deadlock scenario:
The application inserts a row into a status table. A trigger on the table fires so that when the status id inserted is a certain value it will call a stored procedure, which inserts a row into an accounting table. This whole thing is wrapped in a transaction so it should all be getting rolled back when there are errors. On occasion two threads of the application insert into the status table at nearly the same millisecond, and when both end up calling the stored procedure to insert into the accounting table a deadlock occurs and one is killed.

Two important details were unveiled during the discussion:
(1) the whole transaction took place with isolation level set to be serializable
(2) in the stored procedure, there is an existence check query doing full table scan  on the status table before inserting the row into it.

I try to simulate the deadlock situation for my own better understanding, especially this is the first time I know how to use DBCC trace to troubleshooting deadlock. It is a good practise.

First of all, I created the following test table:


-- create a test table

USE testdata
go

IF OBJECT_ID ( 'mytable', 'U' ) IS NOT NULL
DROP TABLE mytable;
GO


CREATE TABLE mytable
 (
  id  INT,
  val INT,
  padding VARCHAR(4000),
  );

create unique clustered index index01
on mytable(id);

INSERT INTO mytable
SELECT 1, 1, REPLICATE('a', 4000);

INSERT INTO mytable
SELECT 2, 2, REPLICATE('a', 4000);

INSERT INTO mytable
SELECT 3, 3, REPLICATE('a', 4000);

INSERT INTO mytable
SELECT 4, 4, REPLICATE('a', 4000);

go

delete from mytable where id in (2,4);

-- show the location of rows
select
       a.%%physloc%%                          AS Address,
       sys.fn_PhysLocFormatter(a.%%physloc%%) AS AddressText,
       a.id
FROM   mytable a
ORDER BY 2;
go


The remaining two rows in the mytable are located in different page as shown below:

Address    AddressText            id
---------- ---------------------- -----------
0x99000000 (1:153:0)                        1
0xFF420000 (1:17151:0)                      3

Secondly I turn on the deadlock trace:

1> DBCC traceon(1222, -1)
2> go
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


Then I run the following two scripts in two different SSMS windows:


-- script: ins_1.sql 
use testdata
go

set transaction isolation level serializable
begin tran
go
if not exists (select id from mytable where val = 2)
begin
  waitfor delay '00:01:00'
  insert into mytable values (2,2, replicate('a','4000'))
end
go
commit tran
go

-- script: ins_2.sql
use testdata
go

set transaction isolation level serializable
begin tran
go
if not exists (select id from mytable where val = 4)
begin
  waitfor delay '00:01:00'
  insert into mytable values (4,4, replicate('a','4000'))
end
go
commit tran
go







Withing the 1 min delay, run sp_lock procedure, I can see the key-range locks are requested by the two sessions:

spid   dbid   ObjId       IndId  Type Resource                         Mode     Status
------ ------ ----------- ------ ---- -------------------------------- -------- ------
53     8      0           0      DB                                    S        GRANT
52     8      0           0      DB                                    S        GRANT
52     8      0           0      MD   49(90c8bd04:fb81833:777569)      Sch-S    GRANT
53     8      0           0      MD   49(ead7fb30:916efee:7da7e5)      Sch-S    GRANT
53     8      66099276    1      PAG  1:153                            IS       GRANT
52     8      66099276    1      PAG  1:153                            IS       GRANT
53     8      66099276    1      PAG  1:17151                          IS       GRANT
52     8      66099276    1      PAG  1:17151                          IS       GRANT
53     8      66099276    0      TAB                                   IS       GRANT
52     8      66099276    0      TAB                                   IS       GRANT
53     8      66099276    1      KEY  (ffffffffffff)                   RangeS-S GRANT
52     8      66099276    1      KEY  (ffffffffffff)                   RangeS-S GRANT
53     8      66099276    1      KEY  (03000d8f0ecc)                   RangeS-S GRANT
52     8      66099276    1      KEY  (03000d8f0ecc)                   RangeS-S GRANT
53     8      66099276    1      KEY  (010086470766)                   RangeS-S GRANT
52     8      66099276    1      KEY  (010086470766)                   RangeS-S GRANT



Key-range locks protect a range of rows implicitly included in a record set being read by a Transact-SQL statement while using the serializable transaction isolation level. The serializable isolation level requires that any query executed during a transaction must obtain the same set of rows every time it is executed during the transaction. A key range lock protects this requirement by preventing other transactions from inserting new rows whose keys would fall in the range of keys read by the serializable transaction.


After about 1 min, one of the sessions received the following message:

Msg 1205, Level 13, State 48, Line 4
Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Msg 3902, Level 16, State 1, Line 1
The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.


In the error log, I can find the following:
--------------------- start of error log  ------------------

2011-10-08 11:01:06.38 spid55      DBCC TRACEON 1222, server process ID (SPID) 55. This is an informational message only; no user action is required.
2011-10-08 11:14:42.43 spid16s     deadlock-list
2011-10-08 11:14:42.43 spid16s      deadlock victim=process3313558
2011-10-08 11:14:42.43 spid16s       process-list
2011-10-08 11:14:42.43 spid16s        process id=process3313558 taskpriority=0 logused=132 waitresource=KEY: 8:72057594047889408 (03000d8f0ecc) waittime=3935 ownerId=623111 transactionname=user_transaction lasttranstarted=2011-10-08T11:13:38.497 XDES=0x574bb30 lockMode=RangeI-N schedulerid=4 kpid=8728 status=suspended spid=52 sbid=0 ecid=0 priority=0 trancount=2 lastbatchstarted=2011-10-08T11:13:38.497 lastbatchcompleted=2011-10-08T11:13:38.497 clientapp=Microsoft SQL Server Management Studio - Query hostname=TUSNC012LKVT006 hostpid=4412 loginname=US1\v983294 isolationlevel=serializable (4) xactid=623111 currentdb=8 lockTimeout=4294967295 clientoption1=671090784 clientoption2=390200
2011-10-08 11:14:42.43 spid16s         executionStack
2011-10-08 11:14:42.43 spid16s          frame procname=adhoc line=4 stmtstart=182 stmtend=296 sqlhandle=0x02000000d2ef471c04bdc8903318b80f6975779f09b41ba8
2011-10-08 11:14:42.43 spid16s     insert into mytable values (2,2, replicate('a','4000'))     
2011-10-08 11:14:42.43 spid16s         inputbuf
2011-10-08 11:14:42.43 spid16s     if not exists (select id from mytable where val = 2)
2011-10-08 11:14:42.43 spid16s     begin
2011-10-08 11:14:42.43 spid16s       waitfor delay '00:01:00'
2011-10-08 11:14:42.43 spid16s       insert into mytable values (2,2, replicate('a','4000')) 
2011-10-08 11:14:42.43 spid16s     end
2011-10-08 11:14:42.43 spid16s        process id=process69eaa8 taskpriority=0 logused=132 waitresource=KEY: 8:72057594047889408 (ffffffffffff) waittime=9293 ownerId=623056 transactionname=user_transaction lasttranstarted=2011-10-08T11:13:32.760 XDES=0xff48c10 lockMode=RangeI-N schedulerid=2 kpid=11340 status=suspended spid=53 sbid=0 ecid=0 priority=0 trancount=2 lastbatchstarted=2011-10-08T11:13:32.760 lastbatchcompleted=2011-10-08T11:13:32.760 clientapp=Microsoft SQL Server Management Studio - Query hostname=TUSNC012LKVT006 hostpid=4412 loginname=US1\v983294 isolationlevel=serializable (4) xactid=623056 currentdb=8 lockTimeout=4294967295 clientoption1=671090784 clientoption2=390200
2011-10-08 11:14:42.43 spid16s         executionStack
2011-10-08 11:14:42.43 spid16s          frame procname=adhoc line=4 stmtstart=182 stmtend=296 sqlhandle=0x02000000eec4b91030fbd7eaeeef1609e5a77d124395f09a
2011-10-08 11:14:42.43 spid16s     insert into mytable values (4,4, replicate('a','4000'))     
2011-10-08 11:14:42.43 spid16s         inputbuf
2011-10-08 11:14:42.43 spid16s     if not exists (select id from mytable where val = 4)
2011-10-08 11:14:42.43 spid16s     begin
2011-10-08 11:14:42.43 spid16s       waitfor delay '00:01:00'
2011-10-08 11:14:42.43 spid16s       insert into mytable values (4,4, replicate('a','4000')) 
2011-10-08 11:14:42.43 spid16s     end
2011-10-08 11:14:42.43 spid16s       resource-list
2011-10-08 11:14:42.43 spid16s        keylock hobtid=72057594047889408 dbid=8 objectname=testdata.dbo.mytable indexname=index01 id=lock55da680 mode=RangeS-S associatedObjectId=72057594047889408
2011-10-08 11:14:42.43 spid16s         owner-list
2011-10-08 11:14:42.43 spid16s          owner id=process69eaa8 mode=RangeS-S
2011-10-08 11:14:42.43 spid16s         waiter-list
2011-10-08 11:14:42.43 spid16s          waiter id=process3313558 mode=RangeI-N requestType=convert
2011-10-08 11:14:42.43 spid16s        keylock hobtid=72057594047889408 dbid=8 objectname=testdata.dbo.mytable indexname=index01 id=lock627aa80 mode=RangeS-S associatedObjectId=72057594047889408
2011-10-08 11:14:42.43 spid16s         owner-list
2011-10-08 11:14:42.43 spid16s          owner id=process3313558 mode=RangeS-S
2011-10-08 11:14:42.43 spid16s         waiter-list
2011-10-08 11:14:42.43 spid16s          waiter id=process69eaa8 mode=RangeI-N requestType=convert


--------------------- end of error log  ------------------

The solution to this deadlock situation could be (1) using repeated read isolation level  and/or (2) create an index on the 'val' column. Bottom line, we have to fully understand the requirement to code properly: why we need to insert into a table then using trigger to insert into another table? But this is out of my reach.  Here I am just satisfied with being familiar with SQL Server lock types and DBCC trace command a little bit.

The deadlock shown in this test case belong to a type of deadlock named conversion deadlock. Looking closely at the "resource-list" section of the error log. We can see that both sessions hold the same resource in shared mode (RangeS-S) at the begining. This is no problem. But both sessions then request to convert the lock type to RangeI-N, this reqires the other session give up the RangeS-S lock first. Thus a deadlock happens.

No comments: