Thursday, September 08, 2011

SQL Server - test case showing rows moved when creating a clustered index

Recently I have determined to study SQL Server. Of course, this is not the first time I attempt to do so. Just this time I decided not to spend time on Oracle voluntarily and to concentrate on SQL server instead. 

I have noticed that in SQL Server it is highly recommended that every table should have a clustered index. A table with a clustered index (CI) conceptually corresponds to the Index Organized Table (IOT) in Oracle. Basically, the leaf node of the index contains the data block (called B+ tree index). I also found a blog post that attempted to explain why SQL Server needs to use CI to achieve performance while IOT is not used in Oracle as widely as the CI in SQL Sever. But I am still not very clear about the reasons after reading the post.

If a table does not have a CI and we want to create  CI on it, the existing rows should be reorganized to occupy the pages in the sorted key order. This is what I try to confirm with the following test case.

Test case:
-- script: rowmove.sql
-- purpose: demonstrate table rows moved when creating clustered index
-- env    : SQL Server 2008
--

:setvar sqlcmdheaders            40     -- "page size"
:setvar sqlcmdcolwidth          132     -- line width
:setvar sqlcmdmaxfixedtypewidth  32     -- max column width (fixed length)
:setvar sqlcmdmaxvartypewidth    32     -- max column width (varying length)

USE testdata
go

IF OBJECT_ID ( 'mytable', 'U' ) IS NOT NULL 
DROP TABLE mytable;
GO


CREATE TABLE mytable
 (
  id  INT,
  col1 VARCHAR(2000),
  col2 VARCHAR(2000));

INSERT INTO mytable
SELECT 1, REPLICATE('a', 2000), REPLICATE('b', 2000);

INSERT INTO mytable
SELECT 4, REPLICATE('a', 2000), REPLICATE('b', 2000);


INSERT INTO mytable
SELECT 2, REPLICATE('a', 2000), REPLICATE('b', 2000);

INSERT INTO mytable
SELECT 3, REPLICATE('a', 2000), REPLICATE('b', 2000);


go

select 
       a.%%physloc%%                          AS Address,
       sys.fn_PhysLocFormatter(a.%%physloc%%) AS AddressText,
       a.id
FROM   mytable a
ORDER BY 2;
go

print " "
print "  ----  After created clustered index on ID ----"
print " "

go

create unique clustered index index01 
on mytable(id);
go

select 
       a.%%physloc%%                          AS Address,
       sys.fn_PhysLocFormatter(a.%%physloc%%) AS AddressText,
       a.id
FROM   mytable a
ORDER BY 2;
go



Test output:

Changed database context to 'testdata'.

(1 rows affected)
Address    AddressText                      id         
---------- -------------------------------- -----------
0xFA420000 (1:17146:0)                                1
0xFA420000 (1:17146:1)                                4
0xFD420000 (1:17149:0)                                2
0xFD420000 (1:17149:1)                                3

(4 rows affected)
 
  ----  After created clustered index on ID ----
 
Address    AddressText                      id         
---------- -------------------------------- -----------
0x99000000 (1:153:0)                                  3
0x99000000 (1:153:1)                                  4
0xFE420000 (1:17150:0)                                1
0xFE420000 (1:17150:1)                                2

(4 rows affected)



It can be seen that in this test case one page can hold maximum 2 table rows. At the begining, table rows with id=1 and 4 reside on page 17146, and with id=2,3 resides on page 17149. This is understandable as we inserted the rows in the order of 1,4,2,3. After create a CI on ID column, the rows with id=1,2 moved to page 153 and rows with id=3,4 moved to page 17150. So the rows occupies the pages in the sorted order of ID. It should be noted that the pages that belong to the table do not need to physically on the disk files closely to each other.

3 comments:

Martin Preiss said...

Denis,
in 2010 Jonathan Lewis wrote a series of articles in the simple talk journal and translated some SQL Server Issues into Oracle. In Part 4 (http://www.simple-talk.com/sql/learn-sql-server/oracle-to-sql-server-putting-the-data-in-the-right-place/) he discussed clustered indexes. Perhaps you find some interesting details there.

Regards

Martin

yds said...

Martin
Thank you. I know this series but have not read the part 4 till now. It is interesting to read and get to know lots of details about CI.

yds said...

Today I read thoroughly about part 3 and part 4. JL gives clear explaination why heap tables are bad in SQL server.

part 3:
http://www.simple-talk.com/sql/learn-sql-server/oracle-to-sql-server,-crossing-the-great-divide,-part-3/