Friday, August 11, 2006

Average Number of Rows Per Data Block

(Note: Excerpt from a book that I can not remember)

Real World Scenario: How Can I Really Use SUBSTR?

A handy DBA use for SUBSTR is to count the average number of rows per data block in a table. Knowing the average number of rows per block will let you estimate disk space for that table.

A DBA frequently needs to estimate the disk space that a certain table will require. If you have a sample of a thousand or so rows of real data, you can load it, measure it, and accurately estimate the amount of disk space that will be required for the full data load. For example, if you know that an average of 100 rows fit in each 4KB data block, it becomes easy to estimate disk space for 1,000,000 rows as 1,000,000 rows / 100 rows per data block * 4KB per data block, which yields 40,000KB.

To count rows per data block, you need to use ROWIDs. ROWIDs have the format OOOOOOFFFBBBBBBRRR, where the Os represent the OID (object ID), the Fs represent the relative file number, the Bs the block number, and the Rs the row number within the block. You can count rows grouped on the O, F, and B parts of the ROWID (see Chapter 4 for more information on aggregate functions and grouping) to get the number of rows in each data block. This becomes the inline view or FROM subquery below (see Chapter 5, "Joins and Subqueries," for more information on subqueries). The main query then reports the minimum number of rows in a data block, the maximum number of rows in a data block, the average number of rows per data block, and the sum of all the rows in the table. (Note that if your table has chained rows, this technique will not properly count those chained blocks.)


SELECT MIN(cnt), MAX(cnt), AVG(cnt), SUM(cnt)
FROM (SELECT COUNT(*) cnt
FROM customer_orders
GROUP BY SUBSTR(ROWID,1,15));


MIN(CNT) MAX(CNT) AVG(CNT) SUM(CNT)

---------- ---------- ---------- ----------

60 332 202.145213 1446349

No comments: