Thursday, 3 July 2008

Generic and efficient paging query for SqlServer and Oracle

A friend just pointed at an old post of mine in my blog
in Turkish. Basically, it's a parameterized wrapper query
which returns rows in an interval among the results
of a given query.


It's possible to implement this the same way in
both Sql Server 2005 and Oracle 10i. I had tested it quite
well to assure that it's better than alternative methods.
I had implemented this solution for our XSLT-enabled
data and view framework for asp.net 2.0, while I was
working in Netsis.


Assumed that your raw query is:

SELECT * FROM friends;

And you need just last 10 records from a 4million-records
table. (you're very popular)

You need to wrap it this way:

WITH
RawTable as ( SELECT * FROM friends ),
NumberedTable as
(SELECT *, ROW_NUMBER() OVER
(ORDER BY @@IDENTITY) AS rownum
FROM RawTable)
SELECT * FROM NumberedTable
WHERE rownum <> 3499990

As far as I can remember, It was quite better than these
alternatives:

- Server-side cursors of ADO.NET on SQL2005
- Embedded subqueries with two seperate SELECT TOP N

And using ROW_NUMBER() over @@IDENTITY keyword
had helped a lot, instead of giving it a name of a column
to order. It was both fast and generic.

Use responsibly :)

2 comments:

Ozgur said...

First, where do you get that number "3499990"? As far as I understand, the initial problem was getting last 10 records from a large table, but this wrapper query gets the ones which have row number greater than some number. I know these two are quite similar, yet they are not identical. Maybe one should use a statement like "COUNT(*) - 10" as that threshold number.

Second - which i think was a small typo - what does "rownum<>3499990" mean? Shouldn't it be "rownum>3499990"?

Anyway, why are we dealing with such a performance problem, while those famous vendors do not offer an efficient way...

Görkem PAÇACI said...

Yes, you are right. That was a typo, and also the query was incomplete.

WHERE rownum <> 3499990

should be:

WHERE rownum < 3500000 AND rownum > 3499990

It was a silly copy-paste error because of <div> restrictions in the wordpress template.

You are certainly right on your second point as well. One needs to execute a COUNT query first to be able to prepare this query. I mostly perform these two in a single transaction, so it isn't a big deal over performance.

The sole purpose of this query is to fetch a number of rows in the middle of a result set. (e.g. 30..39 out of 100)

An initial approach is like this for most people:
SELECT TOP 10 * FROM
SELECT TOP 70 * FROM tblFriends ORDER BY id DESC

But it was almost three times slower than my example, as far as I can remember.(4sec to 17sec for a 4m records table)

A more advanced option is using a Server cursor. In this option you execute your query in a stored module, seek to a row in your result set, and fetch desired number of rows to a temporary table, and return that temporary table for result of the stored module. But this approach was quite slow as well.

Oh, those mighty "vendors"...

For example, the DataGridView control in the asp.net. You can easily tell it to fetch rows 3499990..3500000. What it does is loading all the table into the IIS cache, and fetching given interval manually. It takes ages to execute, and lots of memory space in your IIS worker process.