It is a little bit off topic for my blog, but I came across a situation today while at work where I needed to page the results of a stored proc, but at the same time I needed to make sure that the database wasn’t looking through so many rows that it would cause a slowdown. The solution that I came up with is not exactly the most novel solution that exists, but it certainly did the job. And from the testing I did, it is pretty efficient. I also realized how many new features in SQL Server 2005 it used (so old school), and so, I figured I’d throw it up here hoping that it might help someone. Also, hopefully someone will point out something that could be done better.
So, lets see some code!
DECLARE @Top int
SET @Top = 100
DECLARE @StartDate datetime
SET @StartDate = '1/1/1900'
DECLARE @EndDate datetime
SET @EndDate = '12/31/2010'
DECLARE @PageNum int
SET @PageNum = 1
DECLARE @PageSize int
SET @PageSize = 10
;WITH orders_limit AS
(
SELECT TOP (@TOP)
[Id],[OrderDate],[SubTotal],[TaxTotal],[OrderTotal]
FROM [dbo].[Orders]
WHERE
OrderDate > @StartDate
AND OrderDate < @EndDate
),
orders AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY OrderDate DESC) AS [row_num],
COUNT(*) OVER () AS [total_rows],
*
FROM orders_limit
)
SELECT * FROM orders
WHERE row_num BETWEEN (((@PageNum - 1) * @PageSize) + 1)
AND (@PageNum * @PageSize)
ORDER BY row_num
If you just ignore the top part where I am declaring variables, then the code really isn’t all that long. I am going to walk through each part real quick, but let me first just say that this code only has a certain level of applicability. The idea here is that this code could be used for areas where are user is allowed to enter sort criteria for a table, but you don’t want them entering criteria that would select out 10 million rows. Even though the query is using paging, it is still getting a total row count to use in our pager. So even though we are not returning that many rows, we would still filter them to get a count. The problem is that for really large queries, the simple act of filtering to get a total row count could kill the query entirely.
So, the first thing that you may notice here is the “WITH” statement. It is actually a new feature in SQL Server 2005 (so old school) called Common Table Expressions or CTEs as you usually see them referred. The semi-color before the WHERE is only there because any statement before a CTE needs to be terminated with a semi-colon. You’ll often see people always put a semi-color before the WITH in order to reduce confusion. A Common Table Expression simply allows you to define a variable that you can later use just like a table to query over. In order to make it look a bit more simple, check this out:
WITH orders AS
(
SELECT Id
FROM [dbo].[Orders]
)
SELECT * FROM orders WHERE id % 2 = 0
Obviously you would not want to do this. You can easily do this without the CTE, I just wanted to get all of the junk out of the way so that you could see it in its most simple form. This CTE will select all Ids from the [Orders] table and put it into a variable called “orders” which our SELECT statement uses to filter it down by even ids. Not sure why you would want the even Ids, but here you go anyways.
Now, in the first example above you’ll see that after the first CTE there is a comma “,” and then “orders AS”. Well, this is just a way to define multiple CTEs in a row, and you can query the previous CTE from within the next CTE. A CTE must be immediately followed by a query that uses it, so this is the only way in which to chain multiple CTEs together. To let you see it a bit easier, check this out:
WITH orders AS
(
SELECT Id
FROM [dbo].[Orders]
)
,sub_orders AS
(
SELECT * FROM orders WHERE id % 2 = 0
)
SELECT * FROM sub_orders WHERE id % 3 = 0
Again, you do not want to do this. But what is happening here is that first we select only even numbers, then we select only multiples of three. As you can see though, the second CTE uses the results of the first CTE. Quite neat, and very easy to read.
So, now that CTEs are out of the way, the next new feature that was used is ROW_NUMBER. ROW_NUMBER is a function that allows you to apply an incrementing series of digits across a set. Here we are using it in conjunction with the OVER statement in order to get a row number over the OrderDate descending. Notice at the end that we will order by “row_num” which will give us all of our newest orders first. ROW_NUMBER can also be used over partitions, and has other uses, so if you want to know more go here.
Yet another new SQL Server 2005 feature is the COUNT(*) OVER () call. This uses OVER with COUNT so that we can get a count of rows over a partition, but here we aren’t passing anything to it, so we are going to get a count of the total number of rows that match the query. If we had entered a PARTITION BY into the parentheses here, then we could have gotten counts based on a predicate. For example, if our query had a user Id column, then we could have done COUNT(*) OVER (PARTITION BY UserId) and we have received a COUNT of the total number of orders for that User Id with each row containing that User Id. Another very useful new feature.
And last, but not least, is the ability to specify the TOP in a variable now, and not have to hardcode it. Not a big feature, but certainly opens up new possibilities.
So, back to the original query. You can now see that we are filtering out our result set by using a TOP statement. This way we can limit the total number of rows that will ever get returned. Once the rows leave the “orders_limit” CTE, they are going to move into the “orders” CTE where we will apply ROW_NUMBER and get our COUNT. And finally the rows will head down to our final query where they will be ordered by row_num and then filtered out by only the page that we need.
I know that for a lot of you this may be super old news, but for those who haven’t had much experience with SQL Server 2005 yet this will serve as a primer so that you can get up to speed when SQL Server 2008 hits later this year!
No comments:
Post a Comment