A followup to a post about subqueries with an estimation for how long it would take for the query to run.

In a recent post about subqueries, I compared 4 queries and how they handled 4 million rows of data. The queries involved:

1) Joins with AND conditions.
2) Subqueries with AND conditions.
3) Joins with OR conditions.
4) Subqueries with OR conditions.

I compared the speed differential between all of them [on SQL Server], but the last of the queries never finished. As I indicated I might do, I left it running overnight – it still never finished!

Well, my curiosity about how long it would take for the query to finish wasn’t diminished – it was only enhanced. How could I find out and at least be able to estimate how long it would take? I needed to have some sort of understanding of how the performance was degrading as more records were added to the table. So I did that. I ran the query using a smaller number of records in the StudentTestData table. Here’s the results:

# records Run Time
25,000 6 seconds
50,000 23 seconds
100,000 99 seconds
200,000 505 seconds

My anticipation is SQL Server was doing some sort of cross-join type comparison in the background, and the numbers above seem to support this. The efficiency, in Big-O notation, is around O(n^2). Since I want to work with actual numbers to estimate how long the query would run with N = 4 million rows, here’s a rough function:

Number of Seconds = 6*(N/25000)^2, where N is the number of rows.

This function underestimates the time requirement as you add more records, as it is slightly too low for N=100,000 and it is about 2 minutes off for N=200,000. Despite this, it gives me something to work with.

As a spot-check, I tried N = 80,000 students. The function expectation is 61 seconds, and the actual time the query ran was 60 seconds. That’s about as close as I can ask for.

But what about if N is 4 million rows as it was in the previous post? Based on the function, the expectation is over 43 hours. Also, this is an understatement given the results above for N=200,000. It’s a good thing I didn’t wait for it to finish!

推荐.NET配套的通用数据层ORM框架:CYQ.Data 通用数据层框架
新浪微博粉丝精灵,刷粉丝、刷评论、刷转发、企业商家微博营销必备工具"