A discussion about a forum post for handling a running subtraction in a SQL query.

For my first Forum Fun post, I'm going to discuss a running calculation SQL question posted by a user, asyssolvers, on the SQLTeam website here. Here's the question [slightly shortened]:

I need to get result as in second chart. Basically I have a total of received quantity in each line and I need to show received quantity against Expected Quantity; if there is any shortage I need to show in last line.

First Result Set [Chart]:

Item     ExpectedQty     ReceivedQty     Short
Item01     30         45         5
Item01     20         45         5
Item02     40         38         2
item03     50         90         10
item03     30         90         10
item03     20         90         10

Query for first Result Set:

select a.Item, a.ExpectedQty,b.ReceivedQty, b.Short
from a join b on a.Item = b.Item

Second Result Set [Chart]:

Item     ExpectedQty     ReceivedQty     Short
item01     30         30         0
item01     20         15         5
item02     40         38         2
item03     50         50         0
item03     30         30         0
item03     20         10         10

Below is the query I came up with as a response. But first the disclaimer: This query will not work in situations where the Expected table has many records because of the < in the join condition for the derived table. If there are under 300,000 or so rows in the Expected table, then this query will work without issues. Once there's more than 500,000 to 600,000 records, it starts to get somewhat slow, and it shouldn't be used if there are over a few million records. In other words, it doesn't scale very will. Depending on the task at hand, this may be a deal breaker or no big deal.

-- Use table names Expected and Received instead of a and b 
-- First, set up the tables and add dummy data to work with: 
CREATE TABLE Expected(dt datetime, Item VARCHAR(10), ExpectedQty INT) 
CREATE TABLE Received(Item VARCHAR(10), ReceivedQty INT, Short INT) 
 
INSERT INTO Expected VALUES('1/1/2011','Item01','30') 
INSERT INTO Expected VALUES('1/2/2011','Item01','20') 
INSERT INTO Expected VALUES('1/1/2011','Item02','40') 
INSERT INTO Expected VALUES('1/1/2011','Item03','50') 
INSERT INTO Expected VALUES('1/2/2011','Item03','30') 
INSERT INTO Expected VALUES('1/3/2011','Item03','20') 
INSERT INTO Received VALUES('Item01','45', '5') 
INSERT INTO Received VALUES('Item02','38', '2') 
INSERT INTO Received VALUES('Item03','90', '10') 
 
-- The primary query: 
SELECT 
    ex.item, 
    ex.ExpectedQty, 
    ex.ExpectedQty - CASE WHEN Short > threshholdForShort 
        AND Short-threshholdForShort <= ExpectedQty THEN Short-threshholdForShort 
        WHEN Short > threshholdForShort AND Short-threshholdForShort > ExpectedQty 
        THEN ExpectedQty ELSE 0 END AS 'ReceivedQty', 
    CASE WHEN Short > threshholdForShort AND Short-threshholdForShort <= ExpectedQty 
        THEN Short-threshholdForShort 
        WHEN Short > threshholdForShort AND Short-threshholdForShort > ExpectedQty 
THEN ExpectedQty ELSE 0 END AS 'TotalShort' 
FROM ( 
        -- This derived table gives an indication of how significant 
        --the 'Short' value needs to be before it impacts this particular record. 
        SELECT e1.dt, e1.item, isnull(SUM(e2.ExpectedQty), 0) AS 'threshholdForShort' 
        FROM Expected e1 
        LEFT JOIN Expected e2 ON e1.item = e2.item AND e1.dt < e2.dt 
        GROUP BY e1.dt, e1.item 
) minShortVal 
INNER JOIN Expected ex ON ex.item = minShortVal.item AND ex.dt = minShortVal.dt 
INNER JOIN Received rec ON minShortVal.item = rec.item

How did I come up with this query? After reading the forum question, my first task was to get a solid understanding of the requirements. After reviewing the provided SQL and output tables, I soon realized that one ReceivedQty value for an item in the first result set was the sum of the expected quantity values minus one of the short values. I next needed some sort of ordering for the records [none was provided in the question, but I assumed one was likely available - either a date or an identity field]. I decided to use a date field for my response. The ordering is needed so the received values can be matched to the expected quantities in a certain order [earlier records match first, later records match afterwards].

With a better understanding of the problem, I went about coming up with a solution. I first created the tables and some dummy data to work with. I tweaked the data until the original simple query returned the first result set in the original question.

I needed a query to keep track of the order, so I came up with the derived table in the above solution to handle this. For example, if the data is the following [with a combined expected quantity of 170] and the quantity short is 100, then the thresholds are:

order    item    ExpectedQty    ReceivedQty    Short        ThreshholdForShort
1    xyz     20         20        0        170-20    = 170
2    xyz     40         40        0        170-(20+40) = 110
3    xyz     30         10        20        170-(20+40+30) = 80
4    xyz     50         0        50        170-(20+40+30+50) = 30
5    xyz     30         0        30        0

Here, we would have to be 170 short to fully impact record 1, and 170-20 = 150 or lower to safely guarantee that record 1 isn't short at all. In this example, since we are 100 short, record 3 is partially short, while for the records 1 and 2 we matched the expected to the received, and for the latter records, the quantity was short/unfulfilled. With this logic in place, I then came up with the SQL in the case statements to show the short and received quantities as appropriate.

After building out the query and testing, I was able to get the data from the second result set. A success! I tested a few variations of the input data just to make sure I got the logic right and didn't miss anything. All checks worked, so I posted the solution.

I've since done some additional performance testing of the query with tables of larger sizes, and while it doesn't scale too well [see the disclaimer above], the approach can be useful in cases where the amount of data isn't too great for problems similar in structure to this one.

推荐.NET配套的通用数据层ORM框架:CYQ.Data 通用数据层框架
新浪微博粉丝精灵,刷粉丝、刷评论、刷转发、企业商家微博营销必备工具"