What is faster inside SUM & AVG: 0 or NULL?

January 18th, 2016

There are three aggregate functions that are most often used in practice: COUNT, SUM and AVG. The first one has already been discussed previously, while the other two have interesting performance nuances. But first, some theory…

When using aggregate functions in the execution plan, there may be two operators: Stream Aggregate and Hash Match, depending on the input stream.

The first may require pre-sorted set of input values, while Stream Aggregate does not block the execution of subsequent operators.

In turn, Hash Match is a blocking operator (with rare exceptions) and does not require sorting of the input stream. Hash Match uses a hash table that is created in memory, and in the case of incorrect assessment of the expected number of rows, the operator can spill the results into tempdb.
In summary, Stream Aggregate works well with small sorted data sets, and Hash Match copes well with large unsorted sets and can be easily subject to parallel processing.

Now that we have mastered the theory, let’s look how the aggregate functions work.

Suppose we need to calculate the average price of all products:

using the table with a fairly simple structure:

Since we have a scalar aggregation here, we expect to see Stream Aggregate on the execution plan:

This operator executes two aggregation operations COUNT_BIG and SUM (although, on the physical layer, this is executed as a single operation) on the Price column:

We should not forget that the average is calculated only for NOT NULL, because the COUNT_BIG operation uses a column, not an asterisk. Accordingly, the query is:

returns 6, not 4.

Now, let’s look at Compute Scalar, which has an interesting expression to check division by zero:

And try to calculate the total sum:

The execution plan remains the same:

But if you look at the operations performed by Stream Aggregate

you can be slightly surprised. Why SQL Server counts, if I only need the sum? The answer lies in Compute Scalar:

If you do not take into account COUNT, than according to T-SQL semantics, when there are no rows in the input stream, we should get NULL, and not 0. This behavior true for both (scalar and vector) aggregations:

Moreover, such a check is made both for NULL, and NOT NULL columns. Now let’s consider examples, in which the above-described features of SUM and AVG are effective.

If you want to calculate the average, do not use COUNT + SUM:

Since such a request would be less effective than the explicit use of AVG.

Furthermore … there is no need to explicitly pass NULL to the aggregate function:

Since in this query:

The optimizer automatically makes a substitution:

But what if I want to get 0 in the results, instead of NULL? People often use ELSE without hesitation:

Obviously, in this case, we will achieve the desired result … and one warning will no longer be an eyesore:

Although, it is better to write the query like this:

And this is good, not because the CASE operator will work faster. We already know that the optimizer automatically inserts ELSE NULL there … So what are the advantages of the last variant?

As it turned out, the aggregation operations, in which NULL values dominate, are processed faster.

The execution took:

Now, let’s change value:

And execute again:

It is not that essential, but in certain situations it still provides a reason for optimization.
The end of the play, the curtain falls? No. That’s not all…
As one of my friends says: “There is neither black nor white… the world is multicolored” and so I will provide an interesting example, where NULL can do harm.

Let’s create a slow function and a test table:

And execute the query:

Now let’s try to add ISNULL for expression that is passed to SUM:

The execution speed decreased twofold. I must say that it’s not magic… but a bug in the SQL Server engine that has been “fixed” by Microsoft in SQL Server 2012 CTP.

The essence of the problem is as follows: the result of the expression inside the SUM or AVG function can be evaluate twice if the optimizer believes that it can return NULL.

Everything was tested on Microsoft SQL Server 2012 (SP3) (KB3072779) – 11.0.6020.0 (X64).
All execution plans from dbForge Studio for SQL Server.

Leave a Comment