When working with databases, we often encounter rows that lack data. This missing data may result from unknown or inapplicable values, errors during data import or input, or specific calculations involving non-existent values. In such cases, there are two ways to represent missing data: NULLs and empty (or blank) values.
While these might seem identical at first glance, they are different and impact essential database operations in distinct ways. This article explores the differences between NULLs and empty values in SQL Server and discusses how to handle them effectively.
Contents
- Null and empty values in SQL Server
- Standard approaches to finding NULL or empty values
- Built-in SQL Server functions
- Advanced techniques for managing NULLs or empty values
- Performance considerations for large datasets with NULLs and empty values
- Conclusion
NULL and empty values in SQL Server
NULL represents missing or unknown data in a database column. This can occur in two scenarios: when the data does not exist or when it exists but is currently unknown. NULL can be assigned to fields of any data type, including strings, integers, and dates. Importantly, the field is not allocated memory since NULL signifies an unknown value.
In contrast, an empty or blank space in a database refers to an empty character or a whitespace character. While its meaning may seem similar to NULL, it is stored and retrieved like any other character in a text field. Empty strings are specific to string columns and cannot be applied to different data types.
For example, consider a table with product information, where one column stores warranty details. A NULL value in this column indicates that the warranty period was not specified. In contrast, an empty value signifies that the product does not have a warranty.
In databases, NULL values and blank strings differ in definition, syntax, and length, and they are treated differently in query and data manipulation. Therefore, detecting NULL and empty values separately is often essential. Most database management systems, including SQL Server, offer tools and functions to handle this distinction effectively.
Standard approaches to finding NULL or empty values
Depending on the context, NULLs and empty values may be treated together if they represent a similar concept or separately if they carry distinct meanings or conditions in the data model. This distinction can significantly impact both query performance and the accuracy of results.
The most common case is when it is necessary to avoid NULL value errors by removing both NULLs and empty values or by replacing NULLs with other values (like empty). To manage this effectively, users need reliable methods to identify both NULL and empty column values. This guide explores the built-in tools available in SQL Server, including specialized queries and functions.
Using the IS NULL operator
The IS NULL operator in SQL Server checks whether a column or an expression contains a NULL value. The basic query syntax is as follows:
SELECT column_names
FROM table_name
WHERE column_name IS NULL;
Let’s look at a simple example. In this and subsequent examples, we’ll use the popular AdventureWorks2022 test database for SQL Server and dbForge Studio for SQL Server to demonstrate the test cases.
Suppose we need to retrieve a list of products, including their names and weights, where the weight is less than 10 pounds or the color is unknown (i.e., NULL). Below is the query to achieve this:
SELECT
pt.ProductID
,Name
,Weight
,Color
FROM [Product.Test] pt
WHERE Color IS NULL
Searching for empty strings
As we mentioned earlier, an empty value is a string of characters with a length of zero, and this causes issues because an empty string is not equal to a NULL value. SQL Server treats them differently, and using the IS NULL operator in a query with the WHERE condition does not return empty strings. The condition syntax to search for empty values is:
WHERE column_name = ''
Therefore, the basic query syntax is:
SELECT column_names
FROM table_name
WHERE column_name = ''
Assume that we want to retrieve the list of products where the Style column contains empty values. In this case, we need to search for empty values:
SELECT
pt.Name
,pt.ProductNumber
,pt.Style
FROM [Product.Test] pt
WHERE pt.Style = ''
Frequently, users need to obtain both the NULLs and empty values. We can then combine the IS NULL operator with the search for empty values using the OR operator as below:
SELECT
column_names
FROM table_name
WHERE column_name = ''
OR column_name IS NULL
We want to check whether all products are assigned the ListPrice values in the table. For that, we want to check if there are products with the ListPrice stated as NULL and with an empty ListPrice:
SELECT
ProductID
,Name
,ProductNumber
,ListPrice
FROM dbo.[Product.Test]
WHERE ListPrice = ''
OR ListPrice IS NULL
The output contains empty strings and NULL values, giving a broader set of results.
Using the TRIM function for space-only values
Some columns may contain values consisting entirely of spaces, which is common when importing data from various sources. These values are generally treated as empty since they lack meaningful characters. To identify such rows, you can use the TRIM function.
By default, TRIM removes leading and trailing whitespaces but can also eliminate other specified characters from the beginning and end of a string. In this case, we use this function to eliminate spaces before checking for empty values in the standard way.
The basic query syntax is:
SELECT
column_name
FROM table_name
WHERE column_name IS NULL
OR TRIM(column_name) = ''
The below query selects rows where the columns Color, Size, ProductLine, Class, and Style are either NULL or effectively empty after trimming any leading and trailing spaces.
SELECT
Color
,Size
,ProductLine
,Class
,Style
FROM dbo.[Product.Test]
WHERE (Color IS NULL
OR TRIM(Color) = '')
OR (Size IS NULL
OR TRIM(Size) = '')
OR (ProductLine IS NULL
OR TRIM(ProductLine) = '')
OR (Class IS NULL
OR TRIM(Class) = '')
OR (Style IS NULL
OR TRIM(Style) = '')
It helps us ensure there are no empty or meaningless values in the specified column.
Built-in SQL Server functions
In addition to SQL queries, Microsoft SQL Server offers built-in functions specifically designed to handle NULL values. In this section, we explore their work.
Using the COALESCE function
SQL COALESCE allows us to replace NULLs with a default value, ensuring only meaningful data in the output. This function is useful when NULL values could disrupt calculations or compromise data accuracy.
The syntax is:
COALESCE (expression [ ,...n ] )
The test table we work with contains some NULLs and some empty values instead of meaningful data. In our scenario, we want to retrieve product names where some essential parameters are missing. The rows containing NULLs for colors and sizes will be returned as Unknown, while those rows where ListPrice is not provided will returned as 0.
SELECT
Name
,Color
,Size
,ListPrice
,COALESCE(Color, 'No Color') AS MissingColor
,COALESCE(Size, 'No Size') AS MissingSize
,COALESCE(ListPrice, 0) AS MissingListPrice
FROM dbo.[Product.Test]
As a result, we get a table that defines all cases with NULL values and can process the data further.
The COALESCE function in SQL Server can be used with the TRIM function to retrieve rows with both NULLs and empty values with one query.
The syntax is:
SELECT
column_name
FROM table_name
WHERE TRIM(COALESCE(code, '')) = ''
Here, code is the name of the column where the data has to be filtered.
In our test case, we want to identify products with NULL or empty values in the Color column:
SELECT
ProductID
,Name
,Color
FROM dbo.[Product.Test]
WHERE TRIM(COALESCE(Color, '')) = ''
This query identifies products with a NULL or blank color value and ensures that strings with only spaces are treated as empty.
Using the NULLIF function
The NULLIF function compares two expressions and returns NULL if they are equal. When applied to a column containing an empty value, it returns NULL, allowing us to check for NULLs using the IS NULL operator:
SELECT
column_name
FROM table_name
WHERE NULLIF(TRIM(code), '') IS NULL
Have a look at the below example:
SELECT
Name
,Color
,Size
FROM dbo.[Product.Test]
WHERE NULLIF(TRIM(COALESCE(Color, '')), '') IS NULL
OR NULLIF(TRIM(COALESCE(Size, '')), '') IS NULL
This query effectively filters and returns rows from the table where the Color or Size columns are either NULL, empty or contain only whitespaces, using the NULLIF and TRIM functions.
Using the ISNULL function
The ISNULL function replaces NULLs with a predefined meaningful value.
The syntax of the function is:
ISNULL(expression, replacement)
Here, expression is a column name, while replacement is the value that will replace the column value if it is NULL.
In the example below, we retrieve the data for the product color, size, and class, and replace NULLs with the predefined value Unknown:
SELECT
Name
,ISNULL(NULLIF(LTRIM(RTRIM(Color)), ''), 'Unknown') AS Color
,ISNULL(NULLIF(LTRIM(RTRIM(Size)), ''), 'Unknown') AS Size
,ISNULL(NULLIF(LTRIM(RTRIM(Class)), ''), 'Unknown') AS Class
FROM dbo.[Product.Test]
Advanced techniques for managing NULLs or empty values
Handling NULLs and empty values often involves advanced techniques for more efficient data processing and precise results.
Combining multiple functions
You may have noticed the use of function combinations, such as TRIM with COALESCE or TRIM with ISNULL. Combinations of multiple functions allow more advanced data manipulations, providing precise and targeted results.
The following query demonstrates how to clean data in the Color column by removing whitespaces and replacing NULL values with a placeholder to identify records that lack the color definition:
SELECT
ProductID
,Name
,ISNULL(NULLIF(TRIM(COALESCE(Color, '')), ''), 'Not provided') AS Color
FROM dbo.[Product.Test]
The COALESCE function replaces any NULL values in Color with an empty string, making it safe to apply TRIM, which, in turn, removes any leading or trailing spaces from the Color column. NULLIF(TRIM(…), ”) converts empty strings (either originally empty or trimmed to empty) back to NULL. ISNULL(…, ‘Not provided’) replaces any NULL values (whether originally NULL or transformed to NULL by NULLIF) with the string Not provided.
Using CASE
In SQL Server, you can use conditional expressions, such as CASE, along with functions like ISNULL, COALESCE, and TRIM, to handle different types of missing data. In such scenarios, ISNULL() or COALESCE() replace NULLs with predefined placeholders, TRIM removes leading and trailing spaces and checks for empty strings (”), and CASE combined with TRIM ensures treating space-only strings as empty.
Here’s an example query using the Product.Test table that aims to classify Class into categories based on the type of missing data:
SELECT
ProductID
,Name
,Class
,CASE
WHEN Class IS NULL THEN 'Missing (NULL)'
WHEN TRIM(Class) = '' THEN 'Missing (Empty or Spaces)'
ELSE Class
END AS ProductClassStatus
FROM dbo.[Product.Test]
Such advanced techniques help ensure consistent handling of missing data, and cleansing data for analysis, reporting, and validation.
Performance considerations for large datasets with NULLs and empty values
When working with large datasets containing NULLs and empty values, performance considerations are crucial, as different factors can significantly impact query execution and resource usage. Consider the following factors and strategies to optimize performance:
Index usage
NULL values are indexed differently in SQL Server, and query filtering might not efficiently use indexes. To avoid issues, use filtered indexes to include only non-NULL or relevant rows (e.g., WHERE Column IS NOT NULL).
Function usage
Functions like ISNULL, COALESCE, and TRIM applied directly in the WHERE clause or on indexed columns can prevent index usage and cause full table scans. The solution will be restructuring queries to remove these functions from the WHERE clause. Additionally, advanced modern ETL solutions offer in-built tools for immediate data cleansing.
Storage efficiency
Different treatments of NULLs and empty values related to memory allocation can cause storage overhead if not managed properly. To avoid it, evaluate column defaults to minimize NULLs and empty values during the data input.
Query design and optimization
Complex conditional expressions can cause performance degradation on large datasets. The solution could be separating NULLs and empty values into different query passes. Besides, make use of execution plan analysis to identify query bottlenecks before executing queries, which will help optimize them accordingly.
Statistics and cardinality
In SQL Server, cardinality estimation for columns containing many NULL or empty values may suffer. It is essential to update statistics regularly using the dedicated UPDATE STATISTICS command or auto-update features.
These strategies can help you ensure efficient handling of large datasets with NULL and empty values while minimizing resource consumption and query execution time.
Conclusion
NULLs and empty values are common in databases, making it crucial to understand their concepts, distinguish between them, and handle them appropriately. This article explores reliable methods for identifying and addressing cases of missing data defined by NULLs or empty values. It offers practical techniques to help clean data and ensure accurate calculations. To illustrate these scenarios, we used dbForge Studio for SQL Server, a tool that excels in handling such cases.
dbForge Studio for SQL Server provides a powerful SQL Editor with context-based suggestions, code analysis, syntax validation, formatting, and code snippets, enabling users to write high-quality SQL code twice as fast. In addition, the Studio offers a comprehensive toolset for managing SQL Server databases, whether on-premise or in the cloud. A fully functional 30-day trial is available—download and experience how it can elevate your workflows to the next level!