Devart Blog

Is UNPIVOT the best way for converting columns into rows?

Posted by on October 30th, 2013

Convertion of columns into rows is widely used in real tasks. In this article we will talk about different implementations of T-SQL UNPIVOT transformations.

Let’s assume we have an aggregated table which contains data about results of the games played by each player. We have a task to convert columns into rows.

IF OBJECT_ID ('dbo.Players') IS NOT NULL
	DROP TABLE dbo.Players;

CREATE TABLE dbo.Players
(
	  PlayerID INT
	, Win INT
	, Defeat INT
	, StandOff INT
	, CONSTRAINT PK_Players PRIMARY KEY CLUSTERED (PlayerID) ON [PRIMARY]
);
INSERT INTO dbo.Players (PlayerID, Win, Defeat, StandOff)
VALUES
	(1, 7,  6,  9),
	(2, 12, 5,  0),
	(3, 3,  11, 1);

There are several ways to accomplish this task, so let’s take a look at the execution plans of each of the suggested below implementations. For this we will use SQL Profiler available in dbForge Studio for SQL Server.

To receive execution plans automatically each time the query is executed we need to switch to the profiling mode:

profiling mode

It’s also possible to get the query plan without starting its execution. For this, the Generate Execution Plan command must be ran.

Let’s Start!

1. UNION ALL

SQL Server 2000 did not offer an efficient way to convert columns into rows. Because of that multiple reading from the same table with a different set of columns united through the UNION ALL statements was widely used:

SELECT PlayerID, GameCount = Win, GameType = 'Win'
FROM dbo.Players
	UNION ALL
SELECT PlayerID, Defeat, 'Defeat'
FROM dbo.Players
	UNION ALL
SELECT PlayerID, StandOff, 'StandOff'
FROM dbo.Players

The crucial weakness of this practice is the multiple data reading, which considerably decreases the efficiency of the query execution.

It is obvious, when we look at the execution plan of the following query:

query execution plan

2. UNPIVOT

When SQL Server 2005 hit the stage, a new syntax T-SQL structure was introduced – UNPIVOT.
Let’s simplify the previous query with the UNPIVOT statement:

SELECT PlayerID, GameCount, GameType
FROM dbo.Players
UNPIVOT (
	GameCount FOR GameType IN (
		Win, Defeat, StandOff
	)
) unpvt

As a result of the query execution we get the following execution plan:

result of execution plan

3. VALUES

However, progress does not stand still, because after the release of SQL Server 2008 the areas of use of the VALUES statement expanded.

Query which uses the VALUES statement will look like this:

SELECT t.*
FROM dbo.Players
CROSS APPLY (
	VALUES
		  (PlayerID, Win,      'Win')
		, (PlayerID, Defeat,   'Defeat')
		, (PlayerID, StandOff, 'StandOff')
) t(PlayerID, GameCount, GameType)

Moreover, the execution plan will be simpler, comparing to UNPIVOT:

comparing execution plan to UNPIVOT

4. Dynamic SQL

Using the dynamic SQL allows creating general-purpose query for any table on the condition that columns not included in the primary key have a compatible data-type between them:

DECLARE @table_name SYSNAME
SELECT @table_name = 'dbo.Players'

DECLARE @SQL NVARCHAR(MAX)
SELECT @SQL = '
SELECT *
FROM ' + @table_name + '
UNPIVOT (
	value FOR code IN (
		' + STUFF((
	SELECT ', [' + c.name + ']'
	FROM sys.columns c WITH(NOLOCK)
	LEFT JOIN (
		SELECT i.[object_id], i.column_id
		FROM sys.index_columns i WITH(NOLOCK)
		WHERE i.index_id = 1
	) i ON c.[object_id] = i.[object_id] AND c.column_id = i.column_id
	WHERE c.[object_id] = OBJECT_ID(@table_name)
		AND i.[object_id] IS NULL
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '') + '
	)
) unpiv'

PRINT @SQL
EXEC sys.sp_executesql @SQL

And the result is the following query:

SELECT *
FROM <table_name>
UNPIVOT (
	value FOR code IN (<unpivot_column>)
) unpiv

This method is less fast because the automatic generation of the UNPIVOT query requires an additional reading from the system views and rows concatenation via XML trick.

5. XML

A more smarter way to execute dynamic UNPIVOT is by making a little trick with XML:

SELECT
	  p.PlayerID
	, GameCount = t.c.value('.', 'INT')
	, GameType = t.c.value('local-name(.)', 'VARCHAR(10)')
FROM (
	SELECT
		  PlayerID
		, [XML] = (
				SELECT Win, Defeat, StandOff
				FOR XML RAW('f'), TYPE
			)
	FROM dbo.Players
) p
CROSS APPLY p.[XML].nodes('f/@*') t(c)

In the query above, for each row SQL Server has generated a specified XML:

 < f Column1="Value1" Column2="Value2" Column3="Value3" ... />

Afterwards the name of the attribute and its value are parsed. In most cases the use of XML results into a more slow execution plan. In our case that is the price we pay for it being universal.

Let’s compare the results by executing Compare Selected Results command:

Compare Selected Results command

Let’s note that there is no obvious difference in the speed of execution of UNPIVOT and VALUES queries. This is valid for the cases when simple transformation of columns into rows is taking place.

Let’s examine another task, where we need to find the most frequent outcome of the game for each player.

We’ll try to do this task with the help of the UNPIVOT statement:

SELECT
	  PlayerID
	, GameType = (
		SELECT TOP 1 GameType
		FROM dbo.Players
		UNPIVOT (
			GameCount FOR GameType IN (
				Win, Defeat, StandOff
			)
		) unpvt
		WHERE PlayerID = p.PlayerID
		ORDER BY GameCount DESC
	)
FROM dbo.Players p

In the execution plan its show that the bottleneck is the multiple data reading and sorting, which is necessary for organizing the data rows:

data rows organizing

It’s easy to get rid of the multiple data reading if we remember that columns from the external query block can be used:

SELECT
	  p.PlayerID
	, GameType = (
		SELECT TOP 1 GameType
		FROM (SELECT t = 1) t
		UNPIVOT (
			GameCount FOR GameType IN (
				Win, Defeat, StandOff
			)
		) unpvt
		ORDER BY GameCount DESC
	)
FROM dbo.Players p

Multiple reading of data was eliminated but the most resource consuming operation – sorting is still there:

sorting operation

And that is how the VALUES statement behaves during this task:

SELECT
	  t.PlayerID
	, GameType = (
			SELECT TOP 1 GameType
			FROM (
				VALUES
					  (Win,	 'Win')
					, (Defeat,	 'Defeat')
					, (StandOff, 'StandOff')
			) t (GameCount, GameType)
			ORDER BY GameCount DESC
		)
FROM dbo.Players t

As we’ve expected the plan was simplified, however sorting is still present

simplified execution plan

Let’s try to avoid sorting by using the aggregation function:

SELECT
	  t.PlayerID
	, GameType = (
			SELECT TOP 1 GameType
			FROM (
				VALUES
					  (Win,	 'Win')
					, (Defeat,	 'Defeat')
					, (StandOff, 'StandOff')
			) t (GameCount, GameType)
			WHERE GameCount = (
				SELECT MAX(Value)
				FROM (
					VALUES (Win), (Defeat), (StandOff)
				) t(Value)
			)
		)
FROM dbo.Players t

Now the execution plan looks like this:

final execution plan

Hurrah! We managed to get rid of sorting.

Conclusion

In the cases, when we need to carry out a simple convertion of columns into rows in SQL Server it is better to use UNPIVOT or VALUES structures.

If after the convertion the received data rows should be used for aggregation or sorting, then we should rather use VALUES structure which, in most cases, results into more efficient execution plans.

For the tables, where different structure types might occur and the number of columns is unrestricted, it is recommended to use XML which unlike the dynamic SQL can be used inside the table functions.

Step-by-step analysis of the execution plan with the help of Profiler in dbForge Studio for SQL Server – http://www.devart.com/dbforge/sql/studio/ allows detecting bottlenecks in the query’s performance.

P.S.

To adapt some examples according to the issues of SQL Server 2005, VALUES statements:

SELECT *
FROM (
	VALUES (1, 2), (2, 3)
) t(id, value)

we need to replace on SELECT UNION ALL SELECT combination:

SELECT id = 1, value = 2
UNION ALL
SELECT 2, 3
See also

8 Responses to “Is UNPIVOT the best way for converting columns into rows?”

  1. kevin Says:

    Hi Sergey Syrovatchenko , how do I use you example to reverse the unpivot method/s to go back to pivot ?

  2. Raghvendra Says:

    Please help me with the below query, I want column_Name to be displayed in rows and and Table_Name and database are grouped as all the column for a table will be displayed in single row:

    IF OBJECT_ID(‘Tempdb.dbo.#temptbl’, ‘u’) IS NOT NULL
    DROP TABLE #temptbl

    Create Table #tempTbl
    (
    Database_Name nvarchar(100),
    Table_Name nvarchar(100),
    Column_Name nvarchar(100)
    )

    Insert Into #tempTbl
    Select DB_Name() , sys.schemas.NAME + ‘.’ + sys.tables.NAME, sys.columns.name FROM sys.tables
    INNER JOIN sys.schemas ON sys.tables.schema_id = sys.schemas.schema_id
    Inner Join sys.columns On sys.tables.object_id = sys.columns.object_id
    Go

    Select * from #tempTbl

  3. Ahmed Shahin Says:

    I’m having a Table called Test_Pivot which looks like
    F_SN F_Title F_Data
    1001 TRX RRN 301807460001.00
    1001 DATE TIME 18:07 ,30/04/2015
    1001 AUTH CODE
    1001 AMOUNT 0.22
    1001 SCHEME NAME SPAN
    1001 RESULT DECLINED
    1011 TRX RRN 301920050002.00
    1011 DATE TIME 19:20 ,30/04/2015
    1011 AUTH CODE
    1011 AMOUNT 0.01
    1011 SCHEME NAME MASTERCARD
    1011 RESULT DECLINED

    The output should look like this:

    TRX RRN DATE TIME AUTH CODE AMOUNT SCHEME NAME RESULT
    301807460001.00 18:07 ,30/04/2015 0.22 SPAN DECLINED
    301920050002.00 19:20 ,30/04/2015 0.01 MASTERCARD DECLINED

  4. Andrey Langovoy Says:

    Hi Ahmed,
    Please clarify your question!
    Thank you.

  5. Kai Says:

    Hi,

    How did you get the query profiler results shown in image 06? What tool did you use?

  6. Sergey Syrovatchenko Says:

    Thanks for the comment. This is a built-in feature in dbForge Studio profiler

  7. Kai Says:

    Can dbForge Studio profile monitor the query’s tempdb and memory usage?

  8. Sergey Syrovatchenko Says:

    Hi Kai, you mean spills into tempdb? If yes, in this case we have identical functionality as in SSMS. In future it’s planned to expand this functionality.

Leave a Reply