You can’t make an SSIS Derived Column useful without expressions. So, how do you write one? This article will give you tips on using derived column transformation.
Here’s what we are going to discuss:
- What is Derived Column Transformation in SSIS?
- How to Write a Custom Expression in SSIS Derived Column?
- SSIS Derived Column Tips and Tricks
Below, you will see examples with illustrations and pro advice. And feel confident in using this easy SSIS transformation component.
But first, let’s consider more about this transformation component.
What is Derived Column Transformation in SSIS?
Data imports and exports may require transformation. If you’re into ETL and data warehouses, you can’t get away with this. And Derived Column transformation is one of the easy ways to do it in SSIS.
So, the SSIS Derived Column transformation creates a new value from input columns from the data source. And transformation is done by applying expressions. An expression is a mix of columns, operators, functions, literal values, and variables and this mix will also yield a single output value. Finally, the output value is either a new column or a replacement of an existing column. You will see examples of all these later.
You can use this transformation in the following scenarios:
- Combine string values, like forming a full name out of the first name, middle name, and surname.
- Compute a formula. One example is computing for net pay coming from gross pay minus taxes and dues.
- Parsing a long string value into several smaller strings.
- Do something with DateTime columns, like separating the date and time into 2 columns. Or form a file name out of the DateTime column.
To start using the Derived Column transformation, create an SSIS package. Next, drag a Data Flow component in the Control Flow, double-click the Data Flow component and drag a data source component. Then, configure the data source to connect to your database. Finally, add a Derived Column transformation component. After transformation, you can also set up a destination component. Be sure to join the arrows to each element.
Check out the SSIS Derived Column example below:
How to Write a Custom Expression in SSIS Derived Column
We already described what an expression is. The whole concept of expressions in SSIS is consistent across all SSIS elements. And this includes Derived Column transformation.
Before you write an expression, familiarize yourself with the expression builder first. It is a GUI tool for building expressions in SSIS. The interface is almost identical across elements. In our case, we will use the Derived Column Transformation Editor.
The example we have is a CSV file containing names and IDs. Let’s form a full name out of last name, first name, middle name, and suffix. The Derived Column transformation can form the full name through string concatenation.
But first, let’s examine the expression builder by section.
Columns, Variables, and Parameters
Let’s examine the Columns, Variables, and Parameters first. The input columns you see are the ones available for your new expression. The same is true for any variables and SSIS package parameters. Can’t see what you need in the expression box? Then, go back to the previous component. And check the inputs to your Derived Column transformation. In the case of variables and parameters, you may need to create them first.
See a screenshot below. The section for columns, variables, and parameters is boxed in green.
You can drag one or more columns into the Expression box or simply type the column names in the Expression box. Mind your spelling if you opt to type them.
The above image shows the columns FirstName, MiddleName, LastName, and Suffix. It means that we can use these columns for the Derived Column expression.
But we need more than column names to form the FullName column.
Functions and Operators
The next section of the expression builder is the functions and operators. Check out the screenshot below.
In the image above, 3 functions are highlighted:
- SUBSTRING – this is the same as the SQL equivalent function. It extracts a portion of a string.
- TRIM – removes leading and trailing spaces in a string.
- REPLACENULL – this is the same as the ISNULL function in SQL. It tests if the first parameter is null. And if it is, replace it with the value of the second parameter – a non-null value. Don’t get confused with the ISNULL function in SSIS that returns true if a value is null.
There are other functions that are used for strings, dates, and numbers. REPLACENULL which we used here is one of the functions dealing with nulls. Another option with the same result is using ISNULL with a conditional operator, like this:
(ISNULL(" " + SUBSTRING(MiddleName,1,1)+".")?"":" " + SUBSTRING(MiddleName,1,1)+".")
This is like an SSIS Derived Column IF THEN ELSE. But it’s longer to type. So, REPLACENULL is much better.
For a whole list of functions, you can check the official documentation here.
You can also see the plus (+) operator for combining strings in the Expression box. This is just one of the operators available in SSIS. And there are also type casts operators to change one data type to another. You can check the Type Casts and Operators folders below functions. For a whole list of operators in SSIS, check it here.
Derived Columns and Expressions
Finally, let’s examine the definition of derived columns. You can define one or more derived columns in each Derived Column transformation. Each derived column can either replace an existing column or add a new column. You usually add a column derived from existing input columns. In our previous example, we used 1 additional column. And we called the new column Fullname. Fullname is defined using the following expression:
TRIM(FirstName) +
REPLACENULL(" " + SUBSTRING(MiddleName,1,1) + ".","") + " " +
TRIM(LastName) +
REPLACENULL(" " + Suffix,""
Why does the name have to be so long?
TRIM removes leading and trailing spaces for FirstName and LastName. It will avoid an output like this:
MICHAEL B. JORDAN
It should be just right, like this:
MICHAEL B. JORDAN
And how about REPLACENULL?
Concatenating strings with NULL will result to NULL. So, let’s say we use the name ROBERT DOWNEY JR. There’s no middle name, as you can see. If there’s no REPLACENULL, the expression ” ” + SUBSTRING(MiddleName,1,1) + “.” will result to NULL. Even more, adding the LastName (DOWNEY) and Suffix (JR.) will still result to NULL.
To see this in action, let’s try to remove the REPLACENULL function along with SUBSTRING(MiddleName,1,1). And then, enable the Data Viewer before the Recordset Destination to see the result. The following is the screenshot of the Data Viewer.
There you go. Notice the names with NULL MiddleName above. The Fullname also results to NULL.
Too bad, right? Not the result we want.
Finally, we want not the whole MiddleName but only the initial. That’s why we used SUBSTRING.
The expression we formed is defensive. It avoids logic errors. It’s also how you should form your expressions in SSIS.
So, how do we form responsive and defensive expressions in Derived Columns?
5 SSIS Derived Column Tips and Tricks
There are 5 important points we will discuss here. Check it out below.
Test Nullable Columns in SSIS Derived Column
Earlier you saw how nulls are troublesome in strings. In our example earlier, REPLACENULL kills possible logic errors of nullable columns MiddleName and Suffix. Unless you accept nullable results, you need to test using ISNULL or REPLACENULL. A blank or truncated string can replace nulls in strings.
For nullable numeric columns, you need to replace nulls with zero or one, depending on your requirements. You can also treat other data types with a similar approach.
Output the Required Data Type and Size
Do you need to output the result of a Derived Column transformation into a table column or a variable? Then be wary of the data type and size of the result.
SSIS uses different data types. Your string can be anything from DT_STR or DT_WSTR.
Does it matter?
It does. If your output string from the Derived Column is a DT_WSTR Unicode string, your target column or variable should be DT_WSTR too. This ensures that no errors will happen upon writing. But then again, the size also matters. If the string size of the target is 50 and your output string is 51, a truncation error will occur.
So, the lesson? Mind the data type and size output from your Derived Column transformation. If it’s not the same, CAST them. So, a DT_WSTR expression result is compatible with a DT_WSTR target.
To do this, use this expression syntax:
(<data type, [size]>)<expression result>
Examples:
- String: (DT_WSTR,50) “Robert Downey Jr.”
- Non-integer to integer: (DT_I8) 557.0
- String to Number: (DT_DECIMAL,2) “649”
You can replace the literal values above with a column or variable name.
Balance Expression Readability with Performance
You can do multiple Derived Column transformation components with simple expressions. Or you can make multiple expressions in one Derived Column transformation.
Multiple Derived Columns with simple expressions are easy to read. Other developers reading your settings will find it easier to understand. However, note that there are more tasks to perform in multiple Derived Columns. And a single Derived Column may outperform them.
Here’s an example.
Then compare the above Control Flow to the earlier example. So, here we divide the transformation of Fullname into 2 Derived Columns. But instead of adding a new column, we replace the value of Fullname in the second Derived Column.
And the result? Check out the Execution Results of each. The first example finishes at 00:00:00.656. But the above example finishes at 00:00:00.734. Looks like no big deal. The execution times are almost negligible.
They have the same number of rows (19,973). The source and destination are also the same. But the first example always finishes earlier. So, it’s consistent. The second example has more tasks to perform. The vertical scrollbar also implies this. So, compare the execution results below.
But imagine a bigger CSV file. What do you think is the implication?
There are no hard rules here. The point is to make your packages easier to maintain and read by any developer. Not just you. But the result should be faster. You can test 2 scenarios and pick a choice.
Choose an SQL Transform IF It’s Faster
Sometimes, you may choose to have the transformation in SQL than using a Derived Column. In our previous examples, we have no choice because it’s a CSV flat file. You can’t use SQL in an SSIS Flat File Source. But if your source is SQL Server, for example, you may opt to do the transformation in a SQL SELECT statement.
The main reason – performance!
Let’s have an example. We will compare using an SSIS package with transformation within a SELECT statement. And an SSIS package using Derived Column transformation.
Which of the 2 will be quicker? Check out the execution results below.
So, using transformation in SQL is faster in this case. One noticeable reason is the one using a Derived Column has more tasks to perform (dive deeper into SSIS tasks).
But I’m not saying that this is always the case. Generally speaking, it is. But test with 2 scenarios to be sure.
Form Your Expressions in Another Editor
Do you find the Expression box too small? Then, you’re not alone. The Expression builder for variables and others has a wider and taller box to write your expressions.
A smaller box will not make you see the entire expression. If it’s long, you may encounter syntax errors.
So, use Notepad or another good editor to get a full view of your expressions. I used the good-old Notepad to form the Fullname expression earlier.
Conclusion
That’s how we write expressions in SSIS Derived Column. You mix columns, variables, literals, operators, and functions. It’s just like forming any other expression in SSIS. So, if you’re new to SSIS, this is going to be easy for you.
If you like it, please share it on your favorite social media platforms.