SQL Server Integration Services (SSIS) is a Microsoft tool mainly used for data integration and workflow automation in ETL processes. Transformations in SSIS are the essential mechanics behind your data flow, allowing you to clean and shape your data as it travels between systems — whether data warehouses, databases, or cloud storage.
Some of these include changing data types, aggregating values, and enriching your data to ensure that it’s accurate and gets where it needs to go smoothly. There are also third-party data flow components that make your SSIS packages more efficient with advanced data integration, flexible import and export options, and improved performance.
Table of contents
Top 10 common transformations in SSIS
You can choose over 30 SSIS transformations to validate and enhance data as it flows through your ETL pipeline. Here are the most common:
Data conversion transformation
The Data Conversion transformation in SSIS allows you to convert input columns to different data types, so you don’t run into compatibility issues that can lead to data loading errors.
How it works
First, create a Data Flow task in your SSIS package and switch to the Data Flow tab. Next, add the Devart Salesforce Source component and double-click to establish a new connection.
Once connected, building an SQL query to select and preview the required data is pretty simple. Just drag the Contact table from Tables to the Query area and add the field you want to convert (e.g., BirthDate).
Then, simply add a Data Conversion transformation to the Data Flow diagram and specify the output data type.
Use cases
Developers use the Data Conversion transformation to:
- Handle dates stored as text.
- Convert strings to integers, such as customer IDs represented as strings that you need in numeric format for database operations.
- Standardize decimal values for financial reporting.
Merge transformation
Like SQL’s UNION ALL operator, the Merge transformation combines two datasets into a single output. But while UNION ALL can work with unsorted data, the Merge transformation requires that both input datasets are sorted.
How it works
All you need to do is drag your source components (e.g., Devart Salesforce Source and Devart Magento Source components) into the Data Flow diagram, set up a new connection on each, and define the data you want to merge.
Then, add a Sort transformation to ensure both datasets are sorted by Customer ID.
After that, add the Merge transformation and connect the outputs of both sorted datasets. You’ll see an Input Output Selection window where you can choose which output corresponds to Merge Input 1 and Merge Input 2.
Use cases
You can use the Merge transformation to:
- Combine data from various sources.
- Correct errors in data that have been processed previously.
- Merge data that you divided using a Conditional Split, such as high-value customers vs. low-value customers.
Merge Join transformation
Merge Join is a variation of the Merge transformation, but instead of combining rows from multiple datasets, it joins columns between different sources.
How it works
When using Merge Join, you first have to sort both input datasets on the join key. Additionally, the key column must have compatible data types in both datasets, much like SQL joins with matching keys.
Then, choose between FULL, LEFT, or INNER join. For example, you can use a LEFT join to merge a list of registered users from a Mailchimp account with their corresponding email campaign responses from Google Analytics. This way, all users will be included in the output, even if they haven’t interacted with any campaigns.
You can also choose the INNER join to merge product details from a Shopify store with sales data from Stripe, returning only those products that have been sold.
Use cases
You can use the Merge Join transformation to do the following:
- Combining data from various sources for complete reports.
- Preserving all records from one dataset, even when there’s no match in the other.
Multicast transformation
The Multicast transformation creates copies of your data and sends them to multiple outputs without extra steps.
How it works
Simply add Multicast to the Data Flow task after connecting your Devart’s source component.
In this example, one copy goes through an Aggregate transformation to sum sales figures by region and is directed to an SQL Server destination. The other copy is enriched with customer demographic data using Devart’s Lookup component and sent to a different database, like Oracle.
You could use standard SSIS destination components, but Devart Destination components come with database-specific optimization for faster data loading and seamless integration without the usual setup headaches.
Use cases
Use the Multicast transformation when:
- You need to process the same dataset in multiple ways.
- You want to send data to different destinations.
Sort transformation
The Sort transformation in SSIS is essential for organizing your data in ascending or descending order, like the SQL command ORDER BY, but for data flows.
How it works
Connect the Sort transformation to your source data and double-click on it to open the configuration window. Here, select which columns to sort and set their sort orders.
You also have the option to remove rows with duplicate sort values if needed.
Keep in mind, though, that it holds up the data flow until all rows are sorted, so it can slow down your ETL process if you’re working with large datasets.
Use cases
The Sort transformation is a must-have step if you’re using Merge or Merge Join transformations. It’s also useful for:
- Identify and manage duplicate records.
- Organizing data for generating reports.
Aggregate transformation
Aggregate allows you to perform various calculations on your data. These include summing values, counting records, grouping by specific fields, finding averages, and determining minimum and maximum values.
How it works
If you want to sum total sales from a dataset, drag the element and connect your source component. In the Aggregate Transformation Editor, choose Sum from the dropdown under Operation and click OK.
If you want to count distinct customers who made purchases in a given month instead, select the Count Distinct operation.
Use cases
Developers use the Aggregate transformation to:
- Summarize sales data.
- Count distinct customers or transactions over time periods.
- Grouping data coming from different sources into one single view.
Lookup transformation
Think of the Lookup transformation in SSIS as a way to enrich your data by pulling in related details from another table or dataset.
How it works
In the Lookup Transformation Editor, click on the General tab, set the Cache Mode and choose the Connection Type (OLE DB or Cache). Then, switch to the Connection tab. Select the data source you want to reference and specify the dataset you’ll be pulling information from. In the Columns tab, map the columns from your input data to the corresponding columns in the reference dataset.
Now, using Devart Lookup components can significantly speed up this process. Unlike the standard Lookup transformation, they’re optimized for seamless integration with various data sources.
To use them, simply add the component that corresponds to the data source you are working with (e.g., Salesforce), connect it to your data flow path, and double-click to configure it.
Next, select the appropriate Devart Connection Manager and specify the lookup object or table. Then, check the columns you want to include in your output in Lookup Columns and set up the input and lookup key columns for matching in the Referential Constraint grid.
Use cases
The Lookup transformation is great for:
- Enriching your data with additional details.
- Validating records against a reference dataset.
Conditional Split transformation
With the Conditional Split transformation in SSIS, you can evaluate each row of data and determine where it should go based on the criteria you set.
How it works
It’s similar to an SQL CASE statement, but instead of returning a value, it routes the data to different outputs. Simply set up your conditions under Condition in the editor and add a destination path for each dataset.
Devart Destination components can handle INSERT, UPDATE, and DELETE operations, so you can send data to the same Salesforce object or table.
Use cases
Conditional Split is pretty useful when you have to:
- Sort data into different categories.
- Direct records based on specific criteria (like region, age, or sales amount).
- Send any problematic records for further review.
Derived Column transformation
Derived Column lets you create new columns or tweak the ones you already have using expressions.
How it works
In the Derived Column Transformation Editor, define the expression you want to use. For example, if you have a column with birth dates, you’d write DATEDIFF(“YY, Birthdate,GETDATE()) to calculate the age of each person by subtracting the birth date from the current date.
Use cases
The Derived Column transformation is handy for tasks like:
- Combining first and last names into a full name column.
- Extract specific parts of strings, like the domain from an email address.
Union All transformation
Like the SQL UNION ALL command, the Union All transformations stacks the data from each source on top of the other, creating a single, larger dataset. It’s a straightforward way to consolidate information without worrying about removing duplicates.
How it works
Connect it to the data sources you want to combine. Make sure all your input datasets have the same number of columns and compatible data types. If column names don’t match, you can easily map them manually to make sure everything lines up.
Use cases
The UNION ALL transformation is ideal for the following use cases:
- Combining data from Excel files, SQL tables, and so on into a single dataset.
- Consolidating sales data from multiple locations.
- Combining results from different queries.
Tips for choosing the right SSIS transformation
Choosing the right SSIS transformation can make a big difference in your workflow’s efficiency and the quality of your data. Here are some tips to help you find the right ones for your package:
- Understand your data requirements: Define your goals and select the right transformation for the task (e.g., Aggregate for sales, Row Count for unique transactions, etc.)
- Ensure data compatibility: Match data types and structures between source and destination. Pre-process at the source whenever possible.
- Consider performance: Choose transformations that minimize resource consumption, especially with large datasets.
- Plan for resource management: Prefer non-blocking transformations such as Conditional Split or Merge Join over those that load data into memory.
- Use Sort Wisely: Perform sorting at the database level. If that’s not possible, minimize the dataset size beforehand.
Conclusion
Determining how to apply the correct SSIS transformations is one of the most critical steps in data integration in any ETL process. Test each of the above transformations and others that may suit your needs until you get the best combination for your SSIS packages. If you want to make working in SSIS even smoother, try Devart SSIS Data Flow Components for free. They let you connect to major cloud applications and databases in a snap and come with optimized performance.