SSIS MultiCast could be the easiest transformation component. Why?
You simply drag it into the Data Flow. Then, point the source to it and point it to one or more transformation or destination components. That’s it. There’s nothing else to configure. But what is it? And how do you use it? This article will show you how to use it with simple examples.
What is a Multicast Transformation in SSIS?
The SSIS MultiCast transformation, part of the powerful SSIS components, allows you to distribute a single input to multiple outputs, enhancing your data flow. This functionality, along with other SSIS tasks, helps streamline data processing by enabling flexible branching within your ETL processes. You can say that it creates copies of the source inputs. Then you can use it to perform several transformations.
For example, to summarize your sales by territory and by period, you only need one source. Then, use a MultiCast and direct it to the 2 transformations. Why not check out the sample package next?
SSIS Multicast Transformation Example
In this example, you have a sales CSV file. And you want 2 summaries loaded to SQL Server.
Input and Output
Here are the target tables in SQL Server:
-- Sales per Territory
CREATE TABLE [dbo].[SalesPerTerritory](
[Territory] [varchar](50) NOT NULL,
[TotalDue] [money] NOT NULL
) ON [PRIMARY]
GO
-- Sales Per Period (Year and month)
CREATE TABLE [dbo].[SalesPerPeriod](
[OrderYear] [int] NOT NULL,
[January] [money] NULL,
[February] [money] NULL,
[March] [money] NULL,
[April] [money] NULL,
[May] [money] NULL,
[June] [money] NULL,
[July] [money] NULL,
[August] [money] NULL,
[September] [money] NULL,
[October] [money] NULL,
[November] [money] NULL,
[December] [money] NULL
) ON [PRIMARY]
GO
As you can see, these are useful transformations. It’s good to know the numbers by territory. It’s also a simple transformation. But the other one requires a pivot because this is the source.
So, the second transformation will depend on the OrderDate column. That’s why we need to pivot it. And SSIS can do it too.
The SSIS Package With MultiCast
Now that you know the input and the 2 outputs, here’s the SSIS package with Multicast.
As you can see, you only need to point the arrows from the CSV source to the Multicast transformation. Then, point it to the 2 other transformations. That’s the only thing you need to remember.
Here are some more tidbits to explain this package. Let’s start after the Multicast transformation. The following details will not give you further information about MultiCast. But it shows how the MultiCast is used to achieve 2 results.
(NOTE: We will let you dig into the CSV Flat File Source. But don’t forget to change the data types of its output columns. Because all the columns are strings in this case.)
Sales Per Territory Using Aggregate Transform
First, we used an Aggregate transformation to summarize the sales by territory. It is like using a SQL GROUP BY. See a picture below:
Click the checkboxes for Territory and TotalDue input columns. This will add 2 outputs with the correct operation (Sum and Group By).
Then, this leads to loading the summary to the SalesPerTerritory table. Since the column names and types are the same in the above image, it will map automatically.
Sales Per Period Using Derived Column, Aggregate, and Pivot Transform
And then, the next transformation starts with the Derived Column. This will extract the month and year from the OrderDate of the source CSV. Check out the screenshot below.
We are adding 2 new columns OrderMonth and OrderYear in this transformation.
After that, we did another aggregation. This will summarize the sales by OrderYear and OrderMonth.
Next is the Pivot transform. We need to make the months become columns with sales totals per month and year. Here’s the setup.
The Pivot key is the month numbers. You also specify the month columns in the Generate pivot output columns from values. Then, click Generate Columns Now. Meanwhile, the Set Key is the OrderYear. Each row in the output is per year with the sales totals per month. This appears as the first column in the output. Finally, the Pivot Value is TotalDue. This is the total sales for each month.
The last component is the OLE DB Destination. This will load the results of the second transformation to the SalesPerPeriod table. Here’s the mapping from the Pivot transformation to the OLE DB Destination:
The columns are based on the Pivot transform and the SalesPerPeriod table columns.
Execution Results
Here are the results for SalesPerTerritory.
And here’s the one for SalesPerPeriod:
That’s how you use the SSIS MultiCast to have 2 copies of the source and produce 2 summaries.
SSIS Multicast Performance
You may wonder about SSIS MultiCast performance. Will it run quicker than using SQL for the same transformation?
Let’s have another example. But to have a good comparison the source should be SQL Server instead of CSV. You can’t use SQL against a Flat File Source.
So, here’s the modified SSIS package using an OLE DB Source (SQL Server).
The details of this package are the same as the package earlier.
Now, here is the package using SQL instead of MultiCast.
Before we discuss what’s inside these components, here’s the table equivalent to the CSV earlier.
CREATE TABLE [dbo].[Sales](
[SalesOrderID] [int] NOT NULL,
[OrderDate] [datetime] NOT NULL,
[TotalDue] [money] NOT NULL,
[Territory] [varchar](50) NOT NULL,
CONSTRAINT [PK_Sales] PRIMARY KEY CLUSTERED
(
[SalesOrderID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
The same 31,466 rows exist here.
Now, the Get Sales Per Territory has a SQL command text below:
SELECT
Territory
,SUM(TotalDue) AS TotalDue
FROM dbo.Sales
GROUP BY Territory;
It’s a simple SUM and GROUP BY in a SELECT statement.
And the Get Sales Per Period has a SQL command text below:
SELECT *
FROM (
SELECT
YEAR(OrderDate) AS OrderYear,
MONTH(OrderDate) AS OrderMonth,
TotalDue
FROM Sales
) t
PIVOT (
SUM(t.TotalDue)
FOR t.OrderMonth IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) p
ORDER BY p.OrderYear;
The above uses PIVOT to output the same as earlier. But the column names are simpler with 1, 2, 3, etc. instead of C_1_TotalDue, C_2_TotalDue, etc.
Package performance summary table:
Execution | Using MultiCast (Elapsed Time) | Using SQL(Elapsed Time) |
1 | 00:00:01.360 | 00:00:00.875 |
2 | 00:00:01.125 | 00:00:01.000 |
3 | 00:00:01.156 | 00:00:00.985 |
In this case, the SQL transformations ran faster.
So, what’s the point?
Firstly, this post does not discredit the MultiCast transformation. It’s good for a Flat File Source that can’t use SQL. Though you can load the CSV to any relational database if you have one. And do the transformation from there. But that’s another story.
Lastly, the performance result does not push SQL queries to be always better. The results above are for this scenario. But yours can be different. So, do a performance test for your specific scenario.
Note also that we only compared 2 ways to do the same thing. There are other ways (like using a stored procedure).
Conclusion
The SSIS MultiCast transformation is a convenient way to have copies of the source. And then do some transformations from it. You can also use it to copy the same source to different data formats.
So, try it out today and see if this transformation component fits your use case.