Saturday, April 20, 2024
HomeHow ToSSIS Multicast Transformation: The Smart Guide with Easy Examples

SSIS Multicast Transformation: The Smart Guide with Easy Examples

SSIS MultiCast could be the easiest transformation component. Why?

You simply drag it into the Data Flow. Then, point the source to it and point it to one or more transformation or destination components. That’s it. There’s nothing else to configure. But what is it? And how do you use it? This article will show you how to use it with simple examples.

What is a Multicast Transformation in SSIS?

SSIS MultiCast transformation distributes an input to one or more outputs. You can say that it creates copies of the source inputs. Then you can use it to perform several transformations.

For example, to summarize your sales by territory and by period, you only need one source. Then, use a MultiCast and direct it to the 2 transformations. Why not check out the sample package next?

SSIS Multicast Transformation Example

In this example, you have a sales CSV file. And you want 2 summaries loaded to SQL Server.

Input and Output

Here are the target tables in SQL Server:

-- Sales per Territory
CREATE TABLE [dbo].[SalesPerTerritory](
	[Territory] [varchar](50) NOT NULL,
	[TotalDue] [money] NOT NULL
) ON [PRIMARY]
GO

-- Sales Per Period (Year and month)
CREATE TABLE [dbo].[SalesPerPeriod](
	[OrderYear] [int] NOT NULL,
	[January] [money] NULL,
	[February] [money] NULL,
	[March] [money] NULL,
	[April] [money] NULL,
	[May] [money] NULL,
	[June] [money] NULL,
	[July] [money] NULL,
	[August] [money] NULL,
	[September] [money] NULL,
	[October] [money] NULL,
	[November] [money] NULL,
	[December] [money] NULL
) ON [PRIMARY]
GO

As you can see, these are useful transformations. It’s good to know the numbers by territory. It’s also a simple transformation. But the other one requires a pivot because this is the source.

Sales CSV file for use with SSIS Multicast

So, the second transformation will depend on the OrderDate column. That’s why we need to pivot it. And SSIS can do it too.

The SSIS Package With MultiCast

Now that you know the input and the 2 outputs, here’s the SSIS package with Multicast.

As you can see, you only need to point the arrows from the CSV source to the Multicast transformation. Then, point it to the 2 other transformations. That’s the only thing you need to remember.

Here are some more tidbits to explain this package. Let’s start after the Multicast transformation. The following details will not give you further information about MultiCast. But it shows how the MultiCast is used to achieve 2 results.

(NOTE: We will let you dig into the CSV Flat File Source. But don’t forget to change the data types of its output columns. Because all the columns are strings in this case.)

Sales Per Territory Using Aggregate Transform

First, we used an Aggregate transformation to summarize the sales by territory. It is like using a SQL GROUP BY. See a picture below:

Using an Aggregate transform to summarize sales by territory.

Click the checkboxes for Territory and TotalDue input columns. This will add 2 outputs with the correct operation (Sum and Group By).

Then, this leads to loading the summary to the SalesPerTerritory table. Since the column names and types are the same in the above image, it will map automatically.

Sales Per Period Using Derived Column, Aggregate, and Pivot Transform

And then, the next transformation starts with the Derived Column. This will extract the month and year from the OrderDate of the source CSV. Check out the screenshot below.

Using a Derived Column transformation to extract the month and year from the OrderDate column.

We are adding 2 new columns OrderMonth and OrderYear in this transformation.

After that, we did another aggregation. This will summarize the sales by OrderYear and OrderMonth.

Aggregate transform to summarize sales by year and month.

Next is the Pivot transform. We need to make the months become columns with sales totals per month and year. Here’s the setup.

Using Pivot transform to pivot sales by year and month.

The Pivot key is the month numbers. You also specify the month columns in the Generate pivot output columns from values. Then, click Generate Columns Now. Meanwhile, the Set Key is the OrderYear. Each row in the output is per year with the sales totals per month. This appears as the first column in the output. Finally, the Pivot Value is TotalDue. This is the total sales for each month.

The last component is the OLE DB Destination. This will load the results of the second transformation to the SalesPerPeriod table. Here’s the mapping from the Pivot transformation to the OLE DB Destination:

OLE DB Destination component showing column mappings from the output of the Pivot transform into the destination columns.

The columns are based on the Pivot transform and the SalesPerPeriod table columns.

Execution Results

Here are the results for SalesPerTerritory.

And here’s the one for SalesPerPeriod:

That’s how you use the SSIS MultiCast to have 2 copies of the source and produce 2 summaries.

SSIS Multicast Performance

You may wonder about SSIS MultiCast performance. Will it run quicker than using SQL for the same transformation?

Let’s have another example. But to have a good comparison the source should be SQL Server instead of CSV. You can’t use SQL against a Flat File Source.

So, here’s the modified SSIS package using an OLE DB Source (SQL Server).

The details of this package are the same as the package earlier. 

Now, here is the package using SQL instead of MultiCast.

Before we discuss what’s inside these components, here’s the table equivalent to the CSV earlier.

CREATE TABLE [dbo].[Sales](
	[SalesOrderID] [int] NOT NULL,
	[OrderDate] [datetime] NOT NULL,
	[TotalDue] [money] NOT NULL,
	[Territory] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Sales] PRIMARY KEY CLUSTERED 
(
	[SalesOrderID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

The same 31,466 rows exist here.

Now, the Get Sales Per Territory has a SQL command text below:

SELECT
 Territory
,SUM(TotalDue) AS TotalDue
 FROM dbo.Sales
GROUP BY Territory;

It’s a simple SUM and GROUP BY in a SELECT statement.

And the Get Sales Per Period has a SQL command text below:

SELECT *
FROM (
	SELECT
	YEAR(OrderDate) AS OrderYear,
	MONTH(OrderDate) AS OrderMonth,
	TotalDue
	FROM Sales
) t
PIVOT (
	SUM(t.TotalDue)
	FOR t.OrderMonth IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12]) 
) p
ORDER BY p.OrderYear;

The above uses PIVOT to output the same as earlier. But the column names are simpler with 1, 2, 3, etc. instead of C_1_TotalDue, C_2_TotalDue, etc.

Package performance summary table:

ExecutionUsing MultiCast (Elapsed Time)Using SQL(Elapsed Time)
100:00:01.36000:00:00.875
200:00:01.12500:00:01.000
300:00:01.15600:00:00.985

In this case, the SQL transformations ran faster.

So, what’s the point?

Firstly, this post does not discredit the MultiCast transformation. It’s good for a Flat File Source that can’t use SQL. Though you can load the CSV to any relational database if you have one. And do the transformation from there. But that’s another story.

Lastly, the performance result does not push SQL queries to be always better. The results above are for this scenario. But yours can be different. So, do a performance test for your specific scenario.

Note also that we only compared 2 ways to do the same thing. There are other ways (like using a stored procedure).

Conclusion

The SSIS MultiCast transformation is a convenient way to have copies of the source. And then do some transformations from it. You can also use it to copy the same source to different data formats.

So, try it out today and see if this transformation component fits your use case.

RELATED ARTICLES

Whitepaper

Social

Topics

Products