Sunday, November 17, 2024
HomeHow ToSSIS Data Types: The No-Sweat Guide with Easy Examples

SSIS Data Types: The No-Sweat Guide with Easy Examples

SSIS Data Flow Components Download

Consider this simple Integration Services package. It’s simple because the task is to upload a CSV file to SQL Server. No transformations.

Notice the warning in the OLE DB Destination. It’s a string truncation warning. The source column has 50 characters but the target has only 20. So, what? The package may run without errors. But that’s only half of the problem. Here’s a screenshot of the source:

Looks pretty simple too. But the execution results include:

[OLE DB Destination [27]] Error: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "Invalid character value for cast specification".

[OLE DB Destination [27]] Error: There was an error with OLE DB Destination.Inputs[OLE DB Destination Input].Columns[ DateFounded] on OLE DB Destination.Inputs[OLE DB Destination Input]. The column status returned was: "The value could not be converted because of a potential loss of data."

That’s bad. The culprit? A date format (MMDDYYYY) that SSIS cannot convert.

These are just some of the problems you need to deal with. But there’s a way to fix these. And it’s not hard at all. Find them out in the following sections.

But before we deal with that, let’s discuss the core of SSIS data types first. With this, you can deal with the challenges ahead.

Discover more about SSIS tasks here.

What are the Data Types in SSIS?

SSIS data types are not too different from other platforms. Many data types correspond to SQL Server data types. The same is also true with other data formats. But it’s not a one-to-one mapping.

The following are the categorized data types.

Strings

These are the types that support letters, numbers, and symbols. Or it’s a combination of all of them. It also deals with Unicode strings that involve characters like Japanese or Korean. In SSIS, strings can be DT_STR, DT_WSTR, DT_TEXT, or DT_NTEXT.

Numbers

These are the types that support numbers for mathematical computation. It’s either whole numbers or those with fractional parts. In SSIS, DT_I1, DT_I2, DT_I4, DT_I8 for signed integers. There’s also DT_CY for currency values. And also DT_NUMERIC (exact numerics), and DT_R4 and DT_R8 (floating-point numbers).

Date/Time Types

These are the types that deal with dates, times, or a combination of both. In SSIS, there’s DT_DATE, DT_DBDATE, DT_DBTIME, DT_DBTIME2, DT_DBTIMESTAMP, DT_DBTIMESTAMPOFFSET, and DT_FILETIME.

Working with dates and times can be tricky. This is true if you are also dealing with fractional seconds and time zones. The problem happens when you are comparing 2 date/time values. Know your data requirements if you really need those sorts of detailed information. Then, design your SSIS expressions and set data types accordingly.

Boolean

Boolean in SSIS is DT_BOOL. It’s either true or false.

Binary

These are the types that deal with binary values. In SSIS, this can be DT_BYTES or DT_IMAGE. Note that some other database data types will be mapped to an image type. One example is the SQL Server geospatial data type.

Identifiers

A globally unique identifier (GUID) is DT_GUID in SSIS.

For further reading: Please visit the official documentation for a complete list of SSIS data types.

So, Which Data Type to Use?

You may wonder: Of all the different types of strings, which is the one to use? And this also applies to numbers and dates.

It’s simple. Whatever is the data type used by the source and destination. Do not change it unless the destination requires a different data type or the same data type with a larger size.

In our earlier example, the FastFoodChain column in the CSV file uses DT_STR with a size of 50. Meanwhile, the SQL Server counterpart uses DT_STR with a length of 20. They have the same type. Do not change it. But the length of both should be the larger value which is 50. So, change the size of the target column from 20 to 50 characters. Doing this will fix the truncation problem.

So, the question is what is the data type of the target? The answer to this is the data type you will use.

But this is not limited to table columns. Variables, literal values, and expression evaluation also deal with SSIS data types. So, if you are going to use variables to set column values, set them properly based on the target’s data type.

But there’s another bad news. SSIS may “secretly” convert the type. As you will see in the following section.

How to Change Data Types in SSIS?

So, the next question is: How to change data types in SSIS?

If you need to change a data type, you need to know how SSIS does data conversion. SSIS can do 2 types of conversion: implicit and explicit.

Implicit Conversion of SSIS Data Types

Implicit conversion is something SSIS does for you. This is also how SSIS converts a type “secretly”. It happens behind the scenes without you knowing it. That’s why implicit conversion is also known as automatic type conversion. To do this, SSIS uses 2 things: the expression evaluator and the data flow engine.

SSIS uses an expression evaluator to know the best type to use for expressions. An expression may contain literal values, functions, variables, and columns. And each of them may have a different data type. So, the expression evaluator makes sure your desired type is met.

Meanwhile, the data flow engine converts any data coming from any data format. It works the first time data is opened in a data flow.

So, in what situations does implicit conversion happens?

  • The first
    time data enters a data flow
    . Implicit conversion to SSIS data
    types happens to all columns.
  • In
    comparing 2 values or expressions
    . If you compare a DT_I8 to a
    DT_I4, the DT_I4 data converts to DT_I8 first. Only after that will
    the comparison proceeds. SSIS also does the same to other data
    types.
  • When SSIS
    writes data to a destination column
    . In our earlier example,
    the DateFounded column uses a bizarre format. So,
    SSIS cannot convert it implicitly. And an error occurred.

Later, you may want to check out the SSIS data type conversion table. This will give you an idea of what SSIS data type some SQL data type converts to.

SQL to SSIS Data Type Conversion Table

Not all SQL platforms are created the same. So, their data types may map differently in SSIS. The mapping below shows how SSIS data types map to known SQL platforms. And here we have the following:

  • SSIS data
    type mapping to SQL Server
  • SSIS data
    type mapping to MySQL
  • SSIS data
    type mapping to Oracle
  • SSIS data
    type mapping to PostgreSQL

Note that the table below serves as a guide and not a strict mapping. It is similar to the one used in the official documentation. Note also the data access components used (SQLOLEDB, SqlClient, OracleClient, and ODBC). Read more about SSIS components here.

The mapping for SQL Server and Oracle came from the official documentation. Meanwhile, MySQL and PostgreSQL conversion was taken from an actual package. This is done by examining the Input and Output Properties of the ODBC source. You can see this in the Advanced Editor of the source data. The table used for the source contains columns of different data types. The empty table served no other purpose but to be read and examined in the Advanced Editor.

Below is a screenshot of how the Boolean data type in PostgreSQL maps to DT_STR in SSIS.

This uses the Output Columns of a source component.

And here’s the PostgreSQL table structure. Note the data type of the col_boolean column.

CREATE TABLE IF NOT EXISTS public."SomeTable"
(
    col_bigint bigint,
    col_int integer,
    col_smallint smallint,
    col_bit bit(1),
    col_varchar character varying(20) COLLATE pg_catalog."default",
    col_char "char",
    col_real real,
    col_boolean boolean,
    col_datetime date,
    col_time time without time zone,
    col_money money,
    col_timestamp timestamp with time zone,
    col_json json,
    col_text text COLLATE pg_catalog."default",
    col_uuid uuid,
    col_double_prec double precision,
    col_bytea bytea
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public."SomeTable"
    OWNER to postgres;

Meanwhile, you also need to check your target table’s data types. To do that, use the External Columns of the Destination component. By matching the data type of both source and destination, you minimize the errors at runtime.

Explicit Conversion of SSIS Data Types

Explicit data conversion is telling SSIS to follow your way of converting data. So, when SSIS thinks it’s a DT_I4 (integer), you say NO. This time you’re the boss and you tell SSIS it’s a DT_WSTR (Unicode string).

This applies when the data type of the source is different from the target. And you’re not sure if SSIS can implicitly convert without data loss or errors. To convert explicitly, you can use 2 SSIS components. Please follow along in the next section.

SSIS Components to Change Data Types

The 2 components to convert from one data type to another are Derived Column transformation and Data Conversion transformation.

Using Derived Column to Fix the Error

To fix the problem in our earlier example, let’s use a Derived Column transformation.

But before that, below is the structure of the target table. The FastFoodChain column is also changed to VARCHAR(50) to avoid truncation.

CREATE DATABASE FastFoodChain;
GO

USE FastFoodChain
GO

CREATE TABLE FastFoodRestaurant
(
	ID INT NOT NULL PRIMARY KEY,
	FastFoodChain VARCHAR(50) NOT NULL,
	DateFounded DATE NOT NULL,
	Founders VARCHAR(50) NOT NULL
)
GO

And now, below is a better SSIS package from the scenario earlier:

The Derived Column will add a new column with the following expression:

(DT_DBDATE)(SUBSTRING(TRIM([ DateFounded]),1,2) + "/" + SUBSTRING(TRIM([ DateFounded]),3,2) + "/" + SUBSTRING(TRIM([ DateFounded]),5,4))


Let’s examine this expression. Remember that the output of the expression should be a DT_DBDATE. This is the same as the target column in SQL Server.

So, the first part of the expression is a cast to DT_DBDATE.

Then, the value of DateFounded from the CSV will be parsed to become MM/DD/YYYY. Recall from earlier that the format is MMDDYYYY without the backslash. This caused the error in the destination. So, the first part of the parsing is to extract the month. And that’s SUBSTRING(TRIM([ DateFounded]),1,2). We need to TRIM to remove any leading and trailing spaces. Trust me. There are spaces.

Then, combine the result with a backslash.

Next, extract the day of the month. And that’s SUBSTRING(TRIM([ DateFounded]),3,2). The month starts at the third character position. So, we used 3 as the starting point. Then, extract 2 characters.

Then, combine the result with another backslash.

Next, extract the year. And that’s SUBSTRING(TRIM([ DateFounded]),5,4). The year starts at the fifth character position. So, we used 5 as the starting point. Then, extract the last 4 characters.

The expression evaluator will do the parsing first. Then, when a date string is formed, convert it to a date data type.

That’s it.

Now, you need to map that derived column to the destination. Here’s a screenshot.

After this change, the truncation and conversion error will disappear.

Using the Data Conversion Transformation

You can use the SSIS Data Conversion Transformation for converting a column data type to another data type.

So, this is an alternative to using a CAST operation in a Derived Column transformation. But instead of typing the cast like (DT_I4), you set properties.

You may prefer a conversion that you can see at a glance. This component is visual and it’s right there when you view the SSIS package. Meanwhile, you need to double-click the Derived Column to see the same transformation.

Anyway, here’s how to do the conversion visually. See a sample package below.

It uses the same source and destination. But this time, we use a Data Conversion transformation. We still used the Derived Column transformation to reformat the DateFounded column. The Derived Column expression is the same as earlier but without the CAST. And the ID and the reformatted DateFounded column are converted using Data Conversion. So, loading the transformed data to the destination does not require implicit conversion.

Here is how the Data Conversion transformation was configured:

Notice the 2 new columns with the converted data type.

And here are the column mappings in the destination component.

That’s how easy it is to use a Data Conversion transformation.

NOTE: Another option to convert the ID column is to change it in the Output Columns of the Flat File Source. Change DT_STR to DT_I4. You can only see this in the Advanced Editor. But note that you can only do this approach if there’s no other transformation needed for the data. Since this is hidden in the Advanced Editor, it’s a good idea to put an annotation in the SSIS package.

Conclusion

That’s how you work with SSIS data types.

You check the data types done during data extraction. Then, you check the data types of the destination. If the data types don’t match, you do some explicit conversion. You can do this using Derived Column and Data Conversion transformations.

Is this a good read? Then, please share it with your friends on social media. And stop by again for more SSIS topics.

SSIS Data Flow Components Download
RELATED ARTICLES

Whitepaper

Social

Topics

Products