Thursday, April 25, 2024
HomeProductsSQL Server ToolsSQL Server 2016 Stretch Database

SQL Server 2016 Stretch Database

On June 1, SQL Server 2016 was released. It brought a large number of novelties into normal development, including the long-announced Stretch Database technology that allows migrating data from SQL Server dynamically to Azure.

SQL Server 2016 Stretch Database
From the marketing point of view, Stretch Database has been well promoted. Reasonable arguments were stating that with the growth of historical data the complexity and cost of its maintenance grows as well. As a rational solution to the problem, the automatic migration of obsolescent archival data to the cloud was introduced. Frankly speaking, I enjoyed the idea.

As from SQL Server 2016 RC0, I began testing the Stretch Database technology on two projects I’m helping to develop. The first one is TMetric, a free time tracker with OLTP load, and the second one is an internal project with DW load.
Let’s go into Stretch Database through a simple example. The first thing to do is to allow using Stretch Database since this functionality is disabled on the server by default:

EXEC sys.sp_configure 'remote data archive' , '1'
GO
RECONFIGURE
GO

There is no need to reload the server. Next, let’s create a test database:

USE [master]
GO

IF DB_ID('StretchDB') IS NOT NULL BEGIN
    ALTER DATABASE StretchDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE StretchDB
END
GO

CREATE DATABASE StretchDB /* COLLATE Cyrillic_General_CI_AS */
GO

USE StretchDB
GO

CREATE TABLE dbo.ErrorLog (
      LogID INT IDENTITY PRIMARY KEY
    , PostTime DATETIME NOT NULL DEFAULT GETDATE()
    , UserName NVARCHAR(100) NOT NULL
    , ErrorMessage NVARCHAR(MAX) COLLATE Ukrainian_BIN NOT NULL 
)
GO

INSERT INTO dbo.ErrorLog (UserName, ErrorMessage)
VALUES (N'sergeys', N'Azure row')

…and call the Stretch Database wizard from SSMS 2016:

Calling the Stretch Database wizard from SSMS 2016

The wizard cautions us at once that we must have:

1) admin privileges
2) current Azure subscription
3) allowed access to SQL Server to the Internet

Database for Stretch introductory information

On the next page of the wizard, we face the first disappointment:

The table cannot be configured for stretch

We cannot migrate data from our table to Azure because of some limitations in Stretch Database technology. I will list basic ones (limitations that are not mentioned in MSDN are highlighted in bold).

Data store in Azure is not supported for the tables which:

1) contain more than 1,023 columns and more than 998 indexes
2) include the FILESTREAM data
3) are involved in replication or use Change Tracking or Change Data Capture
4) are optimized for memory layout (In-Memory table)
5) have the Always Encrypted columns

Besides, you cannot use the following in tables:

1) all user data types (CLR, user-defined types)
2) TEXT, NTEXT, IMAGE
3) TIMESTAMP
4) SQL_VARIANT
5) XML
6) GEOMETRY, GEOGRAPHY, HIERARCHYID
7) SYSNAME
8) COLUMN_SET
9) COMPUTED columns
10) SEQUENCE in a column
11) DEFAULT and CHECK constraints
12) Full-Text Search, XML and Spatial indexes
13) there should be no index views on the table
14) external keys referring to the table (for example, you can enable Stretch for the OrderDetail child table, but you can’t do the same for the Order parent table).

As it turned out, our problem was in the DEFAULT constraint. Let’s recreate the table and try once more:

DROP TABLE IF EXISTS dbo.ErrorLog
CREATE TABLE dbo.ErrorLog (
      LogID INT IDENTITY PRIMARY KEY
    , PostTime DATETIME NOT NULL
    , UserName NVARCHAR(100) NOT NULL
    , ErrorMessage NVARCHAR(MAX) COLLATE Ukrainian_BIN NOT NULL 
)
GO

INSERT INTO dbo.ErrorLog (PostTime, UserName, ErrorMessage)
VALUES (GETDATE(), N'sergeys', N'Azure row')

Now, the wizard allows to select the table:

The warning showing the column is part of a primary key or unique key index

But the wizard cautions, that: Uniqueness is not enforced for UNIQUE constraints and PRIMARY KEY constraints in the Azure table that contains the migrated data. That’s why, for the Stretch tables, it’s better to create PRIMARY KEY on the IDENTITY column, or at least, to use UNIQUEIDENTIFIER.

Next, we are prompted to log in and create SQL Server in Azure:

Create new Server in Azure

Let’s create SQL Server on Azure Portal manually:

Manual creation of SQL Server

Then, we need to select the created server in the wizard:

Select the created SQL Server

Let’s set the master password, that we will need in the future:

Provide a strong password to encrypt the database master key

Specify the exceptions to be added to Firewall:

Create a new Azure firewall rule

And here we are on the homestretch:

The Summary page in the wizard

But during the setup, another exception is waiting, and again, it is not mentioned in the help:

The error occurred during the setup

The log includes the following error:
Configure Stretch on the Database StretchDB
Status : 'Failed'
Details :
   Task failed due to the following error:
   Alter failed for Database 'StretchDB': 'Cyrillic_General_CI_AS' is not a supported collation.
   ALTER DATABASE statement failed.

During database creation, I didn’t specify collation explicitly, and it turned out that Cyrillic_General_CI_AS is not supported in addition to many other things. During testing, it turned out, that Stretch Database won’t work if collation is used in the database properties:

1) any Windows collations (Latin1_General_100_CI_AS, …)
2) any AI and BIN collations
3) there are particular problems with CS collations

To make everything work consistently, it is better to use only SQL Server collations. I haven’t checked them all, but when I used SQL_Latin1_General_CP1_CI_AS, I didn’t have any troubles:

ALTER DATABASE StretchDB COLLATE SQL_Latin1_General_CP1_CI_AS

After the COLLATE modification for the database, I ran the wizard once more and voila:

The tasks were successfully completed

Now, let’s try to track how data is migrated to Azure through the monitor in SSMS 2016:

Data migration to Azure Server

… or use a separated system view sys.dm_db_rda_migration_status:

SELECT *
FROM sys.dm_db_rda_migration_status
WHERE table_id = OBJECT_ID('dbo.ErrorLog')
    AND database_id = DB_ID()

There was a bug in RC3, and if COLLATE of columns did not coincide with COLLATE of the database, sys.dm_db_rda_migration_status would grow without being cleared.
In our example, one of the columns has COLLATE that does not coincide with the one that is set in the database properties. That’s why data migration will fail from time to time by error:

table_id    database_id migrated_rows  start_time_utc       end_time_utc         error_number error_severity error_state
----------- ----------- -------------- -------------------- -------------------- ------------ -------------- -----------
565577053   12          0              2016-06-15 15:44:41  2016-06-15 15:45:09  NULL         NULL           NULL
565577053   12          0              2016-06-15 15:45:16  2016-06-15 15:45:16  NULL         NULL           NULL
565577053   12          0              2016-06-15 15:45:16  2016-06-15 15:45:58  1205         13             55
565577053   12          0              2016-06-15 15:45:59  NULL                 NULL         NULL           NULL

But after several failed attempts it will be migrated successfully:

table_id    database_id migrated_rows  start_time_utc       end_time_utc         error_number error_severity error_state
----------- ----------- -------------- -------------------- -------------------- ------------ -------------- -----------
565577053   12          0              2016-06-15 15:46:21  2016-06-15 15:46:21  NULL         NULL           NULL
565577053   12          1              2016-06-15 15:46:21  2016-06-15 15:46:27  NULL         NULL           NULL
565577053   12          0              2016-06-15 15:47:56  2016-06-15 15:47:56  NULL         NULL           NULL
565577053   12          0              2016-06-15 15:47:56  NULL                 NULL         NULL           NULL

Therefore, we may conclude that the bug has not been completely fixed. Therefore, it is highly recommended to set the same COLLATE for columns as for the database.
In this step, we tried one of the ways of creating a Stretch table. By the way, you can do it a way simpler and faster with the script:

USE StretchDB
GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'VasyaPupkin12'
GO

CREATE DATABASE SCOPED CREDENTIAL azure
    WITH IDENTITY = N'server_name', SECRET = N'VasyaPupkin12'
GO

ALTER DATABASE StretchDB
    SET REMOTE_DATA_ARCHIVE = ON (
            SERVER = 'server_name.database.windows.net',
            CREDENTIAL = azure
        )
GO

ALTER TABLE dbo.ErrorLog
    SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND))

The main thing is to set permissions for our Azure Server before:

Allowing access to Azure services

Now, let’s take a look at the changes in the table…

Whenever data comes to the table, it is physically stored on the local server for some time, and then it is migrated to Azure automatically. You can easily see it with help of the Live Query Statistic feature implemented in SSMS 2016:

Query execution diagram

After 5-10 seconds:

Query execution diagram 5-10 seconds later

If we take a look at the execution plan, all functionality of Stretch Database is no longer a secret anymore: a linked server is created together with a separate component that migrates data to Azure.

We get the following limitations:

1) you can’t UPDATE or DELETE rows that have been migrated, or rows that are eligible for migration, in a Stretch-enabled table or in a view that includes Stretch-enabled tables
2) you can’t INSERT rows into a Stretch-enabled table on a linked server

That is, we can only insert data into a table:

INSERT INTO dbo.ErrorLog (PostTime, UserName, ErrorMessage)
VALUES (GETDATE(), N'sergeys', N'Local row')

Because any DELETE, UPDATE, TRUNCATE operation results in the error:

Msg 14893, Level 16, State 1, Line 6
Table '...' cannot be a target of an update or delete statement because it has the REMOTE_DATA_ARCHIVE option enabled without a migration predicate. 

Let’s try to create a Stretch table with the filter determining which data should be on the local machine, and which should be migrated to Azure:

ALTER TABLE dbo.ErrorLog
    SET (REMOTE_DATA_ARCHIVE (MIGRATION_STATE = INBOUND))

Next, we create a function and a column that will store an attribute indicating that data can be migrated to cloud:

CREATE FUNCTION dbo.fn_stretchpredicate(@IsOld BIT)
RETURNS TABLE
WITH SCHEMABINDING 
AS
RETURN
    SELECT 1 AS is_eligible
    WHERE @IsOld = 1
GO

ALTER TABLE dbo.ErrorLog ADD IsOld BIT
GO
During the secondary disposal of the table in cloud, we need to specify the following filter:
ALTER TABLE dbo.ErrorLog SET (
    REMOTE_DATA_ARCHIVE = ON (
        FILTER_PREDICATE = dbo.fn_stretchpredicate(IsOld),
        MIGRATION_STATE = OUTBOUND
    )
)

When the command work is over, let’s see the changes. Firstly, you can do anything you like with local data till the migration condition is not specified. You can delete, update it:

UPDATE dbo.ErrorLog
SET IsOld = 0

And when we need to selectively migrate data to Azure:

UPDATE TOP(1) dbo.ErrorLog
SET IsOld = 1

We need to remember this: what is already located in cloud cannot be returned with a simple UPDATE:

Msg 14875, Level 16, State 1, Line 14
DML operation failed because it would have affected one or more migrated (or migration-eligible) rows.

Let’s see how many space our data takes:

EXEC sys.sp_spaceused 'dbo.ErrorLog', @mode = 'LOCAL_ONLY'
EXEC sys.sp_spaceused 'dbo.ErrorLog', @mode = 'REMOTE_ONLY'
name           rows  reserved   data   index_size    unused
-------------- ----- ---------- ------ ------------- --------
dbo.ErrorLog   1     72 KB      8 KB   8 KB          56 KB
name           rows  reserved   data   index_size    unused
-------------- ----- ---------- ------ ------------- --------
dbo.ErrorLog   1     144 KB     8 KB   24 KB         112 KB

Now, let’s try to use a filter in our queries in order to make a call to Stretch tables more effective:

SELECT * FROM dbo.ErrorLog WHERE IsOld = 0
SELECT * FROM dbo.ErrorLog WHERE IsOld = 1

If we look at the execution plan, in the first case we don’t need to make connection to the linked server which should work a way faster:

The comparison between execution plans

Did we really find a variant when the usage of Stretch Database would be feasible? In real life, everything is not that good, because both queries are both slow because of the bug:

Table 'ErrorLog'. Scan count 1, logical reads 2, physical reads 0, ....

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1225 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
Table 'ErrorLog'. Scan count 1, logical reads 2, physical reads 0, ....

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1104 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

At every call to the Stretch table, the connection to Azure takes place:

The connection to Azure: IP and host mapping
IP address and the exact location

Because of such behavior, any queries with the Stretch table will freeze or fail because of time-out. The following limitations will seem ridiculous against the performance problems described above:

1) you can’t create an index for a view that includes Stretch tables
2) filtered indexes are not propagated to the remote table

Summary

As I wrote above, I tried to use Stretch Database for two projects. There was a large table with spans of work-time in TMetric time tracking app. Call to this table goes quite actively, that’s why Stretch Database simply could not do well because of its slow performance. The size of the table was treated in a simple way: all external keys were deleted and the table was recreated with the clustered COLUMNSTORE index (after all, the table was reduced by 12 times).

As for the inner project, a trick with sections switch came in handy, and I really liked to use it. The point is that two tables (A and B) are created. We work with table A actively, and then switch section to table B, which is a Stretch table:

DROP TABLE IF EXISTS A
GO
CREATE TABLE A (val INT PRIMARY KEY)
GO
INSERT INTO A SELECT 1
GO

DROP TABLE IF EXISTS B
GO
CREATE TABLE B (val INT PRIMARY KEY)
GO

ALTER TABLE B
    SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND))
GO

ALTER TABLE A SWITCH TO B
GO

SELECT * FROM A
SELECT * FROM B

In fact, we deal with the demarcation between historic and operative data, and as a result, we delicately sidestep the performance problem. This post covers all problems I faced for two-three months of working with Stretch Database. I do hope it will be useful for readers.

RELATED ARTICLES

Whitepaper

Social

Topics

Products