Software That Shines

Better Software Blog

Software Insights

SQL Server Temporal Table Tricks and Troubles

Temporal tables are a useful feature introduced with SQL Server 2016 to track changing data. Triggers, boilerplate code, all the fun custom debris that was needed to support this in the past, it's all gone. There's also handy new syntax for querying historical data, such as

SELECT *
FROM Table
FOR SYSTEM_TIME AS OF '2019-01-05 12:00:00'

Or

FOR SYSTEM_TIME BETWEEN X and Y
FOR SYSTEM_TIME ALL

If you have views on temporal tables, they can also be queried this way.

Do note that SYSTEM_TIME is in UTC. That applies to the stored time stamps and the querying. Also note that a temporal table requires a primary key (whether it be clustered or nonclustered).

But all that can be read in the documentation (https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables). What I want to discuss today are two pieces:

  1. Tips on efficiently converting existing data to temporal tables

  2. Using temporal table functionality to track update timestamps without storing history!

There's also a caveat you should be aware of, which I'll demonstrate:

  1. The timestamps are determined by the start of the transaction, not the insert/update/delete statement. This can cause modifications to fail (very similar to snapshot isolation's "optimistic concurrency")

Tips on efficiently converting existing data to temporal tables

If you have existing data with created/modified dates, that can be converted for populating the temporal table data. The temporal table utilizes two columns, which I'll call SysStartTime and SysEndTime to match the documentation. Those columns determine the period for which the row is active. The current row will always have SysEndTime of 9999-12-31 23:59:59.9999. In other words, the "end time" is essentially "infinity" because a row in the current table is eo ipso still active.

As mentioned, the timestamps must be stored in UTC. Additionally, they must be DATETIME2 columns. DATETIME2 is largely an improvement on DATETIME. It can store a larger range, and it has variable precision, which can either save space or record more granular timestamps.

If you want to save space, DATETIME2(2) (or less) would take up 7 bytes, to DATETIME's 8. DATETIME2(4) (or 3) would take up the same space (8 bytes) but give you more precision. Or you can store up to 7 digits of precision at the cost of 9 bytes. You can read more details on DATETIME 2 here. See the footnote that the storage size listed, for some strange reason, omits a byte.

Let's pretend we have the following legacy data. For simplicity, I'm updating ModifiedDate in the query rather than adding a trigger.

CREATE TABLE TestTable
(
   Id INT IDENTITY
      CONSTRAINT PK_TestTable PRIMARY KEY
   , Value NVARCHAR(300)
   , CreatedDate DATETIME
      CONSTRAINT DF_TestTable_CreatedDate DEFAULT(GETDATE())
   , ModifiedDate DATETIME
);
GO
INSERT INTO TestTable (Value, ModifiedDate)
VALUES ('A', GETDATE());
SELECT *
FROM TestTable;
WAITFOR DELAY '00:00:05';
UPDATE TestTable
SET Value = 'B'
   , ModifiedDate = GETDATE()
WHERE Value = 'A';
SELECT *
FROM TestTable;

Let's walk through the process of converting this to a temporal table. CreatedDate is actually not going to play a part per se, because that doesn't change when the row does. If this were a new table with history built in from the beginning, it may end up being a redundant concept, since it would be equivalent to SELECT MIN (SysStartTime) … FOR SYSTEM_TIME ALL, but since we're converting a legacy table, we'll keep it. It's ModifiedDate that will play the role of SysStartTime. Nevertheless, we should probably be consistent, and convert CreatedDate to UTC and DATETIME2 also. It would be extremely confusing to have CreatedDate in local time, but ModifiedDate in UTC.

Note that, for the purposes of illustration (not necessarily following best practices!) our dates are in local time instead of UTC. Another useful query feature introduced in SQL 2016 is 'AT TIME ZONE'. We can use this to convert the existing data to UTC with relative ease. You could also use this to create a computed column for a specific time zone, although you have to specify that time zone explicitly.

Let's convert CreatedDate and ModifiedDate. First, we'll alter the columns to DATETIME2(2). Note that ModifiedDate/SysStartTime must not be null to use it with the temporal table functionality. If, for example, you only update "ModifiedDate" when a row is actually modified, you might want to copy it from CreatedDate.

ALTER TABLE TestTable DROP CONSTRAINT DF_TestTable_CreatedDate;
ALTER TABLE TestTable ALTER COLUMN CreatedDate DATETIME2(2) NOT NULL;
ALTER TABLE TestTable ALTER COLUMN ModifiedDate DATETIME2(2) NOT NULL;
ALTER TABLE TestTable ADD CONSTRAINT DF_TestTable_CreatedDate DEFAULT (SYSUTCDATETIME()) FOR CreatedDate;

Now let's use AT TIME ZONE to convert from server time (which we'll say is Eastern Standard) to UTC. Note that AT TIME ZONE returns a DATETIMEOFFSET, so first we must say that it starts in Eastern Standard, and from there, convert it to UTC.

UPDATE TestTable
SET CreatedDate = (CreatedDate AT TIME ZONE 'Eastern Standard Time') AT TIME ZONE 'UTC'
   , ModifiedDate = (ModifiedDate AT TIME ZONE 'Eastern Standard Time') AT TIME ZONE 'UTC';

Similarly, if we want to convert the DATETIME2 from UTC to a specific time zone, we must first get it treated specifically as UTC before converting it.

SELECT *
   , (CreatedDate AT TIME ZONE 'UTC') AT TIME ZONE 'Eastern Standard Time'
   , (ModifiedDate AT TIME ZONE 'UTC') AT TIME ZONE 'Eastern Standard Time'
FROM TestTable;

"SysEndTime" is much simpler. It will always be "infinity" for all existing rows, so we can use a default constraint to populate it. For consistency with CreatedDate/ModifiedDate I will just call it "EndDate" (as opposed to SysEndTime). Note that its DATETIME2 precision must be the same as that of the "SysStartTime" column.

ALTER TABLE TestTable ADD EndDate DATETIME2(2) NOT NULL 
   CONSTRAINT DF_TestTable_EndDate DEFAULT ('9999-12-31 23:59:59.9999999');
SELECT *
FROM TestTable;

Obviously if you don't want this column at the end of the table, you'll pretty much have to script it to copy all the data over to a new table.

So now we have all the columns we need, and they're in the format we need, so we can convert it to a temporal table. There is no longer a need for defaults for ModifiedDate or EndDate; they will be generated by the system.

GO
ALTER TABLE TestTable DROP CONSTRAINT DF_TestTable_CreatedDate;
ALTER TABLE TestTable DROP CONSTRAINT DF_TestTable_EndDate;
ALTER TABLE TestTable ADD PERIOD FOR SYSTEM_TIME (ModifiedDate, EndDate);
ALTER TABLE TestTable SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.TestTableHistory, DATA_CONSISTENCY_CHECK = ON));

Note that if you specify a name for the desired history table, needs to have the schema specified. Also note that if you specify a HISTORY_TABLE and then for some reason set SYSTEM_VERSIONING = OFF and then want to turn it back on, you must specify the HISTORY_TABLE again too, otherwise it will use the default naming convention and create a new history table.

Let's try it out!

GO
UPDATE TestTable
SET Value = 'C'
WHERE Value = 'B';
SELECT *
FROM TestTable;
SELECT *
FROM TestTable FOR SYSTEM_TIME ALL;

Easy!

If you don't want this new EndDate column to be returned with SELECT *, you can mark it as hidden.

ALTER TABLE TestTable ALTER COLUMN EndDate ADD HIDDEN;

Now note the difference in SELECT *

SELECT *
FROM TestTable;
SELECT *
FROM TestTable FOR SYSTEM_TIME ALL;

Using temporal table functionality to track update timestamps without storing history

Now for the second topic. You may have noticed earlier that enabling the temporal table involved two distinct statements:

ALTER TABLE TestTable ADD PERIOD FOR SYSTEM_TIME (ModifiedDate, EndDate);
ALTER TABLE TestTable SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.TestTableHistory, DATA_CONSISTENCY_CHECK = ON));

It is required to set ADD PERIOD FOR SYSTEM_TIME in order to use SYSTEM_VERSIONING, but it is totally fine to use ADD PERIOD FOR SYSTEM_TIME without a history table! You can use this to have an automatically-updating "Modified Date" column without any triggers, at the cost of the extra "EndDate" column (which is essentially useless without the history table). For example:

CREATE TABLE TestTable2
(
   Id INT IDENTITY
   , Value NVARCHAR(300)
   , CreatedDateUtc DATETIME2(2) 
      CONSTRAINT DF_TestTable2_CreatedDateUtc DEFAULT(SYSUTCDATETIME())
   , CreatedDate AS (CreatedDateUtc AT TIME ZONE 'UTC') AT TIME ZONE 'Eastern Standard Time'
   , UpdatedDateUtc DATETIME2(2) GENERATED ALWAYS AS ROW START
   , UpdatedDate AS (UpdatedDateUtc AT TIME ZONE 'UTC') AT TIME ZONE 'Eastern Standard Time'
   , RowEndDateUtc DATETIME2(2) GENERATED ALWAYS AS ROW END HIDDEN
   , PERIOD FOR SYSTEM_TIME (UpdatedDateUtc, RowEndDateUtc)
);
INSERT INTO TestTable2 (Value)
VALUES ('A');
SELECT *
FROM TestTable2;
WAITFOR DELAY '00:00:05';
UPDATE TestTable2
SET Value = 'B'
WHERE Value = 'A';
SELECT *
FROM TestTable2;

A reliable UpdatedDateUtc column, with no triggers and no extra logic. The cost, unfortunately, is an extra RowEndDateUtc column taking up at least 7 bytes.

As for HIDDEN, unfortunately for whatever reason it is only supported on the temporal table columns, so you can't use it to hide arbitrary columns.

SYSTEM_TIME behavior and concurrency failures

It's listed in the documentation, but easy to overlook: the timestamps recorded in the history are from the beginning of the transaction. Once you BEGIN TRANSACTION, all temporal table updates will use that same timestamp, regardless of when they actually took place.

The first implication of this is that multiple updates to the same table will appear to be at the same time. This also means that FOR SYSTEM_TIME queries won't even see some of the records! As the documentation states, it filters out records where the start time is equal to the end time. For example (using our TestTable above):

WAITFOR DELAY '00:00:05';
BEGIN TRANSACTION;
UPDATE TestTable
SET Value = 'D'
WHERE Value = 'C';
WAITFOR DELAY '00:00:05';
UPDATE TestTable
SET Value = 'E'
WHERE Value = 'D';
WAITFOR DELAY '00:00:05';
UPDATE TestTable
SET Value = 'F'
WHERE Value = 'E';
COMMIT;


SELECT *
FROM TestTable FOR SYSTEM_TIME ALL;
SELECT *
FROM dbo.TestTableHistory

UNION ALL

SELECT Id, Value, CreatedDate, ModifiedDate, EndDate
--EndDate is HIDDEN, but that doesn't apply to history, so the UNION ALL would fail
FROM dbo.TestTable;

Notice how the 'D' and 'E' rows don't show up in the FOR SYSTEM_TIME query (it only has 'B', 'C', and 'F'), and notice how if we query the history table directly (with UNION ALL), the 'D', 'E', and 'F' rows all have the same modified date even though we waited 5 seconds in between each modification.

It gets worse. In one query window, run the following to start a new transaction:

BEGIN TRANSACTION;

In a second query window (not the one that just started a transaction), run this:

UPDATE dbo.TestTable
SET Value = 'G'
WHERE Value = 'F';

Now go back to the first one (with the open transaction), and try and update the TestTable:

UPDATE dbo.TestTable
SET Value = 'H'
WHERE Value = 'G';

It should fail with the following error:

Msg 13535, Level 16, State 0, Line 3
Data modification failed on system-versioned table 'dbo.TestTable' because transaction time was earlier than period start time for affected records.
The statement has been terminated.

What happened is the following:

  1. The first query begins a transaction at time X

  2. The second query updates the row at time X+1

  3. Switching back to the first query, we update the table at time X+2

As previously mentioned, the update timestamp is that of the transaction start (time X). But another transaction updated the data at (time X+1). When the second update occurs at (time X+2), it would be trying to set the timestamp back in time to (time X), which triggers the error above, which should now make perfect sense: the transaction time, i.e. time X, that it's trying to set, is earlier than the period start time (the modified date on the current row), i.e. time X+1, and so the update fails.

Note that this doesn't occur on TestTable2, where there is no history table. Also note that even if you set DATA_CONSISTENCY_CHECK = OFF, it doesn't affect this situation.

Summary

In general, temporal tables are a useful feature for tracking changes to tables. I showed you an example of converting some existing modification logic to use temporal tables, alongside the 'AT TIME ZONE' feature of SQL 2016. I also showed how you can leverage the system-generated period logic to implement a modification date column.

I also point out a possible surprise when using temporal tables: that the modification date is always set to the beginning of the transaction. Additionally I showed an example of how this can cause modification queries to actually fail in certain circumstances.

Hope this helps!

Mark SowulSQL ServerComment