Slowly changing dimensions and the merge statement

There are two types of people in the world

  • Those who don't know something, but claim to know something and take what they are given at face value.
  • Those who kind of know about something, are open about it, and investigate it to see if it is really what it claims to be.

Actually, there is a third person - the person who doesn't care.

The aim of this piece of work

I wanted to be in a position to assess whether I can genuinely create code to generate merge statements off the back of the table definition alone. It seems, from my work, that it is possible. This will allow me to delegate responsibility of state management to single aggregate root stored procedures, but to have individual procedures responsible for each table's logical unit of work. This should then reduce the complexity of my business and data layer inside my project. The short answer is, yes, I now have a pattern I can use to perform record versioning off of database metadata.

Background to type 2 variations on record versioning

To be honest, I have never been somebody who religiously remembers everything but somehow a lot sticks. The way to think of a type 2 change is to think about the idea that a woman gets married, changes her surname but that her bank manager wants to know what her name was before she got married and now what her name is since she got married. Maybe her bank manager wants to know the effect of a surname change from Ferrari to Slidy-Butternut-Squash will have on her credit rating - in my eyes it would go up, but I am not a bank manager? There are variations on this type two - loads of variations.

How is the relationship to the outside world to be viewed after that change? I see there being two main requirements;

  • Other entities recognise the record as they were.
  • Other entities can recognise the record as they were before and are now based upon a point in time.

The difference can be explained as;

  • Primary Table PackageId, PackageKey, PackageName, PackageDescription, Price, ActiveFrom, ActiveUntil. Foreign key tables refer to the Package Id column. We can pretty much consider this relationship to die once the old primary record is closed off. This is unlikely to be a desired use case in many circumstances, but in some situations it is.
  • Primary Table PackageId, PackageKey, PackageName, PackageDescription, Price, ActiveFrom, ActiveUntil. Foreign key tables refer to the Package Key column. This relationship seems more common. We can call this the super key approach.

Imagine, we have a number of tables where we want to be certain they were all valid at the same time. Well, we then need to either version all tables with a common record key or record timestamp or propagate changes through the related tables. Sometimes we call this a reporting key or reporting timestamp. By introducing the concept of a fixed point in time, it can help us to work out a record's state in relation to other records based upon a central reference point.

Except, many systems are more complicated than that.

SQL Server record versioning

We now have temporal tables in SQL Server. In Oracle, we had a number of approaches including auditing. SQL Server, I believe it was 2008, introduced the merge statement and it is purportedly the ansi way of managing record state. Except there are two main observations;

  1. It is one of the most confusing coding constructs to understand and doesn't begin to consider how complicated record versioning is.
  2. Is allegedly buggy.

SQL Server Sequence

One of the most confusing elements in SQL Server is its implementation of the SEQUENCE. A Sequence represents an incrementing number external to the inserting of a record. Those who have worked with Oracle, know how useful this is. It allows you to generate a key, and insert it wherever you want. It means, for example, we could create a transaction ID and store it in lots of different places without needing a transaction table. In the SQL Server world, a sequence is vital because you are otherwise left with using a GUID to achieve a similar thing and if you have wide clustered indexes - yes, it can make sense. Unique Identifier = 16 bytes, Bigint = 8 bytes.

The most notable element on the SQL Server Sequence, is the number of places you can't use the next value for . Look here. https://docs.microsoft.com/en-us/sql/t-sql/functions/next-value-for-transact-sql?view=sql-server-2017 Basically, you can't use a Sequence inside a merge statement - in fact, how about we just focus on the one place we can use a sequence in SQL Server. Which pretty much makes us have to work out whether we want to create a sequence on the condition of whether data is new or not.

About the merge statement

The merge statement claims to be able to solve the Type 2 versioning. In our table example we have two "Key" columns - PackageId and PackageKey. These PackageId is an identity column. The Package Key is the record anchor which other tables can refer to. If we remember, we either want to officially carry the changes at a point in time OR we want to take full control and decide what records gets attached. A good example would be the expiration of a driver's license. Whether we should be modelling this inside a database is a question in itself, but anyway.

The merge statement is very useful. However, it is incredibly complicated once you try to take it to it's entirety. My main observation is, we have to rely upon a temporary table or table variable to capture part of the changed data to fully manage a type 2 change.

The below is code I have written where I see it as being appropriate for handling type two changes on a table. Some assumptions are made;

  1. When data is being populated, new records have a package key of 0.
  2. We don't want to delete data, we want to version data.
  3. We are not concerning ourselves with related foreign tables.
  4. Null value columns don't exist in the target table.
  5. We need the table variable to have nullable columns - certainly on the package Id to handle the null deleted output clause.
  6. I can definitely optimise this code and make it neater.

ALTER procedure payment.SetupPackage

@typPackage [payment].[typPackage] READONLY
AS 

DECLARE @pckgtemp AS TABLE(
 [PackageId] [bigint]  NULL,
 [PackageKey] [bigint]  NULL,
 [PackageName] [nvarchar](100) NULL,
 [PackageDescription] [nvarchar](300) NULL,
 [Price] [decimal](30, 8)  NULL,
 [ActiveFrom] [datetime] NULL,
 [ActiveUntil] [datetime] NULL
)
Declare @ActiveFrom DateTime = DateAdd(ms,3, Getdate() );
Declare @ActiveUntil DateTime = DateAdd(ms,-3,@ActiveFrom);
DECLARE @SeqTab TABLE (SeqID bigint);
DECLARE @typPackage_SEQ [payment].[typPackage]
INSERT INTO @typPackage_SEQ
(
PackageId
,PackageKey
,PackageName
,PackageDescription
,Price
,ActiveFrom
,ActiveUntil
)
select
PackageId
,NEXT VALUE FOR payment.SeqPackage
,PackageName
,PackageDescription
,Price
,ActiveFrom
,ActiveUntil
FROM
@typPackage src
where PackageKey = 0;

SELECT * FROM @typPackage_SEQ;
MERGE INTO [payment].[Package] TARGET
USING
(
SELECT
PackageId
,PackageKey
,PackageName
,PackageDescription
,Price
,ActiveFrom
,ActiveUntil
FROM
@typPackage src
WHERE PackageKey <> 0
union all
SELECT
PackageId
,PackageKey
,PackageName
,PackageDescription
,Price
,ActiveFrom
,ActiveUntil
FROM
@typPackage_SEQ
)
SOURCE
ON
TARGET.PackageKey = SOURCE.PackageKey
AND
TARGET.ActiveUntil = '31 DEC 9999'
WHEN MATCHED
AND
(TARGET.PackageName <> SOURCE.PackageName
OR
TARGET.Price <> SOURCE.Price
OR
TARGET.PackageDescription <> SOURCE.PackageDescription
)
then
UPDATE SET
Target.ActiveUntil = @ActiveUntil
WHEN NOT MATCHED BY TARGET AND SOURCE.PackageID = 0
THEN
INSERT
(PackageKey
,PackageName
,PackageDescription
,Price
,ActiveFrom
)
VALUES
(
SOURCE.PackageKey
,SOURCE.PackageName
,SOURCE.PackageDescription
,SOURCE.Price
,@ActiveFrom
)
WHEN NOT MATCHED BY SOURCE
then
DELETE
output INSERTED.*
 into @pckgtemp --CHK.PackageId
--output inserted.* into @pcktemp chk.
;
/*extra step*/
INSERT INTO
[payment].[Package]
(
PackageKey
,PackageName
,PackageDescription
,Price
,ActiveFrom
)
select
PackageKey
,PackageName
,PackageDescription
,Price
,@ActiveFrom
from
@typPackage src
where
EXISTS
(SELECT * FROM @pckgtemp CHK
where
src.PackageKey = chk.PackageKey
and
chk.ActiveUntil = @ActiveUntil
)
;

GO

 

https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-2017

https://www.mssqltips.com/sqlservertip/1704/using-merge-in-sql-server-to-insert-update-and-delete-at-the-same-time/

 

https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-2017

 

Add comment