Use merge to split data into multiple tables
I wish over the years, as I’ve learned various tricks, that I had saved where/who I got it from so that I could credit back later on. I guess I never expected to have a blog one day.
This is a trick I’ve only had to use a few times over the years, but I was going though some code I had written a while back which uses this technique and I thought it would be fun to write about it.
Here’s the scenario
You’ve got a table with data you need to split into two tables.
Table1 has an identity primary key (PK),
Table2 has a foreign key (FK) pointing back to the PK on
Take a second and think about how you would you do this.
The basic idea is you need to insert records into
Table1, somehow get their identity values, then insert those identity values into the FK column when you perform your insert into
I can think of a few ways to do it…
Insert and select. Insert into
Table1. Select the records you just inserted back from
Table1 to get the new identity values. Use some sort of natural key to join those records back to the source, then perform the insert into
Table2. This is a little janky because you risk causing duplicates or bad matches if your natural key isn’t as unique as you thought it was. It’s also not very efficient because you have to hit
Table1 twice, once for the insert, then again for the select.
Insert with output clause. Basically the same idea as above, except instead of performing the select, you use an
OUTPUT...INTO clause on your insert.
Use a sequence. Create a sequence object. Assign the sequence to the PK column on
Table1 using a default constraint. Now you can use your sequence to generate the values prior to inserting into both tables.
Add a helper column to the destination table. Create some kind of “helper” column on
Table1 which stores a unique ID from the source table. Combine this with using the “insert with output clause”, and it makes for an interesting solution. Now you can return both the new identity value and the original key from the source allowing you to easily map the new identity back to the source data to be later used with your insert into
Unless you have access to make schema changes, some of these may not be good solutions for you. You can also run into issues when trying to re-query your inserted data if you don’t have a reliable and efficient way to get it back, especially if it’s a large table and not indexed in a way useful to you.
In comes the merge statement
MERGE might sound like an odd tool to use for this, especially since you’re only planning to do an insert, no updates or deletes. The thing that makes it awesome is the
OUTPUT clause. I know what you’re thinking…“but the output clause is supported on insert statements too”…yes…you’re right; but when using merge, you get something a little extra with the
When using the
OUTPUT clause with a
MERGE statement, not only do you get access to the inserted/deleted columns of the destination table, but you also have access to columns in the source table. This is not something you can do with an
OUTPUT clause on a simple
Setup the demo
Create some sample data. This is the source table we want to split across two tables:
CREATE TABLE #Source ( SourceID int IDENTITY(1,1), Col1 varchar(10), Col2 varchar(10), Col3 varchar(10), Col4 varchar(10), ); INSERT INTO #Source (Col1, Col2, Col3, Col4) VALUES ('a', '1','b', '2') , ('c', '3','d', '4') , ('e', '5','f', '6') , ('g', '7','h', '8') , ('i', '9','j','10') , ('k','11','l','12') , ('m','13','n','14');
Set up the schema (sorry, can’t use temp tables for this one, they don’t support FK’s):
CREATE TABLE dbo.Table1 ( Table1ID int IDENTITY(1,1), Col1 varchar(10), Col2 varchar(10), PRIMARY KEY CLUSTERED (Table1ID), ); CREATE TABLE dbo.Table2 ( Table1ID int, Col3 varchar(10), Col4 varchar(10), FOREIGN KEY (Table1ID) REFERENCES dbo.Table1 (Table1ID), );
Lastly, we need a table we can use for temporarily storing the old and new identity values.
CREATE TABLE #IDs ( SourceID int, Table1ID int, );
The key to making this whole thing work
MERGE INTO dbo.Table1 t USING #Source s ON 1 = 0 WHEN NOT MATCHED BY TARGET THEN INSERT (Col1, Col2) VALUES (s.Col1, s.Col2) OUTPUT s.SourceID, Inserted.Table1ID INTO #IDs (SourceID, Table1ID);
The result of this query is that now
Table1 is populated with
#Source.Col2, it generated identity values for each record (
Table1.Table1ID). It then returned (
#Source.SourceID and the
Table1.TableID columns and insert them into our ID mapping table
#IDs. All of that being done in a single statement.
Let’s go through this. For the most part, it’s just a simple boring merge statement. We want to merge data from
Table1. The two pieces here that make this work are…
First, we’ve hard coded
1 = 0 in for the join logic. This is to force the merge to always jump to the
WHEN NOT MATCHED BY TARGET part of the merge, since all we care about is performing inserts.
Second, we’ve added an
OUTPUT statement returning both the original identity PK value from
#Source as well as the new identity value from
Table1. Thus giving us a perfect mapping of our source data to the new identity values!
Now that we’ve got
Table1 populated, lets finish it up and populate
Table2 using our ID Mapping table to include the FK values.
INSERT INTO dbo.Table2 (Table1ID, Col3, Col4) SELECT id.Table1ID, s.Col3, s.Col4 FROM #Source s JOIN #IDs id ON id.SourceID = s.SourceID
Now verify your results:
SELECT t1.Table1ID, t1.Col1, t1.Col2, t2.Col3, t2.Col4 FROM dbo.Table1 t1 JOIN dbo.Table2 t2 ON t2.Table1ID = t1.Table1ID
For those interested, here’s what the execution plan looks like:
If you want to know more about the details of a merge execution plan, check out this blog post from Hugo Kornelis explaining the details of a merge execution plan from his “plansplaining” series.