Is it possible for a SQL table’s index to get fragmented even if its data doesn’t get updated, inserted or deleted?
Recently I found that this is the case in very specific circumstances. I was using a merge statement to update some existing data, though in my scenario the data had not changed so the merge statement shouldn’t have made any changes to the table data and the index fragmentation should not have changed dramatically.
It should also be noted this only happened with SQL Azure with an index fill factor of 0, but the same didn’t happen with my on premise SQL server. With SQL Azure, if my index fill factor is set to 0 the index gets heavily fragmented. The full results can be found at the end of this post but first I will present a couple of SQL scripts that recreate this index fragmentation.
Script 1 - Data insert into a test table with ~100,000 records.
Script 2 - Update test table with 30,000 records. The merge statement in the script will only update existing data (as we are selecting from the original data) but I have intentionally commented out the insert part of the merge to make sure we are only updating. The values being updated in this case are actually unchanged.
--Script 1: --Do initial insert IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'TestTable') BEGIN DROP TABLE TestTable END CREATE TABLE [dbo].[TestTable] ( [UnitID] [int] not null, [ReportedDate] [datetime] not null, [Serial] [bigint] null, [Value] [smallint] NULL ) GO ALTER table [dbo].[TestTable] ADD CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED ([ReportedDate], [UnitID]) GO declare @currentDate datetime = dateadd(day, -14, getdate()) declare @currentUnitID int = 1 declare @unitTotal int = 300 while (@currentDate < GETDATE()) begin set @currentUnitID = 1; while (@currentUnitID < @unitTotal) begin insert into TestTable (UnitID, ReportedDate, Serial, [Value]) values (@currentUnitID, @currentDate, 1234, 50) set @currentUnitID = @currentUnitID + 1 end set @currentDate = dateadd(hour, 1,@currentDate) end select COUNT(*) from [dbo].[TestTable] --Get Fragmentation SELECT a.index_id, name, avg_fragmentation_in_percent, b.fill_factor FROM sys.dm_db_index_physical_stats (DB_ID(N'MSLMessagingTest'), OBJECT_ID('TestTable'), NULL, NULL, NULL) AS a JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
--Script 2 --Do updates (approx 30,000) using merge --Create table varible and populate declare @UpdateTable TABLE ( [UnitID] [int] not null, [ReportedDate] [datetime] not null, [Serial] [bigint] null, [Value] [smallint] NULL ) insert into @UpdateTable (UnitID, ReportedDate, Serial, [Value]) select top 30000 * from [dbo].TestTable t order by t.[ReportedDate] desc select COUNT(*) from @UpdateTable MERGE INTO [dbo].TestTable WITH (HOLDLOCK) AS target USING @UpdateTable AS source ON target.ReportedDate = source.ReportedDate and target.UnitID = source.UnitID WHEN MATCHED THEN UPDATE SET target.[Value] = source.[Value], target.[Serial] = source.[Serial] /* WHEN NOT MATCHED BY TARGET THEN INSERT ([ReportedDate], [UnitID],[Value], [Serial]) VALUES (source.[ReportedDate], source.[UnitID], source.[Value], source.[Serial])*/; --Get Fragmentation SELECT a.index_id, name, avg_fragmentation_in_percent, b.fill_factor FROM sys.dm_db_index_physical_stats (DB_ID(N'MSLMessagingTest'), OBJECT_ID('TestTable'), NULL, NULL, NULL) AS a JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
Results (index fill factor set to 0):
Run script 1 to insert ~100,000 records:
- Local SQL Fragmentation: 0.52%
- Azure SQL Fragmentation: 0.53%
Run script 2 to ‘update’ the data
- Local SQL Fragmentation: 0.52%
- Azure SQL Fragmentation: 43.82%
As you can see the difference in fragmentation is very different with the Azure SQL database after we run the second script.
If I set the fill factor to 80% it helps to alleviate the fragmentation. In my real world case, the fragmentation creeps up my a fraction of a percentage per day and is a lot more manageable.
I reached out to Azure support and they responded with the following:
Product group has confirmed that with accelerated database recovery feature, the new behavior is expected. It you want the ability to disable in-row versioning for a table, we have a traceflag that can be used to disable in-row versioning (but it can cause performance issues on the updates as the versions would go off-row). Currently the traceflag is not exposed in golden bits. We cannot disable Accelerated Database Recovery feature as that is needed to handle long recovery issues during failovers due to deployment or other reasons. Possible workarounds are the following:
- Setting the fillfactor. You are currently using this option.
- Enabling the traceflag to disable in-row versioning on the logical server al together. Not recommended