Performance of calculated fields in Linq

Long story short - be careful when using Linq that you don’t include calculated fields in your where clauses, or if you do make sure you know how the query is being executed.

The following uses Linq in .Net Core 2.0.

Take for example a database table as follows:

CREATE TABLE [dbo].[Ticket](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Title] [nvarchar](100) NULL,
[Description] [nvarchar](max) NULL,
[Status] [int] NULL,
CONSTRAINT [PK_Ticket] PRIMARY KEY CLUSTERED ([ID] ASC)
)

CREATE NONCLUSTERED INDEX [IX_Ticket_StatusID] ON [dbo].[Ticket]
(
[Status] ASC
)

The table has a ‘Status’ column with an associated index that we will be searching on. The table is populated with 1,000,000 records. There are four different Status values (1,2,3,4) with are approximately spread evenly across the records.

We also have a data model that will be used by the data context to map to the database table:

public class Ticket
{
	public int ID { get; set; }
	public string Title { get; set; }
	public string Description { get; set; }
	public StatusType Status { get; set; }
	public int StatusID {
		get
		{
			return (int)Status;
		}
	}
}

Notes:

The Status property will return an enumerator (called StatusType) that maps to the Status column in the database table.
The StatusID property in a simple calculated field, the return the integer value of the Status property.

Now consider the following two Linq statements:

var query1 = await _context.Ticket.CountAsync(t => t.StatusID == 1);

var query2 = await _context.Ticket.CountAsync(t => t.Status == Common.Enums.StatusType.New);

Query 1 uses the calculated property, and takes approximately 5,000 milliseconds to run.

Query 2 uses the property that directly maps to the table field, and takes approximately 20 milliseconds to run.

The reason for this, is that Query 1 has to do a calculation on the data and to do this it needs to:

Return all the table data into memory.
Perform the calculation on each record within the C# code.
Assign the matching records to the query1 variable.

Query 2 on the other hand can count the matching records all within the SQL server, and only the matching records are returned to the C# code and assigned to the query2 variable. This uses the capabilities of SQL, reduces the amount of data passed from the SQL server to the application code and reduces the amount of memory required by C#.

To further illustrate this, lets take a look at the actual SQL code ran in each case:

Query 1 SQL:

SELECT [t].[ID], [t].[Description], [t].[Status], [t].[Title]
FROM [Ticket] AS [t]

Query 2 SQL:

SELECT COUNT(*)
FROM [Ticket] AS [t]
WHERE [t].[Status] = 1

As you can see, Query 1 returns all the table data, where as Query 2 performance a simple Count on the table.

It is always worth while checking what SQL is generated from your Linq statements when developing your application. I like to use the free Express Profiler that can monitor all the queries that are being executed on your SQL server in realtime.