Info from Diego:
I got a question on this from Unai Zorrilla (a friend of mine who has written a couple of books on EF in Spanish), and when I tried what he described I was very surprised, because I had never heard of the issue before:
For a query like this:
var toSkip = 4;
var toTake = 5;
var query = context.Mesas.OrderBy(m => m.Id).Skip(toSkip).Take(toTake);
We produce a translation like this:
SELECT TOP (5)
[Extent1].[Id] AS [Id],
[Extent1].[Capacidad] AS [Capacidad],
[Extent1].[Ubicacion] AS [Ubicacion]
FROM ( SELECT [Extent1].[Id] AS [Id], [Extent1].[Capacidad] AS [Capacidad], [Extent1].[Ubicacion] AS [Ubicacion], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[Mesa] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 4
ORDER BY [Extent1].[Id] ASC
The behavior is actually consistent with how we treat constants in queries (i.e. we always translate them into constants in the target query string rather than trying to parameterize them). Although the values are passed in variables, they are not captured in the closure, because Skip and Take take regular int arguments rather than lambda expressions.
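The difference can be observed without a database. The sketch below uses an in-memory IQueryable<int> (from AsQueryable) as a stand-in for context.Mesas; the class and method names are made up for illustration. Because the tree is built by System.Linq.Queryable rather than by the provider, its shape should be the same one EF receives: the count passed to Skip arrives as a constant node, while a local variable referenced inside a Where lambda arrives as a member access on the compiler-generated closure.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Sketch: inspects the trees that System.Linq.Queryable builds. An in-memory
// IQueryable<int> (from AsQueryable) stands in for context.Mesas.
public static class ConstantVersusClosure
{
    // Node type of the count argument of the outermost Skip call,
    // e.g. for mesas.OrderBy(m => m).Skip(toSkip).
    public static ExpressionType SkipCountNode(IQueryable<int> skipQuery)
    {
        var skipCall = (MethodCallExpression)skipQuery.Expression;
        return skipCall.Arguments[1].NodeType;
    }

    // Node type of the right-hand operand in the outermost Where predicate,
    // e.g. 'min' in mesas.Where(m => m > min).
    public static ExpressionType WhereOperandNode(IQueryable<int> whereQuery)
    {
        var whereCall = (MethodCallExpression)whereQuery.Expression;
        // Queryable.Where wraps the predicate lambda in a Quote node.
        var quote = (UnaryExpression)whereCall.Arguments[1];
        var predicate = (LambdaExpression)quote.Operand;
        return ((BinaryExpression)predicate.Body).Right.NodeType;
    }
}
```

With int toSkip = 4 and int min = 2, SkipCountNode(mesas.OrderBy(m => m).Skip(toSkip)) returns ExpressionType.Constant, while WhereOperandNode(mesas.Where(m => m > min)) returns ExpressionType.MemberAccess.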
The consequence in this case is that different values passed to Skip and Take will produce different CQTs, which should prevent queries for different pages from hitting the same entry in the query cache used by the auto-compiled queries feature.
The different CQTs will also produce different target query strings with the literal values embedded, which should in turn prevent the database server from reusing the same query plan for different pages (SQL Server can auto-parameterize some queries, but in general it seems that the mechanism doesn’t kick in if TOP or subqueries are present).
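A toy model may make the caching argument concrete. It is a simplification, not how EF's query cache or SQL Server's plan cache is implemented: it only assumes that both are keyed by the exact query text, and it uses a hypothetical PageSql helper that abbreviates the translation shown above. Serving ten pages then needs ten distinct entries when the values are embedded as literals, and a single entry when they are parameters.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a cache keyed by the exact query text.
public static class QueryTextCacheDemo
{
    // Hypothetical helper: abbreviates the paging SQL shown above, embedding
    // either literal values or the placeholders @skip and @take.
    public static string PageSql(int skip, int take, bool parameterize)
    {
        string top = parameterize ? "@take" : take.ToString();
        string lowerBound = parameterize ? "@skip" : skip.ToString();
        return "SELECT TOP (" + top + ") ... FROM (...) AS [Extent1] " +
               "WHERE [Extent1].[row_number] > " + lowerBound +
               " ORDER BY [Extent1].[Id] ASC";
    }

    // Counts the distinct cache entries needed to serve pages 0..pageCount-1.
    public static int DistinctEntries(int pageCount, int pageSize, bool parameterize)
    {
        var cache = new HashSet<string>();
        for (int page = 0; page < pageCount; page++)
        {
            cache.Add(PageSql(page * pageSize, pageSize, parameterize));
        }
        return cache.Count;
    }
}
```

DistinctEntries(10, 5, false) returns 10 and DistinctEntries(10, 5, true) returns 1.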
A minor issue, and probably expected if you know the language well enough: since the variables are not captured in the closure, the values are fixed when the query is constructed, so even if you change the variables later you will always get the same page.
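The "same page" effect can be reproduced with the same in-memory stand-in. In this sketch (names are illustrative), changing toSkip after the query is built does not change the page, whereas changing min, which the Where lambda captures, does change the result. First() and Count() are used because, with this in-memory provider, they re-execute the expression tree on every call.

```csharp
using System;
using System.Linq;

// Sketch: an in-memory IQueryable<int> stands in for context.Mesas.
public static class FixedPageDemo
{
    public static (int FirstBefore, int FirstAfter, int CountBefore, int CountAfter) Run()
    {
        IQueryable<int> mesas = Enumerable.Range(1, 20).AsQueryable();

        int toSkip = 4;
        int toTake = 5;
        // Skip/Take copy the current values into the tree as constants.
        IQueryable<int> page = mesas.OrderBy(m => m).Skip(toSkip).Take(toTake);
        int firstBefore = page.First();
        toSkip = 10; // not seen by 'page': the tree holds the constant 4
        int firstAfter = page.First();

        int min = 15;
        // The Where lambda captures 'min' itself, so a later change is seen.
        IQueryable<int> filtered = mesas.Where(m => m > min);
        int countBefore = filtered.Count();
        min = 18;
        int countAfter = filtered.Count();

        return (firstBefore, firstAfter, countBefore, countAfter);
    }
}
```

Run() returns (5, 5, 5, 2): the page starts at 5 both times, while the filtered count drops from 5 to 2.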
An interesting observation is that the issue only affects in-line LINQ queries. When the arguments for Skip and Take are parameterized in a compiled query, we produce queries with parameters. In fact, using CompiledQuery or creating the query in a lambda (which requires syntax almost as complex as CompiledQuery) seems to be the only way to get reusable and cacheable queries for paging.
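One way to see why creating the query in a lambda helps: if the whole query is written inside an Expression<Func<IQueryable<T>>>, the compiler must capture toSkip and toTake in the closure, so the Skip argument becomes a member access instead of a constant. The sketch below checks that with the in-memory stand-in (names are illustrative); how EF itself would consume such a lambda is not shown and is assumed.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Sketch: the whole query is written inside an expression lambda, so the
// compiler captures toSkip/toTake in a closure instead of copying their values.
public static class QueryInLambda
{
    public static (ExpressionType CountNode, int[] Page) Run()
    {
        IQueryable<int> mesas = Enumerable.Range(1, 20).AsQueryable();
        int toSkip = 4;
        int toTake = 5;
        Expression<Func<IQueryable<int>>> pageQuery =
            () => mesas.OrderBy(m => m).Skip(toSkip).Take(toTake);

        // Body is the Take call; its first argument is the Skip call.
        var takeCall = (MethodCallExpression)pageQuery.Body;
        var skipCall = (MethodCallExpression)takeCall.Arguments[0];
        ExpressionType countNode = skipCall.Arguments[1].NodeType;

        toSkip = 10;
        // Invoking the lambda reads the closure, so the current value is used.
        int[] page = pageQuery.Compile()().ToArray();
        return (countNode, page);
    }
}
```

Run() returns ExpressionType.MemberAccess together with the page 11, 12, 13, 14, 15.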
Given that CompiledQuery cannot be used with DbContext, there is no simple workaround with the new API. Long term I think there are several things that could help with this:
1. We could parameterize constants more aggressively, like LINQ to SQL does (such a design seems to make more sense to me at the moment anyway).
2. The DbContext.Query method that we have been talking about introducing will allow capturing entire queries as lambda expressions without much syntax overhead, which should turn variables into parameters automatically, similar to parameters in CompiledQuery.
3. We could also somehow provide access to the ROW_NUMBER() functionality (e.g. think about adding an IndexOf extension method on IEnumerable<T>) so that people could express paging as simple predicates.
4. We can introduce new overloads of Skip and Take that accept Expression<Func<int>>.
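Option 4 could look roughly like the following. This is a hypothetical sketch: ExpressionPagingExtensions and its Skip/Take overloads are illustrative names, not an existing API. Because the count is passed as a lambda, a captured variable stays in the tree as a closure member access, so a provider that parameterizes closure members would see a parameter, and the same query instance picks up a new value the next time it is executed.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical overloads for option 4: the count is a lambda, so a captured
// variable stays a member access instead of being copied into a constant.
public static class ExpressionPagingExtensions
{
    private static readonly MethodInfo SkipDefinition = FindDefinition(nameof(Queryable.Skip));
    private static readonly MethodInfo TakeDefinition = FindDefinition(nameof(Queryable.Take));

    public static IQueryable<T> Skip<T>(this IQueryable<T> source, Expression<Func<int>> countAccessor)
        => Rebuild(SkipDefinition, source, countAccessor);

    public static IQueryable<T> Take<T>(this IQueryable<T> source, Expression<Func<int>> countAccessor)
        => Rebuild(TakeDefinition, source, countAccessor);

    private static IQueryable<T> Rebuild<T>(MethodInfo definition, IQueryable<T> source, Expression<Func<int>> countAccessor)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (countAccessor == null) throw new ArgumentNullException(nameof(countAccessor));
        // Splice the lambda body (e.g. a closure field access) into a regular
        // Queryable.Skip/Take call so existing providers still recognize it.
        Expression call = Expression.Call(
            null, definition.MakeGenericMethod(typeof(T)), source.Expression, countAccessor.Body);
        return source.Provider.CreateQuery<T>(call);
    }

    // Picks Queryable.Skip/Take(IQueryable<T>, int); newer runtimes also define
    // Take(IQueryable<T>, Range), so a plain GetMethod(name) would be ambiguous.
    private static MethodInfo FindDefinition(string name) =>
        typeof(Queryable).GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Single(m => m.Name == name
                && m.GetParameters().Length == 2
                && m.GetParameters()[1].ParameterType == typeof(int));
}
```

With the in-memory stand-in, mesas.OrderBy(m => m).Skip(() => toSkip).Take(() => 5) returns 5 from First() while toSkip is 4, and 11 after toSkip is set to 10. (Enumerating the same in-memory EnumerableQuery twice would reuse the sequence it built the first time, which is why First() is used for the check.)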
For the short term, I think we could add a simple workaround in the expression translator for DbQuery that artificially introduces a closure variable into the expression tree to force parameterization of the query. I tried a similar approach using only extension methods, and it seems to work well:
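using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class PagingExtensions
{
    // Match Skip/Take(IQueryable<T>, int) explicitly: newer runtimes also define
    // Queryable.Take(IQueryable<T>, Range), which makes GetMethod("Take") ambiguous.
    private static readonly MethodInfo skipMethodInfo = FindPagingMethod("Skip");
    private static readonly MethodInfo takeMethodInfo = FindPagingMethod("Take");

    public static IQueryable<TSource> ParameterizedSkip<TSource>(this IQueryable<TSource> source, int count)
    {
        if (source == null) throw new ArgumentNullException("source");
        return Parameterize(skipMethodInfo, source, count);
    }

    public static IQueryable<TSource> ParameterizedTake<TSource>(this IQueryable<TSource> source, int count)
    {
        if (source == null) throw new ArgumentNullException("source");
        return Parameterize(takeMethodInfo, source, count);
    }

    private static IQueryable<TSource> Parameterize<TSource>(MethodInfo methodInfo, IQueryable<TSource> source, int count)
    {
        // Capturing count in a lambda makes its body a closure field access instead of
        // a constant, which EF translates into a query parameter rather than a literal.
        Expression<Func<int>> countAccessor = () => count;
        Expression newQueryExpression = Expression.Call(
            null, methodInfo.MakeGenericMethod(typeof(TSource)), source.Expression, countAccessor.Body);
        return source.Provider.CreateQuery<TSource>(newQueryExpression);
    }

    private static MethodInfo FindPagingMethod(string name)
    {
        return typeof(Queryable).GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Single(m => m.Name == name && m.GetParameters().Length == 2
                         && m.GetParameters()[1].ParameterType == typeof(int));
    }
}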
I did some simple perf testing and found a ~11% improvement with a simple paged query.
Notice that this doesn’t address the expectation that the same query instance can be reused by changing the count parameters. For that we could add support for the IndexOf extension method, or create the new overloads of Skip and Take that accept expressions.
Kati pointed out that we also need to verify that there is no perf regression when executing the query. With hardcoded values SQL Server will choose one execution plan; with parameters the plan will most likely be totally different and may not be as efficient. Kati also pointed out that SQL Server itself does not parameterize SKIP/LIMIT queries. When fixing this, we need to make sure that there are no perf regressions at execution time caused by slower query execution on the SQL Server side:
"I seem to be missing context; would this mean auto-parameterizing all the time for skip and take? If so, my concern is that it may lead to worse perf in some cases. The query plan that SQL Server produces may differ for queries with constants vs. params, which is why they have the “OPTIMIZE FOR” query hint.
I’m not concerned about Sql8. We no longer need to support it."
This item was migrated from the DevDiv work item tracking system [ID=340632].
Comments: verified, filed issue 1446 to complete other operators that were missed