The current implementation of auto-compiled queries computes the cache key from the DbExpression produced after partial evaluation and LINQ translation. The advantage of this approach is that it can be made to work with most queries. The disadvantage is that a significant amount of processing on the query sometimes has to happen before we can find a cache hit, i.e. we do not get as much of a performance gain as we could.
A possible improvement to the current implementation is to add the ability to cache one layer above, i.e. based on the LINQ expression.
Caching based on the LINQ expression will have limitations compared to the current approach: a LINQ query can contain a constant expression that accesses a non-scalar value on the closure, which would be opaque at this level but could completely change the semantics of the query. E.g. in the following example secondQuery contains a reference to a closure variable, firstQuery, and it is not possible to know the full semantics of secondQuery without expanding firstQuery, something that the current approach takes care of:
    int[] ids = new int[10000];
    ...
    using (var context = new MyContext())
    {
        var firstQuery = from entity in context.MyEntities
                         where ids.Contains(entity.Id)
                         select entity;
        var secondQuery = from entity in context.MyEntities
                          where firstQuery.Any(otherEntity => otherEntity.Id == entity.Id)
                          select entity;
        // ToList() returns a list of entities, not a string
        var results = secondQuery.ToList();
        ...
    }
For this reason, any such improvement would need to preserve the current caching mechanism and use it as a fallback.
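The two-level lookup described above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the Entity Framework implementation: LinqKeyBuilder, TwoLevelCache, the string-based key, and the translateAndLookup delegate (standing in for the existing DbExpression-keyed path) are all hypothetical names introduced for illustration. Scalar closure values are treated as parameters, and any non-scalar closure value or constant makes the expression opaque, so the lookup falls back to the existing mechanism. A real version would also need to recognize the root query object, which this sketch would simply treat as opaque.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Text;

// Hypothetical helper: builds a structural key from a LINQ expression, or
// returns null when the tree depends on a non-scalar closure value or constant.
sealed class LinqKeyBuilder : ExpressionVisitor
{
    private readonly StringBuilder _key = new StringBuilder();
    private bool _opaque;

    public static string TryBuildKey(Expression expression)
    {
        var builder = new LinqKeyBuilder();
        builder.Visit(expression);
        return builder._opaque ? null : builder._key.ToString();
    }

    public override Expression Visit(Expression node)
    {
        if (node == null || _opaque) return node;
        // Record the node kind and static type for every node in the tree.
        _key.Append('(').Append(node.NodeType).Append(':').Append(node.Type.FullName);
        var result = base.Visit(node);
        _key.Append(')');
        return result;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // A captured variable appears as a member read on a constant
        // (the compiler-generated closure object).
        if (node.Expression is ConstantExpression)
        {
            if (IsScalar(node.Type)) _key.Append("@p"); // value becomes a parameter
            else _opaque = true;                        // e.g. an array or another query
            return node;
        }
        _key.Append('.').Append(node.Member.Name);
        return base.VisitMember(node);
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        if (IsScalar(node.Type)) _key.Append('=').Append(node.Value);
        else _opaque = true;
        return node;
    }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        _key.Append(node.Method.DeclaringType).Append("::").Append(node.Method);
        return base.VisitMethodCall(node);
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        _key.Append(' ').Append(node.Name);
        return node;
    }

    // Assumption: a deliberately narrow notion of "scalar" for this sketch.
    private static bool IsScalar(Type type) =>
        type.IsPrimitive || type.IsEnum || type == typeof(string) ||
        type == typeof(decimal) || type == typeof(DateTime) || type == typeof(Guid);
}

static class TwoLevelCache
{
    private static readonly ConcurrentDictionary<string, object> LinqLevel =
        new ConcurrentDictionary<string, object>();

    // translateAndLookup stands in for the existing path: partial evaluation,
    // LINQ translation, then the DbExpression-keyed cache.
    public static object GetPlan(Expression query, Func<Expression, object> translateAndLookup)
    {
        var key = LinqKeyBuilder.TryBuildKey(query);
        if (key == null) return translateAndLookup(query); // opaque: fall back
        return LinqLevel.GetOrAdd(key, _ => translateAndLookup(query));
    }
}
```

With this shape, a predicate such as x => x > limit (limit being a captured int) is answered from the LINQ-level cache on the second call without any translation, while the secondQuery example above, which captures firstQuery, always goes through the existing DbExpression-based cache.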