Devart Blog

Entity Framework: SQL Generation Enhancements for IN Clause

Posted by on December 9th, 2010

Introduction

Devart was the first company who shipped Entity Framework providers for Oracle, MySQL, PostgreSQL, SQLite. And now we remain the leader in supporting new versions and features of Entity Framework.

Continuing to improve support of Entity Framework v4 features, as it was described here, we have optimized generated SQL for our ADO.NET data providers – dotConnect for Oracle, dotConnect for MySQL, dotConnect for PostgreSQL, and dotConnect for SQLite.

SQL generation was significantly improved and simplified in Entity Framework 4. Most of these improvements can be used in third-party providers, but one aspect has been improved only for SqlClient. It is a conversion of IN expression in Entity SQL and Contains method for collections in LINQ to Entities.

New versions of our dotConnect implement this enhancement for both Entity Framework v4 and Entity Framework v1. Our EF v1 users can get optimized SQL for IN Entity SQL expressions, but LINQ to Entities doesn’t support Contains method for collections in EF v1.

Let’s consider examples of improved SQL generation by Devart dotConnect for Oracle and compare them with the code generated by SqlClient. EF-model was created for the “Categories” table:

Oracle DDL script:

CREATE TABLE NORTHWINDEF."Categories" (
  "CategoryID" NUMBER(9),
  "CategoryName" VARCHAR2(15) NOT NULL,
  "Description" VARCHAR2(4000),
  "Picture" BLOB,
  CONSTRAINT "PK_Categories" PRIMARY KEY ("CategoryID")
)

MS SQL Server DDL script:

CREATE TABLE [Categories] (
  [CategoryID] int IDENTITY (1, 1) NOT NULL,
  [CategoryName] nvarchar (15) NOT NULL,
  [Description] nvarchar(4000),
  [Picture] varbinary(8000),
  CONSTRAINT "PK_Categories" PRIMARY KEY ([CategoryID])
)

All examples were made in C# for EF v4 (.NET 4.0) using LINQ to Entities. All collections in all examples are arrays with predefined data, but principles of work are the same for any collection that implements IEnumerable.

Collection with unique elements

Let’s look at the Contains method for a single collection that consists of unique elements:

Entities ctx = new Entities();
int[] collection1 = new int[] { 1, 2, 3, 4, 5, 6, 7, 8 };

var query = from c in ctx.Categories
            where collection1.Contains(c.CategoryID)
            select c;

The equivalent Entity SQL statement is:

SELECT VALUE Categories 
FROM Entities.Categories
WHERE Categories.CategoryID IN { 1, 2, 3, 4, 5, 6, 7, 8 }

SQL generated by dotConnect for Oracle 5.XX:

SELECT
   "Extent1"."CategoryID" AS "CategoryID", "Extent1"."CategoryName" AS "CategoryName", 
   "Extent1"."Description" AS "Description", "Extent1"."Picture" AS "Picture"
FROM 
   NORTHWINDEF."Categories" "Extent1"
WHERE 
   (((1 = "Extent1"."CategoryID") OR (2 = "Extent1"."CategoryID")) 
   OR ((3 = "Extent1"."CategoryID") OR (4 = "Extent1"."CategoryID"))) 
   OR (((5 = "Extent1"."CategoryID") OR (6 = "Extent1"."CategoryID")) 
   OR ((7 = "Extent1"."CategoryID") OR (8 = "Extent1"."CategoryID")))

SQL generated by dotConnect for Oracle 6.00:

SELECT
   "Extent1"."CategoryID" AS "CategoryID", "Extent1"."CategoryName" AS "CategoryName", 
   "Extent1"."Description" AS "Description", "Extent1"."Picture" AS "Picture"
FROM 
   NORTHWINDEF."Categories" "Extent1"
WHERE 
   "Extent1"."CategoryID" IN (1, 2, 3, 4, 5, 6, 7, 8 )

SQL generated by SqlClient in EF v4:

SELECT
   [Extent1].[CategoryID] AS [CategoryID], [Extent1].[CategoryName] AS [CategoryName], 
   [Extent1].[Description] AS [Description], [Extent1].[Picture] AS [Picture]
FROM 
   [dbo].[Categories] AS [Extent1]
WHERE 
   [Extent1].[CategoryID] IN (1, 2, 3, 4, 5, 6, 7, 8 )

As we can see dotConnect for Oracle 6.00 generates a single IN clauses instead of multiple OR clauses. Microsoft Entity Framework provider for SQL Server provides a similar result.

Two and more collections with the repeated elements

Let’s consider two collections containing unique elements; join of these collections is a new collection with repeated elements.

LINQ to Entities:

Entities ctx = new Entities();
int[] collection1 = new int[] { 1, 2, 3, 4, 5, 6, 7, 8 };
int[] collection2 = new int[] { 1, 2, 7, 8, 9, 10 };

var query = from c in ctx.Categories
            where collection1.Contains(c.CategoryID) 
            || collection2.Contains(c.CategoryID)
            select c;

Entity SQL:

SELECT VALUE Categories 
FROM Entities.Categories
WHERE Categories.CategoryID IN{ 1, 2, 3, 4, 5, 6, 7, 8}
   OR Categories.CategoryID IN { 1, 2, 7, 8, 9, 10 }

SQL generated by dotConnect for Oracle 5.XX:

SELECT
   "Extent1"."CategoryID" AS "CategoryID", "Extent1"."CategoryName" AS "CategoryName",
   "Extent1"."Description" AS "Description", "Extent1"."Picture" AS "Picture"
FROM 
   NORTHWINDEF."Categories" "Extent1"
WHERE 
   ((((1 = "Extent1"."CategoryID") OR (2 = "Extent1"."CategoryID")) 
   OR ((3 = "Extent1"."CategoryID") OR (4 = "Extent1"."CategoryID"))) 
   OR (((5 = "Extent1"."CategoryID") OR (6 = "Extent1"."CategoryID")) 
   OR ((7 = "Extent1"."CategoryID") OR (8 = "Extent1"."CategoryID")))) 
   OR ((((1 = "Extent1"."CategoryID") OR (2 = "Extent1"."CategoryID")) 
   OR ((7 = "Extent1"."CategoryID") OR (8 = "Extent1"."CategoryID"))) 
   OR ((9 = "Extent1"."CategoryID") OR (10 = "Extent1"."CategoryID")))

SQL generated by dotConnect for Oracle 6.00:

SELECT
   "Extent1"."CategoryID" AS "CategoryID", "Extent1"."CategoryName" AS "CategoryName",
   "Extent1"."Description" AS "Description", "Extent1"."Picture" AS "Picture"
FROM 
   NORTHWINDEF."Categories" "Extent1"
WHERE 
   "Extent1"."CategoryID" IN ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

SQL generated by SqlClient in EF v4:

SELECT
   [Extent1].[CategoryID] AS [CategoryID], [Extent1].[CategoryName] AS [CategoryName],
   [Extent1].[Description] AS [Description], [Extent1].[Picture] AS [Picture]
FROM 
   [dbo].[Categories] AS [Extent1]
WHERE 
   [Extent1].[CategoryID] IN ( 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 7, 8, 9, 10)

Even though LINQ query contains two collections and two Contains clauses, provider will generate only one IN expression, containing united item collection. As we can see, the repeated elements are removed from the IN collection in dotConnect for Oracle.

We can get the similar result, if we perform Contains for one collection with the repeated elements:

LINQ to Entities:

int[] collection1 = new int[] { 1, 2, 3, 4, 5, 1, 1, 2, 3, 3, 4, 5, 5, 5, 5, 5, 5 };
var query = from c in ctx.Categories
            where collection1.Contains(c.CategoryID)
            select c;

The equivalent EntitySQL statement is:

SELECT VALUE Categories 
FROM Entities.Categories
WHERE Categories.CategoryID 
IN { 1, 2, 3, 4, 5, 1, 1, 2, 3, 3, 4, 5, 5, 5, 5, 5, 5 }

In this case EF will generate the following SQL statements.

SQL generated by dotConnect for Oracle 5.XX:

SELECT
   "Extent1"."CategoryID" AS "CategoryID", "Extent1"."CategoryName" AS "CategoryName",
   "Extent1"."Description" AS "Description", "Extent1"."Picture" AS "Picture"
FROM 
   NORTHWINDEF."Categories" "Extent1"
WHERE 
   ((((1 = "Extent1"."CategoryID") OR (2 = "Extent1"."CategoryID")) 
   OR ((3 = "Extent1"."CategoryID") OR (4 = "Extent1"."CategoryID"))) 
   OR (((5 = "Extent1"."CategoryID") OR (1 = "Extent1"."CategoryID")) 
   OR ((1 = "Extent1"."CategoryID") OR (2 = "Extent1"."CategoryID")))) 
   OR ((((3 = "Extent1"."CategoryID") OR (3 = "Extent1"."CategoryID")) 
   OR ((4 = "Extent1"."CategoryID") OR (5 = "Extent1"."CategoryID"))) 
   OR (((5 = "Extent1"."CategoryID") OR (5 = "Extent1"."CategoryID")) 
   OR (((5 = "Extent1"."CategoryID") OR (5 = "Extent1"."CategoryID")) 
   OR (5 = "Extent1"."CategoryID"))))

SQL generated by dotConnect for Oracle 6.00:

SELECT
   "Extent1"."CategoryID" AS "CategoryID", "Extent1"."CategoryName" AS "CategoryName",
   "Extent1"."Description" AS "Description", "Extent1"."Picture" AS "Picture"
FROM NORTHWINDEF."Categories" "Extent1"
WHERE "Extent1"."CategoryID" IN ( 1, 2, 3, 4, 5)

SQL generated by SqlClient in EF v4:

SELECT
   [Extent1].[CategoryID] AS [CategoryID], [Extent1].[CategoryName] AS [CategoryName],
   [Extent1].[Description] AS [Description], [Extent1].[Picture] AS [Picture]
FROM 
   [dbo].[Categories] AS [Extent1]
WHERE 
   [Extent1].[CategoryID] IN ( 1, 2, 3, 4, 5, 1, 1 , 2, 3, 3, 4, 5, 5, 5, 5, 5, 5)

Optimization of the query without obvious collection

Let’s consider an example where collection isn’t declared explicitly.

LINQ to Entities:

var query = from c in ctx.Categories
            where c.CategoryID == 1 || c.CategoryID == 3 
             || c.CategoryID == 5 || c.CategoryID > 200
            select c;
SELECT VALUE Categories 
FROM Entities.Categories
WHERE Categories.CategoryID = 1 OR Categories.CategoryID == 3 
OR Categories.CategoryID == 5 OR Categories.CategoryID > 200

In this case EF will generate the following SQL.

SQL generated by dotConnect for Oracle 5.XX:

SELECT
   "Extent1"."CategoryID" AS "CategoryID", "Extent1"."CategoryName" AS "CategoryName",
   "Extent1"."Description" AS "Description", "Extent1"."Picture" AS "Picture"
FROM 
   NORTHWINDEF."Categories" "Extent1"
WHERE 
   (((1 = "Extent1"."CategoryID") OR (3 = "Extent1"."CategoryID")) 
   OR (5 = "Extent1"."CategoryID")) OR ("Extent1"."CategoryID" > 200)

SQL generated by dotConnect for Oracle 6.00:

SELECT
   "Extent1"."CategoryID" AS "CategoryID", "Extent1"."CategoryName" AS "CategoryName",
   "Extent1"."Description" AS "Description", "Extent1"."Picture" AS "Picture"
FROM 
   NORTHWINDEF."Categories" "Extent1"
WHERE 
   ("Extent1"."CategoryID" IN (1,3,5)) OR ("Extent1"."CategoryID" > 200)

SQL generated by SqlClient in EF v4:

SELECT
   [Extent1].[CategoryID] AS [CategoryID], [Extent1].[CategoryName] AS [CategoryName],
   [Extent1].[Description] AS [Description], [Extent1].[Picture] AS [Picture]
FROM 
   [dbo].[Categories] AS [Extent1]
WHERE 
   (1 = [Extent1].[CategoryID]) OR (3 = [Extent1].[CategoryID]) 
   OR (5 = [Extent1].[CategoryID]) OR ([Extent1].[CategoryID] > 200)

As we can see dotConnect for Oracle 6.00 is capable of transforming a set of OR clauses into single IN expression in some cases.

 

Oracle-specific SQL

There is a limit of 1000 elements for the IN expression in Oracle. If element count exceeds this limit, then SQL query execution fails with ORA-01975 exception. dotConnect for Oracle deals with this limitation by the correct SQL generation (several IN expressions connected with OR clause). So now you don’t need to split your collection into smaller ones, dotConnect for Oracle does it automatically.

SELECT
   "Extent1"."CategoryID" AS "CategoryID", "Extent1"."CategoryName" AS "CategoryName",
   "Extent1"."Description" AS "Description", "Extent1".Picture AS "Picture"
FROM 
   NORTHWINDEF."Categories" "Extent1"
WHERE 
   "Extent1"."CategoryID" IN ( 0, 1, 2, 3, 4, 5, 6, 7, 8, ... 994, 995, 996, 997, 998, 999) 
   OR "Extent1"."CategoryID" IN (1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008)

Afterword

New versions of Devart ADO.NET data providers for Oracle, MySQL, PostgreSQL, and SQLite generate more compact and fast SQL queries. In future we will continue optimization of generated SQL for specific cases of Entity Framework usage considering the peculiarities of each DBMS.

Comments are closed.