Introduction
PostgreSQL database supports an advanced functionality of full-text search (FTS) and full-text indexing described comprehensively in the documentation:
We decided to meet the needs of our users willing to take advantage of the full-text search in Entity Framework and implemented the full-text search functionality in our Devart dotConnect for PostgreSQL ADO.NET Entity Framework provider.
This article deals with the following:
- Full-Text Search in LINQ to Entities
- Full-Text Indexes in Code-First Migrations
- Tutorial Overview
- Prerequisites
- Using Full-Text Index on Single Column
- Using Full-Text Index on Multiple Columns
- Conclusion
Full-Text Search in LINQ to Entities
The PgSqlTextFunctions class located in the Devart.Data.PostgreSql.Entity.dll assembly is used to form full-text search specific queries in LINQ to Entities. It has a number of methods for forming queries.
Method | Description | Generated SQL |
---|---|---|
Match | Performs full-text search | textOrVector @@ textOrQuery |
AsTsQuery | Converts text to tsquery. Does not normalize words. | ‘text’::tsquery |
AsTsVector | Converts text to tsvector. Does not normalize words. | ‘text’::tsvector |
PlainToTsQuery | Converts text to tsquery. Can be used for table columns. Has an overload that allows you to specify the text search configuration to use. | plainto_tsquery(config, text) |
ToTsQuery | Converts text to tsquery. Can be used for table columns. Has an overload that allows you to specify the text search configuration to use. Allows using boolean operators and (&), or (|), not (!) in the query text and allows attaching weight to lexemes. | to_tsquery(config, text) |
ToTsVector | Converts text to tsvector. Can be used for table columns. Has an overload that allows you to specify the text search configuration to use. | to_tsvector(config, text) |
ToTsVector | Converts text to tsvector. Can be used for table columns. Has an overload that allows you to specify the text search configuration to use. | to_tsvector(config, text) |
Length | Returns the number of lexemes in the given tsvector. | length(vector) |
SetWeight | Labels the entries of a tsvector with the specified weight. Does nothing for stripped tsvectors. | setweight(vector, weight) |
TsRank | Returns rank of the full-text search result. Has several overloads, allowing you to specify optional weight values to assign to word instances, depending on their weight labels, and to specify the optional normalization kind. | ts_rank(weightD, weightC, weightB, weightA, vector, query, normalization) |
TsRankCd | Returns rank of the full-text search result. Has several overloads, allowing you to specify optional weight values to assign to word instances, depending on their weight labels, and to specify the optional normalization kind. Unlike TsRank, computes the cover density ranking for the given document vector and query. | ts_rank_cd(weightD, weightC, weightB, weightA, vector, query, normalization) |
QueryAnd | Unites two full-text search queries with an AND operator | firstQuery && secondQuery |
QueryOr | Unites two full-text search queries with an OR operator | firstQuery || secondQuery |
QueryNot | Returns the negation of the query | !! query |
NumNode | Returns the number of nodes (lexemes plus operators) in a tsquery. Returns 0 if the text, converted to the tsquery contained only stop words. | numnode(query) |
QueryTree | Returns the portion of a tsquery that can be used for searching an index. | querytree(query) |
TsHeadline | Returns the fragments of the searched text that contain the search terms. The search terms are highlighted. Contains several overloads, allowing you to specify optional text search configuration and options, such as fragment word number and highlighting. | ts_headline(config, document, query, options) |
TsRewrite | Searches for the given tsquery in the target tsquery and replaces its occurences with the substitute tsquery. | ts_rewrite(query, target, substitute) |
So, when you need to write an analogue of the following query:
SELECT * FROM MyTable WHERE to_tsvector(Column1) @@ plainto_tsquery('english','MyExpression')
you can use:
var query = ctx.MyTables .Where(t => PgSqlTextFunctions.Match( PgSqlTextFunctions.ToTsVector(t.Column1), PgSqlTextFunctions.PlainToTsQuery("english","MyExpression") );
Full-Text Indexes in Code-First Migrations
To customize Code-First Migrations for the purpose of creating full-text indexes, the PgSqlCreateIndexConfiguration class is used, which is located in the Devart.Data.PostgreSql.Entity.Migrations.dll assembly.
PgSqlCreateIndexConfiguration is designed to customize the CreateIndex operation. This class allows specifying the following parameters:
- The kind of index to create (the IndexType property) – use PgSqlIndexType.Gin or PgSqlIndexType.Gist values for full-text search indexes.
- Whether to apply the to_tsvector function to the indexed column (this can be not necessary if the column already has the tsvector type) – the UseToTsVectorFunction property.
- The name of the text search configuration to use for to_tsvector function – the ToTsVectorConfigName property.
- You can create not only column-based full-text search indexes, but also expression-based indexes. The expression is specified as the Expression property of the PgSqlCreateIndexConfiguration class.
Examples of customizations are given in the tutorial below.
Tutorial Overview
In this tutorial you will learn how to create an application which will use the following technologies:
- PostgreSQL Full-Text Indexing and Full Text Search;
- Entity Framework;
- Code-First using the Data Annotation mapping;
- Code-based Code-First Migrations;
- Query execution monitoring via Devart dbMonitor.
Follow the link at the end of this article to download the complete application sample.
In spite of the fact that in this sample we show work with Code-First, full-text search queries can be similarly written for reqular XML mapping using EDMX/EDML models developed within Database-First or Model-First approaches. So we will also consider the Database-First approach in this tutorial.
Prerequisites
Presetting Database and Model under the Database-First Approach
1. Create a console C# project in Visual Studio 2010 SP1 and call it PostgreSqlFullTextSearch.
2. In the project, add references to the following assemblies:
- Devart.Data;
- Devart.Data.PostgreSql;
- Devart.Data.PostgreSql.Entity;
- Devart.Data.PostgreSql.Entity.Migrations.
3. Create an empty Entity Framework model via Entity Developer. In the Solution Explorer window, call the context menu of the project, and then select Add -> New Item -> Devart Entity Model. Follow the wizard instructions and specify a valid connection string. The file of the model and the generated code will be added to the project.
Presetting Database and Model under the Code-First Approach
1. Create a console C# project in Visual Studio 2010 SP1 and call it PostgreSqlFullTextSearch.
2. In the project, add references to the following assemblies:
- System.Data.Entity;
- Devart.Data;
- Devart.Data.PostgreSql;
- Devart.Data.PostgreSql.Entity;
- Devart.Data.PostgreSql.Entity.Migrations.
3. Add the latest public version of EF Code-First to the project: run Package Manager Console via the Visual Studio Tools menu -> Library Package Manager -> Package Manager Console, and in the displayed window run the following command:
Install-Package EntityFramework
As a result, the EntityFramework.dll assembly is added to the project, App.config is supplemented with records and the packages.config file is created.
4. Add a new MyContext.cs file to the project, place an empty model in it:
using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Data.Entity; using Devart.Data.PostgreSql; namespace PostgreSqlFullTextSearch { public class MyContext: DbContext { public MyContext() { } public MyContext(PgSqlConnection connection) : base(connection, false) { } static MyContext() { } protected override void OnModelCreating(DbModelBuilder modelBuilder) { } } }
5. Run the following command in Package Manager Console:
Enable-Migrations
As a result, the Configuration class is added to the project.
6. Complement the existing Configuration class constructor
public Configuration() { AutomaticMigrationsEnabled = false; }
by specifying the SQL generator:
public Configuration() { AutomaticMigrationsEnabled = false; SetSqlGenerator(Devart.Data.PostgreSql.Entity.Migrations.PgSqlConnectionInfo.InvariantName, new Devart.Data.PostgreSql.Entity.Migrations.PgSqlEntityMigrationSqlGenerator()); }
7. Setting the connection string for our class of the MyContext model will be performed in the config file of the application. Add the connection string to the App.config file:
<connectionstrings> <add name="MyContext" connectionString="User Id=my_user;Password=user_password;Host=my_server;Port=5432;Database=my_database;" providerName="Devart.Data.PostgreSql" /> </connectionstrings>
Replace the connection string parameter values with the correct values for your environment.
Using Full-Text Index on Single Column
Presetting Database and Model under the Database-First Approach
1. Create the Books table and a full-text index on the Text column:
CREATE TABLE Books ( Id serial NOT NULL, Author varchar(200) NULL, Name varchar(300) NULL, Content text NULL, PRIMARY KEY (Id) ) CREATE INDEX IX_Books_Content ON Books USING GIN(to_tsvector('english', Content))
2. Use the Update From Database Wizard to add the Books table to the EF model.
Presetting Database and Model under the Code-First Approach
1. Add the Book.cs file to the project:
using System; using System.Collections.Generic; using System.ComponentModel.DataAnnotations; using System.Linq; using System.Text; namespace PostgreSQLFullTextSearch { public class Book { //[Key] public int Id { get; set; } [MaxLength(200)] public string Author { get; set; } [MaxLength(300)] public string Name { get; set; } public string Content { get; set; } } }
Here we don’t specify any attributes for the ID property, as it is a Int32 property with the name ‘Id’, it will be automatically mapped as a server-generated auto-incremental primary key.
For the Author property we set the VARCHAR(200) data type.
For the Name property we set the VARCHAR(300) data type.
For the Content property we don’t set any length restrictions, so this string property will be mapped to LONGTEXT.
2. Add the following line to the MyContext class:
public DbSet<Book> Books { get; set; }
3. To generate migration, run the following command in Package Manager Console:
Add-Migration AddBook
As a result, a migration class with the following content is added to the project:
public partial class AddBook : DbMigration { public override void Up() { CreateTable( "Books", c => new { Id = c.Int(nullable: false, identity: true), Author = c.String(maxLength: 200), Name = c.String(maxLength: 300), Content = c.String(unicode: false), }) .PrimaryKey(t => t.Id); } public override void Down() { DropTable("Books"); } }
4. Complement the migration by specifying the storage engine type for the table, and creating a full-text index on the Content column, without specifying explicitly the name of the generated index:
using Devart.Data.PostgreSql.Entity; using Devart.Data.PostgreSql.Entity.Migrations; public partial class AddBook : DbMigration { public override void Up() { CreateTable( "Books", c => new { Id = c.Int(nullable: false, identity: true), Author = c.String(maxLength: 200), Name = c.String(maxLength: 300), Content = c.String(unicode: false), }, .Index(c => c.Content, anonymousArguments: new PgSqlCreateIndexConfiguration() { IndexType = PgSqlIndexType.Gin, UseToTsVectorFunction = true, ToTsVectorConfigName = "english" }) .PrimaryKey(t => t.Id); } public override void Down() { DropTable("Books"); } }
5. Update the database using the migration, by running the following command in Package Manager Console:
Update-Database -Verbose
As a result, commands for model objects creation will be executed
CREATE TABLE Books ( Id serial NOT NULL, Author varchar(200) NULL, Name varchar(300) NULL, Content text NULL, PRIMARY KEY (Id) ) CREATE INDEX IX_Books_Content ON Books USING GIN(to_tsvector('english', Content))
Also, commands for creating the system MigrationHistory table and filling it with migration data will be executed.
Full-Text Search Sample Using Natural Language and Boolean Mode Queries
In this tutorial, we have turned on PgSqlMonitor to view executed DDL and DML statements in the Devart dbMonitor application. However, keep in mind that the use of PgSqlMonitor is feasible for the purpose of testing and debugging but should be limited in production environments, since monitoring can decrease the performance of your application.
To use monitoring, add this code to your application
var monitor = new PgSqlMonitor() { IsActive = true };
and run dbMonitor.
Let’s write the code filling the table with data and forming full-text search queries.
using (var ctx = new PostgreSQLFullTextSearch.MyContext()) { foreach (Book b in ctx.Books) ctx.Books.Remove(b); ctx.SaveChanges(); // Load information into the Books table ctx.Books.Add(new PostgreSQLFullTextSearch.Book() { Author = "William Shakespeare", Name = "The Tragedy of King Richard the Second", Content = @"ACT 1 SCENE I. London. A Room in the palace. [Enter KING RICHARD, attended; JOHN OF GAUNT, with other NOBLES.] KING RICHARD. Old John of Gaunt, time-honoured Lancaster, Hast thou, according to thy oath and band, Brought hither Henry Hereford thy bold son, Here to make good the boisterous late appeal, Which then our leisure would not let us hear, Against the Duke of Norfolk, Thomas Mowbray? ..." }); ctx.Books.Add(new PostgreSQLFullTextSearch.Book() { Author = "William Shakespeare", Name = "The Life and Death of King John", Content = @"ACT 1. SCENE I. Northampton. A Room of State in the Palace. [Enter KING JOHN, QUEEN ELINOR, PEMBROKE, ESSEX, SALISBURY, and others, with CHATILLON.] KING JOHN. Now, say, Chatillon, what would France with us? CHATILLON. Thus, after greeting, speaks the King of France, In my behaviour, to the majesty, The borrow'd majesty of England here. ..." }); ctx.Books.Add(new PostgreSQLFullTextSearch.Book() { Author = "William Shakespeare", Name = "The Life of Henry the Fifth", Content = @"ACT 1. SCENE I. London. An ante-chamber in the King's palace. [Enter the Archbishop of Canterbury and the Bishop of Ely.] CANTERBURY. My lord, I'll tell you: that self bill is urg'd, Which in the eleventh year of the last king's reign Was like, and had indeed against us pass'd, But that the scambling and unquiet time Did push it out of farther question. ..." }); ctx.SaveChanges(); // Write an analogue of the SQL query: // SELECT author, name FROM books // WHERE to_tsvector(content) @@ plainto_tsquery('king palace Norfolk') var firstQuery = ctx.Books .Where(b => PgSqlTextFunctions.Match( PgSqlTextFunctions.ToTsVector(b.Content), PgSqlTextFunctions.PlainToTsQuery("king palace Norfolk")) ) .Select(b => new { b.Author, b.Name }); var firstQueryResults = firstQuery.ToList(); // Output: Console.WriteLine(); Console.WriteLine("== Querying your single column full-text index =="); Console.WriteLine(); Console.WriteLine("1) Natural language"); Console.WriteLine(); Console.WriteLine("SQL:"); Console.WriteLine(firstQuery.ToString()); Console.WriteLine(); Console.WriteLine("Data:"); Console.WriteLine("{0,-5} | {1}", "Author", "Name"); Console.WriteLine(new string('-', 50)); foreach (var item in firstQueryResults) Console.WriteLine("{0,-5} | {1} ", item.Author, item.Name); // Write an analogue of the SQL query: // SELECT author, name FROM books // WHERE to_tsvector(content) @@ to_tsquery('king & palace & !Norfolk') var secondQuery = ctx.Books .Where(b => PgSqlTextFunctions.Match( PgSqlTextFunctions.ToTsVector(b.Content), PgSqlTextFunctions.ToTsQuery("king & palace & !Norfolk")) ) .Select(b => new { b.Author, b.Name }); var secondQueryResults = secondQuery.ToList(); // Output: Console.WriteLine(); Console.WriteLine("2) Boolean mode"); Console.WriteLine(); Console.WriteLine("SQL:"); Console.WriteLine(secondQuery.ToString()); Console.WriteLine(); Console.WriteLine("Data:"); Console.WriteLine("{0,-5} | {1}", "Author", "Name"); Console.WriteLine(new string('-', 50)); foreach (var item in secondQueryResults) Console.WriteLine("{0,-5} | {1} ", item.Author, item.Name); Console.ReadLine(); }
Conclusion
Thus, we have shown how to use PostgreSQL full-text search functionality in Entity Framework applications, including the development process, based on Code-First Migrations. We are looking forward to receiving feedback from you via comments, the Entity Framework forum and our contact form, if it’s necessary to rework or improve support of full-text indexing and searching in our provider.
You can download the archive with the complete version of the sample from this tutorial.