Saturday, November 23, 2024
HomeProductsADO.NET Data ProvidersUsing PostgreSQL Full-Text Search in Entity Framework

Using PostgreSQL Full-Text Search in Entity Framework

Introduction

PostgreSQL database supports an advanced functionality of full-text search (FTS) and full-text indexing described comprehensively in the documentation:

Full-Text Search

We decided to meet the needs of our users willing to take advantage of the full-text search in Entity Framework and implemented the full-text search functionality in our Devart dotConnect for PostgreSQL ADO.NET Entity Framework provider.
This article deals with the following:


Full-Text Search in LINQ to Entities

The PgSqlTextFunctions class located in the Devart.Data.PostgreSql.Entity.dll assembly is used to form full-text search specific queries in LINQ to Entities. It has a number of methods for forming queries.

Method Description Generated SQL
Match Performs full-text search textOrVector @@ textOrQuery
AsTsQuery Converts text to tsquery. Does not normalize words. ‘text’::tsquery
AsTsVector Converts text to tsvector. Does not normalize words. ‘text’::tsvector
PlainToTsQuery Converts text to tsquery. Can be used for table columns. Has an overload that allows you to specify the text search configuration to use. plainto_tsquery(config, text)
ToTsQuery Converts text to tsquery. Can be used for table columns. Has an overload that allows you to specify the text search configuration to use. Allows using boolean operators and (&), or (|), not (!) in the query text and allows attaching weight to lexemes. to_tsquery(config, text)
ToTsVector Converts text to tsvector. Can be used for table columns. Has an overload that allows you to specify the text search configuration to use. to_tsvector(config, text)
ToTsVector Converts text to tsvector. Can be used for table columns. Has an overload that allows you to specify the text search configuration to use. to_tsvector(config, text)
Length Returns the number of lexemes in the given tsvector. length(vector)
SetWeight Labels the entries of a tsvector with the specified weight. Does nothing for stripped tsvectors. setweight(vector, weight)
TsRank Returns rank of the full-text search result. Has several overloads, allowing you to specify optional weight values to assign to word instances, depending on their weight labels, and to specify the optional normalization kind. ts_rank(weightD, weightC, weightB, weightA, vector, query, normalization)
TsRankCd Returns rank of the full-text search result. Has several overloads, allowing you to specify optional weight values to assign to word instances, depending on their weight labels, and to specify the optional normalization kind. Unlike TsRank, computes the cover density ranking for the given document vector and query. ts_rank_cd(weightD, weightC, weightB, weightA, vector, query, normalization)
QueryAnd Unites two full-text search queries with an AND operator firstQuery && secondQuery
QueryOr Unites two full-text search queries with an OR operator firstQuery || secondQuery
QueryNot Returns the negation of the query !! query
NumNode Returns the number of nodes (lexemes plus operators) in a tsquery. Returns 0 if the text, converted to the tsquery contained only stop words. numnode(query)
QueryTree Returns the portion of a tsquery that can be used for searching an index. querytree(query)
TsHeadline Returns the fragments of the searched text that contain the search terms. The search terms are highlighted. Contains several overloads, allowing you to specify optional text search configuration and options, such as fragment word number and highlighting. ts_headline(config, document, query, options)
TsRewrite Searches for the given tsquery in the target tsquery and replaces its occurences with the substitute tsquery. ts_rewrite(query, target, substitute)

So, when you need to write an analogue of the following query:

SELECT *
  FROM MyTable
 WHERE to_tsvector(Column1) @@ plainto_tsquery('english','MyExpression')

you can use:

var query = ctx.MyTables
  .Where(t => PgSqlTextFunctions.Match(
    PgSqlTextFunctions.ToTsVector(t.Column1),
    PgSqlTextFunctions.PlainToTsQuery("english","MyExpression")
  );

Full-Text Indexes in Code-First Migrations

To customize Code-First Migrations for the purpose of creating full-text indexes, the PgSqlCreateIndexConfiguration class is used, which is located in the Devart.Data.PostgreSql.Entity.Migrations.dll assembly.
PgSqlCreateIndexConfiguration is designed to customize the CreateIndex operation. This class allows specifying the following parameters:

  • The kind of index to create (the IndexType property) – use PgSqlIndexType.Gin or PgSqlIndexType.Gist values for full-text search indexes.
  • Whether to apply the to_tsvector function to the indexed column (this can be not necessary if the column already has the tsvector type) – the UseToTsVectorFunction property.
  • The name of the text search configuration to use for to_tsvector function – the ToTsVectorConfigName property.
  • You can create not only column-based full-text search indexes, but also expression-based indexes. The expression is specified as the Expression property of the PgSqlCreateIndexConfiguration class.

Examples of customizations are given in the tutorial below.

Tutorial Overview

In this tutorial you will learn how to create an application which will use the following technologies:

Follow the link at the end of this article to download the complete application sample.
In spite of the fact that in this sample we show work with Code-First, full-text search queries can be similarly written for reqular XML mapping using EDMX/EDML models developed within Database-First or Model-First approaches. So we will also consider the Database-First approach in this tutorial.

Prerequisites

Presetting Database and Model under the Database-First Approach

1. Create a console C# project in Visual Studio 2010 SP1 and call it PostgreSqlFullTextSearch.
2. In the project, add references to the following assemblies:

  • Devart.Data;
  • Devart.Data.PostgreSql;
  • Devart.Data.PostgreSql.Entity;
  • Devart.Data.PostgreSql.Entity.Migrations.

3. Create an empty Entity Framework model via Entity Developer. In the Solution Explorer window, call the context menu of the project, and then select Add -> New Item -> Devart Entity Model. Follow the wizard instructions and specify a valid connection string. The file of the model and the generated code will be added to the project.

Presetting Database and Model under the Code-First Approach

1. Create a console C# project in Visual Studio 2010 SP1 and call it PostgreSqlFullTextSearch.
2. In the project, add references to the following assemblies:

  • System.Data.Entity;
  • Devart.Data;
  • Devart.Data.PostgreSql;
  • Devart.Data.PostgreSql.Entity;
  • Devart.Data.PostgreSql.Entity.Migrations.

3. Add the latest public version of EF Code-First to the project: run Package Manager Console via the Visual Studio Tools menu -> Library Package Manager -> Package Manager Console, and in the displayed window run the following command:

Install-Package EntityFramework

As a result, the EntityFramework.dll assembly is added to the project, App.config is supplemented with records and the packages.config file is created.
4. Add a new MyContext.cs file to the project, place an empty model in it:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.Entity;
using Devart.Data.PostgreSql;

namespace PostgreSqlFullTextSearch {

  public class MyContext: DbContext {

    public MyContext() {
    }

    public MyContext(PgSqlConnection connection)
      : base(connection, false) {
    }

    static MyContext() {

    }

    protected override void OnModelCreating(DbModelBuilder modelBuilder) {

    }

  }
}

5. Run the following command in Package Manager Console:

Enable-Migrations

As a result, the Configuration class is added to the project.
6. Complement the existing Configuration class constructor

    public Configuration()
    {
      AutomaticMigrationsEnabled = false;
    }

by specifying the SQL generator:

    public Configuration()
    {
      AutomaticMigrationsEnabled = false;

      SetSqlGenerator(Devart.Data.PostgreSql.Entity.Migrations.PgSqlConnectionInfo.InvariantName,
              new Devart.Data.PostgreSql.Entity.Migrations.PgSqlEntityMigrationSqlGenerator());
    }

7. Setting the connection string for our class of the MyContext model will be performed in the config file of the application. Add the connection string to the App.config file:

<connectionstrings>
  <add name="MyContext" connectionString="User Id=my_user;Password=user_password;Host=my_server;Port=5432;Database=my_database;" providerName="Devart.Data.PostgreSql" />
</connectionstrings>

Replace the connection string parameter values with the correct values for your environment.

Using Full-Text Index on Single Column

Presetting Database and Model under the Database-First Approach

1. Create the Books table and a full-text index on the Text column:

CREATE TABLE Books ( 
  Id serial NOT NULL,
  Author varchar(200) NULL,
  Name varchar(300) NULL,
  Content text NULL,
  PRIMARY KEY (Id)
)
CREATE INDEX IX_Books_Content ON Books USING GIN(to_tsvector('english', Content))

2. Use the Update From Database Wizard to add the Books table to the EF model.

Presetting Database and Model under the Code-First Approach

1. Add the Book.cs file to the project:

using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Text;

namespace PostgreSQLFullTextSearch {

  public class Book {

    //[Key]
    public int Id { get; set; }

    [MaxLength(200)]
    public string Author { get; set; }

    [MaxLength(300)]
    public string Name { get; set; }

    public string Content { get; set; }

  }
}

Here we don’t specify any attributes for the ID property, as it is a Int32 property with the name ‘Id’, it will be automatically mapped as a server-generated auto-incremental primary key.
For the Author property we set the VARCHAR(200) data type.
For the Name property we set the VARCHAR(300) data type.
For the Content property we don’t set any length restrictions, so this string property will be mapped to LONGTEXT.
2. Add the following line to the MyContext class:

public DbSet<Book> Books { get; set; }

3. To generate migration, run the following command in Package Manager Console:

Add-Migration AddBook

As a result, a migration class with the following content is added to the project:

    public partial class AddBook : DbMigration
    {
        public override void Up()
        {
            CreateTable(
                "Books",
                c => new
                    {
                        Id = c.Int(nullable: false, identity: true),
                        Author = c.String(maxLength: 200),
                        Name = c.String(maxLength: 300),
                        Content = c.String(unicode: false),
                    })
                .PrimaryKey(t => t.Id);
        }

        public override void Down()
        {
            DropTable("Books");
        }
    }

4. Complement the migration by specifying the storage engine type for the table, and creating a full-text index on the Content column, without specifying explicitly the name of the generated index:

    using Devart.Data.PostgreSql.Entity;
    using Devart.Data.PostgreSql.Entity.Migrations;

    public partial class AddBook : DbMigration
    {
        public override void Up()
        {
            CreateTable(
                "Books",
                c => new
                    {
                        Id = c.Int(nullable: false, identity: true),
                        Author = c.String(maxLength: 200),
                        Name = c.String(maxLength: 300),
                        Content = c.String(unicode: false),
                    },
                .Index(c => c.Content,
                anonymousArguments: new PgSqlCreateIndexConfiguration()
                {
                    IndexType = PgSqlIndexType.Gin,
                    UseToTsVectorFunction = true,
                    ToTsVectorConfigName = "english"
                })
               .PrimaryKey(t => t.Id);
        }

        public override void Down()
        {
            DropTable("Books");
        }
    }

5. Update the database using the migration, by running the following command in Package Manager Console:

Update-Database -Verbose

As a result, commands for model objects creation will be executed

CREATE TABLE Books ( 
  Id serial NOT NULL,
  Author varchar(200) NULL,
  Name varchar(300) NULL,
  Content text NULL,
  PRIMARY KEY (Id)
)
CREATE INDEX IX_Books_Content ON Books USING GIN(to_tsvector('english', Content))

Also, commands for creating the system MigrationHistory table and filling it with migration data will be executed.

Full-Text Search Sample Using Natural Language and Boolean Mode Queries

In this tutorial, we have turned on PgSqlMonitor to view executed DDL and DML statements in the Devart dbMonitor application. However, keep in mind that the use of PgSqlMonitor is feasible for the purpose of testing and debugging but should be limited in production environments, since monitoring can decrease the performance of your application.
To use monitoring, add this code to your application

var monitor = new PgSqlMonitor() { IsActive = true };

and run dbMonitor.
Let’s write the code filling the table with data and forming full-text search queries.

            using (var ctx = new PostgreSQLFullTextSearch.MyContext())
            {

                foreach (Book b in ctx.Books)
                    ctx.Books.Remove(b);
                ctx.SaveChanges();

                // Load information into the Books table
                ctx.Books.Add(new PostgreSQLFullTextSearch.Book()
                {
                    Author = "William Shakespeare",
                    Name = "The Tragedy of King Richard the Second",
                    Content = @"ACT 1
SCENE I. London. A Room in the palace.
[Enter KING RICHARD, attended; JOHN OF GAUNT, with other NOBLES.]
KING RICHARD.
  Old John of Gaunt, time-honoured Lancaster,
  Hast thou, according to thy oath and band,
  Brought hither Henry Hereford thy bold son,
  Here to make good the boisterous late appeal,
  Which then our leisure would not let us hear,
  Against the Duke of Norfolk, Thomas Mowbray?
..."
                });
                ctx.Books.Add(new PostgreSQLFullTextSearch.Book()
                {
                    Author = "William Shakespeare",
                    Name = "The Life and Death of King John",
                    Content = @"ACT 1.
SCENE I. Northampton. A Room of State in the Palace.
[Enter KING JOHN, QUEEN ELINOR, PEMBROKE, ESSEX, SALISBURY, and others, with CHATILLON.]
KING JOHN.
  Now, say, Chatillon, what would France with us?
CHATILLON.
  Thus, after greeting, speaks the King of France,
  In my behaviour, to the majesty,
  The borrow'd majesty of England here.
..."
                });
                ctx.Books.Add(new PostgreSQLFullTextSearch.Book()
                {
                    Author = "William Shakespeare",
                    Name = "The Life of Henry the Fifth",
                    Content = @"ACT 1.
SCENE I. London. An ante-chamber in the King's palace.
[Enter the Archbishop of Canterbury and the Bishop of Ely.]
CANTERBURY.
  My lord, I'll tell you: that self bill is urg'd,
  Which in the eleventh year of the last king's reign
  Was like, and had indeed against us pass'd,
  But that the scambling and unquiet time
  Did push it out of farther question.
..."
                });
                ctx.SaveChanges();

                // Write an analogue of the SQL query:
                //  SELECT author, name FROM books
                //  WHERE to_tsvector(content) @@ plainto_tsquery('king palace Norfolk')
                var firstQuery = ctx.Books
                    .Where(b => PgSqlTextFunctions.Match(
                        PgSqlTextFunctions.ToTsVector(b.Content), 
                        PgSqlTextFunctions.PlainToTsQuery("king palace Norfolk"))
                    )
                    .Select(b => new { b.Author, b.Name });
                var firstQueryResults = firstQuery.ToList();

                // Output:
                Console.WriteLine();
                Console.WriteLine("== Querying your single column full-text index ==");
                Console.WriteLine();
                Console.WriteLine("1) Natural language");
                Console.WriteLine();
                Console.WriteLine("SQL:");
                Console.WriteLine(firstQuery.ToString());
                Console.WriteLine();
                Console.WriteLine("Data:");
                Console.WriteLine("{0,-5} | {1}", "Author", "Name");
                Console.WriteLine(new string('-', 50));
                foreach (var item in firstQueryResults)
                    Console.WriteLine("{0,-5} | {1} ", item.Author, item.Name);

                // Write an analogue of the SQL query:
                //   SELECT author, name FROM books
                //   WHERE to_tsvector(content) @@ to_tsquery('king & palace & !Norfolk')
                var secondQuery = ctx.Books
                    .Where(b => PgSqlTextFunctions.Match(
                        PgSqlTextFunctions.ToTsVector(b.Content), 
                        PgSqlTextFunctions.ToTsQuery("king & palace & !Norfolk"))
                    )
                    .Select(b => new { b.Author, b.Name });
                var secondQueryResults = secondQuery.ToList();

                // Output:
                Console.WriteLine();
                Console.WriteLine("2) Boolean mode");
                Console.WriteLine();
                Console.WriteLine("SQL:");
                Console.WriteLine(secondQuery.ToString());
                Console.WriteLine();
                Console.WriteLine("Data:");
                Console.WriteLine("{0,-5} | {1}", "Author", "Name");
                Console.WriteLine(new string('-', 50));
                foreach (var item in secondQueryResults)
                    Console.WriteLine("{0,-5} | {1} ", item.Author, item.Name);

                Console.ReadLine();

            }

Conclusion

Thus, we have shown how to use PostgreSQL full-text search functionality in Entity Framework applications, including the development process, based on Code-First Migrations. We are looking forward to receiving feedback from you via comments, the Entity Framework forum and our contact form, if it’s necessary to rework or improve support of full-text indexing and searching in our provider.
You can download the archive with the complete version of the sample from this tutorial.

RELATED ARTICLES

Whitepaper

Social

Topics

Products