Semantic Search in Rails with pgvector: From Zero to Production
A client came to me last year with a support ticket queue problem. They had four years of resolved tickets in their Rails app — over 80,000 of them — and their support team spent twenty minutes per new ticket just searching for similar past cases. The search was keyword-based. A ticket about “app won’t open” returned zero results when similar past tickets said “application fails to launch.” Same problem, different words, useless search.
Semantic search solved it in an afternoon. Not because I’m clever, but because pgvector and OpenAI embeddings have gotten genuinely simple to integrate into a Rails stack. Here’s exactly what I built, and how you can do the same.
What Vector Embeddings Actually Are
Forget the math for now. An embedding is a list of numbers — a vector — that represents the meaning of a piece of text. Two sentences that mean the same thing will have vectors that are close together in space, even if they share no words. “App won’t open” and “application fails to launch” end up nearly identical vectors. “Database migration guide” ends up far away.
You generate these vectors by sending text to an embedding model (OpenAI’s text-embedding-3-small is fast and cheap). You store the vectors in Postgres using the pgvector extension. You query for similarity using a dot product or cosine distance. That’s it.
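The distance math is worth seeing once, even though pgvector does it for you in SQL. A toy sketch in plain Ruby — the vectors are 4-dimensional and the values are made up for readability; real embeddings from text-embedding-3-small have 1536 dimensions:

```ruby
# Cosine similarity: dot product divided by the product of magnitudes.
# Close to 1.0 means similar meaning; close to 0.0 means unrelated.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  mag_a = Math.sqrt(a.sum { |x| x * x })
  mag_b = Math.sqrt(b.sum { |x| x * x })
  dot / (mag_a * mag_b)
end

wont_open      = [0.8, 0.6, 0.1, 0.0] # "app won't open" (made-up values)
fails_launch   = [0.7, 0.7, 0.2, 0.1] # "application fails to launch"
migration_docs = [0.0, 0.1, 0.9, 0.8] # "database migration guide"

cosine_similarity(wont_open, fails_launch)   # high — similar meaning
cosine_similarity(wont_open, migration_docs) # low — unrelated
```

pgvector's cosine *distance* is simply 1 minus this similarity, which is why smaller distances mean closer meaning.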
Setting Up pgvector
First, the extension. If you’re on managed Postgres (RDS, Supabase, Render), pgvector is likely already available. For a fresh install on Debian/Ubuntu:
sudo apt install postgresql-16-pgvector
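Before writing the migration, you can confirm the extension is actually available to your Postgres server from psql:

```sql
-- Lists the vector extension if the server can enable it
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'vector';
```

If this returns no rows, the migration below will fail — install the package (or pick a managed plan that ships pgvector) first.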
Then enable it in a Rails migration:
class EnablePgvector < ActiveRecord::Migration[8.0]
  def up
    execute "CREATE EXTENSION IF NOT EXISTS vector"
  end

  def down
    execute "DROP EXTENSION IF EXISTS vector"
  end
end
Add the neighbor gem to your Gemfile. It provides the ActiveRecord integration for pgvector — the vector column type plus the has_neighbors API used below:

# Gemfile
gem "neighbor"

Then add a vector column to whichever table you want to make searchable. For the support ticket example:

class AddEmbeddingToTickets < ActiveRecord::Migration[8.0]
  def change
    add_column :tickets, :embedding, :vector, limit: 1536
  end
end

The limit: 1536 matches the dimensionality of OpenAI's text-embedding-3-small. If you use a different model, adjust accordingly (text-embedding-3-large produces 3072-dimension vectors).

Generating and Storing Embeddings

Wire up the model. With the neighbor gem, has_neighbors is available on any ActiveRecord model — no include needed:

# app/models/ticket.rb
class Ticket < ApplicationRecord
  has_neighbors :embedding

  after_create_commit :generate_embedding, if: :embeddable?

  def embeddable?
    subject.present? && body.present?
  end

  private

  def generate_embedding
    GenerateEmbeddingJob.perform_later(self)
  end
end
The job that calls OpenAI:
# app/jobs/generate_embedding_job.rb
class GenerateEmbeddingJob < ApplicationJob
  queue_as :embeddings

  def perform(record)
    text = [record.subject, record.body].compact.join("\n\n")
    vector = EmbeddingService.generate(text)
    record.update_columns(embedding: vector)
  end
end
And the service wrapper around the OpenAI client. OpenAI::Client here comes from the ruby-openai gem, so add that to your Gemfile too:

# app/services/embedding_service.rb
class EmbeddingService
  MODEL = "text-embedding-3-small"

  def self.generate(text)
    response = client.embeddings(
      parameters: {
        model: MODEL,
        input: text.truncate(8000) # rough character-based guard against the 8,191-token input limit
      }
    )
    response.dig("data", 0, "embedding")
  end

  def self.client
    @client ||= OpenAI::Client.new(access_token: Rails.application.credentials.openai_api_key)
  end
end
update_columns bypasses callbacks and validations intentionally — the embedding is a derived value, and you don't want a routine vector write firing update callbacks or touching updated_at.
Querying: Find Similar Records
With the neighbor gem’s has_neighbors, you get a nearest_neighbors scope for free:
# Find the 10 tickets most similar to a given ticket
Ticket.nearest_neighbors(:embedding, ticket.embedding, distance: "cosine").limit(10)
For a search box where you have a raw query string, generate an embedding for the query first:
# app/services/ticket_search.rb
class TicketSearch
  def self.call(query, limit: 10)
    return Ticket.none if query.blank?

    query_vector = EmbeddingService.generate(query)

    Ticket
      .where.not(embedding: nil)
      .nearest_neighbors(:embedding, query_vector, distance: "cosine")
      .limit(limit)
  end
end
In the controller:
# app/controllers/tickets_controller.rb
def index
  @tickets = if params[:q].present?
    TicketSearch.call(params[:q])
  else
    Ticket.order(created_at: :desc).limit(50)
  end
end
That’s the core. Ninety lines of code and your search understands meaning rather than matching keywords.
Adding an HNSW Index for Production
An exact nearest-neighbor search scans every row in the table. Fine for 10,000 records; painful at 500,000. pgvector supports two approximate nearest-neighbor index types: IVFFlat and HNSW. HNSW is better for most use cases — faster queries at the cost of slightly more index build time.
class AddHnswIndexToTicketsEmbedding < ActiveRecord::Migration[8.0]
  def up
    execute <<~SQL
      CREATE INDEX tickets_embedding_hnsw_idx
      ON tickets
      USING hnsw (embedding vector_cosine_ops)
      WITH (m = 16, ef_construction = 64)
    SQL
  end

  def down
    remove_index :tickets, name: :tickets_embedding_hnsw_idx
  end
end
The parameters m and ef_construction control the quality/speed tradeoff. For most production workloads, m = 16 and ef_construction = 64 are fine starting points. Raise ef_construction if recall quality matters more than index build time.
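Query-time recall has its own knob, separate from the build-time parameters: hnsw.ef_search, which defaults to 40. It is set per session:

```sql
-- Session-level recall knob for HNSW queries (default 40).
-- Higher values scan more of the graph: better recall, slower queries.
SET hnsw.ef_search = 100;
```

From Rails you'd issue this through the connection (e.g. ActiveRecord::Base.connection.execute) before the search query, within the same session.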
Run VACUUM ANALYZE tickets after building the index to update query planner statistics.
Backfilling Existing Records
You probably have records that existed before you added the embedding column. Don’t try to backfill them all in a single migration — the API calls will time out and you’ll hold locks. Use a background job with in_batches:
# lib/tasks/embeddings.rake
namespace :embeddings do
  desc "Backfill embeddings for tickets missing them"
  task backfill: :environment do
    Ticket.where(embedding: nil).in_batches(of: 100) do |batch|
      batch.each do |ticket|
        GenerateEmbeddingJob.perform_later(ticket)
      end
      sleep 0.5 # paces enqueueing; throttle the actual API calls via worker concurrency
    end
  end
end
Run this with bundle exec rails embeddings:backfill. For 80,000 tickets at text-embedding-3-small pricing and an average of ~200 tokens per ticket, you’re looking at roughly $0.32 in API costs total. Cheap.
A Minimal RAG Pipeline
Once you have semantic search working, you’re halfway to RAG (Retrieval-Augmented Generation) — the pattern where you pull relevant context from your database before sending a question to the LLM. Here’s what it looks like added to the ticket system:
# app/services/support_answer.rb
class SupportAnswer
  SYSTEM_PROMPT = <<~PROMPT
    You are a support assistant. Use the provided past ticket resolutions to suggest
    an answer. Be specific and practical. If the past tickets don't cover the question,
    say so rather than guessing.
  PROMPT

  def self.call(question)
    similar = TicketSearch.call(question, limit: 5)
    context = similar.map { |t| "Q: #{t.subject}\nA: #{t.resolution}" }.join("\n\n---\n\n")

    client.chat(
      parameters: {
        model: "gpt-4o",
        messages: [
          { role: "system", content: SYSTEM_PROMPT },
          { role: "user", content: "Past similar tickets:\n\n#{context}\n\nNew question: #{question}" }
        ],
        temperature: 0.3
      }
    )
  end

  def self.client
    @client ||= OpenAI::Client.new(access_token: Rails.application.credentials.openai_api_key)
  end
end
temperature: 0.3 keeps the answers grounded. You’re not looking for creativity — you want the model to synthesize past resolutions, not invent new ones.
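Note that the chat endpoint returns a nested response hash, not a string — the generated text sits a few keys deep. A plain-Ruby sketch, using a hash shaped like the API's response:

```ruby
# The answer text lives at choices[0].message.content in the
# chat response. The hash below mimics that shape for illustration.
response = {
  "choices" => [
    { "message" => { "role" => "assistant", "content" => "Try clearing the app cache first." } }
  ]
}

answer = response.dig("choices", 0, "message", "content")
```

In a controller you'd dig out this string before rendering, rather than handing the raw hash to the view.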
Production Gotchas
Nil embeddings will pollute your results. Records whose embedding column is still null have no distance to rank by and surface unpredictably in the sort. Scope them out: .where.not(embedding: nil).
Cosine vs. L2 distance. Cosine distance is the right choice for text — it ignores vector magnitude and focuses on direction (meaning). OpenAI's embeddings are normalized to unit length, so cosine and inner-product rankings coincide for them, but cosine stays correct even if you later mix in a model that doesn't normalize. L2 distance is more appropriate for images or numeric features. Stick with cosine for language models.
Keep the embedding model consistent. If you generate some embeddings with text-embedding-3-small and later switch to text-embedding-ada-002, comparisons between old and new vectors are meaningless. Pick a model and stick with it. If you do switch, full backfill required.
Async or bust. Never generate embeddings synchronously in a web request. The OpenAI API adds 100-400ms latency. Always use a background job with a dedicated queue. At high volume, use OpenAI’s batch embedding endpoint — it’s 50% cheaper and designed for bulk workloads.
Chunking for long documents. If you’re embedding documents longer than ~400 words, consider splitting them into overlapping chunks (e.g., 300-word chunks with 50-word overlap) and storing each chunk as a separate vector. Retrieve chunks, deduplicate by parent document, return documents. This is the standard chunking strategy for RAG.
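The sliding-window chunking described above is a few lines of Ruby. A sketch — the method name and parameters are illustrative, not from any library:

```ruby
# Word-based chunking with overlap: 300-word windows that advance
# (size - overlap) = 250 words at a time, so consecutive chunks
# share their boundary 50 words.
def chunk_words(text, size: 300, overlap: 50)
  words = text.split
  step = size - overlap
  chunks = []
  index = 0
  while index < words.length
    chunks << words[index, size].join(" ")
    break if index + size >= words.length
    index += step
  end
  chunks
end

# A 700-word document yields 3 chunks: words 0-299, 250-549, 500-699.
chunks = chunk_words(("word " * 700).strip)
```

Each chunk then gets its own row (with a parent_id back to the document) and its own embedding.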
What This Doesn’t Replace
Semantic search is not a replacement for full-text search — it’s a complement. Exact keyword matches, phrase searches, and faceted filtering still work better with pg_search or Postgres’s native tsvector. The right architecture is often a hybrid: run semantic search and keyword search in parallel, merge results with a scoring function, present the union to the user.
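One common scoring function for that merge step is Reciprocal Rank Fusion. A minimal sketch in plain Ruby, operating on ordered lists of record IDs (one from each search); the k = 60 damping constant is the conventional default, not something this article's code prescribes:

```ruby
# Reciprocal Rank Fusion: each result earns 1/(k + rank) per list it
# appears in, so items ranked well by BOTH searches float to the top.
def fuse(semantic_ids, keyword_ids, k: 60)
  scores = Hash.new(0.0)
  [semantic_ids, keyword_ids].each do |ids|
    ids.each_with_index { |id, rank| scores[id] += 1.0 / (k + rank + 1) }
  end
  scores.sort_by { |_, score| -score }.map(&:first)
end

# Ticket 42 ranks well in both lists, so it rises above 7, 13, and 99.
fuse([42, 7, 99], [13, 42, 7])
```

The appeal of RRF is that it needs only ranks, never the incomparable raw scores of cosine distance versus ts_rank.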
Eighteen years into Rails, I’m still surprised by how cleanly the ecosystem absorbs new ideas. pgvector slots into ActiveRecord like it was always meant to be there. The complexity is in the product thinking — what should you embed, how do you chunk it, what context does the LLM actually need — not in the Rails plumbing.
Frequently Asked Questions
Do I need to use OpenAI for embeddings?
No. Any model that produces fixed-size dense vectors works. Alternatives include Mistral embeddings, Cohere Embed, Google’s text-embedding-004, or a locally-hosted model via Ollama. The tradeoff is quality vs. cost vs. latency. OpenAI’s text-embedding-3-small is a sensible default.
How much does this cost at scale?
text-embedding-3-small is priced at $0.02 per million tokens. At an average of 200 tokens per ticket, 100,000 tickets cost about $0.40 to embed. Running the search (embedding the query) costs fractions of a cent per search. Cost is not the constraint — architecture is.
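The arithmetic behind that estimate, using the figures above:

```ruby
# 100,000 tickets at ~200 tokens each, priced at $0.02 per million tokens
tickets = 100_000
tokens_per_ticket = 200
price_per_million = 0.02

total_tokens = tickets * tokens_per_ticket      # 20 million tokens
cost = total_tokens / 1_000_000.0 * price_per_million
```

Longer tickets scale the figure linearly — even at 1,000 tokens apiece you're at a couple of dollars.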
Can I use this without the pgvector gem?
Technically, yes — you can store vectors as arrays and write raw SQL for cosine distance. In practice, the pgvector gem gives you has_neighbors and type casting for free. Use it.
Is HNSW better than IVFFlat?
For most use cases, yes. HNSW has higher recall at equivalent query speed and doesn’t require you to pre-define the number of clusters (nlist in IVFFlat). IVFFlat is useful if you need extremely fast build times and can tolerate a recall tradeoff. If you don’t have a reason to choose IVFFlat, use HNSW.
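For comparison, the IVFFlat equivalent of the HNSW index from earlier looks like this — the lists value is the cluster count you must pick up front (a common rule of thumb is rows / 1000 for tables up to about a million rows):

```sql
-- IVFFlat alternative: faster to build, recall depends on lists
CREATE INDEX tickets_embedding_ivfflat_idx
ON tickets
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

Unlike HNSW, IVFFlat should be built after the table already has data, since the clusters are derived from the existing rows.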
What happens if the OpenAI API is down?
Records created while the API is unavailable will have nil embeddings. Your background job should use Solid Queue or Sidekiq retries with exponential backoff. When the API recovers, the jobs will complete. Your search will gracefully skip nil-embedding records in the meantime.
Need to add semantic search or a RAG pipeline to your Rails application? TTB Software has been building AI-powered features on Rails for years. We know where the edges are. Get in touch.
About the Author
Roger Heykoop is a senior Ruby on Rails developer with 19+ years of Rails experience and 35+ years in software development. He specializes in Rails modernization, performance optimization, and AI-assisted development.