Rails insert_all and upsert_all: Bulk Database Operations That Skip the ORM Overhead
Rails insert_all and upsert_all for bulk database operations: skip callbacks, handle conflicts, returning IDs, and benchmark results from a 100K row import.
The nightly sync job had been running for two and a half years. Every evening at 11pm, it pulled product catalog updates from a third-party feed and saved them to the database. Forty thousand records. Six hours.
The client had accepted it as fact of life — “that’s just how long the sync takes.” When I looked at the code, it was exactly what you would expect from a six-year-old Rails codebase: product_data.each { |attrs| Product.find_or_create_by!(external_id: attrs[:external_id]).update!(attrs) }. One query to find, one to update. Eighty thousand round trips to Postgres for forty thousand products, with ActiveRecord instantiating a Ruby object for each one and running the full callback chain.
I replaced the core of it with upsert_all and a thoughtful conflict strategy. The sync now runs in four minutes.
insert_all and upsert_all shipped in Rails 6.0 and they are still underused five-plus years later. I see senior Rails developers writing row-by-row inserts out of habit, or reaching for third-party gems that were necessary before Rails built this in. Here is everything you need to know to use them correctly.
What Rails insert_all and upsert_all Actually Do
insert_all and upsert_all generate a single SQL statement that inserts multiple rows in one round trip to the database. They bypass ActiveRecord callbacks, validations, and associations entirely. They do not instantiate Ruby objects for the rows being inserted. They are as close to raw SQL as you can get from the Rails ORM while still having the ORM write the query for you.
# Without insert_all — N round trips, N Ruby objects, full callback chain
records.each do |attrs|
User.create!(attrs)
end
# With insert_all — 1 round trip, 0 Ruby objects for inserted rows
User.insert_all(records)
The generated SQL looks like this:
INSERT INTO "users" ("name", "email", "created_at", "updated_at")
VALUES
('Alice', 'alice@example.com', '2026-06-16 10:00:00', '2026-06-16 10:00:00'),
('Bob', 'bob@example.com', '2026-06-16 10:00:00', '2026-06-16 10:00:00'),
...
ON CONFLICT DO NOTHING
RETURNING "id"
upsert_all is insert_all with ON CONFLICT DO UPDATE instead of ON CONFLICT DO NOTHING — conflicting rows are updated rather than skipped.
Both methods accept an array of hashes and return an ActiveRecord::Result object you can use to retrieve the inserted or updated IDs.
Basic Usage
# Bulk insert — skip on conflict (default)
User.insert_all([
{ name: "Alice", email: "alice@example.com" },
{ name: "Bob", email: "bob@example.com" }
])
# Bulk insert — raise on conflict
User.insert_all!([
{ name: "Alice", email: "alice@example.com" },
{ name: "Bob", email: "bob@example.com" }
])
# Upsert — update on conflict
User.upsert_all([
{ name: "Alice", email: "alice@example.com", updated_at: Time.current },
{ name: "Bob", email: "bob@example.com", updated_at: Time.current }
])
The bang variants (insert_all!, upsert_all!) raise on any conflict. The non-bang variants use ON CONFLICT DO NOTHING for insert_all and ON CONFLICT DO UPDATE for upsert_all.
One thing that catches people: Rails does not automatically set created_at and updated_at when using these methods. Because there is no ActiveRecord object lifecycle, the timestamp columns are not populated for you. Include them explicitly:
now = Time.current
records = product_data.map do |attrs|
attrs.merge(created_at: now, updated_at: now)
end
Product.insert_all(records)
If you omit timestamps and your table has a NOT NULL constraint without a database default — which is the Rails default — the insert will fail. Capture Time.current once before the batch so every row in the slice shares the same timestamp.
Benchmarks: insert_all vs Row-by-Row
The performance difference is real and consistent. Here is a benchmark inserting 10,000 rows with a realistic payload (five string attributes, two timestamps):
require "benchmark"
data = 10_000.times.map do |i|
{
name: "User #{i}",
email: "user#{i}@example.com",
created_at: Time.current,
updated_at: Time.current
}
end
Benchmark.bm(20) do |x|
x.report("create! row-by-row:") do
data.each { |attrs| User.create!(attrs) }
end
x.report("insert_all:") do
User.insert_all(data)
end
end
Typical results against a local Postgres:
user system total real
create! row-by-row: 8.412000 1.334000 9.746000 (23.841000)
insert_all: 0.089000 0.012000 0.101000 (0.347000)
68x faster in wall time. The gap widens under network latency — on a production app where Rails and Postgres run on separate machines, each round trip adds 0.5–5ms. Ten thousand round trips at 1ms each is ten seconds of pure network wait. insert_all costs one round trip regardless of record count.
The memory difference is equally significant. Row-by-row create! allocates one Ruby object per record plus all the ActiveRecord overhead. insert_all allocates one ActiveRecord::Result for the entire batch. For 100,000 records the difference can be hundreds of megabytes of heap pressure and multiple GC cycles mid-import — exactly the kind of thing that GC tuning can soften but not eliminate.
Handling Conflicts with upsert_all
upsert_all with ON CONFLICT DO UPDATE is where the real power lives. You control exactly which columns trigger a conflict check and which columns get updated on conflict.
The unique_by: option specifies which unique constraint to use for conflict detection:
Product.upsert_all(
products_data,
unique_by: :external_id # use the unique index on external_id
)
The on_duplicate: option specifies which columns to update when a conflict is detected:
Product.upsert_all(
products_data,
unique_by: :external_id,
on_duplicate: Arel.sql("name = excluded.name, price_cents = excluded.price_cents, updated_at = excluded.updated_at")
)
excluded is the Postgres keyword for the row that was proposed for insertion — the one that conflicted. excluded.name means “the name from the row we tried to insert.” This lets you update specific columns while leaving others (like created_at) untouched:
Product.upsert_all(
products_data.map { |p| p.merge(updated_at: Time.current) },
unique_by: :external_id,
on_duplicate: Arel.sql(<<~SQL)
name = excluded.name,
price_cents = excluded.price_cents,
stock_count = excluded.stock_count,
updated_at = excluded.updated_at
SQL
)
This is exactly the pattern from the catalog sync in the opening. The external_id unique index provides the conflict key; the on_duplicate clause updates the product data while preserving the original created_at.
Returning Inserted IDs
Sometimes you need the IDs of inserted or upserted rows — to create associations, queue follow-up jobs, or confirm what was actually processed. The returning: option asks Postgres to hand them back:
result = User.insert_all(
user_data,
returning: [:id, :email]
)
result.rows # => [[1, "alice@example.com"], [2, "bob@example.com"]]
result.to_a # => [{"id" => 1, "email" => "alice@example.com"}, ...]
To extract just the IDs:
inserted_ids = User.insert_all(user_data, returning: :id).rows.flatten
This is far cleaner than querying for IDs after insertion. With upsert_all, the returned rows include both newly inserted and updated records.
Queueing follow-up jobs after a bulk import:
inserted_ids = Product.upsert_all(
products_data,
unique_by: :external_id,
returning: :id
).rows.flatten
IndexProductJob.perform_later(inserted_ids) if inserted_ids.any?
The Callbacks Trade-off
insert_all and upsert_all skip all ActiveRecord callbacks. before_create, after_save, after_commit — none of them fire. This is a feature, not a bug, and it is the primary source of the performance gain. But it means you need to understand what your callbacks do and whether you need that work to happen.
Common callbacks that need manual attention:
Search index updates — if Elasticsearch or Meilisearch callbacks push documents on save, they will not fire. Trigger a re-index explicitly using the returned IDs after the bulk operation.
Cache invalidation — cache-sweeping callbacks will not fire. Expire caches explicitly or accept stale data until TTL expiry.
Webhooks and notifications — after_commit hooks that fire webhooks will not run. Trigger them explicitly for the upserted IDs if they are business-critical.
Slug generation — gems like FriendlyId set slugs in callbacks. Either compute the slug before calling insert_all and include it in the payload, or run a follow-up update for rows missing slugs.
The pattern I use is “insert first, handle side effects explicitly”:
ActiveRecord::Base.transaction do
result = Product.upsert_all(products_data, unique_by: :external_id, returning: [:id])
upserted = result.rows.map(&:first)
Product.search_index.reindex(upserted)
Cache.delete_many(upserted.map { |id| "product/#{id}" })
end
Explicit is better than implicit. Callbacks are implicit; this is explicit.
Validations Don’t Run — Plan Accordingly
No model validations run during insert_all or upsert_all. Your validates :email, uniqueness: true at the Rails level has no effect. The data goes straight to the database, constrained only by what the schema enforces.
What does protect you: database-level constraints. Unique indexes, NOT NULL constraints, check constraints, foreign key constraints — all enforce integrity at the database level and will raise ActiveRecord::RecordNotUnique or ActiveRecord::InvalidForeignKey on violation.
This is actually the right level for integrity guarantees. Model validations are convenience; database constraints are guarantees. If you have been relying only on Rails validations without corresponding database constraints, bulk operations will expose that gap. Fix the gap at the schema level, not by avoiding insert_all.
Validate your input data yourself before calling insert_all:
valid_records, invalid_records = records.partition do |attrs|
attrs[:email].present? && attrs[:external_id].present?
end
Rails.logger.warn("Skipping #{invalid_records.size} invalid records") if invalid_records.any?
Product.upsert_all(valid_records) if valid_records.any?
Batching Large Imports
Postgres handles large INSERT statements well, but extremely large ones — hundreds of thousands of rows — can cause issues: long lock hold times, memory pressure on the server, and statement timeout violations. Batch your imports:
BATCH_SIZE = 1_000
now = Time.current
records.each_slice(BATCH_SIZE) do |batch|
Product.insert_all(batch.map { |r| r.merge(created_at: now, updated_at: now) })
end
1,000 rows per batch is a solid default. Go up to 5,000–10,000 for simple schemas with small rows; go down if you are seeing timeout errors or your rows have large text columns.
For large imports where you want progress visibility and clean failure recovery:
class BulkProductImporter
BATCH_SIZE = 1_000
def initialize(records)
@records = records
@now = Time.current
end
def call
total = @records.size
imported = 0
@records.each_slice(BATCH_SIZE) do |batch|
Product.upsert_all(
batch.map { |r| r.merge(created_at: @now, updated_at: @now) },
unique_by: :external_id,
on_duplicate: Arel.sql("name = excluded.name, price_cents = excluded.price_cents, updated_at = excluded.updated_at")
)
imported += batch.size
Rails.logger.info("Imported #{imported}/#{total} products")
end
end
end
If a batch fails partway through, you restart from the last checkpoint. Keep a cursor in your import state if your data source supports it.
Combining with Background Jobs
The cleanest production pattern pairs Solid Queue recurring jobs with upsert_all for the actual import work:
class SyncProductCatalogJob < ApplicationJob
queue_as :default
def perform
feed = ProductFeedClient.fetch_all
now = Time.current
records = feed.map do |item|
{
external_id: item.id,
name: item.name,
price_cents: (item.price * 100).to_i,
created_at: now,
updated_at: now
}
end
records.each_slice(1_000) do |batch|
Product.upsert_all(
batch,
unique_by: :external_id,
on_duplicate: Arel.sql("name = excluded.name, price_cents = excluded.price_cents, updated_at = excluded.updated_at")
)
end
Rails.logger.info("Synced #{records.size} products from feed")
end
end
No callbacks. No Ruby objects. No six-hour windows.
When to Use insert_all — and When Not To
Use insert_all / upsert_all when:
- Importing from an external data source (CSV, API feed, webhook payload)
- Syncing large datasets on a schedule
- Seeding the database (seeds rarely need callbacks)
- Creating many records in a background job where side effects are handled separately
- Migrating data between tables within the same database
Do not use them when:
- You need validations for user-submitted data. Use
createwith proper error handling. - Your callbacks have meaningful side effects that are part of the feature (a welcome email on sign-up, a ledger entry on payment). The callback exists for a reason.
- You are creating records with complex associations that must be set up synchronously. Chain multiple
insert_allcalls if needed, but the orchestration gets complex quickly — sometimes row-by-row is the right call. - Downstream code expects ActiveRecord model instances.
insert_allreturns anActiveRecord::Result, not model objects. Load them after insertion using the returned IDs if you need the full AR interface.
After nineteen years of Rails the rule is simple: more than a few dozen rows and you do not need callbacks, insert_all is your default. Row-by-row inserts are for single records and small batches where the ORM lifecycle adds value.
FAQ
Does insert_all work with PostgreSQL, MySQL, and SQLite?
insert_all uses database-specific SQL but the Rails API is database-agnostic — Rails handles the adapter difference. PostgreSQL uses ON CONFLICT DO NOTHING/DO UPDATE. MySQL uses INSERT IGNORE/ON DUPLICATE KEY UPDATE. SQLite 3.24+ supports ON CONFLICT. On PostgreSQL, all features including unique_by and returning: are fully supported. On MySQL, the returning: option is not available. If you are on PostgreSQL — which you should be for any production Rails app — you get the full feature set.
Why don’t timestamps get set automatically in insert_all?
Because there is no ActiveRecord object and no callbacks. Timestamps are set by the ActiveRecord::Timestamp module, which runs during the normal save lifecycle. insert_all bypasses that lifecycle entirely. Capture Time.current once before your loop and merge it into every row — one timestamp for the entire batch is fine and gives you a consistent marker for all rows inserted in the same operation.
Can I use insert_all with STI (Single Table Inheritance)?
Yes. Include the type column in your data:
Document.insert_all([
{ type: "Invoice", number: "INV-001", amount_cents: 10_000, created_at: now, updated_at: now },
{ type: "Receipt", number: "REC-001", amount_cents: 5_000, created_at: now, updated_at: now }
])
ActiveRecord’s STI uses the type column to distinguish subclasses, and insert_all will set it to whatever you provide. No magic, no inference — you specify the type explicitly.
How do I handle partial failures in a large insert_all batch?
insert_all (non-bang) with ON CONFLICT DO NOTHING silently skips conflicting rows. insert_all! raises on any conflict. For controlled conflict handling, use upsert_all with unique_by and on_duplicate to specify exactly what happens on conflict. For database errors beyond conflicts (constraint violations, invalid data), wrap each batch in a rescue block with enough context to retry:
records.each_slice(1_000).with_index do |batch, i|
Product.insert_all(batch)
rescue ActiveRecord::StatementInvalid => e
Rails.logger.error("Batch #{i} failed: #{e.message}. First record: #{batch.first.inspect}")
end
Log the batch index and a sample record so you can identify which data caused the failure and rerun that slice after fixing it.
Need to untangle a slow data sync or design a bulk import pipeline that holds up in production? TTB Software has spent nineteen years building and fixing Rails data infrastructure. We can get your imports from hours to minutes.
Related Articles
Rails State Machine: AASM Patterns for Orders, Subscriptions, and Workflows in Production
Rails state machine with AASM: production patterns for orders, subscriptions, and workflows. Guards, callbacks, optim...
Rails PgBouncer: Transaction Pooling, Prepared Statements, and Connection Sizing in Production
Rails PgBouncer transaction pooling done right: prepared statements, connection sizing, advisory locks, LISTEN/NOTIFY...
Rack Mini Profiler: Performance Profiling for Rails in Development and Production
Rack Mini Profiler for Rails: profile SQL queries, partials, memory, and GC in development and production. Find N+1s,...