
Engineering

How Smuves made search 200 times faster with Typesense

Smuves • April 17, 2026 • 7 min read

Searching across 70,000 HubSpot records inside Smuves used to take 20 to 30 seconds, and sometimes the request timed out entirely. Today, the same search returns in under 40 milliseconds. That is well over 200 times faster, and it did not come from better indexes or a faster server. It came from accepting that Postgres, as much as we love it, is not the right tool for live search and sorting over large semi-structured datasets with dynamic keys.

This post is the story of that rebuild: what broke, what we tried before giving up on the old approach, why we ended up putting a search engine in front of Postgres even though every instinct told us not to add more infrastructure, and what it took to make the whole thing work in production.

What was actually broken

For most of the last year, Smuves ran entirely on Supabase. Postgres stored every HubSpot page, blog post, landing page, and HubDB row we fetched from our users' portals. It also handled every search, every filter, every sort. One database, one source of truth, no moving parts. For a product at our stage, that simplicity was a feature.

Then users started connecting real portals. Not test portals with fifty pages. Real agency portals with 30,000 pages, 70,000 blog posts, HubDB tables with a hundred thousand rows each. The kind of content volume that sounds abstract until someone opens the Smuves dashboard and clicks a filter and the page just sits there. Spinning.

Searches and filters across a large content type took anywhere from 20 to 30 seconds. Sorting by date on a 70,000-post collection was particularly painful. We had indexes. We had generated columns for the most common filter fields. We had done the Postgres homework. The queries were optimized. And still, the experience was unacceptable.

When we dug into why, the problem was not any one bad query. It was the nature of what users do inside a bulk editor. They do not run a single search and stop. They type a few characters, see results, adjust a filter, toggle sort order, scroll, refine, and repeat. Every one of those actions hits the database. Postgres, no matter how well indexed, has to read pages from disk whenever they are not already cached in memory, and disk reads have a floor on how fast they can go.

We had hit that floor.

What we tried before the rewrite

Before we even considered a search engine, we spent a couple of weeks tuning Postgres harder. We added more generated columns. We tried GIN indexes with jsonb_path_ops on the raw HubSpot payload. We partitioned one of the larger tables by portal ID. We set up read replicas so search queries would not compete with writes.

All of it helped. None of it was enough.

The issue was not Postgres being slow at any single thing. Postgres is excellent at what it does. The issue was that a bulk editor is essentially a live query interface over a huge dataset, and a live query interface wants data in memory, not on disk. We were asking Postgres to do a job it was not designed for, and no amount of index tuning was going to close the gap between disk reads and RAM reads.

That is the core insight we wish we had accepted two months earlier. If you are building something where users are constantly searching, filtering, and sorting over large data, you are going to need something that keeps the searchable part of that data in memory. Not eventually. From the start.

The EAV detour

Before moving to a search engine, we tried another approach that initially looked promising.

Instead of querying directly against the raw JSONB payloads, we introduced an EAV (Entity-Attribute-Value) model for searchable fields.

The idea was straightforward. We kept the full HubSpot payload stored inside Postgres as JSONB, but extracted the fields needed for filtering and sorting into a separate table. Instead of repeatedly searching deeply nested JSON structures, queries could operate against indexed rows containing key-value pairs.
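To make the extraction step concrete, here is a minimal Python sketch of flattening one record into EAV rows. The `field_types` map (field name to declared type) and the row layout are hypothetical, not Smuves' actual code. The sketch keeps only string, number, and date-time fields, the same restriction described below, and leaves everything else in the JSONB payload.

```python
from datetime import datetime
from typing import Any

# Only these declared types are copied into the EAV table; everything else stays in JSONB.
SEARCHABLE_TYPES = {"string", "number", "datetime"}


def extract_eav_rows(
    entity_id: str, payload: dict[str, Any], field_types: dict[str, str]
) -> list[tuple[str, str, str, Any]]:
    """Flatten one record into (entity_id, attribute, value_type, value) rows.

    `field_types` is a hypothetical map of field name -> declared type.
    """
    rows = []
    for attribute, value in payload.items():
        value_type = field_types.get(attribute)
        if value_type not in SEARCHABLE_TYPES or value is None:
            continue  # nested objects, unknown fields, and nulls are skipped
        if value_type == "datetime":
            # Parse ISO 8601 strings so the value can go into a timestamp column.
            value = datetime.fromisoformat(value.replace("Z", "+00:00"))
        rows.append((entity_id, attribute, value_type, value))
    return rows


payload = {
    "name": "Pricing update",
    "word_count": 812,
    "publish_date": "2025-03-01T09:00:00Z",
    "widgets": {"hero": {"title": "Hi"}},
}
field_types = {"name": "string", "word_count": "number", "publish_date": "datetime", "widgets": "object"}
rows = extract_eav_rows("post-1", payload, field_types)
```

In an actual EAV table, each value type typically gets its own typed column (text, numeric, timestamptz) so that comparisons and sorts use the right semantics and can be indexed.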

At first, this worked surprisingly well.

Searches became faster immediately because Postgres no longer had to traverse massive JSON payloads for every filter and sort operation. We also reduced the amount of indexed data aggressively. Only fields with string, number, or date-time types were extracted into the EAV table. Everything else stayed inside the raw payload.

That optimization mattered because the write amplification was already becoming significant.

For example, if a content type had 45 fields, we might still end up extracting around 30 searchable fields per record. Fetching 500 HubSpot records in one invocation could easily translate into roughly 15,000 EAV inserts on top of the original writes.
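The arithmetic behind that figure is worth spelling out, because the multiplier matters more than the batch size. A small sketch, assuming one JSONB upsert per record plus one EAV row per extracted field, and ignoring index maintenance, which adds further work per row:

```python
def eav_write_load(records: int, searchable_fields: int) -> dict[str, float]:
    """Estimate writes per ingestion batch when each record fans out into EAV rows."""
    base_writes = records                      # one JSONB upsert per record
    eav_writes = records * searchable_fields   # one EAV row per extracted field
    total = base_writes + eav_writes
    return {
        "eav_writes": eav_writes,
        "total_writes": total,
        "amplification": total / base_writes,  # writes each record now costs
    }


load = eav_write_load(records=500, searchable_fields=30)
# {'eav_writes': 15000, 'total_writes': 15500, 'amplification': 31.0}
```

Under these assumptions, every original write turns into 31, and each of those EAV rows also has to update the EAV table's indexes.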

And that was where the architecture started fighting us.

Our ingestion pipeline was already handling:

HubSpot API pagination
payload normalization
JSONB inserts
batching
job tracking
retries

Adding thousands of secondary EAV writes into the same pipeline created constant pressure on execution limits and database throughput.

We tried several variations:

Postgres triggers
async population using pg_net
delayed EAV processing
background workers
queue-based approaches

All of them improved pieces of the problem, but none solved the underlying issue.

The deeper problem was that we were slowly building a search engine inside Postgres instead of using a system designed for search workloads from the start.

One uncomfortable realization during this phase was how easy it is for modern AI tooling to reinforce the wrong architectural direction.

The EAV approach looked reasonable on paper, and we repeatedly validated it through LLMs, blog posts, and engineering discussions online. The issue was not that the advice was completely wrong. The issue was that most of it focused on read optimization in isolation, without fully accounting for the operational cost of maintaining the write path at scale.

That became an important lesson for us. AI tools are extremely useful for local optimizations, but they still struggle to reason about long-term system complexity and operational tradeoffs.

Why Typesense

The shortlist was Typesense, Meilisearch, and Elasticsearch. All three are designed specifically for high-performance search workloads and heavily optimize indexed data structures for fast in-memory access, which is why they can return filtered and sorted results dramatically faster than general-purpose relational queries at scale.
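To give some intuition for why engines built for this workload answer filtered, sorted queries quickly, here is a deliberately tiny Python sketch: an in-memory inverted index (token to document IDs) plus a list of IDs pre-sorted by one field. A query becomes a set intersection and a walk over an already-ordered list, with no sort at read time. It is a toy for intuition only. It is not how Typesense, Meilisearch, or Elasticsearch are implemented, and it omits prefix matching, typo tolerance, and ranking.

```python
from collections import defaultdict


class ToyIndex:
    """Inverted index over titles plus IDs pre-sorted by updated_at (newest first)."""

    def __init__(self, docs: list[dict]):
        self.docs = {d["id"]: d for d in docs}
        self.postings: dict[str, set[str]] = defaultdict(set)
        for d in docs:
            for token in d["title"].lower().split():
                self.postings[token].add(d["id"])
        # Sort once at index time so queries never have to sort.
        self.by_updated = sorted(self.docs, key=lambda i: self.docs[i]["updated_at"], reverse=True)

    def search(self, query: str, limit: int = 10) -> list[str]:
        tokens = query.lower().split()
        if not tokens:
            return self.by_updated[:limit]
        # Every query token must match: intersect the posting sets.
        matches = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        # Walking the pre-sorted list yields matches already in sort order.
        return [i for i in self.by_updated if i in matches][:limit]


index = ToyIndex([
    {"id": "1", "title": "Pricing page", "updated_at": 3},
    {"id": "2", "title": "Pricing FAQ", "updated_at": 5},
    {"id": "3", "title": "About us", "updated_at": 4},
])
```

Here `index.search("pricing")` returns `["2", "1"]`: both pricing documents, newest first, without any sorting at query time.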

Elasticsearch is the most powerful of the three and the one most teams reach for out of habit. We ruled it out because it is operationally heavy. Running a production Elasticsearch cluster is a full-time job for someone.

Meilisearch and Typesense are both lighter. Single binary, easy to run, reasonable defaults, designed for small teams. We went back and forth for a while. In the end Typesense won for three specific reasons. First, its query syntax maps cleanly to what we already had in Postgres, which meant our frontend code barely had to change. Second, Typesense Cloud is a real, managed offering that we trusted, whereas Meilisearch Cloud felt newer and less battle-tested. Third, Typesense has better support for faceted search, which we knew we would want later for multi-select filters on content status, author, and language.
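For readers who have not used faceting: a facet is a per-value count over the current result set, and multi-select filters usually combine values with OR inside one facet and AND across facets. A minimal in-memory Python sketch of those semantics follows, using hypothetical `state`, `author`, and `language` fields. In Typesense the counting happens server-side (requested through the `facet_by` search parameter), so this only shows what the engine computes for you.

```python
from collections import Counter


def matches(doc: dict, selected: dict[str, set[str]]) -> bool:
    # OR within one facet (any selected value), AND across facets (every facet must match).
    return all(doc.get(field) in values for field, values in selected.items() if values)


def facet_counts(
    docs: list[dict], facet_fields: list[str], selected: dict[str, set[str]]
) -> dict[str, Counter]:
    hits = [d for d in docs if matches(d, selected)]
    # Count each facet value among the documents that pass the filters.
    return {f: Counter(d[f] for d in hits if f in d) for f in facet_fields}


docs = [
    {"state": "published", "author": "ana", "language": "en"},
    {"state": "draft", "author": "ben", "language": "en"},
    {"state": "published", "author": "ana", "language": "de"},
]
counts = facet_counts(
    docs, ["state", "author", "language"], {"state": {"published", "draft"}, "language": {"en"}}
)
```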

How the data actually flows now

The architecture is simpler than it sounds.

Postgres is still the source of truth. Every HubSpot record we fetch lands in Postgres first, with the full raw payload stored as JSONB and a few high-use fields pulled into real columns. Nothing about that changed.

What changed is what happens next. After a record is inserted or updated in Postgres, we sync the document into Typesense. We push the full document, and the Typesense schema defines which fields are searchable, sortable, or faceted. That lets us avoid maintaining a separate projection layer while keeping search fast.
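A minimal sketch of that setup in Python, assuming a hypothetical `blog_posts` collection and field names; the real HubSpot payload and Smuves schema will differ. The schema dict follows Typesense's collection-schema format, in which fields not listed in the schema are stored with the document but not indexed. Because Typesense has no native date type, the sketch converts timestamps to Unix seconds in an `int64` field so they can be sorted.

```python
from datetime import datetime
from typing import Any

# Hypothetical collection schema: only these fields are indexed.
BLOG_POSTS_SCHEMA: dict[str, Any] = {
    "name": "blog_posts",
    "fields": [
        {"name": "portal_id", "type": "string", "facet": True},
        {"name": "name", "type": "string"},                     # full-text searchable
        {"name": "state", "type": "string", "facet": True},     # multi-select filter
        {"name": "author_name", "type": "string", "facet": True, "optional": True},
        {"name": "language", "type": "string", "facet": True, "optional": True},
        {"name": "updated_at", "type": "int64", "sort": True},  # epoch seconds
    ],
    "default_sorting_field": "updated_at",
}


def to_document(record: dict[str, Any]) -> dict[str, Any]:
    """Turn a stored record into a Typesense document without dropping fields."""
    doc = dict(record)               # push everything; unlisted fields are stored, not indexed
    doc["id"] = str(record["id"])    # Typesense document IDs are strings
    updated = datetime.fromisoformat(record["updated_at"].replace("Z", "+00:00"))
    doc["updated_at"] = int(updated.timestamp())
    return doc


doc = to_document({
    "id": 123,
    "portal_id": "p-42",
    "name": "Pricing update",
    "state": "PUBLISHED",
    "updated_at": "2026-01-01T00:00:00Z",
    "widgets": {"hero": {"title": "Hi"}},
})
```

With the official `typesense` Python client, the schema would be created with `client.collections.create(BLOG_POSTS_SCHEMA)` and each document written with `client.collections["blog_posts"].documents.upsert(doc)`; check the client documentation for the version you use.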

When a user types in the search bar or clicks a filter in Smuves, the frontend no longer calls Postgres. It calls Typesense. Typesense returns matching record IDs in single-digit milliseconds. The frontend then either renders straight from the Typesense result (for example, in the content list view) or fetches the full record from Postgres by ID (for example, in the edit view, which needs the complete payload).

That split matters. Typesense is fast because it is optimized specifically for search and filtering workloads. Postgres stays lean because it is no longer the thing being hammered by live queries. Each system does the job it is good at.
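The read path described above can be sketched in Python as follows, with hypothetical field names. The parameter names (`q`, `query_by`, `filter_by`, `sort_by`, `page`, `per_page`) and the `hits[].document` response shape follow Typesense's search API; the Postgres lookup itself is left abstract. One detail worth handling explicitly: a query such as `WHERE id = ANY(...)` does not preserve the order of the IDs, so the rows are re-sorted to match the search ranking.

```python
from typing import Any


def build_search_params(
    text: str,
    selected: dict[str, set[str]],
    sort_field: str = "updated_at",
    descending: bool = True,
    page: int = 1,
    per_page: int = 50,
) -> dict[str, Any]:
    params: dict[str, Any] = {
        "q": text or "*",        # "*" matches every document
        "query_by": "name",      # hypothetical searchable field(s)
        "sort_by": f"{sort_field}:{'desc' if descending else 'asc'}",
        "page": page,
        "per_page": per_page,
    }
    # Assumes filter values contain no commas or backticks (those need escaping in filter_by).
    clauses = [
        f"{field}:=[{','.join(sorted(values))}]"
        for field, values in sorted(selected.items())
        if values
    ]
    if clauses:
        params["filter_by"] = " && ".join(clauses)
    return params


def ids_from_response(response: dict[str, Any]) -> list[str]:
    return [hit["document"]["id"] for hit in response["hits"]]


def in_search_order(ids: list[str], rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Postgres returns rows in arbitrary order; restore the search ranking.
    by_id = {str(row["id"]): row for row in rows}
    return [by_id[i] for i in ids if i in by_id]


params = build_search_params("pricing", {"state": {"published", "draft"}, "language": {"en"}})
```

IDs that Postgres no longer has (for example, a record deleted after it was indexed) are dropped by `in_search_order` rather than raising an error.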

The part that was harder than expected

Syncing data between Postgres and Typesense looks easy on paper. Write to Postgres, then write to Typesense. Done.

In practice, it is never that clean. What happens when the Postgres write succeeds and the Typesense write fails? Now the index is out of sync with the source of truth. What happens when a user updates a record twice in a second? You can get writes to Typesense out of order if you are not careful. What happens when you need to re-sync an entire content type because the schema changed?

Deletes turned out to be particularly important. A deleted record that still shows up in search results damages trust immediately, because users conclude the data itself cannot be relied on.
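One common way to handle both out-of-order writes and deletes is to give every change a monotonically increasing version and have the sync worker ignore anything older than what it has already applied. Keeping the last version for deleted IDs acts as a tombstone, so a late retry cannot resurrect a deleted record. A minimal in-memory Python sketch of that guard follows. The `version` field is a hypothetical addition (for example, a sequence bumped on every Postgres write); this post does not say Smuves uses this exact scheme, and in production the applied versions would need to live somewhere durable.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SyncEvent:
    record_id: str
    version: int                 # hypothetical: increases on every write to the row
    op: str                      # "upsert" or "delete"
    document: Optional[dict[str, Any]] = None


@dataclass
class GuardedIndex:
    docs: dict[str, dict[str, Any]] = field(default_factory=dict)
    # Last applied version per ID; entries survive deletes and act as tombstones.
    applied: dict[str, int] = field(default_factory=dict)

    def apply(self, event: SyncEvent) -> bool:
        """Apply the event unless it is not newer than what was already applied."""
        if event.version <= self.applied.get(event.record_id, -1):
            return False         # stale or duplicate delivery: skip it
        self.applied[event.record_id] = event.version
        if event.op == "delete":
            self.docs.pop(event.record_id, None)
        else:
            self.docs[event.record_id] = event.document or {}
        return True


guarded = GuardedIndex()
guarded.apply(SyncEvent("a", 2, "upsert", {"title": "second edit"}))
guarded.apply(SyncEvent("a", 1, "upsert", {"title": "first edit"}))  # arrives late, ignored
```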

We ended up building a small sync layer on top of pgmq, the Postgres message queue extension that Supabase Queues is built on. Every write to Postgres enqueues a sync job. A background worker dequeues it, pushes the change to Typesense, and retries on failure with exponential backoff. If a job fails enough times, it goes to a dead letter queue and we get alerted. Re-syncing a full content type is a matter of enqueuing every record ID in that content type and letting the workers grind through.

None of this is novel. It is a textbook event-driven sync pattern. But it took a week to get right, and the edge cases kept surfacing for another two weeks after that. If you are planning a similar migration, budget for this. The search engine itself is the easy part. Keeping it in sync with your source of truth is where the real work lives.
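The retry behavior described above can be sketched as a pure decision function. This minimal Python version assumes pgmq-style semantics, in which each message carries a `read_ct` (how many times it has been read) and becomes visible again after its visibility timeout if the worker does not delete it. The thresholds are hypothetical. In pgmq terms, the three outcomes would map roughly to `pgmq.delete`, `pgmq.set_vt` with the backoff delay, and sending the payload to a separate dead-letter queue before archiving the original; the sketch itself does not call pgmq.

```python
from typing import Callable, Iterator

MAX_ATTEMPTS = 5      # hypothetical: reads before a job is dead-lettered
BASE_DELAY_S = 2      # hypothetical first retry delay
MAX_DELAY_S = 300     # cap so delays stop growing after a few failures


def backoff_seconds(attempt: int) -> int:
    """Exponential backoff: 2, 4, 8, ... seconds, capped at MAX_DELAY_S."""
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))


def handle(read_ct: int, push: Callable[[], object]) -> tuple[str, int]:
    """Try one sync job; return (action, delay_seconds) for the queue."""
    try:
        push()                                   # e.g. upsert or delete in Typesense
        return ("delete", 0)                     # success: remove the message
    except Exception:
        if read_ct >= MAX_ATTEMPTS:
            return ("dead_letter", 0)            # move to the DLQ and alert
        return ("retry", backoff_seconds(read_ct))  # hide the message until the delay passes


def resync_batches(record_ids: list[str], batch_size: int = 500) -> Iterator[list[str]]:
    """Split a full content type into batches of IDs to enqueue."""
    for start in range(0, len(record_ids), batch_size):
        yield record_ids[start:start + batch_size]
```

Adding random jitter to the delay is a common refinement so that many failed jobs do not all retry at the same instant. For re-syncs, each batch could be enqueued in one call with pgmq's `send_batch` function.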

What we would do differently

We would question the architecture earlier.

The EAV system was not irrational. In fact, it solved the exact problem we initially had. That is what made it dangerous. Reads became faster, queries became cleaner, and the system appeared to scale better for a while.

But every optimization quietly increased operational complexity somewhere else:

write amplification
sync reliability
deletion handling
reprocessing
execution limits
queue management

Eventually we realized we were spending more time building infrastructure around search than building the product itself.

That was the real signal.

The biggest lesson from this migration was not “use Typesense.” It was learning to recognize when a system is being stretched beyond the kind of workload it was designed for.

Once we accepted that search needed its own home, the architecture actually became simpler.