Realtime Sync
SemiLayer gives you four distinct paths for keeping the vector index fresh. They differ in how much infrastructure you need to set up, how quickly changes land, and whether deletes propagate automatically.
The four paths at a glance
| Path | Bridge scan | Wipes first | Hash dedup | Detects deletes | Typical use |
|---|---|---|---|---|---|
| Smart sync | All rows | No | Yes | Yes — tombstone | Default refresh. Catches edits and deletes, only re-embeds rows whose content changed. Console button or semilayer sync only — no webhook path. |
| Incremental | WHERE updated_at > $cursor | No | Yes | No | Cheap polling. Fires automatically from syncInterval in config. Misses deletes and rows where the tracking column didn't move. |
| Records | Specific IDs from the change buffer | No | Yes | Yes — explicit action: 'delete' | CDC push. For sources with native change-data-capture. Sub-second latency, zero polling overhead. |
| Rebuild | All rows | Yes — destructive | n/a | n/a | Clean slate. Use after a mapping config change, partition corruption, or when certainty matters more than speed. |
The mental model: Smart sync is the default. Use Rebuild only when you need a clean slate. Use Incremental + Records to keep things fresh automatically between manual refreshes — Incremental via syncInterval for the cheap path, Records via the webhook for the precise path.
The easy path: config and Console
You do not need to wire up any infrastructure to keep most lenses reasonably fresh. Two zero-code options handle the common case.
Option A — syncInterval (Incremental, automatic)
Add one line to your lens config and SemiLayer handles the rest. Every N minutes the
worker fires an incremental scan: it reads rows where changeTrackingColumn > last_cursor,
re-embeds anything new or changed, and advances the cursor.
Under the hood, each lens's syncInterval is evaluated every minute by a global tick
handler. When a lens is due, the worker reads all rows changed since the last cursor and
re-embeds only those rows. No separate queue setup, no infrastructure to manage.
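A config sketch makes this concrete. Only syncInterval and changeTrackingColumn are options named by this guide; the surrounding sl.config.ts shape (a lenses map keyed by lens name) is an assumption for illustration:

```typescript
// Hypothetical sl.config.ts fragment. The lenses-map structure is assumed;
// only syncInterval and changeTrackingColumn are options documented here.
const config = {
  lenses: {
    products: {
      changeTrackingColumn: 'updated_at', // the default tracking column
      syncInterval: '15m',                // incremental scan every 15 minutes
    },
  },
};
```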
Tradeoffs:
| Setup | One config line |
| Latency | Up to the interval (e.g. 15 minutes for '15m') |
| Deletes | Not detected. Deleted rows don't appear in the scan — they linger in the index until a Smart sync or Rebuild. |
| Cost | Low — only reads changed rows |
When changeTrackingColumn doesn't exist: Incremental reads all rows on every tick
and skips nothing, which is effectively a full-table scan on every interval. If your
table has no reliable timestamp column, use Smart sync instead.
Option B — Smart sync on demand (Console or CLI)
Smart sync scans every row in the source, computes a content hash, and re-embeds only rows whose hash changed since the last sync. Rows that were deleted from the source are detected by their absence (tombstone) and removed from the index.
From the Console: Open a Lens → click Smart sync. No config change required.
From the CLI:
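A plausible invocation, assuming the lens name is passed as an argument (only the semilayer sync command itself appears in this guide):

```shell
# Smart sync one lens (lens-name argument is an assumed shape)
semilayer sync products
```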
Smart sync has no webhook path — { "mode": "smart" } is not a valid ingest request
body and will return 400. It is only available via the Console button and
semilayer sync. Use { "mode": "full" } if you need to trigger a clean-slate rebuild
from external infrastructure.
Tradeoffs:
| Setup | None |
| Latency | Completes after the full scan (seconds to minutes depending on table size) |
| Deletes | Yes — tombstone detection. Rows absent from the source are purged. |
| Cost | Higher than incremental — reads every row to compute hashes |
Smart sync is the right choice when you've made changes that incremental can't detect:
hard deletes, bulk updates that bypassed updated_at, or a manual data correction.
Running it once after a migration is a common pattern.
Choosing between the two easy paths
| Question | Smart sync | syncInterval (Incremental) |
|---|---|---|
| Your table has an updated_at column | Works | Works best |
| Your table hard-deletes rows | Works — tombstone detects them | Does not work — deletes linger |
| You want fully automatic, no manual intervention | Run periodically via cron or on a schedule | Set syncInterval and forget |
| Your table is large (millions of rows) | Slower — hashes all rows | Fast — only reads recent changes |
| You need changes in the index within seconds | Not ideal | Not ideal — use Records mode |
A common production setup is syncInterval: '15m' for continuous incremental refresh,
combined with a nightly Smart sync (scheduled via semilayer sync in cron) to catch
deletes and any drift that slipped through.
When to go further: the Records webhook
If neither easy path is precise enough — you need sub-second freshness, you want deletes to propagate immediately without a nightly Smart sync, or you have a high-write table where a 15-minute lag is too long — the Records webhook is the answer.
You call POST /v1/ingest/:lens from your infrastructure (a DB trigger, a Lambda, your
application code) with the exact IDs that changed and whether each was an upsert or
delete. SemiLayer processes only those rows.
This is what the rest of this guide is about.
Ingest Keys (ik_)
Every Environment has one ingest key auto-created at provisioning time. Ingest keys
are purpose-built for this one job: calling POST /v1/ingest/:lens.
| Prefix | What it can do |
|---|---|
| ik_dev_ | Trigger ingest on development lenses |
| ik_live_ | Trigger ingest on production lenses |
ik_ keys cannot call search, similar, query, or any other API. If one leaks, an
attacker can trigger an unnecessary re-index — annoying, not catastrophic. You can
revoke and rotate them freely.
sk_ (secret) keys also work for the ingest endpoint, but use ik_ in any
infrastructure that doesn't need full API access.
Console: Environment → API Keys → filter by type "Ingest"
CLI:
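The CLI shape is not shown in this guide; a hypothetical sketch (the keys subcommand and --type flag are assumptions, so check semilayer --help for the real form):

```shell
# Hypothetical subcommand and flag, shown for illustration only
semilayer keys list --type ingest
```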
The endpoint
:lens is the lens name from your sl.config.ts.
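A sketch of calling the endpoint from TypeScript. The path and body shape (mode, changes, id, action) come from this guide; the base URL and Bearer-style Authorization header are assumptions, so substitute your deployment's values:

```typescript
// Builds a records-mode ingest request. Base URL and Authorization header
// shape are assumptions; the path and body fields come from this guide.
type Change = { id: string; action: 'upsert' | 'delete' };

function buildIngestRequest(baseUrl: string, lens: string, key: string, changes: Change[]) {
  return {
    url: `${baseUrl}/v1/ingest/${lens}`,
    method: 'POST' as const,
    headers: {
      Authorization: `Bearer ${key}`, // assumed header shape
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ mode: 'records', changes }),
  };
}

// Usage (not executed here):
// const req = buildIngestRequest('https://YOUR_HOST', 'products', 'ik_live_...',
//   [{ id: '42', action: 'upsert' }]);
// await fetch(req.url, req);
```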
Response
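A success body presumably carries a status field; the 'queued' value below is an assumption (only 'deduplicated' is confirmed later in this guide):

```json
{ "status": "queued" }
```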
Or if absorbed by a debounce window:
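The merged case carries the deduplicated status described under How dedup and debounce work:

```json
{ "status": "deduplicated" }
```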
202 Accepted in both cases. Either way the change will be processed.
Error responses
| Status | Meaning |
|---|---|
| 400 | Invalid request body — check mode and changes shape |
| 404 | Lens not found — check the lens name and environment |
| 429 | Rate limit exceeded (SaaS only) — retry after Retry-After seconds |
Enterprise deployments have no rate limits on the ingest endpoint. The 429 path
only applies to SaaS plans.
Mode reference
Smart sync (Console / CLI only)
Full-table scan with hash dedup and tombstone delete detection. Only available via
semilayer sync or the Console Smart sync button — there is no { "mode": "smart" }
webhook body. See The easy path above.
records mode — recommended for CDC
You tell SemiLayer exactly which rows changed and what to do with them. This is the most efficient mode: no full scan, no timestamp window — just targeted operations on the rows you name.
id is the value of the primaryKey field declared in your Lens, as a string.
Upsert: SemiLayer fetches the current row via the Bridge, re-embeds it, and updates the index. If the row doesn't exist in the source, its embedding is removed automatically.
Delete: The embedding is removed immediately. SemiLayer does not re-query your
database — it trusts the action field. No changeTrackingColumn needed.
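A records body combining both actions, with illustrative IDs (field names as documented above):

```json
{
  "mode": "records",
  "changes": [
    { "id": "42", "action": "upsert" },
    { "id": "43", "action": "delete" }
  ]
}
```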
Debounce: 2-second window per lens. Calls within the window are merged. If the same
ID appears multiple times, the last action wins — an upsert then delete for the
same ID correctly results in a delete.
Batch limit: Maximum 10,000 changes per request. Split larger batches.
incremental mode
SemiLayer queries rows where changeTrackingColumn > last_cursor, re-embeds whatever
changed, and advances the cursor. Normally fired automatically via syncInterval — you
rarely call this directly.
Requires changeTrackingColumn on your Lens (defaults to updated_at).
Debounce: 5-second window. Useful when bursts of change events arrive and you want them coalesced into one scan.
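The direct webhook body is minimal, presumably just the mode, since the scan window comes from the stored cursor:

```json
{ "mode": "incremental" }
```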
Limitation: incremental cannot detect hard deletes. It reads rows that changed —
rows that were deleted simply aren't there to read. Use Smart sync, Records mode with
action: 'delete', or a nightly Rebuild to purge them.
full (Rebuild) mode
Clears all existing embeddings for the lens and re-reads, re-embeds, and re-indexes every row from scratch.
Destructive. All vectors are wiped before the scan begins. If the rebuild fails mid-way, the index is empty until it completes.
Singleton dedup: Only one Rebuild runs per lens at a time.
Use Rebuild for:
- First-time indexing after the initial push with semilayer push --rebuild
- After changing a field's searchable weight or transform chain (embeddings are stale)
- Recovery after corruption or a botched migration
- When you want absolute certainty the index matches the source with no drift
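From external infrastructure, Rebuild is triggered with the full mode body named earlier in this guide:

```json
{ "mode": "full" }
```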
How dedup and debounce work
SemiLayer's ingest queue uses debounce windows and singleton keys to absorb bursts.
| Mode | Dedup type | Window |
|---|---|---|
| records | Debounce per lens | 2 seconds |
| incremental | Debounce per lens | 5 seconds |
| smart | Singleton per lens | Until job completes |
| full | Singleton per lens | Until job completes |
In practice: If a Postgres trigger fires POST /v1/ingest/products (incremental) on
every row insert and you bulk-insert 5,000 rows in a transaction, you get one ingest
job queued — not 5,000. The first call starts a 5-second timer; every subsequent call
within the window returns status: 'deduplicated' and the timer resets. After 5 seconds
of quiet the job runs.
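The merge semantics can be modeled in a few lines. This is an illustration of the behavior described above, not SemiLayer's implementation; the real queue also runs the 2s/5s reset timer, and the 'queued' label for a first call is an assumption:

```typescript
// Illustrative model of per-lens debounce merging: changes keyed by id,
// last action wins. Timer behavior (the reset on each call) is elided.
type Change = { id: string; action: 'upsert' | 'delete' };

class ChangeBuffer {
  private pending = new Map<string, Change>();

  // Returns the status a webhook call would report ('queued' label is assumed).
  add(changes: Change[]): 'queued' | 'deduplicated' {
    const startsJob = this.pending.size === 0;
    for (const c of changes) this.pending.set(c.id, c); // last action wins per id
    return startsJob ? 'queued' : 'deduplicated';
  }

  // What the ingest job actually processes once the window closes.
  flush(): Change[] {
    const merged = [...this.pending.values()];
    this.pending.clear();
    return merged;
  }
}
```

An upsert followed by a delete for the same ID flushes as a single delete, matching the records-mode rule above.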
Handling deletes
This is the most important thing to get right.
Which modes propagate deletes:
| Mode | How deletes are handled |
|---|---|
| smart | Tombstone detection — rows absent from the full scan are purged |
| records | Explicit action: 'delete' — you send the delete, SemiLayer trusts it |
| incremental | Not supported — deleted rows are invisible to the timestamp scan |
| full (Rebuild) | All vectors wiped before rebuild — index always reflects current source |
For records mode: send the delete before or after the row is gone — SemiLayer
doesn't re-query the DB for deletes, so timing doesn't matter. But you must send it.
Patterns for capturing deletes per source:
| Source | Approach |
|---|---|
| PostgreSQL | AFTER DELETE trigger → pg_net.http_post |
| MySQL | AFTER DELETE trigger → changes table → poll |
| Prisma | $extends middleware — read IDs before deleteMany, then push deletes |
| Application code | Call the webhook in your delete service method |
| DynamoDB | DynamoDB Streams — REMOVE event type |
| MongoDB | Change Streams — delete operation type |
If you can't capture deletes: Use syncInterval for freshness and schedule a
nightly semilayer sync (Smart sync) to tombstone-detect any deletions. This is the
lowest-infrastructure delete strategy.
When a lens is paused
If a lens is paused, webhook calls are accepted and queued — not dropped. When
you resume, queued jobs run in order. The 202 response you receive still means "will
be processed."
10 Examples
Each example shows the complete wiring from database event to SemiLayer webhook.
1. PostgreSQL — pg_net trigger (INSERT / UPDATE / DELETE)
The cleanest Postgres approach. pg_net is available on Supabase, Neon, and most
managed Postgres providers. It makes HTTP requests directly from SQL triggers — no
application server involved.
Store the ik_ key in a Postgres configuration parameter so it isn't hardcoded:
alter system set app.semilayer_ingest_key = 'ik_live_...' then read it with
current_setting('app.semilayer_ingest_key').
Delete support: Yes — TG_OP = 'DELETE' maps to action: 'delete' in real time.
2. PostgreSQL — LISTEN / NOTIFY + Node listener
No pg_net extension required. A lightweight Node.js process listens on a Postgres
channel and calls the webhook. Good for environments where extensions aren't available.
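A sketch of the listener's buffering core. The pg wiring (client.query('LISTEN ...'), client.on('notification', ...)) is elided so the logic stands alone, and the NOTIFY payload shape (a JSON object with id and action) is an assumption your trigger would need to produce:

```typescript
// Buffers notifications briefly and flushes them as one changes array.
// Wire handleNotification up to your pg client's 'notification' event.
type Change = { id: string; action: 'upsert' | 'delete' };

const buffer: Change[] = [];
let timer: ReturnType<typeof setTimeout> | null = null;

function handleNotification(payload: string, flush: (changes: Change[]) => void) {
  // Assumed payload from the trigger: {"id":"42","action":"upsert"}
  buffer.push(JSON.parse(payload) as Change);
  if (!timer) {
    timer = setTimeout(() => {
      timer = null;
      flush(buffer.splice(0)); // one webhook call per burst
    }, 100);
  }
}
```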
Batching enhancement: Buffer notifications for 100ms and send them as a single
changes array to reduce HTTP calls during bulk operations.
3. Supabase — Database Webhook + Edge Function
Supabase has built-in database webhooks in the Dashboard. No trigger SQL needed.
Edge Function:
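A sketch of the function's mapping step. Supabase database webhooks deliver a payload with type, record, and old_record; the Deno.serve wiring and the outgoing fetch to SemiLayer are reduced to comments so the mapping stands alone:

```typescript
// Maps a Supabase database-webhook payload to a SemiLayer change.
// DELETE payloads carry the row in old_record; INSERT/UPDATE carry it in record.
type WebhookPayload = {
  type: 'INSERT' | 'UPDATE' | 'DELETE';
  record: { id: number } | null;
  old_record: { id: number } | null;
};
type Change = { id: string; action: 'upsert' | 'delete' };

function toChange(p: WebhookPayload): Change {
  const row = p.type === 'DELETE' ? p.old_record : p.record;
  if (!row) throw new Error('webhook payload missing row');
  return {
    id: String(row.id),
    action: p.type === 'DELETE' ? 'delete' : 'upsert',
  };
}

// Inside Deno.serve you'd then forward it:
// await fetch(`${SEMILAYER_URL}/v1/ingest/products`, { method: 'POST',
//   body: JSON.stringify({ mode: 'records', changes: [toChange(payload)] }) });
```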
Dashboard wiring:
- Database → Webhooks → Create a new hook
- Table: products; Events: INSERT, UPDATE, DELETE
- Webhook URL: your Edge Function URL
- Add SEMILAYER_INGEST_KEY in Edge Function secrets
Delete support: Yes — type === 'DELETE' maps directly.
4. MySQL / PlanetScale — Prisma $extends middleware
Prisma's query extensions intercept every write at the ORM layer. Works with any database Prisma supports — no trigger SQL needed.
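The ordering can be sketched independently of Prisma's generated types. The $extends wiring is elided (a real extension would intercept deleteMany via Prisma's query extensions); the minimal client interface below exists only to make the read-before-delete ordering explicit:

```typescript
// Sketch of the read-IDs-before-delete ordering. The interface stands in
// for a Prisma model delegate; wiring into client.$extends is elided.
interface ProductClient {
  findMany(args: { where: unknown; select: { id: true } }): Promise<{ id: number }[]>;
  deleteMany(args: { where: unknown }): Promise<{ count: number }>;
}

async function deleteManyAndSync(
  client: ProductClient,
  where: unknown,
  pushDeletes: (ids: string[]) => Promise<void>, // would POST action: 'delete' changes
) {
  // 1. Read the matching IDs first; after the delete they are gone.
  const rows = await client.findMany({ where, select: { id: true } });
  // 2. Perform the actual delete.
  const result = await client.deleteMany({ where });
  // 3. Push explicit deletes to SemiLayer (records mode).
  await pushDeletes(rows.map((r) => String(r.id)));
  return result;
}
```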
Collect the matching IDs before calling deleteMany: after deletion those rows are
gone, and there is nothing left to push as deletes.
5. DynamoDB Streams → Lambda
DynamoDB Streams fires a Lambda with every insert, modify, and remove.
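The handler's mapping step can be sketched as follows; it assumes the table's partition key is a string attribute named id, so adjust to your schema. The Lambda wiring and the outgoing ingest call are reduced to a comment:

```typescript
// Maps DynamoDB Stream records to SemiLayer changes. INSERT and MODIFY
// become upserts; REMOVE becomes an explicit delete.
type StreamRecord = {
  eventName: 'INSERT' | 'MODIFY' | 'REMOVE';
  dynamodb: { Keys: { id: { S: string } } }; // assumes string key named "id"
};
type Change = { id: string; action: 'upsert' | 'delete' };

function toChanges(records: StreamRecord[]): Change[] {
  return records.map((r) => ({
    id: r.dynamodb.Keys.id.S,
    action: r.eventName === 'REMOVE' ? 'delete' : 'upsert',
  }));
}

// In the Lambda handler, POST { mode: 'records', changes: toChanges(event.Records) }
// to /v1/ingest/:lens with your ik_ key.
```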
IAM permissions for the Lambda execution role:
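A minimal policy sketch. These four actions are the standard stream-read permissions (they match AWS's managed AWSLambdaDynamoDBExecutionRole); scope Resource to your table's stream ARN:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeStream",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:ListStreams"
      ],
      "Resource": "arn:aws:dynamodb:*:*:table/YOUR_TABLE/stream/*"
    }
  ]
}
```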
Enable DynamoDB Streams on the table with StreamViewType: NEW_AND_OLD_IMAGES.
Delete support: Yes — REMOVE events carry OldImage with the deleted key.
6. MongoDB Change Streams
Change Streams give a real-time cursor over all writes. Run inside your app or as a dedicated microservice.
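The watcher's mapping step, with the driver wiring reduced to comments. It assumes the document _id stringifies to your Lens primaryKey value:

```typescript
// Maps a MongoDB change-stream event to a SemiLayer change. All write
// operation types except delete become upserts.
type StreamEvent = {
  operationType: 'insert' | 'update' | 'replace' | 'delete';
  documentKey: { _id: string };
};
type Change = { id: string; action: 'upsert' | 'delete' };

function toChange(event: StreamEvent): Change {
  return {
    id: String(event.documentKey._id),
    action: event.operationType === 'delete' ? 'delete' : 'upsert',
  };
}

// Watcher wiring (elided):
// const stream = db.collection('products').watch();
// for await (const event of stream) { /* POST { mode: 'records', changes: [toChange(event)] } */ }
```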
Resumability: Persist changeStream.resumeToken to durable storage so the watcher
can resume after a restart without missing events.
7. Express / Fastify — application service layer
If you own the write path, call the webhook directly from your service methods. No triggers, no separate process.
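A sketch of the pattern. syncRecords here is a stand-in helper (a real body would POST { mode: 'records', changes } to /v1/ingest/:lens with an ik_ key); the framework types are reduced to the two members the example touches:

```typescript
// Service-layer sync: respond first, then fire the webhook call.
type Change = { id: string; action: 'upsert' | 'delete' };

const syncLog: string[] = [];

async function syncRecords(lens: string, changes: Change[]): Promise<void> {
  // A real implementation would fetch(`${BASE_URL}/v1/ingest/${lens}`, ...) here.
  syncLog.push(`${lens}:${changes.map((c) => c.id).join(',')}`);
}

// Express-style handler with framework types reduced to what the example uses.
function updateProductHandler(
  req: { body: { id: string } },
  res: { json: (body: unknown) => void },
) {
  // ...perform the database write here...
  res.json({ ok: true });
  // Fire-and-forget after the response; log failures rather than surfacing them.
  void syncRecords('products', [{ id: req.body.id, action: 'upsert' }]).catch(console.error);
}
```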
Call syncRecords after res.json() / res.end(). The write is committed before
the sync fires, and your API latency is unaffected even if SemiLayer is momentarily
slow.
8. Kafka consumer
Wire a Kafka consumer for event-driven architectures where writes publish as domain events. Works with AWS MSK, Confluent Cloud, and self-hosted Kafka.
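The consumer's mapping step, with the Kafka client wiring elided (your eachMessage handler would call toChange per message, passing message.value!.toString()). The domain-event schema, a type field with a .deleted suffix plus an id, is an assumed example:

```typescript
// Maps a serialized domain event from Kafka into a SemiLayer change.
// Event schema (type naming convention, id field) is an assumed example.
type DomainEvent = { type: string; id: string }; // e.g. 'product.updated', 'product.deleted'
type Change = { id: string; action: 'upsert' | 'delete' };

function toChange(raw: string): Change {
  const event = JSON.parse(raw) as DomainEvent;
  return {
    id: event.id,
    action: event.type.endsWith('.deleted') ? 'delete' : 'upsert',
  };
}
```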
9. AWS SQS — queue-driven sync
SQS gives guaranteed delivery and natural backpressure. Any service that writes to your DB also enqueues a message; a consumer drains the queue and calls SemiLayer.
For Lambda-based SQS consumers, SQS can invoke Lambda directly. Return normally to auto-delete messages; throw to retry. Add a Dead Letter Queue for poison messages.
10. Periodic cron + records mode (poll-and-push)
For legacy databases, third-party APIs, or flat files that don't emit change events,
a cron job polls for recent changes and pushes them with records mode. This is more
precise than incremental because you control exactly which IDs get pushed — including
soft-deleted rows.
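The poll step can be sketched as follows. Table and column names (updated_at, deleted_at) and the row shape are illustrative; the actual query against your source is elided:

```typescript
// Poll-and-push sketch: rows touched in the window become upserts,
// soft-deleted rows become explicit deletes.
type Row = { id: number; updated_at: Date; deleted_at: Date | null };
type Change = { id: string; action: 'upsert' | 'delete' };

const CRON_INTERVAL_MS = 5 * 60_000;
const WINDOW_MS = CRON_INTERVAL_MS + 60_000; // 6-minute window for a 5-minute cron

function windowStart(now: Date): Date {
  return new Date(now.getTime() - WINDOW_MS);
}

function toChanges(rows: Row[]): Change[] {
  return rows.map((r) => ({
    id: String(r.id),
    action: r.deleted_at ? 'delete' : 'upsert',
  }));
}

// Cron body (elided): query rows with updated_at > windowStart(new Date()),
// then POST { mode: 'records', changes: toChanges(rows) } to /v1/ingest/:lens.
```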
Use a window slightly wider than your cron interval (6 minutes for a 5-minute cron)
to avoid missing records at the boundary. Duplicate upsert calls for unchanged rows
are harmless — the hash dedup step skips them.
Hard deletes with no deleted_at column? Schedule a nightly Smart sync instead of
a nightly Rebuild — it tombstone-detects missing rows without wiping the index first:
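For example, as a crontab entry (the 03:00 schedule and lens argument are illustrative):

```shell
# 03:00 nightly Smart sync: tombstone-detects hard deletes
0 3 * * * semilayer sync products
```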
Choosing the right approach
| Scenario | Recommended |
|---|---|
| Just getting started, Postgres source | syncInterval: '15m' in config |
| Needs delete propagation, minimal setup | syncInterval + nightly semilayer sync |
| Postgres with pg_net | Example 1 — SQL trigger → pg_net |
| Postgres without extensions | Example 2 — LISTEN/NOTIFY + listener |
| Supabase | Example 3 — built-in database webhook |
| MySQL / PlanetScale via Prisma | Example 4 — Prisma $extends |
| DynamoDB | Example 5 — DynamoDB Streams → Lambda |
| MongoDB / DocumentDB | Example 6 — Change Streams |
| You own the write path | Example 7 — service-layer sync call |
| Event-driven / Kafka | Example 8 — Kafka consumer |
| SQS / SNS / EventBridge | Example 9 — SQS consumer |
| Legacy DB or no change events | Example 10 — cron poll + records |
Troubleshooting
404 Not Found — Lens name in the URL doesn't match any lens in this environment.
Check you're using the correct environment's ik_ key and that semilayer push has run.
429 Too Many Requests — Rate limit exceeded (SaaS only). Read Retry-After and
wait. Use records mode and batch changes to reduce call volume.
Vectors not updating — Run semilayer status. If the lens is paused, calls queue
but don't process until semilayer resume. If error, check Jobs in the Console.
Deletes not removing from search results — You're likely using incremental mode or
syncInterval without a Smart sync. Neither can detect hard deletes. Switch to records
mode with action: 'delete', or add a nightly semilayer sync to tombstone-detect them.
deduplicated responses — Normal during high-write periods. Your change is merged
into an already-queued job and will still be processed.
changeTrackingColumn not found — Incremental reads all rows on every tick (no
filtering). Add the column or switch to Smart sync.