Airgap mode (runner-local credentials)

By default, database credentials for each source live encrypted in SemiLayer's database. We decrypt them at query time and ship them to the assigned runner just-in-time.

Airgap mode (credentialsLocation: "runner-local") flips that: credentials never leave your infrastructure. SemiLayer only knows the source's name and bridge; the actual connection config, whether a URL or individual fields, comes from the runner's own environment.

Enable on a source

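Note: flipping credentialsLocation on an existing source wipes its stored config. Rotate carefully: any live ingest jobs on that source will fail until the runner's environment is updated. Because managed-mode sources ignore the runner's per-source env vars, you can set them on the runner first and flip the setting afterwards.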

Via the Console: open the source → toggle Store credentials on runner only → save.

Via CLI:

semilayer sources update primary-db --credentials-location=runner-local

Wire the runner

The runner reads per-source env vars named SEMILAYER_SOURCE_<NAME>_<KEY>, all uppercase with hyphens turned into underscores. <NAME> is the source name (primary-db becomes PRIMARY_DB); <KEY> is the bridge config field in upper snake case. For URL-style bridges (Postgres, MySQL, MongoDB, Redis), a single _URL env var is enough:

docker run --rm \
  -e SEMILAYER_RUNNER_ID=<runner-uuid> \
  -e SEMILAYER_RUNNER_TOKEN=rk_... \
  -e SEMILAYER_SOURCE_PRIMARY_DB_URL=postgresql://... \
  ghcr.io/semilayer/runner:latest
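If you template runner deployments, the naming rule is easy to script. A minimal POSIX shell sketch, assuming only the uppercase and hyphen-to-underscore rules above; the function name semilayer_source_env is hypothetical and not part of the runner or CLI:

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SemiLayer): builds the env var name
# the runner looks up for a runner-local source.
#   $1 = source name as shown in SemiLayer, e.g. primary-db
#   $2 = bridge config field, e.g. url or ACCESS_KEY_ID
semilayer_source_env() {
  # Uppercase and turn hyphens into underscores: primary-db -> PRIMARY_DB
  name=$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')
  key=$(printf '%s' "$2" | tr 'a-z-' 'A-Z_')
  printf 'SEMILAYER_SOURCE_%s_%s\n' "$name" "$key"
}

# Prints SEMILAYER_SOURCE_PRIMARY_DB_URL
semilayer_source_env primary-db url
```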

For bridges that take structured config (ClickHouse, Snowflake, DynamoDB, BigQuery, …), set one env var per field. The runner camelCases each suffix back to the bridge config key it expects:

Env var	Bridge config key
`SEMILAYER_SOURCE_<NAME>_HOST`	`host`
`SEMILAYER_SOURCE_<NAME>_PORT`	`port`
`SEMILAYER_SOURCE_<NAME>_DATABASE`	`database`
`SEMILAYER_SOURCE_<NAME>_USERNAME`	`username`
`SEMILAYER_SOURCE_<NAME>_PASSWORD`	`password`
`SEMILAYER_SOURCE_<NAME>_REGION`	`region`
`SEMILAYER_SOURCE_<NAME>_ACCESS_KEY_ID`	`accessKeyId`
`SEMILAYER_SOURCE_<NAME>_SECRET_ACCESS_KEY`	`secretAccessKey`
`SEMILAYER_SOURCE_<NAME>_PROJECT_ID`	`projectId`
`SEMILAYER_SOURCE_<NAME>_SERVICE_ACCOUNT_EMAIL`	`serviceAccountEmail`

Pure-digit values (e.g. PORT=9000) are coerced to integers; everything else stays a string. Mixing _URL with per-key vars is allowed — both land in the dispatched config and the bridge picks what it needs.
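To check what the runner will see before starting it, the two rules above (camelCase the suffix, coerce pure-digit values to integers) can be previewed from the shell. This is a sketch that re-implements the documented rules, not the runner's actual parser, which may differ in edge cases; to_camel and preview_source_config are hypothetical names. It prints key names and types only, so secrets don't land in your terminal scrollback.

```shell
#!/bin/sh
# Hypothetical preview of the documented env-var-to-config mapping.

# Upper snake case -> camelCase: ACCESS_KEY_ID -> accessKeyId
to_camel() {
  printf '%s\n' "$1" | awk -F_ '{
    out = tolower($1)
    for (i = 2; i <= NF; i++) out = out toupper(substr($i, 1, 1)) tolower(substr($i, 2))
    print out
  }'
}

# Print "configKey (int|string)" for each SEMILAYER_SOURCE_<NAME>_* variable.
# Naive prefix match: "analytics" would also pick up "analytics-eu" variables.
preview_source_config() {
  prefix="SEMILAYER_SOURCE_$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')_"
  env | grep "^${prefix}" | sort | while IFS='=' read -r var value; do
    key=$(to_camel "${var#"$prefix"}")
    case $value in
      ''|*[!0-9]*) type=string ;;  # empty or contains a non-digit
      *)           type=int ;;     # pure digits, e.g. PORT=9000
    esac
    printf '%s (%s)\n' "$key" "$type"
  done
}

# Usage on the runner host: preview_source_config analytics
```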

Example for ClickHouse:

docker run --rm \
  -e SEMILAYER_RUNNER_ID=<runner-uuid> \
  -e SEMILAYER_RUNNER_TOKEN=rk_... \
  -e SEMILAYER_SOURCE_ANALYTICS_HOST=clickhouse.internal \
  -e SEMILAYER_SOURCE_ANALYTICS_PORT=9000 \
  -e SEMILAYER_SOURCE_ANALYTICS_DATABASE=events \
  -e SEMILAYER_SOURCE_ANALYTICS_USERNAME=reader \
  -e SEMILAYER_SOURCE_ANALYTICS_PASSWORD=*** \
  ghcr.io/semilayer/runner:latest
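Passing secrets with -e leaves them in your shell history and on the visible docker run command line. Docker's --env-file flag reads the same variables from a file with one VAR=value per line. A sketch for the ClickHouse example above, assuming a file named runner.env (a hypothetical name):

```shell
#!/bin/sh
# Write the runner's variables to a file only the current user can read.
# "runner.env" is a hypothetical name; keep the file out of version control.
umask 077
cat > runner.env <<'EOF'
SEMILAYER_RUNNER_ID=<runner-uuid>
SEMILAYER_RUNNER_TOKEN=rk_...
SEMILAYER_SOURCE_ANALYTICS_HOST=clickhouse.internal
SEMILAYER_SOURCE_ANALYTICS_PORT=9000
SEMILAYER_SOURCE_ANALYTICS_DATABASE=events
SEMILAYER_SOURCE_ANALYTICS_USERNAME=reader
SEMILAYER_SOURCE_ANALYTICS_PASSWORD=***
EOF

# Then start the runner from the file instead of individual -e flags:
#   docker run --rm --env-file runner.env ghcr.io/semilayer/runner:latest
```

Keeping one such file per environment (for example runner.dev.env and runner.prod.env) is one way to handle the per-env config tradeoff described under Tradeoffs.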

Every source assigned to the runner in runner-local mode needs at least one SEMILAYER_SOURCE_<NAME>_* env var of its own. Sources in the default (managed) mode ignore these variables; their config still comes from SemiLayer's DB.
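A fail-fast check for that requirement, as a hedged sketch you could run before starting the runner. check_runner_local_sources is a hypothetical helper; the runner's environment can't tell it which sources are runner-local, so you pass the names yourself. It uses a naive prefix match, so a source named analytics would also be satisfied by variables meant for analytics-eu.

```shell
#!/bin/sh
# Hypothetical preflight: return non-zero if any listed runner-local source
# has no SEMILAYER_SOURCE_<NAME>_* variable in the current environment.
check_runner_local_sources() {
  missing=0
  for src in "$@"; do
    prefix="SEMILAYER_SOURCE_$(printf '%s' "$src" | tr 'a-z-' 'A-Z_')_"
    if ! env | grep -q "^${prefix}"; then
      echo "no env vars for runner-local source '$src' (expected ${prefix}*)" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_runner_local_sources primary-db analytics || exit 1
```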

What SemiLayer sees

Stage	What we know	What we don't
Before the runner executes (a `search` / `query` / `similar` API call)	the lens name, the org+env, the RBAC decision, the user (if JWT), the query params (query text / where clauses)	the database URL, the DB user, the password, the TLS cert chain on your side
After the runner executes	the row shape your lens declares (mapped through `fields`), result row count	the raw source row shape before mapping

The query params still cross our boundary. That's how routing works, and you probably want server-side rate limiting on them anyway. What stays on your side is the connection itself: the DB host and IP, TLS, the auth handshake, all of it.

Prove it

You can packet-capture on the runner host and confirm:

Outbound traffic to runner.semilayer.com:443 — one persistent WSS connection.
Outbound traffic to your DB host:port — short-lived, each correlating to an incoming job frame on the runner's socket.
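No outbound traffic to SemiLayer carrying the DB URL in any form. Payloads over the WebSocket are query params + result rows, never credentials. (The WSS payloads are TLS-encrypted, so the capture itself shows endpoints, ports, and timing rather than contents.)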
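One way to run that capture with tcpdump, narrowed to the traffic in question. capture_filter is a hypothetical helper that only builds a pcap filter string; tcpdump resolves the hostnames when it compiles the filter. Matching all traffic to runner.semilayer.com, not just port 443, lets you confirm that 443 is the only port in use.

```shell
#!/bin/sh
# Hypothetical helper: builds a pcap filter covering the runner's traffic to
# SemiLayer (any port) and to your database.
#   $1 = database host, $2 = database port
capture_filter() {
  printf 'host runner.semilayer.com or (host %s and tcp port %s)\n' "$1" "$2"
}

# Usage on the runner host (needs root or CAP_NET_RAW):
#   sudo tcpdump -i any -nn "$(capture_filter clickhouse.internal 9000)"
```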

Tradeoffs

Per-env config. Each environment (dev / staging / prod) needs its own set of env vars on the runner. Docker Compose or Kubernetes secrets handle this fine; we don't ship a management layer.
Lost credentials → runner outage. If your runner container loses its environment, its runner-local sources go offline until you restore the variables. SemiLayer can't help; we never had them.
Ingest works the same. The ingest worker asks the runner to read the source, same as query dispatch. No divergent code path.
Smart sync works over the runner tunnel. Both on-demand (semilayer sync / Console button) and scheduled (smartSyncInterval in config) paths run through the same bridge executor. Full-scan traffic also flows through the tunnel, so for very large tables, factor the tunnel's bandwidth into the cadence you pick.
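A back-of-envelope way to size that: estimate how long one full scan occupies the tunnel and keep smartSyncInterval comfortably above it. This sketch assumes a scan moves roughly the table's size in bytes and ignores compression and protocol overhead, so treat the result as a rough estimate only; scan_seconds is a hypothetical helper.

```shell
#!/bin/sh
# Rough seconds for one full scan over the runner tunnel.
#   $1 = bytes scanned, $2 = tunnel bandwidth in megabits per second
scan_seconds() {
  # bytes * 8 = bits; Mbit/s * 1,000,000 = bits per second
  echo $(( $1 * 8 / ($2 * 1000000) ))
}

# 50 GB over 100 Mbit/s: 400,000,000,000 bits / 100,000,000 bit/s
scan_seconds 50000000000 100   # prints 4000 (about 67 minutes)
```

At that size, an hourly schedule would keep the tunnel busy with scans almost continuously.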