Pre-implementation research. No code changes made. Findings dated 2026-05-25.
- The current dense path computes similarity with UNNEST + dot product, which does not benefit from the index at all. Creating the index changes nothing in production until we also change the SQL.
- Using the index means switching the dense path in mif_admin.py to VECTOR_SEARCH(...). That part does require a code change. Gate it behind a flag like the existing SCENARIO_SEARCH_HYBRID.
- encoded-stage-394013.analytics.fct_calls_scenario_embeddings does not exist. The Dagster asset hardcodes the prod project. There is a 100-row sandbox test_hybrid_scenario_search in dev. Index validation effectively has to happen on prod (or via a manually copied partition in dev).

| | IVF | TreeAH (recommended) |
|---|---|---|
| Algorithm | Inverted-file via k-means clustering | Google ScaNN — tree-quantized with asymmetric hashing (product quantization) |
| Best for | Smaller datasets, smaller query batches, when "stored column" optimization matters | Large vector tables, large query batches; "orders of magnitude faster and more cost-effective" |
| Distance types | COSINE / EUCLIDEAN / DOT_PRODUCT | COSINE / EUCLIDEAN / DOT_PRODUCT |
| Incremental refresh | Async, periodic | Automatic background refresh — typically 5–15 min after writes |
| Stored-column join elimination | Yes (helps if your SELECT only needs indexed columns) | No — base table join is preserved |
| Partitioning | Supported | Supported (2026 feature) — enables partition pruning |
| Recall vs latency knob | fraction_lists_to_search | fraction_leaf_nodes_to_search |

The existing query LEFT JOINs the calls table for preset/system_prompt enrichment, so we'd need a base-table join anyway. The one IVF advantage doesn't apply.

- CREATE VECTOR INDEX is a DDL statement that returns immediately; BigQuery builds the index in the background using free background slots.
- VECTOR_SEARCH still returns correct results during the build: it serves indexed rows from the index and brute-forces the not-yet-indexed remainder. No code-level fallback logic needed.
- Monitor progress in INFORMATION_SCHEMA.VECTOR_INDEXES: watch coverage_percentage and last_refresh_time.

CREATE VECTOR INDEX scenario_embedding_idx
ON `sesame-prod-426417.analytics.fct_calls_scenario_embeddings`(embedding)
OPTIONS (
index_type = 'TREE_AH',
distance_type = 'COSINE'
);
We can also pass tree_ah_options = '{"normalization_type":"NONE"}' — our vectors are already L2-normalized so skipping the built-in normalization is correct. Default leaf size is usually fine; tune only if recall is poor.
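
Once the DDL above has been issued, the build can be polled rather than assumed complete. The following is a minimal sketch of that check, not existing code: the monitoring query reuses the table and index names from the DDL, and `index_is_ready` is a hypothetical helper that takes one result row as a plain mapping, so it does not depend on any particular BigQuery client.

```python
from typing import Any, Mapping

# Monitoring query; project, dataset, table and index names are copied
# from the CREATE VECTOR INDEX statement above.
INDEX_STATUS_SQL = """
SELECT index_name, index_status, coverage_percentage, last_refresh_time
FROM `sesame-prod-426417.analytics.INFORMATION_SCHEMA.VECTOR_INDEXES`
WHERE table_name = 'fct_calls_scenario_embeddings'
  AND index_name = 'scenario_embedding_idx'
"""


def index_is_ready(row: Mapping[str, Any]) -> bool:
    """Return True once the index is ACTIVE and covers the whole table.

    `row` is one result row of INDEX_STATUS_SQL as a dict-like object
    (hypothetical shape; adapt to whatever the client returns).
    """
    status = row.get("index_status")
    coverage = row.get("coverage_percentage") or 0
    return status == "ACTIVE" and coverage >= 100
```

Because VECTOR_SEARCH already falls back to brute force for unindexed rows, this check is only needed to decide when latency measurements are meaningful, not for correctness.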
This is the answer to your "is there a way to index the data without giving people access?" question. There are two layers:
- The current query uses UNNEST cosine, which the planner cannot route to a vector index. So you can create the index in prod and nobody's query path changes. It just sits there, populated and idle.
- Follow the SCENARIO_SEARCH_HYBRID pattern: add a SCENARIO_SEARCH_ANN env var (or a Statsig flag) that flips handle_scenario_search from the brute-force similarities CTE to a VECTOR_SEARCH-based version. Off by default, so no user-visible change.

1. Run CREATE VECTOR INDEX ... on prod table.
2. Watch INFORMATION_SCHEMA.VECTOR_INDEXES until coverage_percentage = 100 and index_status = 'ACTIVE'.
3. Run VECTOR_SEARCH directly for a handful of representative queries. Measure recall@K and wall time. No app code touched.

| Where | Table | Status | Notes |
|---|---|---|---|
| Prod (sesame-prod-426417) | analytics.fct_calls_scenario_embeddings | Exists | 12,366,493 rows · 912,458 calls · 75 partitions · 2026-03-12 → 2026-05-24 · 92 GB logical · clustered on (context_mode, user_key) |
| Dev (encoded-stage-394013) | analytics.fct_calls_scenario_embeddings | Does NOT exist | The Dagster asset hardcodes the prod project (see fct_calls_scenario_embeddings.py). There is no dev-side embeddings table being populated. |
| Dev (encoded-stage-394013) | analytics.test_hybrid_scenario_search | Sandbox | 100-row toy table used during hybrid-search bring-up. Schema matches prod minus a couple of nullability annotations. |

| Column | Type | Notes |
|---|---|---|
| ds | DATE | Partition key |
| call_id | INT64 | |
| call_uuid | STRING | |
| window_idx | INT64 | |
| user_key | STRING | Clustering key |
| character_name | STRING | |
| call_duration_s / start_time_ms / end_time_ms / num_utterances | FLOAT64 / INT64 | Window metadata |
| value | STRING | Window text with prepended search keys. scenario_value_idx SEARCH INDEX is built on this column (BM25-ish via SEARCH()). |
| embedding | ARRAY<FLOAT64> NOT NULL | 768-dim, L2-normalized. This is the column we vector-index. |
| embedding_model / context_mode / batch_job_name | STRING | Clustering: context_mode |
| created_at | TIMESTAMP | |
| content_deleted | BOOL | Retention nulls out value when true; embedding stays. |

| Index | Type | Column | Status | Coverage |
|---|---|---|---|---|
| scenario_value_idx | SEARCH (BM25-ish) | value | ACTIVE | 100% (last refresh 2026-05-25 08:57 UTC) |
| none | VECTOR | embedding | — | The gap this work fills. |

- Create the index directly on the prod table. Nothing calls VECTOR_SEARCH today, so the index sits idle until we flip a flag. This is the lowest-friction path.
- Copy one partition into dev: CREATE TABLE encoded-stage-394013.analytics.scenario_embeddings_ann_test AS SELECT * FROM prod WHERE ds = '2026-05-24', then CREATE VECTOR INDEX against it. Gives an independent sandbox to validate the DDL options and the VECTOR_SEARCH rewrite end-to-end before touching prod. Adds ~1 GB of dev storage; otherwise free.

Yes, a code change is required: the dense ranking path in mif_admin.py needs to be rewritten to use VECTOR_SEARCH. Creating the index alone does nothing for it.
Today the similarities CTE does this (paraphrased):
SELECT
e.*,
(SELECT SUM(ev*qv) FROM UNNEST(e.embedding) ev WITH OFFSET i
JOIN UNNEST(q.embedding) qv WITH OFFSET j ON i = j)
/ NULLIF(SQRT(...) * SQRT(...), 0) AS similarity
FROM `fct_calls_scenario_embeddings` e
CROSS JOIN query_embedding q
LEFT JOIN `calls` c ON ...
WHERE e.ds BETWEEN @start_date AND @end_date
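
For reference, the arithmetic in the paraphrased CTE above can be sketched in Python. This is an illustration only, and it assumes the two elided SQRT(...) terms are the L2 norms of the stored and query vectors. It also shows why, for L2-normalized vectors like ours, cosine similarity reduces to a plain dot product.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> Optional[float]:
    """Dot product divided by the product of L2 norms.

    Returns None when either norm is zero, mirroring the
    NULLIF(..., 0) guard in the SQL.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else None


def l2_normalize(v: Sequence[float]) -> list:
    """Scale a non-zero vector to unit L2 length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


a, b = [3.0, 4.0], [4.0, 3.0]
sim = cosine_similarity(a, b)

# After normalization the denominator is 1, so the dot product alone
# gives the same value.
dot_of_unit = sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

The work is linear in the number of rows scanned, which is the cost the vector index is meant to avoid.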
The planner cannot route hand-written UNNEST arithmetic to a vector index. To get acceleration we'd need something like:
WITH query_embedding AS (
SELECT ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(...)
)
SELECT
base.call_id, base.window_idx, base.ds, base.value,
base.character_name, base.start_time_ms,
1 - distance AS similarity
FROM VECTOR_SEARCH(
TABLE `sesame-prod-426417.analytics.fct_calls_scenario_embeddings`,
'embedding',
(SELECT embedding FROM query_embedding),
top_k => 200,
distance_type => 'COSINE',
options => '{"fraction_leaf_nodes_to_search": 0.05}'
)
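
To check that a rewrite like this returns acceptable results, the ANN top-K can be compared against the brute-force top-K for the same query. The sketch below is a hypothetical helper, not existing code; it assumes each result is identified by a (call_id, window_idx) pair and that both lists are already ordered best-first.

```python
from typing import Hashable, Sequence


def recall_at_k(
    exact_ids: Sequence[Hashable],
    approx_ids: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / len(exact_top)


# Illustrative ids only: (call_id, window_idx) pairs.
brute_force = [(101, 0), (102, 3), (103, 1), (104, 2)]
ann = [(101, 0), (103, 1), (105, 0), (102, 3)]

recall_top3 = recall_at_k(brute_force, ann, k=3)
recall_top4 = recall_at_k(brute_force, ann, k=4)
```

Raising fraction_leaf_nodes_to_search should push recall up at the cost of latency, so this number is the one to watch while tuning that option.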
The existing handler returns three things from one SQL family:
- [...] VECTOR_SEARCH
- [...] VECTOR_SEARCH: needs every row's distance, not top-K.

Options for the stats query:
- Run the stats query alongside the page query (asyncio.gather). Accelerate only the page query; stats stays the same. Simplest, smallest blast radius. Latency on stats stays where it is today (which is the bottleneck either way, but at least no regression).
- Take a larger top-N from VECTOR_SEARCH and compute the histogram on that. Faster, but the histogram becomes "histogram of top-N by ANN", not "histogram of all in range", which is a semantic change.

My recommendation: do the first one. Keep brute-force stats unchanged and accelerate only the page query. The page query is what drives perceived latency; stats can run in parallel and finish whenever.
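
The recommended option can be sketched as follows. This is a minimal illustration under stated assumptions: `run_page_query`, `run_stats_query`, and `handle_search_sketch` are hypothetical names, and the sleeps stand in for the real BigQuery calls.

```python
import asyncio


async def run_page_query(query_text: str) -> list:
    # Stand-in for the accelerated VECTOR_SEARCH page query (hypothetical).
    await asyncio.sleep(0.01)
    return [{"call_id": 1, "similarity": 0.9}]


async def run_stats_query(query_text: str) -> dict:
    # Stand-in for the unchanged brute-force stats query (hypothetical).
    await asyncio.sleep(0.05)
    return {"histogram": [0, 3, 7]}


async def handle_search_sketch(query_text: str) -> dict:
    # Both coroutines start immediately; total wall time is roughly the
    # slower of the two rather than their sum.
    page, stats = await asyncio.gather(
        run_page_query(query_text),
        run_stats_query(query_text),
    )
    return {"page": page, "stats": stats}


result = asyncio.run(handle_search_sketch("example query"))
```

Note that `asyncio.gather` waits for both results before returning. Letting the page render before stats finish would need the two results delivered separately, which this sketch does not attempt.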
The hybrid (RRF) path has a dense_top CTE that already takes LIMIT _HYBRID_TOP_K_PER_SIDE from similarities. That maps perfectly to VECTOR_SEARCH(top_k => _HYBRID_TOP_K_PER_SIDE). The lex side is unchanged. The BM25 search index (scenario_value_idx) already exists, so once we add the vector index both sides of RRF are accelerated.
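
For readers unfamiliar with RRF, the fusion step that consumes the two ranked lists can be sketched as below. This is a generic reciprocal rank fusion, not the code in mif_admin.py; the smoothing constant k=60 is the commonly used default and is an assumption here, as are the example ids.

```python
from typing import Hashable, Sequence


def rrf_fuse(
    dense_ids: Sequence[Hashable],
    lexical_ids: Sequence[Hashable],
    k: int = 60,
) -> list:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked in (dense_ids, lexical_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (stable sort).
    return sorted(scores, key=scores.get, reverse=True)


dense_top = ["a", "b", "c"]    # e.g. ids from the dense side, best first
lexical_top = ["b", "d", "a"]  # e.g. ids from the lexical side, best first
fused = rrf_fuse(dense_top, lexical_top)
```

Only ranks enter the score, so swapping the dense side from exact to approximate search changes the fused output only where the ANN top-K differs from the brute-force top-K.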
1. Copy one partition into encoded-stage-394013.analytics.scenario_embeddings_ann_test.
2. Create the index there and confirm VECTOR_SEARCH returns sensible top-K with acceptable recall vs brute-force.
3. Create the index on prod and watch INFORMATION_SCHEMA.VECTOR_INDEXES until ACTIVE. No code change yet; the index is dormant.
4. Add the SCENARIO_SEARCH_ANN flag in mif_admin.py. New code path uses VECTOR_SEARCH for the page (and the dense side of hybrid). Stats query stays brute-force. Default off.

Sources: BigQuery, Manage vector indexes · BigQuery, Search embeddings with vector search · INFORMATION_SCHEMA.VECTOR_INDEXES · Google Cloud blog, TreeAH / ScaNN in BigQuery · Intro to vector search