komp_ac/tantivy_todo.md
2026-05-17 13:10:44 +02:00


1. Add explicit reindex/backfill tooling. This is the biggest missing piece.
Right now, only future `PostTableData` / `PutTableData` calls index rows, so rows that already exist are never indexed. There should be an admin/dev command like:
   - `ReindexProfile(profile_name)`
   - `ReindexTable(profile_name, table_name)`
   - `ReindexRow(profile_name, table_name, id)`
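
A minimal sketch of how the three commands could share one code path. It assumes a hypothetical in-memory `Catalog` (profile to table to row ids) in place of the real Postgres queries. Each scope expands into row-level jobs, which would then go through the same per-row indexing that `PostTableData` / `PutTableData` already use. `ReindexScope`, `RowJob` and `expand` are illustrative names, not existing code.

```rust
use std::collections::BTreeMap;

/// Hypothetical reindex scopes mirroring the proposed admin commands.
#[derive(Debug, Clone, PartialEq)]
pub enum ReindexScope {
    Profile { profile: String },
    Table { profile: String, table: String },
    Row { profile: String, table: String, id: i64 },
}

/// Hypothetical stand-in for the DB: profile -> table -> row ids.
pub type Catalog = BTreeMap<String, BTreeMap<String, Vec<i64>>>;

/// One unit of work for the existing per-row indexing path.
#[derive(Debug, Clone, PartialEq)]
pub struct RowJob {
    pub profile: String,
    pub table: String,
    pub id: i64,
}

fn job(profile: &str, table: &str, id: i64) -> RowJob {
    RowJob { profile: profile.to_string(), table: table.to_string(), id }
}

/// Expands a scope into row-level jobs; unknown profiles/tables yield no jobs.
pub fn expand(scope: &ReindexScope, catalog: &Catalog) -> Vec<RowJob> {
    let mut jobs = Vec::new();
    match scope {
        ReindexScope::Profile { profile } => {
            if let Some(tables) = catalog.get(profile) {
                for (table, ids) in tables {
                    jobs.extend(ids.iter().map(|&id| job(profile, table, id)));
                }
            }
        }
        ReindexScope::Table { profile, table } => {
            if let Some(ids) = catalog.get(profile).and_then(|t| t.get(table)) {
                jobs.extend(ids.iter().map(|&id| job(profile, table, id)));
            }
        }
        // A single row is passed through; the row indexer decides if it is live.
        ReindexScope::Row { profile, table, id } => jobs.push(job(profile, table, *id)),
    }
    jobs
}
```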
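2. Stop using the relative `./tantivy_indexes` path.
Both the writer and the reader resolve it against the process working directory, so starting the server from another directory silently points it at a different or missing index. Make it config/env-driven, e.g.
`TANTIVY_INDEX_DIR`.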
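
A small sketch of the config side. It assumes the env var is read once at startup and the resulting path is handed to both writer and reader. Rejecting relative paths is a design choice here: it makes the working-directory problem impossible rather than just unlikely. `resolve_index_dir` and `index_dir_from_env` are hypothetical helpers.

```rust
use std::path::PathBuf;

/// Env var name from the note above.
pub const INDEX_DIR_ENV: &str = "TANTIVY_INDEX_DIR";

/// Validates a configured index directory. Relative paths are rejected so
/// the result never depends on the process working directory.
pub fn resolve_index_dir(configured: Option<&str>) -> Result<PathBuf, String> {
    let raw = configured
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .ok_or_else(|| format!("{INDEX_DIR_ENV} is not set"))?;
    let path = PathBuf::from(raw);
    if path.is_relative() {
        return Err(format!("{INDEX_DIR_ENV} must be an absolute path, got {raw:?}"));
    }
    Ok(path)
}

/// Reads the env var once; writer and reader should both receive this path
/// instead of building their own.
pub fn index_dir_from_env() -> Result<PathBuf, String> {
    let value = std::env::var(INDEX_DIR_ENV).ok();
    resolve_index_dir(value.as_deref())
}
```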
3. Add index schema/version metadata.
Store a format version with each index and check it on open. If the tokenizers or schema change later, old indexes should fail with a clear “index version mismatch, reindex
required” error instead of silently returning missing or wrong matches.
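
One way to do this, sketched under assumptions: a hypothetical marker file (`komp_ac_index_version`) stored next to Tantivy's own files, and a hypothetical `INDEX_FORMAT_VERSION` constant bumped on every tokenizer/schema change. Indexes without the marker are treated as version 0, so existing ones get flagged for reindex too.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Bump whenever tokenizers or the schema change (hypothetical constant).
pub const INDEX_FORMAT_VERSION: u32 = 1;
/// Hypothetical marker file written next to Tantivy's own files.
const VERSION_FILE: &str = "komp_ac_index_version";

/// Call right after creating a fresh index.
pub fn write_version(index_dir: &Path) -> std::io::Result<()> {
    fs::create_dir_all(index_dir)?;
    fs::write(index_dir.join(VERSION_FILE), INDEX_FORMAT_VERSION.to_string())
}

/// Call before opening an index; fails loudly on any version mismatch.
pub fn check_version(index_dir: &Path) -> Result<(), String> {
    let found = match fs::read_to_string(index_dir.join(VERSION_FILE)) {
        Ok(s) => s
            .trim()
            .parse::<u32>()
            .map_err(|e| format!("corrupt version marker: {e}"))?,
        // Indexes created before the marker existed count as version 0.
        Err(e) if e.kind() == ErrorKind::NotFound => 0,
        Err(e) => return Err(format!("cannot read version marker: {e}")),
    };
    if found != INDEX_FORMAT_VERSION {
        return Err(format!(
            "index version mismatch (found {found}, expected {INDEX_FORMAT_VERSION}), reindex required"
        ));
    }
    Ok(())
}
```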
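4. Batch index commits.
The current code opens a writer and commits for every row. That is fine for dev but does not scale to many inserts: Tantivy allows only one `IndexWriter` per index at a time, so per-row writers contend for the index lock, and every commit has to flush and sync segment data. A long-lived writer
task that commits every N docs or after a short interval would be both more reliable and faster.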
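
A sketch of the batching loop, assuming one long-lived task owns the only writer. `std::sync::mpsc` is used so the example is self-contained (the real queue may use a different channel type), and the `commit` callback is a stand-in for "add these docs to the `IndexWriter`, then commit". A batch is flushed when it reaches `max_batch` docs or when its oldest doc has waited `max_wait`, whichever comes first; `run_batcher` is a hypothetical name.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Drains `rx` and hands docs to `commit` in batches.
pub fn run_batcher<T, F>(rx: Receiver<T>, max_batch: usize, max_wait: Duration, mut commit: F)
where
    F: FnMut(&[T]),
{
    let mut pending: Vec<T> = Vec::new();
    // Invariant: `deadline` is Some exactly when `pending` is non-empty.
    let mut deadline: Option<Instant> = None;
    loop {
        let next = match deadline {
            // Nothing pending: block until a doc arrives or all senders are gone.
            None => rx.recv().map_err(|_| RecvTimeoutError::Disconnected),
            // Something pending: wait at most until the oldest doc's deadline.
            Some(d) => rx.recv_timeout(d.saturating_duration_since(Instant::now())),
        };
        match next {
            Ok(doc) => {
                if pending.is_empty() {
                    deadline = Some(Instant::now() + max_wait);
                }
                pending.push(doc);
                if pending.len() >= max_batch {
                    commit(&pending[..]);
                    pending.clear();
                    deadline = None;
                }
            }
            // Oldest doc waited long enough: flush a partial batch.
            Err(RecvTimeoutError::Timeout) => {
                commit(&pending[..]);
                pending.clear();
                deadline = None;
            }
            // All senders dropped: final flush, then stop.
            Err(RecvTimeoutError::Disconnected) => {
                if !pending.is_empty() {
                    commit(&pending[..]);
                }
                return;
            }
        }
    }
}
```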
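5. Make the indexing queue durable.
The current `mpsc` queue is in-memory. If the server crashes after the DB insert but before the row is indexed, that job is lost and search stays stale until the row is reindexed.
For serious use, store pending index jobs in Postgres (ideally written in the same transaction as the row), process them, and mark them done.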
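
A sketch of the job-processing rule, with an in-memory slice standing in for a hypothetical `index_jobs` table. In Postgres, pending jobs could be claimed with `SELECT ... FOR UPDATE SKIP LOCKED` so several workers don't grab the same rows. The key property is that a job is marked done only after indexing succeeds. That makes delivery at-least-once: a crash between indexing and marking done replays the job, so the indexing step has to be idempotent (e.g. delete the old doc by row id before adding the new one).

```rust
/// Hypothetical row of an `index_jobs` table.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexJob {
    pub id: u64,
    pub table: String,
    pub row_id: i64,
    pub done: bool,
    pub attempts: u32,
}

/// Processes up to `limit` pending jobs. A job is marked done only after
/// `index_row` succeeds; failures stay pending for the next pass.
/// Returns how many jobs completed in this pass.
pub fn process_pending<F>(jobs: &mut [IndexJob], limit: usize, mut index_row: F) -> usize
where
    F: FnMut(&str, i64) -> Result<(), String>,
{
    let mut completed = 0;
    for job in jobs.iter_mut().filter(|j| !j.done).take(limit) {
        job.attempts += 1;
        if index_row(job.table.as_str(), job.row_id).is_ok() {
            job.done = true;
            completed += 1;
        }
    }
    completed
}
```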
6. Index only live rows, intentionally.
`handle_add_or_update` currently fetches the row by id without checking `deleted = false`; deleted rows are only filtered out later, at search time.
I'd either skip indexing deleted rows or make the delete/update semantics explicit.
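
A sketch of explicit semantics, assuming a hypothetical `RowSnapshot` for what `handle_add_or_update` reads: a missing row and a soft-deleted row both turn into a removal, so the index only ever holds live rows. Fetching with `WHERE id = $1 AND deleted = false` would collapse the two cases into `None`.

```rust
/// Hypothetical stand-in for the fetched row; only the fields that matter here.
#[derive(Debug, Clone, PartialEq)]
pub struct RowSnapshot {
    pub id: i64,
    pub deleted: bool,
    pub text: String,
}

#[derive(Debug, PartialEq)]
pub enum IndexAction {
    /// Replace any existing doc for this id with fresh content.
    Upsert { id: i64, text: String },
    /// Remove the doc for this id so it can never be a search hit.
    Remove { id: i64 },
}

/// Hard-deleted (missing) and soft-deleted rows are both removals.
pub fn plan_index_action(id: i64, row: Option<&RowSnapshot>) -> IndexAction {
    match row {
        Some(r) if !r.deleted => IndexAction::Upsert { id, text: r.text.clone() },
        _ => IndexAction::Remove { id },
    }
}
```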
7. Add typed fields for numbers/dates if you need range queries.
Right now numbers are converted to strings. That is fine for text search but wrong for numeric filtering and sorting, because strings compare character by character ("10" sorts before "9"). Tantivy
can do numeric/date fields, but JSON text fields are not enough for robust range search.
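
A sketch of typing values before indexing, assuming a hypothetical `TypedValue` with only integer, float and text cases (dates left out). In Tantivy the numeric cases would go into numeric fields (the schema builder has e.g. `add_i64_field` and `add_f64_field`); how that fits the current JSON-field layout is an open question. Note that `"nan"` and `"inf"` parse as `f64` in Rust, so non-finite values are kept as text here.

```rust
/// Hypothetical typed representation of a column value.
#[derive(Debug, Clone, PartialEq)]
pub enum TypedValue {
    I64(i64),
    F64(f64),
    Text(String),
}

/// Tries integer, then finite float, then falls back to text.
pub fn infer(raw: &str) -> TypedValue {
    if let Ok(i) = raw.parse::<i64>() {
        TypedValue::I64(i)
    } else if let Some(f) = raw.parse::<f64>().ok().filter(|f| f.is_finite()) {
        TypedValue::F64(f)
    } else {
        TypedValue::Text(raw.to_string())
    }
}

/// Numeric range check; text values never match a numeric range.
pub fn in_range(v: &TypedValue, lo: f64, hi: f64) -> bool {
    match v {
        TypedValue::I64(i) => (lo..=hi).contains(&(*i as f64)),
        TypedValue::F64(f) => (lo..=hi).contains(f),
        TypedValue::Text(_) => false,
    }
}
```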
8. Decide on a column-name strategy.
Indexing lowercases the raw DB JSON keys. If the UI uses display names or aliases, column-constrained queries can silently match nothing unless the
frontend sends exactly the key the index expects. I'd centralize the display-name to physical-name mapping and resolve names before
search.
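
A sketch of a central resolver, assuming a hypothetical `ColumnMap` built from (display name, physical key) pairs. Lookups are case-insensitive, physical keys resolve to themselves, and unknown names return an error instead of producing a query that silently matches nothing.

```rust
use std::collections::HashMap;

/// Maps UI names and aliases to the lowercased key the indexer writes.
pub struct ColumnMap {
    by_name: HashMap<String, String>,
}

impl ColumnMap {
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let mut by_name = HashMap::new();
        for (display, physical) in pairs {
            let physical = physical.to_lowercase();
            // Physical keys resolve to themselves as well.
            by_name.insert(physical.clone(), physical.clone());
            by_name.insert(display.to_lowercase(), physical);
        }
        ColumnMap { by_name }
    }

    /// Resolves a display name, alias or physical key (case-insensitive).
    pub fn resolve(&self, name: &str) -> Result<&str, String> {
        self.by_name
            .get(&name.to_lowercase())
            .map(String::as_str)
            .ok_or_else(|| format!("unknown column {name:?}"))
    }
}
```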
9. Add delete hooks for table/profile deletion.
When a table or profile is deleted, code should remove the matching Tantivy docs or index directory, instead of someone cleaning them up
manually.
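
A sketch of the directory cleanup, assuming a hypothetical one-directory-per-table layout (`<root>/<profile>/<table>`); if tables share one index, the hook would delete docs by term instead. Names are validated before `remove_dir_all` so a profile called `..` or an empty name cannot delete anything outside its own directory, and "already gone" counts as success so the hook can be retried. Any open writer or reader for that index should be dropped first.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::{Path, PathBuf};

/// Rejects names that would escape or collapse the directory layout.
fn safe_component(name: &str) -> io::Result<&str> {
    let bad = name.is_empty() || name == "." || name == ".." || name.contains('/') || name.contains('\\');
    if bad {
        return Err(io::Error::new(
            ErrorKind::InvalidInput,
            format!("invalid index path component {name:?}"),
        ));
    }
    Ok(name)
}

/// Hypothetical layout: <root>/<profile>/<table>.
pub fn table_index_dir(root: &Path, profile: &str, table: &str) -> io::Result<PathBuf> {
    Ok(root.join(safe_component(profile)?).join(safe_component(table)?))
}

/// Removes a directory tree; a missing directory is not an error.
fn remove_if_exists(dir: &Path) -> io::Result<()> {
    match fs::remove_dir_all(dir) {
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(()),
        other => other,
    }
}

/// Call after the table is deleted in Postgres.
pub fn on_table_deleted(root: &Path, profile: &str, table: &str) -> io::Result<()> {
    remove_if_exists(&table_index_dir(root, profile, table)?)
}

/// Call after the profile is deleted in Postgres.
pub fn on_profile_deleted(root: &Path, profile: &str) -> io::Result<()> {
    remove_if_exists(&root.join(safe_component(profile)?))
}
```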