-
Add explicit reindex/backfill tooling. Right now, rows are indexed only when a PostTableData / PutTableData call writes them, so rows that already exist are never indexed. There should be an admin/dev command like:
ReindexProfile(profile_name)
ReindexTable(profile_name, table_name)
ReindexRow(profile_name, table_name, id)
This is the biggest missing piece.
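Here is a minimal sketch of how the three commands could share one code path. It assumes each command expands into a list of (profile, table, id) keys, which are then fed through the same indexing path that PostTableData / PutTableData already use. ReindexCommand, expand and the in-memory LiveRows map are hypothetical. LiveRows stands in for a `SELECT id ... WHERE deleted = false` query.

```rust
use std::collections::BTreeMap;

/// Hypothetical admin command covering the three granularities above.
#[derive(Debug, Clone, PartialEq)]
pub enum ReindexCommand {
    Profile { profile: String },
    Table { profile: String, table: String },
    Row { profile: String, table: String, id: i64 },
}

/// One unit of work: re-read this row from the DB and re-index it.
pub type RowKey = (String, String, i64);

/// Stand-in for the DB: (profile, table) -> ids of live rows.
pub type LiveRows = BTreeMap<(String, String), Vec<i64>>;

/// Expand a command into the row keys it covers.
pub fn expand(cmd: &ReindexCommand, live_rows: &LiveRows) -> Vec<RowKey> {
    match cmd {
        ReindexCommand::Profile { profile } => {
            rows_where(live_rows, |p, _| p == profile.as_str())
        }
        ReindexCommand::Table { profile, table } => {
            rows_where(live_rows, |p, t| p == profile.as_str() && t == table.as_str())
        }
        // A single row is passed through as-is, even if it no longer exists,
        // so the indexer can also remove a stale doc for it.
        ReindexCommand::Row { profile, table, id } => {
            vec![(profile.clone(), table.clone(), *id)]
        }
    }
}

/// Collect every live row whose (profile, table) satisfies `keep`.
fn rows_where(live_rows: &LiveRows, keep: impl Fn(&str, &str) -> bool) -> Vec<RowKey> {
    let mut out = Vec::new();
    for ((profile, table), ids) in live_rows {
        if keep(profile.as_str(), table.as_str()) {
            for id in ids {
                out.push((profile.clone(), table.clone(), *id));
            }
        }
    }
    out
}
```

Keeping the expansion separate from the indexing means the backfill cannot drift from the normal write path. The only new code is "which rows", not "how to index a row".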
-
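Stop using the relative path ./tantivy_indexes. Both the writer and the reader resolve it against the process working directory, so starting the server from a different directory points them at a different (possibly empty) index. Make it config/env-driven, e.g. TANTIVY_INDEX_DIR.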
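A small sketch follows. It assumes the index root is read once at startup and the same value is handed to both the writer and the reader. The TANTIVY_INDEX_DIR name follows the suggestion above. resolve_index_dir and index_dir_from_env are hypothetical helpers.

```rust
use std::path::PathBuf;

/// Environment variable named in the suggestion above.
pub const INDEX_DIR_VAR: &str = "TANTIVY_INDEX_DIR";

/// Validate a configured index root. Relative paths are rejected so the
/// result never depends on the process working directory.
pub fn resolve_index_dir(configured: Option<&str>) -> Result<PathBuf, String> {
    let raw = configured.ok_or_else(|| format!("{} is not set", INDEX_DIR_VAR))?;
    let path = PathBuf::from(raw);
    if path.is_absolute() {
        Ok(path)
    } else {
        Err(format!("{} must be an absolute path, got {:?}", INDEX_DIR_VAR, raw))
    }
}

/// Read the setting from the environment; call once at startup and share the result.
pub fn index_dir_from_env() -> Result<PathBuf, String> {
    let value = std::env::var(INDEX_DIR_VAR).ok();
    resolve_index_dir(value.as_deref())
}
```

Rejecting relative paths outright is stricter than canonicalizing them once at startup. Either approach works, as long as writer and reader both use the single resolved value.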
-
Add index schema/version metadata. If you change tokenizers or the schema later, old indexes should fail with a clear “index version mismatch, reindex required” error instead of silently returning wrong or incomplete results.
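One way to do this assumes a small marker file, hypothetically named schema_version, sits next to the Tantivy files. It is written whenever an index is built and checked before the writer or reader opens the index. The file name, the INDEX_SCHEMA_VERSION constant and the helpers are illustrative and are not part of Tantivy.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Hypothetical marker file written next to the Tantivy files.
const VERSION_FILE: &str = "schema_version";

/// Bump whenever tokenizers or the schema change (hypothetical constant).
pub const INDEX_SCHEMA_VERSION: u32 = 3;

#[derive(Debug, PartialEq)]
pub enum VersionCheck {
    Current,
    Missing,
    Mismatch { found: u32, expected: u32 },
}

/// Record the schema version an index was built with.
pub fn write_version(index_dir: &Path, version: u32) -> io::Result<()> {
    fs::write(index_dir.join(VERSION_FILE), version.to_string())
}

/// Compare the stored version with `expected` before opening the index.
pub fn check_version(index_dir: &Path, expected: u32) -> io::Result<VersionCheck> {
    let text = match fs::read_to_string(index_dir.join(VERSION_FILE)) {
        Ok(text) => text,
        Err(e) if e.kind() == ErrorKind::NotFound => return Ok(VersionCheck::Missing),
        Err(e) => return Err(e),
    };
    let found: u32 = text
        .trim()
        .parse()
        .map_err(|_| io::Error::new(ErrorKind::InvalidData, "unreadable schema_version file"))?;
    if found == expected {
        Ok(VersionCheck::Current)
    } else {
        Ok(VersionCheck::Mismatch { found, expected })
    }
}
```

On Mismatch, the server can refuse to open the index and report “index version mismatch, reindex required”. The same applies to Missing, for indexes built before the marker existed.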
-
Batch index commits. The current code opens a writer and commits once per row. That is fine for dev but slow for many inserts, because every open and commit has a fixed cost. A long-lived writer task that commits every N docs or after a short interval, whichever comes first, would be faster and more reliable.
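Here is a sketch of the batching loop. It uses std::sync::mpsc so it stays self-contained. The real queue may be a tokio channel, in which case tokio::time::timeout would play the role of recv_timeout. run_batcher and the commit callback are hypothetical. In practice, the callback would add the batch to a long-lived IndexWriter and call commit() once per batch.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Drain `rx`, calling `commit` with a batch whenever `max_batch` docs are pending
/// or `max_wait` has passed since the first pending doc arrived. Flushes the
/// remainder and returns once all senders are dropped.
pub fn run_batcher<T>(
    rx: Receiver<T>,
    max_batch: usize,
    max_wait: Duration,
    mut commit: impl FnMut(Vec<T>),
) {
    let mut batch: Vec<T> = Vec::new();
    // Deadline for the current batch; None while the batch is empty.
    let mut deadline: Option<Instant> = None;
    loop {
        let next = match deadline {
            // Nothing pending: block until a doc arrives or the channel closes.
            None => rx.recv().map_err(|_| RecvTimeoutError::Disconnected),
            // Something pending: wait at most until the batch deadline.
            Some(d) => rx.recv_timeout(d.saturating_duration_since(Instant::now())),
        };
        match next {
            Ok(doc) => {
                if batch.is_empty() {
                    deadline = Some(Instant::now() + max_wait);
                }
                batch.push(doc);
                if batch.len() >= max_batch {
                    commit(std::mem::take(&mut batch));
                    deadline = None;
                }
            }
            // Only reachable while a deadline is set, so the batch is non-empty.
            Err(RecvTimeoutError::Timeout) => {
                commit(std::mem::take(&mut batch));
                deadline = None;
            }
            Err(RecvTimeoutError::Disconnected) => {
                if !batch.is_empty() {
                    commit(std::mem::take(&mut batch));
                }
                return;
            }
        }
    }
}
```

Committing on whichever limit comes first bounds two things at once: commit overhead (max_batch) and how stale search can get (max_wait).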
-
Make the indexing queue durable. The current mpsc queue is in-memory. If the server crashes after the DB insert but before indexing, that row never reaches the index and search stays stale. For serious use, store pending index jobs in Postgres (ideally in the same transaction as the row write), process them, and mark them done.
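Here is a sketch of an outbox-style worker. It assumes jobs live in a Postgres table, whose shape is given only in a comment, and that each job is inserted in the same transaction as its row. JobStore, MemStore and drain_once are hypothetical. MemStore is an in-memory stand-in so the example runs without a database.

```rust
/// Hypothetical job row. In Postgres this might be a table like
/// index_jobs(id bigserial, table_name text, row_id bigint, done bool, attempts int).
#[derive(Debug, Clone, PartialEq)]
pub struct IndexJob {
    pub id: u64,
    pub table: String,
    pub row_id: i64,
    pub done: bool,
    pub attempts: u32,
}

/// What the worker needs from the store. A Postgres version could claim rows with
/// `SELECT ... WHERE NOT done ORDER BY id LIMIT $1 FOR UPDATE SKIP LOCKED`
/// so several workers do not grab the same job.
pub trait JobStore {
    fn claim_pending(&mut self, limit: usize) -> Vec<IndexJob>;
    fn mark_done(&mut self, job_id: u64);
    fn record_failure(&mut self, job_id: u64);
}

/// In-memory stand-in used only to make the sketch runnable.
pub struct MemStore {
    pub jobs: Vec<IndexJob>,
}

impl JobStore for MemStore {
    fn claim_pending(&mut self, limit: usize) -> Vec<IndexJob> {
        self.jobs.iter().filter(|j| !j.done).take(limit).cloned().collect()
    }
    fn mark_done(&mut self, job_id: u64) {
        if let Some(j) = self.jobs.iter_mut().find(|j| j.id == job_id) {
            j.done = true;
        }
    }
    fn record_failure(&mut self, job_id: u64) {
        if let Some(j) = self.jobs.iter_mut().find(|j| j.id == job_id) {
            j.attempts += 1;
        }
    }
}

/// Process one batch and return how many jobs completed. `index_row` is assumed to
/// return Ok only once the change is committed to the index, and a job is marked
/// done only after that, so a crash leaves it pending for the next run.
pub fn drain_once<S: JobStore>(
    store: &mut S,
    limit: usize,
    mut index_row: impl FnMut(&str, i64) -> Result<(), String>,
) -> usize {
    let mut done = 0;
    for job in store.claim_pending(limit) {
        match index_row(&job.table, job.row_id) {
            Ok(()) => {
                store.mark_done(job.id);
                done += 1;
            }
            Err(_) => store.record_failure(job.id),
        }
    }
    done
}
```

Indexing has to be idempotent (an upsert by id) for this to be safe. A job can be processed twice if the worker crashes between the index commit and mark_done.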
-
Index only live rows, intentionally. handle_add_or_update currently fetches the row by id without checking deleted = false, and search filters out deleted rows only afterwards. I’d either skip indexing deleted rows or make the delete/update semantics explicit.
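Here is a sketch of one explicit rule. It assumes handle_add_or_update fetches the row as an Option and then acts on the index. FetchedRow, IndexAction and index_action are hypothetical names, and the real handler's signature may differ.

```rust
/// Hypothetical shape of the row as fetched by id (only the fields used here).
#[derive(Debug, Clone)]
pub struct FetchedRow {
    pub id: i64,
    pub deleted: bool,
    pub data: String,
}

#[derive(Debug, PartialEq)]
pub enum IndexAction {
    /// Replace any existing doc for this id with the current data.
    Upsert { id: i64, data: String },
    /// Ensure no doc exists for this id.
    Remove { id: i64 },
}

/// Missing or soft-deleted rows are removed from the index instead of indexed.
pub fn index_action(id: i64, row: Option<FetchedRow>) -> IndexAction {
    match row {
        Some(r) if !r.deleted => IndexAction::Upsert { id: r.id, data: r.data },
        _ => IndexAction::Remove { id },
    }
}
```

With this rule, a soft-deleted row never lingers in the index, so search no longer depends on the later deleted-row filter for correctness.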
-
Add typed fields for numbers/dates if you need range queries. Right now numbers are converted to strings. That is fine for text search but wrong for numeric filtering and sorting, since as strings “10” sorts before “9”. Tantivy supports numeric and date fields, but JSON text fields are not enough for robust range search.
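Here is a sketch of why strings are not enough and how cell values could be classified before indexing. It assumes raw values arrive as strings. TypedValue, classify and in_range are hypothetical. In Tantivy, the I64/F64 cases would go into numeric fields rather than text. Dates would need their own parse step, which is left out here.

```rust
/// Hypothetical typed value a JSON cell could map to before indexing.
#[derive(Debug, PartialEq)]
pub enum TypedValue {
    I64(i64),
    F64(f64),
    Text(String),
}

/// Classify a raw cell value: integer first, then finite float, else text.
pub fn classify(raw: &str) -> TypedValue {
    let s = raw.trim();
    if let Ok(i) = s.parse::<i64>() {
        return TypedValue::I64(i);
    }
    match s.parse::<f64>() {
        // f64 parsing also accepts "NaN" and "inf"; keep those as text.
        Ok(f) if f.is_finite() => TypedValue::F64(f),
        _ => TypedValue::Text(raw.to_string()),
    }
}

/// Inclusive numeric range check; text values never match a numeric range.
pub fn in_range(v: &TypedValue, lo: f64, hi: f64) -> bool {
    match v {
        TypedValue::I64(i) => (lo..=hi).contains(&(*i as f64)),
        TypedValue::F64(f) => (lo..=hi).contains(f),
        TypedValue::Text(_) => false,
    }
}
```

The string comparison `"10" < "9"` is true, which is exactly the ordering bug that numeric fields avoid.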
-
Decide on a column-name strategy. Indexing lowercases the raw DB JSON keys. If the UI uses display names or aliases, column-constrained queries can miss matches unless the frontend sends exactly the key the index expects. I’d centralize the display-name to physical-name mapping and apply it before search.
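Here is a sketch of a central resolver. It assumes one mapping per table from display name to physical JSON key, with both sides compared in lowercase to match what indexing does. ColumnMap and resolve are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical per-table mapping: display name -> physical DB JSON key.
pub struct ColumnMap {
    display_to_physical: HashMap<String, String>,
}

impl ColumnMap {
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let display_to_physical = pairs
            .iter()
            .map(|(display, physical)| (display.to_lowercase(), physical.to_lowercase()))
            .collect();
        ColumnMap { display_to_physical }
    }

    /// Turn whatever the UI sent into the key the index uses (lowercased physical
    /// name). Accepts a display name or a physical name, case-insensitively;
    /// unknown columns return None instead of silently matching nothing.
    pub fn resolve(&self, requested: &str) -> Option<String> {
        let key = requested.trim().to_lowercase();
        if let Some(physical) = self.display_to_physical.get(&key) {
            return Some(physical.clone());
        }
        if self.display_to_physical.values().any(|p| *p == key) {
            return Some(key);
        }
        None
    }
}
```

Returning None for unknown columns turns a silent zero-hit query into an explicit error the frontend can show.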
-
Add delete hooks for table/profile deletion. When a table or profile is deleted, the matching Tantivy docs or index directory should be removed by code, not by hand.
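Here is a sketch of the profile-deletion hook. It assumes a hypothetical layout with one index directory per profile under the index root, and that any open writer or reader for that profile is dropped first. Table deletion is not shown. The analogous step would be deleting docs by a table-name term (Tantivy's IndexWriter has delete_term) and then committing. profile_index_dir and remove_profile_index are hypothetical.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::{Path, PathBuf};

/// Hypothetical layout: <root>/<profile> holds that profile's index.
pub fn profile_index_dir(root: &Path, profile: &str) -> io::Result<PathBuf> {
    // Refuse names that could escape the root, e.g. "../x" or "a/b".
    let safe = !profile.is_empty()
        && profile != "."
        && profile != ".."
        && !profile.contains(|c: char| c == '/' || c == '\\');
    if !safe {
        return Err(io::Error::new(
            ErrorKind::InvalidInput,
            format!("bad profile name {:?}", profile),
        ));
    }
    Ok(root.join(profile))
}

/// Call from the profile-deletion path after the DB delete commits.
/// A missing directory counts as success, so the hook is safe to retry.
pub fn remove_profile_index(root: &Path, profile: &str) -> io::Result<()> {
    let dir = profile_index_dir(root, profile)?;
    match fs::remove_dir_all(&dir) {
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(()),
        other => other,
    }
}
```

Making the hook idempotent matters if it runs from a durable job queue, where the same cleanup job may be retried.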