komp_ac/tantivy_todo.md
2026-05-17 13:10:44 +02:00

  1. Add explicit reindex/backfill tooling. Right now, only future PostTableData / PutTableData calls index rows. There should be an admin/dev command like:

    ReindexProfile(profile_name)
    ReindexTable(profile_name, table_name)
    ReindexRow(profile_name, table_name, id)

    This is the biggest missing piece.
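
    As a rough sketch, the three commands could share one scope type that an admin CLI or RPC handler parses its arguments into. `ReindexScope`, `parse_reindex_args`, and the `i64` row id are hypothetical names and choices for illustration, not existing code:

    ```rust
    /// Hypothetical scope of a reindex run, mirroring the three commands above.
    #[derive(Debug, PartialEq)]
    pub enum ReindexScope {
        Profile { profile_name: String },
        Table { profile_name: String, table_name: String },
        Row { profile_name: String, table_name: String, id: i64 },
    }

    /// Parses positional arguments: one selects a profile, two a table,
    /// three a single row. The row id is assumed to be an i64.
    pub fn parse_reindex_args(args: &[&str]) -> Result<ReindexScope, String> {
        match args {
            [profile] => Ok(ReindexScope::Profile {
                profile_name: profile.to_string(),
            }),
            [profile, table] => Ok(ReindexScope::Table {
                profile_name: profile.to_string(),
                table_name: table.to_string(),
            }),
            [profile, table, id] => {
                let id = id
                    .parse::<i64>()
                    .map_err(|e| format!("invalid row id {:?}: {}", id, e))?;
                Ok(ReindexScope::Row {
                    profile_name: profile.to_string(),
                    table_name: table.to_string(),
                    id,
                })
            }
            _ => Err("expected: <profile_name> [table_name [id]]".to_string()),
        }
    }
    ```

    A backfill worker would then match on the scope and feed the selected rows through the same indexing path that PostTableData / PutTableData use.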

  2. Stop using relative ./tantivy_indexes. Both writer and reader depend on the process working directory. Make it config/env-driven, e.g. TANTIVY_INDEX_DIR.
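
    A minimal sketch of resolving the index root once at startup, using only the standard library. Rejecting relative paths is an assumption made here so the result never depends on the working directory; `resolve_index_dir` and `index_dir_from_env` are hypothetical helper names:

    ```rust
    use std::env;
    use std::path::PathBuf;

    /// Validates a configured index root. Empty or relative values are
    /// rejected so that writer and reader always agree on the location.
    pub fn resolve_index_dir(configured: Option<&str>) -> Result<PathBuf, String> {
        let raw = configured
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .ok_or_else(|| "TANTIVY_INDEX_DIR is not set".to_string())?;
        let path = PathBuf::from(raw);
        if path.is_absolute() {
            Ok(path)
        } else {
            Err(format!(
                "TANTIVY_INDEX_DIR must be an absolute path, got {:?}",
                raw
            ))
        }
    }

    /// Reads TANTIVY_INDEX_DIR from the process environment.
    pub fn index_dir_from_env() -> Result<PathBuf, String> {
        resolve_index_dir(env::var("TANTIVY_INDEX_DIR").ok().as_deref())
    }
    ```

    Both the writer and the reader would call `index_dir_from_env()` (or receive the resolved path from shared config) instead of hard-coding `./tantivy_indexes`.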

  3. Add index schema/version metadata. If you change tokenizers/schema later, old indexes should fail with a clear “index version mismatch, reindex required” instead of behaving strangely.
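
    One simple way to get that failure mode is a marker file next to the index, checked before the index is opened. This is a sketch under assumptions: the file name `komp_index_version` and the constant `INDEX_FORMAT_VERSION` are made up for illustration:

    ```rust
    use std::fs;
    use std::io::ErrorKind;
    use std::path::Path;

    /// Bump whenever the schema or tokenizers change (hypothetical constant).
    pub const INDEX_FORMAT_VERSION: u32 = 1;
    /// Hypothetical marker file stored inside the index directory.
    const VERSION_FILE: &str = "komp_index_version";

    /// Records the current format version after an index is created or rebuilt.
    pub fn write_index_version(index_dir: &Path) -> std::io::Result<()> {
        fs::write(index_dir.join(VERSION_FILE), INDEX_FORMAT_VERSION.to_string())
    }

    /// Returns an error with a clear message unless the stored version matches.
    pub fn check_index_version(index_dir: &Path) -> Result<(), String> {
        let stored = match fs::read_to_string(index_dir.join(VERSION_FILE)) {
            Ok(text) => text,
            Err(e) if e.kind() == ErrorKind::NotFound => {
                return Err("index version missing, reindex required".to_string())
            }
            Err(e) => return Err(format!("cannot read index version: {}", e)),
        };
        match stored.trim().parse::<u32>() {
            Ok(v) if v == INDEX_FORMAT_VERSION => Ok(()),
            Ok(v) => Err(format!(
                "index version mismatch (found {}, expected {}), reindex required",
                v, INDEX_FORMAT_VERSION
            )),
            Err(_) => Err("index version unreadable, reindex required".to_string()),
        }
    }
    ```

    Calling `check_index_version` before opening the index for reading or writing turns a silent schema drift into an explicit startup error.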

  4. Batch index commits. Current code opens a writer and commits per row. Fine for dev, not great for many inserts. A long-lived writer task batching commits every N docs or every short interval would be more reliable and faster.
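
    The batching policy itself can be sketched independently of Tantivy. The example below uses `std::sync::mpsc` and a blocking loop as a stand-in; the real queue may well be an async channel, and `run_batched_commits` plus the `commit` callback are hypothetical. In real code the callback would add the documents to one long-lived writer and commit once:

    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::{Duration, Instant};

    /// Collects items from `rx` and calls `commit` once per batch: either when
    /// `max_docs` items are pending or when `max_wait` has passed since the
    /// first pending item arrived. Flushes the remainder when all senders drop.
    pub fn run_batched_commits<T>(
        rx: Receiver<T>,
        max_docs: usize,
        max_wait: Duration,
        mut commit: impl FnMut(Vec<T>),
    ) {
        let mut pending: Vec<T> = Vec::new();
        let mut deadline = Instant::now() + max_wait;
        loop {
            let timeout = deadline.saturating_duration_since(Instant::now());
            match rx.recv_timeout(timeout) {
                Ok(doc) => {
                    if pending.is_empty() {
                        // The interval starts with the first doc of a batch.
                        deadline = Instant::now() + max_wait;
                    }
                    pending.push(doc);
                    if pending.len() >= max_docs {
                        commit(std::mem::take(&mut pending));
                    }
                }
                Err(RecvTimeoutError::Timeout) => {
                    if !pending.is_empty() {
                        commit(std::mem::take(&mut pending));
                    }
                    deadline = Instant::now() + max_wait;
                }
                Err(RecvTimeoutError::Disconnected) => {
                    if !pending.is_empty() {
                        commit(std::mem::take(&mut pending));
                    }
                    return;
                }
            }
        }
    }
    ```

    With `max_docs = 2`, five queued rows produce three commits (2, 2, 1) instead of five.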

  5. Make the indexing queue durable. The current mpsc queue is in-memory. If the server crashes after DB insert but before indexing, search is stale. For serious use, store pending index jobs in Postgres, process them, mark done.
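
    A possible shape for such a job table, kept as SQL constants that a Postgres client would execute. The table name `search_index_jobs`, its columns, and the `BIGINT` row id are all assumptions for illustration. `FOR UPDATE SKIP LOCKED` lets several workers claim jobs without blocking each other:

    ```rust
    /// Hypothetical job table. The job row should be inserted in the same
    /// transaction as the data row, so a crash cannot lose the index request.
    pub const CREATE_JOBS_TABLE_SQL: &str = "
    CREATE TABLE IF NOT EXISTS search_index_jobs (
        id           BIGSERIAL PRIMARY KEY,
        profile_name TEXT NOT NULL,
        table_name   TEXT NOT NULL,
        row_id       BIGINT NOT NULL,
        created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
        done_at      TIMESTAMPTZ
    )";

    /// Claims up to $1 pending jobs inside a transaction; rows locked by
    /// another worker are skipped instead of waited on.
    pub const CLAIM_JOBS_SQL: &str = "
    SELECT id, profile_name, table_name, row_id
    FROM search_index_jobs
    WHERE done_at IS NULL
    ORDER BY id
    LIMIT $1
    FOR UPDATE SKIP LOCKED";

    /// Marks the claimed jobs ($1 is an array of ids) as done after the
    /// index commit succeeded.
    pub const MARK_DONE_SQL: &str = "
    UPDATE search_index_jobs SET done_at = now() WHERE id = ANY($1)";
    ```

    On startup, the worker simply resumes with whatever rows still have `done_at IS NULL`, so a crash between the DB insert and the index commit only delays indexing.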

  6. Index only live rows intentionally. handle_add_or_update currently fetches the row by id without checking deleted = false, and search filters out deleted rows later. I'd either skip indexing deleted rows or make the delete/update semantics explicit.
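
    One way to make the semantics explicit is to decide the index action from the row state before touching the index. `RowState`, `IndexAction`, and `action_for` are hypothetical names; the sketch assumes a missing row should be treated like a deleted one:

    ```rust
    /// Minimal stand-in for the fetched row; only the `deleted` flag matters.
    pub struct RowState {
        pub deleted: bool,
    }

    /// What the indexer should do for a given row id.
    #[derive(Debug, PartialEq)]
    pub enum IndexAction {
        /// Replace (or add) the document for this row.
        Upsert,
        /// Remove any existing document for this row.
        Remove,
    }

    /// A live row is (re)indexed; a missing or soft-deleted row is removed
    /// from the index instead of being indexed and filtered out at query time.
    pub fn action_for(row: Option<&RowState>) -> IndexAction {
        match row {
            Some(r) if !r.deleted => IndexAction::Upsert,
            _ => IndexAction::Remove,
        }
    }
    ```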

  7. Add typed fields for numbers/dates if you need range queries. Right now numbers are converted to strings. Good for text search, bad for real numeric filtering/sorting. Tantivy can do numeric/date fields, but JSON text fields are not enough for robust range search.

  8. Decide column-name strategy. Indexing lowercases raw DB JSON keys. If the UI uses display names/aliases, column constraints can miss unless the frontend sends exactly what the index expects. I'd centralize the display-name to physical-name mapping before search.
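
    A small sketch of such a central mapping, applied to every column constraint before the query is built. `ColumnResolver` is a hypothetical type, and trimming plus lowercasing is an assumed normalization chosen to match the lowercased index keys:

    ```rust
    use std::collections::HashMap;

    /// Maps UI display names to the lowercased physical keys used in the index.
    pub struct ColumnResolver {
        by_alias: HashMap<String, String>,
    }

    impl ColumnResolver {
        /// Builds the lookup from (display name, physical column name) pairs.
        pub fn new(pairs: &[(&str, &str)]) -> Self {
            let by_alias = pairs
                .iter()
                .map(|(display, physical)| (normalize(display), normalize(physical)))
                .collect();
            Self { by_alias }
        }

        /// Resolves a name sent by the frontend. Unknown names fall back to
        /// their normalized form, so raw physical names keep working.
        pub fn resolve(&self, name: &str) -> String {
            let key = normalize(name);
            self.by_alias.get(&key).cloned().unwrap_or(key)
        }
    }

    /// Same normalization on both sides of the lookup.
    fn normalize(name: &str) -> String {
        name.trim().to_lowercase()
    }
    ```

    For example, a resolver built from `[("Phone Number", "telefon")]` maps both `" phone number "` and `"Telefon"` to `telefon`.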

  9. Add delete hooks for table/profile deletion. When a table or profile is deleted, the matching Tantivy docs/index directory should be cleaned by code, not manually.