Introduce an expo-sqlite data layer so analysis results survive app restarts, manual/photo cleanup workflows are resumable, and ML indexing work is reused safely when assets or model version change.
- Main analysis flow: app/index.tsx
- Selection/cleanup flow: app/pickone.tsx
- Global in-memory state: contexts/PhotoGroupsContext.tsx
- Embedding pipeline: modules/ImageEmbedding.ts
- Similarity/grouping logic: modules/PhotoSimilarity.ts
Store:
- schema_version
- ml_model_version (manual bump when embedding model changes)
- last_scan_started_at, last_scan_finished_at
- last_cursor_asset_id (resume large scans)
- User preferences: similarity threshold/window size/batch size
Store per photo asset:
- asset_id (PK, from MediaLibrary identifier)
- uri
- width, height, creation_time, modification_time
- source (library|manual)
- is_hidden (soft exclusion from UI)
- is_in_pickone_album (cached boolean for cleanup filtering)
- updated_at
Purpose: canonical local index and dedupe point for all workflows.
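As a rough illustration of how the asset index could be kept current, the sketch below maps a media asset to a row and pairs it with an upsert statement keyed on asset_id. `MediaAssetLike`, `toAssetRow`, `assetParams` and `UPSERT_ASSET_SQL` are hypothetical names, and the choice to leave source, is_hidden and is_in_pickone_album untouched on conflict is an assumption rather than something this plan specifies.

```typescript
// Hypothetical input shape; assumed to mirror the fields read from MediaLibrary.
interface MediaAssetLike {
  id: string;
  uri: string;
  width: number;
  height: number;
  creationTime: number;
  modificationTime: number;
}

type AssetSource = "library" | "manual";

// One row of the assets table, using the column names listed above.
interface AssetRow {
  asset_id: string;
  uri: string;
  width: number;
  height: number;
  creation_time: number;
  modification_time: number;
  source: AssetSource;
  is_hidden: 0 | 1; // SQLite has no boolean type, so flags are stored as 0/1
  is_in_pickone_album: 0 | 1;
  updated_at: number;
}

function toAssetRow(asset: MediaAssetLike, source: AssetSource, now: number): AssetRow {
  return {
    asset_id: asset.id,
    uri: asset.uri,
    width: asset.width,
    height: asset.height,
    creation_time: asset.creationTime,
    modification_time: asset.modificationTime,
    source,
    is_hidden: 0,
    is_in_pickone_album: 0,
    updated_at: now,
  };
}

// Upsert keyed on asset_id. On conflict only metadata is refreshed; source,
// is_hidden and is_in_pickone_album keep their stored values (an assumption,
// so that rescans do not reset user-driven flags).
const UPSERT_ASSET_SQL = `
INSERT INTO assets (
  asset_id, uri, width, height, creation_time, modification_time,
  source, is_hidden, is_in_pickone_album, updated_at
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(asset_id) DO UPDATE SET
  uri = excluded.uri,
  width = excluded.width,
  height = excluded.height,
  creation_time = excluded.creation_time,
  modification_time = excluded.modification_time,
  updated_at = excluded.updated_at
`;

// Positional parameters in the same order as the column list above.
function assetParams(row: AssetRow): (string | number)[] {
  return [
    row.asset_id, row.uri, row.width, row.height, row.creation_time,
    row.modification_time, row.source, row.is_hidden,
    row.is_in_pickone_album, row.updated_at,
  ];
}
```

The statement and its parameter array would be handed to whichever connection helper the repository layer ends up using.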
Store:
- asset_id (FK -> assets)
- model_version
- vector_blob (Float32Array packed as BLOB)
- dimensions
- computed_at
- asset_fingerprint (e.g., hash of modification_time + width + height)
Purpose: recompute only when asset fingerprint or model version changes.
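A minimal sketch of the embedding cache mechanics, assuming the BLOB is exchanged as a Uint8Array and that the fingerprint is a 32-bit FNV-1a hash of the three fields (the hash choice is an assumption; the plan only asks for some hash). `packVector`, `unpackVector`, `assetFingerprint` and `needsRecompute` are hypothetical helper names.

```typescript
// Pack a Float32Array into raw bytes suitable for a BLOB column.
// The bytes are in the platform's native byte order, which is fine for a
// database that never leaves the device.
function packVector(vector: Float32Array): Uint8Array {
  const bytes = vector.buffer.slice(vector.byteOffset, vector.byteOffset + vector.byteLength);
  return new Uint8Array(bytes);
}

// Decode a BLOB back into floats, checking it against the stored dimensions.
function unpackVector(blob: Uint8Array, dimensions: number): Float32Array {
  if (blob.byteLength !== dimensions * 4) {
    throw new Error("vector_blob length does not match dimensions");
  }
  // Copy into a fresh buffer so the Float32Array view is 4-byte aligned.
  const copy = new Uint8Array(blob.byteLength);
  copy.set(blob);
  return new Float32Array(copy.buffer);
}

// 32-bit FNV-1a over "modification_time:width:height", as 8 hex characters.
function assetFingerprint(modificationTime: number, width: number, height: number): string {
  const input = modificationTime + ":" + width + ":" + height;
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

interface CachedEmbeddingMeta {
  asset_fingerprint: string;
  model_version: string;
}

// Recompute when nothing is cached, the asset changed, or the model changed.
function needsRecompute(
  cached: CachedEmbeddingMeta | undefined,
  currentFingerprint: string,
  currentModelVersion: string,
): boolean {
  if (cached === undefined) return true;
  return (
    cached.asset_fingerprint !== currentFingerprint ||
    cached.model_version !== currentModelVersion
  );
}
```

Decoding only on demand keeps the common path (checking whether a cached row is still valid) free of any vector work.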
Store:
- run_id (PK)
- status (running|completed|cancelled|failed)
- started_at, finished_at
- params_json (threshold/window/batch)
- processed_count, error_count
Purpose: recover/resume long-running scans and provide diagnostics.
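One way the resume decision could look at startup is sketched below. It assumes that a run still marked running with no finished_at was interrupted, and that resuming is only safe when the stored params_json matches the current parameters. Both rules are assumptions, and `planScan` and `ScanPlan` are hypothetical names.

```typescript
type RunStatus = "running" | "completed" | "cancelled" | "failed";

interface ScanRun {
  run_id: string;
  status: RunStatus;
  started_at: number;
  finished_at: number | null;
  params_json: string;
  processed_count: number;
  error_count: number;
}

type ScanPlan =
  | { mode: "resume"; runId: string; afterAssetId: string }
  | { mode: "fresh" };

// Decide whether to continue the latest run or start a new one.
// lastCursorAssetId is the value persisted in app state during the scan.
function planScan(
  latestRun: ScanRun | null,
  lastCursorAssetId: string | null,
  currentParamsJson: string,
): ScanPlan {
  if (latestRun === null || lastCursorAssetId === null) {
    return { mode: "fresh" };
  }
  // A run that never left "running" is treated as interrupted (assumption).
  const interrupted = latestRun.status === "running" && latestRun.finished_at === null;
  // String comparison is key-order sensitive, so params_json should be
  // serialized the same way every time.
  const sameParams = latestRun.params_json === currentParamsJson;
  if (interrupted && sameParams) {
    return { mode: "resume", runId: latestRun.run_id, afterAssetId: lastCursorAssetId };
  }
  return { mode: "fresh" };
}
```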
Store:
- group_id (PK)
- run_id (FK)
- group_type (ml_similar|manual_selection)
- average_similarity
- cover_asset_id
- status (new|in_review|reviewed|archived)
- created_at, updated_at
Store:
- group_id (FK)
- asset_id (FK)
- rank
- decision (undecided|keep|delete)
- decided_at
Purpose: persist selection results from pairwise flow.
Store:
- action_id (PK)
- group_id
- asset_id
- action_type (add_to_pickone_album|restore|skip)
- action_status (pending|done|failed)
- error_message
- created_at, completed_at
Purpose: durable clean-up results/history and retry queue for failures.
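The retry-queue behaviour could be sketched as two pure helpers: one that selects the actions still owed to the media library, and one that records the outcome of an attempt. `retryQueue`, `settleAction` and `ActionOutcome` are hypothetical names, and leaving completed_at empty on failure is an assumption.

```typescript
type ActionType = "add_to_pickone_album" | "restore" | "skip";
type ActionStatus = "pending" | "done" | "failed";

interface CleanupAction {
  action_id: string;
  group_id: string;
  asset_id: string;
  action_type: ActionType;
  action_status: ActionStatus;
  error_message: string | null;
  created_at: number;
  completed_at: number | null;
}

type ActionOutcome = { ok: true } | { ok: false; error: string };

// Actions that still need to run: never attempted or previously failed,
// oldest first. The input array is not modified.
function retryQueue(actions: CleanupAction[]): CleanupAction[] {
  return actions
    .filter((a) => a.action_status !== "done")
    .sort((a, b) => a.created_at - b.created_at);
}

// Return an updated copy of the action after a native/MediaLibrary attempt.
function settleAction(action: CleanupAction, outcome: ActionOutcome, now: number): CleanupAction {
  if (outcome.ok) {
    return { ...action, action_status: "done", error_message: null, completed_at: now };
  }
  // completed_at stays null on failure so the row remains eligible for retry.
  return { ...action, action_status: "failed", error_message: outcome.error, completed_at: null };
}
```

In SQL terms the queue corresponds to filtering on action_status and ordering by created_at.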
Store:
- batch_id (PK)
- created_at
- source_label (optional)
- asset_count
And map imported assets via assets.source='manual' + group_type='manual_selection'.
- embeddings
- photo_groups from ML runs
- is_in_pickone_album flags (refresh opportunistically)
- Current screen thumbnails and pairwise image decode cache
- In-flight run buffer before DB flush (batch writes)
- Invalidate embedding when asset_fingerprint changes or ml_model_version changes
- Mark stale groups when any member embedding is stale
- Purge old completed runs beyond retention (e.g., keep last 10 runs or 30 days)
- Keep manual groups until user archives/deletes
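The stale-group and retention rules above can be sketched as pure functions. The retention rule is read here as "a completed run survives if it is among the newest N or younger than the age limit", which is one interpretation of "keep last 10 runs or 30 days" and should be treated as an assumption. `staleGroupIds`, `runsToPurge` and `RetentionPolicy` are hypothetical names.

```typescript
interface GroupItemRef {
  group_id: string;
  asset_id: string;
}

// A group is stale as soon as any member's embedding is stale.
function staleGroupIds(items: GroupItemRef[], staleAssetIds: Set<string>): Set<string> {
  const stale = new Set<string>();
  for (const item of items) {
    if (staleAssetIds.has(item.asset_id)) {
      stale.add(item.group_id);
    }
  }
  return stale;
}

interface RunSummary {
  run_id: string;
  status: "running" | "completed" | "cancelled" | "failed";
  finished_at: number | null; // epoch milliseconds
}

interface RetentionPolicy {
  keepLast: number; // e.g. 10
  maxAgeMs: number; // e.g. 30 days in milliseconds
}

// Ids of completed runs that fall outside BOTH retention windows.
// Runs in any other status are never returned.
function runsToPurge(runs: RunSummary[], policy: RetentionPolicy, now: number): string[] {
  const completed = runs
    .filter((r) => r.status === "completed" && r.finished_at !== null)
    .sort((a, b) => (b.finished_at ?? 0) - (a.finished_at ?? 0)); // newest first
  return completed
    .filter((r, index) => index >= policy.keepLast && now - (r.finished_at ?? 0) > policy.maxAgeMs)
    .map((r) => r.run_id);
}
```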
- Add DB bootstrap + migrations and initialize at app startup.
- Add repository helpers (assets, embeddings, groups, cleanup actions, app state).
- Hydrate PhotoGroupsContext from DB on launch (instead of empty memory state).
- In app/index.tsx:
  - Before embedding: upsert assets metadata
  - Reuse cached embeddings where valid
  - Persist streaming groups as they are discovered
  - Persist run progress cursor every N assets
- In app/pickone.tsx:
  - Persist each keep/delete decision to group_items
  - Persist cleanup action records when adding to album
  - Update action status after native/MediaLibrary success/failure
- On returning to home screen:
  - Refresh pickone album membership cache for touched assets
  - Filter/mark groups based on persisted cleanup status
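For the in-flight buffer and the "persist cursor every N assets" step, a small sketch: results accumulate in memory and are handed to a flush callback together with the id of the last processed asset, so the batch and the cursor can be written in the same transaction. `createRunBuffer` and `PendingWrite` are hypothetical names, and the callback is synchronous here purely to keep the example short.

```typescript
interface PendingWrite<T> {
  assetId: string;
  payload: T;
}

// Collects per-asset results and flushes them in batches of flushEvery.
// onFlush receives the batch plus the cursor to persist alongside it.
function createRunBuffer<T>(
  flushEvery: number,
  onFlush: (items: PendingWrite<T>[], cursorAssetId: string) => void,
) {
  let pending: PendingWrite<T>[] = [];
  let cursor: string | null = null;

  function flush(): void {
    const cursorAssetId = cursor;
    if (pending.length === 0 || cursorAssetId === null) return;
    const batch = pending;
    pending = []; // reset before calling out so the callback sees a clean buffer
    onFlush(batch, cursorAssetId);
  }

  function add(assetId: string, payload: T): void {
    pending.push({ assetId, payload });
    cursor = assetId;
    if (pending.length >= flushEvery) flush();
  }

  return { add, flush };
}
```

Calling `flush()` once more when the scan ends, or when the app moves to the background, would write any partial batch.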
- Add: lib/db/sqlite.ts (connection, pragmas)
- Add: lib/db/migrations.ts (migration runner)
- Add: lib/db/schema/001_initial.sql
- Add: lib/db/repos/assetsRepo.ts
- Add: lib/db/repos/embeddingsRepo.ts
- Add: lib/db/repos/groupsRepo.ts
- Add: lib/db/repos/cleanupRepo.ts
- Add: lib/db/repos/appStateRepo.ts
- Update: contexts/PhotoGroupsContext.tsx
- Update: app/index.tsx
- Update: app/pickone.tsx
- Use transactions for batch inserts/updates (assets + embeddings + groups)
- WAL mode + foreign keys on startup pragmas
- Batch sizes: 50-200 rows per transaction depending on device
- Keep vectors as BLOB for compactness; decode lazily only when needed
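- Add indexes:
  - assets(asset_id) (redundant if asset_id is declared PRIMARY KEY, since SQLite indexes primary keys automatically)
  - embeddings(asset_id, model_version)
  - group_items(group_id)
  - group_items(asset_id)
  - cleanup_actions(action_status, created_at)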
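The storage settings above could be collected as plain statement lists plus a chunking helper, as in this sketch. The index names (`idx_...`) are hypothetical, and assets(asset_id) is omitted on the assumption that asset_id is the table's PRIMARY KEY. How the statements are executed depends on the connection wrapper in lib/db/sqlite.ts, which is not shown.

```typescript
// Pragmas to run once per connection at startup.
const STARTUP_PRAGMAS: string[] = [
  "PRAGMA journal_mode = WAL;",
  "PRAGMA foreign_keys = ON;",
];

// Secondary indexes from the list above (index names are illustrative).
const INDEX_STATEMENTS: string[] = [
  "CREATE INDEX IF NOT EXISTS idx_embeddings_asset_model ON embeddings(asset_id, model_version);",
  "CREATE INDEX IF NOT EXISTS idx_group_items_group ON group_items(group_id);",
  "CREATE INDEX IF NOT EXISTS idx_group_items_asset ON group_items(asset_id);",
  "CREATE INDEX IF NOT EXISTS idx_cleanup_actions_status ON cleanup_actions(action_status, created_at);",
];

// Split rows into transaction-sized batches (50-200 per the guideline above).
function chunk<T>(rows: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}
```

Each batch returned by `chunk` would then be written inside a single transaction, so a failure rolls back at most one batch.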
- Phase 1: dual-write (context + DB), read from context
- Phase 2: hydrate from DB, context as UI cache
- Phase 3: remove temporary fallback paths after validation
- Add debug screen/logging for run resume, stale embedding count, pending cleanup actions