Skip to content

Instantly share code, notes, and snippets.

@kirillshevch
Created February 28, 2026 15:34
Show Gist options
  • Select an option

  • Save kirillshevch/5a0dd592538e01f04afae80d4f976f68 to your computer and use it in GitHub Desktop.

Select an option

Save kirillshevch/5a0dd592538e01f04afae80d4f976f68 to your computer and use it in GitHub Desktop.

Expo SQLite Storage And Cache Plan

Goal

Introduce an expo-sqlite data layer so analysis results survive app restarts, manual/photo cleanup workflows are resumable, and ML indexing work is reused safely when assets or model version change.

Scope Anchors In Current Code

Data Model (What To Persist)

1) app_state (small key-value config/state)

Store:

  • schema_version
  • ml_model_version (manual bump when embedding model changes)
  • last_scan_started_at, last_scan_finished_at
  • last_cursor_asset_id (resume large scans)
  • User preferences: similarity threshold/window size/batch size

2) assets (normalized photo index)

Store per photo asset:

  • asset_id (PK, from MediaLibrary identifier)
  • uri
  • width, height, creation_time, modification_time
  • source (library | manual)
  • is_hidden (soft exclusion from UI)
  • is_in_pickone_album (cached boolean for cleanup filtering)
  • updated_at

Purpose: canonical local index and dedupe point for all workflows.

3) embeddings (ML index cache)

Store:

  • asset_id (FK -> assets)
  • model_version
  • vector_blob (Float32Array packed as BLOB)
  • dimensions
  • computed_at
  • asset_fingerprint (e.g., modification_time + width + height hash)

Purpose: recompute only when asset fingerprint or model version changes.

4) analysis_runs (scan jobs/session)

Store:

  • run_id (PK)
  • status (running | completed | cancelled | failed)
  • started_at, finished_at
  • params_json (threshold/window/batch)
  • processed_count, error_count

Purpose: recover/resume long-running scans and provide diagnostics.

5) photo_groups (grouping output)

Store:

  • group_id (PK)
  • run_id (FK)
  • group_type (ml_similar | manual_selection)
  • average_similarity
  • cover_asset_id
  • status (new | in_review | reviewed | archived)
  • created_at, updated_at

6) group_items (membership + decision)

Store:

  • group_id (FK)
  • asset_id (FK)
  • rank
  • decision (undecided | keep | delete)
  • decided_at

Purpose: persist selection results from pairwise flow.

7) cleanup_actions (what user did to clean)

Store:

  • action_id (PK)
  • group_id
  • asset_id
  • action_type (add_to_pickone_album | restore | skip)
  • action_status (pending | done | failed)
  • error_message
  • created_at, completed_at

Purpose: durable clean-up results/history and retry queue for failures.

8) manual_import_batches (manually added photos)

Store:

  • batch_id (PK)
  • created_at
  • source_label (optional)
  • asset_count

And map imported assets via assets.source='manual' + group_type='manual_selection'.

Cache Policy (What To Cache, Not Treat As Source Of Truth)

DB-backed cache (survives restart, safe to regenerate)

  • embeddings
  • photo_groups from ML runs
  • is_in_pickone_album flags (refresh opportunistically)

Memory-only cache (fast UI)

  • Current screen thumbnails and pairwise image decode cache
  • In-flight run buffer before DB flush (batch writes)

Eviction/Invalidation

  • Invalidate embedding when asset_fingerprint changes or ml_model_version changes
  • Mark stale groups when any member embedding is stale
  • Purge old completed runs beyond retention (e.g., keep last 10 runs or 30 days)
  • Keep manual groups until user archives/deletes

Integration Plan In Existing App

  1. Add DB bootstrap + migrations and initialize at app startup.
  2. Add repository helpers (assets, embeddings, groups, cleanup actions, app state).
  3. Hydrate PhotoGroupsContext from DB on launch (instead of empty memory state).
  4. In app/index.tsx:
  • Before embedding: upsert assets metadata
  • Reuse cached embeddings where valid
  • Persist streaming groups as they are discovered
  • Persist run progress cursor every N assets
  1. In app/pickone.tsx:
  • Persist each keep/delete decision to group_items
  • Persist cleanup action records when adding to album
  • Update action status after native/MediaLibrary success/failure
  1. On returning to home screen:
  • Refresh pickone album membership cache for touched assets
  • Filter/mark groups based on persisted cleanup status

Proposed File Additions/Changes

Write Strategy And Performance

  • Use transactions for batch inserts/updates (assets + embeddings + groups)
  • WAL mode + foreign keys on startup pragmas
  • Batch sizes: 50-200 rows per transaction depending on device
  • Keep vectors as BLOB for compactness; decode lazily only when needed
  • Add indexes:
    • assets(asset_id)
    • embeddings(asset_id, model_version)
    • group_items(group_id)
    • group_items(asset_id)
    • cleanup_actions(action_status, created_at)

Rollout Safety

  • Phase 1: dual-write (context + DB), read from context
  • Phase 2: hydrate from DB, context as UI cache
  • Phase 3: remove temporary fallback paths after validation
  • Add debug screen/logging for run resume, stale embedding count, pending cleanup actions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment