
Project Audit Report

Here is an assessment of your legacy Making with ML projects: how out-of-date they are, and what it would take to bring them into the modern AI era.

Summary

Most of these projects rely on libraries and APIs from 2020-2021. In the world of AI and cloud development, this is a significant gap.

  • Frontend/Backend Frameworks: React, Flutter, and Node.js versions are mostly End-of-Life (EOL) or several major versions behind, requiring significant refactoring (e.g., Flutter Null Safety, Firebase v9+ modular SDK).
  • AI Models: Many projects use specialized, task-specific models (AutoML, Universal Sentence Encoder, COCO-SSD). Today, Multimodal LLMs (like Gemini 1.5 or GPT-4o) can often replace these entire pipelines with a single API call.

1. ai_dubs (Video Dubbing & Translation)

Status: ⚠️ Moderately Outdated

  • Tech Stack: Python 3, Google Cloud Speech-to-Text / Text-to-Speech / Translation.
  • Issues:
    • Dependencies in requirements.txt are from early 2021 (e.g., pandas 1.2.0).
    • Uses pydub and moviepy, which handle media well but whose APIs may have changed in newer releases.
  • Modern AI Upgrade:
    • Then: Chaining three separate APIs (Speech-to-Text -> Translation -> Text-to-Speech).
    • Now: Use a Multimodal Model (like Gemini 1.5 Pro) to ingest the video directly and output a translated script with timestamps, or use dedicated dubbing AI services (like ElevenLabs) that preserve voice characteristics.
  • Verdict: Refactor. The core logic is sound, but replacing the complex API chaining with a modern multimodal pipeline would simplify the code by 50%+.
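
To make the simplification concrete: the transcription and translation stages collapse into one multimodal call. A minimal sketch, assuming the google-generativeai Python SDK, a GOOGLE_API_KEY environment variable, and an illustrative clip.mp4:

```python
# Minimal sketch: one multimodal call replaces the STT -> Translate chain.
# Assumes `pip install google-generativeai` and GOOGLE_API_KEY in the environment.
import time
import google.generativeai as genai

genai.configure()  # picks up GOOGLE_API_KEY

video = genai.upload_file("clip.mp4")      # illustrative input file
while video.state.name == "PROCESSING":    # wait for server-side processing
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    video,
    "Transcribe this video, translate the transcript to Spanish, "
    "and keep a timestamp on every line.",
])
print(response.text)  # translated, timestamped script ready for TTS or subtitles
```

Synthesizing the dubbed audio would still be a separate step (e.g., ElevenLabs or Cloud Text-to-Speech), but the front half of the 2021 pipeline becomes a single prompt.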

2. discord_moderator (Toxic Message Filter)

Status: ☠️ Defunct (Major Rewrite Needed)

  • Tech Stack: Node.js, discord.js v12.
  • Issues:
    • Critical: Discord introduced Slash Commands and mandatory Gateway Intents in newer API versions; discord.js v12 predates both and is incompatible with the modern Discord API.
    • Uses Google's Perspective API, which is still valid but often requires specific access approval.
  • Modern AI Upgrade:
    • Then: Perspective API (classification scores).
    • Now: OpenAI's Moderation API (free tier available) or a small LLM (like Llama 3 8B or Gemma) running locally/cheaply can provide much more nuanced context-aware moderation.
  • Verdict: Abandon & Rewrite. You would spend more time fixing the broken Discord library integration than writing a new bot from scratch using discord.js v14.
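
The moderation half of the rewrite is nearly a one-liner. A minimal Python sketch, assuming the openai SDK and an OPENAI_API_KEY environment variable (the bot itself would still live in discord.js v14):

```python
# Minimal sketch: classify a message with OpenAI's Moderation API.
# Assumes `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
result = client.moderations.create(
    model="omni-moderation-latest",
    input="you are all terrible at this game",
).results[0]

print(result.flagged)          # True/False overall verdict
print(result.category_scores)  # per-category scores, analogous to Perspective's output
```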

3. instafashion (Find Similar Clothes)

Status: ⚠️ Heavily Outdated / Deprecated

  • Tech Stack: Python Notebook, Google Vision Product Search.
  • Issues:
    • Relying on "Vision Product Search" often requires creating and indexing product sets in Google Cloud, a heavy enterprise workflow.
    • Python dependencies are from mid-2020.
  • Modern AI Upgrade:
    • Then: Training specific object detection models on product catalogs.
    • Now: Zero-shot Multimodal AI. You can simply upload an image to GPT-4o or Gemini and ask: "Find me items similar to the outfit in this photo and list search terms for them." No training or indexing required for personal use.
  • Verdict: Abandon. The approach used here is "Enterprise Search" heavy. Modern LLMs solve this "out of the box" for hobbyist/demo purposes.
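
To show how little code the "out of the box" path takes, here is a minimal sketch, assuming the openai Python SDK, an OPENAI_API_KEY environment variable, and an illustrative outfit.jpg:

```python
# Minimal sketch: zero-shot "find similar clothes" with a multimodal LLM.
# Assumes `pip install openai` and OPENAI_API_KEY in the environment.
import base64
from openai import OpenAI

client = OpenAI()
b64 = base64.b64encode(open("outfit.jpg", "rb").read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "List search terms for items similar to the outfit in this photo."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```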

4. petcam (Object Detection for Pets)

Status: 🟠 Outdated (Frontend & Backend)

  • Tech Stack: React 17, Firebase v8, Node 12 (Cloud Functions).
  • Issues:
    • Firebase v8 uses a namespaced syntax that is completely different from the modern modular v9+ SDK.
    • Node 12 is End-of-Life (security risk).
    • Uses coco-ssd (TensorFlow.js), a basic object detector.
  • Modern AI Upgrade:
    • Then: COCO-SSD (often inaccurate, limited classes).
    • Now: MediaPipe Object Detection or YOLOv8 (via TF.js or ONNX) running in the browser. These are significantly faster and more accurate.
  • Verdict: Updateable. The logic is simple (camera.js). You could swap the AI model for MediaPipe tasks fairly easily, but the Firebase/React upgrade will be tedious work.
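
For reference, a minimal sketch of the MediaPipe Tasks detector in Python; the browser equivalent (@mediapipe/tasks-vision) mirrors this API, and the efficientdet_lite0.tflite model file is downloaded separately from the MediaPipe model zoo:

```python
# Minimal sketch: MediaPipe Tasks object detection, replacing coco-ssd.
# Assumes `pip install mediapipe` and a downloaded efficientdet_lite0.tflite.
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

options = vision.ObjectDetectorOptions(
    base_options=mp_python.BaseOptions(model_asset_path="efficientdet_lite0.tflite"),
    score_threshold=0.4,
)
detector = vision.ObjectDetector.create_from_options(options)

image = mp.Image.create_from_file("frame.jpg")  # illustrative camera frame
for det in detector.detect(image).detections:
    top = det.categories[0]
    print(top.category_name, round(top.score, 2), det.bounding_box)
```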

5. semantic_ml (Text Similarity Search)

Status: 🟠 Outdated Model

  • Tech Stack: Node.js, TensorFlow.js v2.
  • Issues:
    • Uses the Universal Sentence Encoder (USE). While robust, it's heavy for a browser/Node implementation compared to modern alternatives.
  • Modern AI Upgrade:
    • Then: Universal Sentence Encoder.
    • Now: An embeddings API (OpenAI text-embedding-3) or local transformer models run via transformers.js (Xenova), which executes BERT/MiniLM-class models directly in Node.js or the browser with higher accuracy.
  • Verdict: Replace Logic. The code is simple enough that you can keep the project structure but swap the embedding engine for a modern library like langchain.js or transformers.js.
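
The swap is small. A minimal sketch in Python using sentence-transformers (the project itself is Node.js; transformers.js exposes the same MiniLM-class models there with a near-identical flow):

```python
# Minimal sketch: MiniLM embeddings + cosine similarity, replacing USE.
# Assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = ["How do I reset my password?", "Best pizza in town", "Forgot login credentials"]
corpus_emb = model.encode(corpus, convert_to_tensor=True)

query_emb = model.encode("I can't log in", convert_to_tensor=True)
scores = util.cos_sim(query_emb, corpus_emb)[0]   # similarity to each corpus entry

best = scores.argmax().item()
print(corpus[best], float(scores[best]))           # closest match and its score
```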

6. sports_ai (Tennis Serve Analysis)

Status: ⚠️ Complex & Deprecated Dependencies

  • Tech Stack: Python Notebook, Google Cloud Video Intelligence, AutoML Vision.
  • Issues:
    • AutoML Vision Object Detection is a "heavy" enterprise tool. Training models via the GUI and calling them via API is expensive and slow for this use case.
  • Modern AI Upgrade:
    • Then: Cloud Video Intelligence for pose + Custom AutoML model for ball tracking.
    • Now: MediaPipe Pose (runs locally on CPU/GPU, free, real-time) + YOLOv8 for ball tracking. You can do this entirely in Python (using OpenCV) without calling paid Cloud APIs.
  • Verdict: Rewrite. Using MediaPipe will make this faster, cheaper, and runnable on your laptop without internet.
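
A minimal sketch of the pose half, assuming an illustrative serve.mp4 and `pip install mediapipe opencv-python`; ball tracking with YOLOv8 (ultralytics package) would be a second pass:

```python
# Minimal sketch: MediaPipe Pose over a local video, replacing Cloud Video Intelligence.
import cv2
import mediapipe as mp

cap = cv2.VideoCapture("serve.mp4")
with mp.solutions.pose.Pose() as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV reads BGR
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # e.g. the right wrist landmark, in normalized [0, 1] coordinates
            wrist = results.pose_landmarks.landmark[
                mp.solutions.pose.PoseLandmark.RIGHT_WRIST]
            print(wrist.x, wrist.y, wrist.visibility)
cap.release()
```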

7. video_archive (Searchable Video Library)

Status: 🔴 Critical Updates Needed

  • Tech Stack: Flutter v1.x (pre-null safety), Firebase v0.x, Node 10.
  • Issues:
    • Flutter: The Dart language underwent a massive shift to "Null Safety" (version 2.12+). Migrating this codebase requires touching almost every file.
    • Backend: Node 10 is very old. Firebase Functions now require newer Node versions.
  • Modern AI Upgrade:
    • Then: Google Video Intelligence API (Labels, Text, Speech).
    • Now: Gemini 1.5 Pro has a massive context window (1M+ tokens) and is natively multimodal. You can upload a one-hour video and ask: "What time does the baby smile?" or "Find me all clips with a red car," without setting up a complex indexing pipeline with Algolia.
  • Verdict: Abandon / Concept Port. The "Video Archive" concept is powerful, but the tech debt here (Flutter v1 -> v3) is overwhelming. It would be faster to build a new lightweight web app using Next.js + Gemini 1.5 Flash.
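
The search query itself is a single prompt. A hedged sketch reusing the upload flow from the ai_dubs example above (archive.mp4 and the JSON shape are illustrative):

```python
# Minimal sketch: natural-language video search with one multimodal call.
import google.generativeai as genai

genai.configure()  # reads GOOGLE_API_KEY
video = genai.upload_file("archive.mp4")  # poll until ACTIVE, as in the ai_dubs sketch

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content([
    video,
    'Find every clip with a red car. Reply as JSON: '
    '[{"start": "mm:ss", "end": "mm:ss", "description": "..."}]',
])
print(response.text)  # timestamped results, no Algolia index required
```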

Recommendation

If you want to revive one project to impress people with how much AI has changed:

👉 Pick ai_dubs or video_archive, but strictly as an "LLM wrapper" rewrite.

Instead of the complex pipelines of 2020, you can now solve these problems with:

  1. Input: Video File.
  2. Process: Send to Gemini 1.5 Pro (Google) or GPT-4o (OpenAI).
  3. Output: Get high-quality transcripts, translations, or search results in a single call.

Detailed Modernization Plans

Instafashion Modernization Plan

Goal: Replace the deprecated Google Vision Product Search with a modern, local, open-source multimodal search pipeline.

Tech Stack:

  • Language: Python 3.10+
  • Model: CLIP (Contrastive Language-Image Pre-training) via sentence-transformers or Hugging Face.
  • Database: ChromaDB (Local Vector Database).
  • UI: Streamlit (for rapid prototyping).

Verifiable Steps:

  1. Environment Setup & Dependencies

    • Create a new instafashion/modern directory.
    • Create requirements.txt with torch, sentence-transformers, pillow, chromadb, streamlit.
    • Verification: pip install -r requirements.txt runs successfully.
  2. Data Ingestion & Embedding (sketched in code after this list, together with step 3)

    • Create ingest.py.
    • Load local images (from instafashion/assets or a sample set).
    • Generate embeddings for each image using a pre-trained CLIP model.
    • Store embeddings in a persistent ChromaDB collection.
    • Verification: Run script. Check ChromaDB collection count matches image count.
  3. Search Logic

    • Create search.py.
    • Implement function to accept a query image or text.
    • Convert query to embedding.
    • Perform nearest neighbor search in ChromaDB.
    • Return paths of matching images.
    • Verification: CLI test. Query with "red dress" or an image file, ensure reasonable results are returned.
  4. Interactive UI

    • Create app.py using Streamlit.
    • Interface to upload a file or enter text.
    • Display query image and a grid of result images.
    • Verification: Launch app (streamlit run app.py), upload image, verify visual matches appear.
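
A combined sketch of steps 2 and 3, assuming an assets/ folder of .jpg images and the dependencies from step 1:

```python
# Minimal sketch: CLIP embeddings + ChromaDB, covering ingest.py and search.py.
from pathlib import Path

import chromadb
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # CLIP: shared image/text embedding space
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(
    "instafashion", metadata={"hnsw:space": "cosine"})

# Ingest (step 2): embed every image and store it, keyed by path.
paths = [str(p) for p in Path("assets").glob("*.jpg")]
embeddings = model.encode([Image.open(p) for p in paths])
collection.add(ids=paths, embeddings=embeddings.tolist())

# Search (step 3): a text query lands in the same embedding space as the images.
query = model.encode("red dress")
hits = collection.query(query_embeddings=[query.tolist()], n_results=5)
print(hits["ids"][0])  # paths of the closest-matching images
```

Because CLIP embeds images and text into one space, the same collection answers both text queries and query-by-image (encode the query image instead of the string).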