Last active March 11, 2026 12:17
Creating wisprflow alternative
| Create a Wisprflow alternative project with a local ASR model and a local LLM that can be tried on a desktop. The flow should be: the ASR model produces the transcription, which is passed along with the context of the open window to the LLM; the LLM fixes the transcription, and the result is pasted into the text area at the cursor. | |
| Planning Wisprflow Alternative | |
| Researched ASR (faster-whisper), LLM (Ollama), and desktop automation (xdotool/xclip) approaches. Checked existing workspace structure. | |
| Files Edited: Implementation Plan, Task | |
| Progress Updates | |
| 1. Researching approach and creating implementation plan | |
| 2. Creating implementation plan and task breakdown | |
| I've drafted the implementation plan for VoiceFlow — a local Wisprflow alternative. Key design decisions: | |
| ASR: faster-whisper (base.en model, ~150MB, runs on CPU/GPU) | |
| LLM: Ollama with llama3.2 (you'll need Ollama installed + model pulled) | |
| Desktop integration: xdotool + xclip for window context detection and text pasting (X11 only) | |
| Hotkey: Ctrl+Shift+Space push-to-talk via pynput | |
| Two things I'd like your input on: | |
| Do you use X11 or Wayland? The current plan uses X11 tools — Wayland needs different tools. | |
| Do you have a preferred Ollama model already installed, or should I default to llama3.2? | |
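The correction step in this plan could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the prompt wording and the function names `build_correction_prompt`, `build_request_payload`, and `correct_with_ollama` are hypothetical, and it assumes an Ollama server is running on its default local port with the `llama3.2` model already pulled.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_correction_prompt(transcript: str, window_title: str) -> str:
    """Combine the raw transcript with the active-window title (hypothetical wording)."""
    return (
        "You are cleaning up dictated text.\n"
        f"Active window: {window_title}\n"
        f"Raw transcript: {transcript}\n"
        "Return only the corrected text, with no commentary."
    )


def build_request_payload(prompt: str, model: str = "llama3.2") -> dict:
    """Request body for a non-streaming call to /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}


def correct_with_ollama(transcript: str, window_title: str, model: str = "llama3.2") -> str:
    """Send the prompt to a locally running Ollama server and return its reply."""
    payload = build_request_payload(build_correction_prompt(transcript, window_title), model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"].strip()
```

Keeping the prompt and payload builders separate from the network call means the wording can be tuned and checked without a model running.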
| Building Cloud Wisprflow for Windows | |
| Built the complete cloud edition at /mnt/d/code/python/open_wispr_flow/cloud/ with 11 files: main.py (orchestrator with push-to-talk hotkey), config.py, recorder.py, transcriber.py (OpenAI Whisper), window_context.py (win32gui), llm_corrector.py (OpenAI GPT), text_paster.py, build_exe.bat, build_exe.py, requirements.txt, and README.md. | |
| Files Edited: Walkthrough, Task, README.md, build_exe.py, build_exe.bat, main.py | |
| Progress Updates | |
| 1. Creating project files with OpenAI APIs and Windows-compatible active window detection | |
| 2. Creating transcriber, window context, LLM corrector, text paster, and main orchestrator modules | |
| 3. Creating main.py orchestrator, README with EXE build steps, and build script | |
| 4. Updating task checklist and creating walkthrough | |
| Done! The Cloud Edition is at open_wispr_flow/cloud/. Here's how to get going on Windows: | |
| Run from source (PowerShell): | |
| powershell | |
| cd D:\code\python\open_wispr_flow\cloud | |
| pip install -r requirements.txt | |
| $env:OPENAI_API_KEY = "sk-your-key-here" | |
| python main.py | |
| Build EXE: | |
| powershell | |
| pip install pyinstaller | |
| .\build_exe.bat | |
| # or: python build_exe.py | |
| Output: dist\OpenWisprFlow.exe | |
| How it works: | |
| Hold Ctrl+Shift+Space → speak → release → corrected text appears at your cursor | |
| Uses OpenAI Whisper for transcription + GPT-4o-mini for context-aware correction | |
| Detects the active window (VS Code, Chrome, etc.) to improve corrections | |
| Cost: ~$0.007 per dictation (~140 uses per $1) | |
| No huge model downloads needed — just pip install and your API key. | |
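The flow described above (transcribe, read the active window, correct, paste) can be pictured as a small orchestrator. The sketch below is an assumption about how such a pipeline might be wired, not the contents of main.py: `DictationPipeline` and its field names are hypothetical, and each step is passed in as a plain callable so real modules or test fakes can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DictationPipeline:
    """Hypothetical orchestrator: every step is injected so it can be swapped or faked."""

    transcribe: Callable[[bytes], str]      # audio bytes -> raw transcript
    get_window_title: Callable[[], str]     # title of the active window
    correct: Callable[[str, str], str]      # (raw transcript, window title) -> final text
    paste: Callable[[str], None]            # insert the text at the cursor

    def run(self, audio: bytes) -> str:
        raw_text = self.transcribe(audio)
        if not raw_text.strip():
            return ""  # nothing was said, so skip the LLM call and the paste
        window_title = self.get_window_title()
        final_text = self.correct(raw_text, window_title)
        self.paste(final_text)
        return final_text
```

With fakes such as `transcribe=lambda audio: "helo world"` and `paste=some_list.append`, the whole flow can be exercised without a microphone or an API key.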
| I am in WSL on Windows. How do I build the EXE? | |
| You cannot build a Windows .exe from WSL — PyInstaller builds for the OS it runs on, so WSL would produce a Linux binary. You need to use Windows Python instead. | |
| Here are the steps: | |
| 1. Open a Windows Command Prompt or PowerShell (not WSL) | |
| Press Win+R, type cmd, hit Enter. | |
| 2. Check if Windows Python is installed | |
| cmd | |
| python --version | |
| If not installed, download from python.org/downloads — check "Add to PATH" during install. | |
| 3. Navigate to the project | |
| cmd | |
| cd D:\code\python\open_wispr_flow\cloud | |
| 4. Install dependencies + PyInstaller | |
| cmd | |
| pip install -r requirements.txt | |
| pip install pyinstaller | |
| 5. Build the EXE | |
| cmd | |
| build_exe.bat | |
| Or manually: | |
| cmd | |
| pyinstaller --onefile --name "OpenWisprFlow" --console --hidden-import=win32gui --hidden-import=win32api --hidden-import=win32con --hidden-import=pynput --hidden-import=pynput.keyboard --hidden-import=pynput.keyboard._win32 --hidden-import=pynput._util --hidden-import=pynput._util.win32 --hidden-import=sounddevice --hidden-import=_sounddevice_data main.py | |
| 6. Run it | |
| cmd | |
| set OPENAI_API_KEY=sk-your-key-here | |
| dist\OpenWisprFlow.exe | |
| Key point: The build must happen from Windows Python, not WSL Python. Even for testing before building the EXE, you should run python main.py from a Windows terminal so that win32gui (active window detection) and the hotkey listener work properly. | |
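One way to catch this mistake early is a host check at the top of the build script. The sketch below is an illustration only: `is_wsl` and `check_build_host` are hypothetical helpers, and the WSL test relies on the common heuristic that WSL kernels report "microsoft" in their release string, which is an assumption rather than a guarantee.

```python
import platform


def is_wsl(release: str) -> bool:
    """Heuristic: WSL kernels usually report 'microsoft' in their release string."""
    return "microsoft" in release.lower()


def check_build_host(system: str, release: str) -> str:
    """Return an empty string if the host can build a Windows EXE, else a reason."""
    if system == "Windows":
        return ""
    if system == "Linux" and is_wsl(release):
        return "Running inside WSL: PyInstaller would produce a Linux binary."
    return f"Host OS is {system}: PyInstaller only builds for the OS it runs on."


if __name__ == "__main__":
    problem = check_build_host(platform.system(), platform.release())
    print(problem or "OK to build the Windows EXE from this interpreter.")
```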
| 'pyinstaller' is not recognized as an internal or external command, | |
| operable program or batch file. | |
| PyInstaller is installed but isn't on your PATH. Use it as a Python module instead: | |
| cmd | |
| python -m PyInstaller --onefile --name "OpenWisprFlow" --console --hidden-import=win32gui --hidden-import=win32api --hidden-import=win32con --hidden-import=pynput --hidden-import=pynput.keyboard --hidden-import=pynput.keyboard._win32 --hidden-import=pynput._util --hidden-import=pynput._util.win32 --hidden-import=sounddevice --hidden-import=_sounddevice_data main.py | |
| Or if python doesn't work, try py: | |
| cmd | |
| py -m PyInstaller --onefile --name "OpenWisprFlow" --console main.py | |
| Tip: If even python -m PyInstaller fails with "No module named PyInstaller", reinstall it: | |
| cmd | |
| python -m pip install pyinstaller | |
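The same PATH problem can be avoided inside a Python build script by always invoking PyInstaller as a module of the current interpreter. The following is a sketch under that assumption; it is not the actual build_exe.py, and `pyinstaller_command` and `build` are hypothetical names. The hidden-import list is copied from the command shown earlier.

```python
import subprocess
import sys

# Hidden imports taken from the manual pyinstaller command above.
HIDDEN_IMPORTS = [
    "win32gui", "win32api", "win32con",
    "pynput", "pynput.keyboard", "pynput.keyboard._win32",
    "pynput._util", "pynput._util.win32",
    "sounddevice", "_sounddevice_data",
]


def pyinstaller_command(entry_script: str = "main.py", name: str = "OpenWisprFlow") -> list[str]:
    """Build the argument list for `python -m PyInstaller` using the current interpreter."""
    command = [sys.executable, "-m", "PyInstaller", "--onefile", "--name", name, "--console"]
    command += [f"--hidden-import={module}" for module in HIDDEN_IMPORTS]
    command.append(entry_script)
    return command


def build() -> int:
    """Run the build and return PyInstaller's exit code."""
    return subprocess.run(pyinstaller_command()).returncode
```

Using `sys.executable` ties the build to whichever Python ran the script, so it works even when the `pyinstaller` launcher is not on PATH.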
| This is working. Can we now make a similar version for macOS? | |
| macOS support added! The same codebase now works on both platforms. Here's what was added: | |
| New files | |
| requirements_mac.txt — same deps minus pywin32 | |
| build_mac.sh — macOS build script with Darwin-specific imports | |
| Updated files | |
| text_paster.py — now uses Cmd+V on macOS (instead of Ctrl+V) | |
| README.md — added full macOS install, run, and build instructions | |
| macOS Quick Start | |
| bash | |
| cd open_wispr_flow/cloud | |
| pip3 install -r requirements_mac.txt | |
| export OPENAI_API_KEY=sk-your-key-here | |
| python3 main.py | |
| Build macOS binary | |
| bash | |
| chmod +x build_mac.sh | |
| ./build_mac.sh | |
| # Output: dist/OpenWisprFlow | |
| ⚠️ On macOS, you'll need to grant Accessibility, Microphone, and Input Monitoring permissions in System Settings → Privacy & Security. | |
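The Cmd+V versus Ctrl+V switch mentioned for text_paster.py might look something like the sketch below. The helper names are hypothetical and the real module may differ; it assumes the corrected text has already been placed on the clipboard by some other step, and it imports pynput lazily so the platform check itself has no dependencies.

```python
import sys
import time


def paste_modifier_name(platform_name: str = sys.platform) -> str:
    """Return 'cmd' on macOS and 'ctrl' on every other platform."""
    return "cmd" if platform_name == "darwin" else "ctrl"


def send_paste_shortcut() -> None:
    """Press the platform's paste shortcut; assumes the text is already on the clipboard."""
    # Imported here so paste_modifier_name() works even without pynput installed.
    from pynput.keyboard import Controller, Key

    modifier = Key.cmd if paste_modifier_name() == "cmd" else Key.ctrl
    keyboard = Controller()
    with keyboard.pressed(modifier):  # hold the modifier while tapping "v"
        keyboard.press("v")
        keyboard.release("v")
    time.sleep(0.05)  # brief pause so the target app can handle the keystroke
```

On macOS the synthetic keystroke only reaches other apps once the Accessibility permission noted above has been granted.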
| Can we make this into an installable package? | |
| Done! Here's what was added for proper platform installers: | |
| Windows Installer (.exe setup wizard) | |
| build_installer_win.bat — one-click: builds EXE → creates installer | |
| installer_win.iss — Inno Setup script with: | |
| API key prompt during install (saved to user env vars) | |
| Start Menu & Desktop shortcuts | |
| Optional "Start on Windows startup" | |
| Proper uninstaller | |
| launcher.bat — auto-prompts for API key on first run if not set | |
| Steps on your Windows machine: | |
| cmd | |
| cd D:\code\python\open_wispr_flow\cloud | |
| REM If you have Inno Setup installed: | |
| build_installer_win.bat | |
| REM If not, download it free from: https://jrsoftware.org/isdl.php | |
| Output: installer_output\OpenWisprFlow_Setup.exe | |
| macOS Installer (.dmg) | |
| build_installer_mac.sh — builds .app bundle + .dmg with: | |
| Native macOS dialog for API key on first launch | |
| Drag-to-Applications layout in DMG | |
| API key saved to ~/.open_wispr_flow/config.json | |
| Steps on a Mac: | |
| bash | |
| chmod +x build_installer_mac.sh | |
| ./build_installer_mac.sh | |
| Output: dist/OpenWisprFlow_Installer.dmg | |
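Since the Windows installer stores the key in a user environment variable and the macOS app stores it in ~/.open_wispr_flow/config.json, the app needs some lookup order at startup. The sketch below shows one plausible order (environment first, then the config file). The function names and the JSON field name `openai_api_key` are assumptions, as the transcript does not show the file's layout.

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".open_wispr_flow" / "config.json"


def load_api_key(config_path: Path = CONFIG_PATH, environ=os.environ) -> str:
    """Prefer the OPENAI_API_KEY environment variable, then fall back to config.json."""
    key = environ.get("OPENAI_API_KEY", "").strip()
    if key:
        return key
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return ""  # missing or unreadable config: the caller should prompt the user
    if not isinstance(data, dict):
        return ""
    # "openai_api_key" is a hypothetical field name chosen for this sketch.
    return str(data.get("openai_api_key", "")).strip()


def save_api_key(key: str, config_path: Path = CONFIG_PATH) -> None:
    """Write the key to config.json, creating the folder on first run."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps({"openai_api_key": key}), encoding="utf-8")
```

An empty return value is the signal for the first-run prompt: the launcher or the native dialog asks for the key and then calls `save_api_key`.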
| Adding File Logging and macOS Noconsole | |
| Added file logging to main.py (writes to ~/.open_wispr_flow/app.log). Updated build_mac.sh and build_installer_mac.sh with --noconsole, all module --add-data entries, and hidden imports for pystray/PIL/tkinter. | |
| Files Edited: requirements_mac.txt, build_installer_mac.sh, build_mac.sh, main.py | |
| Progress Updates | |
| 1. Adding file logging to main.py and updating mac build scripts to use --noconsole | |
| 2. Wiring up logging in main() and updating mac build scripts | |
| 3. Completing final updates to requirements_mac.txt | |
| Done! Both platforms now run without a terminal window: | |
| Changes | |
| build_exe.bat — --console → --noconsole (already done) | |
| build_mac.sh — --console → --noconsole + all new hidden imports for tray/settings | |
| build_installer_mac.sh — added all new module --add-data and hidden imports | |
| requirements_mac.txt — added pystray and Pillow | |
| main.py — in tray mode, all output goes to ~/.open_wispr_flow/app.log (auto-truncated at 1MB) | |
| Rebuild on Windows | |
| cmd | |
| build_exe.bat | |
| Log file location | |
| ~/.open_wispr_flow/app.log | |
| (On Windows: C:\Users\<you>\.open_wispr_flow\app.log) | |
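For readers wondering what "auto-truncated at 1MB" might involve, here is a minimal sketch using the standard logging module. It is an assumption about the approach, not the code in main.py: `truncate_if_large` and `setup_file_logging` are hypothetical names, the exact byte threshold is a guess, and the truncation here simply empties the file at startup.

```python
import logging
from pathlib import Path

LOG_PATH = Path.home() / ".open_wispr_flow" / "app.log"
MAX_LOG_BYTES = 1_000_000  # stands in for the "1MB" limit; the exact value is an assumption


def truncate_if_large(log_path: Path, max_bytes: int = MAX_LOG_BYTES) -> bool:
    """Empty the log file when it exceeds max_bytes. Returns True if it was truncated."""
    if log_path.exists() and log_path.stat().st_size > max_bytes:
        log_path.write_text("", encoding="utf-8")
        return True
    return False


def setup_file_logging(log_path: Path = LOG_PATH) -> logging.Logger:
    """Send log records to a file, since a --noconsole build has no terminal to print to."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    truncate_if_large(log_path)
    logger = logging.getLogger("open_wispr_flow")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

The standard library's `logging.handlers.RotatingFileHandler` is an alternative if keeping older log contents matters more than a single bounded file.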
| Let me look at your current file to understand the context of your question about the Ctrl+Win key combination. | |
| Analyzed main.py#L1-217 | |
| Let me also check the tray app to see how hotkeys are currently handled: | |
| Searched "hotkey": 32 results | |
| Searched "ctrl": 34 results | |
| In pynput (the library your project uses for keyboard listening), the Windows/Super/Command key is unified under Key.cmd regardless of OS: | |
| Key: pynput constant | |
| Left Win/Cmd: keyboard.Key.cmd_l | |
| Right Win/Cmd: keyboard.Key.cmd_r | |
| Either Win/Cmd: keyboard.Key.cmd | |
| So to detect Ctrl + Win, you'd check for something like: | |
| python | |
| # pynput may report the generic Key.cmd or a side-specific variant | |
| if key in (keyboard.Key.cmd, keyboard.Key.cmd_l, keyboard.Key.cmd_r): | |
|     win_pressed = True | |
| and combine it with the existing ctrl_pressed check. | |
| I also noticed your README.md already mentions the hotkey as Ctrl+CMD(Win) (line 8), but the actual implementation in tray_app.py and main.py uses Ctrl+Shift+Space. If you want to change the hotkey to Ctrl+Win, you'd replace the shift_pressed / keyboard.Key.shift checks with keyboard.Key.cmd_l / keyboard.Key.cmd_r checks instead. | |
| Would you like me to make that change? | |
| yes | |
| Changing Hotkey to Ctrl+Win | |
| Updated hotkey from Ctrl+Shift+Space to Ctrl+Win in config.py, tray_app.py, main.py, and settings_ui.py. | |
| Files Edited: build_installer_mac.sh, README.md, settings_ui.py, main.py, tray_app.py, config.py | |
| Progress Updates | |
| 1. Reading tray_app.py and config.py to understand full scope of changes | |
| 2. Applying hotkey changes to all files: config.py, tray_app.py, main.py, settings_ui.py, README.md | |
| 3. Updating README.md and build_installer_mac.sh documentation | |
| Done! The hotkey has been changed from Ctrl+Shift+Space to Ctrl+Win across all files: | |
| File: Change | |
| config.py: HOTKEY_COMBINATION updated to <ctrl>+<cmd> | |
| tray_app.py: Listener now tracks win_pressed via keyboard.Key.cmd_l/cmd_r/cmd instead of shift; triggers on Ctrl+Win press, releases on either key release | |
| main.py: Same hotkey logic change for CLI mode | |
| settings_ui.py: Label changed to "Ctrl + Win" | |
| README.md: Hotkey documentation updated | |
| How it works: Hold Ctrl+Win to start recording, release either key to stop and process. In pynput, the Windows key maps to keyboard.Key.cmd. |
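The press-and-release behaviour described above can be modelled as a small state machine. This sketch is illustrative only and is not taken from tray_app.py: `PushToTalkState` is a hypothetical class, and it works on key names as plain strings so the logic can be checked without a keyboard listener.

```python
class PushToTalkState:
    """Tracks Ctrl and Win/Cmd; recording is active only while both are held."""

    CTRL_KEYS = ("ctrl", "ctrl_l", "ctrl_r")
    WIN_KEYS = ("cmd", "cmd_l", "cmd_r")

    def __init__(self) -> None:
        self.ctrl_pressed = False
        self.win_pressed = False
        self.recording = False

    def on_press(self, key_name: str) -> bool:
        """Return True if this key press starts a recording."""
        if key_name in self.CTRL_KEYS:
            self.ctrl_pressed = True
        elif key_name in self.WIN_KEYS:
            self.win_pressed = True
        if self.ctrl_pressed and self.win_pressed and not self.recording:
            self.recording = True
            return True
        return False  # covers unrelated keys and auto-repeat while already recording

    def on_release(self, key_name: str) -> bool:
        """Return True if this key release stops a recording."""
        if key_name in self.CTRL_KEYS:
            self.ctrl_pressed = False
        elif key_name in self.WIN_KEYS:
            self.win_pressed = False
        else:
            return False  # other keys never end a recording
        if self.recording:
            self.recording = False
            return True
        return False
```

In a pynput listener callback, the name could be obtained with `getattr(key, "name", "")`, since special keys such as `keyboard.Key.cmd_l` are enum members with a `name`, while ordinary character keys are not.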