@louis030195
Created February 25, 2026 02:26
Screenpipe pipe: daily meeting transcription with Azure Speech-to-Text + centralized upload
schedule: daily
enabled: true

Transcribe today's meetings from screenpipe audio recordings using Microsoft Azure Speech-to-Text, then upload the transcripts to a centralized location.

Task

  1. Query screenpipe for all audio recordings from today (full workday: 8am to 6pm)
  2. For each audio chunk, collect the transcription text, speaker info, and timestamps
  3. Group consecutive audio chunks into "meetings" — a meeting is a continuous stretch of audio with gaps no longer than 5 minutes
  4. For each detected meeting, call the Azure Speech-to-Text API to re-transcribe the source audio file using the custom speech model
  5. Write each meeting transcript to the output directory AND upload to the centralized endpoint

Search API

GET http://localhost:3030/search?content_type=audio&start_time=<ISO8601>&end_time=<ISO8601>&limit=200

Extra params: q (keyword), speaker_name, offset (pagination).

Full API reference: https://docs.screenpi.pe/llms-full.txt
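As a sketch, the paginated query could look like this in Python (the `data` key on the response and the stop-when-a-page-comes-back-short rule are assumptions; check the full API reference above):

```python
import json
import urllib.parse
import urllib.request

SCREENPIPE_URL = "http://localhost:3030/search"

def build_search_url(start_iso: str, end_iso: str, offset: int = 0, limit: int = 200) -> str:
    """Build one page's search URL for today's audio recordings."""
    params = {
        "content_type": "audio",
        "start_time": start_iso,
        "end_time": end_iso,
        "limit": limit,
        "offset": offset,
    }
    return SCREENPIPE_URL + "?" + urllib.parse.urlencode(params)

def fetch_all_audio(start_iso: str, end_iso: str, limit: int = 200) -> list:
    """Page through /search until a page comes back shorter than the limit."""
    results, offset = [], 0
    while True:
        with urllib.request.urlopen(build_search_url(start_iso, end_iso, offset, limit)) as resp:
            page = json.load(resp).get("data", [])
        results.extend(page)
        if len(page) < limit:
            return results
        offset += limit
```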

Azure Speech-to-Text

Use the Azure Speech REST API to transcribe each meeting's source MP4 file with your custom speech model.

Batch transcription endpoint:

POST https://<REGION>.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions
Ocp-Apim-Subscription-Key: <AZURE_SPEECH_KEY>
Content-Type: application/json

{
  "contentUrls": ["<direct_url_to_audio_file>"],
  "locale": "en-US",
  "displayName": "Meeting <date> <time>",
  "model": {
    "self": "https://<REGION>.api.cognitive.microsoft.com/speechtotext/v3.2/models/<CUSTOM_MODEL_ID>"
  },
  "properties": {
    "diarizationEnabled": true,
    "wordLevelTimestampsEnabled": true,
    "punctuationMode": "DictatedAndAutomatic"
  }
}
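A minimal Python sketch (standard library only) of building and submitting this request; the helper name is illustrative, and the model block is attached only when a custom model ID is configured, so omitting it falls back to Azure's default model:

```python
import json
import urllib.request

def build_batch_request(region: str, key: str, content_url: str,
                        display_name: str, custom_model_id: str = None):
    """Build the POST request for a v3.2 batch transcription job."""
    body = {
        "contentUrls": [content_url],
        "locale": "en-US",
        "displayName": display_name,
        "properties": {
            "diarizationEnabled": True,
            "wordLevelTimestampsEnabled": True,
            "punctuationMode": "DictatedAndAutomatic",
        },
    }
    if custom_model_id:  # omit the model block to use the default model
        body["model"] = {
            "self": f"https://{region}.api.cognitive.microsoft.com"
                    f"/speechtotext/v3.2/models/{custom_model_id}"
        }
    return urllib.request.Request(
        f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions",
        data=json.dumps(body).encode(),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
```

Submitting the request creates a job; the job's `self` URL is then polled until its status reaches `Succeeded`, at which point the result files can be downloaded.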

If batch transcription is too slow or the audio files are local-only (not accessible via URL), use the real-time REST API instead: read the MP4 file from disk, convert it to WAV, and POST it to the endpoint below. Note that the short-audio endpoint accepts at most 60 seconds of audio per request, so longer recordings must be split into chunks.

POST https://<REGION>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US
Ocp-Apim-Subscription-Key: <AZURE_SPEECH_KEY>
Content-Type: audio/wav
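A sketch of that fallback path, assuming ffmpeg is on PATH; because of the 60-second cap it suits individual audio chunks rather than whole meetings. `DisplayText` holds the recognized text in the default (simple) response format:

```python
import json
import subprocess
import urllib.request

def to_wav_cmd(mp4_path: str, wav_path: str) -> list:
    """ffmpeg invocation: 16 kHz mono 16-bit PCM, a format the endpoint accepts."""
    return ["ffmpeg", "-y", "-i", mp4_path,
            "-ar", "16000", "-ac", "1", "-acodec", "pcm_s16le", wav_path]

def transcribe_chunk(region: str, key: str, mp4_path: str, wav_path: str) -> str:
    """Convert one audio chunk to WAV and POST it to the short-audio endpoint."""
    subprocess.run(to_wav_cmd(mp4_path, wav_path), check=True)
    with open(wav_path, "rb") as f:
        req = urllib.request.Request(
            f"https://{region}.stt.speech.microsoft.com/speech/recognition"
            "/conversation/cognitiveservices/v1?language=en-US",
            data=f.read(),
            headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "audio/wav"},
            method="POST",
        )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("DisplayText", "")
```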

Configuration

The pipe needs these environment variables (set them in screenpipe settings or as system env vars):

| Variable | Description |
| --- | --- |
| AZURE_SPEECH_KEY | Azure Cognitive Services Speech API key |
| AZURE_SPEECH_REGION | Azure region (e.g., eastus, westeurope) |
| AZURE_CUSTOM_MODEL_ID | Custom speech model ID (optional; omit to use the default model) |
| UPLOAD_ENDPOINT | URL to POST transcripts to (e.g., your internal API, SharePoint, S3 presigned URL) |
| UPLOAD_API_KEY | Auth token/key for the upload endpoint (optional) |
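A small sketch of loading and validating these variables; which ones are required versus optional follows the table above:

```python
import os

REQUIRED = ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION", "UPLOAD_ENDPOINT"]
OPTIONAL = ["AZURE_CUSTOM_MODEL_ID", "UPLOAD_API_KEY"]

def load_config() -> dict:
    """Read the pipe's env vars; fail fast if a required one is missing."""
    missing = [v for v in REQUIRED if not os.environ.get(v)]
    if missing:
        raise RuntimeError("missing required env vars: " + ", ".join(missing))
    return {v: os.environ.get(v) for v in REQUIRED + OPTIONAL}
```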

Meeting Detection Logic

  1. Sort all audio results by timestamp
  2. Walk through chronologically — if the gap between two consecutive chunks is > 5 minutes, start a new meeting
  3. Each meeting gets: start_time, end_time, list of speakers, source file paths
  4. Skip meetings shorter than 2 minutes (likely false positives)
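The four steps above can be sketched as one small Python function; each chunk is assumed to be a dict carrying an ISO-8601 `timestamp` field (that field name is an assumption about the search results):

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=5)       # step 2: a longer silence starts a new meeting
MIN_SPAN = timedelta(minutes=2)  # step 4: shorter spans are likely false positives

def _ts(chunk: dict) -> datetime:
    return datetime.fromisoformat(chunk["timestamp"])

def group_meetings(chunks: list) -> list:
    """Split chronologically sorted chunks into meetings at gaps > 5 min."""
    chunks = sorted(chunks, key=_ts)
    meetings, current = [], []
    for c in chunks:
        if current and _ts(c) - _ts(current[-1]) > GAP:
            meetings.append(current)
            current = []
        current.append(c)
    if current:
        meetings.append(current)
    return [m for m in meetings if _ts(m[-1]) - _ts(m[0]) >= MIN_SPAN]
```

Each returned group is a list of chunks; start_time, end_time, speakers, and source file paths can then be read off its first and last members.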

Output Format

For each meeting, produce a JSON transcript and a human-readable markdown file:

JSON (for upload):

{
  "meeting_id": "<date>_<start_time>",
  "date": "2025-02-24",
  "start_time": "10:00:00",
  "end_time": "11:15:00",
  "duration_minutes": 75,
  "speakers": ["Alice", "Bob"],
  "source": "azure-speech-custom",
  "transcript": [
    { "speaker": "Alice", "time": "10:00:12", "text": "Let's start with the Q2 roadmap..." },
    { "speaker": "Bob", "time": "10:01:05", "text": "I think we should prioritize..." }
  ],
  "summary": "Discussion about Q2 roadmap priorities..."
}

Markdown (for output/ dir):

# Meeting Transcript — 2025-02-24 10:00 AM

**Duration:** 1h 15m
**Speakers:** Alice, Bob

## Transcript

**[10:00] Alice:** Let's start with the Q2 roadmap...

**[10:01] Bob:** I think we should prioritize...

## Summary

Brief AI-generated summary of the key points discussed.

## Action Items

- [ ] Item extracted from conversation
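One way to render the meeting JSON into the markdown layout above (the Action Items section is omitted here since extraction happens upstream, and times are kept as 24-hour HH:MM rather than AM/PM):

```python
def meeting_to_markdown(m: dict) -> str:
    """Render a meeting dict (JSON format above) as a markdown transcript."""
    hours, minutes = divmod(m["duration_minutes"], 60)
    duration = f"{hours}h {minutes}m" if hours else f"{minutes}m"
    lines = [
        f"# Meeting Transcript — {m['date']} {m['start_time'][:5]}",
        "",
        f"**Duration:** {duration}",
        f"**Speakers:** {', '.join(m['speakers'])}",
        "",
        "## Transcript",
        "",
    ]
    for entry in m["transcript"]:
        lines.append(f"**[{entry['time'][:5]}] {entry['speaker']}:** {entry['text']}")
        lines.append("")
    lines += ["## Summary", "", m.get("summary", "")]
    return "\n".join(lines)
```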

Upload

POST each meeting JSON to the centralized endpoint:

POST <UPLOAD_ENDPOINT>
Authorization: Bearer <UPLOAD_API_KEY>
Content-Type: application/json
Body: <the meeting JSON above>

If upload fails, save the JSON to ./output/failed/ for retry.
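A sketch of the upload-with-fallback behavior; the injectable `poster` argument is illustrative, added so the failure path can be exercised without a network:

```python
import json
import os
import urllib.request

def post_json(endpoint: str, api_key: str, payload: dict) -> None:
    """POST one meeting JSON with an optional bearer token."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    urllib.request.urlopen(urllib.request.Request(
        endpoint, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )).close()

def upload_meeting(meeting: dict, endpoint: str, api_key: str = None,
                   poster=post_json, failed_dir: str = "./output/failed") -> bool:
    """Upload a meeting; on any failure, park the JSON in failed_dir for retry."""
    try:
        poster(endpoint, api_key, meeting)
        return True
    except Exception:
        os.makedirs(failed_dir, exist_ok=True)
        with open(os.path.join(failed_dir, meeting["meeting_id"] + ".json"), "w") as f:
            json.dump(meeting, f, indent=2)
        return False
```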

Rules

  • Process the FULL day — paginate through all results using offset
  • Group audio into meetings by time proximity (5-min gap threshold)
  • Skip very short audio chunks (< 2 min total) — not real meetings
  • Include speaker names when available from screenpipe's speaker diarization
  • Generate a brief AI summary for each meeting (2-3 sentences)
  • Extract action items mentioned in the conversation
  • If Azure API fails, fall back to screenpipe's built-in transcription and note it in the output
  • Write all transcripts to ./output/ as both .json and .md files
  • Never include raw audio file paths in uploaded data (privacy)
  • Redact anything that looks like passwords, API keys, or credentials