| schedule | enabled |
|---|---|
| daily | true |
Transcribe today's meetings from screenpipe audio recordings using Microsoft Azure Speech-to-Text, then upload the transcripts to a centralized location.
- Query screenpipe for all audio recordings from today (full workday: 8am to 6pm)
- For each audio chunk, collect the transcription text, speaker info, and timestamps
- Group consecutive audio chunks into "meetings" — a meeting is a continuous stretch of audio with gaps no longer than 5 minutes
- For each detected meeting, call the Azure Speech-to-Text API to re-transcribe the source audio file using the custom speech model
- Write each meeting transcript to the output directory AND upload to the centralized endpoint
```
GET http://localhost:3030/search?content_type=audio&start_time=<ISO8601>&end_time=<ISO8601>&limit=200
```

Extra params: `q` (keyword), `speaker_name`, `offset` (pagination).
Full API reference: https://docs.screenpi.pe/llms-full.txt
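The query-and-paginate step can be sketched with the Python standard library. This is a sketch, not the pipe's actual implementation; in particular the `data` envelope key in the response is an assumption, so check the API reference above for the actual shape:

```python
import json
import urllib.parse
import urllib.request

def search_url(start_iso, end_iso, offset=0, limit=200,
               base="http://localhost:3030"):
    """Build a /search URL for today's audio chunks."""
    query = urllib.parse.urlencode({
        "content_type": "audio",
        "start_time": start_iso,
        "end_time": end_iso,
        "limit": limit,
        "offset": offset,
    })
    return f"{base}/search?{query}"

def fetch_audio_chunks(start_iso, end_iso):
    """Page through all results; a short page means we're done."""
    chunks, offset, limit = [], 0, 200
    while True:
        with urllib.request.urlopen(search_url(start_iso, end_iso, offset, limit)) as resp:
            page = json.load(resp).get("data", [])  # "data" key is an assumption
        chunks.extend(page)
        if len(page) < limit:
            return chunks
        offset += limit
```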
Use the Azure Speech REST API to transcribe each meeting's source MP4 file with your custom speech model.
Batch transcription endpoint:
```http
POST https://<REGION>.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions
Ocp-Apim-Subscription-Key: <AZURE_SPEECH_KEY>
Content-Type: application/json

{
  "contentUrls": ["<direct_url_to_audio_file>"],
  "locale": "en-US",
  "displayName": "Meeting <date> <time>",
  "model": {
    "self": "https://<REGION>.api.cognitive.microsoft.com/speechtotext/v3.2/models/<CUSTOM_MODEL_ID>"
  },
  "properties": {
    "diarizationEnabled": true,
    "wordLevelTimestampsEnabled": true,
    "punctuationMode": "DictatedAndAutomatic"
  }
}
```
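The request above can be assembled and submitted roughly like this (a stdlib-only sketch; env-var names match the table below, and polling the returned `self` URL until the job succeeds is elided):

```python
import json
import os
import urllib.request

API = "https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2"

def batch_payload(content_url, display_name, region, custom_model_id=None):
    """Build the batch-transcription request body."""
    body = {
        "contentUrls": [content_url],
        "locale": "en-US",
        "displayName": display_name,
        "properties": {
            "diarizationEnabled": True,
            "wordLevelTimestampsEnabled": True,
            "punctuationMode": "DictatedAndAutomatic",
        },
    }
    if custom_model_id:  # omit "model" entirely to use Azure's default base model
        body["model"] = {
            "self": f"{API.format(region=region)}/models/{custom_model_id}"
        }
    return body

def submit_transcription(content_url, display_name):
    region = os.environ["AZURE_SPEECH_REGION"]
    payload = batch_payload(content_url, display_name, region,
                            os.environ.get("AZURE_CUSTOM_MODEL_ID"))
    req = urllib.request.Request(
        f"{API.format(region=region)}/transcriptions",
        data=json.dumps(payload).encode(),
        headers={"Ocp-Apim-Subscription-Key": os.environ["AZURE_SPEECH_KEY"],
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # includes a "self" URL to poll for job status
```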
If batch transcription is too slow or the audio files are local-only (not accessible via URL), use the real-time REST API instead — read the MP4 file from disk, convert to WAV, and POST to:
```http
POST https://<REGION>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US
Ocp-Apim-Subscription-Key: <AZURE_SPEECH_KEY>
Content-Type: audio/wav
```
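A sketch of the fallback path, assuming `ffmpeg` is on the PATH. Note that this short-audio endpoint only accepts brief clips (on the order of 60 seconds), so a full meeting would need to be split before using it:

```python
import os
import subprocess
import urllib.request

def mp4_to_wav(src, dst):
    """Convert MP4 audio to 16 kHz mono PCM WAV (assumes ffmpeg on PATH)."""
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst],
                   check=True)

def realtime_url(region, language="en-US"):
    return (f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
            f"conversation/cognitiveservices/v1?language={language}")

def transcribe_short_clip(wav_path):
    """POST a short WAV clip; the JSON response carries the recognized text."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    req = urllib.request.Request(
        realtime_url(os.environ["AZURE_SPEECH_REGION"]),
        data=audio,
        headers={"Ocp-Apim-Subscription-Key": os.environ["AZURE_SPEECH_KEY"],
                 "Content-Type": "audio/wav"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```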
The pipe needs these environment variables (set them in screenpipe settings or as system env vars):
| Variable | Description |
|---|---|
| AZURE_SPEECH_KEY | Azure Cognitive Services Speech API key |
| AZURE_SPEECH_REGION | Azure region (e.g., eastus, westeurope) |
| AZURE_CUSTOM_MODEL_ID | Custom speech model ID (optional, omit for default model) |
| UPLOAD_ENDPOINT | URL to POST transcripts to (e.g., your internal API, SharePoint, S3 presigned URL) |
| UPLOAD_API_KEY | Auth token/key for the upload endpoint (optional) |
- Sort all audio results by timestamp
- Walk through chronologically — if the gap between two consecutive chunks is > 5 minutes, start a new meeting
- Each meeting gets: start_time, end_time, list of speakers, source file paths
- Skip meetings shorter than 2 minutes (likely false positives)
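The grouping rules above can be expressed as a pure function. The chunk field names (`timestamp`, `speaker_name`, `file_path`) are assumptions about screenpipe's response shape:

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=5)
MIN_LENGTH = timedelta(minutes=2)

def group_meetings(chunks, gap=GAP, min_length=MIN_LENGTH):
    """Group audio chunks into meetings by time proximity."""
    chunks = sorted(chunks, key=lambda c: c["timestamp"])
    groups, current = [], []
    for chunk in chunks:
        t = datetime.fromisoformat(chunk["timestamp"])
        if current:
            prev = datetime.fromisoformat(current[-1]["timestamp"])
            if t - prev > gap:      # gap too long: close the current meeting
                groups.append(current)
                current = []
        current.append(chunk)
    if current:
        groups.append(current)

    meetings = []
    for g in groups:
        start = datetime.fromisoformat(g[0]["timestamp"])
        end = datetime.fromisoformat(g[-1]["timestamp"])
        if end - start < min_length:  # likely a false positive, skip
            continue
        meetings.append({
            "start_time": start,
            "end_time": end,
            "speakers": sorted({c["speaker_name"] for c in g if c.get("speaker_name")}),
            "files": sorted({c["file_path"] for c in g if c.get("file_path")}),
        })
    return meetings
```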
For each meeting, produce a JSON transcript and a human-readable markdown file:
JSON (for upload):
```json
{
  "meeting_id": "<date>_<start_time>",
  "date": "2025-02-24",
  "start_time": "10:00:00",
  "end_time": "11:15:00",
  "duration_minutes": 75,
  "speakers": ["Alice", "Bob"],
  "source": "azure-speech-custom",
  "transcript": [
    { "speaker": "Alice", "time": "10:00:12", "text": "Let's start with the Q2 roadmap..." },
    { "speaker": "Bob", "time": "10:01:05", "text": "I think we should prioritize..." }
  ],
  "summary": "Discussion about Q2 roadmap priorities..."
}
```

Markdown (for the `output/` dir):
```markdown
# Meeting Transcript — 2025-02-24 10:00 AM

**Duration:** 1h 15m
**Speakers:** Alice, Bob

## Transcript

**[10:00] Alice:** Let's start with the Q2 roadmap...
**[10:01] Bob:** I think we should prioritize...

## Summary

Brief AI-generated summary of the key points discussed.

## Action Items

- [ ] Item extracted from conversation
```

POST each meeting JSON to the centralized endpoint:
```http
POST <UPLOAD_ENDPOINT>
Authorization: Bearer <UPLOAD_API_KEY>
Content-Type: application/json

<the meeting JSON above>
```
If upload fails, save the JSON to ./output/failed/ for retry.
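One way to implement the upload-with-fallback step (a sketch; everything beyond "POST the JSON, save failures to `./output/failed/`" is a design choice, not a requirement):

```python
import json
import os
import urllib.error
import urllib.request

def upload_meeting(meeting, endpoint, api_key=None, failed_dir="./output/failed"):
    """POST one meeting JSON; on failure, save it for a later retry."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(endpoint, data=json.dumps(meeting).encode(),
                                 headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        os.makedirs(failed_dir, exist_ok=True)
        path = os.path.join(failed_dir, f"{meeting['meeting_id']}.json")
        with open(path, "w") as f:
            json.dump(meeting, f, indent=2)
        return False
```

A retry pass can then simply re-read `./output/failed/*.json` and call `upload_meeting` again.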
- Process the FULL day — paginate through all results using `offset`
- Group audio into meetings by time proximity (5-min gap threshold)
- Skip very short audio chunks (< 2 min total) — not real meetings
- Include speaker names when available from screenpipe's speaker diarization
- Generate a brief AI summary for each meeting (2-3 sentences)
- Extract action items mentioned in the conversation
- If the Azure API fails, fall back to screenpipe's built-in transcription and note it in the output
- Write all transcripts to `./output/` as both `.json` and `.md` files
- Never include raw audio file paths in uploaded data (privacy)
- Redact anything that looks like passwords, API keys, or credentials
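A naive pattern-based redaction pass for the last rule (a sketch; these regexes are illustrative, not exhaustive, and will miss secrets that don't match common shapes):

```python
import re

SECRET_PATTERNS = [
    # "password: hunter2", "pwd=..." style assignments
    re.compile(r"(?i)\b(password|passwd|pwd)\s*[:=]\s*\S+"),
    # "api_key=...", "token: ...", "secret=..." style assignments
    re.compile(r"(?i)\b(api[_-]?key|token|secret)\s*[:=]\s*\S+"),
    # common vendor key prefixes followed by a long random suffix
    re.compile(r"\b(sk|ghp|xox[bp])_[A-Za-z0-9]{16,}\b"),
]

def redact(text):
    """Replace anything that looks like a credential with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Run this over every transcript segment's `text` field before writing or uploading.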