Python OpenTelemetry Logging Flow - In Depth

This document explains the complete journey of a log from your Python application to Grafana through OpenTelemetry and Loki.

Link to code: https://github.com/grafana/loki-fundamentals/blob/microservice-otel-collector/greenhouse/loggingfw.py

1. Application Initialization

In your application (e.g., bug_service.py):

from loggingfw import CustomLogFW
import logging

# Create custom logging framework instance
logFW = CustomLogFW(service_name='bug_service', instance_id='1')
handler = logFW.setup_logging()
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)  # root logger defaults to WARNING; without this, logging.info() is filtered out

# Later in code...
logging.info(f"Bug triggered in {service_url}")

2. CustomLogFW Setup (loggingfw.py)

Step 2a: Create Resource

self.logger_provider = LoggerProvider(
    resource=Resource.create({
        "service.name": service_name,      # "bug_service"
        "service.instance.id": instance_id  # "1"
    })
)

What this does:

  • Creates metadata that will be attached to EVERY log
  • Resource = attributes that describe the source of telemetry data
  • These become labels in Loki: {service_name="bug_service", service_instance_id="1"}

Step 2b: Set Global Logger Provider

set_logger_provider(self.logger_provider)
  • Registers this as the global OTEL logger provider
  • Any OTEL-aware logging will use this configuration

Step 2c: Create OTLP Exporter

exporter = OTLPLogExporter(
    endpoint="otel-collector:4317",  # gRPC endpoint
    insecure=True                     # No TLS
)

What this does:

  • Creates a gRPC client that will send logs to the OTEL Collector
  • endpoint: DNS name resolves via Docker network to collector container
  • insecure=True: No SSL/TLS encryption (fine for internal Docker network)
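
As an aside, the hard-coded arguments can also come from the standard OTLP environment variables; when OTLPLogExporter() is constructed without arguments it reads these (variable names per the OpenTelemetry exporter spec; exact handling varies slightly by SDK version):

import os
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter

# Standard OTLP env vars, read when no explicit endpoint is passed
os.environ["OTEL_EXPORTER_OTLP_LOGS_ENDPOINT"] = "http://otel-collector:4317"
os.environ["OTEL_EXPORTER_OTLP_INSECURE"] = "true"

exporter = OTLPLogExporter()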

Step 2d: Add Batch Processor

self.logger_provider.add_log_record_processor(
    BatchLogRecordProcessor(exporter)
)

BatchLogRecordProcessor behavior:

  • Batches multiple log records together before sending
  • Default settings:
    • max_queue_size: 2048 logs
    • schedule_delay_millis: 5000ms (5 seconds)
    • max_export_batch_size: 512 logs

Batching logic:

IF (queue has 512 logs) OR (5 seconds passed since last export)
THEN export batch to collector

Benefits:

  • Reduces network overhead (one request per batch vs per log)
  • Better throughput
  • Lower CPU usage
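
If the defaults don't fit (e.g., you want lower latency for a low-volume service), they can be overridden via keyword arguments; the values below are just the documented defaults made explicit:

from opentelemetry.sdk._logs.export import BatchLogRecordProcessor

processor = BatchLogRecordProcessor(
    exporter,                    # the OTLPLogExporter from step 2c
    max_queue_size=2048,         # logs beyond this backlog are dropped
    schedule_delay_millis=5000,  # export at least every 5 seconds
    max_export_batch_size=512,   # or as soon as 512 logs are queued
)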

Step 2e: Create Logging Handler

handler = LoggingHandler(
    level=logging.NOTSET,
    logger_provider=self.logger_provider
)

LoggingHandler bridges Python logging → OTEL:

  • Intercepts Python logging.info(), logging.error(), etc.
  • Converts Python LogRecord → OTEL LogRecord
  • Passes to LoggerProvider for export
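
Putting steps 2a-2e together, here is a minimal, self-contained sketch of CustomLogFW (assembled from the snippets above; the real file is linked at the top, and import paths follow opentelemetry-sdk 1.x):

import logging

from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource

class CustomLogFW:
    def __init__(self, service_name, instance_id):
        # Step 2a: resource attributes attached to every log
        self.logger_provider = LoggerProvider(
            resource=Resource.create({
                "service.name": service_name,
                "service.instance.id": instance_id,
            })
        )
        # Step 2b: register as the global OTEL logger provider
        set_logger_provider(self.logger_provider)

    def setup_logging(self):
        # Step 2c: gRPC exporter pointed at the collector
        exporter = OTLPLogExporter(endpoint="otel-collector:4317", insecure=True)
        # Step 2d: batch records before export
        self.logger_provider.add_log_record_processor(
            BatchLogRecordProcessor(exporter)
        )
        # Step 2e: bridge stdlib logging into OTEL
        return LoggingHandler(level=logging.NOTSET, logger_provider=self.logger_provider)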

3. Log Emission Flow

logging.info(f"Bug triggered in {service_url}")

Step 3a: Python's logging module

1. Creates LogRecord object with:
   - message: "Bug triggered in http://user_service:5001"
   - level: INFO (20)
   - timestamp: 2026-01-22T14:30:15.123Z
   - logger name: root
   - filename, line number, function name

Step 3b: LoggingHandler.emit()

2. Receives LogRecord
3. Converts to OTEL LogRecord format:
   {
     timestamp: 1769092215123000000,  # nanoseconds (2026-01-22T14:30:15.123Z)
     severity_number: 9,               # INFO = 9 in OTEL spec
     severity_text: "INFO",
     body: "Bug triggered in http://user_service:5001",
     attributes: {
       "code.filepath": "bug_service.py",
       "code.lineno": 34,
       "code.function": "bug_mode_worker"
     },
     resource: {
       "service.name": "bug_service",
       "service.instance.id": "1"
     }
   }
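
The severity mapping follows the OpenTelemetry log data model, which places each standard Python level on a 1-24 severity scale:

import logging

# SeverityNumber values from the OTEL log data model; the SDK's
# LoggingHandler derives these from each record's Python level.
PYTHON_TO_OTEL_SEVERITY = {
    logging.DEBUG: 5,      # DEBUG
    logging.INFO: 9,       # INFO  <- the example record above
    logging.WARNING: 13,   # WARN
    logging.ERROR: 17,     # ERROR
    logging.CRITICAL: 21,  # FATAL
}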

Step 3c: BatchLogRecordProcessor

4. Adds LogRecord to internal queue
5. Waits for batch conditions:
   - Queue size >= 512 OR
   - 5 seconds elapsed
6. When triggered, exports batch

Step 3d: OTLPLogExporter

7. Serializes batch to Protobuf format (OTLP protocol)
8. Sends gRPC request to otel-collector:4317
   
   gRPC call:
   Service: opentelemetry.proto.collector.logs.v1.LogsService
   Method: Export
   Payload: ExportLogsServiceRequest (protobuf)
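
One practical consequence of the batching window: a process that exits within 5 seconds of logging can lose its last records. Flushing the provider on exit avoids this; a minimal sketch, assuming the logFW instance from section 1:

import atexit

# shutdown() force-flushes queued log records before the process exits;
# without it, up to 5 seconds of logs can be lost on a clean exit.
atexit.register(logFW.logger_provider.shutdown)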

4. OTEL Collector Reception (otel-config.yaml)

Step 4a: OTLP Receiver

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317  # Listens here

What happens:

  • gRPC server receives ExportLogsServiceRequest
  • Deserializes protobuf → internal data model
  • Passes logs to pipeline

Step 4b: Batch Processor

processors:
  batch:

Default batch processor settings:

  • timeout: 200ms
  • send_batch_size: 8192 records

Why batch again?

  • Collector receives from MULTIPLE services simultaneously
  • Batches all logs together before sending to Loki
  • Further optimization for downstream systems
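
Written out explicitly, the defaults look like this (equivalent to the bare batch: above):

processors:
  batch:
    timeout: 200ms
    send_batch_size: 8192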

Step 4c: OTLP HTTP Exporter

exporters:
  otlphttp/logs:
    endpoint: "http://loki:3100/otlp"
    tls:
      insecure: true

Transformation:

  • Collector converts logs to OTLP/HTTP format (JSON or protobuf)
  • Sends HTTP POST to Loki's OTLP endpoint
  • Loki endpoint: /otlp/v1/logs

HTTP Request to Loki (shown as JSON for readability; the otlphttp exporter sends protobuf by default):

POST http://loki:3100/otlp/v1/logs HTTP/1.1
Content-Type: application/json

{
  "resourceLogs": [{
    "resource": {
      "attributes": [{
        "key": "service.name",
        "value": {"stringValue": "bug_service"}
      }]
    },
    "scopeLogs": [{
      "logRecords": [{
        "timeUnixNano": "1737556215123000000",
        "severityNumber": 9,
        "severityText": "INFO",
        "body": {"stringValue": "Bug triggered in ..."},
        "attributes": [...]
      }]
    }]
  }]
}
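
For completeness: the receiver, processor, and exporter only take effect once wired into a logs pipeline in the collector's service section. A minimal sketch matching the component names above:

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/logs]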

5. Loki Ingestion

Step 5a: OTLP Endpoint

  • Loki receives HTTP POST at /otlp/v1/logs
  • Parses OTLP format

Step 5b: Label Extraction

Loki extracts labels from resource attributes:
- service.name → service_name="bug_service"
- service.instance.id → service_instance_id="1"
- severity → level="INFO"

Step 5c: Log Line Creation

Loki formats log line:
timestamp="2026-01-22T14:30:15.123Z" 
level="INFO" 
service_name="bug_service" 
message="Bug triggered in http://user_service:5001"

Step 5d: Indexing

  • Creates index entry with labels: {service_name="bug_service", level="INFO"}
  • Stores log content in chunks
  • Compresses and writes to storage

6. Grafana Query

Step 6a: User enters LogQL query

{service_name="bug_service"} |= "triggered"

Step 6b: Grafana → Loki HTTP API

GET http://loki:3100/loki/api/v1/query_range?query={service_name="bug_service"}
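
The same API can be called directly; note the LogQL expression must be URL-encoded, which requests handles via params. A small sketch (the limit value is arbitrary):

import requests

resp = requests.get(
    "http://loki:3100/loki/api/v1/query_range",
    params={
        "query": '{service_name="bug_service"} |= "triggered"',
        "limit": 100,
    },
    timeout=10,
)
resp.raise_for_status()
# Each result is a stream (unique label set) with [timestamp, line] pairs
for stream in resp.json()["data"]["result"]:
    print(stream["stream"], len(stream["values"]), "lines")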

Step 6c: Loki processes query

  • Uses label index to find matching log streams
  • Applies filters (|= "triggered")
  • Returns results

Step 6d: Grafana displays

  • Renders logs in UI
  • Shows labels, timestamps, log content

Complete Flow Diagram

┌─────────────────────────────────────────────────────────┐
│ 1. Python Application (bug_service.py)                 │
│    logging.info("Bug triggered")                        │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 2. LoggingHandler (OTEL SDK)                           │
│    - Converts Python LogRecord → OTEL LogRecord        │
│    - Adds Resource attributes                          │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 3. BatchLogRecordProcessor                             │
│    - Queues logs (max 2048)                            │
│    - Batches every 5s or 512 logs                      │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼ gRPC (protobuf)
┌─────────────────────────────────────────────────────────┐
│ 4. OTEL Collector (otel-collector:4317)                │
│    ┌─────────────────────────────────────────────────┐ │
│    │ Receiver: OTLP/gRPC                             │ │
│    │ - Deserializes protobuf                         │ │
│    └───────────────┬─────────────────────────────────┘ │
│                    ▼                                    │
│    ┌─────────────────────────────────────────────────┐ │
│    │ Processor: Batch                                │ │
│    │ - Re-batches from multiple sources              │ │
│    └───────────────┬─────────────────────────────────┘ │
│                    ▼                                    │
│    ┌─────────────────────────────────────────────────┐ │
│    │ Exporter: OTLP/HTTP                             │ │
│    │ - Converts to HTTP/JSON                         │ │
│    └───────────────┬─────────────────────────────────┘ │
└────────────────────┼─────────────────────────────────────┘
                     │
                     ▼ HTTP POST
┌─────────────────────────────────────────────────────────┐
│ 5. Loki (loki:3100/otlp)                               │
│    - Extracts labels from OTLP resources               │
│    - Indexes by labels                                  │
│    - Stores log content                                 │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼ LogQL Query
┌─────────────────────────────────────────────────────────┐
│ 6. Grafana (grafana:3000)                              │
│    - Displays logs                                      │
│    - Filters by labels                                  │
└─────────────────────────────────────────────────────────┘

Key Concepts

Why OTEL Collector in the middle?

  1. Decoupling: Apps don't need to know about Loki
  2. Buffering: Collector handles backpressure if Loki is slow
  3. Processing: Can add/modify attributes, sample, filter
  4. Multiple backends: Can send same logs to Loki, Elasticsearch, S3
  5. Protocol translation: Receives gRPC, sends HTTP

Performance characteristics

  • Batching reduces overhead: 1 request per 512 logs instead of 512 requests
  • gRPC is efficient: binary protocol with HTTP/2 multiplexing
  • Async export: logging doesn't block application code
  • Queue prevents loss: if the collector is down, logs queue up (until the queue is full)

Failure modes

  1. App → Collector fails: logs queue in the BatchLogRecordProcessor (max 2048); once full, the oldest are dropped
  2. Collector → Loki fails: Collector buffers, retries with exponential backoff
  3. Collector crashes: App logs are lost (no persistent queue by default)

To improve reliability

  • Add a persistent queue in the collector (a sketch follows this list)
  • Deploy multiple collector instances
  • Monitor queue sizes and export failures
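
A sketch of the first point, using the file_storage extension (available in the collector-contrib distribution) to back the exporter's sending queue; the directory path is an assumption, adjust to your deployment:

extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage  # assumed path; must exist and be writable

exporters:
  otlphttp/logs:
    endpoint: "http://loki:3100/otlp"
    sending_queue:
      enabled: true
      storage: file_storage  # spool queued batches to disk across restarts

service:
  extensions: [file_storage]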

Summary

The flow optimizes for:

  • Performance: Batching at multiple levels reduces network overhead
  • Reliability: Queuing and buffering handle temporary failures
  • Flexibility: Collector can route to multiple backends
  • Standards: OTLP is vendor-neutral, works with many observability tools

The tradeoff is complexity - more moving parts than writing directly to Loki, but better suited for production microservices architectures.

alik604 commented Jan 23, 2026:

note: logs aren't persisted after a crash, and there isn't a spill-over (buffer/rotating log) for the logging