Python OpenTelemetry Logging Flow - In Depth

This document explains the complete journey of a log from your Python application to Grafana through OpenTelemetry and Loki.

Link to code: https://github.com/grafana/loki-fundamentals/blob/microservice-otel-collector/greenhouse/loggingfw.py

1. Application Initialization

In your application (e.g., bug_service.py):

from loggingfw import CustomLogFW
import logging

# Create custom logging framework instance
logFW = CustomLogFW(service_name='bug_service', instance_id='1')
handler = logFW.setup_logging()
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)  # root logger defaults to WARNING; without this, logging.info() is filtered out

# Later in code...
logging.info(f"Bug triggered in {service_url}")

2. CustomLogFW Setup (loggingfw.py)

Step 2a: Create Resource

self.logger_provider = LoggerProvider(
    resource=Resource.create({
        "service.name": service_name,      # "bug_service"
        "service.instance.id": instance_id  # "1"
    })
)

What this does:

  • Creates metadata that will be attached to EVERY log
  • Resource = attributes that describe the source of telemetry data
  • These become labels in Loki: {service_name="bug_service", service_instance_id="1"}

Step 2b: Set Global Logger Provider

set_logger_provider(self.logger_provider)
  • Registers this as the global OTEL logger provider
  • Any OTEL-aware logging will use this configuration

Step 2c: Create OTLP Exporter

exporter = OTLPLogExporter(
    endpoint="otel-collector:4317",  # gRPC endpoint
    insecure=True                     # No TLS
)

What this does:

  • Creates a gRPC client that will send logs to the OTEL Collector
  • endpoint: DNS name resolves via Docker network to collector container
  • insecure=True: No SSL/TLS encryption (fine for internal Docker network)
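
As an aside, the hard-coded arguments can also come from the standard OTLP environment variables; when OTLPLogExporter() is constructed without arguments it reads these (variable names per the OpenTelemetry exporter spec; exact handling varies slightly by SDK version):

import os
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter

# Standard OTLP env vars, read when no explicit endpoint is passed
os.environ["OTEL_EXPORTER_OTLP_LOGS_ENDPOINT"] = "http://otel-collector:4317"
os.environ["OTEL_EXPORTER_OTLP_INSECURE"] = "true"

exporter = OTLPLogExporter()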

Step 2d: Add Batch Processor

self.logger_provider.add_log_record_processor(
    BatchLogRecordProcessor(exporter)
)

BatchLogRecordProcessor behavior:

  • Batches multiple log records together before sending
  • Default settings:
    • max_queue_size: 2048 logs
    • schedule_delay_millis: 5000ms (5 seconds)
    • max_export_batch_size: 512 logs

Batching logic:

IF (queue has 512 logs) OR (5 seconds passed since last export)
THEN export batch to collector

Benefits:

  • Reduces network overhead (one request per batch vs per log)
  • Better throughput
  • Lower CPU usage
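
If the defaults don't fit (e.g., you want lower latency for a low-volume service), they can be overridden via keyword arguments; the values below are just the documented defaults made explicit:

from opentelemetry.sdk._logs.export import BatchLogRecordProcessor

processor = BatchLogRecordProcessor(
    exporter,                    # the OTLPLogExporter from step 2c
    max_queue_size=2048,         # logs beyond this backlog are dropped
    schedule_delay_millis=5000,  # export at least every 5 seconds
    max_export_batch_size=512,   # or as soon as 512 logs are queued
)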

Step 2e: Create Logging Handler

handler = LoggingHandler(
    level=logging.NOTSET,
    logger_provider=self.logger_provider
)

LoggingHandler bridges Python logging → OTEL:

  • Intercepts Python logging.info(), logging.error(), etc.
  • Converts Python LogRecord → OTEL LogRecord
  • Passes to LoggerProvider for export
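
Putting steps 2a-2e together, here is a minimal, self-contained sketch of CustomLogFW (assembled from the snippets above; the real file is linked at the top, and import paths follow opentelemetry-sdk 1.x):

import logging

from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource

class CustomLogFW:
    def __init__(self, service_name, instance_id):
        # Step 2a: resource attributes attached to every log
        self.logger_provider = LoggerProvider(
            resource=Resource.create({
                "service.name": service_name,
                "service.instance.id": instance_id,
            })
        )
        # Step 2b: register as the global OTEL logger provider
        set_logger_provider(self.logger_provider)

    def setup_logging(self):
        # Step 2c: gRPC exporter pointed at the collector
        exporter = OTLPLogExporter(endpoint="otel-collector:4317", insecure=True)
        # Step 2d: batch records before export
        self.logger_provider.add_log_record_processor(
            BatchLogRecordProcessor(exporter)
        )
        # Step 2e: bridge stdlib logging into OTEL
        return LoggingHandler(level=logging.NOTSET, logger_provider=self.logger_provider)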

3. Log Emission Flow

logging.info(f"Bug triggered in {service_url}")

Step 3a: Python's logging module

1. Creates LogRecord object with:
   - message: "Bug triggered in http://user_service:5001"
   - level: INFO (20)
   - timestamp: 2026-01-22T14:30:15.123Z
   - logger name: root
   - filename, line number, function name

Step 3b: LoggingHandler.emit()

2. Receives LogRecord
3. Converts to OTEL LogRecord format:
   {
     timestamp: 1769092215123000000,  # nanoseconds (2026-01-22T14:30:15.123Z)
     severity_number: 9,               # INFO = 9 in OTEL spec
     severity_text: "INFO",
     body: "Bug triggered in http://user_service:5001",
     attributes: {
       "code.filepath": "bug_service.py",
       "code.lineno": 34,
       "code.function": "bug_mode_worker"
     },
     resource: {
       "service.name": "bug_service",
       "service.instance.id": "1"
     }
   }
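
The severity mapping follows the OpenTelemetry log data model, which places each standard Python level on a 1-24 severity scale:

import logging

# SeverityNumber values from the OTEL log data model; the SDK's
# LoggingHandler derives these from each record's Python level.
PYTHON_TO_OTEL_SEVERITY = {
    logging.DEBUG: 5,      # DEBUG
    logging.INFO: 9,       # INFO  <- the example record above
    logging.WARNING: 13,   # WARN
    logging.ERROR: 17,     # ERROR
    logging.CRITICAL: 21,  # FATAL
}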

Step 3c: BatchLogRecordProcessor

4. Adds LogRecord to internal queue
5. Waits for batch conditions:
   - Queue size >= 512 OR
   - 5 seconds elapsed
6. When triggered, exports batch

Step 3d: OTLPLogExporter

7. Serializes batch to Protobuf format (OTLP protocol)
8. Sends gRPC request to otel-collector:4317
   
   gRPC call:
   Service: opentelemetry.proto.collector.logs.v1.LogsService
   Method: Export
   Payload: ExportLogsServiceRequest (protobuf)
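
One practical consequence of the batching window: a process that exits within 5 seconds of logging can lose its last records. Flushing the provider on exit avoids this; a minimal sketch, assuming the logFW instance from section 1:

import atexit

# shutdown() force-flushes queued log records before the process exits;
# without it, up to 5 seconds of logs can be lost on a clean exit.
atexit.register(logFW.logger_provider.shutdown)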

4. OTEL Collector Reception (otel-config.yaml)

Step 4a: OTLP Receiver

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317  # Listens here

What happens:

  • gRPC server receives ExportLogsServiceRequest
  • Deserializes protobuf → internal data model
  • Passes logs to pipeline

Step 4b: Batch Processor

processors:
  batch:

Default batch processor settings:

  • timeout: 200ms
  • send_batch_size: 8192 records

Why batch again?

  • Collector receives from MULTIPLE services simultaneously
  • Batches all logs together before sending to Loki
  • Further optimization for downstream systems
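
Written out explicitly, the defaults look like this (equivalent to the bare batch: above):

processors:
  batch:
    timeout: 200ms
    send_batch_size: 8192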

Step 4c: OTLP HTTP Exporter

exporters:
  otlphttp/logs:
    endpoint: "http://loki:3100/otlp"
    tls:
      insecure: true

Transformation:

  • Collector converts logs to OTLP/HTTP format (JSON or protobuf)
  • Sends HTTP POST to Loki's OTLP endpoint
  • Loki endpoint: /otlp/v1/logs

HTTP Request to Loki (shown as JSON for readability; the otlphttp exporter sends protobuf by default):

POST http://loki:3100/otlp/v1/logs HTTP/1.1
Content-Type: application/json

{
  "resourceLogs": [{
    "resource": {
      "attributes": [{
        "key": "service.name",
        "value": {"stringValue": "bug_service"}
      }]
    },
    "scopeLogs": [{
      "logRecords": [{
        "timeUnixNano": "1737556215123000000",
        "severityNumber": 9,
        "severityText": "INFO",
        "body": {"stringValue": "Bug triggered in ..."},
        "attributes": [...]
      }]
    }]
  }]
}
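
For completeness: the receiver, processor, and exporter only take effect once wired into a logs pipeline in the collector's service section. A minimal sketch matching the component names above:

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/logs]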

5. Loki Ingestion

Step 5a: OTLP Endpoint

  • Loki receives HTTP POST at /otlp/v1/logs
  • Parses OTLP format

Step 5b: Label Extraction

Loki extracts labels from resource attributes:
- service.name → service_name="bug_service"
- service.instance.id → service_instance_id="1"
- severity → level="INFO"

Step 5c: Log Line Creation

Loki formats log line:
timestamp="2026-01-22T14:30:15.123Z" 
level="INFO" 
service_name="bug_service" 
message="Bug triggered in http://user_service:5001"

Step 5d: Indexing

  • Creates index entry with labels: {service_name="bug_service", level="INFO"}
  • Stores log content in chunks
  • Compresses and writes to storage

6. Grafana Query

Step 6a: User enters LogQL query

{service_name="bug_service"} |= "triggered"

Step 6b: Grafana → Loki HTTP API

GET http://loki:3100/loki/api/v1/query_range?query={service_name="bug_service"}
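
The same API can be called directly; note the LogQL expression must be URL-encoded, which requests handles via params. A small sketch (the limit value is arbitrary):

import requests

resp = requests.get(
    "http://loki:3100/loki/api/v1/query_range",
    params={
        "query": '{service_name="bug_service"} |= "triggered"',
        "limit": 100,
    },
    timeout=10,
)
resp.raise_for_status()
# Each result is a stream (unique label set) with [timestamp, line] pairs
for stream in resp.json()["data"]["result"]:
    print(stream["stream"], len(stream["values"]), "lines")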

Step 6c: Loki processes query

  • Uses label index to find matching log streams
  • Applies filters (|= "triggered")
  • Returns results

Step 6d: Grafana displays

  • Renders logs in UI
  • Shows labels, timestamps, log content

Complete Flow Diagram

┌─────────────────────────────────────────────────────────┐
│ 1. Python Application (bug_service.py)                 │
│    logging.info("Bug triggered")                        │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 2. LoggingHandler (OTEL SDK)                           │
│    - Converts Python LogRecord → OTEL LogRecord        │
│    - Adds Resource attributes                          │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 3. BatchLogRecordProcessor                             │
│    - Queues logs (max 2048)                            │
│    - Batches every 5s or 512 logs                      │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼ gRPC (protobuf)
┌─────────────────────────────────────────────────────────┐
│ 4. OTEL Collector (otel-collector:4317)                │
│    ┌─────────────────────────────────────────────────┐ │
│    │ Receiver: OTLP/gRPC                             │ │
│    │ - Deserializes protobuf                         │ │
│    └───────────────┬─────────────────────────────────┘ │
│                    ▼                                    │
│    ┌─────────────────────────────────────────────────┐ │
│    │ Processor: Batch                                │ │
│    │ - Re-batches from multiple sources              │ │
│    └───────────────┬─────────────────────────────────┘ │
│                    ▼                                    │
│    ┌─────────────────────────────────────────────────┐ │
│    │ Exporter: OTLP/HTTP                             │ │
│    │ - Converts to HTTP/JSON                         │ │
│    └───────────────┬─────────────────────────────────┘ │
└────────────────────┼─────────────────────────────────────┘
                     │
                     ▼ HTTP POST
┌─────────────────────────────────────────────────────────┐
│ 5. Loki (loki:3100/otlp)                               │
│    - Extracts labels from OTLP resources               │
│    - Indexes by labels                                  │
│    - Stores log content                                 │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼ LogQL Query
┌─────────────────────────────────────────────────────────┐
│ 6. Grafana (grafana:3000)                              │
│    - Displays logs                                      │
│    - Filters by labels                                  │
└─────────────────────────────────────────────────────────┘

Key Concepts

Why OTEL Collector in the middle?

  1. Decoupling: Apps don't need to know about Loki
  2. Buffering: Collector handles backpressure if Loki is slow
  3. Processing: Can add/modify attributes, sample, filter
  4. Multiple backends: Can send same logs to Loki, Elasticsearch, S3
  5. Protocol translation: Receives gRPC, sends HTTP

Performance characteristics

  • Batching reduces overhead: 1 request per 512 logs instead of 512 requests
  • gRPC is efficient: binary protocol with HTTP/2 multiplexing
  • Async export: logging doesn't block application code
  • Queue prevents loss: if the collector is down, logs queue up (until the queue is full)

Failure modes

  1. App → Collector fails: logs queue in the BatchLogRecordProcessor (max 2048); once full, the oldest are dropped
  2. Collector → Loki fails: Collector buffers, retries with exponential backoff
  3. Collector crashes: App logs are lost (no persistent queue by default)

To improve reliability

  • Add a persistent queue in the collector (a sketch follows this list)
  • Deploy multiple collector instances
  • Monitor queue sizes and export failures
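
A sketch of the first point, using the file_storage extension (available in the collector-contrib distribution) to back the exporter's sending queue; the directory path is an assumption, adjust to your deployment:

extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage  # assumed path; must exist and be writable

exporters:
  otlphttp/logs:
    endpoint: "http://loki:3100/otlp"
    sending_queue:
      enabled: true
      storage: file_storage  # spool queued batches to disk across restarts

service:
  extensions: [file_storage]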

Summary

The flow optimizes for:

  • Performance: Batching at multiple levels reduces network overhead
  • Reliability: Queuing and buffering handle temporary failures
  • Flexibility: Collector can route to multiple backends
  • Standards: OTLP is vendor-neutral, works with many observability tools

The tradeoff is complexity - more moving parts than writing directly to Loki, but better suited for production microservices architectures.

alik604 commented Jan 23, 2026:

note: logs aren't persisted after a crash, and there isn't a spill-over (buffer/rotating log) for the logging