This document explains the complete journey of a log from your Python application to Grafana through OpenTelemetry and Loki.
Link to code: https://github.com/grafana/loki-fundamentals/blob/microservice-otel-collector/greenhouse/loggingfw.py
In your application (e.g., bug_service.py):
from loggingfw import CustomLogFW
import logging
# Create custom logging framework instance
logFW = CustomLogFW(service_name='bug_service', instance_id='1')
handler = logFW.setup_logging()
logging.getLogger().addHandler(handler)
# Later in code...
logging.info(f"Bug triggered in {service_url}")self.logger_provider = LoggerProvider(
resource=Resource.create({
"service.name": service_name, # "bug_service"
"service.instance.id": instance_id # "1"
})
)What this does:
- Creates metadata that will be attached to EVERY log
- Resource = attributes that describe the source of the telemetry data
- These become labels in Loki:
  {service_name="bug_service", service_instance_id="1"}
set_logger_provider(self.logger_provider)

- Registers this as the global OTEL logger provider
- Any OTEL-aware logging will use this configuration
exporter = OTLPLogExporter(
    endpoint="otel-collector:4317",  # gRPC endpoint
    insecure=True                    # No TLS
)

What this does:
- Creates a gRPC client that will send logs to the OTEL Collector
- endpoint: DNS name that resolves via the Docker network to the collector container
- insecure=True: No SSL/TLS encryption (fine for an internal Docker network)
self.logger_provider.add_log_record_processor(
    BatchLogRecordProcessor(exporter)
)

BatchLogRecordProcessor behavior:
- Batches multiple log records together before sending
- Default settings (tunable; see the sketch below):
  - max_queue_size: 2048 logs
  - schedule_delay_millis: 5000 ms (5 seconds)
  - max_export_batch_size: 512 logs
Batching logic:
IF (queue has 512 logs) OR (5 seconds passed since last export)
THEN export batch to collector
Benefits:
- Reduces network overhead (one request per batch vs per log)
- Better throughput
- Lower CPU usage
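If the defaults don't fit your workload, they can be overridden when constructing the processor. A minimal sketch (parameter names follow the opentelemetry-python SDK; check them against the version you have installed):

from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor

exporter = OTLPLogExporter(endpoint="otel-collector:4317", insecure=True)

# Smaller, more frequent batches: lower latency, more requests
processor = BatchLogRecordProcessor(
    exporter,
    max_queue_size=1024,         # queued records beyond this are dropped
    schedule_delay_millis=1000,  # export at least once per second
    max_export_batch_size=256,   # ...or as soon as 256 records are queued
)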
handler = LoggingHandler(
    level=logging.NOTSET,
    logger_provider=self.logger_provider
)

LoggingHandler bridges Python logging → OTEL:
- Intercepts logging.info(), logging.error(), etc.
- Converts Python LogRecord → OTEL LogRecord
- Passes it to the LoggerProvider for export
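Putting the pieces together, a CustomLogFW like the one linked above might look roughly like this. This is a minimal sketch of the pattern, not the exact file from the repo:

import logging

from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource

class CustomLogFW:
    def __init__(self, service_name: str, instance_id: str):
        # Resource attributes become labels in Loki
        self.logger_provider = LoggerProvider(
            resource=Resource.create({
                "service.name": service_name,
                "service.instance.id": instance_id,
            })
        )
        set_logger_provider(self.logger_provider)

    def setup_logging(self) -> LoggingHandler:
        # gRPC exporter pointing at the OTEL Collector
        exporter = OTLPLogExporter(endpoint="otel-collector:4317", insecure=True)
        # Batch log records before export to reduce network overhead
        self.logger_provider.add_log_record_processor(
            BatchLogRecordProcessor(exporter)
        )
        # Bridge the Python logging module into OTEL
        return LoggingHandler(level=logging.NOTSET,
                              logger_provider=self.logger_provider)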
logging.info(f"Bug triggered in {service_url}")1. Creates LogRecord object with:
- message: "Bug triggered in http://user_service:5001"
- level: INFO (20)
- timestamp: 2025-01-22T14:30:15.123Z
- logger name: root
- filename, line number, function name
2. The LoggingHandler receives the LogRecord
3. It converts it to the OTEL LogRecord format:
{
timestamp: 1737556215123000000, # nanoseconds
severity_number: 9, # INFO = 9 in OTEL spec
severity_text: "INFO",
body: "Bug triggered in http://user_service:5001",
attributes: {
"code.filepath": "bug_service.py",
"code.lineno": 34,
"code.function": "bug_mode_worker"
},
resource: {
"service.name": "bug_service",
"service.instance.id": "1"
}
}
4. The BatchLogRecordProcessor adds the OTEL LogRecord to its internal queue
5. It waits for a batch condition:
   - Queue size >= 512 OR
   - 5 seconds elapsed
6. When a condition triggers, it hands the batch to the exporter
7. The OTLP exporter serializes the batch to protobuf (OTLP protocol)
8. It sends a gRPC request to otel-collector:4317
gRPC call:
Service: opentelemetry.proto.collector.logs.v1.LogsService
Method: Export
Payload: ExportLogsServiceRequest (protobuf)
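For reference, the severity_number in step 3 comes from a fixed mapping between Python logging levels and the OTEL log data model. An illustrative sketch of that mapping (the SDK implements this internally):

import logging

# Approximate Python level → OTEL severity number mapping (per the OTEL spec)
PY_TO_OTEL_SEVERITY = {
    logging.DEBUG:    5,   # DEBUG
    logging.INFO:     9,   # INFO
    logging.WARNING:  13,  # WARN
    logging.ERROR:    17,  # ERROR
    logging.CRITICAL: 21,  # FATAL
}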
On the collector side, the OTLP receiver accepts these requests:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317  # Listens here

What happens:
- The gRPC server receives the ExportLogsServiceRequest
- Deserializes protobuf → internal data model
- Passes the logs to the pipeline
processors:
  batch:

Default batch processor settings:
- timeout: 200ms
- send_batch_size: 8192 records
Why batch again?
- Collector receives from MULTIPLE services simultaneously
- Batches all logs together before sending to Loki
- Further optimization for downstream systems
exporters:
  otlphttp/logs:
    endpoint: "http://loki:3100/otlp"
    tls:
      insecure: true

Transformation:
- The collector converts logs to OTLP/HTTP format (JSON or protobuf)
- Sends an HTTP POST to Loki's OTLP endpoint
- Loki endpoint: /otlp/v1/logs
HTTP Request to Loki:
POST http://loki:3100/otlp/v1/logs HTTP/1.1
Content-Type: application/json
{
"resourceLogs": [{
"resource": {
"attributes": [{
"key": "service.name",
"value": {"stringValue": "bug_service"}
}]
},
"scopeLogs": [{
"logRecords": [{
"timeUnixNano": "1737556215123000000",
"severityNumber": 9,
"severityText": "INFO",
"body": {"stringValue": "Bug triggered in ..."},
"attributes": [...]
}]
}]
}]
}

- Loki receives the HTTP POST at /otlp/v1/logs
- Parses the OTLP format
Loki extracts labels from resource attributes:
- service.name → service_name="bug_service"
- service.instance.id → service_instance_id="1"
- severity → level="INFO"
Loki formats log line:
timestamp="2026-01-22T14:30:15.123Z"
level="INFO"
service_name="bug_service"
message="Bug triggered in http://user_service:5001"
- Creates an index entry with labels:
  {service_name="bug_service", level="INFO"}
- Stores log content in chunks
- Compresses and writes to storage
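A quick way to verify that these labels actually arrived is to query Loki's label APIs directly. A minimal sketch using the requests library against Loki's standard HTTP API:

import requests

# List the label names Loki has indexed
labels = requests.get("http://loki:3100/loki/api/v1/labels", timeout=10).json()
print(labels)

# List the values seen for the service_name label
values = requests.get(
    "http://loki:3100/loki/api/v1/label/service_name/values", timeout=10
).json()
print(values)  # should include "bug_service" once logs have flowed through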
{service_name="bug_service"} |= "triggered"
GET http://loki:3100/loki/api/v1/query_range?query={service_name="bug_service"}- Uses label index to find matching log streams
- Applies filters (
|= "triggered") - Returns results
- Renders logs in UI
- Shows labels, timestamps, log content
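The same query API can be called outside Grafana, which is handy for debugging the pipeline. A minimal sketch with the requests library, using the query above:

import requests

resp = requests.get(
    "http://loki:3100/loki/api/v1/query_range",
    params={
        "query": '{service_name="bug_service"} |= "triggered"',
        "limit": 100,
    },
    timeout=10,
)
resp.raise_for_status()

for stream in resp.json()["data"]["result"]:
    print(stream["stream"])             # the label set of this log stream
    for ts, line in stream["values"]:   # [timestamp in ns, log line] pairs
        print(ts, line)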
┌─────────────────────────────────────────────────────────┐
│ 1. Python Application (bug_service.py) │
│ logging.info("Bug triggered") │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 2. LoggingHandler (OTEL SDK) │
│ - Converts Python LogRecord → OTEL LogRecord │
│ - Adds Resource attributes │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 3. BatchLogRecordProcessor │
│ - Queues logs (max 2048) │
│ - Batches every 5s or 512 logs │
└────────────────────┬────────────────────────────────────┘
│
▼ gRPC (protobuf)
┌─────────────────────────────────────────────────────────┐
│ 4. OTEL Collector (otel-collector:4317) │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Receiver: OTLP/gRPC │ │
│ │ - Deserializes protobuf │ │
│ └───────────────┬─────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Processor: Batch │ │
│ │ - Re-batches from multiple sources │ │
│ └───────────────┬─────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Exporter: OTLP/HTTP │ │
│ │ - Converts to HTTP/JSON │ │
│ └───────────────┬─────────────────────────────────┘ │
└────────────────────┼─────────────────────────────────────┘
│
▼ HTTP POST
┌─────────────────────────────────────────────────────────┐
│ 5. Loki (loki:3100/otlp) │
│ - Extracts labels from OTLP resources │
│ - Indexes by labels │
│ - Stores log content │
└────────────────────┬────────────────────────────────────┘
│
▼ LogQL Query
┌─────────────────────────────────────────────────────────┐
│ 6. Grafana (grafana:3000) │
│ - Displays logs │
│ - Filters by labels │
└─────────────────────────────────────────────────────────┘
Why route through the OTEL Collector?
- Decoupling: Apps don't need to know about Loki
- Buffering: Collector handles backpressure if Loki is slow
- Processing: Can add/modify attributes, sample, filter
- Multiple backends: Can send same logs to Loki, Elasticsearch, S3
- Protocol translation: Receives gRPC, sends HTTP
Performance characteristics:
- Batching reduces overhead: 1 request per 512 logs instead of 512 requests
- gRPC efficient: Binary protocol, HTTP/2 multiplexing
- Async export: Logging doesn't block application code
- Queue prevents loss: If collector is down, logs queue up (until queue full)
Failure scenarios:
- App → Collector fails: Logs queue in the BatchLogRecordProcessor (max 2048), then the oldest are dropped
- Collector → Loki fails: Collector buffers, retries with exponential backoff
- Collector crashes: App logs are lost (no persistent queue by default)
Mitigations:
- Add a persistent queue in the collector
- Deploy multiple collector instances
- Monitor queue sizes and export failures
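On the application side, one small safeguard is to flush the logger provider on shutdown so anything still sitting in the batch queue gets exported before the process exits. A minimal sketch, assuming the logFW instance created in the application snippet above:

import atexit

# shutdown() flushes queued log records through the exporter and then
# stops the batch processor; registering it with atexit runs it on
# normal interpreter exit (it won't help on a hard crash).
atexit.register(logFW.logger_provider.shutdown)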
The flow optimizes for:
- Performance: Batching at multiple levels reduces network overhead
- Reliability: Queuing and buffering handle temporary failures
- Flexibility: Collector can route to multiple backends
- Standards: OTLP is vendor-neutral, works with many observability tools
The tradeoff is complexity: more moving parts than writing directly to Loki, but a better fit for production microservices architectures.
Note: logs aren't persisted across a crash, and there is no spillover (disk buffer or rotating log file) for the logging pipeline.