- Metrics: Datadog collects metrics from your infrastructure, applications, and services. Metrics are time-series data points that help you monitor performance and health.
- Logs: Datadog aggregates logs from your systems, applications, and cloud providers. Logs are essential for troubleshooting and understanding system behavior.
- Traces (APM): Application Performance Monitoring (APM) traces help you track requests across distributed systems, providing insights into latency, errors, and bottlenecks.
- Dashboards: Customizable visualizations of metrics, logs, and traces. Dashboards help you monitor key performance indicators (KPIs) in real time.
- Monitors: Alerts that notify you when metrics, logs, or traces deviate from expected thresholds or patterns.
- Service Map: A real-time visualization of the relationships between your services and their dependencies.
- Install the Datadog Agent: The Datadog Agent is lightweight software that runs on your hosts and collects metrics, logs, and traces from your infrastructure and applications. Supported platforms include Kubernetes, Docker, AWS, Azure, GCP, and on-premises servers.
- Integrations: Datadog offers more than 600 integrations, including AWS, Kubernetes, PostgreSQL, and Redis. Configure the integrations that match your stack to collect data from those sources.
- API Keys: Use API keys to authenticate the Datadog Agent and integrations.
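Use tags to organize and filter metrics (e.g., env:prod, region:us-east-1). Monitor key metrics such as CPU, memory, and disk usage and inbound and outbound network bytes; the Agent's system checks report these under names like system.cpu.user, system.mem.used, system.disk.in_use, system.net.bytes_rcvd, and system.net.bytes_sent.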
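
As an illustration of how tags travel with a metric, the sketch below formats a datagram in the DogStatsD text format (name:value|type|@rate|#tags) that the Agent's DogStatsD server accepts, by default on UDP port 8125. The metric names example.queue.depth and example.requests and both helper functions are hypothetical; in practice an official client library would normally build and send these for you.

```python
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None, sample_rate=None):
    """Build a DogStatsD datagram: name:value|type[|@rate][|#tag1,tag2]."""
    parts = [f"{name}:{value}", metric_type]
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        parts.append("#" + ",".join(tags))
    return "|".join(parts)


def send_datagram(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; assumes an Agent is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical gauge carrying the two example tags from the text.
gauge = format_dogstatsd(
    "example.queue.depth", 42, "g", tags=["env:prod", "region:us-east-1"]
)
# Hypothetical counter sampled at 50%.
counter = format_dogstatsd("example.requests", 1, "c", sample_rate=0.5)
```

Calling send_datagram(gauge) would submit the gauge to a locally running Agent. Because the tags ride along with every point, the same env and region values can later be used to filter and group the metric.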
Configure log collection by enabling it in the Datadog Agent (logs_enabled: true in datadog.yaml) and declaring which files, containers, or sockets to read from. Use log processing pipelines to parse, enrich, and filter logs. Set up log-based alerts for specific patterns or anomalies.
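
To make the parse, enrich, and filter stages concrete, here is a minimal local sketch of the same idea in plain Python. It is not how Datadog pipelines are configured (those are defined in the product, not in application code); the line layout, field names, and level ordering are assumptions chosen for the example.

```python
import re

# Assumed line layout: "<timestamp> <LEVEL> <service> <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)
LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


def parse(line):
    """Parse stage: turn a raw line into a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def enrich(event, env):
    """Enrich stage: attach an env attribute to the parsed event."""
    return {**event, "env": env}


def keep(event, min_level="INFO"):
    """Filter stage: drop events below min_level; unknown levels are kept."""
    return LEVELS.get(event["level"], LEVELS["ERROR"]) >= LEVELS[min_level]


def run_pipeline(lines, env, min_level="INFO"):
    events = []
    for line in lines:
        event = parse(line)
        if event is None:
            continue  # unparseable lines are skipped in this sketch
        event = enrich(event, env)
        if keep(event, min_level):
            events.append(event)
    return events


SAMPLE = [
    "2024-05-01T12:00:00Z DEBUG checkout cache miss for cart 17",
    "2024-05-01T12:00:01Z INFO checkout order placed",
    "2024-05-01T12:00:02Z ERROR payments gateway timeout after 3 retries",
    "not a structured line",
]
events = run_pipeline(SAMPLE, env="prod")
```

With the sample input, the DEBUG line is filtered out, the unstructured line is skipped, and the two remaining events carry env set to prod, which is the kind of attribute a log-based alert could then match on.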
Instrument your application code to send traces to Datadog. Monitor latency, error rates, and throughput for services and endpoints. Use flame graphs and trace search to identify performance bottlenecks.
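
Conceptually, instrumentation wraps a unit of work in a span that records its name, duration, and whether it failed. The sketch below shows that idea with the standard library only. It is not the API of Datadog's tracing libraries (for Python, ddtrace), which do this automatically and also propagate trace context between services; the span helper, its fields, and the service names are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def span(name, service, sink):
    """Time a block of work and append a span record to sink."""
    record = {"name": name, "service": service, "error": False}
    start = time.perf_counter()
    try:
        yield record
    except Exception:
        record["error"] = True  # mark the span as failed, then re-raise
        raise
    finally:
        record["duration_s"] = time.perf_counter() - start
        sink.append(record)


def error_rate(spans):
    """Fraction of spans that ended in an error."""
    return sum(s["error"] for s in spans) / len(spans) if spans else 0.0


spans = []

with span("render_cart", "web", spans):
    time.sleep(0.01)  # stand-in for real work

try:
    with span("charge_card", "payments", spans):
        raise TimeoutError("gateway timeout")
except TimeoutError:
    pass  # the caller handles the failure; the span still records it
```

Latency, error rate, and throughput per service or endpoint are then aggregations over records like these.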
Visualize dependencies between services and their health status. Identify upstream and downstream impacts of service issues.
Use pre-built templates or create custom dashboards. Add widgets like time-series graphs, heatmaps, and tables.
Use tags to filter and group data in dashboards. Apply global filters (e.g., environment, region) to view specific subsets of data.
Share dashboards with team members or embed them in external tools.
- Metric Monitors: Trigger alerts based on metric thresholds.
- Log Monitors: Trigger alerts based on log patterns or counts.
- APM Monitors: Trigger alerts based on trace metrics like latency or error rates.
- Synthetic Monitors: Test application availability and performance from external locations.
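
A metric monitor can also be described as data rather than clicked together in the UI. The sketch below builds a monitor definition as a plain dictionary. The query string follows Datadog's metric monitor query syntax; the field names reflect the JSON body of the Monitors API as best understood here and should be checked against the API reference before use. The monitor name, threshold values, tags, and the notification address are placeholders.

```python
import json


def metric_monitor(name, query, message, critical, warning=None, tags=None):
    """Build a metric monitor definition as a JSON-serializable dict."""
    thresholds = {"critical": critical}
    if warning is not None:
        thresholds["warning"] = warning
    return {
        "name": name,
        "type": "metric alert",
        "query": query,
        "message": message,
        "tags": list(tags or []),
        "options": {"thresholds": thresholds},
    }


monitor = metric_monitor(
    name="High CPU on prod hosts",
    # Average of system.cpu.user over the last 5 minutes, prod hosts only.
    query="avg(last_5m):avg:system.cpu.user{env:prod} > 90",
    # Placeholder address; the @ prefix addresses a notification target.
    message="CPU above 90% for 5 minutes. @oncall@example.com",
    critical=90,  # should agree with the value in the query
    warning=80,
    tags=["env:prod", "team:platform"],
)

payload = json.dumps(monitor, indent=2)
```

Keeping definitions like this in version control makes monitors reviewable and reproducible; the serialized payload is what would be sent when creating the monitor through the API or an infrastructure-as-code tool.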
Configure notifications to send alerts to email, Slack, Google Chat, PagerDuty, or other channels. Use escalation policies to ensure critical alerts are addressed promptly.
Link alerts to incidents and track resolution progress.
- Anomaly Detection: Use machine learning to detect anomalies in metrics and logs.
- Forecasting: Predict future trends based on historical data.
- SLOs and SLIs: Define Service Level Objectives (SLOs) and track Service Level Indicators (SLIs) to measure reliability.
- Custom Metrics: Send custom metrics from your applications using Datadog's API or libraries.
- Tagging: Use consistent and meaningful tags across your infrastructure and applications (e.g., env, service, team).
- Granularity: Collect metrics and logs at the appropriate granularity to balance observability and cost.
- Collaboration: Share dashboards, monitors, and insights with your team so that everyone investigates issues from the same data.
- Integration with CI/CD: Integrate Datadog with your CI/CD pipelines to monitor deployments and detect regressions.
- Dashboards: Use pre-configured dashboards for common services like Kubernetes, AWS, and databases. Example: EventBus Observability Dashboards.
- Monitors: Set up monitors for critical metrics and logs. Example: Datadog Monitors.
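
The SLO and SLI item in the list above comes down to a small amount of arithmetic, sketched here for a request-based SLO. The function names and the sample numbers are illustrative only, and the treatment of an empty window (counted as fully compliant) is an assumption.

```python
def sli(good_events, total_events):
    """Service Level Indicator: fraction of events that were good."""
    if total_events == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return good_events / total_events


def error_budget_remaining(good_events, total_events, target):
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_bad = (1.0 - target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else 0.0
    return 1.0 - actual_bad / allowed_bad


# Illustrative window: 1,000,000 requests, 500 failures, 99.9% target.
current_sli = sli(999_500, 1_000_000)
remaining = error_budget_remaining(999_500, 1_000_000, target=0.999)
```

In this example the target allows about 1,000 failed requests, 500 have occurred, so roughly half of the error budget remains. A negative result would mean the objective has been missed for the window.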