
@victorherraiz
Last active October 16, 2025 08:25
Effective Logging

Licence: MIT Licence

Providing a strongly opinionated perspective derived from external recommendations and personal experience, this document describes how to implement logging effectively. It includes guidance on best practices, configuration, and conventions. This document is targeted at developers who are responsible for the logging of an application.

Introduction

When we discuss logging, we refer to the process of recording information about the execution of a program. This information can be used for debugging or monitoring purposes. The logging actor should be the application itself, not the framework, the libraries, or the user journeys. The audience should be developers and anyone who can understand and act upon the information logged.

Security

Protect Confidential/Personal Data

Do not log any sensitive customer information (e.g. credentials, confidential/personal data, security tokens). If you must include any sensitive information, use trace level and never enable that level in a production environment, or in any environment that uses actual customer data. Failing to do so will result in data breaches and GDPR infringements, and confidential information will leak into systems that are not designed to protect or store such data. This applies to both frontend and backend systems.

If you are not completely sure whether the data is confidential/personal, do not log it. Some examples:

  • PII (Personally Identifiable Information)

    • Direct identifiers: full name, social security number / national ID, driver’s license number, passport number, personal email address, phone number, home address.

    • Indirect or quasi-identifiers (may not identify a person alone, but can if combined with others): date of birth, ZIP/Postal code, gender, employer, GPS location.

    • Special category personal attributes (per GDPR & similar laws): racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, sexual orientation, gender identity

  • PHI (Protected Health Information): medical history, diagnoses, insurance, lab results

  • FCI (Financial Customer Information): bank account numbers, credit card data, investment info.

Do not trust third-party libraries: do not assume that a library you use will not log sensitive information. Use a safe level (e.g. warn) or disable logging in the library. Some libraries log sensitive information (requests, responses…) even at error level. Open an issue in the library to get it fixed, and disable the library's logging until it is fixed.
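As a sketch of the "safe level" advice, assuming the library logs through java.util.logging (JUL) under the hypothetical logger name com.thirdparty.httpclient, you could raise that logger's threshold programmatically. With Logback or Log4j the same is usually done in the configuration file instead.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ThirdPartyLogging {
    // Hypothetical logger name of a chatty third-party HTTP client.
    static final String LIBRARY_LOGGER = "com.thirdparty.httpclient";

    // Keep a strong reference: java.util.logging holds loggers weakly, so an
    // unreferenced logger (and the level set on it) could be garbage-collected.
    private static final Logger LIBRARY = Logger.getLogger(LIBRARY_LOGGER);

    private ThirdPartyLogging() {}

    static void restrictLibraryLogging() {
        // Only WARNING and SEVERE records from the library and its child loggers pass.
        LIBRARY.setLevel(Level.WARNING);
    }

    static Logger libraryLogger() {
        return LIBRARY;
    }
}
```

Call restrictLibraryLogging() once at startup, before the library is first used; child loggers without their own level inherit the WARNING threshold.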

Protect System/Environment Secrets

Do not log any secret information (e.g. passwords, tokens, keys). As in the previous section, if you must do so, use trace level, and never enable that level in a production environment. Failing to do so will result in leaking system secrets, which could lead to data breaches.
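One way to make accidental logging of secrets harder is a small wrapper type whose toString() never reveals the value. The Secret class below is a hypothetical sketch, not part of any library.

```java
import java.util.Objects;

// Hypothetical wrapper for secrets such as passwords, tokens or keys.
final class Secret {
    private final String value;

    Secret(String value) {
        this.value = Objects.requireNonNull(value);
    }

    // Explicit accessor: code has to ask for the raw value on purpose.
    String reveal() {
        return value;
    }

    // Never print the value, not even its length.
    @Override
    public String toString() {
        return "Secret[****]";
    }
}
```

Accidentally concatenating it, e.g. "token=" + token, then yields token=Secret[****]; code that really needs the value must call reveal() explicitly, which is easy to spot in a review.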

Consider Protecting Data Classes

DTOs, entities and aggregates could have fields with personal or confidential information. Even if they are not logged intentionally, consider overriding the toString() method to prevent accidental data leakage.

We could consider three options:

  • Omit

  • Redact

  • Mask

The recommended option is "omit". It is the simplest and most performant option, and the most resistant to refactoring: fields added later are not exposed unless someone adds them on purpose.

Simple Java example:

User.java
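import java.time.LocalDate;

record User(Integer id, String name, String email, LocalDate birthday) {
    // Omit: only the non-sensitive id is part of the string representation
    @Override
    public String toString() {
        return User.class.getSimpleName() + " {id=" + id + "}";
    }
}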

Using StringJoiner for complex scenarios:

User.java
import java.time.LocalDate;
import java.util.StringJoiner;

record User(Integer id, String name, String email, LocalDate birthday) {
    @Override
    public String toString() {
        // Useful when there are many fields: add only the non-sensitive ones
        return new StringJoiner(", ", User.class.getSimpleName() + "{", "}")
                .add("id=" + id)
                .toString();
    }
}
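For comparison, the "redact" and "mask" options could look like the sketch below. The Customer record and its mask helper are hypothetical; both options add code for every sensitive field, and masking still reveals part of the value.

```java
import java.time.LocalDate;

// Hypothetical record illustrating the "redact" and "mask" options.
record Customer(Integer id, String email, String cardNumber, LocalDate birthday) {

    @Override
    public String toString() {
        return Customer.class.getSimpleName() + "{id=" + id
                // Redact: keep the field name, replace the whole value
                + ", email=[REDACTED]"
                // Mask: keep only the last four characters of the value
                + ", cardNumber=" + mask(cardNumber, 4)
                // birthday is omitted entirely
                + "}";
    }

    // Replaces all but the last `visible` characters with '*'.
    static String mask(String value, int visible) {
        if (value == null) {
            return "null";
        }
        if (value.length() <= visible) {
            return "*".repeat(value.length());
        }
        return "*".repeat(value.length() - visible) + value.substring(value.length() - visible);
    }
}
```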

Logging Versus …

Audit

Do not use logging as an audit replacement. Logging is not a substitute for auditing. Auditing is a separate process that should be implemented differently. Logging is for debugging and monitoring, while auditing is for compliance, security and user activity tracking.

  • They could have different retention policies.

  • They could have different storage systems.

  • They have different access policies: Auditing should be restricted to a few people because it could contain personal and confidential data, while logging should be accessible to a larger number of people.

  • The content is different: Auditing should contain information about user activity, while logging should contain information about system activity.

Traceability

Do not use logging as a traceability replacement. Traceability is a separate process that should be implemented differently. Logging is for debugging and monitoring, while traceability is for tracking user journeys, system interactions, and business processes.

  • They could have different retention policies.

  • They could have different storage systems.

  • Traceability might not have what you are looking for. Usually, and especially in high traffic applications, not all journeys are recorded.

Metrics

Do not use logging as a metrics replacement. Metrics are a separate process that should be implemented differently. Logging is for debugging and monitoring, while metrics are for measuring performance, usage, alerts, and other quantitative aspects of the system.

  • They could have different retention policies.

  • They could have different storage systems: Metrics are usually aggregated and stored in a time-series database, while logging is stored in a log management system.

Performance

Logging can have an impact on performance and can increase costs (e.g. cloud services), especially if it is not implemented correctly. Use logging judiciously and avoid logging too much information. You could design the most performant logging system (async, non-blocking, etc.), but if you log too much information, it will choke the system and degrade performance: once the buffers are full, every log call will either block, behaving synchronously, or be dropped. Every log message has a cost, normally negligible, so use them wisely.
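Part of that cost is building messages that are never written. As a dependency-free sketch using java.util.logging, whose Logger.fine(Supplier<String>) overload only evaluates the supplier when FINE is enabled, the hypothetical expensiveSummary() below runs only when the level allows it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

final class LazyMessageDemo {
    static final Logger LOG = Logger.getLogger(LazyMessageDemo.class.getName());

    // Counts how often the (hypothetical) expensive computation actually runs.
    static final AtomicInteger CALLS = new AtomicInteger();

    private LazyMessageDemo() {}

    static String expensiveSummary() {
        CALLS.incrementAndGet();
        return "summary";
    }

    static void handle() {
        // The lambda is evaluated only if FINE is enabled for this logger.
        LOG.fine(() -> "State: " + expensiveSummary());
    }
}
```

With the logger at INFO, calling handle() never invokes expensiveSummary(); switching the logger to FINE makes it run once per call.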

Log Levels

This section describes the purpose of the different log levels; for framework-specific documentation, see the references section.

  • fatal, critical: Mostly crashes and system failures (e.g. the service fails to start). Not available on every platform, and it usually writes to stderr. After writing at this level, the process usually ends.

  • error: Problems that have an impact on the normal behaviour of the application and could cause problems for customers. Frequently requires immediate action. It should contain as much non-redundant information as possible, but take care not to expose confidential data.

  • warn: Any warning that has no impact on the normal behaviour of the application or customers, but must be amended by the clients or the application itself. It should include the information needed to trace the issue but remain succinct. For example, this level should detect API client validation errors, reverse engineering attempts, circuit breaker open events and retries. Often does not require immediate action.

  • info/config: For system events like initialisations and shutdowns. Use simple sentences. Do not use this level for logging every user interaction; there are other ways to monitor customer journeys, like traceability, audit, metrics, or events. Otherwise, the log storage systems will be inundated with useless information, which could result in increased costs.

  • debug/fine: Any information that helps a developer understand the flow of the application and debug issues. Do not include big chunks of data; use trace level instead.

  • trace/finer/finest: Extended information that helps to debug the application, including big chunks of data. If trace level is not available, use debug for the same purpose, but do not write personal or confidential data.

An application that behaves normally should produce very few messages in production or even none.

Although the info level is no replacement for other solutions and could have large operational costs, here are some examples of when it makes sense to use it:

  • Starting Services

    • "Starting service X on port Y"

    • "Email service started and waiting for events"

  • Shutting down services:

    • "Shutting down service X"

    • "Email service stopped"

  • Configuration Loaded or Reloaded (or any other configuration that is not sensitive):

    • "Production configuration loaded"

    • "Service started with 5 threads"

  • Administrative actions:

    • "Cache forced eviction by Administrator".

    • "New scheduler task created by Administrator"

  • Non-frequent events that are not errors:

    • "Batch job started"

    • "Batch job finished after processing 1000 records" (if there is an error, consider warn or error level instead)

Following these practices will declutter your logs and make it easier to monitor the application and debug errors and warnings.

Examples

REST API

On controller/adapter logs:

  • Responses with HTTP status 5xx should write at least one error in the logs.

  • Responses with HTTP status 4xx should write warn, debug or trace logs. Logs of type warn should be succinct or avoided for high traffic applications. You could use access logs instead.

  • Responses with HTTP status 2xx or 3xx should log at debug or trace at most.

Other layers of the application could produce other logs like error, warn, debug or trace.

As you can see, there is no info level.
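The controller/adapter rules above could be condensed into a small mapping from response status to the highest level used. The LogLevel enum and the highTraffic switch are hypothetical; map them onto your framework's levels and configuration.

```java
// Hypothetical level enum; INFO exists but is deliberately never returned.
enum LogLevel { TRACE, DEBUG, INFO, WARN, ERROR }

final class HttpStatusLogging {
    private HttpStatusLogging() {}

    // Highest level a controller/adapter uses for a response status.
    // highTraffic = true downgrades client errors (rely on access logs instead).
    static LogLevel levelFor(int status, boolean highTraffic) {
        if (status >= 500) {
            return LogLevel.ERROR; // 5xx: at least one error entry
        }
        if (status >= 400) {
            return highTraffic ? LogLevel.DEBUG : LogLevel.WARN; // 4xx: succinct warn, or lower
        }
        return LogLevel.DEBUG; // 2xx/3xx: debug (or trace) at most
    }
}
```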

Libraries/Utilities

If you have control over the library, it should use only debug and trace. Any exception produced inside the library should be thrown and documented, but not logged. The consumer of the library should consider if the exception should be logged or not. The usage and type of the exceptions are out of the scope of this document.

If you do not have control over the library, try to disable logging in the library or use a safe level (e.g. warn).

If the library/framework is used as a runtime (e.g. Spring Boot, Express.js, KoaJS), it could log some data at info level, such as the startup and shutdown of a web server or a database connection pool.
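A sketch of a library method following these rules: it logs only at a debug-equivalent level (FINE in java.util.logging), documents the exception it throws, and leaves the decision to log it to the caller. DurationParser is a hypothetical example.

```java
import java.util.logging.Logger;

// Hypothetical library class: debug-level logging only, failures are thrown, not logged.
final class DurationParser {
    private static final Logger LOG = Logger.getLogger(DurationParser.class.getName());

    private DurationParser() {}

    /**
     * Parses a duration such as "30s" into seconds.
     *
     * @throws IllegalArgumentException if the text is not digits followed by "s";
     *         the caller decides whether that is worth logging.
     */
    static long parseSeconds(String text) {
        LOG.fine(() -> "Parsing duration: " + text);
        if (text == null || !text.matches("\\d+s")) {
            // Documented and thrown, not logged: the consumer owns that decision.
            throw new IllegalArgumentException("Invalid duration: " + text);
        }
        return Long.parseLong(text.substring(0, text.length() - 1));
    }
}
```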

Environment Configuration

Note
Setting, for example, info means that warn, error and fatal are also enabled.
  • Environments that use personal/confidential data or run performance tests: Use info (e.g. Production, Friends and Family, Partners acceptance…)

  • Any other environment: info, debug or trace. Consider costs and enable only relevant loggers. (e.g. Preproduction, Development, QA, Integration, Local)

Some loggers/modules/packages will need adjustments: Instead of info, consider using warn or error for external libraries if they log too much information or sensitive data.
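As a sketch with java.util.logging: a logger set to INFO also lets WARNING and SEVERE through, while a noisy external package can be pinned to WARNING. The logger names and the environment parameter are hypothetical; in practice these levels usually live in configuration such as logback.xml or Spring Boot properties.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class EnvironmentLevels {
    // Hypothetical logger names: your application package and a chatty external library.
    static final Logger APP = Logger.getLogger("org.example.application");
    static final Logger NOISY_LIB = Logger.getLogger("com.thirdparty.metrics");

    private EnvironmentLevels() {}

    // Hypothetical switch: "production" stands for any environment with real customer data.
    static void configure(String environment) {
        boolean realCustomerData = "production".equals(environment);
        // INFO enables INFO, WARNING and SEVERE; FINE additionally enables debug output.
        APP.setLevel(realCustomerData ? Level.INFO : Level.FINE);
        // External library pinned to WARNING in every environment.
        NOISY_LIB.setLevel(Level.WARNING);
    }
}
```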

Structured Logging

Structured logging is a way to log information in a structured format, such as JSON. This allows for easier parsing and analysis of log data by a machine. It is especially useful when working with large amounts of log data or when integrating with log management systems.

But structured logging may not be the best option in local environments, where the log is parsed by humans. In this case, it is better to use a human-readable format.

There are many ways to configure your app to log in different formats in different environments.

There are many formats for structured logging like ECS and Logstash, but you could use your own, as long as it is consistent and machine-readable.
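As an illustration of a custom but consistent format, here is a minimal java.util.logging Formatter that writes one JSON object per line. The field names are illustrative and do not follow ECS or Logstash; stack traces are omitted for brevity.

```java
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

// Minimal one-JSON-object-per-line formatter (illustrative field names).
class JsonLineFormatter extends Formatter {

    @Override
    public String format(LogRecord record) {
        return "{\"timestamp\":\"" + record.getInstant()
                + "\",\"level\":\"" + record.getLevel().getName()
                + "\",\"logger\":\"" + escape(record.getLoggerName())
                + "\",\"message\":\"" + escape(formatMessage(record))
                + "\"}" + System.lineSeparator();
    }

    // Escapes what JSON strings require: quotes, backslashes and control characters.
    static String escape(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }
}
```

Attach it to a handler, e.g. handler.setFormatter(new JsonLineFormatter()), only where a machine reads the logs, and keep the default human-readable formatter for local environments.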

Debugging and Post-Mortem Analysis

There is no such thing as bug-free software. Therefore, every time an error occurs in any environment, it should be logged. Think about the following:

  • What information should be logged to help you to debug the issue? (No personal or confidential data)

  • Is there enough contextual information?

  • Is it clear who is the culprit?

  • Is the level appropriate? (e.g. error or warn)

  • Are we adding too much noise to the log?

This is not a one-time task, but a continuous process. Every time you find an issue, you should review the logging configuration and the log messages to ensure that they are appropriate for debugging and post-mortem analysis.

If, for some reason, you have to enable the debug level in production, you should consider the following:

  • If possible, only enable the debug level for the component that is failing, not for the whole application.

  • Ensure that the debug level of those components does not log personal or confidential data.

  • If possible, restrict the traffic that you want to log, to avoid security or performance issues.
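To restrict debug output to a chosen slice of traffic, one option is a filter that lets debug-level records through only for requests flagged earlier in the call (for example by an internal header). FlaggedRequestFilter and the flagging mechanism are hypothetical, sketched with java.util.logging.

```java
import java.util.logging.Filter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

// Hypothetical filter: FINE and below pass only for explicitly flagged requests.
final class FlaggedRequestFilter implements Filter {
    private static final ThreadLocal<Boolean> DEBUG_THIS_REQUEST =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Called at the start of a request that should be debug-logged.
    static void flagCurrentRequest(boolean enabled) {
        DEBUG_THIS_REQUEST.set(enabled);
    }

    // Must be called when the request ends, because server threads are reused.
    static void clear() {
        DEBUG_THIS_REQUEST.remove();
    }

    @Override
    public boolean isLoggable(LogRecord record) {
        // INFO and above always pass; lower levels only for flagged requests.
        if (record.getLevel().intValue() >= Level.INFO.intValue()) {
            return true;
        }
        return DEBUG_THIS_REQUEST.get();
    }
}
```

Set only the failing component's logger to FINE, install the filter on that logger (or on a handler), and call clear() in a finally block.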

Avoiding Error Logging Cascades

Logging too much too often can lead to a situation where the log system becomes overwhelmed, degrading performance, filling up storage, and making it difficult to find relevant information.

This situation could happen when there is an error in a third-party library/dependency, and every request ends up logging the same error over and over again.

One way to avoid this situation and make the application resilient is to use the circuit breaker pattern. This pattern allows you to detect when a service is failing and stop sending requests to it. This solves not only the logging issue, but also the performance and availability problems that could arise from a failing service. Before writing the log, the code should check whether the circuit breaker is open or not (normally the exception is different when the circuit is open).
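A sketch of checking for an open circuit before choosing the log level. CircuitOpenException and QuoteClient are hypothetical stand-ins: real circuit breaker libraries throw their own exception type when a call is short-circuited, so check your library's documentation. The assumption is that the open event itself is logged once, at warn level, by whatever reacts to the state change.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical exception a circuit breaker throws when it short-circuits a call.
class CircuitOpenException extends RuntimeException {
    CircuitOpenException(String message) {
        super(message);
    }
}

final class QuoteClient {
    private static final Logger LOG = Logger.getLogger(QuoteClient.class.getName());

    private QuoteClient() {}

    // Short-circuited calls only get a debug-level (FINE) entry; real failures are errors.
    static Level levelFor(Throwable failure) {
        return failure instanceof CircuitOpenException ? Level.FINE : Level.SEVERE;
    }

    static String quoteOrFallback(Supplier<String> remoteCall) {
        try {
            return remoteCall.get();
        } catch (RuntimeException e) {
            Level level = levelFor(e);
            if (level == Level.SEVERE) {
                LOG.log(level, "Quote service call failed", e);
            } else {
                // No stack trace: this is the expected outcome of an open circuit.
                LOG.log(level, "Quote service skipped, circuit is open");
            }
            return "N/A";
        }
    }
}
```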

MDC (Mapped Diagnostic Context)

MDC is a mechanism that allows you to add contextual information to log messages as key-value pairs.

Some examples:

  • Actor Information (e.g. user ID, actor type)

  • Current Aggregate ID (e.g. policy ID, account ID)

  • Trace Information (e.g. trace ID, span ID)

As always, do not log personal or confidential data. And remember to clean the MDC after the request is processed; otherwise you could leak information to other requests.

SLF4J and Logback support MDC natively, and you can use it in your application to add contextual information to your log messages:

// Assumes org.slf4j.MDC is imported and `logger` is an SLF4J Logger field
public void processRequest(String userId, String policyId) {
    // Add contextual information to the MDC
    MDC.put("userId", userId);
    MDC.put("policyId", policyId);
    try {
        // Process the request; log entries will include userId and policyId
        logger.debug("Processing request");
    } finally {
        // Clean up the MDC so the values do not leak into other requests on this thread
        MDC.remove("userId");
        MDC.remove("policyId");
    }
}
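To show why the cleanup matters, here is a minimal sketch of what an MDC does under the hood: a per-thread map, scoped with try-with-resources so keys are removed even if the request fails. It is not SLF4J's implementation, and SimpleContext is hypothetical; SLF4J itself offers MDC.putCloseable for a similar style.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class SimpleContext {
    // A close() without checked exceptions keeps try-with-resources simple.
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private SimpleContext() {}

    // Adds a key for the current thread and returns a handle that removes it again.
    static Scope put(String key, String value) {
        CONTEXT.get().put(key, Objects.requireNonNull(value));
        return () -> CONTEXT.get().remove(key);
    }

    // What a formatter would read to append the context to each log line.
    static Map<String, String> snapshot() {
        return Map.copyOf(CONTEXT.get());
    }
}
```

A request handler would open one scope per key, e.g. try (SimpleContext.Scope user = SimpleContext.put("userId", id)) { ... }, so a reused thread starts the next request with an empty context.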

Best practices

  • Avoid superfluous string concatenation when possible. Examples:

    • SLF4J: Instead of logger.trace("Number of bananas: " + bananas.length);, do logger.trace("Number of bananas: {}", bananas.length);

  • If an item requires extensive logic or creating resources, use conditional logging or suppliers. Examples:

    • SLF4J: Use logger.isDebugEnabled() or logger.atDebug()

    • Log4j: Use logger.debug(() -> "Number of bananas: " + someExpensiveCalculations());

  • Add component information: Almost every logging library can create child loggers or add component information to every entry. This helps not only to trace back the component, but also to enable or disable levels per component. Examples:

    • Logback (SLF4J) level for package: <logger name="chapters.configuration" level="INFO"/>

    • Spring Boot (SLF4J) level for package: logging.level.org.example.application.domain=INFO

  • Do not log errors more than once: If you throw an exception, do not log it; use catch or interception mechanisms (e.g. an advisor) to log it only once, and use the stack trace to find the root cause.

  • Banners and logos: We all love a good banner, but they are not useful in production. They are just noise and take up space. Disable them by default, especially in production or when structured logging is enabled.
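A sketch of logging errors only once: the inner layer adds context and rethrows without logging, and only the boundary logs, once, with the whole cause chain in the stack trace. The class and method names are hypothetical, and the Supplier stands in for real data access.

```java
import java.io.UncheckedIOException;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LogOnceDemo {
    private static final Logger LOG = Logger.getLogger(LogOnceDemo.class.getName());

    private LogOnceDemo() {}

    // Hypothetical exception type of the inner (repository) layer.
    static final class RepositoryException extends RuntimeException {
        RepositoryException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Inner layer: no logging, just wrap with context and rethrow.
    static String loadPolicy(String policyId, Supplier<String> dataAccess) {
        try {
            return dataAccess.get();
        } catch (UncheckedIOException e) {
            throw new RepositoryException("Could not load policy " + policyId, e);
        }
    }

    // Boundary (e.g. an advisor): the single place where the error is logged.
    static int handleRequest(String policyId, Supplier<String> dataAccess) {
        try {
            loadPolicy(policyId, dataAccess);
            return 200;
        } catch (RepositoryException e) {
            // The logged stack trace includes the "Caused by" chain down to the root cause.
            LOG.log(Level.SEVERE, "Request failed", e);
            return 500;
        }
    }
}
```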

@jabrena

jabrena commented May 16, 2025

Hi @victorherraiz,

I like the whole document, but I disagree on this particular point:

Libraries/Utilities Example
If you have control over the library, it should use only debug and trace. Any exception produced inside the library should be propagated, but not logged.

If a library raises a runtime exception, for any reason, the propagation rule is not healthy under the modern Java approach to development. The idea is to avoid creating custom business exceptions as a general rule, because exceptions can be considered another kind of GOTO.


So, if a library throws an exception, log it at the origin, maybe at warn level, because the logic was able to handle the exceptional situation and continue the flow in a different way; but try not to propagate it, because propagation is expensive.

In functional environments, Either<L, R> is a good solution; maybe Optional<T> is not enough, and Result<T> is not available in popular FP Java libraries.

Other notes to review are:

  • "Use parameterized logging instead of string concatenation"
  • "Monitor logging impact on application performance"

Finally, the document doesn't mention:

  • regular log audits in the teams to improve the logs (understand what the applications log)
  • OpenTelemetry in the part about structured logging
  • Meaningful log messages (useful messages when problems occur)

@victorherraiz (author)

victorherraiz commented May 21, 2025

@jabrena Thank you for your comments, I will try to address all your points.

  • About exceptions: I do not really like exceptions, I prefer the Optional and Result approach, but you are going to lose some valuable information, for example, the stack trace. Filling the stack trace is a costly operation but will be negligible in most of our applications with plenty of parsing and IO. The actual problem is deciding inside the library what is exceptional for the consumer. An HTTP client library could decide to throw an exception when it gets a 500 status, but maybe the application that uses the library is a monitor of the status of several distributed services and a 500 is expected and not exceptional. But that is a subject out of the scope of this document.
  • "Use parameterized logging instead of string concatenation" is already in the best practices
  • "Monitor logging impact on application performance", I will add a section for performance.
  • "regular log audits in the teams to improve the logs", improving the logs could be done reactively or proactively. I will mention that in best practices
  • "Meaningful log messages (useful messages when problems occur)": same as above, but it is rather difficult to give advice on that. I will mention it in the best practices.

Stay tuned for the next revision!

@jabrena

jabrena commented Jun 15, 2025

Another PR:

Guard expensive log calls.
When building verbose messages at DEBUG or TRACE level, especially those involving method calls or complex string concatenations, wrap them in a level check or use suppliers:

if (logger.isDebugEnabled()) {
    logger.debug("Detailed state: {}", computeExpensiveDetails());
}

// using Supplier/Lambda expression
logger.atDebug()
    .setMessage("Detailed state: {}")
    .addArgument(() -> computeExpensiveDetails())
    .log();
