Application logging and how to effectively use it

Shubham Gupta
Ula Engineering
Published in
11 min readAug 26, 2021

--

In the previous article, we have seen why logging is important and we also got to know about common logging levels and how to use them. Please refer here: Application Logging and its importance

In this article, we will be looking more in-depth about what to log, what not to log, how to log, and what are some of the best practices.

Application logs reveal information on both internal and external events that are visible to the application during its runtime. When a bug, security breach, or anomaly is present in software deployment, application logs are the most helpful and reliable evidence to conduct a proper root cause analysis of the incident. Logs play a crucial role in developers’ investigation of a bug or an outage.

Lack of meaningful logs makes troubleshooting time-consuming or even impossible.

I have seen people wrestle with tools such as trace, or deploy new code to production during an outage just to get more logs to diagnose a problem.

It’s time to get serious about logging

Once you’re working on an application that is not running on your desktop, logging messages (including exceptions) are usually your only lifeline to quickly discovering why something in your app isn’t working correctly. Sure, APM tools can alert you to memory leaks and performance bottlenecks, but generally lack enough detail to help you solve a specific problem, i.e. why can’t this user log in, or why isn’t this record processing?
In a real-world production environment, you usually don’t have the luxury of debugging. And so, logging files can be the only thing you have to go off of when attempting to diagnose an issue that’s not easy to reproduce.

What to Log

We now have a definition of what is logging, and some guidelines for how to use each log level. Now, it is time for the big question: “what should I be logging?” This is not an easy question to answer, and it depends very much on the nature of the application which is being developed.

Incoming and outgoing messages

When components communicate with each other via message passing, both incoming and outgoing messages must be recorded with API endpoint URLs, request parameters, request origin and intermediary IPs, request headers, auth info, request and response bodies, business context, timestamps, and internal processing steps.

Service and function invocations

When a service or function is invoked, it is a good practice to log its context in a lower log level, mostly for debugging purposes (use TRACE or DEBUG). Having these logs helps us investigate issues related to business logic easily, especially when we don’t have the privilege of attaching a debugger to our application (e.g. deployments to testing, staging, pre-prod environments).

System events

System events must capture information on behavior events (startups, stops, restarts, security events), changeover modes (cold, warm, hot), inter-service communication (handshakes, connection building statuses — connect, disconnect, reconnect, retry events), service instance IDs, actively serving APIs, actively listening IP and port ranges, configurations loaded (initial load up and dynamic updates), overall service health, and anything that helps to understand the behavior of the system.

Threats and vulnerabilities

Revealing threats and vulnerabilities with the help of application runtime and log messages is an art that every enterprise software developer must learn. In general, security breaches and catastrophic failures in applications do not happen suddenly. Most of the time, there are some clues that no one notices in the first place. Therefore we must always log suspicious human activities as well as the system behaviors. When a suspicious event is noticed, make sure the logs capture all information related to it, ideally a full-stack trace including parameter values and additional information available from the application context.

Errors and warnings with stack trace

This is one of the most important things that we should consider logging. If any error occurred and we throw the exception or return any kind of error, we should log it properly with some context. We also need to identify what is error vs warning and need to log accordingly. We should also consider logging stack trace so that we can reach the exact location where the error is originated easily. We will talk about this in later sections.

What Not to Log

Avoid Logging Sensitive Information

Sensitive information shouldn’t be present in logs or should be masked. Passwords, credit card numbers, social security numbers, access tokens, and so on — all of that may be dangerous if leaked or accessed by those who shouldn’t see that. There are two things you ought to consider.

Avoid Excessive Logging

Another practice to be avoided is logging too much information. This can happen in an attempt to capture all potentially relevant data.

One possible and very real issue with this approach is decreased performance. However, with the evolution of logging libraries, you now have the tools to make this less of a concern.

Too many log messages can also lead to difficulty in reading a log file and identifying the relevant information when a problem does occur.

One way to reduce the number of log lines of code is by logging important information across cross-cutting concerns in the system.

How to log

Use Meaningful and Developer-Friendly Messages

One of the crucial things when it comes to creating logs, yet one of the not-so-easy ones is using meaningful messages. Your log events should include messages that are unique to the given situation, clearly describe them and inform the person reading them. Imagine a communication error occurred in your application.

Let’s take an example:

LOGGER.warn("Communication error");

But you could also create a message like this:

LOGGER.warn("Error while sending document %s to events Elasticsearch server, response code %d, response message %s. The message sending will be retried.", documentId, responseCode, responseMessage);

You can easily see that the first message will just inform the person that there is some error but there is a lack of information and that makes it difficult to debug and find the exact problem and location.

While the second provides enough information to debug the error properly.

Finally, think about the size and verbosity of the message. Don’t log information that is too verbose.

If your log is meaningful then it’s information, if it’s not then it’s merely a waste of resources.

Logging More Details with Diagnostic Contexts

Errors and failures need best-effort inspections. To make that possible, the application must provide the domain experts of the system with sufficient background information and the business and technology contexts. For instance, if a request or message is failing to process, in addition to the error message, it is very helpful to log the failed request body, too.

Contextual information may be vital when trying to diagnose a fault. unfortunately, these vital last bits of information are often missing. Take, for example, the quintessential logging example which we encounter every day.

[2021-06-29 16:25:34] [ERROR ] An error occurred!

Now we are having logs but there is no meaningful message or any context. We know that an error appeared somewhere in the application. But, we don’t know where it happened, we don’t know what kind of error it was, we only know when it happened. Now look at a message with slightly more contextual information(covering )

[2021-07-07 16:25:34] [main] [ERROR ] com.example.logging.ParsingErrorExample - A parsing error occurred for user with id 1234!

The same log record, but a lot more contextual information. We know the thread in which it happened, we know what class the error was generated at. We modified the message as well to include the user that the error happened for, so we can get back to the user if needed. We could also include additional information like diagnostic contexts. Think about what you need and include it.

Some more examples:

Adding context to #6 and #7 with a relevant message here:

Adding just a little bit of context has helped in identifying that what was the error during the search and updating the delivery date. Otherwise, we will just keep searching for the error.

Logging Stack Traces

The stack trace is helpful in reaching the line which is creating the problem. Stack traces provide a snapshot of the program’s active method calls, allowing you to pinpoint your place in the program’s execution.

The above code will result in an exception being thrown and a log message that will be printed to the console with our default configuration will look as follows:

111:42:18.952 ERROR - Error while executing main thread

As you can see there is not a lot of information there. We only know that the problem occurred, but we don’t know where it happened, or what the problem was, etc. Not very informative.

Now, look at the same code with a slightly modified logging statement:

As you can see, this time we’ve included the exception object itself in our log message:

LOGGER.error("Error while executing main thread", ioe);

That would result in the following error log in the console with our default configuration:

ERROR - Error while executing main thread 2java.io.IOException: This is an I/O error 3    at com.example.logging.Log4JException.main(Log4JException.java:13) [main/:?]

It contains relevant information — i.e. the name of the class, the method where the problem occurred, and finally the line number where the problem happened. Of course, in real-life situations, the stack traces will be longer, but you should include them to give you enough information for proper debugging.

Logging Exceptions

When dealing with exceptions and stack traces you shouldn’t only think about the whole stack trace, the lines where the problem appeared, and so on. You should also think about how not to deal with exceptions.

Avoid silently ignoring exceptions. You don’t want to ignore something important. For example, do not do this:

Also, don’t just log an exception and throw it further. That means that you just pushed the problem up the execution stack. Avoid things like this as well:

Make each log message unique across the system

One mistake that most newbie developers make is to copy-paste the same log message in multiple files while letting the final log aggregate fill with similar log lines coming from multiple different parts of the system. When they do that, it is not easy to trace the exact location that triggered the event in the code. If the wording cannot be changed, at least mention the log source with the log message to make the final log lines look different from each other. Also, if the logging functionality is handled by a parent class, make sure you send an identifier during the initialization and use it when writing log messages for child behaviors.

Attach a traceable event ID/token to messages

Usually, when an event/message enters into the system, a unique ID is assigned to that event/message. This ID will be carried forward throughout the next stages of processing, and it helps us trace the flow of an event throughout the system, especially for debugging, concurrency handling, and data-related operations. Ideally, in a single system, there should be only one unique ID for an event throughout all modules and services.

Service : elasticsearch, Detail: 2021-08-25T04:01:56.212Z, id: 87d4s-n98d9-9a8742jsd error: Order does not exists in Elastic search with id: ODI1234567

Use Appropriate Log Level

When writing your application code think twice about a given log message. Not every bit of information is equally important and not every unexpected situation is an error or a critical message. Also, using the logging levels consistently — information of a similar type should be on a similar severity level.

Based on the deployment environment, the active log level also must be changed. The recommended convention for production environments is to print logs up to INFO level. In other environments, logs will be printed up to either DEBUG or TRACE level, according to the level of granularity preferred by the Dev and SysOps teams.

Please refer to what are different levels and how to use them here:

Please refer to what are different levels and how to use them here: Application Logging and its importance

You really need to understand what needs to be logged and in which mode as you really don’t want to pile up huge logs and have no use of them.

Not all exceptions are errors

A commonly observed paradigm is the following:

Every single time an exception is caught, an ERROR level message is sent to the log. The problem with this is that, quite often, exceptions are anticipated or even expected. With the definition given above, ERROR level messages are intended for those where the user experience is affected or degraded in some way. When catching an exception, think twice before logging it as an ERROR …

For example

2021-07-07T14:17:24.392Z error: Failed to Fetch User details from Redis for userId - XYZ112233

this is actually a warning or a debug statement. Simply if the user is not found in the Redis then we will check in the DB and refill the Redis but still, we logged it as the error is wrong. So it should be changed to either warn or debug.

Don’t Log Too Much or Too Little

As developers, we tend to think that everything might be important. On the other hand, we sometimes do the opposite — we don’t add logging where we should or we log only FATAL and ERROR log levels. Both approaches will not do very well. When writing your code and adding logging, think about what will be important to see if the application is working properly and what will be important to be able to diagnose a wrong application state and fix it. Use this as your guiding light to decide what and where to log. Keep in mind that adding too many logs will end up in information fatigue and not having enough information will result in the inability to troubleshoot.

Why not log everything?

The more information we have at our disposal, the easier it is to diagnose a fault, so the question “why not log everything?” does sound like quite a reasonable one.

There are numerous problems with excessive logging:

  • Lines of logging code are like any other, they take time to write and maintain. Excessive logging is costly.
  • If the application is writing 1000’s of lines to the log each second, performance will be affected.
  • Long and detailed log files full of superfluous information can take a long time to analyse, masking the really important information.

The simple fact is, highly detailed log messages regarding each and every method invoked in turn are almost always unnecessary. With the right level of context information available, it is an easy task to re-trace the program execution — fortunately, computer programs are deterministic!

Conclusion

Considering each and every good practice may not be easy to implement right away, especially for applications that are already live and working in production. But if you take the time and roll the suggestions out one after another you will start seeing an increase in the usefulness of your logs.

--

--