TransWikia.com

How to insert logging information without making the code a mess

Software Engineering Asked on October 29, 2021

When working with some complex algorithms, I would like to have a way to keep track of some information for easy debugging. I just need to see sometimes how things are going, and have some easy way to extract data so I don’t have to spend time through a debugger trying to get the same data.

So far, I’ve been printing stuff to stdout for the development, then removed the printing code once the implementation was deemed ready. Once in a while, I still have to look closer to what happens, and that means quite a lot of time spent in the debugger checking all the places where things may go wrong. (of course, there are assertions, but I need to get some quantitative values to understand some issues).

Logging comes to mind. There are some quite powerful logger libraries out there, and there is plenty of customization in logging levels and filtering. Running a piece of software with logging information is easy, and parsing those logs won’t be too hard for my case. The problem remains, what are good practices of inserting logging information?

Consider the following scenario:

step1() {
      Log.verbose("Starting step1");
      //do stuff
      Log.verbose("Step1 part A finished");
      // do more stuff
      Log.debug("Intermediary status in step1);
}

main() {
      Log.info("Started program. Beginning step1");

      step1();

      Log.info("Step1 Completed. Starting step2");
      Log.debug("Step2 parameters ...");

      step2();

      Log.info("Step 2 completed sucessfully");
      Log.debug("Final status");
}

If relevant, I am working in C++.

6 Answers

Counterintuitively, logging can lower readability

You've stumbled on a key issue here. Logging can inject orthogonal incentives for code readability.

When looking at code in general, the more terse a code snippet is (while still doing the same work), the more complex it is to read. So the incentive is to favor uncondensed code.

While you do want to introduce logging, if a large fraction of your code (text characters) pertains to logging, it starts distracting from the business logic (= non-logging code). In order to not distract from the business logic, the incentive becomes to favor condensed logging code.

But logging code is still code, and can be argued to be part of the business logic, e.g. when the requirements explicitly express the need for logging. So which is it, should we condense logging code or keep it uncondensed?

There is no one-size-fits-all answer here. You need to weigh your logging needs (specificity, verbosity, frequency, log levels, ...) against the impact it will have on the readability of your non-logging code.


Logging tips to minimize readability impact

Don't overdo the logging

You need to be sensitive to the amount of log messages you output. Too few and the logging becomes incomplete and debugging will be a guessing game, too much and the logs grow to massive proportions and debugging will be a jungle expedition, on top of the code also having to deal with this additional logging code. In either case, the benefit of having logs is being undercut and you should try to find the balance.

In your code example, you've overdone your logging, as there are two separate methods both logging that they are "starting step 1". That's twice the log messages you need.

I's arguable which one you should keep. I generally favor keeping the meta-information out of a method itself, and have each method only log its internal logic, not its own purpose. Something along the lines of:

reticulateSplines() {
      Log.verbose("Flogging the plumbus");

      // Plumbus flogging logic

      Log.verbose("Porting the amons");

      // Amon porting logic
}

main() {
      Log.info("Started program");
      Log.info("Reticulating splines");

      step1();

      Log.info("Finished reticulating splines");
      Log.info("Program finished");
}

Avoid "finished X" messages where possible

These messages essentially also double the size of your log, and they usually don't really contribute much to the overall purpose of the log.

Generally speaking, when an error is logged and you're investigating it, you're only interested in the logging above the error. In my above example, if an exception was encountered during the "amon porting" logic, then your log should look something like:

Started program
Reticulating splines
Flogging the plumbus
Porting the amons
ERROR - NullReferenceException has been encountered

If every relevant job is prepended with a log message, then you know for a fact that when an error is logged, it occured during the job which was logged right before the error occurred, in this case "Porting the amons".

If there are nesting concerns here, where you want to be able to see that flogging the plumbus and porting the amons are subtasks of the larger spline reticulating logic, you could do that using the "finished XXX" messages to reveal the nested hierarchy. However, in order to cut down on the log size, it would be better if you used other methods of displaying the hierarchy, e.g. using indentation:

Started program  
  Reticulating splines  
    Flogging the plumbus  
    Porting the amons  
    ERROR - NullReferenceException has been encountered

Depending on your log style (flat file, xml/json file, database, ...), the way in which you reveal that hierarchy can change. But I generally suggest avoiding using "Finished XXX" messages when other ways of showing hierarchy are possible.

Doing so retains the hierarchical information without duplicating all the log messages. This will keep your code and your logs cleaner and more to the point.

Often, log messages denote method boundaries

Note that this only applies to log messages in the style of "Doing X".

When you feel like your logic needs to be logged using multiple "doing X" messages, such as flogging the plumbus and porting the amons, that generally implies that these two tasks are separate jobs, and therefore should be abstracted into methods of their own.

This means helps keeping your actual business code from being distracted by its logging. If your subtasks are abstracted into methods of their own, and the task itself merely consists of the orchestration of its subtasks and the logging of those tasks being done, then you're going to find that the logging isn't really obfuscating the business logic anymore since you've separated them.

reticulateSplines() {
      Log.verbose("Flogging the plumbus");
      flog(myPlumbus);

      Log.verbose("Porting the amons");
      port(amons);
}

flog(Plumbus myPlumbus) {
    // Plumbus flogging logic
}

port(Amon[] amons) {
    // Amon porting logic
}

Note that even if you personally favor putting the "Doing X" log method inside the X method itself, which is a valid approach too, that still means your methods will only contain one of those messages, right at the top. The rest of the method body is then not being distracted by having additional log messages interspersed among its logic.

Understand who your reader is

Tailor your log messages to who will be reading them. If this log is intended to be read and interpreted by an end-user, you're going to need to explain a lot more in the log message than when your reader is a developer.

Try to minimize the log message length while keeping the message content clear to your reader.

Cut down on the fluff

Your code example already does this, but I wanted to explicitly point it out anyway, should you use a different approach/library for logging in the future.

By fluff, I mean the logging code that surrounds the log message itself. Using an example from C# and NLog, this is a bad way of doing things:

LogManager.GetCurrentClassLogger().Log("Reticulating splines", LogLevel.Info);

78 characters total, of which only 20 are the log message itself. That's a lot of fluff. It would be a lot better if we cut down on the fluff:

log.Info("Reticulating splines");

13 characters of fluff, 20 message characters. Much more readable. This of course means that you need to declare and instantiate your log variable, but that can be done in another location, as far away from the business logic as possible, so it minimizes the distraction.

You might think that you could further cut down the size by replacing the message itself with an identifier, and then storing a dictionary of log messages which the logger makes use of. E.g.:

log.Info(538); 

While that is technically possible, it's actually overkill to the point of negating your intention. Now, the code has become more complex, and it's not clear anymore which message is being logged. That means that you've just thrown out the documentative nature of your logging, where it can act as both a logging action and a sort of code comment at the same time.

Furthermore, most IDEs will color code strings, and over time you learn to look at the code while ignoring lines whose color coding shows that it's a string literal. For example:

enter image description here

When I look at this code, my mind instinctively skips over the string literals because it has been conditioned to do so with experience.

This is a variation on banner blindness where people learn to very quickly block out part of their perception when that part has historically been proven to not be interesting to them.

If you remove the string literals, you will generally lose out on the ability to mentally filter out those lines, since they look a lot more like your business code now:

enter image description here

Answered by Flater on October 29, 2021

What you're doing isn't too bad IMHO, but a few tips that may help:

make your data types easy to log

Personally, I provide operator<< streaming operators.

As well as the usual streaming operators for custom classes, you can also provide wrappers for alternative formatting, or formatting things you don't feel you should provide an operator<< for (so other code won't pick it up by accident). For example:

template <typename Container>
struct Stream
{
    Stream(const Container& c) : c_{c} { }

    friend std::ostream& operator<<(std::ostream& os, const Stream& x)
    {
        os << "{ ";
        for (const auto& element : s.c_)
            os << element << ' ';
        return os << '}';
    }
};

Then you can easily stream a container - e.g. std::cout << Stream{my_vector};.

Utilise preprocessor macros

Make use of macros to provide richer logging messages from less verbose logging statements (any third party logging library you pick likely already does this). Specifically:

  • macros can internally utilise __FILE__, __LINE__, __func__ etc. to document where in the source code the logging is generated, without having each call to a logging function pass in this information

  • macros can stringify arguments - which can let them do something like log the source code being passed to the macro, while still running it and returning a value to the surrounding code.

There's the obvious kind of stuff that you'd probably get from any third-party logging library, e.g:

enum Severity { Debug, Info, Warn, Error } g_log_threshold;

#define LOG(SEVERITY, MSG) 
    do { 
        if (SEVERITY < g_log_threshold) break; 
        std::cout << __FILE__ << ':' << __LINE__ << " " #SEVERITY << MSG << 'n'; 
    } while (false)

#define LOG_INFO(MSG) LOG(Info, MSG)

Then there's less common stuff that you can implement atop most libraries if you consider it useful:

#define VARS_HLPA(R, VARS, I, ELEM) 
    BOOST_PP_STRINGIZE(BOOST_PP_SEQ_ELEM(I, VARS)) "=" << ELEM << " "
#define VARS(...) BOOST_PP_SEQ_FOR_EACH_I(VARS_HLPA, 
    BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__), 
    BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__)) ""

// WARNING: using GCC Statement Expressions extension - not Standard C++
#define LINFO(EXPR, MSG) 
    ({ auto&& result = (CODE); 
       LOG_INFO(#CODE "=" << result << ": " << MSG); 
       result; })

You can then have code that combines useful processing with logging:

auto x = LINFO(calc_volatility(x, y),
               VARS(x, y) << " any extra info");

Obviously it's made the code more cryptic, so it's a matter of judgement (likely based on how important logging is to the potential user(s) thereof, and whether having automatic logging of bits of source code is useful or overly verbose or cryptic). Something to consider anyway.

Answered by Tony on October 29, 2021

First of all, you should always prefer using a logging framework (or at least some kind of abstraction), so that you can switch between different output formats and targets (stdout, files, ...). For very small applications you may continue logging to stdout, as the output can be piped to a file, too, but using a dedicated logging framework is a much more scalable approach.

You should not log every little detail (every statement in code), as this will both clutter your code and the resulting log output. Don't use logging as a poor mans method for debugging:

foo();
Log.info("foo called");
bar();
Log.info("bar called");
...

The actual idea behind logging is to provide context about what is (really) happening in your code. Usually, any log will only be read when something is not working the way it should. Just ask yourself what information may be important in this situation. For some applications this necessary context may be easy to determine (e.g. the content of a request for a web server), for others rather complicated. Consider logs as one tool to find the source of problems in your code (beside debugging, stack traces, tests).

Use a consistent way to log in your code. Your example code already applies to this rule, as any log statement starts with Log.<level>(...). This way you will soon get used to these statements. Log statements that align with the code in terms of naming may even support readibility by giving additional context not only when reading logs (when the code runs) but also when reading the code.

If you think that your code becomes a mess even with legit log statements, you may check for tools that help by hiding log statements from your code. Doing a quick search I found this extension for Visual Studio Code that allows you to hide lines based on a regular expression. Similar functionality may be available for other editors or IDEs.

Answered by Lukas Körfer on October 29, 2021

For these scenarios, I often find it helpful to setup a function to log debug messages:

//some stuff....
debug(“Thing A happened”);

debug(“Doing Thing B”);
// some more stuff...

Then by setting a flag, I can enable/disable the actual outputting of the messages:

const SHOW_DEBUG = true;

function debug(message) {
  if ( ! SHOW_DEBUG )
    return;
  
  // output message
}

This way you end up leaving the debug messages in the code and they tend to act as additional documentation. It’s trivial to enable/disable them. It’s even possible to tie the constant into you build pipeline so that these are always disabled in production (or otherwise built for release).

Answered by Mazaryk on October 29, 2021

One way of hiding things is to make them so extremely obvious that they can be naturally ignored. E.g.

step1() {
  LOG_TO_DEBUG_CONSOLE("### step1 START");
  //do stuff
  LOG_TO_DEBUG_CONSOLE("### step1 PART A");
  // do more stuff
  LOG_TO_DEBUG_CONSOLE("### step1 FINISH");
}

If the lines are visually similar and unlike the real code, it doesn't take long before you don't notice them anymore.

Answered by Ray Butterworth on October 29, 2021

A logging statement in the middle of the code usually signals a good place to split the method. In your example, the "part A" of step 1 should probably be a method call to step1_partA.

If you can keep your methods small and make it clear what they take as arguments and return, then you can reduce the problem to "how do I log method entry and exit points"? This is usually either done manually - with a call to the logger at the beginning and at the end of the method, which is already much cleaner than logs peppered randomly in the code - or using a more advanced solution such as an aspect programming library.

Answered by Maciej Stachowski on October 29, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP