How to use structured logs in Python


Logs standards

Definition of logs in The Twelve-Factor App

Treat logs as event streams Logs provide visibility into the behavior of a running app. (...)

Logs are the stream of aggregated, time-ordered events collected from the output streams of all running processes and backing services. Logs in their raw form are typically a text format with one event per line (though backtraces from exceptions may span multiple lines). Logs have no fixed beginning or end, but flow continuously as long as the app is operating.

Main logging attributes usually are:

  • when - timestamp when event occurred (the best option is to have a timestamp of each log entry)
  • where - source of the event (could be path to file, where event occurred)
  • what - description of the event that occurred (plus type of the event - log level)
  • who (who or what caused this event, for example, user or device identifier)
  • outcome - success or failure of the event

Examples

Let's take a look how would we usually log something in our application:

import logging

my_logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

my_logger.info(
    "Hello Pythonista! Conference name %s, talk name %s, key_id = %s"
    % ("EuroPython", "Can we deploy yet?", "1234")
)

We will get the following log:

INFO:__main__:Hello Pythonista! Conference name EuroPython, talk name Can we deploy yet?, key_id = 1234

How can we improve our logging, make it more readable? My favourite python package is Structlog, which brings structure into our logs. I used it in production and it is very easy to build dashboards to monitor logs, as well as check what actually happened and where.

Take a closer look into the following comparison of standard and structured logger:

import structlog
import logging

logger_structlog = structlog.get_logger()
logger_standard = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

logger_structlog.info(
    "Hello Pythonista!",
    key_id="1234",
    conference_name="EuroPython",
    talk_name="Can we deploy yet?",
)
logger_standard.info(
    "Hello Pythonista! Conference name %s, talk name %s, key_id = %s"
    % ("EuroPython", "Can we deploy yet?", "1234")
)

As a result, we will get:

2020-07-19 22:07.23 Hello Pythonista!              conference_name=EuroPython key_id=1234 talk_name=Can we deploy yet?

INFO:__main__:Hello Pythonista! Conference name EuroPython, talk name Can we deploy yet?, key_id = 1234

First log entry would be from Structlog and the second - from standard logger.

Features of Structlog

Binding

If we have some data, which we want to have in each log message, would be very inconvenient to write the same key-value pairs in every log entry. Structlog offers a very elegant solution to it:

import structlog

logger_structlog = structlog.get_logger(__name__)

logger_structlog = logger_structlog.bind(
    key_id="1234", conference_name="EuroPython", talk_name="Can we deploy yet?"
)
try:
    raise Exception("Oh, something went wrong...")
except Exception:
    logger_structlog.exception("logging exception")

logger_structlog.info("Some info")

Log would look like:

2020-07-19 22:11.07 logging exception              conference_name=EuroPython key_id=1234 talk_name=Can we deploy yet?
Traceback (most recent call last):
  File "my_awesome_code.py", line 9, in <module>
    raise Exception("Oh, something went wrong...")
Exception: Oh, something went wrong...
2020-07-19 22:11.07 Some info                      conference_name=EuroPython key_id=1234 talk_name=Can we deploy yet?

In both log entries we can see the same key-value pairs! Awesome, it really works!

How about formatting logs in json and also adding a custom log level?

No problem with Structlog:

import structlog
import logging

EUROPYTHON = 35
structlog.stdlib.EUROPYTHON = EUROPYTHON
structlog.stdlib._NAME_TO_LEVEL["europython"] = EUROPYTHON
structlog.stdlib._LEVEL_TO_NAME[EUROPYTHON] = "europython"
logging.addLevelName(EUROPYTHON, "europython")


def europython(self, msg, *args, **kw):
    return self.log(EUROPYTHON, msg, *args, **kw)


structlog.stdlib._FixedFindCallerLogger.europython = europython
structlog.stdlib.BoundLogger.europython = europython

structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.UnicodeDecoder(),
        structlog.processors.JSONRenderer(indent=2, sort_keys=True),
    ],
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)


my_logger = structlog.get_logger("test")

my_logger.europython("Hello Pythonista!")
my_logger.info("Hello Pythonista!")
my_logger.error("Hello Pythonista!")

In this case we will get only 2 log entries, as info would be filtered out:

{
  "event": "Hello Pythonista!",
  "level": "europython",
  "logger": "test",
  "timestamp": "2020-07-19T20:14:17.257605Z"
}
{
  "event": "Hello Pythonista!",
  "level": "error",
  "logger": "test",
  "timestamp": "2020-07-19T20:14:17.258269Z"
}

More magic with different processors

Let's take a look into one example from official documentation

import datetime
import logging
import sys
import structlog

# This example is taken from https://www.structlog.org/en/stable/index.html
# For more examples and information, check official documentation of structlog

logging.basicConfig(stream=sys.stdout, format="%(message)s", level=logging.INFO)


def add_timestamp(_, __, event_dict):
    event_dict["timestamp"] = datetime.datetime.utcnow()
    return event_dict


def censor_password(_, __, event_dict):
    pw = event_dict.get("password")
    if pw:
        event_dict["password"] = "*CENSORED*"
    return event_dict


log = structlog.wrap_logger(
    logging.getLogger(__name__),
    processors=[
        add_timestamp,
        censor_password,
        structlog.processors.JSONRenderer(indent=1, sort_keys=True),
    ],
)
log.warning("something", password="secret")

We will get the following:

{
 "event": "something",
 "password": "*CENSORED*",
 "timestamp": "datetime.datetime(2020, 7, 19, 20, 17, 41, 746100)"
}

Tracing with Structlog

We can bind trace_id into each log message and trace it across different services if they have the same logging configuration:

import uuid
import structlog

# This example is taken from https://www.structlog.org/en/stable/index.html
# For more examples and information, check official documentation of structlog


logger = structlog.get_logger()

structlog.configure(
    processors=[
        structlog.processors.KeyValueRenderer(key_order=["event", "trace_id"],),
    ],
    context_class=structlog.threadlocal.wrap_dict(dict),
    logger_factory=structlog.stdlib.LoggerFactory(),
)


log = logger.bind(trace_id=str(uuid.uuid4()))
log.error("user logged in", user="test-user")

As a result we will get:

event='user logged in' trace_id='ad97e42d-a681-44f6-9ec6-fe3e00ecd105' user='test-user'

Used sources and books to read

  1. Talk by Markus Holtermann about Logging and Structlog
  2. Structlog official documentation
  3. Secure Logging Standards
  4. Photo by Markus Winkler on Unsplash