Logs standards
Definition of logs in The Twelve-Factor App
Treat logs as event streams Logs provide visibility into the behavior of a running app. (...)
Logs are the stream of aggregated, time-ordered events collected from the output streams of all running processes and backing services. Logs in their raw form are typically a text format with one event per line (though backtraces from exceptions may span multiple lines). Logs have no fixed beginning or end, but flow continuously as long as the app is operating.
Main logging attributes usually are:
- when - timestamp when event occurred (the best option is to have a timestamp of each log entry)
- where - source of the event (could be path to file, where event occurred)
- what - description of the event that occurred (plus type of the event - log level)
- who (who or what caused this event, for example, user or device identifier)
- outcome - success or failure of the event
Examples
Let's take a look how would we usually log something in our application:
import logging
my_logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
my_logger.info(
"Hello Pythonista! Conference name %s, talk name %s, key_id = %s"
% ("EuroPython", "Can we deploy yet?", "1234")
)
We will get the following log:
INFO:__main__:Hello Pythonista! Conference name EuroPython, talk name Can we deploy yet?, key_id = 1234
How can we improve our logging, make it more readable? My favourite python package is Structlog, which brings structure into our logs. I used it in production and it is very easy to build dashboards to monitor logs, as well as check what actually happened and where.
Take a closer look into the following comparison of standard and structured logger:
import structlog
import logging
logger_structlog = structlog.get_logger()
logger_standard = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
logger_structlog.info(
"Hello Pythonista!",
key_id="1234",
conference_name="EuroPython",
talk_name="Can we deploy yet?",
)
logger_standard.info(
"Hello Pythonista! Conference name %s, talk name %s, key_id = %s"
% ("EuroPython", "Can we deploy yet?", "1234")
)
As a result, we will get:
2020-07-19 22:07.23 Hello Pythonista! conference_name=EuroPython key_id=1234 talk_name=Can we deploy yet?
INFO:__main__:Hello Pythonista! Conference name EuroPython, talk name Can we deploy yet?, key_id = 1234
First log entry would be from Structlog and the second - from standard logger.
Features of Structlog
Binding
If we have some data, which we want to have in each log message, would be very inconvenient to write the same key-value pairs in every log entry. Structlog offers a very elegant solution to it:
import structlog
logger_structlog = structlog.get_logger(__name__)
logger_structlog = logger_structlog.bind(
key_id="1234", conference_name="EuroPython", talk_name="Can we deploy yet?"
)
try:
raise Exception("Oh, something went wrong...")
except Exception:
logger_structlog.exception("logging exception")
logger_structlog.info("Some info")
Log would look like:
2020-07-19 22:11.07 logging exception conference_name=EuroPython key_id=1234 talk_name=Can we deploy yet?
Traceback (most recent call last):
File "my_awesome_code.py", line 9, in <module>
raise Exception("Oh, something went wrong...")
Exception: Oh, something went wrong...
2020-07-19 22:11.07 Some info conference_name=EuroPython key_id=1234 talk_name=Can we deploy yet?
In both log entries we can see the same key-value pairs! Awesome, it really works!
How about formatting logs in json and also adding a custom log level?
No problem with Structlog:
import structlog
import logging
EUROPYTHON = 35
structlog.stdlib.EUROPYTHON = EUROPYTHON
structlog.stdlib._NAME_TO_LEVEL["europython"] = EUROPYTHON
structlog.stdlib._LEVEL_TO_NAME[EUROPYTHON] = "europython"
logging.addLevelName(EUROPYTHON, "europython")
def europython(self, msg, *args, **kw):
return self.log(EUROPYTHON, msg, *args, **kw)
structlog.stdlib._FixedFindCallerLogger.europython = europython
structlog.stdlib.BoundLogger.europython = europython
structlog.configure(
processors=[
structlog.stdlib.filter_by_level,
structlog.stdlib.add_logger_name,
structlog.stdlib.add_log_level,
structlog.stdlib.PositionalArgumentsFormatter(),
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
structlog.processors.UnicodeDecoder(),
structlog.processors.JSONRenderer(indent=2, sort_keys=True),
],
context_class=dict,
logger_factory=structlog.stdlib.LoggerFactory(),
wrapper_class=structlog.stdlib.BoundLogger,
cache_logger_on_first_use=True,
)
my_logger = structlog.get_logger("test")
my_logger.europython("Hello Pythonista!")
my_logger.info("Hello Pythonista!")
my_logger.error("Hello Pythonista!")
In this case we will get only 2 log entries, as info would be filtered out:
{
"event": "Hello Pythonista!",
"level": "europython",
"logger": "test",
"timestamp": "2020-07-19T20:14:17.257605Z"
}
{
"event": "Hello Pythonista!",
"level": "error",
"logger": "test",
"timestamp": "2020-07-19T20:14:17.258269Z"
}
More magic with different processors
Let's take a look into one example from official documentation
import datetime
import logging
import sys
import structlog
# This example is taken from https://www.structlog.org/en/stable/index.html
# For more examples and information, check official documentation of structlog
logging.basicConfig(stream=sys.stdout, format="%(message)s", level=logging.INFO)
def add_timestamp(_, __, event_dict):
event_dict["timestamp"] = datetime.datetime.utcnow()
return event_dict
def censor_password(_, __, event_dict):
pw = event_dict.get("password")
if pw:
event_dict["password"] = "*CENSORED*"
return event_dict
log = structlog.wrap_logger(
logging.getLogger(__name__),
processors=[
add_timestamp,
censor_password,
structlog.processors.JSONRenderer(indent=1, sort_keys=True),
],
)
log.warning("something", password="secret")
We will get the following:
{
"event": "something",
"password": "*CENSORED*",
"timestamp": "datetime.datetime(2020, 7, 19, 20, 17, 41, 746100)"
}
Tracing with Structlog
We can bind trace_id
into each log message and trace it across different services if they have the same logging configuration:
import uuid
import structlog
# This example is taken from https://www.structlog.org/en/stable/index.html
# For more examples and information, check official documentation of structlog
logger = structlog.get_logger()
structlog.configure(
processors=[
structlog.processors.KeyValueRenderer(key_order=["event", "trace_id"],),
],
context_class=structlog.threadlocal.wrap_dict(dict),
logger_factory=structlog.stdlib.LoggerFactory(),
)
log = logger.bind(trace_id=str(uuid.uuid4()))
log.error("user logged in", user="test-user")
As a result we will get:
event='user logged in' trace_id='ad97e42d-a681-44f6-9ec6-fe3e00ecd105' user='test-user'