Python Logging with Datadog

At Mergify, we generate a pretty large amount of logs. Every time an event is received from GitHub for a particular pull request, our engine computes a new state for it. Doing so, it logs some informational statements about what it's doing — and any error that might happen.

This information is precious to us. Without proper logging, it'd be utterly impossible for us to debug any issue. As we needed to store and index our logs somewhere, we picked Datadog as our log storage provider.

Datadog offers real-time indexing of our logs. The ability to search our records that fast is compelling as we're able to retrieve log about a GitHub repository or a pull request with a single click.

Python Logging with Datadog
Our custom Datadog log facets

To achieve this result, we had to inject our Python application logs into Datadog. To set up the Python logging mechanism, we rely on daiquiri, a fantastic library I maintained for several years now. Daiquiri leverages the regular Python logging module, making its a no-brainer to set up and offering a few extra features.

We recently added native support for the Datadog agent in daiquiri, making it even more straightforward to log from your Python application.

Enabling log on the Datadog agent

Datadog has extensive documentation on how to configure its agent. This can be summarized to adding logs_enabled: true in your agent configuration. Simple as that.

You then need to create a new source for the agent. The easiest way to connect your application and the Datadog agent is using the TCP socket. Your application will write logs directly to the Datadog agent, which will forward the entries to Datadog backend.

Create a configuration file in conf.d/python.d/conf.yaml with the following content:

init_config:

instances:

logs:
  - type: tcp
    port: 10518
    source: python
    service: <YOUR SERVICE NAME>
    sourcecategory: sourcecode
conf.d/python.d/conf.yaml

Setting up daiquiri

Once this is done, you need to configure your Python application to log to the TCP socket configured in the agent above.

The Datadog agent expects logs in JSON format being sent, which is what daiquiri does for you. Using JSON allows to embed any extra fields to leverage fast search and indexing. As daiquiri provides native handling for extra fields, you'll be able to send those extra fields without trouble.

First, list daiquiri in your application dependency. Then, set up logging in your application this way:

import daiquiri

daiquiri.setup(
  outputs=[
    daiquiri.output.Datadog(),
  ],
  level=logging.INFO,
)

This configuration logs to the default TCP destination localhost:10518 — though you can pass the host and port argument to change that. You can customize the outputs as you wish by checking out daiquiri documentation. For example, you could also include logging to stdout by adding daiquiri.output.Stream(sys.stdout) in the output list.

Using extra

When using daiquiri, you're free to use logging.getLogger to get your regular logging object. However, by using the alternative daiquiri.getLogger function, you're enabling the native use of extra arguments — which is quite handy. That means you can pass any arbitrary key/value to your log call, and see it up being embedded in your log data — up to Datadog.

Here's an example:

import daiquiri

[…]

log = daiquiri.getLogger(__name__)
log.info("User did something important", user=user, request_id=request_id)

The extra keyword argument passed to log.info will be directly shown as attributes in Datadog logs:

Python Logging with Datadog
One of the log line of our Mergify engine

All those attributes can then be used to search or to display custom views. This is really powerful to monitor and debug any kind of service.

Python Logging with Datadog

A log object per object

When passing extra arguments, it is easy to make mistakes and forget some. This especially can happen when your application wants to log information for a particular object.

The best pattern to avoid this is to create a custom log object per object:

import daiquiri

class MyObject:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.log = daiquiri.getLogger("MyObject", x=self.x, y=self.y)

    def do_something(self):
        try:
            self.call_this()
        except Exception:
            self.log.error("Something bad happened")

By using the self.log object as defined above, there's no way for your application to miss some extra fields for an object. All your logs will look in the same style and will end up being indexed correctly in Datadog.

Log Design

The extra arguments from the Python loggers are often dismissed, and many developers stick to logging strings with various information included inside. Having a proper explanation string, plus a few extra key/value pairs that are parsable by machines and humans, is a better way to do logging. Leveraging engines such as Datadog allow to store and query those logs in a snap.

This is way more efficient than trying to parse and grep strings yourselves!