How to track S3 downloads with Matomo?

Over the years, I’ve struggled to properly track static content. We host millions of PDFs, and we log the clicks that come from our own website. But our docs sometimes go viral, and when that happens we miss the clicks that come from other websites. We eventually fixed that with the decorator below, which records a hit in Matomo whenever Django serves one of these files:

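import logging
import time
from functools import wraps
from typing import Callable, Optional

import requests
import tldextract
from django.conf import settings
from requests import RequestException

# is_bot is a project-specific helper; import it from wherever it lives in
# your codebase.

logger = logging.getLogger(__name__)


def track_in_matomo(
    original_func: Optional[Callable] = None,
    timeout: float = 0.5,
    check_bots: bool = True,
) -> Callable: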
    """A decorator to track a request in Matomo.

    This decorator is needed on static assets that we want to track because
    those assets can only be tracked by client-side Matomo if they are accessed
    by a user clicking a link in CourtListener itself. If, for example,
    somebody shares a link or otherwise clicks it outside of CourtListener, we
    don't have an opportunity to run our client-side code on that item, and we
    won't be able to track it.

    The code here wraps a view so that when somebody accesses something like a
    PDF from an external site (and only from an external site), we track that
    properly. If people have a CourtListener referer, we ignore them under the
    assumption that they got tracked client-side.

    For the design pattern, see: https://stackoverflow.com/a/24617244/64911

    :param original_func: The function that we're wrapping.
    :param timeout: The amount of time the Matomo tracking request has to
    respond. If it does not respond in this amount of time, we time out and
    move on. Note that timing out can be OK! If the request was already sent,
    it only means that we didn't wait for the response, not that the tracking
    didn't happen. It's not crazy to set this value to a tiny fraction of a
    second and just ignore responses from Matomo.
    :param check_bots: Whether to check bots before hitting Matomo. Matomo
    itself has robust bot detection, so we can rely on that in general, but
    it's generally better to do some basic blocking here too to avoid even
    involving Matomo if we can. Set this to False if you prefer to rely
    exclusively on Matomo's bot detection.
    :returns: The wrapped view, or a decorator if called with arguments. The
    wrapped view returns the result of the original function.
    """

    def _decorate(f: Callable) -> Callable:
        @wraps(f)
        def wrapper(*args, **kwargs):
            t1 = time.time()
            result = f(*args, **kwargs)  # Run the view
            t2 = time.time()

            if settings.DEVELOPMENT:
                # Disable tracking during development.
                return result

            request = args[0]  # Request is always first arg.
            if check_bots and is_bot(request):
                return result

            url = request.build_absolute_uri()
            referer = request.META.get("HTTP_REFERER", "")
            url_domain = tldextract.extract(url)
            ref_domain = tldextract.extract(referer)
            if url_domain == ref_domain:
                # Referer domain is same as current. Don't count b/c it'll be
                # caught by client-side Matomo tracker already.
                return result

            try:
                # See: https://developer.matomo.org/api-reference/tracking-api
                requests.get(
                    settings.MATOMO_URL,
                    timeout=timeout,
                    params={
                        "idsite": settings.MATOMO_SITE_ID,
                        "rec": 1,  # Required but unexplained in docs.
                        "url": url,
                        "download": url,
                        "apiv": 1,
                        "urlref": referer,
                        "ua": request.META.get("HTTP_USER_AGENT", ""),
                        "gt_ms": int((t2 - t1) * 1000),  # Milliseconds
                        "send_image": 0,
                    },
                )
            except RequestException:
                logger.debug(
                    "Matomo tracking request had an error (likely a "
                    "timeout) for URL: %s",
                    url,
                )
            return result

        return wrapper

    if original_func:
        return _decorate(original_func)
    return _decorate

That’s cool! And if we set the timeout to 0.01 seconds, we can ignore the responses from Matomo and handle viral content kind of OK. (It’d be nice to configure Matomo to not even respond to this kind of request, but I digress.)

Alas, it’s time to move to S3, and I’m wondering if there’s a way to translate the above to, perhaps, a Lambda@Edge function that similarly pings Matomo when something is downloaded. This seems like something a lot of people would need, so I’m surprised it doesn’t already exist. Or maybe it does?
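
To make the question concrete, here is a rough sketch of what I imagine such a function could look like, assuming a CloudFront distribution in front of the bucket and a Python handler attached as a viewer-request trigger. The constants (MATOMO_URL, MATOMO_SITE_ID, SITE_DOMAIN, TIMEOUT) and the helper names are placeholders I made up, the bot check is left out, and it uses only the standard library instead of requests. Treat it as an illustration, not a finished solution:

```python
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical values. Lambda@Edge does not support environment variables,
# so in this sketch they are hard-coded.
MATOMO_URL = "https://matomo.example.com/matomo.php"
MATOMO_SITE_ID = 1
SITE_DOMAIN = "example.com"  # Referers from here are tracked client-side.
TIMEOUT = 0.5  # Seconds to wait on Matomo before giving up.


def get_header(request: dict, name: str) -> str:
    """Return the first value of a CloudFront request header, or ''."""
    values = request.get("headers", {}).get(name.lower(), [])
    return values[0]["value"] if values else ""


def build_params(request: dict) -> Optional[dict]:
    """Build Matomo Tracking API params, or None if we should not track."""
    referer = get_header(request, "referer")
    ref_host = urllib.parse.urlparse(referer).hostname or ""
    if ref_host == SITE_DOMAIN or ref_host.endswith("." + SITE_DOMAIN):
        # Came from our own site, so client-side Matomo already counted it.
        return None

    url = f"https://{get_header(request, 'host')}{request['uri']}"
    if request.get("querystring"):
        url += "?" + request["querystring"]
    return {
        "idsite": MATOMO_SITE_ID,
        "rec": 1,
        "url": url,
        "download": url,
        "apiv": 1,
        "urlref": referer,
        "ua": get_header(request, "user-agent"),
        "send_image": 0,
    }


def handler(event, context):
    """Viewer-request handler: ping Matomo, then let the request through."""
    request = event["Records"][0]["cf"]["request"]
    params = build_params(request)
    if params is not None:
        try:
            urllib.request.urlopen(
                MATOMO_URL + "?" + urllib.parse.urlencode(params),
                timeout=TIMEOUT,
            )
        except OSError:
            # Covers URLError and socket timeouts. Never block the download
            # because tracking failed.
            pass
    return request
```

One difference from the decorator: the tracking call here runs before the file is served, so the timeout adds directly to download latency, and there is no timing value to send as gt_ms.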

Are there any canned solutions like this? I saw the log importer, but I know we won’t be diligent about using that at my org.

Anything else?

Thanks,

Mike