Over the years, I’ve struggled to track static content properly. We host millions of PDFs, and we log clicks made on our own website, but our documents sometimes go viral, and when that happens we miss the clicks coming from other websites. We eventually fixed that with the decorator below, which tracks downloads in Matomo whenever Django serves them:
```python
import logging
import time
from functools import wraps
from typing import Callable, Optional

import requests
import tldextract
from django.conf import settings
from requests.exceptions import RequestException

# Project-specific bot check from our own codebase (import path assumed).
from cl.lib.bot_detector import is_bot

logger = logging.getLogger(__name__)


def track_in_matomo(
    original_func: Optional[Callable] = None,
    timeout: float = 0.5,
    check_bots: bool = True,
) -> Callable:
    """A decorator to track a request in Matomo.

    This decorator is needed on static assets that we want to track because
    those assets can only be tracked by client-side Matomo if they are accessed
    by a user clicking a link in CourtListener itself. If, for example,
    somebody shares a link or otherwise clicks it outside of CourtListener, we
    don't have an opportunity to run our client-side code on that item, and we
    won't be able to track it.

    The code here wraps a view so that when somebody accesses something like a
    PDF from an external site (and only from an external site), we track that
    properly. If people have a CourtListener referer, we ignore them under the
    assumption that they got tracked client-side.

    For the design pattern, see: https://stackoverflow.com/a/24617244/64911

    :param original_func: The function that we're wrapping.
    :param timeout: The amount of time the Matomo tracking request has to
    respond. If it does not respond in this amount of time, we time out and
    move on. Note that timing out can be OK! It only means that we didn't wait
    for the response, not that the tracking didn't happen. It's not crazy to
    set this value to a tiny fraction of a second and just ignore responses
    from Matomo.
    :param check_bots: Whether to check bots before hitting Matomo. Matomo
    itself has robust bot detection, so we can rely on that in general, but
    it's generally better to do some basic blocking here too to avoid even
    involving Matomo if we can. Set this to False if you prefer to rely
    exclusively on Matomo's bot detection.
    :returns: the result of the wrapped function
    """

    def _decorate(f: Callable) -> Callable:
        @wraps(f)
        def wrapper(*args, **kwargs):
            t1 = time.time()
            result = f(*args, **kwargs)  # Run the view
            t2 = time.time()

            if settings.DEVELOPMENT:
                # Disable tracking during development.
                return result

            request = args[0]  # Request is always first arg.
            if check_bots and is_bot(request):
                return result

            url = request.build_absolute_uri()
            referer = request.META.get("HTTP_REFERER", "")
            url_domain = tldextract.extract(url)
            ref_domain = tldextract.extract(referer)
            if url_domain == ref_domain:
                # Referer domain is same as current. Don't count b/c it'll be
                # caught by client-side Matomo tracker already.
                return result

            try:
                # See: https://developer.matomo.org/api-reference/tracking-api
                requests.get(
                    settings.MATOMO_URL,
                    timeout=timeout,
                    params={
                        "idsite": settings.MATOMO_SITE_ID,
                        "rec": 1,  # Required but unexplained in docs.
                        "url": url,
                        "download": url,
                        "apiv": 1,
                        "urlref": referer,
                        "ua": request.META.get("HTTP_USER_AGENT", ""),
                        "gt_ms": int((t2 - t1) * 1000),  # Milliseconds
                        "send_image": 0,
                    },
                )
            except RequestException:
                logger.debug(
                    "Matomo tracking request failed (likely a timeout) "
                    "for URL: %s",
                    url,
                )
            return result

        return wrapper

    if original_func:
        return _decorate(original_func)
    return _decorate
```
That’s cool! And with a timeout of 0.01 seconds we can ignore the responses from Matomo and handle viral content reasonably well. (It’d be nice to configure Matomo not to respond to this kind of request at all, but I digress.)
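As an alternative to a tiny timeout, the hit could be sent from a background thread so the view returns immediately while the tracking request still gets a generous timeout. This is a minimal stdlib sketch, not what the decorator above does: `send_tracking_hit` and `fire_and_forget` are hypothetical names, and it uses `urllib` instead of `requests` only to stay self-contained.

```python
import logging
import threading
import urllib.parse
import urllib.request
from typing import Any, Callable, Mapping

logger = logging.getLogger(__name__)


def send_tracking_hit(
    matomo_url: str, params: Mapping[str, Any], timeout: float = 2.0
) -> None:
    """Send one Matomo Tracking API hit and swallow network errors."""
    full_url = f"{matomo_url}?{urllib.parse.urlencode(params)}"
    try:
        with urllib.request.urlopen(full_url, timeout=timeout) as resp:
            resp.read()
    except OSError:
        # URLError, HTTPError and socket timeouts all subclass OSError.
        logger.debug("Matomo tracking hit failed for %s", params.get("url"))


def fire_and_forget(func: Callable[..., Any], *args: Any, **kwargs: Any) -> threading.Thread:
    """Run func in a daemon thread so the caller never waits on it."""
    thread = threading.Thread(target=func, args=args, kwargs=kwargs, daemon=True)
    thread.start()
    return thread


# Hypothetical usage inside the decorator, replacing the blocking requests.get():
# fire_and_forget(send_tracking_hit, settings.MATOMO_URL, params)
```

The trade-off: a daemon thread is killed if the worker process exits first, so a few hits may be lost, which is similar in spirit to ignoring timed-out responses.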
Alas, it’s time to move to S3, and I’m wondering whether there’s a way to translate the above into, perhaps, a Lambda@Edge function that similarly pings Matomo when something is downloaded. This seems like something a lot of people would need, so I’m surprised it doesn’t already exist. Or maybe it does?
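For illustration, here is a rough sketch of how the decorator's logic might translate to a Python Lambda@Edge handler attached as a CloudFront viewer-request trigger. It assumes the standard CloudFront event shape (`event["Records"][0]["cf"]["request"]`, with lowercase header keys mapping to lists of `{"key", "value"}` dicts). `MATOMO_URL` and `MATOMO_SITE_ID` are hypothetical placeholders hard-coded in the file because Lambda@Edge doesn't support environment variables. The referer check compares hostnames with `urllib.parse` rather than `tldextract`, and there is no bot check.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholders: Lambda@Edge has no environment variables,
# so configuration must live in the code itself.
MATOMO_URL = "https://matomo.example.com/matomo.php"
MATOMO_SITE_ID = 1


def _header(headers, name):
    """Return the first value of a CloudFront-style header, or ''."""
    values = headers.get(name.lower(), [])
    return values[0]["value"] if values else ""


def build_tracking_params(cf_request, site_id):
    """Build Matomo Tracking API params from a CloudFront viewer request.

    Returns None when the referer host equals the requested host, mirroring
    the decorator's "same-site clicks are tracked client-side" rule.
    """
    headers = cf_request.get("headers", {})
    host = _header(headers, "host")
    url = f"https://{host}{cf_request['uri']}"
    if cf_request.get("querystring"):
        url += "?" + cf_request["querystring"]

    referer = _header(headers, "referer")
    bare_host = host.split(":")[0].lower()
    if referer and urllib.parse.urlsplit(referer).hostname == bare_host:
        return None

    return {
        "idsite": site_id,
        "rec": 1,
        "url": url,
        "download": url,
        "apiv": 1,
        "urlref": referer,
        "ua": _header(headers, "user-agent"),
        "send_image": 0,
    }


def handler(event, context):
    cf_request = event["Records"][0]["cf"]["request"]
    params = build_tracking_params(cf_request, MATOMO_SITE_ID)
    if params is not None:
        tracking_url = MATOMO_URL + "?" + urllib.parse.urlencode(params)
        try:
            with urllib.request.urlopen(tracking_url, timeout=0.5):
                pass
        except OSError:
            pass  # Tracking is best effort; never block the download.
    # Returning the request unchanged lets CloudFront serve the object.
    return cf_request
```

A viewer-request trigger runs on every request, including cache hits, whereas an origin-request trigger would only fire on cache misses and undercount. Every external download still waits up to the timeout, and Matomo will see the Lambda's IP rather than the visitor's (overriding it with `cip` requires a `token_auth`).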
Are there any canned solutions like this? I saw the log importer, but I know we won’t be diligent about using it at my org.
Anything else?
Thanks,
Mike