I wish to run a SQL query, of my Matomo MySQL database to find which reports have not yet been archived.
What query or queries might I run to do that?
I have not yet tested the following. Pasting a long Chat GPT response here to circle back to:
In Matomo, “what still needs archiving?” can be seen from two angles:
-
what’s queued or in-progress to (re)archive, and
-
what should exist (given you have traffic) but has no archive row yet.
Below are battle-tested SQL snippets you can run against your Matomo MySQL. They don’t assume any plugin specifics and work for “All visits” (no segment). I’ve annotated each so you can adapt them.
1) What is queued / in progress to be archived
Matomo pushes work into matomo_archive_invalidations (this is the queue that core:archive consumes). It tracks site, period, date range, segment hash, status, and timestamps. (developer.matomo.org)
-- Pending (queued) invalidations
SELECT idinvalidation, idsite, period, date1, date2, name, segment, status, ts_started
FROM matomo_archive_invalidations
WHERE status = 0
ORDER BY idsite, period, date1, date2;
-- In-progress invalidations (and highlight ones that look stuck >24h)
SELECT idinvalidation, idsite, period, date1, date2, name, segment, status, ts_started
FROM matomo_archive_invalidations
WHERE status = 1
ORDER BY ts_started;
-- Likely-stuck jobs (started long ago)
SELECT idinvalidation, idsite, period, date1, date2, name, segment, status, ts_started
FROM matomo_archive_invalidations
WHERE status = 1
AND ts_started < NOW() - INTERVAL 24 HOUR;
Why this works: archive_invalidations is the work list that core:archive processes; status and ts_started show queued vs. running (and potentially stuck) work. (developer.matomo.org, Analytics Platform - Matomo)
2) Days with visits but no day archive row yet
Archive status rows live in the monthly matomo_archive_numeric_YYYY_MM tables as name LIKE 'done%' rows (the “done flag”). If there’s traffic for a day but no corresponding done% row for that day, the report hasn’t been archived. (developer.matomo.org)
Replace 2025_07 below with the month you care about; repeat per month in your range.
-- Days that had visits but no day archive (period=1) “done%” row at all (All visits = empty segment)
WITH days AS (
SELECT
v.idsite,
DATE(v.visit_first_action_time) AS d
FROM matomo_log_visit v
WHERE v.visit_first_action_time >= '2025-07-01'
AND v.visit_first_action_time < '2025-08-01'
GROUP BY v.idsite, DATE(v.visit_first_action_time)
)
SELECT d.idsite, d.d AS missing_day
FROM days d
LEFT JOIN matomo_archive_numeric_2025_07 an
ON an.idsite = d.idsite
AND an.period = 1 -- 1 = day
AND an.date1 = d.d
AND an.date2 = d.d
AND (an.segment IS NULL OR an.segment = '')
AND an.name LIKE 'done%' -- any done flag (all-plugins or plugin-specific)
-- if there is **no** done row, it was never archived
WHERE an.idarchive IS NULL
ORDER BY d.idsite, d.d;
This flags true “missing” archives (no done% row at all). (Matomo stores archive status as done* rows; values include states like OK, invalidated, partial, etc.) (developer.matomo.org)
Notes
•periodcodes: Matomo archives by period (day/week/month/year). You can adapt the same pattern for weekly/monthly/yearly by changingperiodand thedate1/date2logic. Matomo pre-archives day/week/month/year, while custom ranges are always on-demand. (developer.matomo.org)
• If you segment heavily, swap thesegmentfilter to a specific segment hash (or remove it to include segmented archives). Segment hashes are stored/used in done flags. (developer.matomo.org)
3) Invalidated (must be reprocessed) archives that haven’t been re-archived yet
When Matomo invalidates an archive, it (a) sets that archive’s done flag to INVALIDATED and (b) enqueues an entry in archive_invalidations. You can list the invalidations and see whether a newer OK archive exists yet for the same site/period/date/segment. (developer.matomo.org)
-- For one month table; repeat per month in your window
-- Show invalidations that don't yet have a newer “done%” archive row
WITH inv AS (
SELECT idsite, period, date1, date2, segment, MAX(ts_invalidated) AS ts_inv
FROM matomo_archive_invalidations
WHERE status IN (0,1) -- queued or in-progress
GROUP BY idsite, period, date1, date2, segment
)
SELECT
i.idsite, i.period, i.date1, i.date2, i.segment, i.ts_inv,
MAX(an.ts_archived) AS last_archived_ts
FROM inv i
LEFT JOIN matomo_archive_numeric_2025_07 an
ON an.idsite = i.idsite
AND an.period = i.period
AND an.date1 = i.date1
AND an.date2 = i.date2
AND COALESCE(an.segment,'') = COALESCE(i.segment,'')
AND an.name LIKE 'done%'
GROUP BY i.idsite, i.period, i.date1, i.date2, i.segment, i.ts_inv
HAVING last_archived_ts IS NULL OR last_archived_ts < i.ts_inv
ORDER BY i.idsite, i.period, i.date1;
This surfaces “still invalidated / not re-archived since the invalidation time” per period. (Matomo’s behavior: invalidation marks done as INVALIDATED and adds a row to the queue.) (developer.matomo.org)
Tips & gotchas
-
Monthly partitioning: archive tables are monthly (
…_YYYY_MM). Run queries per affected month(s). (developer.matomo.org) -
Segments: If you want only “All visits,” use
an.segment IS NULL OR an.segment = ''. To include a specific segment, filtersegmentto its MD5 hash. (developer.matomo.org) -
States of
doneflags: Thedone*row is the archive’s status; values include OK, ERROR, INVALIDATED, and PARTIAL. For pure “missing,” you don’t need the numeric values—you just check absence of anydone%row. (developer.matomo.org) -
Queue is source of truth for “what’s next”: Anything waiting to be (re)archived appears in
matomo_archive_invalidationsuntil core:archive processes it. (developer.matomo.org)