Short-term storage of data

I’m considering self-hosting Matomo for a project involving collecting analytics from a number of distinct websites.

Some websites have a significant number of page views, and I am concerned about keeping costs low, particularly where storage is involved.

Somewhat counter-intuitively, we don't need most of the data that a typical analytics setup collects. So I have a few questions:

  1. Is it possible to restrict what data is collected and stored by Matomo? Essentially anything beyond the number of page views is not that important for our use case.
  2. Regardless of the answer to the above, is it possible to clean up / remove data older than, say, the last 90 days?

Thanks for any advice.

Hi! You’re asking all the right questions — Matomo is very flexible when it comes to customizing data collection and storage, especially for low-footprint setups.

1. Restricting What Data Is Collected
Yes, Matomo allows you to control what data is collected quite granularly. Here are a few options:

  • In the Privacy settings you can anonymize IP addresses (and, in recent versions, referrers). Features such as goals or session recordings only add data if you set them up, so simply leave them off.
  • Customize the tracking code (JavaScript tracker or HTTP Tracking API) to send only the data you care about, for example calling just trackPageView and leaving out events, goals, and link tracking.
  • Use Matomo Tag Manager to fine-tune what data is fired and when.

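For your use case, tracking only page views keeps each request small. Note that Matomo still writes a raw log entry for every visit and page view, so on high-traffic sites the retention settings in the next section matter more for storage than the tracking setup does.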
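
If you go the HTTP Tracking API route, a page-view-only request can be very small. Here is a minimal Python sketch that just builds such a request URL. The host analytics.example.org, the site ID, and the helper name build_pageview_url are placeholders of mine, not anything Matomo ships; idsite, rec, url, and action_name are parameters from the Tracking API documentation, but double-check them against the docs for your version.

```python
from urllib.parse import urlencode


def build_pageview_url(matomo_base: str, site_id: int, page_url: str, title: str = "") -> str:
    """Build a minimal Matomo Tracking API request for a single page view."""
    params = {
        "idsite": site_id,  # ID of the website as configured in Matomo
        "rec": 1,           # must be 1 for the request to be recorded
        "url": page_url,    # full URL of the page that was viewed
    }
    if title:
        params["action_name"] = title  # optional page title
    return f"{matomo_base.rstrip('/')}/matomo.php?{urlencode(params)}"


# Hypothetical host and site ID, for illustration only.
example = build_pageview_url("https://analytics.example.org/", 1, "https://example.org/pricing")
print(example)
```

Sending that URL with any HTTP client records one page view. Everything else (events, goals, custom dimensions) is simply never sent.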

2. Automatically Removing Data Older Than X Days
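Yes. Matomo has data retention settings under Administration (the gear icon) > Privacy > Anonymize data, in the sections for deleting old raw data and old report data (the exact labels vary a little between versions). You can: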

  • Automatically delete log data older than a specified number of days (e.g., 90).
  • Retain only summarized report data (e.g., daily/weekly/monthly aggregates), which takes much less space.

You can also run the console command core:delete-logs-data from cron if you want finer control, for example to purge a specific date range or a single site.
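
As a rough sketch of the cron approach, the Python below works out the date range for a 90-day cutoff and builds the command line. The helper name delete_logs_command and the 2000-01-01 start date are my own choices for illustration. I'm assuming the command takes --dates=START,END, which is how I remember it, so confirm with ./console help core:delete-logs-data before relying on it, and check whether the end date is inclusive in your version.

```python
from datetime import date, timedelta


def delete_logs_command(today: date, keep_days: int = 90,
                        oldest: date = date(2000, 1, 1)) -> list[str]:
    """Build a core:delete-logs-data call covering everything older than keep_days."""
    cutoff = today - timedelta(days=keep_days)  # last day that falls outside the window
    # Assumed flag format: --dates=YYYY-MM-DD,YYYY-MM-DD (verify for your version).
    return [
        "./console",
        "core:delete-logs-data",
        f"--dates={oldest.isoformat()},{cutoff.isoformat()}",
    ]


cmd = delete_logs_command(date(2024, 4, 10))
print(" ".join(cmd))
```

A small wrapper script could pass that list to subprocess.run from the Matomo directory on a daily schedule. For a plain "delete after 90 days" policy, the built-in retention setting is simpler and probably all you need.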

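Pro tip: Keeping the aggregated reports while discarding raw logs is often the best middle ground for performance and storage efficiency. One caveat: once the raw logs for a period are deleted, Matomo can no longer reprocess reports for those dates, so make sure report archiving has run before the purge.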