Delete invalidation data in the database

x users
0 segments (0 pre-processed, 0 processed in real-time)
x goals
x tracking failures
x websites
x activated plugins
Matomo version: 5.0.3
MySQL version: 10.5.23-MariaDB-0+deb11u1
PHP version: 8.3.6

I had created a segment and saved it. I had forgotten it and have make a monthly mysqldump of the matomo data base. The database was 100 MB bigger as expected. So, i am a little bit confused and remember me on the segment. So, i have deleted the segment in the matomo backend. In the next step, i have wait for the next cron job. After that, i have make a new mysqldump and it is furthermore 100 MB bigger as expected. So, i think, the segment was not deleted in the database.

My question is: How can i delete a/all segment reports in the database manually?

Matomo is horrible…

i have tested little bit things.

First, i found a database table segment, and in this table all segment settings are saved included the deleted segments! Matomo is also like Facebook and Co.: delete is not delete. “Delete” is only a database entry (0/1).

Next, i have tested core:invalidate-report-data with and without --segment.

Tested with the ID of the segments as --segment=1,2,3,4,5,6,7,8,9,10,11,12

In SegmentExpression.php line 135:
The segment condition ‘1’ is not valid.

Tested with the segment name as --segment='1PageDirectView'

In SegmentExpression.php line 135:
The segment condition ‘1PageDirectView’ is not valid.

The value are directly copy&paste from the database table.

The manpage wrote:

–segment
List of segments to invalidate report data for. This can be the segment string itself, the segment name from the UI or the ID of the segment. If specifying the segment definition, make sure it is encoded properly (it should be the same as the segment parameter in the URL. (multiple values allowed)

It works only with the segment string, and i had tested it only with single segment string.

Next step re-archived: core:archive with and without --skip-all-segments.

The result: The database is yet 100 MB + 30 MB increased as expected (summary 130 MB). 30 MB increased only today through the commands above. increase, increase, increase. only increase. I’m really frustrated.

@heurteph-ei please help.

Questions:

  1. Where (in which database table) are the segment reports saved?
  2. Can the segment reports be deleted manually from the database?
  3. Can the database table segment be deleted or emptied manually?

For the increasing of the database see here:

I have found this database table:

  • archive_invalidations
    – Rows: ~165.020
    – Size: 20,1 MiB

Why is this data are always saved and increase the database? Is this database table cleaned up with this command?

./console core:purge-old-archive-data all

or only this cleaned it up?

./console database:optimize-archive-tables all

So … I had a misunderstanding and had to clarify it. Yet I have new questions about the topic.

Where are the saved segments stored in the database? Are the segments saved/stored in the same way as reports? It is possible to delete the segments separatelly?

The matter is very confusing. The mysql dumps have every time in the last days different sizes. No compression is used. The dumps are *.sql. No deletion of data or reports or elsewhere is activated. Only a cronjob @hourly core:archive.

actually: 2024-04-01 - 1,654 MB

expected: 2024-05-01 - 1,684 MB

actually: 2024-05-01 - 1,780 MB (~100 MB bigger as expected)

actually: 2024-05-05 - 1,800 MB

actually: 2024-05-06 - 1,738 MB

Then I run: php ./console core:purge-old-archive-data all

actually: 2024-05-06 - 1,738 MB (~50 MB bigger as expected)

It is really confusing.

Hi @melbao
Are you sure the segmented report is the thing that makes your DB too high?
I suggest you check at: :gear: > Diagnostic > ** Database usage**. Then you’ll be able to see which tables are the biggest.

On segments management for archiving, you have to get the segment definition in order to deal with this (but you discovered this in your 2nd message).
For your questions:

I don’t know, this is very technical things. For me, segmented reports should be in the same tables as the “not-segmented” reports, as my understanding is that “All visits” is a kind of segment (with no segmentation rule).
Also, in my understanding, the segment table is a table where all segments of all users and of all sites are defined. If no segment is present in the table, the data archiving should just run on the “All visits” segment…

Hi @heurteph-ei , thanks for your answer. I don’t understand “segmentation” really in the case of database.

The Plugin DBStats (Core) is inactive. I had used phpMyAdmin.

Yes, the segmentation definitions are stored in the database table segment.

  • Rows: 12
  • Size: 16.0 KiB

Current, all segmentations are “deleted” (database table entry).

deleted tinyint(4) NOT NULL DEFAULT 0,

deleted = 1

The question about this is, why are the deleted segmentations are furthermore stored in the database? There is no option in the backend to reactivate it. OK, it’s not so relevant, because 16 KiB.

Same question about the database table archive_invalidations.

  • Rows: ~165.020
  • Size: 20,1 MiB

Why are this data furthermore stored in the database? It’s a little bit relevant with 20 MiB.

as my understanding is that “All visits” is a kind of segment (with no segmentation rule)

It’s a nice thinking. So, with this, segmentations are not separated reports, and segmentations used the same “common” reports, and a segmentation are only filtered the common reports “on-the-fly” by delivery/displaying. The question is yet: Is this true?

If no segment is present in the table, the data archiving should just run on the “All visits” segment…

This is the oposite of the thinking before. So, the “data archiving” stored separate segmentation reports in the database, if a segmentation is present/active?

Are you sure the segmented report is the thing that makes your DB too high?

I’m not sure. I only observed that to date in my database the old archive_* database tables (2021-11 - today) are increased in counterpart of a database dump 1 month behind.

Detailed infos here: Database old archive data strong increased

I don’t know whether saving a segmentation (i.e. not just “test”, but “save”) is to blame for this. For this question is the linked topic.

This question here is reduced to: Why are the “deleted” segmentations and the past invalidations are furthermore stored in the database table segment and archive_invalidations?

News:

I run few commands manually:

  • console core:archive --url=https://example.com/
  • console core:run-scheduled-tasks --force
  • console core:purge-old-archive-data all

After that, the database tables

  • segment = 12 rows, 16.0 KiB (all “deleted”).
  • archive_invalidations = 0 rows.

The database mysql dump is yet 1,733 MB (furthermore 50 MB more as expected).

The TRUNCATE of the database table archive_invalidations is fine, because it was 20 MiB.

The furthermore storing of “deleted” segments is not so fine. But it is only 16 KiB.

Are you sure the segmented report is the thing that makes your DB too high?

I think so, because, in the month (april) between the monthly mysql dumps …

  • 2024-04-01 - 1,650 MB
  • 2024-04-07 - 1,733 MB

… the only what was done differently compared to the months before was a “Save” of a segment (and afterwards deleting. This Problem: Problem with Segments).

So, let us close this Topic and let us continue here: