I wanted to remove a high-traffic site from the database and followed this article to do so.
After I deleted the archive tables and rebuilt them with archive.sh, I noticed that two tables were way too large compared to the others. Please see Screenshot1 for the numbers.
Investigating this, I found that some rows with identical content (apart from primary key and timestamp) exist multiple times in the monthly tables (please see Screenshot2). At first I thought the archive script mistakenly writes a new entry every time it runs, but the timestamps of the duplicates also fall within a single run of the script. It also seems to affect all types of reports, not only the action-related ones you can see in the attached screenshot.
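For anyone who wants to check their own tables for this kind of duplication, here is a minimal sketch of the idea: group rows by everything except the primary key and the timestamp, and report groups that occur more than once. It runs against an in-memory SQLite table with made-up rows; the table name `archive_sample` and the column names are assumptions for illustration and may not match the real archive tables.

```python
import sqlite3

# Hypothetical stand-in for one monthly archive table.
# Table and column names are assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE archive_sample (
        id INTEGER PRIMARY KEY,
        idsite INTEGER,
        name TEXT,
        value TEXT,
        ts_archived TEXT
    )
    """
)

# Made-up rows: the first, second and fifth differ only in key and timestamp.
rows = [
    (1, "report_a", "payload-1", "03:00:01"),
    (1, "report_a", "payload-1", "03:00:09"),
    (1, "report_b", "payload-2", "03:00:02"),
    (2, "report_a", "payload-3", "03:00:05"),
    (1, "report_a", "payload-1", "03:00:14"),
]
conn.executemany(
    "INSERT INTO archive_sample (idsite, name, value, ts_archived) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# Group on the content columns only, ignoring primary key and timestamp.
duplicates = conn.execute(
    """
    SELECT idsite, name, COUNT(*) AS copies
    FROM archive_sample
    GROUP BY idsite, name, value
    HAVING COUNT(*) > 1
    ORDER BY copies DESC
    """
).fetchall()

for idsite, name, copies in duplicates:
    print(idsite, name, copies)
```

The same `GROUP BY ... HAVING COUNT(*) > 1` query should work on the real database once the table and column names are adjusted.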
I first noticed this with version 1.3 but have since updated to 1.4. Deleting and rebuilding the archive tables after the update did not improve the situation.
Since I analyze many sites with Piwik, my guess is that the archive script in some cases does not reset its collected data when switching to the next website. That is, for every website, the data already written for the previously archived websites gets written again. For example: after archiving websites 1 and 2, archiving website 3 also writes the collected data for sites 1 and 2 again. This might also be the reason for the memory issues some have mentioned.
Strange, though, that it does not affect the data of all months.
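To make the guess about data not being reset between websites more concrete, here is a small sketch of the suspected pattern. This is not Piwik's actual code; `archive_all`, `buffer` and `written` are hypothetical names that only illustrate how a buffer carried over from one site to the next would produce both duplicate rows and growing memory use.

```python
def archive_all(sites, reset_between_sites):
    """Simulate archiving several sites in one run (hypothetical logic)."""
    written = []  # stands in for rows inserted into the archive table
    buffer = []   # data collected for the site currently being archived
    for site_id, records in sites:
        if reset_between_sites:
            buffer = []  # correct behaviour: start fresh for each site
        buffer.extend((site_id, record) for record in records)
        written.extend(buffer)  # write out everything currently collected
    return written


# Three sites with one made-up record each.
sites = [(1, ["a"]), (2, ["b"]), (3, ["c"])]

buggy = archive_all(sites, reset_between_sites=False)
fixed = archive_all(sites, reset_between_sites=True)

print(len(buggy), len(fixed))  # prints "6 3"
```

Without the reset, the record of site 1 is written three times and the buffer keeps growing with every site, which would fit both the oversized tables and the memory issues.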