I came back from vacation and Piwik had 1.2 billion rows in piwik_log_link_visit_action, with the total DB around 440 GB.
So a friend of mine removed the Python Twisted portion of the code and called the Piwik PHP functions directly from the shell in a stripped-down tracker.php script. This made the import much more efficient than calling Apache to do it, and it reuses the MySQL connections.
The first action was to reduce it to ~700 million rows by deleting old logs.
Mostly because of disk space, I split the database across 2 machines and partitioned the tables piwik_log_visit (by id) and piwik_log_link_visit_action (by date). Now I have 2 Piwik installations.
Then I stripped down archive.sh to archive only 1 day every 2 hours, and only 2 days, 3 weeks, 2 months and 2 years during the night.
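A rough sketch of that schedule as a single shell helper, in case it helps someone. The helper name periods_for_hour, the NIGHT_HOUR value and the "period:date" pair format are my assumptions for illustration only; the lastN values are meant for the date parameter of the API call and should be checked against your Piwik version.

```shell
#!/bin/sh
# Sketch: choose which period/date pairs to archive for a given hour.
# NIGHT_HOUR and the "period:date" pair format are assumptions for illustration.
NIGHT_HOUR=3

periods_for_hour() {
    hour=$1
    if [ "$hour" -eq "$NIGHT_HOUR" ]; then
        # Nightly run: 2 days, 3 weeks, 2 months, 2 years
        echo "day:last2 week:last3 month:last2 year:last2"
    elif [ $((hour % 2)) -eq 0 ]; then
        # Every 2 hours: only the current day
        echo "day:last1"
    else
        # Nothing to archive in this hour
        echo ""
    fi
}

# Example: what would run at 14:00
for pair in $(periods_for_hour 14); do
    period=${pair%%:*}   # text before the colon
    last=${pair##*:}     # text after the colon
    echo "period=$period date=$last"
done
```

The hour is passed as a plain number without a leading zero, because the shell would read a value like 08 as invalid octal in the arithmetic test.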
It is still collecting data and the data is still being processed in time, but the interface freezes if I choose a very high-traffic website for today’s statistics. It is not a problem for yesterday’s.
One server is able to register 2300 log lines per second, the other one 1500 log lines per second (not counting the download from Amazon AWS S3). Of course it does not have that many lines to register all the time; we see these rates when we let the logs pile up during maintenance.
It takes 15 minutes to process one day for the biggest website (not processing week, month and year).
My archive.sh looks like this for the hourly cronjob:
#all the same until this line
echo "Starting Piwik reports archiving..."
for idsite in $ID_SITES; do
  TEST_IS_NUMERIC=`echo $idsite | egrep '^[0-9]+$'`
  if test -n "$TEST_IS_NUMERIC"; then
    CMD="$PHP_BIN -q $PIWIK_PATH -- module=API&method=VisitsSummary.getVisits&idSite=$idsite&period=$period&date=$last&format=xml&token_auth=$TOKEN_AUTH"
    # run in the background so all websites are archived in parallel
    $CMD &
  fi
done
# wait until every background job has finished
wait
Note two things:
the “&” placed after the command makes all websites run in parallel,
and the ‘wait’ command makes the shell script wait until all of them finish before it continues (for example, if you want to run the week after that).
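If you want to try the background-and-wait pattern on its own before touching archive.sh, here is a minimal sketch. The function archive_site is a hypothetical stand-in for the real PHP call (it only sleeps and writes a marker file), and the site IDs and the temporary directory are made up for the example.

```shell
#!/bin/sh
# Minimal sketch of the "run in parallel, then wait" pattern.
# archive_site is a hypothetical stand-in for the real Piwik command.
ID_SITES="1 2 3"
OUT_DIR=$(mktemp -d)

archive_site() {
    # Pretend to archive one site: sleep briefly, then leave a marker file.
    sleep 1
    echo "done" > "$OUT_DIR/site_$1"
}

for idsite in $ID_SITES; do
    archive_site "$idsite" &   # "&" sends each job to the background
done

wait   # block here until every background job has finished

# Count the marker files; all of them exist only because we waited.
FINISHED=$(ls "$OUT_DIR" | wc -l | tr -d ' ')
echo "Archived $FINISHED sites"
```

Because the jobs overlap, the whole loop takes roughly as long as the slowest site instead of the sum of all of them. Without the wait, the count at the end could run before any job has written its file.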
That is it for now.
By the way, we are now celebrating: we became the 4th Brazilian portal by number of visits.