We are seeing a significant discrepancy between the statistics Matomo produces by importing server logs and those from Google Analytics, which we keep running in parallel.
To give a practical example, in July 2023 Matomo recorded 4,112,600 visits, while Google Analytics recorded a more credible 94,097: a more than forty-fold difference.
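For reference, the ratio can be checked with a couple of lines of Python. This is only the arithmetic on the two monthly totals quoted above; the variable names are ours.

```python
# Monthly visit totals for July 2023, as quoted above.
matomo_visits = 4_112_600
ga_visits = 94_097

# How many times larger the Matomo figure is than the Google Analytics one.
ratio = matomo_visits / ga_visits
print(f"Matomo / Google Analytics = {ratio:.1f}")
```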
We cannot work out where the problem lies, but we are certain that Matomo is inflating the statistics, not least because our hosting service could not come close to handling such volumes.
The import system is set up via crontab in the following way:
0 22 * * * python3 /var/www/html/matomo/misc/log-analytics/import_logs.py --url=http://19*...1/matomo/ --idsite=2 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots /var/log/httpd/443-access_log > /home//***logs/matomo_import.log
The import log shows the following:
Logs import summary

    532450 requests imported successfully
    5191 requests were downloads
    14095 requests ignored:
        0 HTTP errors
        0 HTTP redirects
        14095 invalid log lines
        0 filtered log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        0 requests done by bots, search engines...
        0 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary

    532450 requests imported to 1 sites
        1 sites already existed
        0 sites were created:
    0 distinct hostnames did not match any existing site:

Performance summary

    Total time: 3900 seconds
    Requests imported per second: 136.51 requests per second
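As a quick sanity check on the summary, the figures can be recombined in Python. This is a minimal sketch that uses only the numbers printed above; the variable names are ours, and the small gap from the reported 136.51 requests per second is presumably because the total time is rounded.

```python
# Figures copied from the import summary above.
imported = 532_450
ignored_invalid = 14_095
total_seconds = 3_900

# Every line was either imported or rejected as invalid.
total_lines = imported + ignored_invalid
invalid_share = ignored_invalid / total_lines
rate = imported / total_seconds

print(f"log lines read:      {total_lines}")
print(f"invalid lines:       {invalid_share:.2%}")
print(f"requests per second: {rate:.2f}")
```

Nothing in this run was ignored as a bot, a static resource, an HTTP error or a redirect; only the invalid lines were dropped.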
The strangest thing is that it does not seem to count any static requests, even though the flat access log contains them, for example:
*** - - [04/Oct/2023:15:12:41 +0200] "GET /templates/*/js/_box.js?ver=1696425161 HTTP/1.1" 200 7596
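Our cron job passes --enable-static and --enable-bots, so such requests are presumably imported rather than ignored, which might explain the zero counts. To get a rough idea of how much of the log is static content, a small script along these lines could be used. It is only a sketch: the extension list is our own assumption (not the list import_logs.py uses), and the second sample line is a made-up page request for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: an illustrative extension list, not the one import_logs.py uses.
STATIC_EXTENSIONS = {"css", "js", "png", "jpg", "jpeg", "gif", "ico",
                     "svg", "ttf", "woff", "woff2"}

# Matches the quoted request part of an access log line: "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')


def classify(line):
    """Return 'static', 'other', or 'unparsed' for one access log line."""
    match = REQUEST_RE.search(line)
    if match is None:
        return "unparsed"
    # Drop the query string (?ver=...) before looking at the extension.
    path = urlsplit(match.group("target")).path
    _, dot, ext = path.rpartition(".")
    if dot and "/" not in ext and ext.lower() in STATIC_EXTENSIONS:
        return "static"
    return "other"


def count_requests(lines):
    """Count log lines per category."""
    return Counter(classify(line) for line in lines)


sample = [
    # Line taken from our log (already anonymised).
    '*** - - [04/Oct/2023:15:12:41 +0200] '
    '"GET /templates/*/js/_box.js?ver=1696425161 HTTP/1.1" 200 7596',
    # Hypothetical page request, added only for illustration.
    '*** - - [04/Oct/2023:15:12:42 +0200] "GET /index.php HTTP/1.1" 200 1024',
]
print(count_requests(sample))
```

To run it on the real log, the open file object can be passed directly to count_requests, since it only iterates over lines.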
In your experience, is there anything we can do to get credible visit counts from Matomo when importing Apache logs?