To avoid using cookies, I am importing Apache log files into Piwik. I have found that the list of bot exclusions in import_log.py is nowhere near comprehensive enough, and my initial imports produced an excessive hit count because of bots.
I have attached a CSV of these bots and will post the list of bots I added to import_log.py below, but editing the import script directly is obviously not ideal.
Would it be possible to have the script read the exclusions from a CSV file (or something similar) that wouldn't get overwritten when the script is updated?
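As a rough illustration of what I mean, here is a minimal sketch, assuming a CSV with one user-agent substring per row in the first column, with blank rows and rows starting with "#" ignored. The function names `load_bot_patterns` and `is_bot` are hypothetical and not part of import_log.py; they only show how an external list could be loaded and matched case-insensitively.

```python
import csv
import io

def load_bot_patterns(csv_file):
    """Read bot user-agent substrings from the first column of a CSV.

    Blank rows and rows starting with '#' are skipped (an assumed file format).
    Patterns are lower-cased so matching can be case-insensitive.
    """
    patterns = []
    for row in csv.reader(csv_file):
        if not row:
            continue
        value = row[0].strip()
        if value and not value.startswith("#"):
            patterns.append(value.lower())
    return patterns

def is_bot(user_agent, patterns):
    """Return True if any pattern occurs as a substring of the user agent."""
    ua = user_agent.lower()
    return any(p in ua for p in patterns)

# In practice the file would live outside the script's directory, e.g.
# open("/etc/piwik/bot_exclusions.csv", newline="")  (hypothetical path).
sample = io.StringIO("# bot substrings\nAhrefsBot\nSemrushBot\n\nMJ12bot\n")
patterns = load_bot_patterns(sample)
print(patterns)  # ['ahrefsbot', 'semrushbot', 'mj12bot']
print(is_bot("Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)", patterns))  # True
```

Keeping the list in a separate file like this means a script update would only replace the code, not the exclusions.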
Would it be possible to add an option to the script to ignore "HEAD … " requests, as these are not useful when looking for human visits to the site? (I currently strip them out with sed before importing the logs.)
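For reference, the sed pre-filter does roughly the following. This is a minimal sketch, assuming the standard Apache common/combined format where the quoted request line follows the bracketed timestamp; `skip_head_requests` is a hypothetical helper, not an existing import_log.py option.

```python
import re

# Captures the HTTP method from the quoted request line that follows the
# timestamp in Apache common/combined logs, e.g. '] "HEAD / HTTP/1.1"'.
REQUEST_RE = re.compile(r'\] "(?P<method>[A-Z]+) ')

def skip_head_requests(lines):
    """Yield only the log lines whose HTTP method is not HEAD."""
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and m.group("method") == "HEAD":
            continue
        yield line

log = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "HEAD / HTTP/1.1" 200 0 "-" "curl/7.29.0"',
    '203.0.113.7 - - [10/Oct/2013:13:55:40 +0000] "GET /about HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
kept = list(skip_head_requests(log))
print(len(kept))  # 1
```

Lines the pattern does not recognise are passed through unchanged rather than dropped, so a malformed line is never silently discarded by this filter.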
Would it be possible to have an option to ignore GET requests that look like this:
"GET / HTTP/1.1" 200 14834 "-" "-"
since they are also very unlikely to come from humans?
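One way such an option could decide which lines to drop is sketched below. It assumes the Apache combined log format and treats a line as suspicious only when it is a GET for "/" with both the referrer and the user agent set to "-". The names `COMBINED_RE` and `is_bare_root_get` are hypothetical, not part of import_log.py.

```python
import re

# Apache "combined" format: host ident user [time] "request" status size "referrer" "user agent"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def is_bare_root_get(line):
    """True for 'GET /' requests with no referrer and no user agent ("-" for both)."""
    m = COMBINED_RE.match(line)
    if not m:
        return False  # not a combined-format line; leave the decision to other filters
    parts = m.group("request").split()
    return (len(parts) >= 2 and parts[0] == "GET" and parts[1] == "/"
            and m.group("referrer") == "-"
            and m.group("user_agent") == "-")

bare = '198.51.100.23 - - [10/Oct/2013:14:02:11 +0000] "GET / HTTP/1.1" 200 14834 "-" "-"'
browser = ('198.51.100.24 - - [10/Oct/2013:14:02:15 +0000] "GET / HTTP/1.1" 200 14834 '
           '"-" "Mozilla/5.0 (Windows NT 6.1)"')
print(is_bare_root_get(bare))     # True
print(is_bare_root_get(browser))  # False
```

Requiring both the empty referrer and the empty user agent keeps the rule narrow, so a real visitor who types the URL directly (no referrer, but a normal browser user agent) is still counted.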