OK, I’ve had a quick hack at this that seems to work surprisingly well, and it’s only a handful of extra lines of code. It piggybacks on the existing lineno counter and --skip option, but automates the skip using tracking files that just contain the last value of lineno. It’s implemented as a single boolean option, --auto-skip, which is off by default. If enabled, it creates a ‘skip-markers’ directory alongside the script and writes one .marker file per logfile there (named after the logfile’s basename), holding the last value of lineno. At the start of each run it tries to read that file, and if it contains a valid lineno it skips to that point in the logfile, in the same way that --skip does.
This allows you to run the import_logs.py script against a live logfile repeatedly (so you can cron it to run every minute if you want, rather than running it once a day and being forced to rotate the log), and it’s a great alternative if you don’t want to (or can’t) use piped logging. For example, I send all the weblogs from my webfarm over rsyslog/logstash to a central server, consolidate them into a single central logfile for each vhost, and then just run this script repeatedly against that logfile. It’s also much less resource intensive on a Piwik system to import in lots of small batches rather than in one big, infrequent hit that saturates the webserver/database.
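As a rough illustration of the mechanism (not the patch itself), here is a minimal standalone sketch of repeated runs against a growing logfile. The names read_marker and import_new_lines are hypothetical, the marker directory is passed in rather than derived from the script location, and the marker here stores the number of lines already consumed, which may differ by one from the lineno value the patch writes if lineno is zero-based.

```python
import os


def read_marker(markerfile):
    """Return the saved line count, or None if the marker is missing or invalid."""
    if not os.path.exists(markerfile):
        return None
    with open(markerfile, 'r') as f:
        try:
            return int(f.readline())
        except ValueError:
            return None


def import_new_lines(logfile, markerdir):
    """Return the lines added since the previous run (hypothetical helper)."""
    markerfile = os.path.join(markerdir, os.path.basename(logfile) + ".marker")
    skip = read_marker(markerfile) or 0

    new_lines = []
    count = 0
    with open(logfile, 'r') as f:
        for count, line in enumerate(f, 1):
            if count <= skip:
                continue  # already handled on an earlier run
            new_lines.append(line.rstrip("\n"))

    # Assumption: the marker records how many lines have been consumed so far
    if not os.path.isdir(markerdir):
        os.makedirs(markerdir)
    with open(markerfile, 'w') as f:
        f.write(str(count))
    return new_lines
```

Calling import_new_lines twice on the same file should return only the lines that were appended between the two calls, which is the behaviour that makes a once-a-minute cron job practical.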
For the future, it would be even simpler if the Piwik API were extended to include a field for this tracking value; then the local files could be done away with altogether.
Hope this is of use to others. If it’s something you’d be interested in adding, the diff against 2.1-rc2 is below (sorry, this forum won’t allow me to attach a diff file), or let me know if you’d like me to attempt a git pull request:
454a455,458
>     '--auto-skip', dest='auto_skip', action='store_true', default=False,
>     help="Track logfile processing and automatically skip processed lines on the next run. This allows multiple runs against an active logfile.",
> )
> option_parser.add_option(
1523a1528,1539
> # If auto-skip and a real file, try and read the last marker
> markerdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "skip-markers")
> markerfile = os.path.join(markerdir, os.path.basename(filename) + ".marker")
> skipmarker = None
> if config.options.auto_skip and file != sys.stdin:
>     if os.path.exists(markerfile):
>         with open(markerfile, 'r') as f:
>             try:
>                 skipmarker = int(f.readline())
>             except:
>                 skipmarker = None
1555c1571
< if stats.count_lines_parsed.value <= config.options.skip:
---
> if stats.count_lines_parsed.value <= config.options.skip or (skipmarker and (stats.count_lines_parsed.value <= skipmarker)):
1670,1671c1686,1693
<
<
---
> # If auto-skip, write the file marker
> if config.options.auto_skip and file != sys.stdin:
>     # First, make sure the directory exists, in the same place as this script
>     if not os.path.isdir(markerdir):
>         os.makedirs(markerdir)
>     # Write the ending lineno to the marker file
>     with open(markerfile, 'w') as f:
>         f.write(str(lineno))
1711a1734,1743
> # If auto-skip, write the file marker
> if config.options.auto_skip and file != sys.stdin:
>     markerdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "skip-markers")
>     markerfile = os.path.join(markerdir, os.path.basename(filename) + ".marker")
>     # First, make sure the directory exists, in the same place as this script
>     if not os.path.isdir(markerdir):
>         os.makedirs(markerdir)
>     # Write the ending lineno to the marker file
>     with open(markerfile, 'w') as f:
>         f.write(str(lineno))
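The marker-writing block appears twice in the diff, so it could probably be factored into a helper that both call sites use. The following is only a sketch of that idea: marker_path, write_marker, and the base_dir parameter are hypothetical names that are not in the patch, and base_dir stands in for the script's own directory.

```python
import os


def marker_path(filename, base_dir):
    """Build the marker file path for a logfile (hypothetical helper)."""
    return os.path.join(base_dir, "skip-markers",
                        os.path.basename(filename) + ".marker")


def write_marker(filename, lineno, base_dir):
    """Write lineno to the logfile's marker file and return the marker path."""
    markerfile = marker_path(filename, base_dir)
    markerdir = os.path.dirname(markerfile)
    # Make sure the marker directory exists before writing
    if not os.path.isdir(markerdir):
        os.makedirs(markerdir)
    # Overwrite any previous marker with the ending lineno
    with open(markerfile, 'w') as f:
        f.write(str(lineno))
    return markerfile
```

In the script, each of the two blocks would then reduce to a single call guarded by the existing `config.options.auto_skip and file != sys.stdin` check.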