How to prevent duplicates in log analytics?

I’m trying to setup Piwik log analytics to complement my already working JavaScript analytics. What I got is shared hosting which gives me access to an (archived) log file which is being updated every hour with new log entries. I red all of the guides, FAQs and tutorials I could find on how to go about importing access logs and I successfully imported log for my website. The next obvious step would be to setup a cron job to pull new log entries every hour, however, I’m not sure how to prevent duplicate data being sucked in. I couldn’t find any info whether import_logs.py has any built in mechanism to prevent duplicates so I’m assuming that it doesn’t. I’m also not technical enough to come up with my own solution so my question is: can anybody point me to a place where I could find some info on how to solve this problem or maybe even share their own solution?

The Log Analytics FAQ covers this.

As I mentioned in my original post, I’m not a technical person, but I have read the FAQ and, to be honest, I do not see the issue of duplicates mentioned anywhere. I apologize if I’m asking something that may be obvious, but does that mean import_logs.py has a built-in mechanism to prevent duplicates?

A response from matthieu in a 2012 post, which can be found here: Importing apache logs as long-term strategy,
suggests that “duplicates are not ignored, this is a missing feature” — but has that been fixed since? Further responses, even from 2015, imply that this is still an issue.

I haven’t tried it, but the FAQ suggests that you “likely would import log files hourly or daily into Piwik” and shows the command to put in a cron job.
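For reference, a crontab entry along those lines might look like the following. The paths and URL here are placeholders, not anything from the FAQ verbatim, so adjust them to your own install:

```shell
# Run the Matomo/Piwik log importer at the top of every hour.
# /path/to/matomo and the --url value are assumptions; replace with your setup.
0 * * * * python /path/to/matomo/misc/log-analytics/import_logs.py --url=https://matomo.example.com /path/to/access.log
```

Note that this alone re-reads the whole log file each hour, which is exactly the duplicate problem being discussed in this thread.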

Yep, and it works. Tried it myself, no problem there. The issue I’m concerned with is that my logs are rotated on a monthly basis. If I run a cron job once a day or once an hour, it would import the same data multiple times, and I’m just not knowledgeable enough to figure out how to handle that.
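One common workaround (not a built-in Matomo feature) is a small wrapper script that remembers how many bytes of the log have already been imported and feeds only the new tail to import_logs.py each run. A minimal sketch, assuming your host lets cron run a shell script and that import_logs.py can read from stdin via `-` (it supports this); all paths and the URL below are placeholders:

```shell
#!/bin/sh
# Incremental import sketch: track a byte offset in a state file so each
# cron run only imports log lines that appeared since the previous run.

import_new_entries() {
    # $1 = access log, $2 = offset state file, remaining args = import command
    log=$1; state=$2; shift 2

    last=0
    [ -f "$state" ] && last=$(cat "$state")

    size=$(wc -c < "$log")
    # If the file shrank, it was rotated: start over from the beginning.
    [ "$size" -lt "$last" ] && last=0

    if [ "$size" -gt "$last" ]; then
        # Pipe only the unseen bytes into the import command (reads stdin via -).
        tail -c +"$((last + 1))" "$log" | "$@" -
        echo "$size" > "$state"
    fi
}

# Hourly cron usage (paths and URL are assumptions, adjust to your install):
# import_new_entries /path/to/access.log /tmp/matomo.offset \
#     python /path/to/matomo/misc/log-analytics/import_logs.py \
#     --url=https://matomo.example.com
```

Tracking a byte offset rather than a line count keeps the script cheap on large logs, and the rotation check (file shrank, reset to zero) handles your monthly rotation without re-importing old data in between.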

In that case I’ll have to defer to someone else who may know.

Any updates on preventing duplicates in log analytics? Is setting up a pre-processor that splits these appending log files into unique hourly / daily / etc. log files currently the best practice? It would be nice if Matomo could handle this built in.