Ghost visits: Real visit is duplicated at irregular intervals

Hi from Germany! First of all let me congratulate the Piwik team. This is an outstanding tool that you have created and it exactly matches my requirements. Thank you very much! :slight_smile:

However, I have a problem with apparently wrong visitor logs. Unlike other Piwik users, I don’t have missing visitors; I have too many counted. I definitely see ghost visitors, but I haven’t found anything related in the FAQ or the forum. But let me start from the beginning.

I am the webmaster of a website for a non-profit organisation. We use Joomla 2.5.8 on a Strato web server. There are no apparent problems.

I installed Piwik 1.9.2 on December 27th. Everything worked fine, Piwik is just great. Upgrading to Piwik 1.10 on January 15th and to Piwik 1.10.1 on January 17th, including the database update, was flawless. I noticed that for a few minutes I had no past data (zero visitors for the days before) in the reports, but that went away without any action on my part. I see visits, I get statistics and reports: everything seems okay.

What happened later is that I noticed strange visits in the statistics: one of my own visits appears again and again, so I am counted multiple times. I know that the data is wrong, because I did some tests with an unusual browser combination and an unusual IP, tracking some pages that were not open to the public. Thus, this visit stands out from the rest. This visit with 11 page views was tracked fine on January 14th (so with Piwik 1.9.2). By now I see this individual combination eight times (and counting) in the logs of Piwik 1.10.1. It is exactly the same IP, browser and plugins. The intervals between the visits range from two hours to two days. It is 11 page views per visit, but not in the same order, so the duration of the visit varies slightly. I am 100% positive that these visits (except the first one) didn’t happen. Usually, I use the ignore-me cookie, but in this test I didn’t.

So I am sure that something is wrong with my database. I don’t know whether I just have to ignore these ghost visits to get the real picture, or whether the matter is a bit more complex and real visitors are connected to old actions, which would distort the statistics.

So I ask myself (and you, if anybody would be so kind as to help):

  1. What could have gone wrong? A mistake in the database update? A bug in the core (unlikely, but never impossible)? User mistake (more likely, but what could I have done)?
  2. Is there any kind of check that I could run on the tables log_link_visit_action, log_action, and log_visit (these tables are filled with reasonable data) to check the consistency?
  3. Is the duplication limited to one visit, or is there the risk that other visits are affected, too? How can I check?
  4. Is there anything that I can do to repair it, so that the visit is no longer duplicated?
  5. How can I get rid of the ghost visit data in the database?

The cleanest solution would be to uninstall and reinstall Piwik, so that the data of the past three weeks is deleted. I wouldn’t mind the loss of data if I could be sure that I can rely on the statistics afterwards. However, I would prefer to understand the problem, so that I can avoid similar trouble in the future. Even better if I can keep the data.

Any thoughts? Advice is appreciated.

It’s technically impossible to have “ghost” visitors: something must be adding the data to Piwik, which Piwik then displays.

Hi Matt! So you mean there is an actual trigger behind the data? That would explain the different intervals. I am not familiar with the Piwik algorithm. As far as I can see (with phpmyadmin), every page view creates a new line in the table *_log_link_visit_action. There is a pointer idvisit to a line in the table *_log_visit, which holds information about IP address, screen resolution and browser details. There is also a pointer idaction_url to a line in the table *_log_action, which holds the page name of the visited page. Is this correct?
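
To make my understanding of the three tables concrete, here is a minimal sketch in Python with toy rows. Only idvisit and idaction_url are names I actually see in phpMyAdmin as described above; the other column names (location_ip, config_browser_name, name) and all values are placeholders I made up for illustration.

```python
# Toy rows mirroring the three tables as I understand them.
# Only idvisit and idaction_url are taken from the description above;
# the other column names and all values are illustrative assumptions.
log_visit = {
    1: {"location_ip": "192.0.2.10", "config_browser_name": "NS"},
}
log_action = {
    10: {"name": "example.org/start"},
    11: {"name": "example.org/internal-test"},
}
log_link_visit_action = [
    {"idvisit": 1, "idaction_url": 10},
    {"idvisit": 1, "idaction_url": 11},
    {"idvisit": 1, "idaction_url": 10},
]


def pages_for_visit(idvisit):
    """Return the page names viewed in one visit, in stored row order."""
    return [
        log_action[row["idaction_url"]]["name"]
        for row in log_link_visit_action
        if row["idvisit"] == idvisit
    ]


print(pages_for_visit(1))
```

If this picture is right, one visit row is shared by all of its page views, and the page names are looked up indirectly through log_action.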

The contents of log_visit puzzle me a bit. The time stamps in the columns visitor_localtime and visit_first_action_time differ by 1 hour (give or take a few seconds), which is the difference between German local time and UTC. Now there are a few entries with exactly the same localtime (all 14:22:39), but different actiontime. In Piwik’s visitor log, they appear with their individual action times (converted to the correct time zone), which seems okay. But these entries with the inconsistent time stamps are exactly those which I am convinced didn’t happen, or at least didn’t happen with the indicated visitor details. How are localtime and actiontime determined? Is actiontime copied from the servertime field in the *_log_link_visit_action table?
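
To spot such rows systematically, the offset between the two columns can be computed per visit. This is only a sketch: I am assuming that visitor_localtime is exported as a plain time of day (HH:MM:SS) and visit_first_action_time as a full UTC timestamp (YYYY-MM-DD HH:MM:SS), and the function name is my own.

```python
from datetime import datetime

DAY = 24 * 3600


def local_offset_seconds(visitor_localtime, visit_first_action_time):
    """Offset (local minus server time of day) in seconds, wrapped into (-12h, +12h]."""
    local = datetime.strptime(visitor_localtime, "%H:%M:%S")
    server = datetime.strptime(visit_first_action_time, "%Y-%m-%d %H:%M:%S")
    local_s = local.hour * 3600 + local.minute * 60 + local.second
    server_s = server.hour * 3600 + server.minute * 60 + server.second
    # Work modulo one day so that visits around midnight do not look like outliers.
    diff = (local_s - server_s) % DAY
    return diff - DAY if diff > DAY // 2 else diff
```

For a normal visit from Germany in winter I would expect a value close to 3600. A row whose offset is far from a whole number of hours, or several rows from one visitor whose offsets differ, would be the ones to look at.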

I don’t know whether it is of any importance: The only entries in the database with config_os=“UNK” and config_browser_name=“NS” are those that appear now multiple times. I fiddled around with some browser and cookie settings during my tests, so this was correct for one visit. Now I have nearly the same visit entry eight times in the database.

What could be the cause?

I have deleted the corresponding pages, so that if the visits show up again, I can be sure that something is wrong. However, the ghost visits haven’t appeared again. So the problem is gone.

I am still puzzled about the strange values in visitor_localtime, though.

jkl, is it possible that some sort of RSS feed or other process is being captured by Piwik?

If you were to analyze all the pages that contribute to your stats for a given day (pull a pages report), are there any odd ones that stick out?

Some CMSs have these processes running constantly, so it could be one of those creating a seemingly ghost visit.

regards

Hello lesjokolat, thanks for your reply.
I have done “Action-Pages” and “Action-Page titles” for various intervals during that time. Everything seems to be okay, no surprises here, no visible inconsistencies.
I also thought about automatic processes. I think I can exclude my own machines, as some visits were logged during the night or on Sunday morning, when my computers were off. I thought of a proxy server (in fact the logged IP is a proxy) that repeated my requests from Wednesday a few times between Wednesday night and Sunday morning, but this is not very likely.

As I said, the visitor_localtime is strange. These visits have an identical localtime (to the second), but a different server time (visit_first_action_time). Also the visitor_count_visits stays the same. Shouldn’t this be incremented for every new visit of a returning visitor? The columns visitor_days_since_last and visitor_days_since_first seem to be calculated correctly.
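
A minimal sketch of how I would look for this pattern in an export of log_visit: group the rows by visitor and local time and report every group that has more than one distinct server time. The row format (a list of dicts with idvisitor as a hex string) and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def shared_localtime_groups(rows):
    """Map (idvisitor, visitor_localtime) to its server times when there is more than one."""
    groups = defaultdict(set)
    for row in rows:
        key = (row["idvisitor"], row["visitor_localtime"])
        groups[key].add(row["visit_first_action_time"])
    return {key: sorted(times) for key, times in groups.items() if len(times) > 1}


# Made-up sample rows: one visitor repeats the same local time, one does not.
sample = [
    {"idvisitor": "a1b2", "visitor_localtime": "14:22:39",
     "visit_first_action_time": "2013-01-14 13:22:41"},
    {"idvisitor": "a1b2", "visitor_localtime": "14:22:39",
     "visit_first_action_time": "2013-01-16 02:10:05"},
    {"idvisitor": "c3d4", "visitor_localtime": "09:00:00",
     "visit_first_action_time": "2013-01-15 08:00:02"},
]
print(shared_localtime_groups(sample))
```

A real returning visitor could in principle arrive at the same second of the day twice, so a hit is a hint rather than proof.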

Strange…

hmm yes hard to pinpoint

http://piwik.org/faq/how-to-install/#faq_98

Just double-check any proxy settings; maybe something there is slightly misconfigured? Just an idea, no real proof. Are there any error logs from Apache that could shed some light?

I have no access to the proxy settings. In the lunch break of my main job I was using the company PC and company network to access my web site, so I won’t bother the network admins.

Neither can I access the Apache log. I am webmaster of a subdomain only. I have FTP access to my subdirectory, and I have a database, but no console. (Yes, some of us are not root on the web server.)

I am not convinced that it is a server issue, because I see inconsistencies in the database:

  1. The difference between local time and server time is in most cases an integer number of hours. If for some reason it is not (wrong time on one of the machines), the difference should at least stay constant over time. Here the local time stays the same. I have a case of two “ghost” visits from the same server. The difference between the action times is around three hours (same day), the difference between the local times is zero seconds. That’s why I asked “How are localtime and actiontime determined?”

  2. As far as I understand (correct me if I am wrong), in the table “log_visit” there shouldn’t be two rows with identical values for “idvisitor” AND identical values for “visitor_count_visits”. I have such cases here. (Apparently, this is a new column; it is not yet shown in the database schema.)

Both points appear only for my one own ghost visit. All other entries seem to be okay.
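
The second point can be checked mechanically. A sketch, again assuming the rows of log_visit have been exported as a list of dicts; the rule that each pair should be unique is my own understanding, not something I have confirmed.

```python
from collections import Counter


def duplicate_visit_counters(rows):
    """Return (idvisitor, visitor_count_visits) pairs that occur in more than one row."""
    counts = Counter((r["idvisitor"], r["visitor_count_visits"]) for r in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Made-up rows: visitor "a1b2" has the counter value 2 twice.
rows = [
    {"idvisitor": "a1b2", "visitor_count_visits": 1},
    {"idvisitor": "a1b2", "visitor_count_visits": 2},
    {"idvisitor": "a1b2", "visitor_count_visits": 2},
    {"idvisitor": "c3d4", "visitor_count_visits": 1},
]
print(duplicate_visit_counters(rows))
```

An empty result would mean that every visit of a returning visitor got its own counter value.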

Would it be possible that there was an issue during database upgrade so that something got corrupt? How can I check integrity of the database?
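
One check I could imagine is looking for page-view rows that point to a visit or an action that does not exist. Below is a sketch that runs such a query against an in-memory SQLite database with toy data, just to show the idea. The table prefix piwik_ and the key columns idlink_va and idaction are assumptions on my side, and the real installation runs on MySQL, so the query may need adjusting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE piwik_log_visit (idvisit INTEGER PRIMARY KEY);
    CREATE TABLE piwik_log_action (idaction INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE piwik_log_link_visit_action (
        idlink_va INTEGER PRIMARY KEY, idvisit INTEGER, idaction_url INTEGER);
    INSERT INTO piwik_log_visit VALUES (1), (2);
    INSERT INTO piwik_log_action VALUES (10, 'example.org/start');
    INSERT INTO piwik_log_link_visit_action VALUES
        (100, 1, 10),   -- consistent row
        (101, 3, 10),   -- points at a visit that does not exist
        (102, 2, 11);   -- points at an action that does not exist
""")

# Page views whose visit or action is missing from the referenced table.
ORPHAN_QUERY = """
    SELECT l.idlink_va,
           v.idvisit IS NULL  AS missing_visit,
           a.idaction IS NULL AS missing_action
    FROM piwik_log_link_visit_action AS l
    LEFT JOIN piwik_log_visit  AS v ON v.idvisit  = l.idvisit
    LEFT JOIN piwik_log_action AS a ON a.idaction = l.idaction_url
    WHERE v.idvisit IS NULL OR a.idaction IS NULL
    ORDER BY l.idlink_va
"""

orphans = conn.execute(ORPHAN_QUERY).fetchall()
for idlink_va, missing_visit, missing_action in orphans:
    print(idlink_va, bool(missing_visit), bool(missing_action))
```

This would only catch broken references. It would not explain duplicated visits whose references are all valid.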

Anyway, it is not really a problem, because it hasn’t happened again. But I’d prefer to know.