We’re doing some direct reporting on the DB to total up the number of pages viewed in a visit using page buckets (1, 2, 3, n pages per visit). The goal is to produce a histogram, but all we need is the data.
However, we’re getting really unusual distributions that are not squaring with other analytics systems (and this is across several sites with independent third party analytics). We’ve decomposed the SQL and there are no obvious problems. Piwik itself doesn’t provide this reporting, but from the schema, it seems eminently doable. What are we missing here?
page_count AS bucket_size, COUNT(idvisit) AS num_visits
idsite, idvisit, COUNT(pla2.name) AS page_count
JOIN piwik_log_action pla2 ON (pllva2.idaction_url = pla2.idaction)
pla2.type = 1
GROUP BY idvisit
ORDER BY idvisit ASC) tmp
GROUP BY bucket_size
ORDER BY bucket_size;