piwik_log_action has duplicate records with the same value but a different hash

(Sony) #1

Hi Piwik Team,
We have been using Piwik for the last 2 years; thanks for the great work.
I am facing an unusual issue. I assumed that the piwik_log_action table would contain only unique URLs and page titles. Surprisingly, there are duplicate page titles with different hashes, i.e. the same value with a different hash number. If I group by idaction, I get wrong numbers because the aggregation is wrong.

Can you please let me know whether there is any scenario in which duplicates can occur, or whether anyone else is facing this issue?
How can I fix this? Should I recompute the hash for all records?

Thanks in advance.

(Matthieu Aubry) #2

It shouldn't happen often, correct? Can you please post your test data?

It might be caused by a different collation or charset in the strings… not sure?
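
To illustrate the idea, here is a rough Python sketch (not Piwik code). It assumes the hash column is a CRC32 of the raw bytes stored as a signed 32-bit integer; the title `Café/Overview` and the helper `signed_crc32` are made up for the example. A title containing a non-ASCII character produces different bytes, and so a different hash, depending on the charset it arrives in, while a pure ASCII title does not.

```python
import zlib


def signed_crc32(data: bytes) -> int:
    """CRC32 of raw bytes, folded into the signed 32-bit range (assumed storage format)."""
    value = zlib.crc32(data) & 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


# Hypothetical title with one non-ASCII character.
title = "Café/Overview"
utf8_hash = signed_crc32(title.encode("utf-8"))
latin1_hash = signed_crc32(title.encode("latin-1"))

# The byte sequences differ, so the two hashes are expected to differ too.
print(utf8_hash, latin1_hash)

# A pure ASCII title encodes to the same bytes in both charsets.
ascii_title = "Commodities/Overview"
ascii_same = (
    signed_crc32(ascii_title.encode("utf-8"))
    == signed_crc32(ascii_title.encode("latin-1"))
)
print(ascii_same)
```

If the duplicated titles are pure ASCII, a charset difference alone would not explain them, and something else (hidden characters, whitespace) would be worth checking.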

(Sony) #3

Matt and Moserser,

thanks for the quick reply, here are some more details.

I should say it is happening frequently. The total number of records in the piwik_log_action table is 1768177.
Number of duplicate records: 14000

I can give lots of examples.

Credit/Client Dashboard 498582573
Credit/Client Dashboard 1958676980

Commodities/Overview 1079662284
Commodities/Overview -406640379

Search/FullTextSearch/Fund Flows 2096834533
Search/FullTextSearch/Fund Flows -132122507

If I run MySQL CRC32 on any of the above strings, it gives me a single hash value, and I have no idea how the second row ended up with a different hash.
How do I check the string collation? We get page titles from different sources.
Can I run any tests on the existing data to find a clue, or do I need to perform this check before inserting into the DB (Tracker plugin)?

(Matthieu Aubry) #4

I have a suggestion if you would like to help research this issue.

Create 2 pages, one in UTF-8 and one in another charset (a Russian one, for example).

Record each page view with the same title.

Do they show up in your database as 2 lines in the log_action table?

1) If there is only 1 line, there must be some other bug, and I don't know what it could be
2) If there are 2 lines, this is probably a bug in Piwik that we should fix -> please create a ticket at dev.piwik.org
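
For the existing data, one possible check is to look at the stored titles character by character. The Python sketch below is only an illustration: `suspicious_chars` is a hypothetical helper, and the sample string with a no-break space is invented, not taken from the data above. It lists every character outside printable ASCII, which would expose things like no-break spaces or control characters that make two titles look identical while their bytes differ.

```python
import unicodedata


def suspicious_chars(title: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each char outside printable ASCII."""
    found = []
    for index, ch in enumerate(title):
        if not 0x20 <= ord(ch) <= 0x7E:
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "UNNAMED")
            found.append((index, f"U+{ord(ch):04X}", name))
    return found


clean = "Commodities/Overview"
# Invented example: same visible text, but with a no-break space inserted.
tricky = "Commodities/\u00a0Overview"

print(suspicious_chars(clean))
print(suspicious_chars(tricky))

# Leading or trailing whitespace is another candidate worth flagging.
padded = "Commodities/Overview "
print(padded != padded.strip())
```

Running something like this over the 14000 duplicated titles exported from log_action should show quickly whether hidden characters are involved.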