Cannot load data into Piwik


#1

Greetings,

I am a newbie to Piwik, and I seem to be having an issue loading data from access log files into the application.

Let me start by saying that this all used to work, but for the last few days the data has not been visible after importing it and running the archiving.
Below is the import method I have been using. It is set up as a cron job that runs every day at 4 a.m., after all of the previous day's files have been placed in the local $LOGDIR location.
#!/bin/bash

YEAR=$(date --date='yesterday' '+%Y')
MONTH=$(date --date='yesterday' '+%m')
DAY=$(date --date='yesterday' '+%d')
LOGDIR=/tmp/parselogs.$DAY.$MONTH
LOG=$YEAR.$MONTH.$DAY.piwik.stats.log

echo "Parsing started" >> "$LOG"
date >> "$LOG"

for HOUR in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23; do
    sudo /var/www/piwik/misc/log-analytics/import_logs.py \
        --enable-http-errors --enable-http-errors --enable-http-redirects \
        --exclude-path-from=/home/piwik/excludepaths.txt --url=https://stats.globalcollect.com/piwik/ --idsite=1 \
        "$LOGDIR"/access_na.gcsip.com*${YEAR}${MONTH}${DAY}${HOUR}*
    sleep 10
done
echo "Parsing Completed" >> "$LOG"
date >> "$LOG"
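A possible hardening of the loop, offered only as a sketch: when no file matches an hour's pattern, bash passes the unexpanded pattern to import_logs.py as if it were a file name. The hypothetical helper `files_for_hour` below turns on bash's `nullglob` option temporarily, so an hour with no files yields an empty list that the loop can skip. It assumes the same `access_na.gcsip.com*YYYYMMDDHH*` naming scheme as the script.

```shell
#!/bin/bash
# Hypothetical helper: print the log files for one hour, one path per line.
# Assumes the access_na.gcsip.com*YYYYMMDDHH* naming scheme used by the cron script.
files_for_hour() {
    local dir=$1 ymd=$2 hour=$3
    local had_nullglob=0
    shopt -q nullglob && had_nullglob=1

    # With nullglob, a pattern that matches nothing expands to an empty list
    # instead of being left as a literal string.
    shopt -s nullglob
    local files=( "$dir"/access_na.gcsip.com*"${ymd}${hour}"* )

    # Restore the caller's nullglob setting.
    if (( ! had_nullglob )); then
        shopt -u nullglob
    fi

    if (( ${#files[@]} > 0 )); then
        printf '%s\n' "${files[@]}"
    fi
}
```

In the cron loop, something like `mapfile -t FILES < <(files_for_hour "$LOGDIR" "${YEAR}${MONTH}${DAY}" "$HOUR")` followed by `(( ${#FILES[@]} )) || continue` would skip empty hours, and `"${FILES[@]}"` would replace the glob on the import_logs.py line. `mapfile` needs bash 4, which Ubuntu 12.04 ships.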

Looking through the log files, I cannot see any issues in the MySQL, PHP, or system logs. However, after the archiving has completed, the uploaded data is not visible in the stats pages.

I am completely lost as to why this is not visible.

Can anyone please point me in the right direction?

Below are my server specs.

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel® Xeon® CPU E7330 @ 2.40GHz
stepping : 11
microcode : 0xb6
cpu MHz : 2393.890
cache size : 3072 KB
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx lm constant_tsc arch_perfmon pebs bts nopl tsc_reliable aperfmperf pni ssse3 cx16 hypervisor lahf_lm dtherm
bogomips : 4787.78
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel® Xeon® CPU E7330 @ 2.40GHz
stepping : 11
microcode : 0xb6
cpu MHz : 2393.890
cache size : 3072 KB
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx lm constant_tsc arch_perfmon pebs bts nopl tsc_reliable aperfmperf pni ssse3 cx16 hypervisor lahf_lm dtherm
bogomips : 4787.78
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel® Xeon® CPU E7330 @ 2.40GHz
stepping : 11
microcode : 0xb6
cpu MHz : 2393.890
cache size : 3072 KB
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx lm constant_tsc arch_perfmon pebs bts nopl tsc_reliable aperfmperf pni ssse3 cx16 hypervisor lahf_lm dtherm
bogomips : 4787.78
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel® Xeon® CPU E7330 @ 2.40GHz
stepping : 11
microcode : 0xb6
cpu MHz : 2393.890
cache size : 3072 KB
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx lm constant_tsc arch_perfmon pebs bts nopl tsc_reliable aperfmperf pni ssse3 cx16 hypervisor lahf_lm dtherm
bogomips : 4787.78
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

piwik@Piwik:~$ free
             total       used       free     shared    buffers     cached
Mem:      16435500    3046336   13389164          0      50516     272644
-/+ buffers/cache:    2723176   13712324
Swap:       524284          0     524284
piwik@Piwik:~$ cat /proc/meminfo
MemTotal: 16435500 kB
MemFree: 13389140 kB
Buffers: 50524 kB
Cached: 272656 kB
SwapCached: 0 kB
Active: 2640264 kB
Inactive: 230848 kB
Active(anon): 2580712 kB
Inactive(anon): 800 kB
Active(file): 59552 kB
Inactive(file): 230048 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 524284 kB
SwapFree: 524284 kB
Dirty: 4 kB
Writeback: 0 kB
AnonPages: 2547972 kB
Mapped: 54524 kB
Shmem: 33576 kB
Slab: 36204 kB
SReclaimable: 20952 kB
SUnreclaim: 15252 kB
KernelStack: 1480 kB
PageTables: 12276 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8742032 kB
Committed_AS: 6156380 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 303504 kB
VmallocChunk: 34359431420 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 61440 kB
DirectMap2M: 16715776 kB

piwik@Piwik:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.3 LTS
Release: 12.04
Codename: precise

mysql> SHOW VARIABLES LIKE '%version%';
+-------------------------+-------------------------+
| Variable_name           | Value                   |
+-------------------------+-------------------------+
| innodb_version          | 5.5.34                  |
| protocol_version        | 10                      |
| slave_type_conversions  |                         |
| version                 | 5.5.34-0ubuntu0.12.04.1 |
| version_comment         | (Ubuntu)                |
| version_compile_machine | x86_64                  |
| version_compile_os      | debian-linux-gnu        |
+-------------------------+-------------------------+

I hope that I have covered all the aspects.
Thanks
Lawrence


(Matthieu Aubry) #2

For recent days, do you see data in Visitors > Visitor Log?

If not, then the problem is with the import.

If yes, then the data is correctly imported and the archiving may have some problems. Do you see any errors in the server error logs?
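As a command-line counterpart to checking the Visitor Log, the raw visit rows can be counted per day straight from the database; rows present there were written by the importer, regardless of the state of the archives. This is a sketch under assumptions: the default `piwik_` table prefix, and the `log_visit` table with its `idsite` and `visit_last_action_time` (stored in UTC) columns. The function name `recent_visits_sql` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a query that counts raw (unarchived) visits per day
# for one site over the last N days. Assumes the default "piwik_" table prefix.
recent_visits_sql() {
    local idsite=${1:-1} days=${2:-7} prefix=${3:-piwik_}

    # Only accept plain integers, since they are pasted into the SQL text.
    [[ $idsite =~ ^[0-9]+$ && $days =~ ^[0-9]+$ ]] || return 1

    cat <<SQL
SELECT DATE(visit_last_action_time) AS day, COUNT(*) AS visits
FROM ${prefix}log_visit
WHERE idsite = ${idsite}
  AND visit_last_action_time >= UTC_TIMESTAMP() - INTERVAL ${days} DAY
GROUP BY day
ORDER BY day DESC;
SQL
}
```

For example, `recent_visits_sql 1 7 | mysql -u piwik -p piwik`, where the MySQL user and database name are placeholders for your own. If recent days show visits here but not in the reports, the import worked and archiving is the place to look.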


#3

Matt

Thanks for the reply. I have done a little more investigation, and this is what I have found.

Something that I did not and still do not understand is why so many requests are ignored.

I ran the following to load just a single file into the database and saw this:

piwik@Piwik:~$ python /var/www/piwik/misc/log-analytics/import_logs.py --enable-http-errors --enable-http-errors --enable-http-redirects --exclude-path-from=/home/piwik/excludepaths.txt --url=https://stats.globalcollect.com/piwik/ --idsite=1 /tmp/parselogs.27.10/access_na.gcsip.com_prod_mia_wbp03.20131027052414.gz
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /tmp/parselogs.27.10/access_na.gcsip.com_prod_mia_wbp03.20131027052414.gz…
1200 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 200 lines recorded, 33 records/sec (avg), 200 records/sec (current)
1200 lines parsed, 200 lines recorded, 28 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 200 lines recorded, 24 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 200 lines recorded, 22 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 200 lines recorded, 19 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 400 lines recorded, 36 records/sec (avg), 200 records/sec (current)
1200 lines parsed, 400 lines recorded, 33 records/sec (avg), 0 records/sec (current)
1200 lines parsed, 400 lines recorded, 30 records/sec (avg), 0 records/sec (current)
Purging Piwik archives for dates: 2013-10-27
To re-process these reports with your new update data, execute the piwik/misc/cron/archive.php script, or see: How to Set up Auto-Archiving of Your Reports - Analytics Platform - Matomo for more info.

Logs import summary

521 requests imported successfully
0 requests were downloads
502 requests ignored:
    0 invalid log lines
    0 requests done by bots, search engines, ...
    0 HTTP errors
    0 HTTP redirects
    502 requests to static resources (css, js, ...)
    0 requests did not match any known site
    0 requests did not match any requested hostname

Website import summary

521 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 13 seconds
Requests imported per second: 37.68 requests per second

I have looked through the server logs but see no issues there. Looking at the results above I think that there is an issue with the import.

I will attach the log file to the ticket so that you can see whether there are any issues with it. The file was generated by a Sun ONE web server, so that might be what is causing the issue.

Thanks for the advice.
Lawrence


#4

Matt

I have now attached a single log file from the Sun ONE server. I would appreciate some feedback on how I can resolve this issue going forward.

Thanks for your help.

Lawrence


(Matthieu Aubry) #5

What is your problem? See my questions above…


#6

Matt

Sorry, I missed the boat here a little. You said that you had asked some questions; I saw the following and thought that I had answered them.

For recent days do you see data in the Visitors>Visitor log? if not then problem is with import.
Yes I do see data in the visitors logs.

If yes then the data is correctly imported and the archiving may have some problems. Do you see any error in the server error logs?
No I do not see any errors in the server error logs at all.

I did ask a question about the archive results.

Why are so many requests ignored? The file that produced the results below is attached.
521 requests imported successfully
0 requests were downloads
502 requests ignored:

Lawrence


(Matthieu Aubry) #7

You can not ignore these by adding --enable-http-errors --enable-http-redirects --enable-static --enable-bots


#8

Matt

Thanks for the reply, but what do you mean by “You can not ignore these by adding --enable-http-errors --enable-http-redirects --enable-static --enable-bots”

Are you saying that these should be removed for better stats? Should they be in the import or not? I am completely new to this and am trying to learn, so please be patient; I am sure that I will get my head around it soon… I hope.

Thanks
Lawrence


#9

@gcpiwik

I believe that the ignored lines are lines that contain a different URL than that of --idsite=1. Are you running this on a server that hosts multiple websites?

I ask because of this:

0 distinct hostnames did not match any existing site:

Also, you should take advantage of the 4-core processor you have by adding:

--recorders=4

This will run 4 recorders in parallel, so that all 4 cores can be used to process your logs.
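Rather than hard-coding 4, the value could be taken from the number of cores the machine reports. A minimal sketch, assuming GNU coreutils' `nproc` (present on Ubuntu 12.04); the variable name `RECORDERS` is illustrative.

```shell
#!/bin/bash
# One recorder per available CPU core; fall back to 1 if nproc is unavailable.
RECORDERS=$(nproc 2>/dev/null || echo 1)

# The option as it would be passed to import_logs.py.
echo "--recorders=${RECORDERS}"
```

On the 4-processor server listed above, this should produce `--recorders=4`, which can then be added to the import_logs.py command line as `--recorders="$RECORDERS"`.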

Sven2157


#10

Sven2157

Thanks for the feedback. I am only running one site on the server, but only for the time being; I will expand later.

The info about the 4 cores is good, but where would I set this? Is it a switch used with the import_logs.py script?

Thanks
Lawrence


#11

[quote=gcpiwik]
Sven2157

Thanks for the feedback. I am only running one site on the server, but only for the time being; I will expand later.

The info about the 4 cores is good, but where would I set this? Is it a switch used with the import_logs.py script?

Thanks
Lawrence[/quote]

Yes, it is a command-line option for the log parser. It would look like this (3rd line, after --enable-http-redirects):


...
for HOUR in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23; do
    sudo /var/www/piwik/misc/log-analytics/import_logs.py \
        --enable-http-errors --enable-http-redirects --recorders=4 \
        --exclude-path-from=/home/piwik/excludepaths.txt --url=https://stats.globalcollect.com/piwik/ --idsite=1 \
        $LOGDIR/access_na.gcsip.com*${YEAR}${MONTH}${DAY}${HOUR}*
    sleep 10
done
echo "Parsing Completed" >> $LOG
date >> $LOG
...

By the way, you do know that you posted your bash script with two --enable-http-errors options in it, right? Was this a mistake in your post, or is it that way in your actual script?

Anyway, you can find the other available options by running the import_logs.py script on its own and reading the output on screen (or redirecting it to a file).

Sven2157

*** EDIT ***
I created a basic HTML page listing the options of import_logs.py. It is a single HTML page, so you can save it and view it straight from your desktop, if you like.
IMPORT_LOGS.PY OPTIONS

Hope that helps! :wink:

Sven2157


#12

[quote=gcpiwik]
Matt

Thanks for the reply, but what do you mean by “You can not ignore these by adding --enable-http-errors --enable-http-redirects --enable-static --enable-bots”

Are you saying that these should be removed for better stats? Should they be in the import or not? I am completely new to this and am trying to learn, so please be patient; I am sure that I will get my head around it soon… I hope.

Thanks
Lawrence[/quote]

I saw the same thing in my log and wondered about it too. However, the answer was right in front of me, so now I see what he meant.

You posted your log results several posts up:


Logs import summary
-------------------

521 requests imported successfully
0 requests were downloads
502 requests ignored:
    0 invalid log lines
    0 requests done by bots, search engines, ...
    0 HTTP errors
    0 HTTP redirects
    502 requests to static resources (css, js, ...)
    0 requests did not match any known site
    0 requests did not match any requested hostname

The script ignored 502 of the requests because they were requests for static resources (CSS, JS, etc.). Since you are parsing the web server's access logs, EVERYTHING is recorded, and by default the import_logs.py script is set to ignore non-relevant requests such as these.
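To check this against your own file before importing, the sketch below counts how many requests in a log point at static resources. It is only an approximation: it assumes the request path is the 7th whitespace-separated field, as in the Common/Combined Log Format (a Sun ONE log may be laid out differently), and its extension list is a guess, not the exact list import_logs.py uses. The function name `count_static` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count requests whose path ends in a static-file extension.
# Assumes the request path is field 7 (Common/Combined Log Format).
# The extension list approximates, but is not identical to, import_logs.py's own list.
count_static() {
    local file=$1
    # zcat -f reads both gzipped and plain-text files.
    zcat -f -- "$file" | awk '
        {
            path = $7
            sub(/[?#].*$/, "", path)   # drop any query string or fragment
            ntotal++
            if (tolower(path) ~ /\.(css|js|png|jpe?g|gif|ico|svg|woff|ttf|swf)$/) nstatic++
        }
        END { printf "%d static of %d requests\n", nstatic, ntotal }'
}
```

For example, `count_static /tmp/parselogs.27.10/access_na.gcsip.com_prod_mia_wbp03.20131027052414.gz`; if the format assumption holds, the count should be close to the importer's "requests to static resources" figure.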

He meant that adding those options would stop the script from ignoring those requests. Does that help?

Sven2157
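For reference, a sketch of the import options with Matt's suggested flags added, so that static-file and bot requests are recorded rather than skipped. The options are collected in a bash array so each one stays a separate, correctly quoted argument; the array name `IMPORT_OPTS` is illustrative, and whether you actually want static files and bots in your stats is a judgment call.

```shell
#!/bin/bash
# Options for import_logs.py; paths, URL, and site ID are taken from this thread.
IMPORT_OPTS=(
    --enable-http-errors     # record 4xx/5xx responses
    --enable-http-redirects  # record 3xx redirects
    --enable-static          # record CSS, JS, images, ... instead of ignoring them
    --enable-bots            # record requests made by bots and crawlers
    --recorders=4
    --exclude-path-from=/home/piwik/excludepaths.txt
    --url=https://stats.globalcollect.com/piwik/
    --idsite=1
)

# Print the full command for review instead of running it.
printf '%s ' /var/www/piwik/misc/log-analytics/import_logs.py "${IMPORT_OPTS[@]}"
printf '\n'
```

To run it for real, the cron loop would call `sudo /var/www/piwik/misc/log-analytics/import_logs.py "${IMPORT_OPTS[@]}"` followed by the log files for that hour.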