Log analytics records only about 20,000 lines, the rest is ignored


(O. Herbst) #1

Hello,

We want to import very large logs (about 20 GB to 30 GB) using Log Analytics and the Python script.
The format is detected correctly, but only about the first 20,000 records are imported. After that, all remaining log lines are flagged as invalid, even though they are definitely in the same format as the preceding ones.
Splitting the data into smaller files does not help either. Is there a limit in Python, in the script, or in Matomo?

Here is the script's output:
14449 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 199 records/sec (avg), 1400 records/sec (current)
20000 lines parsed, 1400 lines recorded, 174 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 155 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 139 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 127 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 116 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 2800 lines recorded, 215 records/sec (avg), 1400 records/sec (current)
20000 lines parsed, 2800 lines recorded, 199 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 2800 lines recorded, 186 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 2800 lines recorded, 174 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 2800 lines recorded, 164 records/sec (avg), 0 records/sec (current)

Logs import summary

4160 requests imported successfully
0 requests were downloads
15840 requests ignored:
    0 HTTP errors
    0 HTTP redirects
    15840 invalid log lines
    0 filtered log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    0 requests done by bots, search engines...
    0 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Website import summary

4160 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

Thanks in advance

Best regards


(Fabian Dellwing) #2

Please run the script with --debug (possibly specified more than once) and check whether it produces any useful output.


(O. Herbst) #3

Hello,

Already done; that was the output with --debug, but only part of it. Here is the complete output:

First file with 20,000 lines:

DST\herbstod1@l0604022:/srv/www/htdocs/piwik/logfiles/verbis/out> python /srv/www/htdocs/piwik/misc/log-analytics/import_logs.py --url=https://piwik-free.web.dst.baintern.de --idsite=5 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --recorders=7 --debug /srv/www/htdocs/piwik/logfiles/verbis/out/verbis_20190411_geteilt.log00
2019-04-17 12:35:30,860: [DEBUG] Accepted hostnames: all
2019-04-17 12:35:30,860: [DEBUG] Matomo Tracker API URL is: https://piwik-free.web.dst.baintern.de
2019-04-17 12:35:30,860: [DEBUG] Matomo Analytics API URL is: https://piwik-free.web.dst.baintern.de
2019-04-17 12:35:30,861: [DEBUG] No token-auth specified
2019-04-17 12:35:30,861: [DEBUG] No credentials specified, reading them from "/srv/www/htdocs/piwik/config/config.ini.php"
2019-04-17 12:35:30,964: [DEBUG] Authentication token token_auth is: d54fbafbbcedbfbfe52ac8a9ddd418f4
2019-04-17 12:35:30,964: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2019-04-17 12:35:31,326: [DEBUG] Launched recorder
2019-04-17 12:35:31,326: [DEBUG] Launched recorder
2019-04-17 12:35:31,327: [DEBUG] Launched recorder
2019-04-17 12:35:31,327: [DEBUG] Launched recorder
2019-04-17 12:35:31,328: [DEBUG] Launched recorder
2019-04-17 12:35:31,328: [DEBUG] Launched recorder
2019-04-17 12:35:31,329: [DEBUG] Launched recorder
Parsing log /srv/www/htdocs/piwik/logfiles/verbis/out/verbis_20190411_geteilt.log00...
2019-04-17 12:35:31,329: [DEBUG] Detecting the log format
2019-04-17 12:35:31,330: [DEBUG] Check format shoutcast
2019-04-17 12:35:31,330: [DEBUG] Format shoutcast does not match
2019-04-17 12:35:31,330: [DEBUG] Check format iis
2019-04-17 12:35:31,330: [DEBUG] Format iis does not match
2019-04-17 12:35:31,330: [DEBUG] Check format common_complete
2019-04-17 12:35:31,330: [DEBUG] Format common_complete does not match
2019-04-17 12:35:31,330: [DEBUG] Check format amazon_cloudfront
2019-04-17 12:35:31,330: [DEBUG] Format amazon_cloudfront does not match
2019-04-17 12:35:31,330: [DEBUG] Check format verbis
2019-04-17 12:35:31,330: [DEBUG] Format verbis does not match
2019-04-17 12:35:31,330: [DEBUG] Check format w3c_extended
2019-04-17 12:35:31,331: [DEBUG] Format w3c_extended does not match
2019-04-17 12:35:31,331: [DEBUG] Check format ovh
2019-04-17 12:35:31,331: [DEBUG] Check format icecast2
2019-04-17 12:35:31,331: [DEBUG] Format icecast2 does not match
2019-04-17 12:35:31,331: [DEBUG] Check format nginx_json
2019-04-17 12:35:31,331: [DEBUG] Format nginx_json does not match
2019-04-17 12:35:31,331: [DEBUG] Check format elb
2019-04-17 12:35:31,331: [DEBUG] Format elb does not match
2019-04-17 12:35:31,331: [DEBUG] Check format s3
2019-04-17 12:35:31,332: [DEBUG] Format s3 does not match
2019-04-17 12:35:31,332: [DEBUG] Check format common
2019-04-17 12:35:31,332: [DEBUG] Format common does not match
2019-04-17 12:35:31,332: [DEBUG] Check format common_vhost
2019-04-17 12:35:31,332: [DEBUG] Format common_vhost matches
2019-04-17 12:35:31,332: [DEBUG] Format match contains 9 groups
2019-04-17 12:35:31,332: [DEBUG] Check format ncsa_extended
2019-04-17 12:35:31,332: [DEBUG] Format ncsa_extended does not match
2019-04-17 12:35:31,333: [DEBUG] Format common_vhost is the best match
7356 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
8848 lines parsed, 268 lines recorded, 133 records/sec (avg), 268 records/sec (current)
8989 lines parsed, 1715 lines recorded, 570 records/sec (avg), 1447 records/sec (current)
11203 lines parsed, 2426 lines recorded, 605 records/sec (avg), 711 records/sec (current)
11203 lines parsed, 3058 lines recorded, 610 records/sec (avg), 632 records/sec (current)
13510 lines parsed, 4029 lines recorded, 670 records/sec (avg), 971 records/sec (current)
13510 lines parsed, 4985 lines recorded, 711 records/sec (avg), 956 records/sec (current)
15632 lines parsed, 5669 lines recorded, 707 records/sec (avg), 684 records/sec (current)
15907 lines parsed, 6738 lines recorded, 747 records/sec (avg), 1069 records/sec (current)
18167 lines parsed, 7318 lines recorded, 730 records/sec (avg), 580 records/sec (current)
20000 lines parsed, 8538 lines recorded, 775 records/sec (avg), 1220 records/sec (current)
20000 lines parsed, 9003 lines recorded, 749 records/sec (avg), 465 records/sec (current)
20000 lines parsed, 9595 lines recorded, 737 records/sec (avg), 592 records/sec (current)
20000 lines parsed, 10459 lines recorded, 746 records/sec (avg), 864 records/sec (current)
20000 lines parsed, 10653 lines recorded, 709 records/sec (avg), 194 records/sec (current)
20000 lines parsed, 11468 lines recorded, 715 records/sec (avg), 815 records/sec (current)
20000 lines parsed, 11468 lines recorded, 673 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 11468 lines recorded, 636 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 11969 lines recorded, 629 records/sec (avg), 501 records/sec (current)

Logs import summary

12334 requests imported successfully
100 requests were downloads
7666 requests ignored:
    0 HTTP errors
    0 HTTP redirects
    7666 invalid log lines
    0 filtered log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    0 requests done by bots, search engines...
    0 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Website import summary

12334 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 19 seconds
Requests imported per second: 620.23 requests per second

Processing your log data

In order for your logs to be processed by Matomo, you may need to run the following command:
 ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='https://piwik-free.web.dst.baintern.de'

Second file with 20,000 lines:
DST\herbstod1@l0604022:/srv/www/htdocs/piwik/logfiles/verbis/out> python /srv/www/htdocs/piwik/misc/log-analytics/import_logs.py --url=https://piwik-free.web.dst.baintern.de --idsite=5 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --recorders=7 --debug /srv/www/htdocs/piwik/logfiles/verbis/out/verbis_20190411_geteilt.log01
2019-04-17 12:36:00,577: [DEBUG] Accepted hostnames: all
2019-04-17 12:36:00,577: [DEBUG] Matomo Tracker API URL is: https://piwik-free.web.dst.baintern.de
2019-04-17 12:36:00,578: [DEBUG] Matomo Analytics API URL is: https://piwik-free.web.dst.baintern.de
2019-04-17 12:36:00,578: [DEBUG] No token-auth specified
2019-04-17 12:36:00,578: [DEBUG] No credentials specified, reading them from "/srv/www/htdocs/piwik/config/config.ini.php"
2019-04-17 12:36:00,644: [DEBUG] Authentication token token_auth is: d54fbafbbcedbfbfe52ac8a9ddd418f4
2019-04-17 12:36:00,644: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2019-04-17 12:36:00,868: [DEBUG] Launched recorder
2019-04-17 12:36:00,869: [DEBUG] Launched recorder
2019-04-17 12:36:00,869: [DEBUG] Launched recorder
2019-04-17 12:36:00,870: [DEBUG] Launched recorder
2019-04-17 12:36:00,870: [DEBUG] Launched recorder
2019-04-17 12:36:00,871: [DEBUG] Launched recorder
2019-04-17 12:36:00,872: [DEBUG] Launched recorder
Parsing log /srv/www/htdocs/piwik/logfiles/verbis/out/verbis_20190411_geteilt.log01...
2019-04-17 12:36:00,873: [DEBUG] Detecting the log format
2019-04-17 12:36:00,873: [DEBUG] Check format shoutcast
2019-04-17 12:36:00,873: [DEBUG] Format shoutcast does not match
2019-04-17 12:36:00,873: [DEBUG] Check format iis
2019-04-17 12:36:00,873: [DEBUG] Format iis does not match
2019-04-17 12:36:00,874: [DEBUG] Check format common_complete
2019-04-17 12:36:00,874: [DEBUG] Format common_complete does not match
2019-04-17 12:36:00,874: [DEBUG] Check format amazon_cloudfront
2019-04-17 12:36:00,874: [DEBUG] Format amazon_cloudfront does not match
2019-04-17 12:36:00,874: [DEBUG] Check format verbis
2019-04-17 12:36:00,874: [DEBUG] Format verbis does not match
2019-04-17 12:36:00,874: [DEBUG] Check format w3c_extended
2019-04-17 12:36:00,874: [DEBUG] Format w3c_extended does not match
2019-04-17 12:36:00,875: [DEBUG] Check format ovh
2019-04-17 12:36:00,875: [DEBUG] Check format icecast2
2019-04-17 12:36:00,875: [DEBUG] Format icecast2 does not match
2019-04-17 12:36:00,875: [DEBUG] Check format nginx_json
2019-04-17 12:36:00,875: [DEBUG] Format nginx_json does not match
2019-04-17 12:36:00,875: [DEBUG] Check format elb
2019-04-17 12:36:00,875: [DEBUG] Format elb does not match
2019-04-17 12:36:00,876: [DEBUG] Check format s3
2019-04-17 12:36:00,876: [DEBUG] Format s3 does not match
2019-04-17 12:36:00,876: [DEBUG] Check format common
2019-04-17 12:36:00,876: [DEBUG] Format common matches
2019-04-17 12:36:00,876: [DEBUG] Format match contains 8 groups
2019-04-17 12:36:00,876: [DEBUG] Check format common_vhost
2019-04-17 12:36:00,877: [DEBUG] Format common_vhost does not match
2019-04-17 12:36:00,877: [DEBUG] Check format ncsa_extended
2019-04-17 12:36:00,877: [DEBUG] Format ncsa_extended does not match
2019-04-17 12:36:00,877: [DEBUG] Format common is the best match
14449 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 199 records/sec (avg), 1400 records/sec (current)
20000 lines parsed, 1400 lines recorded, 174 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 155 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 139 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 127 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 1400 lines recorded, 116 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 2800 lines recorded, 215 records/sec (avg), 1400 records/sec (current)
20000 lines parsed, 2800 lines recorded, 199 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 2800 lines recorded, 186 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 2800 lines recorded, 174 records/sec (avg), 0 records/sec (current)
20000 lines parsed, 2800 lines recorded, 164 records/sec (avg), 0 records/sec (current)

Logs import summary

4160 requests imported successfully
0 requests were downloads
15840 requests ignored:
    0 HTTP errors
    0 HTTP redirects
    15840 invalid log lines
    0 filtered log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    0 requests done by bots, search engines...
    0 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Website import summary

4160 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 17 seconds
Requests imported per second: 237.42 requests per second

Processing your log data

In order for your logs to be processed by Matomo, you may need to run the following command:
 ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='https://piwik-free.web.dst.baintern.de'

Regards


(Fabian Dellwing) #4

I'm afraid that's not enough --debug :smiley:

Please add two more.


(O. Herbst) #5

Is that a joke? There is no further output and/or information.

Regards


(Fabian Dellwing) #6

No, not a joke. If that doesn't help, I can't say anything more offhand. Could you upload one of the 20k-line files here (with IPs and any sensitive fields replaced by dummy values)?

Otherwise, maybe @Lukas has an idea.


(O. Herbst) #7

Hello,

here is the output for the next log:

DST\herbstod1@l0604022:/etc/apache2/vhosts.d> python /srv/www/htdocs/piwik/misc/log-analytics/import_logs.py --url=https://piwik-free.web.dst.baintern.de --idsite=5  --enable-http-errors --enable-http-redirects --enable-static --enable-bots --recorders=7 --debug /srv/www/htdocs/piwik/logfiles/verbis/out/verbis_20190411_geteilt.log02
2019-04-17 12:48:32,760: [DEBUG] Accepted hostnames: all
2019-04-17 12:48:32,760: [DEBUG] Matomo Tracker API URL is: https://piwik-free.web.dst.baintern.de
2019-04-17 12:48:32,760: [DEBUG] Matomo Analytics API URL is: https://piwik-free.web.dst.baintern.de
2019-04-17 12:48:32,760: [DEBUG] No token-auth specified
2019-04-17 12:48:32,760: [DEBUG] No credentials specified, reading them from "/srv/www/htdocs/piwik/config/config.ini.php"
2019-04-17 12:48:32,822: [DEBUG] Authentication token token_auth is: d54fbafbbcedbfbfe52ac8a9ddd418f4
2019-04-17 12:48:32,822: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2019-04-17 12:48:33,048: [DEBUG] Launched recorder
2019-04-17 12:48:33,048: [DEBUG] Launched recorder
2019-04-17 12:48:33,049: [DEBUG] Launched recorder
2019-04-17 12:48:33,049: [DEBUG] Launched recorder
2019-04-17 12:48:33,049: [DEBUG] Launched recorder
2019-04-17 12:48:33,050: [DEBUG] Launched recorder
2019-04-17 12:48:33,050: [DEBUG] Launched recorder
Parsing log /srv/www/htdocs/piwik/logfiles/verbis/out/verbis_20190411_geteilt.log02...
2019-04-17 12:48:33,051: [DEBUG] Detecting the log format
2019-04-17 12:48:33,051: [DEBUG] Check format shoutcast
2019-04-17 12:48:33,051: [DEBUG] Format shoutcast does not match
2019-04-17 12:48:33,051: [DEBUG] Check format iis
2019-04-17 12:48:33,051: [DEBUG] Format iis does not match
2019-04-17 12:48:33,051: [DEBUG] Check format common_complete
2019-04-17 12:48:33,051: [DEBUG] Format common_complete does not match
2019-04-17 12:48:33,052: [DEBUG] Check format amazon_cloudfront
2019-04-17 12:48:33,052: [DEBUG] Format amazon_cloudfront does not match
2019-04-17 12:48:33,052: [DEBUG] Check format verbis
2019-04-17 12:48:33,052: [DEBUG] Format verbis does not match
2019-04-17 12:48:33,052: [DEBUG] Check format w3c_extended
2019-04-17 12:48:33,052: [DEBUG] Format w3c_extended does not match
2019-04-17 12:48:33,052: [DEBUG] Check format ovh
2019-04-17 12:48:33,052: [DEBUG] Check format icecast2
2019-04-17 12:48:33,052: [DEBUG] Format icecast2 does not match
2019-04-17 12:48:33,052: [DEBUG] Check format nginx_json
2019-04-17 12:48:33,052: [DEBUG] Format nginx_json does not match
2019-04-17 12:48:33,053: [DEBUG] Check format elb
2019-04-17 12:48:33,053: [DEBUG] Format elb does not match
2019-04-17 12:48:33,053: [DEBUG] Check format s3
2019-04-17 12:48:33,053: [DEBUG] Format s3 does not match
2019-04-17 12:48:33,053: [DEBUG] Check format common
2019-04-17 12:48:33,053: [DEBUG] Format common does not match
2019-04-17 12:48:33,053: [DEBUG] Check format common_vhost
2019-04-17 12:48:33,053: [DEBUG] Format common_vhost matches
2019-04-17 12:48:33,054: [DEBUG] Format match contains 9 groups
2019-04-17 12:48:33,054: [DEBUG] Check format ncsa_extended
2019-04-17 12:48:33,054: [DEBUG] Format ncsa_extended does not match
2019-04-17 12:48:33,054: [DEBUG] Format common_vhost is the best match

Logs import summary
-------------------

    0 requests imported successfully
    56 requests were downloads
    20000 requests ignored:
        0 HTTP errors
        0 HTTP redirects
        20000 invalid log lines
        0 filtered log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        0 requests done by bots, search engines...
        0 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    0 requests imported to 1 sites
        1 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:



Performance summary
-------------------

    Total time: 0 seconds
    Requests imported per second: 0.0 requests per second

Processing your log data
------------------------

    In order for your logs to be processed by Matomo, you may need to run the following command:
     ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='https://piwik-free.web.dst.baintern.de'

Here are three log lines as an example:

<DOMAIN.DE> 130.49.xxx.xxx - JREJczW5qh8NjhWX7l_Lb3guw0eVxLnG4m7c0oZgRAXORFHooH8w!235689862; [2019-04-11:00:52:31 -100] "GET /verbis/css/ARGl_custom.css HTTP/1.0" 200 1019 - -
<DOMAIN.DE> 130.215.xxx.xxx -  [2019-04-11:00:52:31 -100] "POST /verbis-ws/BewerberuebersichtService HTTP/1.0" 200 200 - -
<DOMAIN.DE> 130.49.xxx.xxx - JREJczW5qh8NjhWX7l_Lb3guw0eVxLnG4m7c0oZgRAXORFHooH8w!235689862; [2019-04-11:00:52:32 -100] "GET /verbis/app/VMKa_VermerkeAuflisten.BetriebvermerkeAuflisten?Betrieb=1000000001891056014&ausHauptNav=3&conversationId=862e04749a9a7caebd408dc62b0251d456740091337115d619c140b21c9adc9a-1&DoppelklickFilter.RequestKey=b251e58e-66a8-465e-b6e9-029589dc7a6a&nrc=9 HTTP/1.0" 200 27889 - -

Regards


(Fabian Dellwing) #8

Unfortunately, the forum alters the lines too much; I can't get an import to work with them.


(Lukas Winkler) #9

Hi,

I have never used Log Analytics, but I wouldn't rule out obscure Python 2 bugs.

I've turned it into a code block. Maybe that works better.


(Fabian Dellwing) #10

Yes, now I was able to solve the problem. Well, we could have gotten there sooner if the user had not claimed that there is no further output.

Because of course there is further output: the script prints the lines that do not match:

Parsing log test1.log...
2019-04-23 09:02:41,151: [DEBUG] Invalid line detected (line did not match): abc.de 130.215.0.0 -  [2019-04-11:00:52:31 -100] "POST /verbis-ws/BewerberuebersichtService HTTP/1.0" 200 200 - -

2019-04-23 09:02:41,154: [DEBUG] Invalid line detected (invalid date or invalid format: time data '2019-04-11:00:52:32' does not match format '%Y-%b-%d:%H:%M:%S'): abc.de 130.49.0.0 - JREJczW5qh8NjhWX7l_Lb3guw0eVxLnG4m7c0oZgRAXORFHooH8w!235689862; [2019-04-11:00:52:32 -100] "GET /verbis/app/VMKa_VermerkeAuflisten.BetriebvermerkeAuflisten?Betrieb=1000000001891056014&ausHauptNav=3&conversationId=862e04749a9a7caebd408dc62b0251d456740091337115d619c140b21c9adc9a-1&DoppelklickFilter.RequestKey=b251e58e-66a8-465e-b6e9-029589dc7a6a&nrc=9 HTTP/1.0" 200 27889 - -

For these logs to be imported, simply pass the following two CLI parameters:

--log-format-regex='(?P<host>[\w\-\.]*)(?::\d+)?\s+(?P<ip>[\w*.:-]+)\s+\S+\s+(?P<userid>\S+)?\s+\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+(?P<length>\S+)' --log-date-format='%Y-%m-%d:%H:%M:%S'
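
To check such a pattern before starting a long import, it can be tried against a few sample lines in plain Python. The following is only a minimal sketch: the regex and date format are copied from the two parameters above, the first sample line comes from the debug output in this thread, the second is shortened with a placeholder session ID, and `parse_line` is a hypothetical helper, not part of import_logs.py. It assumes the importer applies the pattern from the start of each line.

```python
import re
from datetime import datetime

# Pattern and date format copied from the two CLI parameters above.
LOG_FORMAT_REGEX = (
    r'(?P<host>[\w\-\.]*)(?::\d+)?\s+(?P<ip>[\w*.:-]+)\s+\S+\s+(?P<userid>\S+)?\s+'
    r'\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+(?P<length>\S+)'
)
LOG_DATE_FORMAT = '%Y-%m-%d:%H:%M:%S'
# A date format named in one of the importer's error messages in this thread.
DEFAULT_DATE_FORMAT = '%d/%b/%Y:%H:%M:%S'

# Sample lines: the first has an empty user field, the second a
# placeholder session ID and a shortened path (both hypothetical).
SAMPLE_LINES = [
    'abc.de 130.215.0.0 -  [2019-04-11:00:52:31 -100] '
    '"POST /verbis-ws/BewerberuebersichtService HTTP/1.0" 200 200 - -',
    'abc.de 130.49.0.0 - SESSIONID!235689862; [2019-04-11:00:52:32 -100] '
    '"GET /verbis/app/Example?nrc=9 HTTP/1.0" 200 27889 - -',
]


def parse_line(line):
    """Return the named groups plus a parsed datetime, or None if the line does not match."""
    match = re.match(LOG_FORMAT_REGEX, line)
    if match is None:
        return None
    fields = match.groupdict()
    fields['parsed_date'] = datetime.strptime(fields['date'], LOG_DATE_FORMAT)
    return fields


parsed = [parse_line(line) for line in SAMPLE_LINES]
for fields in parsed:
    print(fields['host'], fields['userid'], fields['parsed_date'], fields['status'])
```

Parsing the same timestamps with `DEFAULT_DATE_FORMAT` raises a `ValueError`, which corresponds to the "invalid date or invalid format" message in the debug output.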

You're welcome. Thanks.


(O. Herbst) #11

Hello everyone,

first of all, thanks for the help.
@Lukas, fdellwing: There really was no further output when we ran the import on our servers. The Python script ran several times without any such output, so I would like to know how you got it. We pass the --debug option by default. Had I seen this output, I would have spotted the error immediately.
At the beginning of the 20 GB file, the date is correct: [10/Apr/2019:22:31:55 -100]. The actual problem is that the file contains two different dates, 10 April and 11 April. We did not expect that and did not notice it. So that would be our actual mistake.
Thanks again for the help anyway. Your hints helped us enormously.
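
A mixed file like this can be spotted before the import by counting which timestamp format each line uses. The sketch below is an illustration under assumptions: the two candidate formats are the ones that appear in this thread, the sample lines are made up, and `classify_timestamp` and `count_timestamp_formats` are hypothetical helpers, not part of import_logs.py.

```python
import re
from collections import Counter
from datetime import datetime

# The two timestamp formats seen in this thread; extend as needed.
CANDIDATE_FORMATS = {
    'dd/Mon/yyyy': '%d/%b/%Y:%H:%M:%S',
    'yyyy-mm-dd': '%Y-%m-%d:%H:%M:%S',
}
# Captures the date part of "[<date> <timezone>]".
BRACKET_RE = re.compile(r'\[(\S+)\s+[^\]]*\]')


def classify_timestamp(line):
    """Name the first candidate format that parses the line's timestamp."""
    match = BRACKET_RE.search(line)
    if match is None:
        return 'no timestamp'
    for name, fmt in CANDIDATE_FORMATS.items():
        try:
            datetime.strptime(match.group(1), fmt)
            return name
        except ValueError:
            continue
    return 'unknown'


def count_timestamp_formats(lines):
    """Count timestamp formats over any iterable of lines (e.g. an open file)."""
    return Counter(classify_timestamp(line) for line in lines)


# Made-up sample lines, one in the old format and two in the new one.
sample = [
    'abc.de 130.49.0.0 - - [10/Apr/2019:22:31:55 -100] "GET / HTTP/1.0" 200 1019 - -',
    'abc.de 130.49.0.0 - - [2019-04-11:00:52:31 -100] "GET / HTTP/1.0" 200 1019 - -',
    'abc.de 130.49.0.0 - - [2019-04-11:00:52:32 -100] "GET / HTTP/1.0" 200 1019 - -',
]
counts = count_timestamp_formats(sample)
print(counts)
```

For a 20 GB file, passing the open file object itself keeps memory use flat, because the lines are read one at a time. More than one non-zero count means the file mixes formats and may need to be split or imported with different --log-date-format values.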


(Fabian Dellwing) #12

Here is the complete output:

[11:24 root@drupal-cms ~] > /var/www/piwik/misc/log-analytics/import_logs.py --dry-run --recorders=4 --url=https://stats.promato.de --idsite=2 --debug --debug test1.log
2019-04-23 11:25:02,098: [DEBUG] Accepted hostnames: all
2019-04-23 11:25:02,098: [DEBUG] Matomo Tracker API URL is: https://stats.promato.de
2019-04-23 11:25:02,098: [DEBUG] Matomo Analytics API URL is: https://stats.promato.de
2019-04-23 11:25:02,098: [DEBUG] No token-auth specified
2019-04-23 11:25:02,098: [DEBUG] No credentials specified, reading them from "/var/www/piwik/config/config.ini.php"
2019-04-23 11:25:02,154: [DEBUG] Authentication token token_auth is: <redacted>
2019-04-23 11:25:02,155: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2019-04-23 11:25:02,610: [DEBUG] Launched recorder
2019-04-23 11:25:02,611: [DEBUG] Launched recorder
2019-04-23 11:25:02,611: [DEBUG] Launched recorder
2019-04-23 11:25:02,613: [DEBUG] Launched recorder
Parsing log test1.log...
2019-04-23 11:25:02,614: [DEBUG] Detecting the log format
2019-04-23 11:25:02,614: [DEBUG] Check format shoutcast
2019-04-23 11:25:02,614: [DEBUG] Format shoutcast does not match
2019-04-23 11:25:02,614: [DEBUG] Check format iis
2019-04-23 11:25:02,614: [DEBUG] Format iis does not match
2019-04-23 11:25:02,614: [DEBUG] Check format common_complete
2019-04-23 11:25:02,614: [DEBUG] Format common_complete does not match
2019-04-23 11:25:02,615: [DEBUG] Check format amazon_cloudfront
2019-04-23 11:25:02,615: [DEBUG] Format amazon_cloudfront does not match
2019-04-23 11:25:02,615: [DEBUG] Check format w3c_extended
2019-04-23 11:25:02,615: [DEBUG] Format w3c_extended does not match
2019-04-23 11:25:02,615: [DEBUG] Check format ovh
2019-04-23 11:25:02,615: [DEBUG] Check format icecast2
2019-04-23 11:25:02,615: [DEBUG] Format icecast2 does not match
2019-04-23 11:25:02,615: [DEBUG] Check format nginx_json
2019-04-23 11:25:02,615: [DEBUG] Format nginx_json does not match
2019-04-23 11:25:02,615: [DEBUG] Check format elb
2019-04-23 11:25:02,615: [DEBUG] Format elb does not match
2019-04-23 11:25:02,615: [DEBUG] Check format s3
2019-04-23 11:25:02,616: [DEBUG] Format s3 does not match
2019-04-23 11:25:02,616: [DEBUG] Check format common
2019-04-23 11:25:02,616: [DEBUG] Format common does not match
2019-04-23 11:25:02,616: [DEBUG] Check format common_vhost
2019-04-23 11:25:02,616: [DEBUG] Format common_vhost matches
2019-04-23 11:25:02,616: [DEBUG] Format match contains 9 groups
2019-04-23 11:25:02,616: [DEBUG] Check format ncsa_extended
2019-04-23 11:25:02,616: [DEBUG] Format ncsa_extended does not match
2019-04-23 11:25:02,616: [DEBUG] Format common_vhost is the best match
2019-04-23 11:25:02,616: [DEBUG] Invalid line detected (line did not match): abc.de 130.215.0.0 -  [2019-04-11:00:52:31 -100] "POST /verbis-ws/BewerberuebersichtService HTTP/1.0" 200 200 - -

2019-04-23 11:25:02,620: [DEBUG] Invalid line detected (invalid date or invalid format: time data '2019-04-11:00:52:32' does not match format '%d/%b/%Y:%H:%M:%S'): abc.de 130.49.0.0 - JREJczW5qh8NjhWX7l_Lb3guw0eVxLnG4m7c0oZgRAXORFHooH8w!235689862; [2019-04-11:00:52:32 -100] "GET /verbis/app/VMKa_VermerkeAuflisten.BetriebvermerkeAuflisten?Betrieb=1000000001891056014&ausHauptNav=3&conversationId=862e04749a9a7caebd408dc62b0251d456740091337115d619c140b21c9adc9a-1&DoppelklickFilter.RequestKey=b251e58e-66a8-465e-b6e9-029589dc7a6a&nrc=9 HTTP/1.0" 200 27889 - -


Logs import summary
-------------------

    0 requests imported successfully
    0 requests were downloads
    3 requests ignored:
        0 HTTP errors
        0 HTTP redirects
        2 invalid log lines
        0 filtered log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        0 requests done by bots, search engines...
        1 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    0 requests imported to 1 sites
        1 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:



Performance summary
-------------------

    Total time: 0 seconds
    Requests imported per second: 0.0 requests per second

Processing your log data
------------------------

    In order for your logs to be processed by Matomo, you may need to run the following command:
     ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='https://stats.promato.de'

(O. Herbst) #13

Hello fdellwing,

now the penny has dropped: you have to specify --debug twice to get the "invalid lines".
That is not exactly obvious.
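
For readers wondering how a flag can behave differently when repeated: a common Python idiom is a counting option. The sketch below uses argparse's `action='count'` in a toy parser. Whether import_logs.py is implemented exactly this way is an assumption; `build_parser` and `report_invalid_line` are hypothetical names used only for illustration.

```python
import argparse
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s: [%(levelname)s] %(message)s')


def build_parser():
    """Toy parser: each --debug on the command line increments options.debug."""
    parser = argparse.ArgumentParser(description='Toy importer (illustration only)')
    parser.add_argument('--debug', '-d', action='count', default=0,
                        help='repeat the flag for more detail')
    return parser


def report_invalid_line(options, line, reason):
    """Log the rejected line only at debug level 2 or higher; return whether it was logged."""
    if options.debug >= 2:
        logging.debug('Invalid line detected (%s): %s', reason, line)
        return True
    return False


once = build_parser().parse_args(['--debug'])
twice = build_parser().parse_args(['--debug', '--debug'])

print(once.debug, report_invalid_line(once, 'bad line', 'line did not match'))
print(twice.debug, report_invalid_line(twice, 'bad line', 'line did not match'))
```

With this pattern a single --debug enables general debug messages, while per-line details stay hidden until the flag is given a second time, which matches the behaviour observed in this thread.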

Thanks for the tip.

Regards


(Fabian Dellwing) #14

:wink:

Good luck going forward.