Import_logs.py logging changed in Matomo 3.13.3

Hi

When I imported multiple log files with the import_logs.py script, the log output was more detailed:

Parsing log /storage/log/apache/stari_kompresirani_2008/www.isvu.hr_443-access_log.20080103.gz…
539467 lines parsed, 59522 lines recorded, 1854 records/sec (avg), 2402 records/sec (current)
555472 lines parsed, 61213 lines recorded, 1849 records/sec (avg), 1691 records/sec (current)
572473 lines parsed, 63711 lines recorded, 1868 records/sec (avg), 2498 records/sec (current)
589538 lines parsed, 65511 lines recorded, 1866 records/sec (avg), 1800 records/sec (current)
608225 lines parsed, 67311 lines recorded, 1864 records/sec (avg), 1800 records/sec (current)
627018 lines parsed, 69111 lines recorded, 1862 records/sec (avg), 1800 records/sec (current)
645027 lines parsed, 71628 lines recorded, 1880 records/sec (avg), 2517 records/sec (current)
663086 lines parsed, 73300 lines recorded, 1874 records/sec (avg), 1672 records/sec (current)
679346 lines parsed, 75291 lines recorded, 1877 records/sec (avg), 1991 records/sec (current)
696613 lines parsed, 77714 lines recorded, 1890 records/sec (avg), 2423 records/sec (current)

Parsing log /storage/log/apache/stari_kompresirani_2008/www.isvu.hr_443-access_log.20080104.gz…
908242 lines parsed, 101511 lines recorded, 1910 records/sec (avg), 1800 records/sec (current)
924620 lines parsed, 104200 lines recorded, 1924 records/sec (avg), 2689 records/sec (current)
942206 lines parsed, 106346 lines recorded, 1928 records/sec (avg), 2146 records/sec (current)
960453 lines parsed, 108044 lines recorded, 1924 records/sec (avg), 1698 records/sec (current)
977942 lines parsed, 109985 lines recorded, 1924 records/sec (avg), 1941 records/sec (current)
996038 lines parsed, 111198 lines recorded, 1912 records/sec (avg), 1213 records/sec (current)
1014
1245941 lines parsed, 141637 lines recorded, 1935 records/sec (avg), 2660 records/sec (current)
Parsing log /storage/log/apache/stari_kompresirani_2008/www.isvu.hr_443-access_log.20080105.gz…
1256530 lines parsed, 143630 lines recorded, 1936 records/sec (avg), 1993 records/sec (current)
1275523 lines parsed, 144733 lines recorded, 1924 records/sec (avg), 1103 records/sec (current)
1294008 lines parsed, 146495 lines recorded, 1922 records/sec (avg), 1762 records/sec (current)

In the new version I only get the final result when the script is done:

Logs import summary

341972 requests imported successfully
542 requests were downloads
2316284 requests ignored:
    23695 HTTP errors
    30083 HTTP redirects
    0 invalid log lines
    0 filtered log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    669 requests done by bots, search engines...
    2261837 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Importing logs can take hours, and in the event of an interruption it is very important to see on which specific log file the import stopped.
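
For now the only workaround I can think of is to drive the script once per file from a small wrapper, so the name of the file currently being imported is always printed. This is just a sketch (the glob pattern, URL and token values are placeholders for my real ones), and I would still prefer to have the old per-file progress output back:

#!/usr/bin/env python3
# Sketch: call import_logs.py once per log file so the current file name
# is always visible, even if the overall run is interrupted.
import glob
import subprocess
import sys

for path in sorted(glob.glob("/var/log/apache2/www.isvu.hr-ssl-access.log-*.gz")):
    print("=== importing %s ===" % path, flush=True)
    result = subprocess.run([
        "/usr/local/bin/import_logs.py",
        "--url=xxxxxxxx",
        "--token-auth=xxxxxxxxx",
        "--idsite=42",
        path,
    ])
    if result.returncode != 0:
        print("import stopped at %s" % path, file=sys.stderr)
        sys.exit(result.returncode)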

Any ideas? Thank you very much!

Hi Billy,

Just wondering… you are reading at blazing speed. Could you share a little about the setup and how you read back the logs?

We have created log_merge and log_split scripts that help when you have multiple frontend servers load-balancing the incoming tracking traffic.
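
(Not our actual scripts, just to illustrate the idea: log_merge interleaves the per-frontend access logs by request timestamp before importing. A minimal sketch, assuming the standard Apache timestamp in square brackets and that each input file is already sorted:)

#!/usr/bin/env python3
# Rough idea of a log_merge: interleave access-log lines from several
# frontends by their request timestamp (Apache common/combined format).
import heapq
import re
import sys
from datetime import datetime

TS = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")

def ts_key(line):
    m = TS.search(line)
    if not m:
        return 0.0  # lines without a parsable timestamp sort first
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z").timestamp()

# usage: log_merge.py front1.log front2.log ... > merged.log
files = [open(p, encoding="utf-8", errors="replace") for p in sys.argv[1:]]
for line in heapq.merge(*files, key=ts_key):
    sys.stdout.write(line)
for f in files:
    f.close()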

Eh, is that the question you are asking? I didn't quite understand. Are you wondering how you can see where an import stops and why?

Hi

The speed varies; I think it is like that because of the smaller log files. I was using:
/usr/local/bin/import_logs.py --url=xxxxxxxx --token-auth=xxxxxxxxx --idsite=42 --recorders=6 --recorder-max-payload-size=300 --output=/var/log/matomo/www.isvu.hr_matomo.log /var/log/apache2/www.isvu.hr-ssl-access.log-*.gz >/dev/null

With this kind of import of multiple files, even bigger files (100 MB+ gzipped), it worked fine; I had no need to split the log files.
The question is: when you imported multiple files with the import_logs.py script before, each action was recorded, not only the final summary, and you could see what step you were on by tailing the output log.
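
In the meantime I am experimenting with recording each finished file in a checkpoint file, so a broken run can be resumed from the file where it stopped. Again only a sketch: the checkpoint path is made up, and the import flags are the same ones as in my command above:

#!/usr/bin/env python3
# Sketch: remember which files already finished so a restart skips them;
# the first file it prints is the one where the previous run stopped.
import glob
import os
import subprocess
import sys

CHECKPOINT = "/var/log/matomo/imported_files.txt"  # arbitrary path

done = set()
if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT) as fh:
        done = {line.strip() for line in fh}

for path in sorted(glob.glob("/var/log/apache2/www.isvu.hr-ssl-access.log-*.gz")):
    if path in done:
        continue
    print("importing %s" % path, flush=True)
    cmd = [
        "/usr/local/bin/import_logs.py",
        "--url=xxxxxxxx", "--token-auth=xxxxxxxxx", "--idsite=42",
        "--recorders=6", "--recorder-max-payload-size=300",
        "--output=/var/log/matomo/www.isvu.hr_matomo.log",
        path,
    ]
    if subprocess.run(cmd).returncode != 0:
        sys.exit("stopped at %s" % path)
    with open(CHECKPOINT, "a") as fh:
        fh.write(path + "\n")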