Bash Script / Cron Job



Hi

I have a file in the /etc/cron.d folder that calls a bash script:

5 4 * * * root bash /scripts/piwik_import.sh

This runs the script at 4:05 am every day.

The content of the bash script is:


#!/bin/bash

#move already imported log files to temp folder
for file in $(cat /rotatefiles.txt); do mv "/media/logfileshare/$file" /media/logfileshare/done; done

wait

current_time=$(date "+%Y.%m.%d_%H.%M.%S")

#import logs 
python /var/www/html/misc/log-analytics/import_logs.py --login=admin --password=pass --idsite=1 --recorders=6 --debug --output=/var/log/piwik/piwik_import.$current_time.log --url=http://devpiwik/ /media/logfileshare/*.* 

wait

current_time=$(date "+%Y.%m.%d_%H.%M.%S")

#archive all logs to show graphs and reports
su root -c "/usr/bin/php5 /var/www/html/console core:archive --url=http://devpiwik/ > /var/log/piwik/piwik-archive.$current_time.log"

#delete archive and import log files older than 14 days
find /var/log/piwik/ -type f -mtime +14 -name '*.log' -execdir rm -- {} \;

#delete lighttpd log files older than 7 days
find /var/log/lighttpd/ -type f -mtime +7 -name '*.log' -execdir rm -- {} \;

wait

#check if import directory is empty or not
if [ "$(ls -A /media/logfileshare/*.log)" ]; then
	#if not empty append filename to text file
	cd /media/logfileshare
	ls *.log >> /rotatefiles.txt
else
	#if empty do nothing
	echo "/media/logfileshare is empty"
fi

wait

#move back imported log files to original location
mv /media/logfileshare/done/*.* /media/logfileshare
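
As a side note on the move and append steps in the script above: both rely on word splitting and on parsing ls output, which can break on file names containing spaces. A minimal sketch of the same two steps, assuming one file name per line in the list file; the helper names move_listed_files and append_log_names are hypothetical and not part of the original script:

```shell
#!/bin/bash
# Sketch only: both helpers are hypothetical and not part of the original script.

# Move every file named in LIST_FILE from SRC_DIR to DEST_DIR.
move_listed_files() {
    local list_file=$1 src_dir=$2 dest_dir=$3 name
    mkdir -p -- "$dest_dir"
    # Read one name per line so names containing spaces stay intact.
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue            # skip blank lines
        if [ -f "$src_dir/$name" ]; then      # skip names that no longer exist
            mv -- "$src_dir/$name" "$dest_dir/"
        fi
    done < "$list_file"
}

# Append the base name of every *.log file in DIR to LIST_FILE.
append_log_names() {
    local dir=$1 list_file=$2 path
    for path in "$dir"/*.log; do
        [ -e "$path" ] || continue            # the glob matched nothing
        printf '%s\n' "${path##*/}" >> "$list_file"
    done
}
```

With the paths from the script, the calls would be: move_listed_files /rotatefiles.txt /media/logfileshare /media/logfileshare/done and append_log_names /media/logfileshare /rotatefiles.txt.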


Everything is working fine (well, it seems to be).

I am concerned about the following command:


#import logs 
python /var/www/html/misc/log-analytics/import_logs.py --login=admin --password=pass --idsite=1 --recorders=6 --debug --output=/var/log/piwik/piwik_import.$current_time.log --url=http://devpiwik/ /media/logfileshare/*.* 

When I run the whole script manually from an SSH shell:

bash /scripts/piwik_import.sh

the piwik_import.$current_time.log file looks fine. It shows all the steps and the lines it is processing, for example:


Parsing log /media/logfileshare/u_ex150715.log...
2015-07-17 08:18:18,366: [DEBUG] Detecting the log format
2015-07-17 08:18:18,366: [DEBUG] Check format icecast2
2015-07-17 08:18:18,367: [DEBUG] Format icecast2 does not match
2015-07-17 08:18:18,367: [DEBUG] Check format w3c_extended
2015-07-17 08:18:18,367: [DEBUG] Based on 'Fields:' line, computed regex to be (?P<date>^\d+[-\d+]+\s+[\d+:]+)[.\d]*?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<path>/\S*)\s+(?P<query_string>\S*)\s+(?:".*?"|\S+)\s+(?P<userid>\S+)\s+"?(?P<ip>[\d*.-]*)"?\s+(?P<user_agent>".*?"|\S+)\s+(?P<status>\d+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<generation_time_secs>[.\d]+)
2015-07-17 08:18:18,371: [DEBUG] Format w3c_extended matches
2015-07-17 08:18:18,372: [DEBUG] Format match contains 8 groups
2015-07-17 08:18:18,372: [DEBUG] Check format iis
2015-07-17 08:18:18,372: [DEBUG] Based on 'Fields:' line, computed regex to be (?P<date>^\d+[-\d+]+\s+[\d+:]+)[.\d]*?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<path>/\S*)\s+(?P<query_string>\S*)\s+(?:".*?"|\S+)\s+(?P<userid>\S+)\s+"?(?P<ip>[\d*.-]*)"?\s+(?P<user_agent>".*?"|\S+)\s+(?P<status>\d+)\s+(?:".*?"|\S+)\s+(?P<__win32_status>\S+)\s+(?P<generation_time_milli>[.\d]+)
2015-07-17 08:18:18,375: [DEBUG] Format iis matches
2015-07-17 08:18:18,376: [DEBUG] Format match contains 9 groups
2015-07-17 08:18:18,376: [DEBUG] Check format common
2015-07-17 08:18:18,376: [DEBUG] Format common does not match
2015-07-17 08:18:18,376: [DEBUG] Check format common_vhost
2015-07-17 08:18:18,376: [DEBUG] Format common_vhost does not match
2015-07-17 08:18:18,376: [DEBUG] Check format nginx_json
2015-07-17 08:18:18,377: [DEBUG] Format nginx_json does not match
2015-07-17 08:18:18,377: [DEBUG] Check format s3
2015-07-17 08:18:18,377: [DEBUG] Format s3 does not match
2015-07-17 08:18:18,377: [DEBUG] Check format ncsa_extended
2015-07-17 08:18:18,377: [DEBUG] Format ncsa_extended does not match
2015-07-17 08:18:18,377: [DEBUG] Check format common_complete
2015-07-17 08:18:18,377: [DEBUG] Format common_complete does not match
2015-07-17 08:18:18,378: [DEBUG] Check format amazon_cloudfront
2015-07-17 08:18:18,378: [DEBUG] Based on 'Fields:' line, computed regex to be (?P<date>^\d+[-\d+]+\s+[\d+:]+)[.\d]*?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:rtmp:/)?(?P<path>/\S*)\s+(?P<query_string>\S*)\s+(?:".*?"|\S+)\s+(?P<userid>\S+)\s+"?(?P<ip>[\d*.-]*)"?\s+(?P<user_agent>".*?"|\S+)\s+(?P<status>\d+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<generation_time_secs>[.\d]+)
2015-07-17 08:18:18,382: [DEBUG] Format amazon_cloudfront matches
2015-07-17 08:18:18,382: [DEBUG] Format match contains 8 groups
2015-07-17 08:18:18,382: [DEBUG] Format iis is the best match
4914 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
13559 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
22251 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
24956 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
24956 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
24956 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
24956 lines parsed, 197 lines recorded, 27 records/sec (avg), 197 records/sec (current)
24956 lines parsed, 197 lines recorded, 24 records/sec (avg), 0 records/sec (current)
24956 lines parsed, 299 lines recorded, 33 records/sec (avg), 102 records/sec (current)
24956 lines parsed, 299 lines recorded, 29 records/sec (avg), 0 records/sec (current)
24956 lines parsed, 299 lines recorded, 27 records/sec (avg), 0 records/sec (current)
24956 lines parsed, 593 lines recorded, 49 records/sec (avg), 294 records/sec (current)
...................

and at the end it shows the completion summary.

But when the cron job runs every morning, all I see in the log is:


2015-07-17 04:05:02,722: [DEBUG] Accepted hostnames: all
2015-07-17 04:05:02,723: [DEBUG] Piwik URL is: http://devpiwik/
2015-07-17 04:05:02,723: [DEBUG] No token-auth specified
2015-07-17 04:05:02,723: [DEBUG] Using credentials: (login = piwikadmin, password = 567262d3ddbb8dd6542d6a047c035bf5)
2015-07-17 04:05:04,215: [DEBUG] Authentication token token_auth is: 292f1d36e5a805d7d3a1b258bd24b0e6
2015-07-17 04:05:04,217: [DEBUG] Resolver: static
2015-07-17 04:05:04,337: [DEBUG] Launched recorder
2015-07-17 04:05:04,338: [DEBUG] Launched recorder
2015-07-17 04:05:04,339: [DEBUG] Launched recorder
2015-07-17 04:05:04,340: [DEBUG] Launched recorder
2015-07-17 04:05:04,341: [DEBUG] Launched recorder
2015-07-17 04:05:04,342: [DEBUG] Launched recorder
2015-07-17 04:05:04,379: [DEBUG] Detecting the log format
2015-07-17 04:05:04,380: [DEBUG] Check format icecast2
2015-07-17 04:05:04,380: [DEBUG] Format icecast2 does not match
2015-07-17 04:05:04,380: [DEBUG] Check format w3c_extended
2015-07-17 04:05:04,381: [DEBUG] Based on 'Fields:' line, computed regex to be (?P<date>^\d+[-\d+]+\s+[\d+:]+)[.\d]*?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<path>/\S*)\s+(?P<query_string>\S*)\s+(?:".*?"|\S+)\s+(?P<userid>\S+)\s+"?(?P<ip>[\d*.-]*)"?\s+(?P<user_agent>".*?"|\S+)\s+(?P<status>\d+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<generation_time_secs>[.\d]+)
2015-07-17 04:05:04,386: [DEBUG] Format w3c_extended matches
2015-07-17 04:05:04,386: [DEBUG] Format match contains 8 groups
2015-07-17 04:05:04,386: [DEBUG] Check format iis
2015-07-17 04:05:04,386: [DEBUG] Based on 'Fields:' line, computed regex to be (?P<date>^\d+[-\d+]+\s+[\d+:]+)[.\d]*?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<path>/\S*)\s+(?P<query_string>\S*)\s+(?:".*?"|\S+)\s+(?P<userid>\S+)\s+"?(?P<ip>[\d*.-]*)"?\s+(?P<user_agent>".*?"|\S+)\s+(?P<status>\d+)\s+(?:".*?"|\S+)\s+(?P<__win32_status>\S+)\s+(?P<generation_time_milli>[.\d]+)
2015-07-17 04:05:04,390: [DEBUG] Format iis matches
2015-07-17 04:05:04,390: [DEBUG] Format match contains 9 groups
2015-07-17 04:05:04,391: [DEBUG] Check format common
2015-07-17 04:05:04,391: [DEBUG] Format common does not match
2015-07-17 04:05:04,391: [DEBUG] Check format common_vhost
2015-07-17 04:05:04,391: [DEBUG] Format common_vhost does not match
2015-07-17 04:05:04,391: [DEBUG] Check format nginx_json
2015-07-17 04:05:04,391: [DEBUG] Format nginx_json does not match
2015-07-17 04:05:04,391: [DEBUG] Check format s3
2015-07-17 04:05:04,392: [DEBUG] Format s3 does not match
2015-07-17 04:05:04,392: [DEBUG] Check format ncsa_extended
2015-07-17 04:05:04,392: [DEBUG] Format ncsa_extended does not match
2015-07-17 04:05:04,392: [DEBUG] Check format common_complete
2015-07-17 04:05:04,392: [DEBUG] Format common_complete does not match
2015-07-17 04:05:04,392: [DEBUG] Check format amazon_cloudfront
2015-07-17 04:05:04,393: [DEBUG] Based on 'Fields:' line, computed regex to be (?P<date>^\d+[-\d+]+\s+[\d+:]+)[.\d]*?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:rtmp:/)?(?P<path>/\S*)\s+(?P<query_string>\S*)\s+(?:".*?"|\S+)\s+(?P<userid>\S+)\s+"?(?P<ip>[\d*.-]*)"?\s+(?P<user_agent>".*?"|\S+)\s+(?P<status>\d+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<generation_time_secs>[.\d]+)
2015-07-17 04:05:04,396: [DEBUG] Format amazon_cloudfront matches
2015-07-17 04:05:04,397: [DEBUG] Format match contains 8 groups
2015-07-17 04:05:04,397: [DEBUG] Format iis is the best match

Logs import summary
-------------------

    115872 requests imported successfully
    942 requests were downloads
    687292 requests ignored:
        259323 HTTP errors
        2446 HTTP redirects
        12 invalid log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        238107 requests done by bots, search engines...
        187404 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    115872 requests imported to 1 sites
        1 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:



Performance summary
-------------------

    Total time: 5777 seconds
    Requests imported per second: 20.05 requests per second

Processing your log data
------------------------

    In order for your logs to be processed by Piwik, you may need to run the following command:
     ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='http://devpiwik/'


As you can see, the cron run does not log the "lines parsed" progress lines.
I want to know why this is happening. Is it an error or a problem?

How do I fix it?
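
One difference between the two runs that may be relevant: cron starts the script without a terminal attached, whereas an SSH session has one. A minimal sketch for recording that in a log file, under the unconfirmed assumption that import_logs.py decides whether to print progress lines based on whether its stdout is a terminal. The helper name log_tty_state and the log path in the usage comment are hypothetical:

```shell
#!/bin/bash
# Sketch only: log_tty_state is a hypothetical helper, not part of the original script.
# It records whether the script's stdout (file descriptor 1) is a terminal.
# Usage: log_tty_state LOG_FILE
log_tty_state() {
    local log_file=$1 state
    # Test stdout as the script inherited it; only the echo below is redirected.
    if [ -t 1 ]; then
        state="a terminal"
    else
        state="not a terminal"
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') stdout is $state" >> "$log_file"
}

# Example call near the top of the script (hypothetical log path):
#   log_tty_state /var/log/piwik/tty_check.log
```

Comparing the entry written by a manual run with the one written by the 4:05 am cron run would show whether the two environments differ in this respect.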

Any help will be appreciated.

Many thanks