prodos_log_processor.py is a Python script that is part of the SiteController software. It processes memodata (log) files downloaded from the Prodos web server. The log processor must be launched by the SiteController's JobProcessor; otherwise required environment settings are missing.
Once started, the log processor scans a directory for downloaded memodata files. Each memodata file is a zip archive that contains several other log files. Normally there is exactly one (zipped) memodata file; if there are more, they are all processed in a loop. For each file, the following processing steps are performed:
- uncompress the (zipped) memodata file
- select the file(s) whose names match the target pattern
- store the selected files in the destination directory
- remove the downloaded file
- purge the destination directory (keep at most a configured number of stored files and delete the oldest files beyond that limit)
The prodos_log_processor.py script terminates after these steps (or, if there were several downloaded files, after the loop over all of them has finished).
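The steps above can be sketched in Python as follows. This is a minimal illustration, not the script's actual source: the function names `process_downloads` and `purge_oldest` are hypothetical, and the sketch assumes that "oldest" means oldest modification time and that extracted files are stored flat under their base name.

```python
import fnmatch
import glob
import os
import zipfile


def purge_oldest(dst, target, keep):
    """Delete the oldest files matching `target` in `dst` until at most `keep` remain."""
    matches = [
        os.path.join(dst, name)
        for name in os.listdir(dst)
        if fnmatch.fnmatch(name, target)
    ]
    # Oldest first, judged by modification time (an assumption of this sketch).
    matches.sort(key=os.path.getmtime)
    excess = matches[:max(0, len(matches) - keep)]
    for path in excess:
        os.remove(path)
    return excess


def process_downloads(src, pattern, target, dst, keep):
    """Run the processing steps once per downloaded archive found in `src`."""
    processed = []
    for archive in sorted(glob.glob(os.path.join(src, pattern))):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                base = os.path.basename(member)
                # Select only members whose file name matches the target pattern.
                if base and fnmatch.fnmatch(base, target):
                    # Store under the base name so archive paths cannot leave dst.
                    with zf.open(member) as fin, open(os.path.join(dst, base), "wb") as fout:
                        fout.write(fin.read())
        os.remove(archive)               # remove the downloaded file
        purge_oldest(dst, target, keep)  # enforce the retention limit
        processed.append(archive)
    return processed
```

Purging after every archive, rather than once at the end, keeps the destination directory within the limit even when several downloads have accumulated.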
Parameters of the log processor script
The log processor accepts the following parameters. Each one falls back to its default value if it is not specified.
Parameter | Description | Default |
---|---|---|
-s, --src | The directory to search for downloaded files. | /opt/azeti/SiteController/tmp |
-p, --pattern | Search pattern that identifies the downloaded files. | dl-*.zip |
-t, --target | Search pattern that identifies the files to extract from the zipped download file. | *_MDH3_*.txt |
-d, --dst | Destination directory in which the extracted files are stored. | /home/azeti |
-k, --keep | The maximum number of extracted files to keep in the destination directory. All files whose names match the target search pattern are counted. | 300 |
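For illustration, a command-line interface with these options and defaults could be declared with `argparse` as shown below. The option names and default values come from the table; the function name `build_parser`, the help texts, and the use of `argparse` itself are assumptions, since the script's source is not shown here.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Process downloaded memodata files.")
    parser.add_argument("-s", "--src", default="/opt/azeti/SiteController/tmp",
                        help="directory to search for downloaded files")
    parser.add_argument("-p", "--pattern", default="dl-*.zip",
                        help="pattern that identifies the downloaded files")
    parser.add_argument("-t", "--target", default="*_MDH3_*.txt",
                        help="pattern that identifies the files to extract")
    parser.add_argument("-d", "--dst", default="/home/azeti",
                        help="destination directory for extracted files")
    parser.add_argument("-k", "--keep", type=int, default=300,
                        help="maximum number of extracted files to keep")
    return parser


# Only --keep is overridden; every other option keeps its default.
args = build_parser().parse_args(["--keep=20"])
```

With this declaration, both `--keep=20` and `-k 20` yield `args.keep == 20`, and omitting an option leaves its documented default in place.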
Search pattern
Search patterns use Unix shell-style wildcards:
* | matches everything |
? | matches any single character |
[seq] | matches any character in seq |
[!seq] | matches any character not in seq |
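Python's standard `fnmatch` module implements exactly these four wildcards, so it is a convenient way to try out a pattern before putting it into the configuration. Whether the script itself uses `fnmatch` is an assumption. The first two file names below are taken from the log excerpts on this page; `notes.txt` is a made-up counterexample.

```python
import fnmatch

names = [
    "dl-rest_test-2019-04-16T11:12:47.435Z.zip",
    "12136_650_MDH3_2019-04-16_12-24-18.txt",
    "notes.txt",  # hypothetical file that should match neither pattern
]

# Default --pattern and --target values from the parameter table.
downloads = fnmatch.filter(names, "dl-*.zip")
targets = fnmatch.filter(names, "*_MDH3_*.txt")

# The remaining wildcards, checked case-sensitively.
single = fnmatch.fnmatchcase("a1.txt", "a?.txt")         # ? matches the "1"
in_seq = fnmatch.fnmatchcase("a1.txt", "a[0-9].txt")     # "1" is in the range 0-9
not_seq = fnmatch.fnmatchcase("a1.txt", "a[!0-9].txt")   # "1" is excluded by [!0-9]
```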
Configuration of the prodos log processor
The prodos_log_processor.py script requires the Python environment of the SiteController and is designed to be launched as a job during the execution of a SiteController action. It is therefore configured in SiteController.cfg, in the section [remote_exec_calls]:
    ...
    [remote_exec_calls]
    # provide key=value pairs to define remote commands that could be executed on
    # this system via a job
    ...
    process_memodata=/opt/azeti/SiteController/src/prodos_log_processor.py --src=/opt/azeti/SiteController/tmp --pattern=dl-*.zip --target=*_MDH3_*.txt --dst=/home/azeti --keep=300
    ...
All parameters in the example snippet above are set to their defaults, so the following configuration has the same result:
    [remote_exec_calls]
    process_memodata=/opt/azeti/SiteController/src/prodos_log_processor.py
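To illustrate how such an entry can turn into a command, the sketch below reads a `[remote_exec_calls]` section with `configparser` and splits the value into an argument list with `shlex`. How the JobProcessor really does this is not documented here, so both library choices are assumptions; the configuration text is the example from this page.

```python
import configparser
import shlex

# In-memory copy of the example configuration from this page.
CFG_TEXT = """
[remote_exec_calls]
# provide key=value pairs to define remote commands
process_memodata=/opt/azeti/SiteController/src/prodos_log_processor.py --src=/opt/azeti/SiteController/tmp --pattern=dl-*.zip --target=*_MDH3_*.txt --dst=/home/azeti --keep=300
"""

config = configparser.ConfigParser()
config.read_string(CFG_TEXT)

# The key is split off at the first "="; later "=" signs stay in the value.
command_line = config["remote_exec_calls"]["process_memodata"]

# Split like a shell would, but without expanding the wildcards.
cmd = shlex.split(command_line)
```

The resulting list has the same shape as the `cmd:` entry that the JobProcessor writes to its log, where the patterns also appear as literal, unexpanded strings.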
Debug information about the prodos log processor
Because prodos_log_processor.py is started by the JobProcessor, each launch can be observed in the JobProcessor's log file:
    ...
    2019-04-10 03:11:47,694:7604:[JobProcessor.py:227]:INFO:-------- Job Started (mqtt) ---------
    2019-04-10 03:11:47,695:7604:[JobProcessor.py:228]:DEBUG:Received job from mqtt - topic: cloud/AluPress_ProDos_1/jobs/remote_exec
    2019-04-10 03:11:47,696:7604:[JobProcessor.py:364]:DEBUG:HandleSimpleJob()
    2019-04-10 03:11:47,697:7604:[JobProcessor.py:293]:INFO:job is a remote_exec call "process_memodata"
    2019-04-10 03:11:47,699:7604:[JobProcessor.py:307]:DEBUG:cmd: ['/opt/azeti/SiteController/src/prodos_log_processor.py', '--src=/opt/azeti/SiteController/tmp', '--pattern=dl-*.zip', '--target=*_MDH3_*.txt', '--dst=/home/azeti', '--keep=20']
    2019-04-10 03:11:48,031:7604:[JobProcessor.py:56]:DEBUG:process 8247 finished with status 0
    2019-04-10 03:11:48,033:7604:[JobProcessor.py:318]:DEBUG:output: successfully executed
    ...
The prodos_log_processor.py script also writes its own log file:
    ...
    2019-04-16 11:12:48,015:23478:[prodos_log_processor.py:157]:DEBUG:------------------------------------------
    2019-04-16 11:12:48,016:23478:[prodos_log_processor.py:158]:DEBUG:Process started
    2019-04-16 11:12:48,016:23478:[prodos_log_processor.py:159]:DEBUG:Source directory to process: /opt/azeti/SiteController/tmp
    2019-04-16 11:12:48,017:23478:[prodos_log_processor.py:160]:DEBUG:Source file pattern to process: dl-*.zip
    2019-04-16 11:12:48,017:23478:[prodos_log_processor.py:161]:DEBUG:files to keep: 20
    2019-04-16 11:12:48,018:23478:[prodos_log_processor.py:107]:DEBUG:About to process dl-rest_test-2019-04-16T11:12:47.435Z.zip
    2019-04-16 11:12:48,020:23478:[prodos_log_processor.py:81]:DEBUG:['12136_650_MDH3_2019-04-16_12-24-18.txt']
    2019-04-16 11:12:48,024:23478:[prodos_log_processor.py:116]:DEBUG:processed dl-rest_test-2019-04-16T11:12:47.435Z.zip
    2019-04-16 11:12:48,024:23478:[prodos_log_processor.py:118]:DEBUG:files to keep: 20
    2019-04-16 11:12:48,025:23478:[prodos_log_processor.py:33]:DEBUG:21 files with pattern *_MDH3_*.txt in /home/azeti
    2019-04-16 11:12:48,026:23478:[prodos_log_processor.py:55]:DEBUG:removed /home/azeti/12136_650_MDH3_2019-04-16_11-23-12.txt
    ...
    2019-04-16 12:34:41,092:28021:[prodos_log_processor.py:157]:DEBUG:------------------------------------------
    2019-04-16 12:34:41,093:28021:[prodos_log_processor.py:158]:DEBUG:Process started
    2019-04-16 12:34:41,093:28021:[prodos_log_processor.py:159]:DEBUG:Source directory to process: /opt/azeti/SiteController/tmp
    2019-04-16 12:34:41,093:28021:[prodos_log_processor.py:160]:DEBUG:Source file pattern to process: dl-*.zip
    2019-04-16 12:34:41,094:28021:[prodos_log_processor.py:161]:DEBUG:files to keep: 20
    2019-04-16 12:34:41,095:28021:[prodos_log_processor.py:170]:WARNING:0 files to process, should be one!
    2019-04-16 14:35:20,655:1613:[prodos_log_processor.py:157]:DEBUG:------------------------------------------
    ...
In the second block of the log, the processor was launched without a download file being present, which is reported by the WARNING entry.
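Runs like the one in the second block can be found automatically by parsing the log lines. The sketch below assumes the layout visible in the excerpts: timestamp, a numeric field that is presumably the process ID, source file and line number in brackets, log level, and message, all separated by colons. The names `LINE_RE`, `parse_line`, and `warnings_in` are hypothetical.

```python
import re
from datetime import datetime

# timestamp:pid:[file:lineno]:LEVEL:message (layout inferred from the excerpts)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):"
    r"(?P<pid>\d+):"
    r"\[(?P<file>[^:\]]+):(?P<lineno>\d+)\]:"
    r"(?P<level>[A-Z]+):"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if the line has another shape."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S,%f")
    record["pid"] = int(record["pid"])
    record["lineno"] = int(record["lineno"])
    return record


def warnings_in(lines):
    """Collect all parsed records whose level is WARNING."""
    parsed = (parse_line(line) for line in lines)
    return [rec for rec in parsed if rec and rec["level"] == "WARNING"]
```

Because the message is the last field, colons inside it (for example in an archive name that contains a timestamp) do not disturb the parsing.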
The ram disk behavior of the temp directory
In practice, a download of about 250 KB takes place approximately every minute, and each download is processed and deleted in less than a second. To avoid straining the storage drive, and to limit storage usage in case of misconfiguration, the standard temp directory of the SiteController, /opt/azeti/SiteController/tmp, is configured as a 'ram disk' with a maximum size of 256 MB. On each SiteController start, stop, or restart, the temp directory and its contents are destroyed. To switch off this ram disk behavior, add the following entry to SiteController.cfg in the section [SiteController.conf]:
    ...
    [SiteController.conf]
    ...
    ramdisk_size=0
    ...
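For a sense of scale, the figures above allow a rough estimate of how long the ram disk would last if downloads were never removed, for example because the processing job is misconfigured. The calculation assumes exactly 250 KB per download, one download per minute, 1 MB = 1024 KB, and nothing else stored on the ram disk.

```python
DOWNLOAD_KB = 250   # approximate size of one download (from the text)
RAMDISK_MB = 256    # maximum size of the ram disk (from the text)
INTERVAL_MIN = 1    # roughly one download per minute (from the text)

capacity_kb = RAMDISK_MB * 1024                     # 262144 KB
downloads_until_full = capacity_kb // DOWNLOAD_KB   # 1048 downloads
hours_until_full = downloads_until_full * INTERVAL_MIN / 60
```

Under these assumptions the ram disk fills up after roughly 17 hours, which caps the damage of a misconfiguration without touching the storage drive.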