I have to agree with Jan here. In particular, regarding the CSV output: you said that "the generated CSV and JSON files are mainly meant to be used by scripts for automation purposes", but the generated CSV is anything but machine-readable.
The file starts with several lines that contain no commas at all: a header saying "Advanced Video Quality Tool (AVQT) - CLI", followed by a few colon-separated lines with metadata about the processing.
Only then does it begin with a CSV-conformant set of lines like:
```
Frame Index,AVQT
1,1.00
2,1.00
3,1.00
…
```
But later, partway through the file, it suddenly changes the semantics of the columns:
```
Segment Index,AVQT
1,1.00
…
```
This makes the output really hard to parse. A proper CSV file should have a fixed set of columns with a known meaning, and that meaning should not change in the middle of the file. Nor can a CSV file carry free-form metadata at the top. To parse this output, I would have to strip away the preamble by hand and then account for the switch from per-frame to per-segment rows halfway through.
This means I cannot simply feed the file to a CSV parser such as Python's csv module or pandas, or to statistical software like R or Excel, and get meaningful output without manually cleaning the file up first.
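To illustrate, here is a minimal sketch of the cleanup every consumer currently has to write, assuming the structure shown above (the filename is hypothetical):

```python
import csv

# Hypothetical filename; use whatever path the AVQT CLI wrote to.
with open("avqt_output.csv") as f:
    lines = f.read().splitlines()

# Skip the free-form preamble until the first real CSV header appears.
start = next(i for i, line in enumerate(lines)
             if line.startswith("Frame Index,"))

# Find where the column semantics silently switch to per-segment rows.
split = next(i for i in range(start + 1, len(lines))
             if lines[i].startswith("Segment Index,"))

# csv.DictReader accepts any iterable of lines, so each homogeneous
# chunk can be parsed separately.
frames = list(csv.DictReader(lines[start:split]))
segments = list(csv.DictReader(lines[split:]))

print(frames[0])    # e.g. {'Frame Index': '1', 'AVQT': '1.00'}
print(segments[0])  # e.g. {'Segment Index': '1', 'AVQT': '1.00'}
```

None of this should be necessary for files that are "meant to be used by scripts".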
If you are trying to convey different types of records, you should have three separate CSV outputs:

- Metadata, with headers version,test_file,reference_file,segment_duration,temporal_pooling,display_width,display_height,viewing_distance
- Per-frame scores
- Per-segment scores
This way, all the data can be parsed cleanly.
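For example, with three separate files, standard tooling works without any preprocessing (the filenames here are hypothetical):

```python
import pandas as pd

# Hypothetical filenames for the three proposed outputs.
metadata = pd.read_csv("avqt_metadata.csv")        # one row of run metadata
frames = pd.read_csv("avqt_frame_scores.csv")      # columns: Frame Index, AVQT
segments = pd.read_csv("avqt_segment_scores.csv")  # columns: Segment Index, AVQT

# Per-frame statistics now work out of the box.
print(frames["AVQT"].mean())
```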
For more info on how CSV data can be laid out cleanly, please see the paper "Tidy Data" by Hadley Wickham.