media_info_collector.py
What is media_info_collector.py?
I wrote a Python script that could be useful for people who do video clip analysis or manual video testing, as it makes it really easy to compare video file attributes.
It runs mediainfo on every video file located in the same folder as the script and then writes the parsed data into an Excel file (result.xls).
Columns of the Excel file will be in arbitrary order.
Attributes missing from some files (but present in at least one other file) will be substituted with “N/A”.
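The “N/A” substitution can be pictured with a small sketch. The function name fill_missing and the sample attributes are made up for illustration and are not taken from the script; the idea is only that every file ends up with the same set of attributes.

```python
def fill_missing(per_file, placeholder='N/A'):
    """Give every file the same attributes, padding gaps with a placeholder."""
    # Union of all attribute names seen in at least one file.
    all_attributes = set()
    for attributes in per_file.values():
        all_attributes.update(attributes)

    return {
        name: {attr: attributes.get(attr, placeholder) for attr in all_attributes}
        for name, attributes in per_file.items()
    }


# Hypothetical sample data, not real mediainfo output.
sample = {
    'a.mp4': {'Format': 'MPEG-4', 'Bit rate': '1 000 kb/s'},
    'b.avi': {'Format': 'AVI'},
}
filled = fill_missing(sample)
print(filled['b.avi']['Bit rate'])  # N/A
```

Because the attribute names are gathered in a set here, their order is not fixed, which matches the note above about column order.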
Here’s how to use it:
- Just put a bunch of video files in a folder
- Make sure that you have Python, mediainfo and tablib installed
- To install mediainfo run
sudo apt-get install mediainfo
- To install tablib run
sudo pip install tablib
- If needed, change filters: if a filename contains one of these strings, that file will be ignored
- Run the script
python media_info_collector.py
- Check out the beautiful result.xls file
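The filters setting works as a plain substring test on the filename. Here is a minimal sketch of that check; the helper name is_ignored and the example filenames are made up for illustration.

```python
filters = ['.evs', '.py', '.xml']


def is_ignored(filename, filters):
    """Return True if the filename contains any of the filter strings."""
    return any(pattern in filename for pattern in filters)


# Hypothetical filenames to show which ones would be skipped.
for candidate in ['clip.mp4', 'media_info_collector.py', 'my.pyramid.mp4']:
    print(candidate, is_ignored(candidate, filters))
```

Note that the match is anywhere in the name, not only at the end, so a video called my.pyramid.mp4 would also be skipped because it contains ".py". If that is a problem, a stricter check such as filename.endswith(...) may be a better fit.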
media_info_collector.py
import os
import subprocess

import tablib

# Files whose names contain any of these strings are skipped.
filters = ['.evs', '.py', '.xml']

if __name__ == '__main__':
    directory = os.getcwd()

    # Keep only regular files that match none of the filters.
    list_of_files = [
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and all(f not in name for f in filters)
    ]

    # Run mediainfo on every file and keep its output as a list of lines.
    media_info = {}
    for name in list_of_files:
        abs_path = os.path.join(directory, name)
        # An argument list (no shell) keeps filenames with spaces safe.
        output = subprocess.check_output(['mediainfo', abs_path])
        media_info[name] = output.decode('utf-8', 'replace').split('\n')

    # Parse the output into {file: {subsection: {attribute: value}}}.
    result_dict = {}
    for name in media_info:
        subsection = ''
        append_dict = {}
        subsection_dict = {}
        for line in media_info[name]:
            line = line.strip()
            if line == '':
                continue
            if ':' not in line:
                # A line without a colon starts a new subsection, e.g. "Video".
                if subsection != '':
                    append_dict[subsection] = subsection_dict
                subsection = line
                subsection_dict = {}
            else:
                key, value = [x.strip() for x in line.split(':', 1)]
                subsection_dict[key] = value
        # Store the last subsection, which has no header line after it.
        if subsection != '':
            append_dict[subsection] = subsection_dict
        result_dict[name] = append_dict
    print("\n\n\nRESULT_DICT:" + str(result_dict) + "\n\n\n")

    # Collect every subsection and attribute seen in at least one file.
    keys = {}
    for name in result_dict:
        for subsection in result_dict[name]:
            options = keys.setdefault(subsection, [])
            for option in result_dict[name][subsection]:
                if option not in options:
                    options.append(option)

    # Fill in 'N/A' for attributes that a file does not have.
    for name in result_dict:
        for subsection in keys:
            section = result_dict[name].setdefault(subsection, {})
            for option in keys[subsection]:
                section.setdefault(option, 'N/A')
    print("\n\n\nKEYS:" + str(keys) + "\n\n\n")

    # One column per "subsection - attribute" pair.
    data = tablib.Dataset()
    headers = []
    for subsection in keys:
        for option in keys[subsection]:
            headers.append(subsection + ' - ' + option)
    print('\n\n\nHEADERS:' + str(headers) + '\n\n\n')
    data.headers = headers

    # One row per file, in the same column order as the headers.
    for name in result_dict:
        file_list = []
        for subsection in keys:
            for option in keys[subsection]:
                file_list.append(result_dict[name][subsection][option])
        data.append(file_list)
        print('\nFILE_LIST:' + str(file_list) + '\n')

    # Export in the xls format so the content matches the file extension
    # (newer tablib releases may need the optional xls extra installed).
    with open('result.xls', 'wb') as f:
        f.write(data.xls)
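The parsing step can be tried out in isolation, without mediainfo installed. The sketch below is not the script itself: the function name parse_mediainfo is made up, and the sample text is hand-written to resemble mediainfo output rather than captured from a real file. It assumes, as the script does, that a line without a colon starts a new subsection.

```python
def parse_mediainfo(text):
    """Turn mediainfo-style text into {subsection: {attribute: value}}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if ':' not in line:
            # A line without a colon is treated as a subsection header.
            current = sections.setdefault(line, {})
        elif current is not None:
            # Split on the first colon only, so values may contain colons.
            key, value = (part.strip() for part in line.split(':', 1))
            current[key] = value
    return sections


# Hand-written sample that imitates the layout of mediainfo output.
sample_output = """General
Format                                   : MPEG-4
Duration                                 : 1 min 2 s

Video
Format                                   : AVC
Width                                    : 1 280 pixels
"""

parsed = parse_mediainfo(sample_output)
print(parsed['Video']['Width'])  # 1 280 pixels
```

Splitting on the first colon only means a value such as a timestamp with colons in it stays in one piece.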