File length and segmentation
1. file lengths
I double-checked, and it looks like 10% of the files are not exactly 1 second long; they are shorter.
I propose pre-processing the dataset by zero-padding the files in question. See https://scm.medienhaus.udk-berlin.de/freder/pi-zero-asr-paper/-/blob/file-length-and-segmentation/code/utils.py#L16
@Paul, did you already do padding somewhere? (I seem to remember you mentioning it.)
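To make the padding proposal concrete, here is a minimal sketch of what zero-padding to 1 second could look like. It is not necessarily what the linked utils.py does: the 16 kHz sample rate, the choice to pad at the end of the signal, and the helper name pad_to_length are all assumptions for illustration.

```python
import numpy as np

SAMPLE_RATE = 16000       # assumption: 16 kHz recordings
TARGET_LEN = SAMPLE_RATE  # 1 second worth of samples


def pad_to_length(signal, target_len=TARGET_LEN):
    """Return `signal` with trailing zeros appended up to `target_len` samples.

    Hypothetical helper; signals that are already long enough are
    returned unchanged (no trimming).
    """
    signal = np.asarray(signal)
    missing = target_len - len(signal)
    if missing <= 0:
        return signal
    # mode="constant" pads with zeros by default and keeps the dtype.
    return np.pad(signal, (0, missing), mode="constant")


# Example: a 0.75 s clip is extended to exactly 1 s.
short = np.ones(12000, dtype=np.int16)
padded = pad_to_length(short)
```

Padding at the end keeps the spoken content aligned to the start of the clip; padding symmetrically would be an equally valid choice if we prefer the content centered.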
2. feature extraction: segments
Re-reading https://github.com/jameslyons/python_speech_features#mfcc-features, I think we should not use the default value for winlen but set our own. In fact, winlen and winstep should be equal, which would result in exactly the number of frames/segments we specify, without any gaps or overlap.
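As a sanity check on the winlen/winstep point, the sketch below computes how many frames a 1-second file would produce. It assumes a 16 kHz sample rate and that the library frames the signal as 1 + ceil((samples - frame_len) / frame_step), which is how I read its framing code; num_frames is a hypothetical helper, not part of python_speech_features.

```python
import math


def num_frames(num_samples, samplerate, winlen, winstep):
    """Estimate the number of frames for a signal of `num_samples` samples.

    Assumption: mirrors the framing rule described above. The library
    rounds half up, while Python's round() rounds half to even; the two
    agree for the values used here.
    """
    frame_len = int(round(winlen * samplerate))
    frame_step = int(round(winstep * samplerate))
    if num_samples <= frame_len:
        return 1
    return 1 + math.ceil((num_samples - frame_len) / frame_step)


SAMPLE_RATE = 16000  # assumption: 16 kHz recordings, 1 s per file

# Library defaults: 25 ms windows every 10 ms, so frames overlap.
default_frames = num_frames(SAMPLE_RATE, SAMPLE_RATE, winlen=0.025, winstep=0.01)

# Equal window and step: 10 segments of 100 ms, no gaps, no overlap.
equal_frames = num_frames(SAMPLE_RATE, SAMPLE_RATE, winlen=0.1, winstep=0.1)

print(default_frames, equal_frames)  # 99 10
```

Under these assumptions, asking for N segments means setting winlen = winstep = 1/N seconds. Note that a 100 ms window at 16 kHz is 1600 samples, which is larger than the default nfft of 512, so nfft would likely need to be raised as well.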