In recent years, with the advent of deep neural networks, the accuracy of speech recognition models have been notably improved which have made possible the production of speech-to-text systems that can accurately transcribe speech even in difficult scenarios such as …