If you’ve ever used Zoom or other recording tools that have native transcription, you will know the pain of getting a file that looks like this:
It's very cool that they do the transcription for you, but the raw file leaves something to be desired. The primary issue is that it's difficult to read.
Transcribr – A Simple Ruby VTT File Cleaner
Necessity is the mother of invention. I created a super simple (and hack coded) tool in Ruby to help me solve my problem. This nifty little tool is what I call Transcribr.
To make it work for you, name your input transcription file source.vtt and run the process using the instructions at the GitHub site for Transcribr.
The process will do a few neat things for you including:
- Remove empty lines
- Remove timestamps
- Remove other HTML-ish stuff
- Remove line numbers
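To give a feel for what those steps involve, here is a minimal Ruby sketch of the same kind of cleanup. This is not the actual Transcribr source: the method name clean_vtt and the regular expressions are my own assumptions for illustration, and it assumes a typical VTT layout (a WEBVTT header, numbered cues, and timestamp lines containing "-->").

```ruby
# Hypothetical helper for illustration only; not the actual Transcribr code.
# Takes the raw VTT text and returns an array of cleaned caption lines.
def clean_vtt(text)
  kept = text.each_line.map(&:strip).reject do |line|
    line.empty? ||                   # empty lines
      line.start_with?("WEBVTT") ||  # the VTT file header
      line.match?(/\A\d+\z/) ||      # cue (line) numbers
      line.include?("-->")           # timestamp lines
  end

  # Strip HTML-ish tags such as <v Speaker> ... </v>, then drop anything left empty.
  kept.map { |line| line.gsub(/<[^>]+>/, "").strip }.reject(&:empty?)
end

sample = "WEBVTT\n\n1\n00:00:01.000 --> 00:00:04.000\n<v Eric>Hello there.</v>\n\n" \
         "2\n00:00:04.500 --> 00:00:06.000\nWelcome to the show.\n"

puts clean_vtt(sample)
# Hello there.
# Welcome to the show.
```

One caveat with this sketch: a caption line that consists only of digits would be treated as a cue number and dropped.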
Just clone the GitHub repo and run the tool against your source file. The tool is built to print the result to the screen for viewing, and it also creates a file called output.txt for you to use. Every time you run the tool, it recreates the output file.
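The print-and-save behavior can be sketched in a few lines of Ruby. Again, this is an assumption about how such a step might look rather than the real Transcribr code, and write_transcript is a hypothetical name. File.write replaces any existing file at that path, which is what gives you a fresh output.txt on every run.

```ruby
# Hypothetical helper for illustration only; not the actual Transcribr code.
# Prints the cleaned lines to the screen and writes them to a file.
def write_transcript(lines, path = "output.txt")
  text = lines.join("\n") + "\n"
  puts text               # show the result on screen
  File.write(path, text)  # overwrites the file if it already exists
  text
end
```

Calling write_transcript(["Hello there.", "Welcome to the show."]) would print both lines and leave them in output.txt in the current directory.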
Now that you’ve run Transcribr against the source file, you get a more reader-friendly version that looks like this:
I hope that this is helpful for some folks out there. If you’re a legit developer and want to help clean up the hack coding that I did, please submit a PR and I’ll happily update to make it work better. Happy transcribing!