Convert WebVTT to a Transcript using Python

I want to convert YouTube's auto-generated subtitles into a plain transcript. Why is this so hard?

Here's what the subtitles look like when you view a video:

And here's what the WebVTT source behind those subtitles looks like:

00:00:00.930 --> 00:00:03.080 align:start position:0%
and<00:00:01.230><c> now</c><00:00:01.439><c> can&[...]
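For reference, here is a minimal sketch of the kind of conversion I'm after, using only the standard library. The function name `vtt_to_transcript` and both regular expressions are my own, not part of any library. It assumes the file has no cue identifier lines, and that repeated text only ever shows up as an identical consecutive line (which is how the rolling captions appear to behave). It may well collapse a line that a speaker genuinely says twice in a row.

```python
import html
import re

# A cue timing line, e.g. "00:00:00.930 --> 00:00:03.080 align:start position:0%".
# The hours field is optional in WebVTT.
TIMING = re.compile(r"^(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->")

# Inline markup such as <c>, </c> and word-level timestamps like <00:00:01.230>.
TAG = re.compile(r"<[^>]+>")


def vtt_to_transcript(vtt_text):
    """Return the caption text of a WebVTT document as a single string."""
    lines = []
    seen_cue = False
    for raw in vtt_text.splitlines():
        if TIMING.match(raw):
            seen_cue = True
            continue
        if not seen_cue:
            # Still in the header block (WEBVTT, Kind:, Language:, ...).
            continue
        # Strip the tags first, then decode entities such as &#39;.
        text = html.unescape(TAG.sub("", raw)).strip()
        # Assumption: a line identical to the previous kept line is a
        # rolling-caption repeat, so only the first copy is kept.
        if text and (not lines or text != lines[-1]):
            lines.append(text)
    return " ".join(lines)
```

Usage would be something like `vtt_to_transcript(open("captions.vtt", encoding="utf-8").read())`, where `captions.vtt` is a placeholder file name.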

