Even at my early stage of speed-reading in the context of voiceover, I can already say that it feels a bit like magic. What happens is: you're scanning ahead to the coming words to recognize their shape, and the words you're saying out loud right now you don't even focus on anymore.

In essence, you let your mouth babble (using what's in the buffer until it's empty), framing the necessary shapes, while your eyes are already busy recognizing the shapes of the next words.

