realized you could make a sort of markov chain text generator using concatenated word vectors instead of the tokens themselves, which has the benefit of being able to cope pretty well with out-of-vocab strings. anyway, here's word-vector-markov Jane Austen elaborating on what the Internet is

(could improve this by also concatenating an average vector of all of the context leading up to the n-gram, maybe? although at that point you're basically just hard-coding what an LSTM is supposed to learn to do on its own, more or less)
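For anyone who wants to poke at the idea, here's a rough sketch of one way it could work, not the actual code behind the example above. It assumes the "states" are concatenated vectors of n-grams and that the next word comes from whichever stored states are nearest to the current one. The `embed` function is a made-up stand-in (hashed character trigrams) so the sketch runs with no dependencies; real word vectors would go there. All the names (`embed`, `build_model`, `next_word`, `generate`) are hypothetical.

```python
import hashlib
import math
import random

DIM = 16  # toy embedding size (an assumption, real word vectors are much larger)


def embed(word):
    """Toy stand-in for a word vector: hash character trigrams into DIM buckets.

    Any string gets a vector this way, including ones never seen in training.
    """
    vec = [0.0] * DIM
    padded = f"^{word.lower()}$"
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def context_vector(words):
    """Concatenate the vectors of the words in an n-gram."""
    out = []
    for w in words:
        out.extend(embed(w))
    return out


def build_model(tokens, n=3):
    """Store (concatenated n-gram vector, following word) pairs."""
    return [(context_vector(tokens[i:i + n]), tokens[i + n])
            for i in range(len(tokens) - n)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def next_word(model, context, rng, k=3):
    """Pick a follower of one of the k stored states nearest to the context."""
    query = context_vector(context)
    nearest = sorted(model, key=lambda entry: sq_dist(entry[0], query))[:k]
    return rng.choice(nearest)[1]


def generate(model, prompt, length=10, n=3, seed=0):
    """Extend the prompt (at least n words) one word at a time."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(length):
        out.append(next_word(model, out[-n:], rng))
    return out


corpus = ("it is a truth universally acknowledged that a single man in "
          "possession of a good fortune must be in want of a wife").split()
model = build_model(corpus, n=3)
# "interwebz" never appears in the corpus, but it still gets a vector.
print(" ".join(generate(model, ["the", "interwebz", "is"], length=8)))
```

The nearest-neighbour lookup is what replaces the exact-match table lookup of an ordinary Markov chain, which is why an unseen prompt still lands on some stored state.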

Hmm. If you're using the vector of the specific context, in the individual case that should have more information, right? Because it'll sort of disambiguate homographs.

By which I mean:
If the vectors of previous words (in the prompt or in the generated output) are taken into account when producing new words, rather than merely the last word, the likelihood that a word with two distinct, semantically distant senses of roughly equal frequency will produce a garden-path sentence is much lower. (Basically the same logic as using 2nd & 3rd order markov chains over first order, except probably better.)

Are you thinking exponential weight decay?

@enkiv2 the example I showed was using concatenated vectors of 3-grams. and yeah I was sorta proposing a system where you do an average of the context vectors according to their distance from the point that you want to predict? worth trying, I just haven't had a second to do it :)
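A minimal sketch of what the distance-weighted average might look like, assuming exponential decay as the weighting (that's just one option, nothing above settles on it). The function names and the `decay` parameter are hypothetical, and vectors are plain lists of floats:

```python
def decayed_average(vectors, decay=0.5):
    """Weighted mean of equal-length vectors, given oldest first.

    The most recent vector gets weight 1, the one before it `decay`,
    the one before that decay**2, and so on.
    """
    if not vectors:
        raise ValueError("need at least one context vector")
    total = [0.0] * len(vectors[0])
    weight_sum = 0.0
    for distance, vec in enumerate(reversed(vectors)):
        w = decay ** distance
        weight_sum += w
        for i, x in enumerate(vec):
            total[i] += w * x
    return [x / weight_sum for x in total]


def features(ngram_vectors, history_vectors, decay=0.5):
    """Concatenate the n-gram's vectors, then append the decayed context average."""
    concat = [x for vec in ngram_vectors for x in vec]
    return concat + decayed_average(history_vectors, decay)


history = [[1.0, 0.0], [0.0, 1.0]]   # oldest first
ngram = [[1.0, 2.0], [3.0, 4.0]]
print(features(ngram, history))
```

A `decay` close to 1 treats the whole context almost evenly, while a small `decay` makes the feature behave nearly like one extra word of n-gram context.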

@aparrish this is both a great idea and a wonderful result 😁

@aparrish “the internet is so excessively” is a v accurate statement

