realized you could make a sort of markov chain text generator using concatenated word vectors instead of the tokens themselves, which has the benefit of being able to cope pretty well with out-of-vocab strings. anyway, here's word-vector-markov Jane Austen elaborating on what the Internet is
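rough sketch of the idea, in case it helps. this is not the code behind the example: `toy_vector` is a made-up stand-in that hashes character trigrams so that any string gets a vector (a real setup would use pretrained word vectors), and `build_model`, `next_word`, `generate`, `DIM` and `k` are all hypothetical names. the "chain" is just a list of (concatenated n-gram vector, next word) pairs, and generation looks up the nearest stored n-gram rather than requiring an exact token match:

```python
import math
import random
import zlib

DIM = 16  # toy embedding size (assumption; real word vectors are much larger)


def toy_vector(word, dim=DIM):
    """Hypothetical stand-in for a word vector: hash character trigrams
    into buckets, so unseen strings still get a usable vector."""
    padded = f"<{word.lower()}>"
    vec = [0.0] * dim
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def concat_vectors(words):
    """Concatenate the vectors of the given words into one long vector."""
    out = []
    for w in words:
        out.extend(toy_vector(w))
    return out


def build_model(tokens, n=3):
    """One (concatenated n-gram vector, following word) pair per n-gram."""
    return [
        (concat_vectors(tokens[i:i + n]), tokens[i + n])
        for i in range(len(tokens) - n)
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def next_word(model, context, n=3, k=1, rng=random):
    """Pick a next word from the k stored n-grams nearest to the context."""
    query = concat_vectors(context[-n:])
    nearest = sorted(model, key=lambda entry: sq_dist(entry[0], query))[:k]
    return rng.choice(nearest)[1]


def generate(model, prompt, length=10, n=3, k=1, rng=random):
    """Extend the prompt by `length` words, one nearest-neighbour step each."""
    out = list(prompt)
    for _ in range(length):
        out.append(next_word(model, out, n, k, rng))
    return out


tokens = "the cat sat on the mat and the dog sat on the rug".split()
model = build_model(tokens, n=3)
print(" ".join(generate(model, ["the", "catt", "sat"], length=5)))
```

the prompt has a typo ("catt") on purpose: a token-keyed markov chain would have nothing to look up there, while the nearest-neighbour lookup still lands on some stored n-gram. setting `k` above 1 samples among several near matches instead of always taking the closest one.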
(could improve this by also concatenating an average vector of all of the context leading up to the n-gram, maybe? although at that point you're basically just hard-coding what an LSTM is supposed to learn to do on its own, more or less)
By which I mean:
If the vectors of previous words (in the prompt or in the generated output) are taken into account when producing new words rather than merely the last word, the likelihood that a word with two distinct semantically-distant senses of roughly equal frequency will produce a garden-path sentence is much lower. (Basically the same logic as using 2nd & 3rd order markov chains over first order, except probably better.)
Are you thinking exponential weight decay?
@enkiv2 the example I showed was using concatenated vectors of 3-grams. and yeah I was sorta proposing a system where you do an average of the context vectors according to their distance from the point that you want to predict? worth trying, I just haven't had a second to do it :)
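untested-in-anger sketch of what that distance-weighted average could look like with exponential decay. everything here is an assumption rather than something from the thread: the names `decayed_context_vector` and `augmented_key`, the `decay` parameter, and the choice to weight by `decay ** age` where age 0 is the word right before the point being predicted:

```python
def decayed_context_vector(vectors, decay=0.5):
    """Weighted average of context vectors, oldest first in `vectors`.
    The most recent vector gets weight 1, the one before it `decay`,
    then decay**2, and so on."""
    if not vectors:
        raise ValueError("need at least one context vector")
    dim = len(vectors[0])
    total = [0.0] * dim
    weight_sum = 0.0
    for age, vec in enumerate(reversed(vectors)):
        weight = decay ** age
        weight_sum += weight
        for j, x in enumerate(vec):
            total[j] += weight * x
    return [t / weight_sum for t in total]


def augmented_key(ngram_vectors, context_vectors, decay=0.5):
    """Concatenate the n-gram's word vectors, then append the decayed
    average of the context that came before it."""
    key = []
    for vec in ngram_vectors:
        key.extend(vec)
    key.extend(decayed_context_vector(context_vectors, decay))
    return key


# Two 2-d context vectors: the newer one ([1, 0]) counts twice as much.
print(decayed_context_vector([[0.0, 2.0], [1.0, 0.0]], decay=0.5))
```

with `decay=1.0` this reduces to a plain average of the whole context, and smaller values make the key lean harder on the most recent words. the augmented key would then be used for the nearest-neighbour lookup in place of the bare concatenated n-gram vector.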