and a little interface for it. this is trying to spell the words using phonetic information (with a sequence-to-sequence neural network). the temperature parameter controls how the probabilities are distributed: at low temperatures, only the characters the model considers most likely get generated; at higher temperatures, any character might be generated
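for anyone curious what "temperature" means mechanically: the usual approach divides the model's raw per-character scores (logits) by the temperature before the softmax, then samples from the result. a minimal sketch, assuming the model exposes one logit per candidate character (the logit values below are made up, and this isn't necessarily how this interface implements it):

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw per-character scores into probabilities.

    temperature < 1 sharpens the distribution toward the top choice;
    temperature > 1 flattens it toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, rng=random):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate characters
for t in (0.2, 1.0, 5.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

at 0.2 nearly all the probability lands on the first character; at 5.0 the three are close to even, which is why high temperatures can produce "any character."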

I need to stop playing with this, I have other stuff to do geez

still at work on this english nonsense-word VAE. here are some nonsense words sampled from the latent space of the latest trained model...


these are generated by feeding the decoder normally distributed random numbers. pretty happy with how they all seem like jabberwockian-yet-plausible english words

by contrast, here are the results of feeding normally distributed random numbers into the decoder of the RNN model without the VAE:


not as good! which is encouraging, since it shows that the VAE model does actually have a "smoother" space than the non-VAE model.

(I have to admit that when I started this project I was like, "why do you even need a variational autoencoder, if just plugging random vectors into the decoder was good enough for jesus it's good enough for me," but there really is something magical and satisfying about being able to get more-or-less plausible generated results for basically any randomly sampled point in the distribution)
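the "smoother" space comes from the KL term in the VAE loss: during training, each word's encoded distribution is pulled toward a standard normal, so the region the decoder learns to handle overlaps the region you sample random vectors from. a minimal sketch of both halves, assuming a diagonal-Gaussian encoder (the standard VAE setup; not necessarily every detail of this model):

```python
import math
import random

LATENT_DIM = 32  # latent size; 32 is the size mentioned in this thread

def sample_prior(dim=LATENT_DIM, rng=random):
    """A random point in latent space: every coordinate drawn from N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions.

    This is the term that pulls every encoded word toward the same
    standard normal that sample_prior() draws from.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

z = sample_prior()
# a trained decoder would turn z into a spelling; the decoder itself isn't shown
```

a plain autoencoder has no such term, so nothing guarantees that a random normal vector lands anywhere its decoder has seen.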

progress: at 50 epochs, even with KL annealing, 32 dimensions isn't enough for the VAE latent vector to represent much of anything. that leads to reconstructions that are probably just the orthography model doing its best with next-to-noise input, but they're sometimes amusing, e.g.

cart → puach
liotta → pinterajan
intellectually → aching
capella → pellaka
photometer → augh
sympathizer → disteghway
butrick → jorserich
botha's → szine
clayman → tsantiersche
sparkles → trenlew
calamity → muliss
thermoplastic → tphare

(posted this mainly because "butrick → jorserich" seems like something mastodon people would like, e.g. "my name is Butrick Jorserich, follow me at")
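"KL annealing" generally means multiplying the KL term by a weight that starts at 0 and grows during training, so the decoder learns to reconstruct before the latent space gets squeezed toward the prior. a minimal linear schedule (the linear shape and warmup length are assumptions; the thread doesn't say which schedule was used):

```python
def kl_weight(step, warmup_steps=10_000, max_weight=1.0):
    """Linear KL annealing: the weight on the KL term ramps from 0 up to max_weight."""
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / warmup_steps)

def vae_loss(recon_loss, kl_loss, step, warmup_steps=10_000):
    """Training loss with the annealed KL weight applied."""
    return recon_loss + kl_weight(step, warmup_steps) * kl_loss
```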

in which I accidentally leave off the "end" token when predicting spelling from sound, and it just keeps on spelling until it's ready to stop

remarkable → remarymarkamal
wysiwig → irzerwizkian
bemuse → bismebishews
unenforceable → unofironsfinars
shutters → shurtsithaters
capstick → capstickapsitk
vittoria → viltovitria
beilenson → billabinsancin
peers → pieespianes
paste → past-pasest
excitable → exexaitabile
phibro → fib-to-birbo
croney → crainkrine-y
tangle → tangitangle
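seq2seq decoders typically generate one symbol at a time and stop only when they emit an end token (or hit a length cap). a generic sketch of that loop, using greedy (most-likely) choice instead of temperature sampling for simplicity; the `step_fn` interface, the `END` symbol and the toy model are hypothetical stand-ins, not this project's code:

```python
END = "<end>"  # hypothetical end-of-word token

def greedy_decode(step_fn, max_len=30):
    """Emit the most probable next symbol until the model emits END or max_len is hit.

    step_fn(prefix) -> dict of {symbol: probability}; this interface is hypothetical.
    """
    out = []
    while len(out) < max_len:
        probs = step_fn(out)
        symbol = max(probs, key=probs.get)
        if symbol == END:
            break
        out.append(symbol)
    return "".join(out)

def toy_model(prefix):
    """Stand-in for a trained decoder: spells 'tangle', then prefers END."""
    word = "tangle"
    i = len(prefix)
    if i < len(word):
        return {word[i]: 0.9, END: 0.1}
    return {END: 0.6, "t": 0.4}

print(greedy_decode(toy_model))
```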

"how doth the little crocodile improve his shining tail and pour the waters of the nile on every golden scale" → neural network spelling by sound but with probabilities of [aeiou] zeroed out → "hv d'th thy lyttl crch-dykly mpr h's shynnyng thyl hnyd ph thy whytrs f thy nyl hwhn avry ghqlynd schqly"
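one way to "zero out" characters is to set their probabilities to 0 at every decoding step and rescale the rest so they still sum to 1. a sketch under that assumption (the thread doesn't say whether the remaining probabilities were renormalized, and the dict-of-probabilities format is hypothetical):

```python
def mask_and_renormalize(probs, banned=frozenset("aeiou")):
    """Zero out banned characters' probabilities and rescale the rest to sum to 1."""
    kept = {c: (0.0 if c in banned else p) for c, p in probs.items()}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("every candidate character was banned")
    return {c: p / total for c, p in kept.items()}

step = {"a": 0.5, "y": 0.3, "h": 0.2}  # hypothetical next-character distribution
print(mask_and_renormalize(step))
```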

"How doth the little srurbumbered improve his shining pearple and pour the borbirpers of the mrilmer on every golden sprarple"

(adding bilabial and rhotacization features to sounds of *just the nouns* when decoding from phonetics to spelling)
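assuming each phoneme is represented as a set of articulatory feature names, this tweak amounts to switching a couple of features on for every phoneme in a noun before handing the sequence to the sound-to-spelling decoder. the feature labels, part-of-speech tags and data layout below are all hypothetical; the thread doesn't say how nouns were identified:

```python
def add_features(phonemes, extra):
    """Copy a word's phonemes (each a set of feature names) with extra features switched on."""
    return [set(feats) | set(extra) for feats in phonemes]

def tweak_nouns(tagged_words, extra=("bilabial", "rhotacized")):
    """tagged_words: list of (word, part_of_speech, phonemes); only NOUNs are changed."""
    return [
        (word, pos, add_features(phonemes, extra) if pos == "NOUN" else phonemes)
        for word, pos, phonemes in tagged_words
    ]

sentence = [  # hypothetical feature sets, not real feature annotations
    ("his", "PRON", [{"fricative", "glottal"}, {"vowel", "high", "front"}, {"fricative", "alveolar", "voiced"}]),
    ("tail", "NOUN", [{"stop", "alveolar"}, {"vowel", "mid", "front"}, {"lateral", "alveolar", "voiced"}]),
]
tweaked = tweak_nouns(sentence)
```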

inferring spelling from phonetic feature sequences zoomed along the timeseries axis. (basically, smooshing and stretching the sound of the word and getting the neural network to try to spell out the sound)

in case you're wondering, if you scale the sound of "mastodon" by 4x, it spells "mammasavinstawn"

my name at 4x: Alarlaslilliance Pempereterriashi
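zooming a phoneme-feature sequence along its time axis can be done by resampling it with linear interpolation. the thread doesn't say which resampling method was used, so this is one plausible sketch, assuming the sequence is a list of equal-length numeric feature vectors:

```python
def stretch(seq, factor):
    """Resample a sequence of feature vectors along the time axis.

    factor > 1 stretches the sound (more timesteps), factor < 1 squashes it.
    Uses linear interpolation between neighbouring timesteps.
    """
    n = len(seq)
    if n == 0:
        return []
    new_len = max(1, round(n * factor))
    if n == 1 or new_len == 1:
        return [list(seq[0]) for _ in range(new_len)]
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / (new_len - 1)  # where new step i falls on the old time axis
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append([a + (b - a) * frac for a, b in zip(seq[lo], seq[hi])])
    return out
```

scaling by 4x, as with "mastodon" above, would be `stretch(features, 4)`, with the result passed to the sound-to-spelling model.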

the problem with this gug-aradamptling project is that I can't stop playing around with it long enough to write about it

apparently the trick to training a VAE w/annealing is to *never* let the KL loss go below the reconstruction loss. otherwise you get beautifully distributed, wonderfully plausible reconstructions that have almost nothing to do with your training data, i.e., "allison" becomes


exploring the latent phonetic nonsense space around "typewriter"—using the best model I've managed to train yet (100 epochs on HPC, managed to keep the reconstruction loss fairly low while also getting some semblance of a low KL loss)
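one way to turn the "never let the KL loss go below the reconstruction loss" rule of thumb into an annealing schedule is a feedback controller on the KL weight: keep raising it only while the KL loss is above the reconstruction loss, and back off once it dips below. this is a hypothetical sketch of the idea, not the schedule actually used here; the step size and cap are arbitrary:

```python
def update_kl_weight(weight, recon_loss, kl_loss, step_size=1e-4, max_weight=1.0):
    """One annealing step that tries to keep the KL loss from dropping below reconstruction loss.

    A larger KL weight pushes the KL loss down, so the weight only grows while
    KL is still above reconstruction; otherwise it shrinks to ease the pressure.
    """
    if kl_loss > recon_loss:
        return min(max_weight, weight + step_size)
    return max(0.0, weight - step_size)
```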

going back to the regular seq2seq networks, I'm trying to do some quantitative evaluation. the phoneme-features-to-orthography model gets... ~60% of words wrong and ~12% of letters wrong (evaluated on a sample of a few thousand words from cmudict), but its guesses seem... reasonable? not sure how to talk about this

this is showing the original words on the left, and the "sounded out" words from the model on the right. the model I trained going the other way (orthography to phoneme features) performs pretty close to published baselines for similar grapheme-to-phoneme models (phoneme error rate = ~7%, word error rate = ~30%), but there are comparatively fewer papers about phoneme-to-grapheme models, so I'm not sure if this is "good" or not
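word error rate is usually the share of words that aren't spelled exactly right, and letter (or phoneme) error rate is usually total edit distance divided by the total length of the reference spellings. a self-contained sketch of both, assuming those standard definitions (the thread doesn't say exactly how the ~60% and ~12% figures were computed):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]

def error_rates(pairs):
    """pairs: (reference, prediction). Returns (word error rate, letter error rate)."""
    wrong_words = sum(ref != hyp for ref, hyp in pairs)
    letter_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_letters = sum(len(ref) for ref, _ in pairs)
    return wrong_words / len(pairs), letter_edits / total_letters
```

this also shows why the two numbers can diverge so much: a single wrong letter makes the whole word count as wrong.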

it might just be the case that sounding out a word based on its spelling relies only a little bit on context, but spelling a word from how it sounds relies on context a lot. ("context" in this case meaning anything other than, like, a character-based language model of individual words)

my instinct is that if you trained on the phonemes AND a distributional word vector, you'd get pretty good accuracy on this task! but someone's probably already done that and/or that's a project for another day

was curious to see if my model could produce reasonable portmanteaux. steps: translate spelling to sound, predict the sound-spelling model's hidden state for 2 words, average those states & decode

breaker + nylon → breichor
underwear + futility → unterilie
intolerance + homer → honepheren
Cabot + coyote → caibott
bonus + boasting → boensing
by-election + basin → baisention
demonstration + scissors → cesserser
volcano + baron → balano
tiger + panther → paighter
sharpness + hardship → sharpship

(mathematically this basically just works out to averaging the phonetic features of both words at each timestep and telling the sound-spelling model to do its best. which is of course not how portmanteaux actually work: usually we try to find a useful point of similarity in two words, then cut from the first word to the second at that point. but I like to imagine that you could make portmanteaux by just smooshing two words together like two Skittles)
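the simplified version in the parenthetical, averaging the two words' feature vectors at each timestep, might look like this. a sketch assuming each word is a list of numeric phonetic feature vectors and that the shorter word is padded with zeros (a guess; the thread doesn't say how different lengths were handled). the averaged sequence would then go to the sound-to-spelling decoder:

```python
def average_sequences(a, b):
    """Average two phonetic feature sequences timestep by timestep.

    The shorter sequence is zero-padded (an assumption, see above).
    """
    width = len((a or b)[0])
    n = max(len(a), len(b))
    pad = [0.0] * width
    a = a + [pad] * (n - len(a))
    b = b + [pad] * (n - len(b))
    return [[(x + y) / 2 for x, y in zip(fa, fb)] for fa, fb in zip(a, b)]
```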

implied harmful language 

using the phonetic VAE to interpolate between US state names in a grid
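a common way to build such a grid is to pick four latent vectors for the corners (e.g. four state names run through the encoder) and bilinearly interpolate between them, decoding each cell. a minimal sketch, assuming plain lists of floats for the latent vectors; the encode/decode steps are left out:

```python
def lerp(a, b, t):
    """Linear interpolation between two equal-length vectors."""
    return [x + (y - x) * t for x, y in zip(a, b)]

def interpolation_grid(top_left, top_right, bottom_left, bottom_right, rows, cols):
    """Bilinearly interpolate latent vectors across a rows x cols grid."""
    grid = []
    for r in range(rows):
        v = r / (rows - 1) if rows > 1 else 0.0
        left = lerp(top_left, bottom_left, v)     # walk down the left edge
        right = lerp(top_right, bottom_right, v)  # walk down the right edge
        grid.append([lerp(left, right, c / (cols - 1) if cols > 1 else 0.0)
                     for c in range(cols)])
    return grid
```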

decoding the same underlying vectors from the VAE using the french spelling model, for some reason, sure, whatever

visualizing the vector in the latent phonetic space while interpolating between "abacus" and "mastodon." (this is after inferring the latent vectors via orthography -> phoneme features -> VAE). I just reshaped the vectors from (1, 1, 128) to (8, 16) for display, so the 2D layout itself doesn't mean anything. still interesting to see what it's actually learning!
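the (1, 1, 128) to (8, 16) step is just a row-major reshape of the same 128 numbers (with NumPy, `z.reshape(8, 16)`). a dependency-free sketch of it alongside straight-line interpolation between two latent vectors; the function names and the use of plain lists are assumptions:

```python
def interpolate(a, b, steps):
    """Evenly spaced points on the straight line from vector a to vector b, inclusive."""
    if steps < 2:
        raise ValueError("need at least 2 steps")
    return [[x + (y - x) * i / (steps - 1) for x, y in zip(a, b)] for i in range(steps)]

def to_grid(vec, rows=8, cols=16):
    """Reshape a flat 128-value latent vector into rows x cols for display.

    The layout is arbitrary: neighbouring cells aren't related in the model.
    """
    if len(vec) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(vec)}")
    return [vec[r * cols:(r + 1) * cols] for r in range(rows)]
```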

exploring the phonetic space by just setting large chunks of the dimensions to arbitrary values
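probing the space this way just means taking a latent vector and overwriting a contiguous block of its dimensions with a fixed value before decoding. a minimal sketch; which dimensions and values to use are arbitrary choices, and the decode step isn't shown:

```python
import random

def clamp_chunk(vec, start, stop, value):
    """Return a copy of vec with dimensions [start, stop) all set to value."""
    out = list(vec)
    stop = min(stop, len(out))  # don't grow the vector past its original size
    out[start:stop] = [value] * max(0, stop - start)
    return out

z = [random.gauss(0.0, 1.0) for _ in range(128)]  # a random latent point (128 dims)
probe = clamp_chunk(z, 0, 64, 3.0)  # e.g. pin the first half of the dimensions to 3.0
```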

@aparrish Sounds like a distant branch of the Joestar family tree from JoJo's Bizarre Adventure

@aparrish As the Most Interesting Man in the World, I don't always wear jorts. But when I do, I wear Butrick Jorserich.

Ooh now I want a bot that runs posts through this

@aparrish it snuck a single ‘a’ in there! Th mshjn rhblyhn bhgnz

@courtney according to this, it's Che-cotort-an-ratily Statstaintangent

@aparrish oh, that's how I was already pronouncing it

@aparrish Prof. Dodgson would have laughed at that. Try the program on Jabberwocky.

"I know how to spell banana I just don't know when to stop"

I think you should adopt "houghtrodhan" as your secret identity when in Britain.

@aparrish These words are all perfect. And now let me get back to my micepotor to compose some letters.

@aparrish btw please just apply these transforms to an entire book for a half-hour NaNoGenMo this year

@aparrish those are amazing. tbh let's just normalize english spelling by making sure a small neural net can map between spelling and phonemes. that's probably a more justifiable plan than any of the earlier attempts?

@aparrish It feels like these are potential spellings of English words, but not *common* English words? I wonder if weighting the training words by frequency would push it more towards "standard" spellings?

@mewo2 I have been meaning to try that. but some of these already feel like more "standard" spellings, or are just attested alternate spellings (morice -> morris, whitacre -> whitaker), so I'm not sure if it would help

@aparrish They feel very right to me? Like, when I say the word on the left (those times that I recognize the word on the left), I can tell very easily why it might read that pronunciation the way it spells it.

I know far too little linguistics to put that into words, but yeah.

@aparrish terich sounds like a nice place to live

@aparrish some of these rows and columns make good nonsense poetry

@aparrish brb, setting a fantasy novel in Oache-Gnang
