m. nourbese philip's "zong!" is incredibly important and i feel negligent for not having read it until now
though I suspect that if I asked a random english speaker to pronounce "xgdjvgx" I would probably get no more than an annoyed look before they walked away. good-bye english speaker
down to ~0.05 loss on the training set. here's the arpabet that comes out when you feed in random sequences of characters. (I've annotated with my own pronunciation guide [in quotes] for those of you who aren't fluent in arpabet). not bad actually? in some cases pretty close to what I'd expect to get if I asked an english speaker how to pronounce those sequences
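the posts don't say how the arpabet gets decoded for those random character strings. one common approach is greedy decoding: feed the decoder its own most probable symbol until it emits an end token. here's a rough sketch of that loop. `step` is a hypothetical stand-in for the trained keras decoder, and `toy_step` is a made-up lookup table so the thing runs on its own. treat all of the names and the tab/newline start/end tokens as assumptions, not as what the experiment actually did:

```python
import random
import string


def random_spelling(rng, min_len=3, max_len=8):
    """Random lowercase letter string, like the nonsense inputs in the post."""
    n = rng.randint(min_len, max_len)  # inclusive on both ends
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(n))


def greedy_decode(step, spelling, start="\t", end="\n", max_len=20):
    """Append the most probable next symbol until `end` appears or max_len is hit.

    `step(spelling, prefix)` is a hypothetical decoder interface: given the
    input spelling and the symbols emitted so far (starting with `start`),
    it returns a dict mapping each candidate next symbol to a probability.
    """
    prefix = [start]
    for _ in range(max_len):
        probs = step(spelling, prefix)
        best = max(probs, key=probs.get)  # greedy: take the argmax
        if best == end:
            break
        prefix.append(best)
    return prefix[1:]  # drop the start token


def toy_step(spelling, prefix):
    """Toy stand-in 'model': one phone per letter from a tiny table, then end."""
    table = {"b": "B", "a": "AA1", "t": "T"}
    i = len(prefix) - 1  # how many phones have been emitted so far
    if i < len(spelling):
        return {table.get(spelling[i], "AH0"): 0.9, "\n": 0.1}
    return {"\n": 1.0}


rng = random.Random(0)
nonsense = random_spelling(rng)
phones = greedy_decode(toy_step, nonsense)
```

greedy decoding is the simplest option; beam search over the decoder's outputs would usually give somewhat more plausible pronunciations at the cost of more decoder calls.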
no joke huge thanks to the nyu hpc sysadmin who answered my question about keras segfaulting and then peeked at my process list and reminded me that I can request like 20 cpus, not just 2
experimenting with sequence-to-sequence lstm, attempting to learn cmudict transcriptions from orthography. if you stop at ~0.50 loss it thinks "ballet" is pronounced like "albert" https://mastodon.social/media/zixPHlE6F1Lax_OS-hs (this is also only using the first 10k entries from the dictionary)
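a minimal sketch of the data prep a character-level seq2seq setup like this needs. it assumes cmudict-0.7b-style lines (word, whitespace, space-separated arpabet phones, ";;;" comment lines, alternates like "BALLET(1)") and the common convention of tab/newline as start/end tokens for teacher forcing. the sample lines, function names, and the choice to skip alternate pronunciations are my assumptions, not necessarily what the experiment did:

```python
# Hypothetical sample in cmudict-0.7b style.
SAMPLE = """;;; comment line
ALBERT  AE1 L B ER0 T
BALLET  B AE0 L EY1
BALLET(1)  B AE1 L EY0
"""

START, END = "\t", "\n"  # decoder start/end tokens (assumed convention)


def parse_cmudict(text, limit=None):
    """Return (spelling, [phones]) pairs, skipping comments and alternates.

    `limit` counts retained entries, e.g. limit=10_000 for a 10k subset.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue
        word, phones = line.split(maxsplit=1)
        if "(" in word:  # alternate pronunciation such as BALLET(1)
            continue
        pairs.append((word.lower(), phones.split()))
        if limit is not None and len(pairs) >= limit:
            break
    return pairs


def build_vocab(seqs):
    """Map each symbol to an integer index; index 0 is reserved for padding."""
    symbols = sorted({s for seq in seqs for s in seq})
    return {s: i + 1 for i, s in enumerate(symbols)}


def encode(seq, vocab, maxlen):
    """Integer-encode a sequence, truncate to maxlen, right-pad with zeros."""
    ids = [vocab[s] for s in seq][:maxlen]
    return ids + [0] * (maxlen - len(ids))


pairs = parse_cmudict(SAMPLE, limit=10_000)
src_vocab = build_vocab(word for word, _ in pairs)     # letters
tgt_seqs = [[START] + phones + [END] for _, phones in pairs]
tgt_vocab = build_vocab(tgt_seqs)                      # arpabet phones + tokens
```

reserving index 0 for padding leaves room for masking on the keras side (e.g. `Embedding(..., mask_zero=True)`); that's a design choice on my part, not something from the post.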
uspol, net neutrality
I think most people would say that the opposite of "nationalize" is "privatize," but is there a word for the latter that doesn't have the positive connotation of the word "private" (positive in the context of government oversight, e.g., privacy)? something like "feudalization" or similar. *does a web search* wait that's already the word https://en.wikipedia.org/wiki/Refeudalization
the IBM Port-A-Punch from 1958 has the industrial design of a late nineties PDA
somehow this paragraph is making exactly the same point I would make about texts and computers while completely reversing the abstract/physical relation (I would argue that it's the texts that are physical, spatial, embodied while the computational representation is abstract) (from https://www.ideals.illinois.edu/bitstream/handle/2142/402/SperbergMcQueen.pdf?sequence=2&isAllowed=y) https://mastodon.social/media/pO4JSVkwBoveOUoBjtg
every so often I'm struck by the hubris of the idea of representing a text as a uniform sequence of characters. who first looked at a book and was like "ah yes, I see, this is a one-dimensional array"
earliest I can find so far is https://www.aclweb.org/anthology/W/W93/W93-0310.pdf (from 1993). wording therein suggests that there was a 1-million-word pg corpus already floating around at that point
it's also super suspicious and silicon-valley-dystopian that google scholar's "sort by date" feature actually just shows citations for your query from the past year, as though things that happened earlier could not possibly matter?!
what was the first statistical study that used project gutenberg as a corpus? (including nlproc, computational linguistics, digital humanities, etc. under the heading of "statistical study") for that matter, what was the first computational creativity project to use pg as a corpus?
"[B]oth mora counts and number of voiced obstruents in their name seem to, albeit stochastically, affect Pokémon characters’ size, weight, and strength parameters. Vowel quality in initial syllables seems to have a tangible effect as well." cfp for linguistics conference on pokémon sound symbolism https://linguistlist.org/issues/28/28-5228.html (via rctatman on birdsite)
taught my last class of the semester tonight! excellent students and projects this semester. nothing else to add, just feeling a sense of satisfaction and accomplishment. 🎷
today in delightful wikipedia categories, https://en.wikipedia.org/wiki/Category:Fictional_tubers
Looking for #Patreon alternatives? Long comparison list of crowdfunding sites includes 16 Patreon-like subscription sites (Snowdrift Wiki):