
Allison Parrish

@Clausti I've never done that personally but it sounds very much in line with other data augmentation techniques I've seen

@Clausti I can't claim to be an expert but doesn't what you describe fall under the category of data augmentation? and stuff like adding dropout in neural networks, etc. conceptually for me the problem with data augmentation is that then you're sort of building an idea about how the data works into your own analysis, which seems... weird.

@falkreon do you have a link? it would actually be really helpful right now to see/read details about someone else's process

@charlyblack hahahaha that is a good idea for data augmentation maybe :) though I'm at the point in my research right now where I firmly believe No One Really Understands 80 Flowers Except For Me

in both cases there's so little data (just 80 items, since there are just 80 poems...) that the model pretty much instantly overfits and basically just learns the poems verbatim. I think I'm going to go back to the word model and try using pre-trained embeddings, then investigate data augmentation? (but allison, you're saying, CNN is very inappropriate for this task, use LSTM, bleah, and yes I know but I have Something I'm Trying To Show about zukofsky's style of composition in these poems)
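for anyone wondering what "data augmentation" could even mean with 80 article→poem pairs, here's one generic option: word dropout on the input side only. this is a hypothetical sketch, not the approach from these posts. `word_dropout`, `augment_pairs`, and the defaults `copies=5` and `p=0.1` are illustrative assumptions. each target poem stays untouched while noisy copies of its wikipedia article get added, so the model sees more varied inputs for the same 80 outputs.

```python
import random

def word_dropout(text, p, rng):
    """Return `text` with each word independently removed with probability p.

    If every word would be dropped, keep the first one so the input is never empty.
    """
    words = text.split()
    kept = [w for w in words if rng.random() >= p]
    return " ".join(kept if kept else words[:1])

def augment_pairs(pairs, copies=5, p=0.1, seed=0):
    """Expand (article, poem) pairs with `copies` noisy versions of each article.

    Hypothetical helper: only the input (article) side is perturbed;
    each target poem is repeated unchanged.
    """
    rng = random.Random(seed)
    out = []
    for article, poem in pairs:
        out.append((article, poem))  # keep the original pair
        for _ in range(copies):
            out.append((word_dropout(article, p, rng), poem))
    return out
```

note that this bakes in an assumption (that a poem "should" survive a few missing article words), which is exactly the kind of built-in idea about how the data works that makes augmentation feel weird for this project.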

output from a convolutional neural network trying to "condense" wikipedia articles about each of zukofsky's 80 Flowers into the text of the poems themselves. the first is from a word-level model, attempting to produce one of the poems in the validation set; the second is from a character-level model, trying to produce one of the poems in the training set. the word-level one looks "coherent" but it's really just reproducing words in similar frequencies from the targets
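one way to back up the "just reproducing words in similar frequencies" reading is to compare an order-blind score with an order-sensitive one. this is a minimal sketch of that check, not something from the original posts; `bag_cosine` and `bigram_overlap` are hypothetical names. high bag-of-words cosine combined with near-zero bigram overlap suggests the output matches the target's vocabulary but not its arrangement.

```python
import math
from collections import Counter

def bag_cosine(a, b):
    """Cosine similarity of two texts' word-count vectors (ignores word order)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def bigram_overlap(a, b):
    """Fraction of a's adjacent word pairs that also occur in b (order-sensitive)."""
    ta, tb = a.lower().split(), b.lower().split()
    bigrams_a = list(zip(ta, ta[1:]))
    bigrams_b = set(zip(tb, tb[1:]))
    if not bigrams_a:
        return 0.0
    return sum(bg in bigrams_b for bg in bigrams_a) / len(bigrams_a)
```

e.g. a generated line that is the target's words in reverse order scores a cosine of 1.0 but a bigram overlap of 0.0.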

between article 13, GDPR and sesta/fosta it's... probably not a good idea to make server-side software that accepts user data unless you have a bunch of lawyers, right?

CW: uspol, cruelty to children, bitter humor

seriously though. and without any increase in validation accuracy. bleah

current status, overfitting my model like it's fashion week

CW: uspol

anyway til that letter distributions in scrabble were directly influenced by edgar allan poe's "the gold-bug"

based on the photo of the instructions on the nyhistory page, here's the letter frequency from hill's spelling blocks, which seems to... roughly follow english letter frequency in general (with some weird outliers, like five Cs but only three As, wayyy too many Js, not enough Zs to spell "pizza" etc). I wonder whether hill came up with this distribution by the seat of his pants or whether he actually did some counting or had some other source
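to put numbers on "roughly follows english letter frequency", you could turn the block counts into shares and rank the letters by how far they stray from a reference text. this is a hypothetical sketch: the full hill's blocks counts aren't listed in the post (only five Cs and three As), so you'd transcribe them from the photo, and the reference would be whatever large english text you choose. `letter_shares`, `tile_shares`, and `outliers` are illustrative names.

```python
from collections import Counter

def letter_shares(text):
    """Fraction of all A-Z letters in `text` accounted for by each letter."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: counts[ch] / total for ch in counts}

def tile_shares(tile_counts):
    """Convert per-letter tile counts (e.g. {"A": 3, "C": 5}) into fractions."""
    total = sum(tile_counts.values())
    return {ch: n / total for ch, n in tile_counts.items()}

def outliers(tiles, reference, top=5):
    """Letters whose tile share differs most from their reference share.

    Positive differences mean over-represented in the tiles; ties break alphabetically.
    """
    letters = set(tiles) | set(reference)
    diffs = {ch: tiles.get(ch, 0.0) - reference.get(ch, 0.0) for ch in letters}
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top]
```

a letter missing entirely from the tiles (like a lone Z short of "pizza") shows up as a negative difference, since it's treated as a share of 0.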

("prickles / points ditto itself" and "centifolia cemetery / striped stipules" feel especially true to the original, if only by chance in this instance)

... which isn't to say that I don't kinda *like* my simplest-possible implementation? it's at least doing the work of juxtaposing obvious and non-obvious words that are relevant to the topic and eschewing conventional syntax. so I do feel justified in this approach and like I'm on the right track. here's another...

my generator on the left, zukofsky on the right (obviously). I say "surprising" because even though I've been studying and admiring these poems for the better part of a year at this point I still sorta had this idea that the poems were *essentially* just random relevant words arranged in a grid. comparing the poems to, like, actually random elements in a grid really shows the craft and attention and unusual cohesion of the original

so I've been working on a computer program to compose poems in the style of Zukofsky's _80 Flowers_, a collection (literal anthology!) of constrained poems written about individual flower varieties. eventually this is going to be a corpus-driven machine-learning thing but just now as a sort of "baseline" I made a generator that just arranges the top forty keywords from the wikipedia page corresponding to each flower in Zukofsky's collection, and the results are... surprising?
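the post doesn't say how the "top forty keywords" were picked, so here's a minimal sketch of one plausible baseline, assuming TF-IDF ranking across all 80 wikipedia articles plus a random shuffle into rows. the stopword list, `top_keywords`, `keyword_grid`, and the row width are all hypothetical choices, not the actual generator.

```python
import math
import random
import re
from collections import Counter

# tiny illustrative stopword list (an assumption; a real run would want a fuller one)
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "is", "are",
             "it", "its", "as", "by", "for", "with", "from", "that", "this", "be"}

def tokenize(text):
    """Lowercase alphabetic tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def top_keywords(docs, target, k=40):
    """Rank words in docs[target] by TF-IDF against all docs; return the top k."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # document frequency: how many articles contain each word at least once
    df = Counter(w for toks in tokenized for w in set(toks))
    tf = Counter(tokenized[target])
    scores = {w: c * math.log(n_docs / df[w]) for w, c in tf.items()}
    # highest score first; alphabetical order breaks ties deterministically
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]

def keyword_grid(words, width=5, seed=None):
    """Shuffle the keywords and lay them out in rows of `width` words."""
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    return [" ".join(shuffled[i:i + width]) for i in range(0, len(shuffled), width)]
```

with 40 keywords and `width=5` this gives eight rows of five words; the grid shape here is arbitrary and isn't meant to match the form of zukofsky's poems.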

CW: birdsite

CW: birdsite