@Clausti I can't claim to be an expert but doesn't what you describe fall under the category of data augmentation? and stuff like adding dropout in neural networks, etc. conceptually for me the problem with data augmentation is that then you're sort of building an idea about how the data works into your own analysis, which seems... weird.
in both cases there's so little data (just 80 items, since there are just 80 poems...) that the model pretty much instantly overfits and basically just learns the poems verbatim. I think I'm going to go back to the word model and try using pre-trained embeddings, then investigate data augmentation? (but allison, you're saying, CNN is very inappropriate for this task, use LSTM, bleah, and yes I know but I have Something I'm Trying To Show about zukofsky's style of composition in these poems)
output from a convolutional neural network trying to "condense" wikipedia articles about each of zukofsky's 80 Flowers into the text of the poems themselves. the first is from a word-level model, attempting to produce one of the poems in the validation set; the second is from a character-level model, trying to produce one of the poems in the training set. the word-level one looks "coherent" but it's really just reproducing words in similar frequencies from the targets
between article 13, GDPR and sesta/fosta it's... probably not a good idea to make server-side software that accepts user data unless you have a bunch of lawyers, right?
uspol, cruelty to children, bitter humor
<< There is a term I wish to see go viral: "Trump hotel" as a synonym for concentration camp, prison, or orphanage. >>
--- Charlie Stross
seriously though. and without any increase in validation accuracy. bleah
current status, overfitting my model like it's fashion week
based on the photo of the instructions on the nyhistory page, here's the letter frequency from hill's spelling blocks, which seems to... roughly follow english letter frequency in general (with some weird outliers, like five Cs but only three As, wayyy too many Js, not enough Zs to spell "pizza" etc). I wonder if hill came up with this distribution by the seat of his pants or if he actually did some counting or had some other source
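(a minimal sketch of the kind of counting involved, in case anyone wants to try it themselves. the function names and the list of faces are made up for illustration; the sample faces are NOT hill's actual distribution, which you'd have to transcribe from the photo)

```python
from collections import Counter

def letter_counts(faces):
    """Count how often each letter appears across a set of block faces."""
    return Counter(ch for face in faces for ch in face.lower() if ch.isalpha())

def can_spell(word, counts):
    """True if the set has enough copies of every letter in `word`."""
    need = Counter(word.lower())
    return all(counts[ch] >= n for ch, n in need.items())

# placeholder faces for illustration only, not the real blocks
sample_faces = ["C", "C", "A", "J", "J", "Z", "P", "I"]
counts = letter_counts(sample_faces)
print(counts.most_common())
print(can_spell("pizza", counts))
```

to compare against english in general you'd divide each count by the total and set it next to a published letter-frequency table.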
hill's spelling blocks http://www.nyhistory.org/exhibit/hills-spelling-blocks and associated patent https://patents.google.com/patent/USRE2528E from this really great article about the history of alphabet blocks on atlas obscura https://www.atlasobscura.com/articles/history-alphabet-blocks
("prickles / points ditto itself" and "centifolia cemetery / striped stipules" feel especially true to the original, if only by chance in this instance)
... which isn't to say that I don't kinda *like* my simplest-possible implementation? it's at least doing the work of juxtaposing obvious and non-obvious words that are relevant to the topic and eschewing conventional syntax. so I do feel justified in this approach and like I'm on the right track. here's another...
my generator on the left, zukofsky on the right (obviously). I say "surprising" because even though I've been studying and admiring these poems for the better part of a year at this point I still sorta had this idea that the poems were *essentially* just random relevant words arranged in a grid. comparing the poems to, like, actually random elements in a grid really shows the craft and attention and unusual cohesion of the original
so I've been working on a computer program to compose poems in the style of Zukofsky's _80 Flowers_, a collection (literal anthology!) of constrained poems written about individual flower varieties. eventually this is going to be a corpus-driven machine-learning thing but just now as a sort of "baseline" I made a generator that just arranges the top forty keywords from the wikipedia page corresponding to each flower in Zukofsky's collection, and the results are... surprising?
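(roughly what that baseline amounts to, as a sketch and not the actual code: I'm assuming here that "keywords" just means the most frequent non-stopwords, and that the grid is eight lines of five words, the shape of the poems in _80 Flowers_. the stopword list, function names and sample text are all placeholders)

```python
import re
from collections import Counter

# tiny placeholder stopword list; a real one would be much longer
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "are",
             "it", "as", "with", "for", "on", "by", "or", "that", "its"}

def top_keywords(text, n=40):
    """Return the n most frequent words, skipping stopwords and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(n)]

def arrange(keywords, per_line=5):
    """Lay the keywords out in rows of per_line words each."""
    rows = [keywords[i:i + per_line] for i in range(0, len(keywords), per_line)]
    return "\n".join(" ".join(row) for row in rows)

# stand-in for the text of a flower's wikipedia article
article = "rose rose rose thorn thorn petal the the the the"
print(arrange(top_keywords(article, n=40)))
```

with a full article, forty keywords at five per line gives the eight-line grid. anything smarter about keyword choice (tf-idf, say) would slot into `top_keywords` without changing the layout step.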
birdsite
(twitter reminded me that it's been eleven years since i signed up and suggested a tweet for me which i annotated! posted here instead of twitter because sometimes things on twitter go viral and i don't want to deal with thousands of entitled jerks or worse in my mentions all day today!)