so I've been working on a computer program to compose poems in the style of Zukofsky's _80 Flowers_, a collection (literal anthology!) of constrained poems written about individual flower varieties. eventually this is going to be a corpus-driven machine-learning thing but just now as a sort of "baseline" I made a generator that just arranges the top forty keywords from the wikipedia page corresponding to each flower in Zukofsky's collection, and the results are... surprising?
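the baseline is roughly this shape — a toy sketch, not my actual script (the stopword list is a stand-in, and I'm assuming plain article text as input; the 8×5 grid matches the five-words-per-line, eight-line constraint of the originals):

```python
import random
import re
from collections import Counter

# tiny stand-in stopword list; a real run would use a fuller one
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "it", "as",
             "for", "with", "on", "by", "or", "are", "from", "that", "this"}

def top_keywords(text, n=40):
    """Return the n most frequent non-stopword tokens in the article text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(n)]

def grid_poem(keywords, seed=None):
    """Shuffle forty keywords into eight five-word lines, 80 Flowers-style."""
    rng = random.Random(seed)
    words = keywords[:40]
    rng.shuffle(words)
    return "\n".join(" ".join(words[i:i + 5])
                     for i in range(0, len(words), 5))
```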
my generator on the left, zukofsky on the right (obviously). I say "surprising" because even though I've been studying and admiring these poems for the better part of a year at this point I still sorta had this idea that the poems were *essentially* just random relevant words arranged in a grid. comparing the poems to, like, actually random elements in a grid really shows the craft and attention and unusual cohesion of the original
... which isn't to say that I don't kinda *like* my simplest-possible implementation? it's at least doing the work of juxtaposing obvious and non-obvious words that are relevant to the topic and eschewing conventional syntax. so I do feel justified in this approach and like I'm on the right track. here's another...
("prickles / points ditto itself" and "centifolia cemetery / striped stipules" feel especially true to the original, if only by chance in this instance)
output from a convolutional neural network trying to "condense" wikipedia articles about each of zukofsky's 80 Flowers into the text of the poems themselves. the first is from a word-level model, attempting to produce one of the poems in the validation set; the second is from a character-level model, trying to produce one of the poems in the training set. the word-level one looks "coherent" but it's really just reproducing words in similar frequencies from the targets
in both cases there's so little data (just 80 items, since there are just 80 poems...) that the model pretty much instantly overfits and basically just learns the poems verbatim. I think I'm going to go back to the word model and try using pre-trained embeddings, then investigate data augmentation? (but allison, you're saying, CNN is very inappropriate for this task, use LSTM, bleah, and yes I know but I have Something I'm Trying To Show about zukofsky's style of composition in these poems)
@aparrish _it's gone_ sentient AAAAAAA
@aparrish me texting sober vs me texting drunk
@aparrish I *just read* something about applying CNNs to small, domain-specific sets like this. Their approach was to use a GAN pair to learn the style of the source set and generate additional plausible data points, and then train a traditional CNN/DNN on the stretched data.
@falkreon do you have a link? it would actually be really helpful right now to see/read details about someone else's process
@aparrish Whoops, it wasn't a paper, it was a talk by Monty Barlow. Still, found it: https://www.youtube.com/watch?v=7EfhicNoAbM
@aparrish I have a very dumb question about overfitting... do you ever deliberately add ‘noise’ to training data? or like, other writing in either the same style or by the same author but not both?
this question is inspired by the way one needs to “backcross to wild type” when optimizing for a specific polygenic trait in a breeding population (bc any single sampling won’t get all the possible contributions)
@Clausti I can't claim to be an expert but doesn't what you describe fall under the category of data augmentation? and stuff like adding dropout in neural networks, etc. conceptually for me the problem with data augmentation is that then you're sort of building an idea about how the data works into your own analysis, which seems... weird.
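for what it's worth, the cheapest text-side version of that kind of noising I know is word-level dropout plus adjacent swaps — a toy sketch (hypothetical helper names, not what's in my actual pipeline), just to show how 80 poems could be stretched into more training items while keeping the line structure:

```python
import random

def augment(poem, drop_p=0.1, swap_p=0.1, rng=None):
    """Return a noisy copy of a poem: randomly drop some words and
    swap some adjacent pairs, preserving the line structure."""
    rng = rng or random.Random()
    lines = []
    for line in poem.split("\n"):
        words = line.split()
        # word dropout: delete each word with probability drop_p
        words = [w for w in words if rng.random() >= drop_p]
        # adjacent swaps: transpose neighbours with probability swap_p
        for i in range(len(words) - 1):
            if rng.random() < swap_p:
                words[i], words[i + 1] = words[i + 1], words[i]
        lines.append(" ".join(words))
    return "\n".join(lines)

def expand(corpus, copies=10, seed=0):
    """Stretch a corpus of poems into len(corpus) * (copies + 1) items."""
    rng = random.Random(seed)
    out = list(corpus)
    for poem in corpus:
        out.extend(augment(poem, rng=rng) for _ in range(copies))
    return out
```

whether noised copies like this count as "more data" or just as baking my assumptions into the corpus is exactly the thing I'm uneasy about above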
@aparrish ah, I said it was a dumb question bc I maybe don’t have enough background to ask a good one. I’m not familiar w standard practices of data augmentation
I think what I was trying to ask is whether, in the face of a small data set prone to overfit, there are maybe other populations of data/sources that contain some but not all of the characteristics you’re training for, and whether doing multiple rounds of training, w a clean set then an “expanded” set, could mitigate overfitting
@aparrish my apologies if that question continues to be nonsense!
@Clausti I've never done that personally but it sounds very much in line with other data augmentation techniques I've seen