hey so remember how I wanted a project gutenberg corpus with every plaintext file in an easy-to-use format? mastodon.social/@aparrish/1005

well, I guess I wanted it so bad that I went ahead and made it: github.com/aparrish/gutenberg-


a quick exercise with this corpus: "Flower blank," alphabetized bigrams beginning with "flowers" from every Project Gutenberg book labelled as "Poetry"

gist.github.com/aparrish/fdcbd

excerpt:

flowers a
flowers ablaze
flowers about
flowers above
flowers absorb
flowers accompanying
flowers adorn
flowers advance
flowers afford
flowers affray
flowers aflame
flowers after
flowers again
flowers against
flowers alighting
flowers alive
flowers all
flowers allied
flowers ally
flowers almost
flowers aloft
...
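
for anyone who wants to try something like this, here's a rough sketch of the idea in Python. it is not the code from the gist: the function name `bigrams_starting_with`, the regex tokenizer, and the two-line `sample_texts` stand-in are all made up for illustration, and it assumes you've already loaded the poetry texts into plain strings somehow.

```python
import re


def bigrams_starting_with(texts, first_word="flowers"):
    """Return the sorted, de-duplicated bigrams whose first word is first_word."""
    found = set()
    for text in texts:
        # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        # Pair each word with the word that follows it.
        for left, right in zip(words, words[1:]):
            if left == first_word:
                found.add(f"{left} {right}")
    # Plain string sort gives the alphabetized list.
    return sorted(found)


# Hypothetical stand-in for the corpus: two tiny invented "poems".
sample_texts = [
    "The flowers above the wall, the flowers ablaze at noon.",
    "Flowers again! And flowers above.",
]

result = bigrams_starting_with(sample_texts)
print("\n".join(result))
```

note that this tokenizer ignores punctuation, so a pair like "flowers. And" would still count as a bigram; whether that's what you want depends on the exercise.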
