One of the signs you're Not Well is that putting a 'reading list' section on your blog starts with building a rotating-proxy Amazon scraper and an NLP-powered metadata scrubbing engine that fixes common problems: Edition names jammed onto the end of Publisher names, publication dates appended to Imprint names, subtitles that are actually author lists, etc.
But, like… what am I supposed to do? Let BAD DATA just STAY BAD?
*wild-eyed, twitchy stare*
Structured book metadata is less of a contract and more of a puzzle game, in which you try to figure out whether ‘PENGUIN’ appearing in the Publisher, Format, and Edition fields of a book about penguins is a sign that Penguin/Random House published it, that the metadata is duplicated, or that someone’s enthusiastic 5-year-old got ahold of the keyboard on release day.
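For the curious, two of the cleanup problems from the first post (Edition names stuck on the end of Publisher names, publication dates appended to Imprint names) can be roughed out with a couple of regexes. This is only a minimal sketch, not the actual scrubbing engine: the field names (`publisher`, `imprint`, `edition`, `published`), the patterns, and the `scrub` helper are all assumptions made up for illustration, and real-world data will have plenty of cases they miss or get wrong (an imprint that legitimately ends in a year, for one).

```python
import re

# Hypothetical pattern: a trailing edition phrase such as "; 2nd edition" or "(Revised ed.)".
EDITION_RE = re.compile(
    r"[\s,;(]+((?:\d+(?:st|nd|rd|th)|first|second|third|revised|reprint)"
    r"\s+(?:edition|ed\.?))\)?\s*$",
    re.IGNORECASE,
)

# Hypothetical pattern: a trailing date such as "(March 3, 2015)", "March 2015" or "2015".
DATE_RE = re.compile(
    r"[\s,;(]+((?:(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+"
    r"(?:\d{1,2},\s*)?)?(?:1[5-9]|20)\d{2})\)?\s*$",
    re.IGNORECASE,
)


def split_trailing(value, pattern):
    """Return (cleaned_value, extracted_suffix); the suffix is None if nothing matched."""
    match = pattern.search(value)
    if not match:
        return value.strip(), None
    return value[: match.start()].strip(), match.group(1).strip()


def scrub(record):
    """Return a cleaned copy of a metadata dict; the input is left untouched."""
    cleaned = dict(record)
    publisher, edition = split_trailing(record.get("publisher", ""), EDITION_RE)
    imprint, date = split_trailing(record.get("imprint", ""), DATE_RE)
    cleaned["publisher"] = publisher
    cleaned["imprint"] = imprint
    # Only fill fields that are empty, so existing data is never overwritten.
    if edition and not cleaned.get("edition"):
        cleaned["edition"] = edition
    if date and not cleaned.get("published"):
        cleaned["published"] = date
    return cleaned


example = {
    "publisher": "Example House; 2nd edition",
    "imprint": "Example Imprint (March 3, 2015)",
}
result = scrub(example)
```

The extracted pieces only fill empty fields, which keeps a human-corrected value from being clobbered by a guess.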
@eaton It’s such a headache with my own reading log, and I view any API-pulled data as a mere starting point that needs human intervention
@markllobrera 100%. I've got things in ... DECENT shape, though not so good that I’d be comfortable doing (say) auto-generation of author-name index pages.
@eaton I settled on a system where the API pulls result in a Markdown file with the book metadata in front matter, and that can get cleaned up
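A rough sketch of what that kind of setup might look like, purely as a guess at the shape: the `to_markdown` helper and the field names are hypothetical, and the API call itself is left out. Values are written with `json.dumps`, which leans on JSON strings and lists also being valid YAML, so no YAML library is needed.

```python
import json


def to_markdown(book, notes=""):
    """Render a metadata dict as Markdown with YAML front matter (hypothetical helper)."""
    lines = ["---"]
    for key, value in book.items():
        # JSON-encoded scalars and lists are valid YAML flow syntax.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    lines.append("")
    lines.append(notes)
    return "\n".join(lines)


# Placeholder data standing in for whatever an API pull returned.
book = {"title": "Example Title", "authors": ["Example Author"], "publisher": "Example House"}
markdown = to_markdown(book, notes="Notes go here.")
```

The resulting file can then be opened and fixed by hand, which is the whole point: the pull is a starting draft, not the record.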
@markllobrera @eaton i’ve had a book blog for 16 years and i just...keyboard that shit in
@aworkinglibrary @eaton Fair! (The only reason I ended up hooking into APIs was because I wanted to build a few CLI tools for coding practice, rather than a true workflow need)
@eaton The dataviz potential here [fans self]
@markllobrera Oh, *you know it*
@eaton Resisting the siren call to add another personal project to the list