I'm looking for tooling (#Bash #Python #OpenRefine etc.) that processes Excel-formatted exports from Web of Science or similar into #DSpace-compatible CSVs.

Processing steps include: subsetting columns, renaming them to match #DublinCore fields, harmonising author names, retrieving missing info via #DOI-APIs etc.
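A minimal Python sketch of the first three steps, not tried-and-tested production code. It assumes the Excel export has already been loaded into a list of dicts (for example via Pandas or OpenRefine). The source column names in `COLUMN_MAP`, the target Dublin Core fields, and the helper names are all illustrative assumptions; the `||` separator for repeated values follows DSpace's batch metadata CSV convention as I understand it, so check it against your instance.

```python
import csv
import io

# Assumed mapping from export headers to Dublin Core fields (adjust to your data).
COLUMN_MAP = {
    "Article Title": "dc.title",
    "Authors": "dc.contributor.author",
    "Publication Year": "dc.date.issued",
    "DOI": "dc.identifier.doi",
}


def harmonise_author(name):
    """Collapse whitespace and normalise 'Family,Given' to 'Family, Given'."""
    name = " ".join(name.split())
    if "," in name:
        family, given = (part.strip() for part in name.split(",", 1))
        return f"{family}, {given}" if given else family
    return name


def to_dspace_row(record):
    """Keep only mapped columns, rename them, and join authors with '||'."""
    row = {}
    for source, target in COLUMN_MAP.items():
        value = str(record.get(source, "") or "").strip()
        if target == "dc.contributor.author":
            # Assumes authors are separated by ';' in the export.
            authors = [harmonise_author(a) for a in value.split(";") if a.strip()]
            value = "||".join(authors)
        row[target] = value
    return row


def to_dspace_csv(records):
    """Serialise the converted records as CSV text with Dublin Core headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(COLUMN_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(to_dspace_row(record))
    return buffer.getvalue()
```

Columns not listed in `COLUMN_MAP` are dropped, which covers the subsetting step. Retrieving missing info via DOI APIs is left out on purpose, since it depends on which service you query.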

Would be a neat programming lesson, but I'm looking for tried-and-tested code for production.

Please boost! 1k thanks for any hints 🙂 #Code4Lib #DataScience #LibraryCarpentry

@katrinleinweber @datenteiler I don't have any experience with the Web of Science Excel format, and maybe this is a stupid question, but I am curious: have you tried Pandas for this job? 🤔

@datenteiler Doing that right now, but as I wrote: "Would be a neat programming lesson, but I'm looking for tried-and-tested code for production."

@katrinleinweber
Cool! This sounds so much like a problem that should already have been solved. It's strange that there isn't anything out there yet. 🤔

Good luck tooling and happy hacking! 😀
