my seq2seq network for predicting phonetic features from character strings is at 99% accuracy on the validation set after 10 epochs and pronounces (e.g.) "fediverse" (not in the training set) almost flawlessly (I'll transcribe the features as "fidiverz") but seems to consistently mess up on interdental fricatives ("theorizing" comes out as "feruhzing," "lathe" comes out as "lat-tee," "this" comes out as "sis")

similar problems with /ʒ/ ("genre" comes out as ?ehnuh where <?> is a consonant described as a "voiced alveolar fricative stop" with a hint of velar thrown in). probably because these sounds combined make up less than 1% of all sounds and might not be present more than a handful of times in the training set. I might have to think about partitioning differently or augmenting the data set to even out the distribution
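
a minimal sketch of one way to even out the distribution: oversample words by the rarity of their rarest sound. everything here is an assumption made for illustration (the toy word list, the ARPAbet-style symbols, the `sampling_weights` helper), not the actual model or data from the posts above

```python
import random
from collections import Counter


def sampling_weights(pronunciations):
    """Return a sampling probability per word, favouring words with rare phonemes.

    `pronunciations` maps word -> non-empty list of phoneme symbols
    (a hypothetical toy format, not the real training data).
    """
    # Count how often each phoneme occurs across the whole data set.
    counts = Counter(p for phones in pronunciations.values() for p in phones)
    total = sum(counts.values())

    weights = {}
    for word, phones in pronunciations.items():
        # Relative frequency of the rarest phoneme in this word.
        rarest = min(counts[p] / total for p in phones)
        # Inverse frequency: the rarer the phoneme, the larger the weight.
        weights[word] = 1.0 / rarest

    # Normalise so the weights sum to 1 and can be read as probabilities.
    norm = sum(weights.values())
    return {word: w / norm for word, w in weights.items()}


# Toy data: "this" is the only word containing the rare phoneme DH.
toy = {
    "this": ["DH", "IH", "S"],
    "sis": ["S", "IH", "S"],
    "sit": ["S", "IH", "T"],
    "tis": ["T", "IH", "S"],
}
weights = sampling_weights(toy)

# Draw a resampled training batch in which rare-phoneme words show up more often.
batch = random.choices(list(weights), weights=list(weights.values()), k=8)
```

weighting by the single rarest sound is a blunt choice; weighting the loss per output symbol instead of resampling whole words would be another option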

@aparrish I tried to make that noise but it was a challenge.
