my seq2seq network for predicting phonetic features from character strings is at 99% accuracy on the validation set after 10 epochs and pronounces (e.g.) "fediverse" (not in training set) almost flawlessly (I'll transcribe the features as "fidiverz") but seems to consistently mess up on interdental fricatives ("theorizing" comes out as "feruhzing," "lathe" comes out as "lat-tee," "this" comes out as "sis")


similar problems with /ʒ/ ("genre" comes out as ?ehnuh where <?> is a consonant described as a "voiced alveolar fricative stop" with a hint of velar thrown in). probably because these sounds combined are less than 1% of all sounds and might not be present more than a handful of times in the training set. I might have to think about partitioning differently or augmenting the data set to even out the distribution
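
a rough sketch of what "evening out the distribution" could look like, assuming the training data is a list of (word, phoneme list) pairs. everything here (`phoneme_counts`, `example_weights`, `oversample`, the toy data) is made up for illustration and is not the actual pipeline behind the network. it counts how often each phoneme occurs, weights each word by the inverse frequency of its rarest phoneme, and resamples with replacement so words containing rare sounds show up more often:

```python
from collections import Counter
import random

# Hypothetical toy data: (spelling, phoneme sequence) pairs.
TOY = [
    ("this", ["ð", "ɪ", "s"]),
    ("sit", ["s", "ɪ", "t"]),
    ("tip", ["t", "ɪ", "p"]),
    ("pits", ["p", "ɪ", "t", "s"]),
]


def phoneme_counts(dataset):
    """Count how many times each phoneme occurs across the whole dataset."""
    counts = Counter()
    for _word, phonemes in dataset:
        counts.update(phonemes)
    return counts


def example_weights(dataset, counts):
    """Weight each example by the inverse frequency of its rarest phoneme.

    Assumes every example has at least one phoneme.
    """
    total = sum(counts.values())
    weights = []
    for _word, phonemes in dataset:
        rarest = min(counts[p] for p in phonemes)
        weights.append(total / rarest)
    return weights


def oversample(dataset, weights, k, seed=0):
    """Draw k examples with replacement, favoring higher-weighted examples."""
    rng = random.Random(seed)
    return rng.choices(dataset, weights=weights, k=k)


counts = phoneme_counts(TOY)
weights = example_weights(TOY, counts)
resampled = oversample(TOY, weights, k=1000)
```

in this toy set "this" gets the largest weight because /ð/ occurs only once. resampling only repeats existing examples, so it can't help with sounds that never appear in the training set at all; those would need new data.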

@aparrish I tried to make that noise but it was a challenge.
