Chris (now @chris@toot.works) is a user on mastodon.social.
Chris (now @chris@toot.works) @cs

Hi everyone! I made a translator bot that translates statuses it's mentioned in.

To use it, mention @translator (translator@toot.works) in your status, followed by the two-letter code for the language you want to translate to (e.g. "ja" for Japanese; full list of possible codes here: christopher.su/projects/transl). It uses English if no language is given.
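A minimal sketch of how such a mention could be parsed, assuming the status has already been reduced to plain text. The `SUPPORTED` set and the `parse_request` function are hypothetical; the bot's real list of codes is the one linked above.

```python
import re

# Hypothetical subset of supported codes; the bot's real list is linked from its bio.
SUPPORTED = {"en", "ja", "fr", "es", "nl", "de", "to", "yua", "zh-CHS"}

# Matches "@translator" or "@translator@toot.works" as a whole word.
MENTION = re.compile(r"@translator(?:@toot\.works)?\b")


def parse_request(text: str, default: str = "en") -> tuple[str, str]:
    """Return (target_language, text_to_translate) for a plain-text status."""
    match = MENTION.search(text)
    if match is None:
        raise ValueError("status does not mention the bot")
    before, after = text[:match.start()], text[match.end():]
    words = after.split(maxsplit=1)
    if words and words[0] in SUPPORTED:
        # The word right after the mention is a known language code: consume it.
        lang, rest = words[0], (words[1] if len(words) > 1 else "")
    else:
        # No (known) code: default to English and keep that word as text.
        lang, rest = default, after.strip()
    return lang, f"{before.strip()} {rest}".strip()
```

Under this sketch an unrecognised code such as "jp" is treated as ordinary text, and the request falls back to English.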

Give it a try and toot me your suggestions and comments!

@translator ja
#MastoDev #Mastodon #Bots #Languages


@jeroensmeets [en] Good idea, @cs@mastodon.social! Which API do you use?

@cs [nl] @jeroensmeets I'm using the Microsoft Translation API!

@cs @translator @jeroensmeets Minor #privacy drawback is that all toots will be run through and stored by trackers. 😟

@metbril @jeroensmeets It is a paid API, so it might be slightly better. I haven't really looked too far into how source input can be used.

@cs Never mind. It's my personal choice to use it, or not. Keep up the good work.

@jeroensmeets

@metbril @cs for my app I'm also looking at translation, built a first version on Google's API. Wasn't too impressed, this looks promising.

To cover costs, I'm thinking about making it an in-app purchase, so that would handle the choice to use it.

@jeroensmeets @metbril Built-in translation in a client would be pretty awesome. I was thinking you might be able to build some kind of caching server to avoid having to perform the same translations more than once, thus saving you some cost, but then you'd need server-side code for that.
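The caching idea can be sketched as a memoising wrapper around whatever function calls the paid API. This is only an in-process dictionary, not the shared caching server described above (which would need server-side storage); `CachedTranslator` and the injected `translate` callable are hypothetical names.

```python
from typing import Callable, Dict, Tuple


class CachedTranslator:
    """Memoise translations so each (text, target language) pair is paid for once."""

    def __init__(self, translate: Callable[[str, str], str]) -> None:
        self._translate = translate  # hypothetical call to the paid translation API
        self._cache: Dict[Tuple[str, str], str] = {}
        self.api_calls = 0  # how many times the underlying API was actually hit

    def __call__(self, text: str, target: str) -> str:
        key = (text, target)
        if key not in self._cache:
            self.api_calls += 1
            self._cache[key] = self._translate(text, target)
        return self._cache[key]


if __name__ == "__main__":
    cached = CachedTranslator(lambda text, target: f"[{target}] {text}")
    cached("Hello", "ja")
    cached("Hello", "ja")  # served from the cache
    print(cached.api_calls)  # 1
```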

@cs @metbril true. For now I just added a button to translate a toot with a cache or switch back to the original content.

All the HTML in the content is a pain in the butt, especially because my platform (Fusetools) doesn't have a rich-text component yet.

@jeroensmeets @metbril Parsing from the HTML status content returned by the API was a huge pain for this bot as well. I wish the API would just include some plain text version of statuses.

@cs @metbril Yes, I had to write a small library to parse the content into clean text: an array of clean paragraphs, and an array of words with extra data on how to handle their appearance and clicks.

11t now displays the timeline as paragraphs of clean text, and when you click a toot, you get a mock richtext view of the toot where usernames, links and hashtags are bold and clickable. It's far from perfect.
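A minimal sketch of the HTML-to-plain-text step described in this exchange, using only Python's standard `html.parser`. It assumes the status HTML uses `<p>` for paragraphs and `<br>` for line breaks, as Mastodon's `content` field does; it keeps the visible text of mentions, hashtags and links but drops their markup, so it does not build the per-word metadata described for 11t. `status_paragraphs` is a hypothetical name.

```python
from html.parser import HTMLParser


class StatusText(HTMLParser):
    """Collect plain-text paragraphs from status HTML."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.paragraphs: list[str] = []
        self._current: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self._current.append("\n")  # keep line breaks inside a paragraph

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        self._current.append(data)  # text of mentions, hashtags and link spans alike

    def _flush(self):
        text = "".join(self._current).strip()
        if text:
            self.paragraphs.append(text)
        self._current = []

    def close(self):
        super().close()
        self._flush()  # text that was not wrapped in <p>


def status_paragraphs(html: str) -> list[str]:
    parser = StatusText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Because every text node is kept, a shortened link split across several `<span>` elements comes back as the full URL.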

@jeroensmeets @cs Guys. Now that you're discussing app development I don't mind being out of the loop. :wink:

@cs I just favourited a post by @translator to make it feel good about itself.

@jeroensmeets [to] @cs I just favourited a post it made, so that it will feel good about itself.

@Elizafox @cs

You think that's something? He also built this:

@PongBot ping

@adam [ja] @cs Does @translator work across instances?

@cs Ah. The fourth account :P

I assume it won't translate toots that are forwarded to it (i.e. someone replies to a toot and tags the bot), correct?

@jbbdude It does right now, which is kind of problematic... next item to fix is not tooting when the source language and translated language are the same.

@cs Wait, so if I reply to a foreign language toot (say, in Japanese) with "@translator en" it would translate the previous toot? That's amazing!

@cs that's awesome. Thanks for making a bot I don't hate

@cs imagine being able to send it a permalink to a toot and have it parse the toot contents

@mig5 That's a good idea! Another suggestion has been to have some command that requests a translation of the parent toot (e.g. I see your toot in another language, I reply with a translator command, it replies with a translation of your toot).

@cs Would there be any chance of extending that to translate the /parent/ post into some specified language?

Say:

@translator parent fr

@dredmorbius Yeah, this has been the most requested feature now. I didn't add it at first because I didn't want to spam the authors of parent toots with notifications, but I suppose it wouldn't be too bad. Considering adding it! I would probably have to make it respect #nobot.
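A sketch of the decision logic for the proposed `@translator parent fr` command, including a #nobot opt-out check. It assumes both statuses and the parent author's profile bio are already available as plain text; the `Toot` type and `pick_status` function are hypothetical, and reading a "#nobot" tag in the bio as an opt-out is the convention mentioned above, not something Mastodon itself enforces.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "@translator parent <code>" asks for the replied-to toot instead of this one.
PARENT_COMMAND = re.compile(r"@translator(?:@toot\.works)?\s+parent\b", re.IGNORECASE)
# A "#nobot" hashtag in a profile bio is read as an opt-out from bot interaction.
NOBOT = re.compile(r"#nobot\b", re.IGNORECASE)


@dataclass
class Toot:
    text: str        # plain text of the status
    author_bio: str  # plain text of its author's profile note


def pick_status(request: Toot, parent: Optional[Toot]) -> Optional[Toot]:
    """Return the toot the bot should translate, or None to stay silent."""
    if not PARENT_COMMAND.search(request.text):
        return request  # ordinary request: translate the mentioning toot itself
    if parent is None:
        return None     # the parent isn't available to the bot
    if NOBOT.search(parent.author_bio):
        return None     # the parent's author has opted out of bots
    return parent
```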

@cs You could have translate bot /not/ mention the author of a toot.

@dredmorbius Would do that for sure. I wonder if Mastodon might still send a notification to the author though. It might look at the parent toot id, check that toot, and then send a notification to its author. Need to manually test a reply toot that doesn't mention the author and see if it still notifies.

@cs My understanding is that Notifs occur /only/ if the user is directly mentioned, or if a toot is favourited or boosted.

Otherwise, a post is added to a thread, but there's no notification generated.

@dredmorbius Actually, another problem with this: the toot.works instance is essentially a single-user (+bots) instance. If the parent toot isn't federated to the instance, I'm not sure if the bot can fetch the parent toot's content to translate it. Any idea on how to fix it?

When browsing my timeline on toot.works, I frequently click on toots that are replies and I can't see the parent toot :/

@chris You might need to work around that. I'm not sure what the bot API looks like but:

1. Open the requestor's local instance of the request (e.g., <instance>/@<requestor>/<post-ID>).
2. Look at the parent post, which should be the original.
3. Grab its timeline URL.
4. Search that on the bot instance. This adds the post to that instance.

I don't know that there's a streamlined way of doing this, but it would bring the parent post into the bot's instance, if you can make it happen.
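Step 4 could look roughly like this: an authenticated search on the bot's instance with `resolve=true`, which asks the server to fetch a post it doesn't know yet. This assumes a Mastodon version with `GET /api/v2/search` (older versions exposed `/api/v1/search` instead); `search_request`, `import_post` and the token handling are illustrative, not the bot's actual code.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def search_request(instance: str, post_url: str, token: str) -> urllib.request.Request:
    """Build a search that asks `instance` to resolve (fetch) a remote post by URL."""
    query = urllib.parse.urlencode({"q": post_url, "resolve": "true", "type": "statuses"})
    return urllib.request.Request(
        f"https://{instance}/api/v2/search?{query}",
        # Resolving remote posts needs an authenticated account.
        headers={"Authorization": f"Bearer {token}"},
    )


def import_post(instance: str, post_url: str, token: str) -> Optional[dict]:
    """Return the post as the bot's instance now knows it, or None if not found."""
    with urllib.request.urlopen(search_request(instance, post_url, token)) as resp:
        results = json.load(resp)
    statuses = results.get("statuses", [])
    return statuses[0] if statuses else None
```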

@dredmorbius Ah, using the instance search through the API to pull in the post to the instance is brilliant. I hadn't thought of that.

Another approach that I thought of that could work: the "Get a status" API endpoint was changed to not require authentication in one of the more recent versions, so I could try calling that API directly on the other instance. Would only work if the other instance was on a new enough version that the API didn't require auth though.
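The direct approach could be sketched as follows: read the host and ID from the requesting toot's URL, fetch it from its home instance without a token, then fetch whatever it replies to from the same server (status IDs are local to each instance). It assumes Mastodon-style URLs of the form `https://<instance>/@<user>/<id>` and a home instance recent enough to serve public statuses without authentication; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional, Tuple


def split_status_url(url: str) -> Tuple[str, str]:
    """Split 'https://host/@user/123' into ('host', '123') (Mastodon-style URLs only)."""
    parts = urllib.parse.urlsplit(url)
    return parts.netloc, parts.path.rstrip("/").rsplit("/", 1)[-1]


def get_status(instance: str, status_id: str) -> dict:
    """Unauthenticated GET /api/v1/statuses/:id on the given instance."""
    with urllib.request.urlopen(f"https://{instance}/api/v1/statuses/{status_id}") as resp:
        return json.load(resp)


def fetch_parent(request_url: str,
                 fetch: Callable[[str, str], dict] = get_status) -> Optional[dict]:
    """Return the toot that the requesting toot replies to, or None."""
    instance, status_id = split_status_url(request_url)
    request = fetch(instance, status_id)
    parent_id = request.get("in_reply_to_id")
    if parent_id is None:
        return None  # the request isn't a reply
    # in_reply_to_id is an ID on the same instance, so ask that server again.
    return fetch(instance, parent_id)
```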

@webmind Yea, here, it didn't properly translate to Japanese because the mention on my toot was actually @translator on my local instance rather than @translator on the toot.works instance. I was tooting from the .social instance there.

@translator ja

@cs [ja] @webmind Also, it's just kind of buggy :p

@Tryphon [fr] A new translator bot? That's great!

@cs That's awesome! Is it based on the Google API (or similar), or your own implementation?

@Tryphon It's using a Microsoft Translation API, so similar to that :)

.@cs Are you making a local server-side implementation using something like Moses, Phrasal or Joshua next? Not to be greedy 😂

@cs By the way, I noticed that RTL text isn't displayed quite right in Mastodon.
For example, these symbols (?!.) should be on the left side of the sentence.

@BoF Nice catch! That might be a problem in Mastodon itself. You might want to open an issue on GitHub with some screenshots describing the problem.

@cs Yep, we are waiting for Mastodon (ar) strings to be merged (already submitted) then we will correct the RTL issue :)

@cs Nice work! I had lots of fun with it... 😂

@cs I think this should be integrated into the UI, honestly. I don't want to toot something to get a translation and I *really* love "translate this"-buttons in other webapps :)

@Merovius You can use a translation browser extension for that! Problem with integrating it in the UI and translating lots of toots is programmatic translation is not cheap :(

@cs Saw at least one user request that the translator post using unlisted toots, so that the Federated (and Local) Timeline(s) don't get flooded by translations...

@cs Crap. Meant to include "but not sure if you saw that already" at the end of that toot.

Well, count this as a +1 on that request, then. Seems like a legit request to me.

@sendoshin Whoops, I should fix that. Hadn't seen that yet, thanks for bringing it to my attention! I'll fix it when I wake up in the morning :) (3 am here now)

@sendoshin This was keeping me up, so I just pushed the change 😛 Should be 'unlisted' now.
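For reference, a reply like this can be posted through `POST /api/v1/statuses`, whose form fields include `status`, `in_reply_to_id` and `visibility`; setting `visibility` to `unlisted` keeps the reply off the public timelines. A minimal sketch that only builds the request (the instance, token and `reply_request` helper are placeholders, not the bot's code):

```python
import urllib.parse
import urllib.request


def reply_request(instance: str, token: str, text: str, in_reply_to_id: str,
                  visibility: str = "unlisted") -> urllib.request.Request:
    """Build (but don't send) a POST that replies to a status with the given visibility."""
    form = urllib.parse.urlencode({
        "status": text,
        "in_reply_to_id": in_reply_to_id,  # threads the translation under the original toot
        "visibility": visibility,          # "unlisted": not shown on Local/Federated timelines
    }).encode("utf-8")
    return urllib.request.Request(
        f"https://{instance}/api/v1/statuses",
        data=form,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)`.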

@cs @translator translation would be neat but I don't want to mention toots to find out what they say (if I understand how it works).

Any way to integrate this differently? E.g., as a translate tab, so toots can have a "translate to xx" tab that I can enable/disable in settings, and which I could permanently enable for certain people I follow.

@translator @cs I got my first Japanese follower today but there's no point me following back because I can't read any of their toots ☹

@translator @markhughes There are some browser extensions that will translate highlighted text for you (e.g. the Google Translate browser extension). You could use that to read translated toots and then use this to reply back quickly :)

@cs @translator @markhughes Anything specifically for Firefox? Thank you for all your time and effort, Chris!!

@cs @translator thanks but I'm a Googlephobe because of how they track people and use that data in harmful ways.

@markhughes @translator Fair enough! This is also why it would be difficult to add translation to Mastodon itself or to clients, though. Using a 3rd-party service raises privacy concerns, and not using one would probably require GBs of trained translation models.

@cs @translator I agree it's a tricky issue but some level of privacy could be introduced (translating text without exposing the URL for example). Using a browser plugin is a much bigger privacy concern I think.

@markhughes @translator Ah, I see what you mean. Agreed, a browser extension that's reading text off pages you visit/text you highlight is definitely much greater cause for concern.

@Murkrow [ja] Good morning, translation robot!

@translator en どみありがと! ("Thanks a lot!")

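@translator Je teste ce traducteur, songeant à la formulation la plus complexe possible, pour ainsi, peut-être, le mettre en porte à faux. (I'm testing this translator, thinking up the most complex wording possible, in order to, perhaps, trip it up.)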

@Plumy [en] I'm testing this translator, thinking of the possible the more complex formulation, to, perhaps, put in door to false.

@cs Very cool! Awesome work! This can be very handy

@cs @translator Hi Chris, which translation service do you use? This isn't described on your project site.
Thanks
Wolfgang

@translator @vilbi Ya, it's using the Microsoft Translation API. Going to update the project FAQ later to include this info

@cs Wondering: how hard would it be to ask the translator bot to translate a toot?
Like, hit reply, mention @ translator and: [previous] [ja]
Or @ translator with the URL of the toot
Both could be useful, just wondering :)

@hugo It wouldn't be too hard; and that's a highly requested feature, so I'm considering it!

@cs @translator
Quite useful and impressive, but I have a question: what is the underlying translation service?

@translator @jd_d It's using a (paid) Microsoft Translation API. I'm looking into other possible options now as well

@cs @jd_d How much did you have to pay? Do you have a Patreon page?

@jd_d @theresatagonmyshirt First two million characters are free and then it's $10 per million characters. No Patreon for now, but I'll make one if the costs become high
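Using the figures quoted above (first 2 million characters free, then $10 per million), the cost works out as below. The function name is ours, and it ignores any other terms of the actual price plan, such as how often the free allowance resets.

```python
FREE_CHARACTERS = 2_000_000
USD_PER_MILLION = 10.0  # price per million characters after the free allowance


def translation_cost(characters: int) -> float:
    """Dollar cost of translating `characters` characters under the quoted pricing."""
    billable = max(0, characters - FREE_CHARACTERS)
    return billable / 1_000_000 * USD_PER_MILLION


print(translation_cost(1_500_000))  # 0.0: still inside the free allowance
print(translation_cost(3_500_000))  # 15.0: 1.5 million billable characters
```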

@cs @translator fr
Thanks for your useful work. Anyway, which API is used for the translation?

@yvesago [fr] @cs Thanks for your useful work. Anyway, which API is used for the translation?

@yvesago @translator I'm using the (paid) Microsoft Translation API.

@cs @translator
Re: " I'm using the (paid) Microsoft Translation API"
Thanks for the reply. Maybe you could add this in the status of translator. This could be important for some users.

@cs @translator From french to spanish, "Jean" was translated by "John" (should have been "Juan" or, probably better, not translated)

@cs @translator When the toot is already in the target language, it is reproduced as is, *except* if the target language is english, where the toot is ignored.

@translator @bortzmeyer That's intentional. What do you think it should do?

@cs Why the difference between English and the other target languages? @translator yua

@bortzmeyer [yua] @cs Why the difference between English and the other target languages?

@translator @bortzmeyer Figured that was the most likely case as mentioning @translator without any code defaults it to English. I wanted to reduce the number of unwanted translations without reducing the number of wanted translations.

Implementation-wise, it takes another API call to detect the source language before translating
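The behaviour described here (stay silent only when no code is given and the toot is already in English, at the price of one extra detection call) could be sketched like this; `detect` and `translate` stand in for the translation service's detection and translation calls and are hypothetical.

```python
from typing import Callable, Optional


def translation_reply(text: str, requested: Optional[str],
                      detect: Callable[[str], str],
                      translate: Callable[[str, str], str]) -> Optional[str]:
    """Return the translated text to toot, or None if the bot should not reply."""
    if requested is None:
        # No code given: the target defaults to English, so spend one extra
        # API call on detection and skip toots that are already English.
        if detect(text) == "en":
            return None
        requested = "en"
    # An explicit code is always honoured, even if it matches the source language.
    return translate(text, requested)
```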

@cs @translator If there are *two* occurrences of "translator@toot.works <language>", the first one is silently ignored.

@cs [ja] @translator @Berthavlix You need to give it some text to translate.

@translator @cs darn, didn't realize Esperanto wasn't supported :<

@justyn @translator There's a link to the full list of supported languages in the bot's bio. Also, it just translates the toot that it's mentioned in, so you need to give it some text other than the language code as well.

@translator es

@cs [es] @justyn @translator There's a link to the full list of supported languages in the bot's bio. Also, it only translates the toot it's mentioned in, so you need to give it some text other than the language code.

@translator ja @cs Why is the translated toot not visible as an answer to the original toot? Do you have a Patreon crowdfunding account for what you pay Microsoft for their translation API?

@xdej [ja] @cs Why is the translated toot displayed as an answer to the original toot? Do you have a Patreon crowdfunding account for what you pay Microsoft for their translation API?

@xdej @translator The translated toot should be in direct reply to the original toot. I see the reply from @translator as a child to your status (at least on my instance).

Don't have a Patreon yet. The API does have a "free tier" of 2 million characters (though at this rate, I think we'll be reaching it soon). I might start a Patreon once it starts to become costly :)

@cs [ja] @xdej @translator They are posted, unlisted.

@cs this is now the only translation of one of your toots in that thread, according to Tusky on my phone.

@xdej I think that should be the only translation in the thread, as it is the only translation requested in the thread. If you mention @translator without specifying a language code after the mention, and your toot is already in English, it will not respond. If you do give a language, it will reply with the translation in that language: @translator zh-CHS

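@cs [zh-CHS] I think that should be the only translation in the thread, since it is the only translation @xdej requested in the thread. If you mention @translator without specifying a language code after it, and your toot is already in English, it will not respond. If you give a language, it will reply with a translation in that language: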

@translator jp @cs does it have an option to add furiganas over the kanjis?

@xdej @translator Don't think so :/ I might be able to add it in after the translation, if there's a decent library for it.

taku910.github.io/mecab; ruby annotations as used in the adware easypronunciation.com/en/japanese-kanji-to-romaji-converter @cs
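If a morphological analyser such as MeCab has already split the translated Japanese into (surface form, katakana reading) pairs, adding furigana in plain text could be as simple as the sketch below. The token format, the parenthesised output (ruby markup isn't available in a plain-text toot) and the function names are assumptions; no MeCab call is shown.

```python
import re

# CJK Unified Ideographs block: enough to spot tokens that contain kanji.
KANJI = re.compile(r"[\u4e00-\u9fff]")


def katakana_to_hiragana(reading: str) -> str:
    """Shift katakana (U+30A1..U+30F6) down to the matching hiragana code points."""
    return "".join(chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
                   for ch in reading)


def add_furigana(tokens: list[tuple[str, str]]) -> str:
    """Append a hiragana reading in parentheses after every token containing kanji."""
    out = []
    for surface, reading in tokens:
        if KANJI.search(surface) and reading:
            out.append(f"{surface}({katakana_to_hiragana(reading)})")
        else:
            out.append(surface)
    return "".join(out)


print(add_furigana([("翻訳", "ホンヤク"), ("する", "スル")]))  # 翻訳(ほんやく)する
```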

Thank you so much for doing this, Chris. I suspect that it will allow many new friendships.
@cs

@cs @translator To be clear, this only works for your posts. You cannot reply to a post by tagging @translator and expect to get a translation back. ja

@ChristianD [to] To be clear, this only works for your posts, @cs. You can't reply to a post by tagging @translator and expect to get a translation back. ja

@ChristianD @cs @translator Yes, for now, it only translates the actual toot that it is mentioned in. I'm adding in translation for parent toots soon (so you can mention the translator in a reply and get a translation of the original toot). I didn't add it in initially because the translator uses a paid translation API and figured such a feature could see some heavy usage and get expensive.