When opening UTF-8 files, do any of the abstractions in Rust std::io silently remove the UTF-8 BOM (where present)?

The question I am really asking is "Do I need to check for and skip the BOM?"

EDIT: The answer is "Rust does not have this, you do it yourself"

@mcc Checking now. This kind of question is why I keep a checkout of the Rust repo handy.

@matt I'm credibly told no.

@mcc Yeah, you'll have to check for and remove the BOM yourself if you wish. Rust's support for dealing with character encodings when reading a file is rudimentary when compared to, say, Python, Java, or .NET (I assume .NET has some support for detecting and removing the UTF-* BOM, since AFAIK Microsoft originated that convention; I don't know about the other two languages). If you read file contents into a string, the Rust standard library will ensure it's valid UTF-8, and that's it.
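For anyone landing here later, a minimal sketch of the "do it yourself" approach, assuming the file is small enough to read whole. The helper names `strip_bom` and `read_to_string_no_bom` are made up for illustration and are not part of the standard library; only `fs::read_to_string`, `str::strip_prefix`, `str::starts_with`, and `String::drain` are real std APIs.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return `s` without a leading BOM (U+FEFF), if one is present.
/// Hypothetical helper, not a std function.
fn strip_bom(s: &str) -> &str {
    s.strip_prefix('\u{FEFF}').unwrap_or(s)
}

/// Read a whole file as UTF-8 and drop a leading BOM in place.
/// `fs::read_to_string` validates UTF-8 but leaves the BOM in the string,
/// where it shows up as the first `char`.
/// Hypothetical helper, not a std function.
fn read_to_string_no_bom(path: impl AsRef<Path>) -> io::Result<String> {
    let mut text = fs::read_to_string(path)?;
    if text.starts_with('\u{FEFF}') {
        // U+FEFF encodes to three bytes (EF BB BF) in UTF-8.
        text.drain(..'\u{FEFF}'.len_utf8());
    }
    Ok(text)
}
```

`strip_bom` borrows and is handy when you already hold the string; the second helper avoids a second allocation by draining the three BOM bytes from the owned `String`.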

Rachel Greenham

@matt @mcc IIRC Java does for UTF-16(+?), not UTF-8. I needed to roll my own Reader impl wrapped around a PushbackInputStream to more comprehensively make use of (i.e., not just discard) the BOM if present.
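A rough Rust analogue of "make use of the BOM rather than just discard it" might look like the sketch below. This is not the Java code described above; `Bom`, `detect_bom`, `decode`, and `decode_utf16` are hypothetical names, and the sketch assumes only UTF-8 and UTF-16 matter (a UTF-32 LE BOM starts with the same two bytes as UTF-16 LE and is not distinguished here). Input without a BOM is assumed to be UTF-8.

```rust
/// Encodings this sketch can recognize from a BOM (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Inspect the first bytes and report which BOM is present and its length.
fn detect_bom(bytes: &[u8]) -> Option<(Bom, usize)> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some((Bom::Utf8, 3))
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some((Bom::Utf16Le, 2))
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some((Bom::Utf16Be, 2))
    } else {
        None
    }
}

/// Decode UTF-16 code units using the given byte-order conversion.
fn decode_utf16(body: &[u8], to_u16: fn([u8; 2]) -> u16) -> Result<String, String> {
    if body.len() % 2 != 0 {
        return Err("odd number of bytes in UTF-16 input".to_string());
    }
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| to_u16([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).map_err(|e| e.to_string())
}

/// Use the BOM to pick a decoder, then decode the bytes that follow it.
/// Assumes UTF-8 when no BOM is found.
fn decode(bytes: &[u8]) -> Result<String, String> {
    let (kind, skip) = match detect_bom(bytes) {
        Some((bom, len)) => (bom, len),
        None => (Bom::Utf8, 0),
    };
    let body = &bytes[skip..];
    match kind {
        Bom::Utf8 => std::str::from_utf8(body)
            .map(|s| s.to_owned())
            .map_err(|e| e.to_string()),
        Bom::Utf16Le => decode_utf16(body, u16::from_le_bytes),
        Bom::Utf16Be => decode_utf16(body, u16::from_be_bytes),
    }
}
```

For anything beyond a quick check, a dedicated encoding crate is probably the better choice; this only shows the shape of the idea using std.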