TIL that modern CPUs have an instruction for polynomial multiplication over $\mathbb{F}_2$ (carry-less multiplication): https://en.wikipedia.org/wiki/CLMUL_instruction_set
@j2kun Yeah, it's surprisingly useful. Aside from the classic "algebraic" use cases, there are some handy bit tricks, like computing the running bit parity (a prefix XOR) by carry-less multiplying by all ones (-1).
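To make the running-parity trick concrete, here is a minimal software sketch in Python. The names `clmul` and `prefix_parity` are made up for illustration, and the loop only emulates what a hardware carry-less multiply does in one instruction.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply a and b as polynomials over F_2."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def prefix_parity(x: int, width: int = 64) -> int:
    """Bit i of the result is the XOR of bits 0..i of x (inclusive)."""
    mask = (1 << width) - 1            # all ones, i.e. -1 in `width` bits
    return clmul(x & mask, mask) & mask  # keep only the low `width` bits


# Bits 2 and 5 are set, so the parity is odd exactly on bits 2..4.
print(bin(prefix_parity(0b00100100, 8)))  # 0b11100
```

It works because bit i of the product is the XOR of all `x[j] & ones[i - j]`, and with every bit of the second operand set that is just the XOR of `x[0..i]`.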
@j2kun For example, if you mark the start and end of each range with a 1 bit, then the running parity is a bit mask that selects the bits in those ranges. You can even use this to compute rasterization coverage masks for potentially overlapping polygons, where overlaps are resolved with the "mod 2" (even-odd) rule.
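A rough Python sketch of the range-mask idea, assuming half-open ranges `[start, end)` and XOR-toggled markers so that coinciding endpoints cancel. `range_mask` is a hypothetical helper, and `clmul` is a software stand-in for the hardware instruction.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply a and b as polynomials over F_2."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def range_mask(ranges, width: int = 64) -> int:
    """Mask of bits covered an odd number of times by half-open ranges."""
    ones = (1 << width) - 1
    markers = 0
    for start, end in ranges:
        markers ^= 1 << start  # toggle parity on at the start
        markers ^= 1 << end    # toggle it back off at the end
    # The running parity of the markers fills the bits between each pair.
    return clmul(markers & ones, ones) & ones


print(bin(range_mask([(1, 3), (5, 7)], 8)))  # 0b1100110
# Overlap [2, 4) is covered twice, so the even-odd rule clears it.
print(bin(range_mask([(0, 6), (2, 4)], 8)))  # 0b110011
```

Toggling with XOR rather than OR matters when two ranges share an endpoint: the two markers then cancel, which is what the even-odd rule wants.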
@j2kun And here's a fun application to parsing quoted strings: https://github.com/simdjson/simdjson/blob/cab383e1de7385c6460b66e5fad25a116d750402/src/generic/stage1/json_string_scanner.h#L67
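The quoted-string idea can be sketched in a few lines of Python. This is a simplification, not the linked code: it ignores backslash-escaped quotes and any carry between blocks, both of which a real parser has to deal with. `in_string_mask` and `clmul` are hypothetical names for this illustration.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply a and b as polynomials over F_2."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def in_string_mask(text: str) -> int:
    """Bit i is set if text[i] is an opening quote or lies inside a string.

    Assumes no escaped quotes, which is a deliberate simplification.
    """
    width = len(text)
    ones = (1 << width) - 1
    quotes = 0
    for i, ch in enumerate(text):
        if ch == '"':
            quotes |= 1 << i  # one marker bit per quote character
    # Running parity flips at every quote: odd parity means "inside".
    return clmul(quotes, ones) & ones


text = 'a "bc" d'
mask = in_string_mask(text)
print("".join(ch for i, ch in enumerate(text) if (mask >> i) & 1))  # "bc
```

The mask covers each opening quote and the string body but not the closing quote, since the parity returns to even exactly at the closing quote.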
@pervognsen @j2kun speeding up multi-block CRC32C is another example: