People have been posting glaring examples of ChatGPT’s gender bias, like arguing that attorneys can't be pregnant. So @sayashk and I tested ChatGPT on WinoBias, a standard gender bias benchmark. Both GPT-3.5 and GPT-4 are about 3 times as likely to answer incorrectly if the correct answer defies gender stereotypes — despite the benchmark dataset likely being included in the training data. https://aisnakeoil.substack.com/p/quantifying-chatgpts-gender-bias
OpenAI mitigates ChatGPT’s biases using fine tuning and reinforcement learning. These methods affect only the model’s output, not its implicit biases (the stereotyped correlations that it's learned). Since implicit biases can manifest in countless ways, OpenAI is left playing whack-a-mole, reacting to examples posted on social media.
This is the latest in the AI Snake Oil book blog by @sayashk and me. Writing this blog alongside the book has been really fun. I'll probably do something like this for all future books! Thank you to everyone who subscribed. https://aisnakeoil.substack.com/
@randomwalker @sayashk Great blog! "OpenAI mitigates biases using reinforcement learning and instruction fine-tuning. But these methods can only correct the model’s explicit biases, that is, what it actually outputs."
@randomwalker @sayashk Thanks for putting this collection out there, a large part of my work has been battling the FOMO created within public institutions about not using the newest hype-baby. Whether it was big-data, blockchains, crypto or AI, the hardest part is getting folks to acknowledge limitations and strengths so that they can ask the right questions about the tool/medium, and then harness it appropriately.
@randomwalker As an uninformed AI plebe I got kind of stuck on the sentence "*reinforcement learning [..] affect only the model’s output, not its implicit biases*". That... really sounds like a sentence that also applies to humans? We measure humans by their output, so, should we do the same for AI?
**Philosophically speaking, if a biased AI generates unbiased output, is it really biased?**