Thank you for linking this paper.
I share that peculiar mix of feelings, though.
"It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model."
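The attack the abstract describes can be illustrated with a toy sketch: candidates that the model scores as unusually likely are good guesses for memorized training data. Everything below is illustrative, not the paper's actual method — a character bigram model stands in for the LLM, and `PRIVATE_DATA`, `log_likelihood`, etc. are made-up names.

```python
import math
from collections import defaultdict

# Toy stand-in for a "large language model": a character bigram model
# trained on a tiny "private" dataset. All names here are illustrative.
PRIVATE_DATA = ["alice's ssn is 123-45-6789", "the launch code is 0000"]

def train_bigram(texts):
    counts = defaultdict(lambda: defaultdict(int))
    for t in texts:
        # "^" and "$" mark start and end of each string
        for a, b in zip("^" + t, t + "$"):
            counts[a][b] += 1
    return counts

def log_likelihood(counts, text, alpha=0.01, vocab=128):
    # add-alpha smoothed per-character average log-likelihood
    ll = 0.0
    for a, b in zip("^" + text, text + "$"):
        total = sum(counts[a].values())
        ll += math.log((counts[a][b] + alpha) / (total + alpha * vocab))
    return ll / (len(text) + 1)

model = train_bigram(PRIVATE_DATA)

# Core of the extraction/membership idea: a string from the training set
# gets a much higher likelihood than a plausible string the model never saw.
member = "alice's ssn is 123-45-6789"
nonmember = "bob's phone number is 555-0100"
print(log_likelihood(model, member) > log_likelihood(model, nonmember))  # True
```

The real attack in the paper works on the same principle at scale: generate many samples from the model, then rank them by (calibrated) likelihood to surface verbatim memorized examples.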
@zeroed One reason for optimism is that it makes auditing training data for bias a practical possibility, rather than leaving ML models as quasi-mystical black boxes with no verifiability or recourse.
Systematic bias can be devastating at this kind of scale, and IMHO the infamous "security through obscurity" argument doesn't even begin to apply here.