Why Release a Large Language Model?

Here at EleutherAI, we are probably best known for our ongoing project to produce a very large GPT-3-like language model and release it as open source. Reasonable safety concerns about this project have been raised many times. We take AI safety extremely seriously, and consider it one of the most important problems, if not the most important problem, to be working on today. We have discussed the risk-benefit tradeoff extensively (it's always a tradeoff), and are by now quite certain that building and releasing such a model is a net good for society, because it will enable more safety-relevant research to be done on such models.

While this is a genuinely nuanced issue whose full subtlety cannot be captured in a single short blog post, we have tried to summarize the most important reasons we believe this is the best course of action for us:

There is significant, important safety research that can only be done with access to large, pretrained models. We would like to make such research possible and easy for low-resource researchers (and participate in such research ourselves). We take extremely seriously the possibility that the first transformative AI (TAI) ends up being effectively a scaled-up transformer, with no radically new scientific insights in its architecture. We feel that the ways in which future scaled-up LMs could be dangerously powerful are not sufficiently well understood. Meanwhile, since GPT-3 already exists and we have not yet been taken over by some form of malicious AGI, we are quite confident that models of this scale are not world-endingly dangerous. We think this means we have the opportunity to do safety-critical research before such models become truly dangerous. To do the best research, though, we need access to large models. Access to the actual underlying model is critical for work on model interpretability (a field especially useful to safety), and some relevant capabilities worth studying appear to emerge only at larger scales (for example, few-shot improvements only become noticeable for large models). It is very unclear whether and when such models will start to exhibit far more powerful and dangerous capabilities. If we had access to a truly unprecedentedly large model (say, one quadrillion parameters), we would not release it, as no one could know what such a system might be capable of.
Most (>99%) of the damage of GPT-3's release was done the moment the paper was published. What the GPT-3 paper showed was just how simple and theoretically straightforward building such a model is. "The only secret was that it was possible," as it were. Assuming that the scaling laws of transformers hold (and all current empirical evidence seems to point in that direction), there is very little one can do to prevent well-funded actors from acquiring such capabilities (OpenAI has estimated at best a ~6-month lead time over replications), as the technology can so easily be scaled up just by investing more money. If even a bunch of random volunteers on the internet, working in their free time with donated compute, can put together such a model, then just about anyone can. And indeed, many different well-funded actors are acquiring such capabilities: a few examples are Megatron-LM, Turing-NLG, Switch Transformer, PanGu-α/盘古 α, HyperCLOVA and Wudao/悟道 2.0, and those are just the publicly known ones. We think the damage caused by new technologies like these is likely to be heavy-tailed, in the sense that the top 1% of dangerous actors are likely to be responsible for >99% of the damage. For the reasons just given, attempting to keep this technology out of the hands of bad actors is futile, and the best we can do is empower society as a whole to study and use this technology for good.
Delaying the release of language models is not a solution to the attacks on our epistemic commons. As Connor Leahy, a co-founder of EleutherAI, has written about rather extensively, language models are just the latest tool that might be deployed in attacking our society's epistemics. This is a fundamental problem that needs to be fixed, but attempting to limit the availability of LMs is a misguided attempt at security. Security through obscurity is not security. As argued in the previous point, attempting to keep this technology out of select actors' hands is simply infeasible. Pretending that LMs are in some sense uniquely responsible for these vulnerabilities in our shared epistemic norms, and censoring their study, will not give us the robust security we need. The attacks need to be studied and countermeasures developed, rather than well-meaning, high-value research being hampered. The scare around LMs has many of the hallmarks of security theatre: it costs large corporations little to nothing to gate their models behind a (commercial) API and claim they have contributed to safety, while in reality cheap, easy-to-run troll farms, social media recommender algorithms supercharging disinformation, and other far more serious threats remain under-addressed. Language models are in a sense a "Photoshop for text", and just as with Photoshop proper, the solution was not to ban Photoshop or to restrict the study of digital image manipulation and CGI techniques.

This short overview should not be seen as a full treatise on the beliefs of EleutherAI's various members about this highly complex situation. If you have questions or concerns, please feel free to reach out, either directly at contact@eleuther.ai or by dropping by our Discord and talking to us in the #alignment-general channel, where we love to talk about this kind of stuff for hours.