Dario Amodei’s AI safety contingent was growing disquieted with some of Sam Altman’s behaviors. Shortly after OpenAI’s Microsoft deal was inked in 2019, several of them were surprised to discover the extent of the promises that Altman had made to Microsoft about which technologies it would get access to in return for its investment. The terms of the deal didn’t align with what they had understood from Altman. If AI safety issues actually arose in OpenAI’s models, they worried, these commitments would make it far harder, if not impossible, to prevent the models’ deployment. Amodei’s contingent began to have serious doubts about Altman’s honesty.
“We’re all pragmatic people,” a person in the group says. “We’re obviously raising money; we’re going to do commercial stuff. It might look very reasonable if you’re someone who makes a lot of deals like Sam, to be like, ‘All right, let’s make a deal, let’s trade a thing, we’re going to trade the next thing.’ And then if you’re someone like me, you’re like, ‘We’re trading a thing we don’t fully understand.’ It feels like it commits us to an uncomfortable place.”
This was against the backdrop of a growing paranoia over different issues within the company. Within the AI safety contingent, it centered on what they saw as strengthening evidence that powerful misaligned systems could lead to disastrous outcomes. One bizarre experience in particular had left several of them somewhat nervous. In 2019, on a model trained after GPT‑2 with roughly twice the number of parameters, a group of researchers had begun advancing the AI safety work that Amodei had wanted: testing reinforcement learning from human feedback (RLHF) as a way to guide the model toward generating cheerful and positive content and away from anything offensive.
But late one night, a researcher made an update that included a single typo in his code before leaving the RLHF process to run overnight. That typo was an important one: It was a minus sign flipped to a plus sign that made the RLHF process work in reverse, pushing GPT‑2 to generate more offensive content instead of less. By the next morning, the typo had wreaked its havoc, and GPT‑2 was completing every single prompt with extremely lewd and sexually explicit language. It was hilarious, and also concerning. After identifying the error, the researcher pushed a fix to OpenAI’s code base with a comment: Let’s not make a utility minimizer.
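The mechanics of the bug are easy to see in miniature. Below is a minimal, hypothetical sketch in PyTorch, not OpenAI’s actual code: a policy-gradient-style RLHF objective in which the function and variable names are illustrative. Flipping one sign turns reward maximization into reward minimization:

```python
import torch

def policy_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Correct objective: minimize the NEGATIVE reward-weighted log-likelihood.

    Gradient descent on this loss raises the probability of completions
    the reward model scores highly, i.e., it maximizes expected reward.
    """
    return -(rewards * log_probs).mean()

def buggy_policy_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """The sign-flip typo: the minus sign is gone.

    Gradient descent now lowers expected reward, actively steering the
    policy toward whatever the reward model scores lowest. Left running
    overnight, this is a utility minimizer.
    """
    return (rewards * log_probs).mean()
```

Because the training loop only ever minimizes whatever loss it is handed, nothing crashes and no error is raised; the run looks healthy while optimizing for exactly the content the human raters penalized.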
Partly fueled by the realization that scaling alone could produce more AI advancements, many employees also worried about what would happen if different companies caught on to OpenAI’s secret. “The secret of how our stuff works can be written on a grain of rice,” they would say to each other, meaning the single word scale. For the same reason, they worried about powerful capabilities landing in the hands of bad actors. Leadership leaned into this fear, frequently raising the specter of China, Russia, and North Korea and emphasizing the need for AGI development to stay in the hands of a US organization. At times this rankled employees who weren’t American. During lunches, they would question, Why did it have to be a US organization? recalls a former employee. Why not one from Europe? Why not one from China?
During these heady discussions philosophizing about the long-term implications of AI research, many employees returned often to Altman’s early analogies between OpenAI and the Manhattan Project. Was OpenAI really building the equivalent of a nuclear weapon? It was a strange contrast to the plucky, idealistic culture it had built so far as a largely academic organization. On Fridays, employees would relax after a long week with music and wine nights, unwinding to the soothing sounds of a rotating cast of colleagues playing the office piano late into the night.