OpenAI and Anthropic are making tweaks to their chatbots that they say will make them safer for teenagers. While OpenAI has updated its guidelines on how ChatGPT should interact with users between the ages of 13 and 17, Anthropic is working on a new way to identify whether someone might be underage.
On Thursday, OpenAI announced that ChatGPT's Model Spec, the guidelines for how its chatbot should behave, will include four new principles for users under 18. It now aims to have ChatGPT "put teen safety first, even when it may conflict with other goals." That means guiding teens toward safer choices when other user interests, like "maximum intellectual freedom," conflict with safety concerns.
It also says ChatGPT should "promote real-world support," including by encouraging offline relationships, while laying out how ChatGPT should set clear expectations when interacting with younger users. The Model Spec says ChatGPT should "treat teens like teens" by offering "warmth and respect" instead of providing condescending answers or treating teens like adults.
OpenAI says the update to ChatGPT's Model Spec should result in "stronger guardrails, safer alternatives, and encouragement to seek trusted offline support when conversations move into higher-risk territory." The company adds that ChatGPT will urge teens to contact emergency services or crisis resources if there are signs of "imminent risk."
Along with this change, OpenAI says it's in the "early stages" of launching an age prediction model that will attempt to estimate someone's age. If it detects that someone may be under 18, OpenAI will automatically apply teen safeguards. It will also give adults the chance to verify their age if they were incorrectly flagged by the system.
Anthropic is rolling out similar measures, as it's developing a new system capable of detecting "subtle conversational signals that a user might be underage" during conversations with its AI chatbot, Claude. The company will disable accounts if they're confirmed to belong to users under 18, and already flags users who self-identify as minors during chats.
Anthropic also outlines how it trains Claude to respond to prompts about suicide and self-harm, as well as its progress at reducing sycophancy, which can reinforce harmful thinking. The company says its latest models "are the least sycophantic of any to date," with Haiku 4.5 performing the best, as it corrected its sycophantic behavior 37 percent of the time.
"On face value, this evaluation shows there is significant room for improvement for all of our models," Anthropic says. "We think the results reflect a trade-off between model warmth or friendliness on the one hand, and sycophancy on the other."























