How Trustworthy Are Large Language Models Like GPT?

August 23, 2023

Stanford HAI News: More people feel comfortable outsourcing important projects to AI; new research shows why we shouldn’t. In a new study, researchers show that popular models maintain a toxicity probability of 32 percent and can easily leak private information.

Three C3.ai DTI researchers from three consortium universities, Sanmi Koyejo (assistant professor of computer science at Stanford University), Bo Li (assistant professor of computer science at the University of Illinois at Urbana-Champaign), and Dawn Song (professor of computer science at UC Berkeley), teamed up with collaborators from Microsoft Research to explore exactly how trustworthy these large language models are.

“Everyone seems to think LLMs are perfect and capable, compared with other models. That’s very dangerous, especially if people deploy these models in critical domains. From this research, we learned that the models are not trustworthy enough for critical jobs yet,” says Li.

Benchmark studies like these are needed to evaluate the behavior gaps in these models, and both Koyejo and Li are optimistic about more research to come, particularly from academics or auditing organizations. “Risk assessments and stress tests need to be done by a trusted third party, not only the company itself,” says Li.

In the meantime, they advise users to maintain a healthy skepticism when using interfaces powered by these models. “Be careful about getting fooled too easily, particularly in cases that are sensitive. Human oversight is still meaningful,” says Koyejo.


The paper, “DECODING TRUST: A Comprehensive Assessment of Trustworthiness in GPT Models,” is available here.
