The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’

January 26, 2024

Nature Biotechnology: In this first-person piece, DTI COVID-19 researcher and UC Berkeley Professor of EECS and Bioengineering Jennifer Listgarten writes, “As a longtime researcher at the intersection of artificial intelligence (AI) and biology, for the past year I have been asked questions about the application of large language models and, more generally, AI in science. For example: ‘Since ChatGPT works so well, are we on the cusp of solving science with large language models?’ or ‘Isn’t AlphaFold2 suggestive that the potential of AI in biology and science is limitless?’ And inevitably: ‘Can we use AI itself to bridge the lack of data in the sciences in order to then train another AI?’”

Listgarten continues, “I do believe that AI — equivalently, machine learning — will continue to advance scientific progress at a rate not achievable without it. I don’t think major open scientific questions in general are about to go through phase transitions of progress with machine learning alone. The raw ingredients and outputs of science are not found in abundance on the internet, yet the tremendous power of machine learning lies in data — and lots of them.”
