C3.ai DTI’s quarterly newsletter covers news of the Institute’s Principal Investigators and digital transformation research around the consortium. You can sign up to receive the newsletter here.
The winter edition covers this news:
MIT Group Releases White Papers on AI Governance
From Fawkes, to Glaze, to Nightshade, According to Ben Zhao
The Global Project to Make a General Robotic Brain
From Ground, to Air, to Space, Tillage Estimates Get Tech Boost
Learn about GenAI from the Experts
Updates and Quick Takes
See the Winter 2024 C3.ai DTI newsletter pdf here.
With a serendipitous introduction to a community of artists, C3.ai DTI cybersecurity Principal Investigator Ben Zhao, computer science professor at the University of Chicago, dedicated his team to developing ways to protect original artwork from rampant AI reproduction. Their three inventions – Fawkes, Glaze, and Nightshade, all designed to evade or counter-program AI scraping – have established Zhao as a defender of artists’ rights in the era of Generative AI.
His novel work has been covered in the tech press, art press, and in major media outlets from MIT Technology Review, TechCrunch, and Wired, to Scientific American, Smithsonian Magazine, and the New York Times.
At the C3.ai DTI Generative AI Workshop in Illinois last October, Zhao gave a talk relating how this series of events unfolded. Here’s what he had to say. Listen to the entire talk here.
(Excerpted and edited for length and clarity.)
IN 2020, we built this tool called Fawkes, which, at a high level, is an image-altering sort of filter that perturbs the feature space of a particular image, shifting that image’s position in the facial-recognition feature space to a different location. That tool got a bit of press and we set up a user mailing list.
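For readers curious what this kind of perturbation looks like, here is a minimal, purely illustrative sketch: a random linear map stands in for a real face-recognition feature extractor, and projected gradient steps move an image’s feature-space position toward a decoy identity while every pixel change stays tiny. All dimensions and numbers are invented for illustration; Fawkes itself works against deep networks with a far more sophisticated optimization.

```python
# Illustrative sketch of feature-space "cloaking" (not the Fawkes code).
import numpy as np

rng = np.random.default_rng(0)
D, F = 64, 8                        # toy image size (pixels) and feature dims
W = rng.normal(size=(F, D))         # stand-in for a deep feature extractor

def features(x):
    return W @ x                    # f(x): image -> feature space

image = rng.uniform(0, 1, D)        # the original photo
decoy = rng.uniform(0, 1, D)        # a photo of a different "identity"
target = features(decoy)            # feature-space position to move toward

eps, step = 0.05, 0.01              # max per-pixel change; step size
cloaked = image.copy()
for _ in range(200):
    # gradient of ||f(x) - target||^2 with respect to the pixels x
    grad = 2 * W.T @ (features(cloaked) - target)
    cloaked -= step * grad / (np.linalg.norm(grad) + 1e-12)
    # project back into the L-infinity ball: the change stays near-invisible
    cloaked = np.clip(cloaked, image - eps, image + eps)

before = np.linalg.norm(features(image) - target)
after = np.linalg.norm(features(cloaked) - target)
print(f"feature distance to decoy: {before:.2f} -> {after:.2f}")
print(f"max pixel change: {np.abs(cloaked - image).max():.3f}")
```

The same principle underlies Glaze, applied to artistic style rather than identity: the perturbation shifts where a model places the image in feature space while the picture looks essentially unchanged to humans.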
We were starting to look at the potential downsides and harms of Generative AI and deep learning in general. That’s when the news about Clearview AI came out – the company that scraped billions of images from social media and everywhere else online to build facial recognition models for roughly 300 million people globally. They’re still doing this, with numbers significantly higher than that now.
Last summer, we got this interesting email – we still have it – from this artist in the Netherlands, Kim Van Dune. She wrote, “With the rise of AI learning on images, I wonder if Fawkes can be used on paintings and illustrations to warp images and render them less useful for learning algorithms.”
An interesting question, but at the time we had no idea what was going on in Generative AI and this question made no sense. Why do you need to protect art? We wrote back, “I’m sorry, Kim, this is only for facial recognition. We don’t know how to apply this for art, but thanks for reaching out.” Kind of a useless reply. When all the news hit about DALL-E 2, Stable Diffusion, and Midjourney, one day in the lab, Shawn walked over to me and said, “Ben, is this what they were talking about, that email from that artist?” And we’re like, “Okay, maybe that’s it.”
We went back to Kim to ask what was going on. And we got an invite to an online townhall of artists, in November. I jumped on that call not knowing what to expect. There were some big artists there, and successful professionals in the field – including people who worked for major movie studios – about five to six hundred people, talking about how their lives had been upended in the last two or three months by Generative AI. This was a complete shock to us. Right after this call, I remember thinking, “Okay, we should do something. I think there is a technological solution to do something about this.”
Over the next couple of months, we reached out to Karla Ortiz and a few other artists to enlist their help connecting us to the artist community. We did a user study. First, we said, “Okay, I think we can do what we did with Fawkes, this idea of perturbation in the feature space while maintaining visible similarity to the original.” Of course, that’s really challenging, because in the art space, you would imagine artists – fine artists, creatives, professionals – would care quite a bit about how much you perturb their art and how much they’d let you get away with. And we weren’t sure we could do this because obviously diffusion models are quite different from discriminative classifiers like DNNs [Deep Neural Networks]. Also, art style is this weird and fuzzy sort of feature space that we weren’t sure followed the same rules as something like the feature space for facial recognition.
We tried this, built an initial prototype, and conducted a massive user study with more than 1,100 professional artists. So many signed up because this is obviously dear to their hearts. By February, we had completed the study, submitted a paper, and picked up some press coverage, including the New York Times. A month later, we turned the first version of what became known as Glaze into a software release. By July, we had a million downloads. By August, we presented at the USENIX Security conference. There were awards as well, the Internet Defense Prize and a paper award.
We had released this desktop app, but it took us a while to realize that artists don’t have a lot of money, and most of them don’t have GPUs at their disposal. Many of them don’t even have desktop computers, and if they do, they’re woefully out of date. So, we built a free web service sitting on our GPU servers to do the computation for them.
One of the things that’s interesting about this whole process is what we learned. The first question that came up was, “Should we deploy something?” For me, this was a no-brainer because the harms were so severe and immediate. I was literally talking to people who were severely depressed and had anxiety attacks because of what was going on. The stakes seemed extremely high, and since there was something we could do, we had to do it. Turns out many people feel differently.
A number of people in the security community said, “Why would you do this? Don’t. If it’s at all imperfect, if it can be broken in months or years, you’re offering a false sense of security. Can it be future-proof?” But nothing is future-proof, right? Give it 10-20 years, I don’t even know if Generative AI models will be around. Who knows? They will probably be greatly different from what they are now.
We decided on this weird compromise: We made a free app, but offline. Many artists were already paranoid about running more AI on their art. We had to walk a fine line between transparency and gaining the artists’ trust.
So what happened after that? A lot of good things. The artists’ reaction globally was really insane. For a while there, we got so many emails we couldn’t answer them all. Globally speaking, a lot of artists now use Glaze on a regular basis. A number of online art galleries still post signs that say, “Closed while we Glaze everything,” because Glazing can take a while. More than that, artists have been extremely helpful in developing Glaze: everything from the app layout to the logo color schemes has had a ton of input from artists. Some have even taken money out of their own pockets to advertise Glaze – really quite unexpected.
The minute Glaze was out the door we started working on Nightshade – a poison attack in the wild. The paper came out last week.
Epilogue: The free Nightshade program, released on January 19, 2024, was downloaded 250,000 times within the first five days.
Nature Biotechnology: In this first-person piece, C3.ai DTI COVID-19 researcher and UC Berkeley Professor of EECS and Bioengineering Jennifer Listgarten writes, “As a longtime researcher at the intersection of artificial intelligence (AI) and biology, for the past year I have been asked questions about the application of large language models and, more generally, AI in science. For example: ‘Since ChatGPT works so well, are we on the cusp of solving science with large language models?’ or ‘Isn’t AlphaFold2 suggestive that the potential of AI in biology and science is limitless?’ And inevitably: ‘Can we use AI itself to bridge the lack of data in the sciences in order to then train another AI?'”
Listgarten continues, “I do believe that AI — equivalently, machine learning — will continue to advance scientific progress at a rate not achievable without it. I don’t think major open scientific questions in general are about to go through phase transitions of progress with machine learning alone. The raw ingredients and outputs of science are not found in abundance on the internet, yet the tremendous power of machine learning lies in data — and lots of them.”
Two C3.ai DTI researchers were quoted in Quanta about their work on autonomous driving.
Sayan Mitra, a computer scientist at the University of Illinois Urbana-Champaign, leads a team that has managed to prove the safety of lane-tracking capabilities for cars and landing systems for autonomous aircraft. Their strategy is now being used to help land drones on aircraft carriers, and Boeing plans to test it on an experimental aircraft this year. “Their method of providing end-to-end safety guarantees is very important,” said Corina Pasareanu, a research scientist at Carnegie Mellon University and NASA’s Ames Research Center.
Their work involves guaranteeing the results of the machine-learning algorithms that are used to inform autonomous vehicles.
The aerospace company Sierra Nevada is currently testing these safety guarantees while landing a drone on an aircraft carrier. This problem is in some ways more complicated than driving cars because of the extra dimension involved in flying.
Industry forecasting firm Visiongain released its Antibacterial Drugs Market Report 2024-2034 today, which projected the antibacterial drug market to grow at a compound annual growth rate of 3.6 percent during the 10-year forecast period.
One reason cited for this growth is the impact of artificial intelligence and machine learning on the drug discovery process. The report highlights the recent discovery of a drug to treat the bacterium Acinetobacter baumannii, a breakthrough that built upon efforts by two C3.ai DTI Co-P.I.s, MIT professors Regina Barzilay and Tommi Jaakkola, to investigate interventions for COVID-19, research funded by the C3.ai Digital Transformation Institute.
The new drug was identified from a library of ~7,000 potential drug compounds using a machine-learning model trained to evaluate whether a chemical compound will inhibit the growth of A. baumannii. Once approved, the drug could help combat A. baumannii, a hospital-acquired pathogen that leads to pneumonia, meningitis, and other serious infections.
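As a rough illustration of that screening loop – not the MIT team’s actual model or data – the sketch below trains a simple logistic-regression classifier on synthetic compound fingerprints with known inhibition labels, then ranks a library of ~7,000 synthetic compounds by predicted probability. Every name and number here is invented; the real study used a deep graph-based model and experimental growth-inhibition measurements.

```python
# Illustrative virtual-screening sketch with synthetic data.
import numpy as np

rng = np.random.default_rng(1)
BITS = 32                                   # toy molecular fingerprint length

# Synthetic ground truth: a hidden weight vector decides which compounds
# "inhibit growth" in this toy world.
true_w = rng.normal(size=BITS)

def make_compounds(n):
    X = rng.integers(0, 2, size=(n, BITS)).astype(float)  # bit fingerprints
    y = (X @ true_w > 0).astype(float)                    # inhibits growth?
    return X, y

X_train, y_train = make_compounds(2000)     # compounds with measured labels

# Logistic regression fit by plain gradient descent.
w = np.zeros(BITS)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X_train @ w)))
    w -= 0.1 * X_train.T @ (p - y_train) / len(y_train)

# Score a ~7,000-compound virtual library and surface the top candidates.
library, library_y = make_compounds(7000)
scores = 1 / (1 + np.exp(-(library @ w)))
top = np.argsort(scores)[::-1][:50]         # 50 highest-scoring compounds
hit_rate = library_y[top].mean()
print(f"true inhibitors among top 50 picks: {hit_rate:.0%}")
```

The point of the sketch is the workflow, not the model: train on compounds with known activity, score a much larger library, and send only the highest-ranked candidates to the lab.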
See our news brief on the drug discovery study here.
C3.ai DTI cybersecurity P.I. Sergey Levine of UC Berkeley co-authored an article in IEEE Spectrum describing how robots from around the world are sharing data on object manipulation to help work towards a general purpose robotic brain.
“In 2023, our labs at Google and the University of California, Berkeley came together with 32 other robotics laboratories in North America, Europe, and Asia to undertake the RT-X project, with the goal of assembling data, resources, and code to make general-purpose robots a reality,” the authors write.
“As more labs engage in cross-embodiment research,” they conclude, “we hope to further push the frontier on what is possible with a single neural network that can control many robots. These advances might include adding diverse simulated data from generated environments, handling robots with different numbers of arms or fingers, using different sensor suites (such as depth cameras and tactile sensing), and even combining manipulation and locomotion behaviors. RT-X has opened the door for such work, but the most exciting technical developments are still ahead.”
CBS News Pittsburgh interviews C3.ai DTI researcher Zico Kolter of Carnegie Mellon University about his discovery of how easy it can be to engineer a “jailbreak” in chatbots to break through safeguards and generate harmful information.
These vulnerabilities can make it easier for people to use chatbots for all sorts of dangerous purposes, such as generating hate speech or creating fake social media accounts to spread false information. Kolter fears that in the upcoming presidential election this could increase divisions and make all information suspect.
“I think the biggest risk of all of this isn’t that we believe all the false information, it’s that we stop trusting information period. I think this is already happening to a degree,” Kolter said. “Used well, these can be useful, and I think a lot of people can use them and can use them effectively to improve their lives if used properly as tools.”
AIThority: A novel antibiotic that can kill a type of bacterium responsible for many drug-resistant diseases has been identified by researchers at the Massachusetts Institute of Technology (MIT) and McMaster University using an artificial intelligence algorithm.
Regina Barzilay and Tommi Jaakkola, MIT professors and co-authors of the current paper, “Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii,” set out a few years ago to tackle this using machine learning. Their screening identified nine antibiotics, including one that proved highly effective. This compound, first investigated for use as a diabetes medication, was found to be highly efficient against A. baumannii.
The C3.ai Digital Transformation Institute was among the organizations that contributed to the funding of this research.
Barzilay and Jaakkola were co-P.I.s on a 2020 C3.ai DTI grant awarded to Ziv Bar-Joseph of Carnegie Mellon University for a project using AI to mitigate COVID-19, research that led to this later discovery.
Agri-View: According to national U.S. Department of Agriculture statistics, no-till and conservation tillage are increasing, with more than three-quarters of corn and soybean farmers opting for the practices to reduce soil erosion, maintain soil structure, and save on fuel. However, those estimates are based primarily on farmer self-reporting and are compiled only once every five years, potentially limiting accuracy.
In a new study funded in part by C3.ai DTI, University of Illinois Urbana-Champaign scientists led by Kaiyu Guan demonstrate a way to accurately map tilled land in real time by integrating ground, airborne and satellite imagery.