After a serendipitous introduction to a community of artists, C3.ai DTI cybersecurity Principal Investigator Ben Zhao, computer science professor at the University of Chicago, dedicated his team to developing ways to protect original artwork from rampant AI reproduction. Their three inventions – Fawkes, Glaze, and Nightshade, all designed to evade or counteract AI scraping – have established Zhao as a defender of artists’ rights in the era of Generative AI.

His novel work has been covered in the tech press, art press, and in major media outlets from MIT Technology Review, TechCrunch, and Wired, to Scientific American, Smithsonian Magazine, and the New York Times.

At the C3.ai DTI Generative AI Workshop in Illinois last October, Zhao gave a talk recounting how this series of events unfolded. Here’s what he had to say. Listen to the entire talk here.

(Excerpted and edited for length and clarity.)

UChicago Professor Ben Zhao showing samples of synthetic art at his C3.ai DTI presentation in fall 2023.

IN 2020, we built this tool called Fawkes, which, at a high level, is an image-altering filter that perturbs a particular image in feature space, shifting its facial recognition embedding to a different position in that space. That tool got a bit of press and we set up a user mailing list.
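At a high level, the cloaking idea can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration – not the Fawkes implementation – using a random stand-in network in place of a real facial recognition embedding model:

```python
# Minimal sketch of feature-space cloaking in the spirit of Fawkes.
# NOT the actual Fawkes code: the "extractor" is a random stand-in
# for a pretrained facial recognition embedding model.
import torch
import torch.nn as nn

extractor = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
)
extractor.eval()

image = torch.rand(1, 3, 112, 112)         # photo to protect
decoy = torch.rand(1, 3, 112, 112)         # image of a different identity
decoy_feat = extractor(decoy).detach()     # where we want the photo to land

delta = torch.zeros_like(image, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.01)
budget = 0.03                              # cap on per-pixel change

for _ in range(200):
    opt.zero_grad()
    cloaked = (image + delta).clamp(0, 1)
    # Pull the cloaked photo's embedding toward the decoy identity.
    loss = torch.norm(extractor(cloaked) - decoy_feat)
    loss.backward()
    opt.step()
    delta.data.clamp_(-budget, budget)     # keep the change visually small

cloaked = (image + delta).clamp(0, 1).detach()
```

To a human the cloaked photo looks essentially unchanged, but a model that embeds it lands near the decoy rather than near the true identity – the shifted position in feature space described above.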

We were starting to look at the potential downsides and harms of deep learning in general. That’s when the news about Clearview AI came out – the company that scraped billions of images from social media and everywhere else online to build facial recognition models for roughly 300 million people globally. They’re still doing this, with numbers significantly higher than that now.

Last summer, we got this interesting email – we still have it – from this artist in the Netherlands, Kim Van Deun. She wrote, “With the rise of AI learning on images, I wonder if Fawkes can be used on paintings and illustrations to warp images and render them less useful for learning algorithms.”

An interesting question, but at the time we had no idea what was going on in Generative AI, and the question made no sense to us. Why would you need to protect art? We wrote back, “I’m sorry, Kim, this is only for facial recognition. We don’t know how to apply this to art, but thanks for reaching out.” Kind of a useless reply. When all the news hit about DALL-E 2, Stable Diffusion, and Midjourney, one day in the lab, Shawn walked over to me and said, “Ben, is this what they were talking about, that email from that artist?” And we’re like, “Okay, maybe that’s it.”

We went back to Kim to ask what was going on. And we got an invite to an online town hall of artists in November. I jumped on that call not knowing what to expect. There were some big artists there, successful professionals in the field – including people who worked for major movie studios – about five or six hundred people in all, talking about how their lives had been upended over the previous two or three months by Generative AI. This was a complete shock to us. Right after this call, I remember thinking, “Okay, we should do something. I think there is a technological solution to do something about this.”

Over the next couple of months, we reached out to Karla Ortiz and a few other artists to enlist their help connecting us to the artist community. We did a user study. First, we said, “Okay, I think we can do what we did with Fawkes, this idea of perturbation in the feature space while maintaining visible similarity to the original.” Of course, that’s really challenging, because in the art space, you would imagine artists – fine artists, creatives, professionals – would care quite a bit about how much perturbation of their art they would let you get away with. And we weren’t sure we could do this, because diffusion models are obviously quite different from discriminative classifiers like DNNs [Deep Neural Networks]. Also, artistic style is this weird and fuzzy sort of feature space, and we weren’t sure it followed the same rules as something like the feature space for facial recognition.
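In rough optimization terms – our notation here, not the team’s – the style cloak he describes looks for the smallest visible change that relocates an artwork’s style features:

\[
\min_{\delta}\; \lVert \Phi(x+\delta) - \Phi_{\mathrm{target}} \rVert_2^2
\quad \text{subject to} \quad d(x,\, x+\delta) \le p,
\]

where \(x\) is the artwork, \(\Phi\) is a generative model’s style feature extractor, \(\Phi_{\mathrm{target}}\) is a point in feature space away from the artist’s own style, \(d\) is a perceptual distance, and \(p\) is the amount of visible perturbation artists will tolerate.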

We tried this, built an initial prototype, and conducted a massive user study with more than 1,100 professional artists. So many signed up because this is obviously dear to their hearts. By February, we had completed the study, submitted a paper, and picked up some press coverage, including the New York Times. A month later, we turned the first version of what became known as Glaze into a software release. By July, we had a million downloads. By August, we had presented at USENIX Security. There were awards as well – the Internet Defense Prize and a paper award.

We had released this desktop app, but it took us a while to realize that artists don’t have a lot of money, and most of them don’t have GPUs at their disposal. Many of them don’t even have desktop computers, and if they do, they’re woefully out of date. So, we built a free web service sitting on our GPU servers to do the computation for them.

One of the things that’s interesting about this whole process is what we learned. The first question that came up was, “Should we deploy something?” For me, this was a no-brainer because the harms were so severe and immediate. I was literally talking to people who were severely depressed and had anxiety attacks because of what was going on. It seemed like the stakes were extremely high and you had to do something because there was something that we could do. Turns out many people feel differently.

A number of people in the security community said, “Why would you do this? Don’t. If it’s at all imperfect, if it can be broken in months or years, you’re offering a false sense of security. Can it be future-proof?” But nothing is future-proof, right? Give it 10 or 20 years – I don’t even know if Generative AI models will still be around. Who knows? They will probably be greatly different from what they are now.

We decided on this weird compromise: We made a free app, but one that runs offline. Many artists were already paranoid about running more AI on their art. We had to walk this fine line between transparency and gaining trust from the artists.

So what happened after that? A lot of good things. The artists’ reaction globally was really insane. For a while there, we got so many emails we couldn’t answer them all. Globally speaking, a lot of artists now use Glaze on a regular basis. A number of online art galleries still post signs that say, “Closed while we Glaze everything,” because Glazing can take a while. More than that, artists have been extremely helpful in developing Glaze: everything from the app layout to the logo color schemes has had a ton of input from artists. Some have even taken money out of their own pockets to advertise Glaze – really quite unexpected.

The minute Glaze was out the door, we started working on Nightshade – a poisoning attack in the wild. The paper came out last week.

Epilogue: The free Nightshade program, released on January 19, 2024, was downloaded 250,000 times within the first five days.

A sampling of news stories:

FAWKES
This Tool Could Protect Your Photos From Facial Recognition
New York Times – August 3, 2020

GLAZE
UChicago scientists develop new tool to protect artists from AI mimicry
University of Chicago News – February 15, 2023

NIGHTSHADE
This new data poisoning tool lets artists fight back against generative AI
MIT Technology Review – October 23, 2023

Nature Biotechnology: In this first-person piece, C3.ai DTI COVID-19 researcher and UC Berkeley Professor of EECS and Bioengineering Jennifer Listgarten writes, “As a longtime researcher at the intersection of artificial intelligence (AI) and biology, for the past year I have been asked questions about the application of large language models and, more generally, AI in science. For example: ‘Since ChatGPT works so well, are we on the cusp of solving science with large language models?’ or ‘Isn’t AlphaFold2 suggestive that the potential of AI in biology and science is limitless?’ And inevitably: ‘Can we use AI itself to bridge the lack of data in the sciences in order to then train another AI?’”

Listgarten continues, “I do believe that AI — equivalently, machine learning — will continue to advance scientific progress at a rate not achievable without it. I don’t think major open scientific questions in general are about to go through phase transitions of progress with machine learning alone. The raw ingredients and outputs of science are not found in abundance on the internet, yet the tremendous power of machine learning lies in data — and lots of them.”

Read more here.

Two C3.ai DTI researchers were quoted in Quanta about their work on autonomous driving.

Sayan Mitra, a computer scientist at the University of Illinois Urbana-Champaign, leads a team that has managed to prove the safety of lane-tracking capabilities for cars and landing systems for autonomous aircraft. Their strategy is now being used to help land drones on aircraft carriers, and Boeing plans to test it on an experimental aircraft this year. “Their method of providing end-to-end safety guarantees is very important,” said Corina Pasareanu, a research scientist at Carnegie Mellon University and NASA’s Ames Research Center.

Their work involves providing formal guarantees on the results of the machine-learning algorithms used to inform decisions in autonomous vehicles.

The aerospace company Sierra Nevada is currently testing these safety guarantees while landing a drone on an aircraft carrier. This problem is in some ways more complicated than driving cars because of the extra dimension involved in flying.

Read more here.

Image: Señor Salme for Quanta Magazine

C3.ai DTI cybersecurity P.I. Sergey Levine of UC Berkeley co-authored an article in IEEE Spectrum describing how robots from around the world are sharing data on object manipulation to help work toward a general-purpose robotic brain.

“In 2023, our labs at Google and the University of California, Berkeley came together with 32 other robotics laboratories in North America, Europe, and Asia to undertake the RT-X project, with the goal of assembling data, resources, and code to make general-purpose robots a reality,” the authors write.

“As more labs engage in cross-embodiment research,” they conclude, “we hope to further push the frontier on what is possible with a single neural network that can control many robots. These advances might include adding diverse simulated data from generated environments, handling robots with different numbers of arms or fingers, using different sensor suites (such as depth cameras and tactile sensing), and even combining manipulation and locomotion behaviors. RT-X has opened the door for such work, but the most exciting technical developments are still ahead.”

Read it here.

AIThority: Researchers at the Massachusetts Institute of Technology (MIT) and McMaster University have used an artificial intelligence algorithm to identify a novel antibiotic that can kill a type of bacterium responsible for many drug-resistant infections.

Regina Barzilay and Tommi Jaakkola, MIT professors and co-authors of the current paper, “Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii,” set out a few years ago to tackle the problem using machine learning. Their screens yielded nine candidate antibiotics, including a highly potent one. This compound, first investigated as a potential diabetes medication, was found to be highly effective against A. baumannii.

The C3.ai Digital Transformation Institute was among the organizations that contributed to the funding of this research.

Barzilay and Jaakkola were co-P.I.s on a 2020 C3.ai DTI grant awarded to Ziv Bar-Joseph of Carnegie Mellon University for a project using AI to mitigate COVID-19, research that led to this later discovery.

Read the full AIThority story here.

Read the paper here.

Image from Nature Chemical Biology paper

Agri-View: According to national U.S. Department of Agriculture statistics, no-till and conservation tillage are increasing, with more than three-quarters of corn and soybean farmers opting for the practices to reduce soil erosion, maintain soil structure, and save on fuel. However, those estimates are based primarily on farmer self-reporting and are compiled only once every five years, potentially limiting accuracy.

In a new study funded in part by C3.ai DTI, University of Illinois Urbana-Champaign scientists led by Kaiyu Guan demonstrate a way to accurately map tilled land in real time by integrating ground, airborne and satellite imagery.

Read the story here.

Read the study, “Cross-scale sensing of field-level crop residue cover: Integrating field photos, airborne hyperspectral imaging, and satellite data,” in Remote Sensing of Environment here.

Forbes covers the work of Dandelion Health, a startup sparked by the work of two C3.ai DTI researchers, Ziad Obermeyer of the University of California, Berkeley, and Sendhil Mullainathan of the University of Chicago Booth School of Business.

In 2019, the two co-authored a research paper on bias in healthcare algorithms that was published in Science. That paper’s findings would inspire them, along with two other colleagues, to start Dandelion.

The paper revealed how differences in access to healthcare services among Black and white patients could ultimately result in fewer Black patients being flagged by an algorithm that used overall healthcare costs as a proxy for which patients need extra care.

That’s because if you just consider the total cost of care – that the sickest patients would be the ones with the highest bills – the data will skew towards people who can afford to go to the doctor. The result was that only around half of the Black patients who should get extra services were identified.
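A toy calculation makes the mechanism concrete. The sketch below uses invented numbers, not the paper’s data: two equally sick groups differ only in access to care, and flagging patients by cost rather than by need largely misses the lower-access group.

```python
# Toy illustration of proxy-label bias -- hypothetical numbers, not
# data from the Science paper. Both groups are equally sick, but
# group B has lower access to care, so spending understates its need.
import random

random.seed(0)
patients = []
for group, access in [("A", 1.0), ("B", 0.5)]:
    for _ in range(1000):
        need = random.random()        # true health need
        cost = need * access          # observed healthcare spending
        patients.append((group, need, cost))

# Flag the top 10% by COST (the proxy) vs. by NEED (the real target).
by_cost = sorted(patients, key=lambda p: p[2], reverse=True)[:200]
by_need = sorted(patients, key=lambda p: p[1], reverse=True)[:200]

for group in ("A", "B"):
    flagged = sum(p[0] == group for p in by_cost)
    deserving = sum(p[0] == group for p in by_need)
    print(f"group {group}: flagged by cost {flagged}, truly high-need {deserving}")
# Group B is drastically under-flagged despite being equally sick.
```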

Access can vary wildly “depending on where you live, who you are, the color of your skin, the language you speak,” Obermeyer told Forbes. In this case, white patients were more likely to go to clinics and get treatment or surgery and had higher costs, while Black patients were more likely to use the emergency room once their untreated conditions were spiraling out of control. The end result? “The bias just piles up.”

Dandelion is creating a massive, de-identified dataset from millions of patient records so that developers can build and test the performance of their algorithms across diverse types of patients. The founding team hopes they can help establish a framework for testing and validating healthcare AI “while regulators play catchup.”

Read the full Forbes story here.

Read the 2019 Science paper here.

Forbes photo via Dandelion Health

Two C3.ai DTI P.I.s are part of an effort to provide resources for policymakers on AI governance.

MIT News: Providing a resource for U.S. policymakers, a committee of MIT leaders and scholars has released a set of policy briefs that outlines a framework for the governance of AI.

“The framework we put together gives a concrete way of thinking about these things,” says Asu Ozdaglar, deputy dean of academics in the MIT Schwarzman College of Computing and head of MIT’s EECS Department, who also helped oversee the effort.

Ad hoc committee members include Sendhil Mullainathan, the Roman Family University Professor of Computation and Behavioral Science at the University of Chicago Booth School of Business.

Read the MIT News story here.

See the AI Policy Briefs here.

Photo: Jake Belcher for MIT News

MIT News: From vehicle collision avoidance to airline scheduling systems to power supply grids, many of the services we rely on are managed by computers. As these autonomous systems grow in complexity and ubiquity, so too could the ways in which they fail.

Now, MIT engineers Chuchu Fan, assistant professor of aeronautics and astronautics, and graduate student Charles Dawson have developed an approach that can be paired with any autonomous system to quickly identify a range of potential failures in that system before it is deployed in the real world. What’s more, the approach can find fixes for the failures and suggest repairs to avoid system breakdowns.

In February 2021, a major system meltdown in Texas got Fan and Dawson thinking. Winter storms with unexpectedly frigid temperatures set off failures across the power grid, creating the worst energy crisis in Texas’ history, and leaving more than 4.5 million homes and businesses without power for days. “That was a pretty major failure that made me wonder whether we could have predicted it beforehand,” Dawson says. “Could we use our knowledge of the physics of the electricity grid to understand where its weak points could be, and then target upgrades and software fixes to strengthen those vulnerabilities before something catastrophic happened?”

Dawson and Fan’s work focuses on robotic systems and finding ways to make them more resilient in their environment. Prompted in part by the Texas power crisis, they set out to expand their scope, to spot and fix failures in other complex, large-scale autonomous systems. To do so, they realized they would have to shift the conventional approach to finding failures.
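One way to picture that shift is a generic sampling loop – a simplified sketch of sampling-based failure discovery, not the authors’ Bayesian method: sample operating conditions at random, simulate, and collect the conditions that push the system outside its safe envelope.

```python
# Generic sketch of sampling-based failure discovery. This is a toy
# stand-in, not the method from the CoRL paper: a hypothetical drone
# altitude controller is stress-tested under random wind and noise.
import random

def simulate(wind, sensor_noise, steps=100):
    """Return the worst deviation from a 10 m altitude setpoint."""
    alt, worst = 10.0, 0.0
    for _ in range(steps):
        measured = alt + random.gauss(0, sensor_noise)
        control = 0.4 * (10.0 - measured)   # proportional controller
        alt += control + wind               # constant wind disturbance
        worst = max(worst, abs(alt - 10.0))
    return worst

random.seed(1)
failures = []
for _ in range(5000):
    wind = random.uniform(-0.5, 0.5)
    noise = random.uniform(0.0, 2.0)
    if simulate(wind, noise) > 3.0:         # safety envelope: +/- 3 m
        failures.append((wind, noise))

print(f"{len(failures)} failing scenarios out of 5000 samples")
# Inspecting where failures cluster points to the weak operating
# regions a repair (retuned gains, better sensors) should target.
```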

The researchers presented their work at the Conference on Robot Learning (CoRL) in Atlanta, November 6–9.

For her awarded C3.ai DTI project, Principal Investigator Chuchu Fan worked on frameworks for securing critical networked infrastructure.

Read the story here. Read the paper, “A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling,” here.