MIT Engineers on a ‘Failure-Finding’ Mission

November 09, 2023

MIT News: From vehicle collision avoidance to airline scheduling systems to power supply grids, many of the services we rely on are managed by computers. As these autonomous systems grow in complexity and ubiquity, so too could the ways in which they fail.

Now, MIT engineers Chuchu Fan, assistant professor of aeronautics and astronautics, and graduate student Charles Dawson have developed an approach that can be paired with any autonomous system to quickly identify a range of potential failures in that system before it is deployed in the real world. What’s more, the approach can suggest repairs that fix those failures and avoid system breakdowns.

In February 2021, a major system meltdown in Texas got Fan and Dawson thinking. Winter storms with unexpectedly frigid temperatures set off failures across the power grid, creating the worst energy crisis in Texas’ history, and leaving more than 4.5 million homes and businesses without power for days. “That was a pretty major failure that made me wonder whether we could have predicted it beforehand,” Dawson says. “Could we use our knowledge of the physics of the electricity grid to understand where its weak points could be, and then target upgrades and software fixes to strengthen those vulnerabilities before something catastrophic happened?”

Dawson and Fan’s work focuses on robotic systems and on finding ways to make them more resilient in their environments. Prompted in part by the Texas power crisis, they set out to broaden their scope to spotting and fixing failures in other complex, large-scale autonomous systems. To do so, they realized they would have to rethink the conventional approach to finding failures.

The researchers presented their work at the Conference on Robot Learning in Atlanta, November 6-9. For her DTI-awarded project, DTI Principal Investigator Chuchu Fan worked on frameworks for securing critical networked infrastructure.

Read the full story at MIT News, and the paper, “A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling.”