In 2022, the C3.ai Digital Transformation Institute selected 24 research projects to transform cybersecurity and secure critical infrastructure, in response to the Institute’s Third Call for Proposals in December 2021.
A total of $6.5 million and access to the C3 AI Suite and Microsoft Azure computing and storage have been awarded to support the following multidisciplinary projects.
Industrial robots usually operate within a “safety cage” to ensure that workers are not harmed by a robot in operation. We need the same type of simple, explainable security for IT systems. Novel mechanisms that can be embedded in the network, such as hardware-accelerated programmable networks or kernel extensions, are the enabling technology for this type of security at the network level. We propose a solution using machine learning and test generation, leveraging expertise in machine learning from UIUC and in testing and verification from KTH. Unlike previous approaches, we focus on explainable AI in our safety cage, so that the cage itself, as well as its effects on network traffic, can be inspected and validated. Lightweight approaches ensure that our safety cage can be embedded in programmable networks or operating system kernels. Machine learning will produce behavioral models rooted in formal modeling (access policies, protocol states, Petri nets) and thus inherently readable by humans. Test-case generation will validate diverse traces against the model and also showcase potential malicious behavior, validating both positive and negative outcomes.
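To make the idea of a human-readable behavioral model concrete, here is a minimal sketch (ours, not the project's code) that learns a protocol state machine as a table of transitions observed in benign traces and then validates new traces against it; the states and event names are hypothetical.

# Minimal sketch: learn a readable transition model from benign traces,
# then flag traces that take transitions never seen during training.
from collections import defaultdict

def learn_model(benign_traces):
    """Build a set of allowed (state, event) -> next_state transitions."""
    allowed = defaultdict(set)
    for trace in benign_traces:
        state = "INIT"
        for event, next_state in trace:
            allowed[(state, event)].add(next_state)
            state = next_state
    return allowed

def validate(trace, allowed):
    """Return the first disallowed step, or None if the trace conforms."""
    state = "INIT"
    for event, next_state in trace:
        if next_state not in allowed.get((state, event), set()):
            return (state, event, next_state)
        state = next_state
    return None

# Hypothetical protocol events for illustration only.
benign = [[("SYN", "OPEN"), ("AUTH", "READY"), ("DATA", "READY"), ("FIN", "CLOSED")]]
model = learn_model(benign)
print(validate([("SYN", "OPEN"), ("DATA", "READY")], model))  # flags the DATA step taken before AUTH

A test generator could enumerate candidate event sequences and check which ones the learned model accepts or rejects, exercising both the positive and negative outcomes mentioned above.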
Cyrille Valentin Artho
Associate Professor
KTH Royal Institute of Technology
Roberto Guanciale
Associate Professor
KTH Royal Institute of Technology
Reyhaneh Jabbarvand
Assistant Professor
University of Illinois at Urbana-Champaign
Brighten Godfrey
Professor
University of Illinois at Urbana-Champaign
This project focuses on the design, deployment, and experimental testing of AI/ML techniques for detecting malicious actors within multi-agent systems, applied to mixed-autonomy traffic. The focus is on a class of attacks new to the field of mixed autonomy: stealthy, advanced, persistent attacks. We design attacks in the form of flow-controlling policies, derived through our FLOW system, a cloud integration of deep RL and traffic simulation. Detection algorithms rely on deep learning auto-encoders, to be trained and tested using a next-generation traffic monitoring system, the I-24 MOTION testbed in Nashville, TN. During this one-year project, I-24 MOTION will generate approximately 200,000,000 vehicle miles of trajectory data processed through the C3 AI platform, approximately 10^4 times larger than the seminal UC Berkeley NGSIM dataset on I-80 – currently the gold standard of vision-based traffic data. Within one year, we will control and “pseudo-attack” traffic flow on a segment of I-24 daily, for a few hours at a time, for a full week, with our 100 level-2 self-driving vehicles (Toyota, Nissan, and GM) capable of (safely) influencing the rest of traffic (thousands of vehicles driven by regular motorists). Using detection models trained on I-24 MOTION, we will demonstrate the ability to detect these “attacking” vehicles through the C3 AI Suite implementation of our detection algorithms on the I-24 MOTION system.
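As a rough illustration of the detection ingredient described above, the sketch below trains a small auto-encoder on synthetic per-vehicle feature vectors and flags samples with unusually high reconstruction error; the feature dimension, threshold, and data are placeholders, not the project's I-24 MOTION pipeline.

# Minimal sketch of auto-encoder-based anomaly detection on per-vehicle
# trajectory features (synthetic data; the feature choice is ours).
import torch, torch.nn as nn

torch.manual_seed(0)
normal = torch.randn(2000, 8)                 # e.g., speed/gap/acceleration statistics per vehicle
model = nn.Sequential(nn.Linear(8, 3), nn.ReLU(), nn.Linear(3, 8))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(200):                          # train to reconstruct normal behavior
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(normal), normal)
    loss.backward()
    opt.step()

def anomaly_score(x):                         # high reconstruction error = suspicious
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)

threshold = anomaly_score(normal).quantile(0.99)
suspect = normal[:5] + 4.0                    # shifted samples stand in for "attacking" vehicles
print(anomaly_score(suspect) > threshold)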
Alexandre M. Bayen
Director of the Institute of Transportation Studies
University of California, Berkeley
Carl A. Gunter
George and Ann Fisher Distinguished Professor in Engineering
University of Illinois at Urbana-Champaign
Maria Laura Delle Monache
Assistant Professor
University of California, Berkeley
Jonathan W. Lee
Senior Engineering Manager and Project Coordinator
University of California, Berkeley
Bo Li
Assistant Professor
University of Illinois at Urbana-Champaign
Jonathan M. Sprinkle
Professor of Computer Science
Vanderbilt University
Daniel B. Work
Associate Professor
Vanderbilt University
Probabilistic programming allows users to write programs that specify a probabilistic model and to estimate distribution parameters based on observations. Combined with AI techniques, it can handle models too complex for direct mathematical analysis. We therefore believe probabilistic programming can be transformative for securing critical infrastructure, where complex interactions abound. Thrust 1: Applying probabilistic programming to three domains relevant to infrastructure cybersecurity: generating benign and malicious communications traffic for power grids; managing congestion in computer networks; and using physics models in robust cyber-physical systems. The team has extensive expertise in these domains and will use real-world datasets to create realistic demonstrations of the benefits of probabilistic programming. Thrust 2: A previously unexplored concern for probabilistic programming is adversarial inputs. Our preliminary results show that directed attacks can effectively skew the learned distribution parameters, even in models robust to random noise. We will study threats of adversarial inputs to probabilistic programming in the context of the applications above. We will define realistic threat models and explore attack strategies, including attacks on the underlying learning techniques. Thrust 3: Designing countermeasures to adversarial inputs by building on previous work on detecting corrupt inputs and securing machine learning models. We will use our system for program analysis and transformation to implement automatic robustification of probabilistic programs. We will then evaluate the countermeasures end-to-end, using the case studies and attacks developed in the first two thrusts.
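For readers unfamiliar with the paradigm, a minimal probabilistic program looks like the sketch below (written with Pyro as one example PPL, which the abstract does not specify): the model declares a prior over a traffic rate and a likelihood for observed message counts, and inference estimates the posterior over the rate. The model, data, and site names are illustrative.

# Minimal probabilistic-programming sketch: specify a model for message counts
# per interval, then infer the underlying rate from observations.
import torch, pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

def model(counts):
    rate = pyro.sample("rate", dist.Gamma(2.0, 0.5))      # prior over the traffic rate
    with pyro.plate("intervals", len(counts)):
        pyro.sample("obs", dist.Poisson(rate), obs=counts)

counts = torch.tensor([3., 5., 4., 6., 2., 7.])            # hypothetical observed counts
guide = AutoNormal(model)
svi = SVI(model, guide, Adam({"lr": 0.02}), loss=Trace_ELBO())
for _ in range(1000):
    svi.step(counts)
print(guide.median())                                       # posterior estimate of "rate"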
Nikita Borisov
Professor
University of Illinois at Urbana-Champaign
Geir Dullerud
Professor
University of Illinois at Urbana-Champaign
Sasa Misailovic
Assistant Professor
University of Illinois at Urbana-Champaign
Sayan Mitra
Professor
University of Illinois at Urbana-Champaign
David Nicol
Director, Information Trust Institute
University of Illinois at Urbana-Champaign
Modern SCADA systems rely on IP-based communication protocols that are mostly event-driven and follow a publish-subscribe model. The timing and content of protocol messages emerge from interactions between the physical system state and the protocol’s internal state – as a result, traditional approaches to anomaly detection produce excessive false positives and, ultimately, alarm fatigue. We propose to develop computationally efficient machine learning algorithms and tools for attack detection and identification based on a novel, scalable representation of the physical system state, the communication protocol state, and the IT infrastructure’s security state, maintained from noisy observations and measurements of the physical and IT infrastructure. The key contribution is to learn a succinct representation of the security state of the IT infrastructure that allows computationally efficient belief updates in real time and makes it possible to jointly account for the evolution of the state of the physical system, communication protocols, and infrastructure, for accurate attack detection and for identification through causal reasoning based on learned dependency models. The research will help address questions such as: how to achieve real-time situational awareness in complex IT infrastructures, how to develop anomaly detectors with low false-positive and false-negative rates, and how to use information about the IT infrastructure to improve attack identification. The project brings together three research teams, from KTH, UIUC, and MIT, with extensive expertise in cyber-physical systems security, smart grids, and anomaly detection.
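The kind of real-time belief update described here can be illustrated with a small discrete Bayes (HMM-style) filter over a security state; the states, transition matrix, and observation likelihoods below are invented for illustration only.

# Minimal sketch of a real-time belief update over a discrete security state.
import numpy as np

states = ["normal", "reconnaissance", "intrusion"]
T = np.array([[0.97, 0.02, 0.01],        # P(next state | current state)
              [0.05, 0.85, 0.10],
              [0.00, 0.05, 0.95]])
# P(observation | state) for two observation symbols: "quiet" (0), "scan_alert" (1)
O = np.array([[0.90, 0.10],
              [0.40, 0.60],
              [0.30, 0.70]])

def update(belief, obs_idx):
    """One predict-correct step of a Bayes (HMM) filter."""
    predicted = T.T @ belief                 # predict: propagate through the dynamics
    corrected = predicted * O[:, obs_idx]    # correct: weight by observation likelihood
    return corrected / corrected.sum()

belief = np.array([0.98, 0.01, 0.01])
for obs in [1, 1, 0, 1]:                     # stream of noisy observations
    belief = update(belief, obs)
print(dict(zip(states, belief.round(3))))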
György Dán
Professor in Teletraffic Systems
KTH Royal Institute of Technology
Klara Nahrstedt
Grainger Distinguished Chair in Engineering
University of Illinois at Urbana-Champaign
Saurabh Amin
Associate Professor of Civil and Environmental Engineering
Massachusetts Institute of Technology
Henrik Sandberg
Professor, Electrical Engineering and Computer Science
KTH Royal Institute of Technology
Security and robustness are increasingly challenging as interconnected modern infrastructures use AI to make decisions. Any mistake in communication or operation could lead to serious consequences, yet AI-based decision-makers must rely on huge amounts of data from many distributed agents. Our project aims to understand how to use rigorous methods based on mathematical analysis to overcome the issue that AI-based methods are not certifiably secure and correct. We will develop novel decentralized, compositional, and data-driven formal techniques that learn certificates together with the decision-maker for the dynamical behavior of infrastructure networks, and we will formalize the fault detection, isolation, and recovery (FDIR) procedure in large, networked infrastructure using the learned compositional neural certificates. We aim to advance techniques that make full use of the machinery of machine learning and rigorous methods on critical networked infrastructure systems. We will develop a framework – consisting of algorithms, theories, and software tools for learning compositional neural certificates and certified decision-makers – to be tested on large critical infrastructures, including connected vehicles, air transportation systems, and power grids.
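A toy example of what a learned certificate looks like: the sketch below trains a small neural Lyapunov-like function V for an illustrative stable linear system and then checks the certificate conditions (positivity and decrease along trajectories) on samples. The project targets compositional certificates with formal guarantees; this sampling check is only meant to convey the conditions being certified, and the dynamics and loss weights are our own choices.

# Toy sketch: fit a candidate certificate V so that V decreases along trajectories
# of an illustrative stable discrete-time system, then check the conditions by sampling.
import torch, torch.nn as nn

A = torch.tensor([[0.9, 0.2], [0.0, 0.8]])            # stable discrete-time dynamics
step = lambda x: x @ A.T

V = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam(V.parameters(), lr=1e-2)

for _ in range(2000):                                  # train V toward the certificate conditions
    x = 4 * torch.rand(256, 2) - 2
    decrease = torch.relu(V(step(x)) - V(x) + 0.1).mean()                    # want V(f(x)) < V(x)
    positive = torch.relu(0.1 * x.norm(dim=1, keepdim=True) - V(x)).mean()   # want V(x) > 0 away from the origin
    loss = decrease + positive
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                                  # sampled check (not a proof)
    x = 4 * torch.rand(10000, 2) - 2
    violations = ((V(step(x)) >= V(x)) | (V(x) <= 0)).float().mean()
print("violation rate on samples:", violations.item())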
Chuchu Fan
Assistant Professor of Aeronautics and Astronautics
Massachusetts Institute of Technology
Guannan Qu
Assistant Professor, Electrical and Computer Engineering
Carnegie Mellon University
Software vulnerabilities in libraries and external modules are a leading enabler of cybersecurity attacks. Identifying such vulnerabilities, however, remains massively challenging. While automated testing tools (e.g., AFL) have gained wide adoption, they require input test cases (i.e., sequences of API calls) to be effective. Crafting a representative and complete set of test cases – especially while respecting constraints (e.g., type safety) imposed by a compiler – remains computationally infeasible. Our team will develop novel machine learning-based techniques for API testing – specifically, we will explore the use of generative models to produce valid and useful inputs. We will examine novel architectures and training procedures that favor high coverage, building on recent work by the principal investigators. Our approach will be evaluated on two use cases – a persistent key-value storage system and Rust library APIs.
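As a simple stand-in for the generative models proposed here, the sketch below fits a first-order Markov model over API-call sequences from a tiny corpus and samples new candidate test sequences; the API names and corpus are hypothetical, and the project's models would be far richer.

# Minimal sketch: learn transition frequencies over API-call sequences and
# sample new candidate test sequences from them.
import random
from collections import defaultdict

corpus = [                                        # hypothetical API-call traces
    ["open", "put", "put", "get", "close"],
    ["open", "get", "delete", "close"],
    ["open", "put", "get", "close"],
]

counts = defaultdict(lambda: defaultdict(int))
for seq in corpus:
    for a, b in zip(["<start>"] + seq, seq + ["<end>"]):
        counts[a][b] += 1

def sample_sequence(max_len=10):
    """Sample an API-call sequence from the learned transition frequencies."""
    seq, state = [], "<start>"
    while len(seq) < max_len:
        nxt = random.choices(list(counts[state]), weights=list(counts[state].values()))[0]
        if nxt == "<end>":
            break
        seq.append(nxt)
        state = nxt
    return seq

random.seed(1)
print([sample_sequence() for _ in range(3)])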
Giulia Fanti
Assistant Professor, Electrical and Computer Engineering
Carnegie Mellon University
Limin Jia
Associate Research Professor
Carnegie Mellon University
The project has two themes: (1) Developing and applying fingerprinting tools and techniques to automatically generate fingerprints for known vulnerabilities and other security weaknesses; and (2) Designing, implementing, and deploying large-scale scanning techniques to uncover these vulnerabilities in a broad array of settings (such as industrial control and other cyber-physical systems). The approaches that we propose to develop extend a rich body of previous work in supervised machine learning (to detect, fingerprint, and inventory vulnerable infrastructure), unsupervised machine learning (to detect anomalous device behavior), and large-scale Internet scanning.
Nick Feamster
Professor, Department of Computer Science
University of Chicago
Zakir Durumeric
Assistant Professor of Computer Science
Stanford University
Prateek Mittal
Associate Professor, Department of Electrical Engineering
Princeton University
This project aims to advance machine learning techniques for the detection of insider attacks. The analysis is based on a time series of events in which trusted insiders access resources. The detection system learns patterns and aims to predict events in which trusted insiders violate their trust. We focus on technical advances in three thrusts: (1) Re-learning, where we develop strategies to determine how frequently patterns of behavior should be learned; the key goal here is to learn for long enough to recognize key patterns but not so long that the information available to the detector is obsolete; (2) Understanding whether and how insiders may poison training-time data or craft evasive test-time behavior to trick a detector into falsely labeling a given target access as legitimate; (3) Logic inference, which aims to develop domain knowledge and use it to improve detection systems. A key goal is to reduce false positives so that follow-up is more effective. In the first phase of the project, we develop foundations for these three thrusts. In the second phase, we study how they can be integrated and used at scale.
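The re-learning thrust can be illustrated with a minimal sliding-window baseline: periodically refit simple statistics of each insider's access behavior and flag days that deviate sharply. The window length, threshold, and synthetic data below are illustrative choices, not the project's.

# Sketch of the "re-learning" idea: refit a per-user baseline of daily resource-access
# counts on a trailing window and flag days that deviate sharply from it.
import numpy as np

rng = np.random.default_rng(0)
daily_accesses = rng.poisson(lam=20, size=120).astype(float)   # synthetic benign history
daily_accesses[100] = 95                                        # one anomalous day

def flag_anomalies(series, window=30, z_threshold=4.0):
    """Re-learn mean/std over the trailing window; flag large deviations."""
    alerts = []
    for t in range(window, len(series)):
        hist = series[t - window:t]
        mu, sigma = hist.mean(), hist.std() + 1e-6
        if abs(series[t] - mu) / sigma > z_threshold:
            alerts.append(t)
    return alerts

print(flag_anomalies(daily_accesses))   # expected to include day 100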
Carl Gunter
George and Ann Fisher Distinguished Professor in Engineering
University of Illinois at Urbana-Champaign
Bo Li
Assistant Professor, Computer Science
University of Illinois at Urbana-Champaign
Gang Wang
Assistant Professor, Computer Science
University of Illinois at Urbana-Champaign
Insider threats have been one of the top security concerns facing large organizations. This team has identified three major challenges in modeling insider threats – the Rarity challenge, the Multi-modality challenge, and the Adaptivity challenge. The PIs propose multi-facet rare-event modeling of adaptive insider threats via novel AI models and algorithms. The work will be carried out through two major research thrusts: (1) Detection of Multi-facet Insider Threats; and (2) Modeling of Adaptive Insider Threats. If successful, these techniques will advance the state of the art in insider threat detection and other related areas – such as outlier detection, rare category analysis, multi-view learning, and adversarial learning. They are also expected to bridge multiple knowledge gaps, such as the relative performance of contrastive and non-contrastive learning for modeling multi-modality data, the impact of adversarial attacks from multiple domains, etc. Throughout the performance period of the project, the team will work closely with DevOps to integrate the resulting functions into the C3 AI Suite platform, to carry out evaluations using various public data sets (including the CERT Insider Threat data set), and to seek joint publication opportunities at top conferences and journals in AI and ML.
Jingrui He
Associate Professor, Information Sciences
University of Illinois at Urbana-Champaign
John Birge
Professor of Operations Management, Chicago Booth
University of Chicago
Artificial intelligence algorithms are known to be vulnerable to subtle adversarial perturbations of “normal” data that trick them into making large, potentially catastrophic mistakes. While such attacks have been demonstrated in visual domains (where they may be unrealistic, as an attacker typically cannot control inputs to a classifier at a pixel level), in cybersecurity domains they bring genuine, tangible security risks: they present a danger of having critical infrastructure uprooted by an attack on the AI itself. This project aims to develop new techniques for building AI methods that are simultaneously high-performing – in their accuracy and response time – and provably robust against broad categories of adversarial perturbations, with a specific focus on these cybersecurity domains. From a technical perspective, the work builds upon recently proposed semidefinite relaxation methods, which have shown promise in striking much better trade-offs between performance and safety than existing methods (previously, such methods had been considered intractable for even modest-sized AI systems); using recent advances in sparse semidefinite programming, we aim to develop verification approaches that scale linearly with network size (for underlying RNN or 1D CNN architectures) and thus can be applied to cybersecurity AI systems at real-world scale. A crucial advantage of our approach is that it allows provably robust models to be trained against semantic threat models, which match much more closely the adversarial perturbations that arise in the real world.
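For intuition about what a robustness certificate provides, the sketch below computes a much coarser bound than the proposed semidefinite relaxations: the product of layer spectral norms gives a global Lipschitz constant, and the logit margin then yields a radius within which the prediction provably cannot change. The network and input are random placeholders.

# For intuition only: a coarse certified-robustness bound for a small ReLU network.
import torch, torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 5))

# Global L2 Lipschitz upper bound: product of weight spectral norms (ReLU is 1-Lipschitz).
L = 1.0
for layer in net:
    if isinstance(layer, nn.Linear):
        L *= torch.linalg.matrix_norm(layer.weight, ord=2).item()

x = torch.randn(1, 20)
logits = net(x).squeeze(0)
top2 = logits.topk(2).values
margin = (top2[0] - top2[1]).item()

# Each logit difference is at most sqrt(2)*L-Lipschitz, so the prediction cannot
# change for any L2 perturbation smaller than margin / (sqrt(2) * L).
radius = margin / (2 ** 0.5 * L)
print(f"Lipschitz bound {L:.1f}, margin {margin:.3f}, certified L2 radius {radius:.2e}")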
Zico Kolter
Associate Professor
Carnegie Mellon University
Richard Zhang
Assistant Professor of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
As machine learning models grow in complexity and size, training models on very large datasets often requires significant computing resources. Cloud computing offers a low-cost, scalable approach to training these large, complex models, which are then served in applications. ML applications have varying requirements around networking, computing, privacy, threat models, and other factors. For example, privacy, latency, and bandwidth requirements result in the need to deploy ML applications close to or at the network edge. At the same time, malicious actors may attempt to evade an application’s ML decision boundaries in security-sensitive applications, or to alter decision boundaries by exerting influence during periodic re-training. Or they may simply engage in intellectual property theft and steal valuable models and parameters. In traditional deployments, anyone with physical access to deployment machines can examine ML algorithms, extract model parameters, and view the data provided to the model. This access creates substantial privacy and security challenges, and results in application developers limiting deployments to environments they can fully secure and control. This team aims to develop a novel C3 AI Suite plug-in approach that leverages trusted execution hardware to enable secure ML algorithms. In our proposed integration of Fog Robotics with the C3 AI Suite, applications execute against Data Capsules using Paranoid Stateful Lambdas, enabling robust ML applications to be built, securely trained, and served – preserving user privacy and preventing adversarial extraction of models and their parameters.
John Kubiatowicz
Professor, Electrical Engineering and Computer Sciences
University of California, Berkeley
Anthony Joseph
Professor, Electrical Engineering and Computer Sciences
University of California, Berkeley
Social engineering attacks, which exploit vulnerabilities in human behavior and cognition to infiltrate systems even in the presence of strong hardware and software measures, are a growing threat to global cybersecurity. In addition, large populations of new users in developing countries, as they enter the social media and technology market, often lack awareness of proper cyber hygiene. Nudges (i.e., “soft influencing” interventions, in contrast to mandates or monetary inducements) have the potential to successfully induce such desirable online behaviors. However, much remains to be understood regarding the principled design of cyber hygiene nudges, especially at scale. Key gaps include a lack of data regarding effective nudges that work across a diversity of contexts, as well as regarding available modalities to implement those nudges. This team aims to fill these gaps by addressing: (1) the dearth of data on social engineering threats and cyber-hygiene-related behaviors, especially in developing countries; and (2) the control-theoretic design of nudges, using machine learning-generated models as a key enabling link between the two.
Cedric Langbort
Professor, Aerospace Engineering
University of Illinois at Urbana-Champaign
Abhilash Mishra
Founding Director, Xu Initiative
University of Chicago
To improve the efficiency, resiliency, and sustainability of power systems and to address climate change, the operation of power systems is becoming data-centric. Major operational problems, such as security-constrained optimal power flow, contingency analysis, and transient stability analysis, rely on knowledge extracted from sensory data. Data manipulation by a malicious actor tampers with grid operation, with catastrophic consequences including physical damage to equipment and cascading failures. Developing frameworks and methodologies that help power operators protect the U.S. power grid against such malicious attacks is of utmost importance to national security. This team will address five objectives regarding cyberattacks on power systems, based on state-of-the-art AI methods: (1) designing graph neural networks that can process power data to learn the state of the system and detect cyberattacks; (2) developing AI algorithms that utilize image recognition techniques based on convolutional neural networks to detect denial of view and image replays resulting from cyberattacks; and (3) developing optimization techniques to robustify the previously designed neural networks against adversarial data. Selecting power system operating points and policies through attack-aware methods creates a resilient system. In case an attack is not immediately sensed, operating from such a position of strength buys time for detection algorithms to work. Objectives 4 and 5 aim to develop attack-aware AI methods via distributionally robust optimization and cascading failure analysis.
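Objective 1's graph-neural-network ingredient boils down to propagation steps like the one sketched below, where each bus's measurements are mixed with those of its electrical neighbors; the 4-bus topology, features, and weights are invented, and a detection head would sit on top in the project's setting.

# Minimal sketch of one graph-convolution layer over per-bus measurements.
import numpy as np

A = np.array([[0, 1, 1, 0],        # adjacency of a hypothetical 4-bus network
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                              # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt           # symmetric normalization

X = np.random.default_rng(0).normal(size=(4, 3))   # per-bus measurements (e.g., V, P, Q)
W = np.random.default_rng(1).normal(size=(3, 8))   # learnable weights (random here)

H = np.maximum(A_norm @ X @ W, 0.0)                # one GCN layer: ReLU(A_norm X W)
print(H.shape)                                      # (4, 8) node embeddings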
Javad Lavaei
Associate Professor, Industrial Engineering and Operations Research
University of California, Berkeley
Somayeh Sojoudi
Assistant Professor of Electrical Engineering & Computer Science
University of California, Berkeley
Steven Low
Professor of Computing and Mathematical Sciences and Electrical Engineering
California Institute of Technology
Jan Kronqvist
Assistant Professor
KTH Royal Institute of Technology
Jeremy Lawrence
Principal Technical Leader
Electric Power Research Institute
Federated Learning (FL) is emerging as a promising approach to enable scalable intelligence over next-generation AI systems. It transforms the machine learning ecosystem from “centralized over the cloud” to “decentralized over local users,” in order to alleviate the communication bottleneck of pooling massive amounts of data from millions of local users and to strengthen users’ privacy by avoiding data egress from their local devices. Several cybersecurity systems – such as intrusion detection, malware detection, and spam detection systems – have been deployed in such decentralized settings. Despite significant milestones in deploying FL as critical infrastructure, such as distributed spam detection systems, several fundamental challenges need to be addressed. First, once AI ecosystems are decentralized across local users, there is potential for advanced privacy breaches in FL systems. For instance, models learned from local users’ email data can leak private information even when only model gradients are sent to the server. Second, since it is hard to verify the identities of a large number of local users, adversarial attacks could happen during both the training and testing stages. The overarching goal of this proposal is to address these security and privacy challenges and enable scalable and secure intelligence for distributed cybersecurity systems. In particular, we propose REFL, which aims to answer three critical questions: (1) how to ensure the resilience and security of distributed systems against training-time attacks; (2) how to ensure resilience and security against test-time attacks; and (3) how to ensure privacy for local users in the distributed setting.
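A minimal federated-averaging loop, sketched below on a toy least-squares task, shows the decentralized setup REFL starts from: clients train locally on private data and send only model parameters, which the server aggregates. The data, learning rates, and the suggested median aggregation are illustrative, not the proposal's specific defenses.

# Minimal federated-averaging sketch on synthetic client data.
import numpy as np

rng = np.random.default_rng(0)

def local_update(global_w, client_data, lr=0.1, epochs=5):
    """Toy local training: gradient steps on the client's private least-squares problem."""
    X, y = client_data
    w = global_w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(10):
    X = rng.normal(size=(50, 3))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

global_w = np.zeros(3)
for rnd in range(20):
    updates = np.stack([local_update(global_w, c) for c in clients])
    global_w = updates.mean(axis=0)   # FedAvg; np.median(updates, axis=0) is one more robust alternative
print(global_w.round(2))               # should approach [1.0, -2.0, 0.5]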
Bo Li
Assistant Professor, Computer Science
University of Illinois at Urbana-Champaign
Dawn Song
Professor, Electrical Engineering and Computer Sciences
University of California, Berkeley
This team proposes developing the foundations for a unified infrastructure combining machine learning and program analysis to identify vulnerabilities in JavaScript programs. We aim to achieve low false-negative and false-positive rates and low overhead. Our proposed project includes the following tasks: (1) Machine learning. We will build predictive models for fast, precise, and scalable vulnerability detection in JavaScript programs. The models will be trained on a corpus of programs that are labeled as vulnerable or non-vulnerable based on the results of program analysis, as well as on new vulnerable programs obtained via mutations generated in collaboration with an industrial partner. (2) Program analysis. We will complement existing program analysis tools with improved dynamic techniques for identifying vulnerabilities in JavaScript programs. We will use their output to provide training data for Task 1 and, in concert with ML-based vulnerability detection, to achieve better coverage and precision. (3) Integration and evaluation. We will integrate the ML-based and program analysis tools to achieve better accuracy and lower overhead. We will evaluate the performance and effectiveness of this integrated approach on deployed web applications and real Node.js packages. We will leverage the C3 AI Suite for unifying multi-source and heterogeneous data, automating ML pipelines, and versioning experimental results.
Corina Pasareanu
Principal Systems Scientist
Carnegie Mellon University
Limin Jia
Associate Research Professor
Carnegie Mellon University
Lujo Bauer
Professor of Electrical and Computer Engineering
Carnegie Mellon University
Hakan Erdogmus
Teaching Professor, Electrical and Computer Engineering
Carnegie Mellon University
Ruben Martins
Assistant Research Professor
Carnegie Mellon University
This project aims at the development of new theoretical and algorithmic AI tools to locate attack sources with high accuracy, low computational complexity, and low sample complexity. It has two mutually reinforcing research thrusts. Thrust 1 develops theoretical tools based on statistical learning, graph theory, and stochastic processes to understand the fundamental limits of source localization and to derive important features and structural properties for source localization. Thrust 2 develops novel AI algorithms based on graph neural networks (GNN) and guided by the theories in Thrust 1 to quickly and accurately locate attack sources. The success of this project will be measured by two criteria: the impact of the fundamental results, and the accuracy, scalability, and efficiency of the resulting algorithms and software toolkits.
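A classical baseline for the source-localization problem, which the proposed GNN-based estimators aim to improve upon, is the Jordan center: the node minimizing the maximum graph distance to the observed compromised nodes. The sketch below runs it on a synthetic graph; the graph and the two-hop "infection" pattern are illustrative.

# Simple baseline for attack-source localization on a synthetic graph.
import networkx as nx

G = nx.random_geometric_graph(100, 0.18, seed=7)
source = 0
# Nodes within two hops of the source stand in for observed compromised nodes.
infected = set(nx.single_source_shortest_path_length(G, source, cutoff=2))

def jordan_center(graph, observed):
    """Return the node minimizing the maximum distance to all observed nodes."""
    best, best_ecc = None, float("inf")
    for cand in graph.nodes:
        lengths = nx.single_source_shortest_path_length(graph, cand)
        if not all(o in lengths for o in observed):
            continue                          # candidate cannot reach all observations
        ecc = max(lengths[o] for o in observed)
        if ecc < best_ecc:
            best, best_ecc = cand, ecc
    return best

print("true source:", source, "estimate:", jordan_center(G, infected))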
H. Vincent Poor
Professor
Princeton University
Hanghang Tong
Associate Professor
University of Illinois at Urbana-Champaign
Lei Ying
Professor of Electrical Engineering and Computer Science
University of Michigan
With the integration of information and communications technology and intelligent electronic devices, substation automation systems (SAS) greatly boost the efficiency of power system monitoring and control. However, at the frontier of the wide-area monitoring and control infrastructure of a bulk power system, substations bring new vulnerabilities and are attractive targets for attackers. In this project, we will research, develop, and validate algorithms to defend against cyberattacks that aim to disrupt operations of a substation by maliciously changing measurements and/or spoofing spurious control commands. We propose multiple use-inspired AI innovations that leverage concurrent capabilities of SAS to transform the cybersecurity of power systems, including (1) a framework that synergizes optimization-based attack modeling with inverse reinforcement learning for multi-stage attack detection; (2) a decision-focused distributed CPS modeling approach; and (3) an adversarial-unlearning framework for spoofing detection based on mathematical programs with equilibrium constraints.
Alberto Sangiovanni-Vincentelli
Professor
University of California, Berkeley
Ming Jin
Assistant Professor of Electrical and Computer Engineering
Virginia Tech
Carlo Fischione
Professor
KTH Royal Institute of Technology
Chen-Ching Liu
American Electric Power Professor
Virginia Tech
Securing critical energy infrastructures is an urgent concern for the safety and security of societal functions. Critical infrastructure operators are unfortunately constrained by resources and personnel, and may not have the degree of sophistication that “hyperscale” IT infrastructures possess today in terms of global-scale visibility, complex security DevOps workflows, and the ability to afford bespoke solutions. We argue that AI- and ML-driven workflows can help level the playing field for critical energy infrastructure operators by automating security-relevant workflows and providing early detection of novel threats. We identify key challenges in data access, model exploration, and automation that must be addressed to realize these benefits. We envision a novel, open, software-defined, collaborative AI/ML cybersecurity stack to tackle these challenges and help democratize the benefits of AI/ML-driven automation for security infrastructures, building on three key research thrusts: (1) Using high-fidelity synthetic data to augment normal and anomalous traffic and log profiles and enable sharing of insights across the ecosystem; (2) A modular, reusable, extensible ML pipeline for developing novel AI/ML detection and response defenses and systematizing the knowledge on applying AI/ML to critical infrastructure security; and (3) A self-driving, software-defined network infrastructure that can enable the design and implementation of novel reprogrammable defenses driven by AI/ML logic.
Vyas Sekar
Tan Family Professor, Electrical and Computer Engineering
Carnegie Mellon University
Giulia Fanti
Assistant Professor, Electrical and Computer Engineering
Carnegie Mellon University
Lawrence Pileggi
Professor of Electrical and Computer Engineering
Carnegie Mellon University
Lujo Bauer
Professor of Electrical and Computer Engineering
Carnegie Mellon University
Anthony Rowe
Professor of Electrical and Computer Engineering
Carnegie Mellon University
Machine learning methods, including deep neural networks (DNNs) and reinforcement learning (RL), are now widely used in a range of mission-critical and safety-critical infrastructure applications, in transportation, finance, healthcare, and cloud computing, among others. There is growing recognition that ML models are vulnerable to a range of adversarial attacks, including training-time attacks, test-time attacks, and system-level attacks. Adversarial analyses to find such vulnerabilities mostly operate at the component level, finding perturbations to inputs that produce incorrect or undesirable outputs. However, there is a pressing need for methods that find vulnerabilities that matter in the context of the overall system, its specification, and its operating environment, not just at the component level. This team seeks to develop such semantic adversarial analysis techniques to find vulnerabilities that impact the semantics of the overall system. Our approach is based on formal methods and formally guided simulation of ML-based systems. The research seeks to integrate these analysis methods into the ML design and maintenance pipeline, enabling verification-guided design of ML components such as deep neural networks to guarantee desired specifications, as well as run-time monitoring and assurance of their operation in the field. We will demonstrate our research on the adversarial analysis of ML-based infrastructure systems, such as in transportation, and on the robust design of ML components. A key component of the project will be integrating the open-source tools Scenic and VerifAI with the C3 AI Suite, and leveraging the C3 AI Suite computing platform for scalable simulation and verification of ML systems.
Sanjit A. Seshia
Professor
University of California, Berkeley
Yasser Shoukry
Assistant Professor
University of California, Irvine
With the development of blockchain technology, decentralized finance (DeFi) has become an important player in today’s economy, attracting hundreds of billions of dollars and enabling novel financial applications. However, the rapid growth of DeFi has also led to many security issues and large-scale attacks. In 2021 alone, DeFi attacks caused close to $1 billion in financial losses, yet DeFi security has not received attention commensurate with this severity. This team will design and develop new techniques combining machine learning and security to build the first DeFi Intelligence Platform, an advanced security infrastructure to strengthen security in the fast-growing DeFi ecosystem. The proposed platform collects, fuses, and analyzes data from both off-chain natural-language DeFi reports and on-chain account and transaction details. It gathers the intelligence in the DeFi space into a dynamic DeFi knowledge graph, brings new angles to solving many existing DeFi security problems, and powers new AI-based DeFi security applications.
Dawn Song
Professor of Computer Science
University of California, Berkeley
Daniel Klein
Professor
University of California, Berkeley
Christine Parlour
Professor, Haas School of Business
University of California, Berkeley
Jiantao Jiao
Assistant Professor
University of California, Berkeley
Bo Li
Assistant Professor
University of Illinois at Urbana-Champaign
Blockchains underpin the decentralization revolution in critical infrastructure, including the Internet. The revolution, presently most conspicuous in cryptocurrencies, is driven by broader blockchain technology and is popularly collected under the rubric of Web3. This new infrastructure will be driven by users with cryptographic identities and incentives aligned to benefit both suppliers and consumers in proportion to their participation. Blockchain designs, although nascent, are rapidly being consolidated and their scientific basis solidified. However, a key missing component is the design of disincentives for deviating from prescribed protocol behavior: the same permission-free access that blockchains provide to participants also engenders deviant (and malicious) behavior. This team will study the fundamental limits and associated protocols of attributing identity to malicious actors, with cryptographic integrity to the extent possible. We refer to this study as blockchain forensics, and conduct this research in the context of both blockchain protocols and blockchain applications. The forensic “support” of a blockchain protocol is a fundamental property of the underlying Byzantine fault tolerant (BFT) consensus protocol. We seek to characterize the forensic support of the BFT protocols underlying five major central bank digital currency (CBDC) initiatives. We then focus on Non-Fungible Token (NFT) marketplaces, a prominent digital-collectible application revolutionized by blockchains; here, we seek both to identify malicious trades (e.g., wash trades) and to attribute malicious actions. The research features novel machine learning that combines cryptographic traces and protocol specifications, together with data transcripts.
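One concrete forensic signal in the NFT setting mentioned above is that wash trades often appear as short cycles in a token's transfer graph (the token returns to a wallet that already held it). The sketch below detects such cycles on hypothetical transfer records; a real pipeline would of course fuse many more signals.

# Illustrative wash-trade signal: cycles in a single token's transfer graph.
import networkx as nx

transfers = [                                   # (token_id, seller, buyer); hypothetical records
    ("nft42", "walletA", "walletB"),
    ("nft42", "walletB", "walletC"),
    ("nft42", "walletC", "walletA"),            # token returns to walletA
    ("nft77", "walletD", "walletE"),
]

def wash_trade_cycles(records, token_id):
    """Return cycles in the directed transfer graph of one token."""
    G = nx.DiGraph()
    G.add_edges_from((seller, buyer) for tok, seller, buyer in records if tok == token_id)
    return list(nx.simple_cycles(G))

print(wash_trade_cycles(transfers, "nft42"))    # one 3-wallet cycle, a wash-trade signal
print(wash_trade_cycles(transfers, "nft77"))    # no cycles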
Pramod Viswanath
Professor
University of Illinois at Urbana-Champaign
This team will develop AI methods for detecting cyberthreats and attacks on computer systems, as well as new AI foundations to support cybersecurity. The breakthroughs in machine learning over the past decade provide a great opportunity to put these advances to work defending computer systems and critical infrastructure, and they create a need for new methods that enable organizations to collaborate to detect attacks and protect their data. We will advance these areas through a team that melds expertise in computer security and AI.
David Wagner
Professor of Computer Science
University of California, Berkeley
Michael Jordan
Distinguished Professor
University of California, Berkeley
Vern Paxson
Professor Emeritus
University of California, Berkeley
Jacob Steinhardt
Assistant Professor
University of California, Berkeley
Nika Haghtalab
Assistant Professor
University of California, Berkeley
Wenke Lee
Professor
Georgia Institute of Technology
Determining fundamental bounds on robustness for machine learning algorithms is of critical importance for securing cyberinfrastructure. Machine learning is ubiquitous but prone to severe vulnerabilities, particularly at deployment. Adversarial modifications of inputs can induce misclassification, with catastrophic consequences in safety-critical systems. This team will develop a framework to obtain lower bounds on robustness for any supervised learning algorithm (classifier), when the data distribution and adversary are specified. The framework will work with a general class of distributions and adversaries, encompassing most proposed in prior work. It can be extended to obtain lower bounds on robustness for any pre-trained feature extractor or family of classifiers, and for multiple attackers operating in tandem. Its implications for training and deploying robust models are numerous and consequential. Perhaps the most important is enabling algorithm designers to obtain a robustness score for either a specific classifier or a family of classifiers. For any adversary, they can compute this score as the gap to the best achievable performance, where the optimal performance is the equilibrium of a classification game between the adversary and the classifier. Robustness scores can also be determined for pre-trained feature extractors, widely used in transfer learning, enabling designers to pick robust feature extractors. Robust training can also be improved via byproducts of the framework, which enables the identification of hard points, provides optimal soft labels for use during training, and enables better architecture search for robustness by identifying model layers and hyperparameters that affect robustness.
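A small concrete instance of such a lower bound: if two points with different labels have overlapping perturbation balls, any classifier must err on at least one of them under attack, and disjoint such pairs add up, so a maximum matching in this "conflict graph" lower-bounds the adversarial error of every classifier. The sketch below computes this bound on synthetic 2-D data; the data and budget are illustrative.

# Matching-based lower bound on adversarial error for ANY classifier (synthetic data).
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
eps = 0.8                                        # adversary's L2 budget
X0 = rng.normal(loc=-1.0, size=(50, 2))          # class 0 samples
X1 = rng.normal(loc=+1.0, size=(50, 2))          # class 1 samples

G = nx.Graph()
for i, a in enumerate(X0):
    for j, b in enumerate(X1):
        if np.linalg.norm(a - b) <= 2 * eps:     # perturbation balls overlap
            G.add_edge(("c0", i), ("c1", j))

matching = nx.max_weight_matching(G, maxcardinality=True)
n_total = len(X0) + len(X1)
print(f"any classifier has adversarial error >= {len(matching)} / {n_total} "
      f"= {len(matching) / n_total:.2f} on this sample at eps = {eps}")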
Ben Zhao
Professor of Computer Science
University of Chicago
Daniel Cullina
Assistant Professor
Pennsylvania State University
For external-facing systems in real-world settings, few if any security measures offer full protection against all attacks. In practice, digital forensics and incident response (DFIR) provide a complementary security tool that focuses on using post-attack evidence to trace a successful attack back to its root cause. Not only can forensic tools help identify (and patch) points of vulnerability responsible for successful attacks (e.g., breached servers, unreliable data-labeling services), they also provide a strong deterrent against future attackers through the threat of post-attack identification. This is particularly attractive for machine learning systems, where defenses are routinely broken soon after release by more powerful attacks. This team plans to build forensic tools that boost the security of deployed ML systems by using post-attack analysis to identify the key factors leading to a successful attack. We consider two broad types of attacks: “poison” attacks, where corrupted training data embeds misbehaviors into a model during training, and “inference-time” attacks, where an input is augmented with a model-specific adversarial perturbation. For poison attacks, we propose two complementary methods to identify the training data responsible for the misbehavior, one using selective unlearning and one using computation of the Shapley value from game theory. For inference-time attacks, we will explore the use of hidden labels to shift feature representations, making it possible to identify the source model of an adversarial example. Given promising early results, our goal is both a principled understanding of these approaches and a suite of usable software tools.
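The Shapley-value direction can be sketched with a standard permutation-sampling estimator of each training point's contribution to a misbehavior score; the misbehavior_score stub below stands in for retraining on a subset and measuring the attack's success, and all names and data are hypothetical rather than the project's own method.

# Sketch of Shapley-value attribution for poison forensics via permutation sampling.
import random

train_ids = list(range(8))
poisoned = {2, 5}                               # ground truth, unknown to the analyst

def misbehavior_score(subset):
    """Stub utility: how much misbehavior a model trained on `subset` would exhibit."""
    return sum(0.5 for i in subset if i in poisoned)

def shapley_estimates(ids, utility, n_perms=2000, seed=0):
    rng = random.Random(seed)
    phi = {i: 0.0 for i in ids}
    for _ in range(n_perms):
        perm = ids[:]
        rng.shuffle(perm)
        subset, prev = set(), utility(set())
        for i in perm:                           # marginal contribution of i in this ordering
            subset.add(i)
            cur = utility(subset)
            phi[i] += (cur - prev) / n_perms
            prev = cur
    return phi

scores = shapley_estimates(train_ids, misbehavior_score)
print(sorted(scores, key=scores.get, reverse=True)[:2])   # points 2 and 5 rank highest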
Ben Zhao
Professor of Computer Science
University of Chicago
Haitao Zheng
Professor of Computer Science
University of Chicago
Bo Li
Assistant Professor
University of Illinois at Urbana-Champaign