AlignADV

From Attacks to Curricula:
Learnability-Guided Adversarial Training
for Safe Autonomous Driving

Yuewen Mei¹, Tong Nie¹,², Jie Sun¹, Haotian Shi¹, Wei Ma², Jian Sun¹

¹Tongji University    ²The Hong Kong Polytechnic University

AlignADV shifts closed-loop adversarial training from an attack-oriented paradigm to a learnability-guided curriculum loop with resolvability-aligned scenario generation and capability-aligned scenario sampling.

Abstract

Closed-loop adversarial training has emerged as a vital paradigm for enhancing the safety of autonomous driving policies by enabling them to learn from rare safety-critical scenarios. Standard training pipelines typically generate adversarial scenarios first, then sample them for policy optimization. However, most existing frameworks remain attack-oriented. Driven primarily by collision maximization, current generators often synthesize practically unsolvable extreme situations, thereby degrading the learning process. Furthermore, conventional heuristic and simplified sampling strategies ignore the continuously evolving capability of the driving policy, leading to sample inefficiency and delayed convergence. To overcome these limitations, we propose AlignADV, a learnability-guided closed-loop adversarial training framework designed to convert adversarial scenarios into resolvable and capability-aligned curricula. First, we reformulate adversarial scenario generation as a preference alignment problem and employ direct preference optimization to guide the generator toward critical yet resolvable scenarios. Second, we introduce the concept of behavioral fingerprint to extract the intrinsic characteristics of the evolving policy and construct a multi-modal capability prediction model that accurately evaluates policy performance without expensive simulations. By combining the resolvability-aligned scenario set with these capability predictions, we develop a dynamic curriculum sampling mechanism that prioritizes scenarios targeting the exact vulnerabilities of the current policy. Comprehensive experiments using the Waymo Open Motion Dataset demonstrate that AlignADV significantly improves both convergence efficiency and final performance, reducing training steps by up to 40.6% compared to baseline methods, while reducing collision rate and improving route completion rate in both normal and adversarial traffic conditions. 
These results highlight a shift from attack-oriented scenario generation to learnability-guided policy improvement, offering a principled direction for safer and more efficient autonomous driving training.

Problem Background

The efficacy of adversarial training depends on two key factors: what scenarios to generate and how to use these scenarios to maximize learning efficiency and policy performance.

Figure: Overview of the research motivation and the AlignADV framework.

Two bottlenecks in attack-oriented adversarial training

  • Practically unsolvable adversarial scenarios. Collision-maximizing generators can synthesize extreme situations where no reasonable collision-avoidance response exists, which weakens learning feedback and may induce overly conservative behavior.
  • Capability-mismatched scenario sampling. Heuristic or simplified sampling strategies ignore the continuously evolving capability of the driving policy, making sampled scenarios too easy for the current policy and reducing training efficiency.

Method Overview

The primary objective of AlignADV is to shift from the conventional unidirectional attack paradigm to a learnability-guided curriculum loop. In this work, learnability serves as an operational principle that guides AlignADV to jointly pursue resolvability-aligned scenario generation and capability-aligned scenario sampling.

Figure: Overview of AlignADV.
  • Resolvable Adversarial Scenario Generation fine-tunes a pretrained adversarial generator with expert-evaluated preference pairs and Direct Preference Optimization, guiding generated scenarios toward critical yet resolvable interactions.
  • Dynamic Policy Capability Prediction represents the evolving policy with behavioral fingerprints and predicts scenario-specific success probabilities before expensive simulation.
  • Capability-Aligned Curriculum Sampling combines the resolvable scenario library with predicted policy capabilities to construct a dynamic curriculum distribution targeting the current policy's predicted vulnerabilities.
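
The first and third components above can be illustrated with a minimal sketch. It is not the authors' implementation: `dpo_loss` is the standard Direct Preference Optimization loss for a single preference pair, and `curriculum_distribution` assumes, purely for illustration, a softmax over predicted failure probability. The function names, `beta`, `temperature`, and the scenario IDs are all hypothetical, and the page does not specify the actual form of the curriculum distribution.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) scenario pair.

    logp_w / logp_l: log-likelihood of the preferred (resolvable) and
    rejected scenario under the generator being fine-tuned.
    ref_logp_*: the same quantities under the frozen pretrained generator.
    beta is a hypothetical value, not taken from the paper.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


def curriculum_distribution(success_probs, temperature=0.5):
    """Assumed weighting: softmax over predicted failure probability.

    Scenarios the predictor expects the current policy to fail on
    receive more sampling mass. This is an illustrative choice only.
    """
    failure = [1.0 - p for p in success_probs]
    peak = max(failure)  # subtract the max for numerical stability
    scores = [math.exp((f - peak) / temperature) for f in failure]
    total = sum(scores)
    return [s / total for s in scores]


def sample_batch(scenario_ids, success_probs, k, rng):
    """Draw k scenarios (with replacement) from the curriculum distribution."""
    weights = curriculum_distribution(success_probs)
    return rng.choices(scenario_ids, weights=weights, k=k)


# Hypothetical usage: three scenarios with predicted success probabilities.
rng = random.Random(0)
batch = sample_batch(["s0", "s1", "s2"], [0.95, 0.50, 0.10], k=4, rng=rng)
```

In this sketch the predicted success probabilities would be recomputed as the policy evolves, so the sampling weights shift with the policy's capability rather than staying fixed.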

Results

Experiments are conducted in MetaDrive by reconstructing real-world traffic logs from the Waymo Open Motion Dataset. The evaluation verifies scenario solvability, behavioral fingerprints and capability prediction, and closed-loop adversarial training performance.

  • 40.6%: training steps saved compared to vanilla adversarial training
  • 99.87%: solvability rate after preference optimization
  • 0.885: precision for the top 10% of negative predictions
  • 26%: relative reduction in crash rate compared with vanilla adversarial training

Scenario solvability

The fine-tuned generator sharply reduces unsolvable cases while preserving criticality and behavioral diversity.

Model       | Unsolvable Scenarios ↓ | Solvability Rate ↑ | Average TTC ↓ | APD ↑   | FPD ↑   | Variance ↑
Pre-trained | 606                    | 96.05%             | 0.582 s       | 2.771 m | 8.369 m | 2.696
Fine-tuned  | 20                     | 99.87%             | 0.577 s       | 2.709 m | 8.284 m | 2.779

Behavioral fingerprints and capability prediction

The evolution of behavioral fingerprints forms a continuous trajectory across rapid learning, exploration, and convergence, reflecting the evolving competence of the policy. The capability predictor further combines these fingerprints with local scenario context to estimate success probabilities before simulation, reaching 0.885 precision when identifying the top 10% most challenging cases.
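
One plausible reading of the 0.885 figure is precision within the 10% of cases that the predictor ranks as most likely to fail. The sketch below computes that quantity under this assumption; the function name and the toy data are hypothetical, and the exact evaluation protocol is not given on this page.

```python
def precision_at_top_fraction(success_probs, succeeded, fraction=0.10):
    """Precision of failure prediction among the hardest-ranked cases.

    success_probs: predicted success probability per scenario.
    succeeded: True if the policy actually succeeded in simulation.
    fraction: share of scenarios, ranked by lowest predicted success,
    that are treated as predicted failures (assumed definition).
    """
    if len(success_probs) != len(succeeded) or not success_probs:
        raise ValueError("inputs must be non-empty and of equal length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(fraction * len(success_probs)))
    # Indices sorted from lowest to highest predicted success probability.
    ranked = sorted(range(len(success_probs)), key=lambda i: success_probs[i])
    true_failures = sum(1 for i in ranked[:k] if not succeeded[i])
    return true_failures / k


# Toy data: the two lowest-ranked cases are one real failure and one success.
probs = [0.90, 0.10, 0.80, 0.20, 0.70, 0.30, 0.60, 0.40, 0.50, 0.05]
outcomes = [True, True, True, False, True, False, True, True, True, False]
toy_precision = precision_at_top_fraction(probs, outcomes, fraction=0.2)
```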

Closed-loop adversarial training

Normal scenarios

Method       | Crash Rate ↓    | Route Comp. ↑   | Cost ↓        | Reward ↑     | Steps Saved ↑
Replay       | 18.65% ± 2.70%  | 73.69% ± 1.99%  | 0.482 ± 0.026 | 49.01 ± 1.99 | --
Vanilla Adv  | 15.95% ± 1.30%  | 75.73% ± 1.53%  | 0.461 ± 0.003 | 50.68 ± 1.84 | Baseline
Solvable Adv | 14.79% ± 2.57%  | 77.84% ± 1.14%  | 0.432 ± 0.021 | 53.43 ± 1.53 | 27.1%
Static       | 14.70% ± 2.32%  | 74.47% ± 2.44%  | 0.468 ± 0.043 | 50.45 ± 2.73 | 7.7%
History      | 14.85% ± 1.07%  | 77.13% ± 0.69%  | 0.434 ± 0.024 | 51.90 ± 1.07 | 3.9%
Ours         | 11.80% ± 1.78%  | 79.30% ± 1.54%  | 0.404 ± 0.036 | 54.15 ± 1.36 | 40.6%

Adversarial scenarios

Method       | Crash Rate ↓    | Route Comp. ↑   | Cost ↓        | Reward ↑     | Steps Saved ↑
Replay       | 40.31% ± 1.49%  | 65.77% ± 1.05%  | 0.653 ± 0.008 | 42.23 ± 1.30 | --
Vanilla Adv  | 35.77% ± 1.95%  | 67.68% ± 2.39%  | 0.634 ± 0.023 | 43.16 ± 1.96 | Baseline
Solvable Adv | 33.27% ± 1.34%  | 70.10% ± 1.54%  | 0.601 ± 0.017 | 45.39 ± 1.57 | 5.8%
Static       | 35.50% ± 4.37%  | 66.21% ± 3.69%  | 0.639 ± 0.030 | 42.70 ± 2.58 | 0.0%
History      | 34.97% ± 1.85%  | 69.22% ± 1.31%  | 0.609 ± 0.027 | 44.42 ± 1.30 | 17.4%
Ours         | 31.37% ± 3.51%  | 71.66% ± 1.93%  | 0.588 ± 0.026 | 46.24 ± 1.91 | 30.9%

Overall, compared with the adversarial training baseline, AlignADV uses only 59% of the training steps to reach the established performance thresholds and delivers a 26% relative reduction in crash rate.
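
The headline numbers can be reproduced from the mean values in the two results tables. The short check below is only arithmetic on those reported means; the variable names are hypothetical, and it suggests that the 59% and 26% figures correspond to the normal-scenario table.

```python
def relative_reduction(baseline, value):
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline


# Mean crash rates (%) copied from the Vanilla Adv and Ours rows.
crash = {
    "normal": {"vanilla": 15.95, "ours": 11.80},
    "adversarial": {"vanilla": 35.77, "ours": 31.37},
}
# "Steps Saved" reported for Ours, as fractions.
steps_saved = {"normal": 0.406, "adversarial": 0.309}

for split in crash:
    reduction = relative_reduction(crash[split]["vanilla"], crash[split]["ours"])
    steps_used = 1.0 - steps_saved[split]
    print(f"{split}: crash reduction {reduction:.1%}, steps used {steps_used:.1%}")
```

For normal scenarios this gives a crash reduction of about 26.0% with 59.4% of the steps used; for adversarial scenarios the same calculation gives about 12.3% and 69.1%.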

BibTeX

@article{mei2026alignadv,
  title={From Attacks to Curricula: Learnability-Guided Adversarial Training for Safe Autonomous Driving},
  author={Mei, Yuewen and Nie, Tong and Sun, Jie and Shi, Haotian and Ma, Wei and Sun, Jian},
  year={2026}
}