Nuwa 女娲
Frontier AI Safety Lab
Transparent, third-party, open infrastructure for frontier AI safety evaluation and governance.
Nuwa studies frontier AI risks, agent safety, autonomy risks, deception, scheming, and loss of control. With support from Whitzard, Nuwa develops open evaluation frameworks, benchmarks, technical notes, and governance evidence for safe and controllable AI.
Supported by Whitzard
Repair
Identify and fix safety gaps
Boundary
Define safe operational limits
Prudence
Cautious evaluation of risks
Guided
Evidence-based governance
Transparent Third-Party Evaluation
Nuwa's approach to independent frontier AI safety evaluation and governance evidence.
Transparent
Public research notes, open evaluation assumptions, and clear evidence standards.
Third-Party
Independent safety evaluation and governance evidence for frontier AI systems.
Open Infrastructure
Benchmarks, tools, datasets, and frameworks that support broader AI safety research.
Governance-Oriented
Technical evidence that can support responsible AI governance and public accountability.
Research Themes
Core themes driving our frontier AI safety research.
Autonomy Risks
Self-replication, self-proliferation, and autonomous resource acquisition in frontier AI.
Deception
AI deception in human-AI interaction, developer-facing deception, and trust dynamics.
Scheming
Evaluation faking, situational awareness, and observer effects in safety evaluations.
Loss-of-Control
Conditions under which AI systems may evade oversight or resist shutdown.
Agent Safety
Behavioral safety, thought correction, and guardrails for AI agent systems.
Evaluation
Executable risk-evaluation environments, benchmarks, and evaluation integrity.
Governance Evidence
Technical evidence supporting responsible AI governance and public accountability.
Featured Research
arXiv preprint · 2024
Frontier AI systems have surpassed the self-replicating red line
Xudong Pan, Jiarun Dai, Yihe Fan, Min Yang
Evaluates whether frontier AI systems can autonomously self-replicate and reports successful self-replication in controlled trials.
arXiv preprint; work in progress · 2025
Large language model-powered AI systems achieve self-replication with no human intervention
Xudong Pan, Jiarun Dai, Yihe Fan, Minyuan Luo, Changyi Li, Min Yang
Extends self-replication evaluation across 32 AI systems and reports autonomous replication, self-exfiltration, adaptation, and shutdown-survival behaviors.
Nüwa Project preprint · 2026
One Step from Silicon Life: Autonomous AI Agents Capable of Uncontrolled Self-Proliferation
Geng Hong, Xudong Pan, Jiarun Dai, Jiaqi Luo, Wuyuao Mai, Min Yang
Demonstrates autonomous agents acquiring external computational resources and propagating across remote devices under controlled, simulated real-world conditions.
arXiv preprint · 2025
Shell or Nothing: Real-World Benchmarks and Memory-Activated Agents for Automated Penetration Testing
Wuyuao Mai, Geng Hong, Qi Liu, Jinsong Chen, Jiarun Dai, Xudong Pan, Yuan Zhang, Min Yang
Introduces TermiBench and TermiAgent to evaluate and improve autonomous penetration-testing agents in realistic shell-acquisition settings.
Publications & Papers
arXiv preprint · 2026
CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly
Yihe Fan, Changyi Li, Lichen Xu, Xudong Pan, Jiarun Dai, Geng Hong, Min Yang
Proposes a cybersecurity agent framework that iteratively revises its own scaffold based on failed attempts, adapting across targets and failure modes.
Follow Nuwa's research updates on frontier AI risk, agent safety, and controllable AI.
We collaborate on frontier AI safety evaluation, agent safety research, and runtime defense.