Whitzard 白泽

Open Ecosystem

Open tools, models, datasets, benchmarks, and infrastructure for AI safety evaluation and agent safety.

6
Runtime Safety
5
Safety Models
4
Agent Infrastructure
4
Evaluation
5
Cyber Data
Core Runtime Security

AgentGuard

Attribute-based access control framework for tool-use LLM agents, with policy specification, runtime inspection, and auditing support.

Agent Runtime Safety

Runtime safety tools that monitor, correct, authorize, and contain AI agent behavior before unsafe actions occur.

AgentGuard

Core

Attribute-based access control framework for tool-use LLM agents, with policy specification, runtime inspection, and auditing support.

qise

Open Source

AI-first runtime security framework for AI agents, centered on multi-layer guards, SLM/LLM/rule checks, and fail-closed execution protection.

Thought-Aligner

Open Source

Plug-and-play thought-level correction module for improving behavioral safety of tool-use agents before risky actions are executed.

MirrorGuard

Open Source

Simulation-to-real reasoning-correction framework and VLM for safer computer-use agents operating over GUI environments.

ReasoningShield

Open Source

Content-safety detection system for monitoring reasoning traces of large reasoning models.

XuanwuBox

Coming Soon

Secure execution layer for agentic runtime environments, positioned as an AI security advisor inside Docker-style agent sandboxes.

Guardrail & Safety Models

Lightweight safety models for thought correction, GUI-agent safety, intent modeling, trust estimation, and reasoning safety.

Thought-Aligner

Open Source

Plug-and-play thought-level correction module for improving behavioral safety of tool-use agents before risky actions are executed.

MirrorGuard

Open Source

Simulation-to-real reasoning-correction framework and VLM for safer computer-use agents operating over GUI environments.

IntentNet

Open Source

Fine-tuned model for evaluating whether an AI agent's reasoning contains deceptive, manipulative, or malicious intent in multi-turn interactions.

TrustNet

Open Source

Fine-tuned model for scoring a user's degree of trust in AI responses during multi-turn human-AI interactions.

ReasoningShield

Open Source

Content-safety detection system for monitoring reasoning traces of large reasoning models.

Agent Infrastructure

Frameworks, simulators, intermediate representations, and trajectory datasets for building and evaluating agentic systems.

qitos

Open Source

Torch-like, agent-native framework for researchers building reproducible LLM agents, harnesses, trajectories, and evaluation workflows.

YOGA

Open Source

Yet Another General-purpose Agent: an extensible and modular generalist agent framework.

Mirror-GUI

Open Source

LLM-based GUI simulator for synthesizing and evaluating agentic desktop interaction trajectories.

agentir

Open Source

Compiler infrastructure for agentic trajectories, designed as an LLVM-style intermediate representation and conversion toolkit for agent traces.

Evaluation & Benchmarks

Frameworks for evaluation of frontier AI and agent safety.

snowl

Open Source

A safety evaluation framework for AI agents, designed to support agent safety benchmarks and risk evaluation workflows.

snowl-evals

Open Source

Benchmark integration layer for snowl, connecting third-party agent safety and frontier-risk evaluations into a common workflow.

LLMPentest

Active

Measurement and evaluation codebase for LLM-based penetration testing capability and behavior.

NVWA Project

Active

Frontier AI safety research project focused on autonomy risk, silicon-based life emergence, proliferation, and control technologies.

Cybersecurity Data

Large-scale cybersecurity datasets and pipelines for cyber agent training.

cyberhunter

Active

Cybersecurity corpus mining and filtering pipeline for extracting high-quality cyber training data from large web corpora.

CyberSecurity-100B

Open Source

Large quality-filtered bilingual cybersecurity corpus for continual pre-training, with cyber relevance scoring, topic labels, code-aware splits, and structured metadata.

CyberSecurity-1M

Open Source

Curated 1.19M-record cybersecurity knowledge dataset covering vulnerabilities, threat intelligence, incident response, security tools, CTF, frameworks, and Chinese security content.

CyberRepo-10K

Open Source

Dataset of 7,670 real-world vulnerability audit tasks with verified GitHub repositories, fix commits, patch diffs, and vulnerable code checkouts.

CyberTrainer collection

Active

Hugging Face collection grouping CyberSecurity-1M, CyberSecurity-100B, and CyberRepo-10K as the data foundation for cyber model training.

Contribute to Open Ecosystem

We welcome contributions to our open-source safety infrastructure. Check our repositories on GitHub and models on Hugging Face.