Open Ecosystem

Open tools, models, datasets, benchmarks, and infrastructure for AI safety evaluation and agent safety.

GitHub Organization Hugging Face Organization

Runtime Safety

Safety Models

Agent Infrastructure

Evaluation

Cyber Data

Core Runtime Security

AgentGuard

Attribute-based access control framework for tool-use LLM agents, with policy specification, runtime inspection, and auditing support.

View on GitHub Learn More

Agent Runtime Safety

Runtime safety tools that monitor, correct, authorize, and contain AI agent behavior before unsafe actions occur.

AgentGuard

Core

Attribute-based access control framework for tool-use LLM agents, with policy specification, runtime inspection, and auditing support.

GitHub

qise

Open Source

AI-first runtime security framework for AI agents, centered on multi-layer guards, SLM/LLM/rule checks, and fail-closed execution protection.

GitHub

Thought-Aligner

Open Source

Plug-and-play thought-level correction module for improving behavioral safety of tool-use agents before risky actions are executed.

GitHub HF arXiv

MirrorGuard

Open Source

Simulation-to-real reasoning-correction framework and VLM for safer computer-use agents operating over GUI environments.

GitHub HF Website arXiv

ReasoningShield

Open Source

Content-safety detection system for monitoring reasoning traces of large reasoning models.

GitHub

XuanwuBox

Coming Soon

Secure execution layer for agentic runtime environments, positioned as an AI security advisor inside Docker-style agent sandboxes.

GitHub

Guardrail & Safety Models

Lightweight safety models for thought correction, GUI-agent safety, intent modeling, trust estimation, and reasoning safety.

Thought-Aligner

Open Source

Plug-and-play thought-level correction module for improving behavioral safety of tool-use agents before risky actions are executed.

GitHub HF arXiv

MirrorGuard

Open Source

Simulation-to-real reasoning-correction framework and VLM for safer computer-use agents operating over GUI environments.

GitHub HF Website arXiv

IntentNet

Open Source

Fine-tuned model for evaluating whether an AI agent's reasoning contains deceptive, manipulative, or malicious intent in multi-turn interactions.

TrustNet

Open Source

Fine-tuned model for scoring a user's degree of trust in AI responses during multi-turn human-AI interactions.

HF arXiv

ReasoningShield

Open Source

Content-safety detection system for monitoring reasoning traces of large reasoning models.

GitHub

Agent Infrastructure

Frameworks, simulators, intermediate representations, and trajectory datasets for building and evaluating agentic systems.

qitos

Open Source

Torch-like, agent-native framework for researchers building reproducible LLM agents, harnesses, trajectories, and evaluation workflows.

GitHub

YOGA

Open Source

Yet Another General-purpose Agent: an extensible and modular generalist agent framework.

GitHub

Mirror-GUI

Open Source

LLM-based GUI simulator for synthesizing and evaluating agentic desktop interaction trajectories.

GitHub

agentir

Open Source

Compiler infrastructure for agentic trajectories, designed as an LLVM-style intermediate representation and conversion toolkit for agent traces.

GitHub HF (2)

Evaluation & Benchmarks

Frameworks for evaluation of frontier AI and agent safety.

snowl

Open Source

A safety evaluation framework for AI agents, designed to support agent safety benchmarks and risk evaluation workflows.

GitHub

snowl-evals

Open Source

Benchmark integration layer for snowl, connecting third-party agent safety and frontier-risk evaluations into a common workflow.

GitHub

LLMPentest

Active

Measurement and evaluation codebase for LLM-based penetration testing capability and behavior.

GitHub

NVWA Project

Active

Frontier AI safety research project focused on autonomy risk, silicon-based life emergence, proliferation, and control technologies.

GitHub Website

Cybersecurity Data

Large-scale cybersecurity datasets and pipelines for cyber agent training.

cyberhunter

Active

Cybersecurity corpus mining and filtering pipeline for extracting high-quality cyber training data from large web corpora.

GitHub HF (3)

CyberSecurity-100B

Open Source

Large quality-filtered bilingual cybersecurity corpus for continual pre-training, with cyber relevance scoring, topic labels, code-aware splits, and structured metadata.

GitHub HF

CyberSecurity-1M

Open Source

Curated 1.19M-record cybersecurity knowledge dataset covering vulnerabilities, threat intelligence, incident response, security tools, CTF, frameworks, and Chinese security content.

GitHub HF

CyberRepo-10K

Open Source

Dataset of 7,670 real-world vulnerability audit tasks with verified GitHub repositories, fix commits, patch diffs, and vulnerable code checkouts.

GitHub HF

CyberTrainer collection

Active

Hugging Face collection grouping CyberSecurity-1M, CyberSecurity-100B, and CyberRepo-10K as the data foundation for cyber model training.

Contribute to Open Ecosystem

We welcome contributions to our open-source safety infrastructure. Check our repositories on GitHub and models on Hugging Face.

Explore on GitHub Hugging Face