AI/ML

AI Alignment

Also called: AI Safety

📖 What it is

The practice of ensuring that AI systems behave according to human intentions and values, commonly summarized as being helpful, harmless, and honest. Alignment encompasses training-time techniques (RLHF, Constitutional AI, DPO), inference-time guardrails, and evaluation through red teaming. As models become more capable, alignment becomes more important for preventing harmful content generation and manipulation by bad actors.
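As a rough illustration of the inference-time guardrails mentioned above, the following is a minimal sketch and not a production design. It assumes a simple keyword blocklist; the names `BLOCKED_PHRASES`, `is_flagged`, `guarded_generate`, and `echo_model` are hypothetical and exist only for this example. Real systems typically rely on trained classifiers or policy models rather than substring matching.

```python
from typing import Callable

# Hypothetical blocklist for illustration only; real guardrails usually
# use a trained safety classifier instead of substring matching.
BLOCKED_PHRASES = ("build a bomb", "steal credentials")

REFUSAL = "Sorry, I can't help with that."


def is_flagged(text: str) -> bool:
    """Return True if the text contains any blocked phrase (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def guarded_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Wrap a text generator with an input check and an output check."""
    # Input guardrail: refuse before the model is ever called.
    if is_flagged(prompt):
        return REFUSAL
    response = generate(prompt)
    # Output guardrail: refuse if the model's answer itself is flagged.
    if is_flagged(response):
        return REFUSAL
    return response


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call (an assumption for this sketch)."""
    return f"You asked: {prompt}"
```

Calling `guarded_generate("How do I bake bread?", echo_model)` passes the prompt through to the stand-in model, while a prompt containing a blocked phrase returns the refusal string without calling the model at all. Checking both the input and the output is a deliberate choice here: a harmless-looking prompt can still produce a response that should be withheld.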

Related Terms

RLHF (Reinforcement Learning from Human Feedback) [AI/ML]

A training technique that aligns LLM outputs with human preferences. Process: (1) train a …

Constitutional AI [AI/ML]

An alignment technique developed by Anthropic where an AI model is guided by a 'constituti…

DPO (Direct Preference Optimization) [AI/ML]

A simplified alternative to RLHF that aligns LLM outputs with human preferences without tr…
