AI/ML

Quantization (AI)

Also called: GGUF, GPTQ, AWQ

📖 What it is

A model compression technique that reduces the numerical precision of a model's weights (e.g., from 16-bit to 4-bit) to shrink model size and lower inference cost while preserving most of the model's quality. Three formats dominated in 2024-2025: GGUF (a flexible CPU/GPU file format used by llama.cpp), GPTQ (GPU-optimized post-training quantization), and AWQ (activation-aware weight quantization). At 4-bit, all three keep quality within ~6% of full precision.
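
To make the idea of reducing weight precision concrete, here is a minimal sketch of symmetric round-to-nearest quantization to signed 4-bit integers with one scale per group of weights. It is only an illustration of the general principle, not the algorithm used by GGUF, GPTQ, or AWQ; the function names, the `group_size` parameter, and the sample weights are all hypothetical.

```python
def quantize_4bit(weights, group_size=4):
    """Quantize floats to signed 4-bit codes (-8..7), one scale per group.

    Illustrative round-to-nearest scheme; real formats use more elaborate
    methods (calibration data, activation statistics, packed storage).
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to code 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            code = round(w / scale)
            codes.append(max(-8, min(7, code)))  # clamp to the 4-bit range
    return codes, scales


def dequantize_4bit(codes, scales, group_size=4):
    """Reconstruct approximate float weights from codes and group scales."""
    return [code * scales[i // group_size] for i, code in enumerate(codes)]


# Hypothetical example weights (assumed values, for illustration only).
weights = [0.12, -0.5, 0.33, 0.07, 1.4, -0.9, 0.0, 0.6]
codes, scales = quantize_4bit(weights)
restored = dequantize_4bit(codes, scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(max_error)
```

Each weight is stored as a 4-bit code instead of a 16-bit float, so the codes take roughly a quarter of the space, plus a small overhead for the per-group scales. The reconstruction error per weight is at most half of its group's scale, which is why smaller groups tend to preserve quality better at the cost of storing more scales.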

Related Terms

Inference (AI/ML)

The process of running a trained model on new inputs to generate predictions or outputs. I…

Open-Source AI Models (AI/ML)

AI models with publicly released weights that can be downloaded, modified, and self-hosted…

Knowledge Distillation (AI/ML)

A technique for transferring capabilities from a large 'teacher' model to a smaller 'stude…
