AI/ML
Quantization (AI)
Also called: GGUF, GPTQ, AWQ
What it is
A model compression technique that reduces the numerical precision of a model's weights (e.g., from 16-bit to 4-bit) to shrink model size and lower inference cost while preserving most of the model's quality. Three formats dominated in 2024-2025: GGUF (a flexible CPU/GPU format for llama.cpp), GPTQ (GPU-optimized post-training quantization), and AWQ (activation-aware weight quantization). At 4-bit, all three are reported to keep quality within roughly 6% of full precision.
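To make the idea of reducing weight precision concrete, here is a minimal sketch of symmetric round-to-nearest quantization to 4-bit integers with one scale per small group of weights. It is an illustration only: the function names, the group size, and the sample weights are assumptions made for this example, and GGUF, GPTQ, and AWQ each use more elaborate schemes than plain rounding.

```python
def quantize_4bit(weights, group_size=4):
    """Round each weight to a signed 4-bit code, with one scale per group.

    Hypothetical helper for illustration; not the GGUF, GPTQ, or AWQ algorithm.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7, the top of [-8, 7].
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer code and clamp to the 4-bit range.
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_4bit(codes, scales, group_size=4):
    """Recover approximate weights by multiplying each code by its group scale."""
    return [code * scales[i // group_size] for i, code in enumerate(codes)]


# Made-up sample weights, standing in for a slice of a 16-bit weight tensor.
weights = [0.12, -0.53, 0.91, -0.07, 0.33, 0.02, -0.88, 0.45]
codes, scales = quantize_4bit(weights)
restored = dequantize_4bit(codes, scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scales, max_error)
```

Each weight is stored as a 4-bit code instead of a 16-bit float, and the only extra storage is one scale per group, so the per-weight rounding error is bounded by half of that group's scale. Real formats differ mainly in how they choose scales and which weights they protect from rounding error.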
Related Terms
Inference (AI/ML)
The process of running a trained model on new inputs to generate predictions or outputs. I…
Open-Source AI Models (AI/ML)
AI models with publicly released weights that can be downloaded, modified, and self-hosted…
Knowledge Distillation (AI/ML)
A technique for transferring capabilities from a large 'teacher' model to a smaller 'stude…