AI/ML

Quantization (AI)

Also called: GGUF, GPTQ, AWQ
📖 What it is

A model compression technique that reduces the numeric precision of model weights (e.g., from 16-bit to 4-bit) to shrink model size and lower inference cost while preserving most of the model's quality. Three names dominate in 2024-2025: GGUF (a flexible CPU/GPU file format used by llama.cpp), GPTQ (a GPU-optimized post-training quantization method), and AWQ (activation-aware weight quantization). At 4-bit, all three are reported to keep quality within roughly 6% of full precision.
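
To make the idea of reducing weight precision concrete, here is a minimal sketch of symmetric round-to-nearest 4-bit quantization with one scale per group of weights. It is not the GGUF, GPTQ, or AWQ algorithm; those use more elaborate schemes. The function names, the group size of 32, and the random test matrix are assumptions chosen for illustration, and the 4-bit values are held in int8 rather than packed two per byte.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray, group_size: int = 32):
    """Symmetric round-to-nearest quantization to the integer range [-7, 7].

    Illustrative only: weights.size must be divisible by group_size, and the
    4-bit values are stored unpacked in int8 for readability.
    """
    flat = weights.astype(np.float32).reshape(-1, group_size)
    # One scale per group, chosen so the largest magnitude maps to 7.
    max_abs = np.abs(flat).max(axis=1, keepdims=True)
    scales = np.where(max_abs == 0, 1.0, max_abs / 7.0).astype(np.float32)
    q = np.clip(np.round(flat / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize_4bit(q: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    """Recover approximate float weights from integers and per-group scales."""
    return (q.astype(np.float32) * scales).reshape(shape)

# Hypothetical weight matrix standing in for one layer of a model.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(64, 64)).astype(np.float32)

q, scales = quantize_4bit(w)
w_hat = dequantize_4bit(q, scales, w.shape)

# Storage estimate: 2 bytes per weight at 16-bit versus half a byte per
# weight at 4-bit, plus one 16-bit scale per group (an assumed layout).
fp16_bytes = w.size * 2
q4_bytes = w.size // 2 + scales.size * 2
max_err = float(np.abs(w - w_hat).max())

print(f"16-bit size: {fp16_bytes} bytes, 4-bit size: {q4_bytes} bytes")
print(f"largest absolute reconstruction error: {max_err:.6f}")
```

The per-group scale is the design choice that matters here: a smaller group tracks local weight magnitudes more closely and lowers rounding error, at the cost of storing more scales.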

Related Terms