At any time during a chat, you can reset the model with `/clear`, and you can le…
We can see that the `gemma3` model has nearly one billion parameters and a context length of 32,768! The *embedding length* is 1152. This is equivalent to `n_embd` in `nanoGPT`: the size of the embedding vector space.
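In `nanoGPT`, this number is the second dimension of the token-embedding table (`nn.Embedding(vocab_size, n_embd)`). A minimal sketch in plain Python, using gemma3's embedding length from above but an illustrative vocabulary size (not gemma3's real one):

```python
import random

n_embd = 1152        # gemma3's embedding length, from the listing above
vocab_size = 1000    # illustrative only; not gemma3's actual vocabulary size

# The embedding table: one learned vector of length n_embd per token id.
random.seed(0)
table = [[random.random() for _ in range(n_embd)] for _ in range(vocab_size)]

def embed(token_ids):
    """Map token ids to their embedding vectors (what nn.Embedding does)."""
    return [table[t] for t in token_ids]

vectors = embed([1, 5, 42])
print(len(vectors), len(vectors[0]))  # 3 1152
```

However large the vocabulary, every token is represented by a point in this 1152-dimensional space.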
Above, we also see that the quantization is only four bits, but it is a little more complicated than simply representing numbers with sixteen values. The `K` and `M` refer to optimizations. `K` names the "K-block" quantization method, a groupwise scheme in which weights are grouped into blocks (e.g., 32 or 64 values) and each block gets its own scale and offset for better accuracy. `M` marks the "medium" variant of `Q4_K`, which keeps a few of the most sensitive tensors at higher precision while quantizing the rest to four bits. `Q4_K_M` is a common choice for quantization when running 7B–70B models on laptop or desktop computers. (That's $10^6$–$10^7$ times more parameters than our first `nanoGPT` model!)
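The groupwise idea can be sketched in a few lines of plain Python. This is a toy illustration of per-block scale-and-offset quantization, not the actual `Q4_K` bit layout or its exact encoding:

```python
import random

def quantize_blocks(weights, block_size=32):
    """Groupwise 4-bit quantization: each block of `block_size` values
    gets its own scale and offset, as in the K-block schemes."""
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        lo, hi = min(block), max(block)
        scale = (hi - lo) / 15 or 1.0   # 4 bits -> 16 levels (0..15)
        q = [round((w - lo) / scale) for w in block]  # ints in 0..15
        blocks.append((scale, lo, q))
    return blocks

def dequantize_blocks(blocks):
    """Reconstruct approximate weights from (scale, offset, codes)."""
    out = []
    for scale, lo, q in blocks:
        out.extend(lo + scale * v for v in q)
    return out

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(128)]
restored = dequantize_blocks(quantize_blocks(weights))
err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {err:.4f}")  # at most half a step per value
```

Because each small block gets its own scale and offset, outliers in one part of a weight tensor don't blow up the rounding error everywhere else, which is why this beats a single global 4-bit scale.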
With the `/set verbose` command, you can monitor the model's performance:
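A session might look like the following sketch (the exact statistics printed after each response, such as durations and token counts, vary by Ollama version, so they are elided here):

```
$ ollama run gemma3
>>> /set verbose
>>> Why is the sky blue?
...response text, followed by timing and token statistics...
```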