adjust llama gpu params
@@ -93,9 +93,9 @@ spec:
 
 # performance tuning
 - "--ctx-size"
-- "32768"
+- "24576"
 - "--parallel"
-- "4"
+- "2"
 
 # KV cache quantization
 - "--cache-type-k"