adjust llama gpu params
@@ -93,9 +93,9 @@ spec:

   # performance tuning
   - "--ctx-size"
-  - "32768"
+  - "24576"
   - "--parallel"
-  - "4"
+  - "2"

   # KV cache quantization
   - "--cache-type-k"
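The two tuning flags interact. The sketch below assumes that the llama.cpp server splits `--ctx-size` evenly across its `--parallel` slots, which has long been its default behavior (newer builds also offer a unified KV cache, where this split does not apply). Under that assumption, the change shrinks the total context but gives each request more room: 32768 / 4 = 8192 tokens per slot before, and 24576 / 2 = 12288 tokens per slot after. `per_slot_context` is a hypothetical helper, not part of llama.cpp.

```python
def per_slot_context(ctx_size: int, parallel: int) -> int:
    """Tokens available to each server slot, assuming an even split of the total context.

    Hypothetical helper for illustration only; it is not a llama.cpp API.
    """
    if ctx_size <= 0 or parallel <= 0:
        raise ValueError("ctx_size and parallel must be positive")
    # Integer division mirrors an even split of the context across slots.
    return ctx_size // parallel


before = per_slot_context(32768, 4)  # old values from the diff
after = per_slot_context(24576, 2)   # new values from the diff
print(f"before: {before} tokens/slot, after: {after} tokens/slot")
```

Fewer slots means fewer concurrent requests, so this trade favors longer prompts per request over throughput.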
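A lower `--ctx-size` mostly pays off in KV cache memory on the GPU. The following is a rough estimator under several stated assumptions. The model dimensions (32 layers, 8 KV heads, head dimension 128) are hypothetical and not taken from this commit. The `--cache-type-k` value lies outside the hunk, so `q8_0` for K and `f16` for V are assumed purely for illustration. The cache is assumed to be allocated for the full `--ctx-size` regardless of `--parallel`. The block layouts follow ggml's formats: `q8_0` stores 32 int8 values plus one fp16 scale (34 bytes), and `q4_0` stores 32 4-bit values plus one fp16 scale (18 bytes).

```python
# Hypothetical model dimensions, NOT from this commit.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128

# (bytes per block, elements per block) for some ggml cache types.
BLOCK_LAYOUT = {
    "f16": (2, 1),
    "q8_0": (34, 32),  # 32 int8 values + one fp16 scale
    "q4_0": (18, 32),  # 32 4-bit values + one fp16 scale
}


def cache_bytes(n_ctx: int, cache_type: str,
                n_layers: int = N_LAYERS,
                n_kv_heads: int = N_KV_HEADS,
                head_dim: int = HEAD_DIM) -> int:
    """Bytes for ONE of the K or V caches, summed over all layers."""
    block_bytes, block_elems = BLOCK_LAYOUT[cache_type]
    row_elems = n_kv_heads * head_dim  # elements per token per layer
    if row_elems % block_elems:
        raise ValueError("row size must be a multiple of the block size")
    row_bytes = row_elems // block_elems * block_bytes
    return n_layers * n_ctx * row_bytes


def kv_cache_gib(n_ctx: int, type_k: str, type_v: str = "f16") -> float:
    """Combined K + V cache size in GiB for a given total context length."""
    return (cache_bytes(n_ctx, type_k) + cache_bytes(n_ctx, type_v)) / 2**30


for ctx in (32768, 24576):
    print(f"ctx={ctx}: {kv_cache_gib(ctx, 'q8_0'):.3f} GiB")
```

Because the cache scales linearly with context length, dropping from 32768 to 24576 tokens cuts the estimate by a quarter under these assumptions, whatever the cache type.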