
iPhone 12 Pro, iPad, iPhone 15, and some other devices: the llama.rn package fails to load the model, even though the device has the latest OS and enough GPU to run it. #102

Open
PratikBhatti83 opened this issue Dec 29, 2024 · 15 comments
Labels
help wanted (Extra attention is needed)

Comments

@PratikBhatti83

When using the llama.rn package, the initLlama function fails to initialize the context, throwing a "Failed to load model" error.
Screenshot 2024-12-29 at 1 41 18 PM
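For reference, here is a minimal sketch of the kind of call that fails; the model path and parameter values are illustrative, not taken verbatim from the issue:

```ts
import { initLlama } from 'llama.rn'

async function loadModel(modelPath: string) {
  try {
    // modelPath is a hypothetical local path to the downloaded GGUF file
    const context = await initLlama({
      model: modelPath,
      n_ctx: 1024,
      n_gpu_layers: 99, // offload layers to Metal where available
    })
    return context
  } catch (err) {
    // On the affected devices this rejects with "Failed to load model"
    console.error('Context initialization failed:', err)
    throw err
  }
}
```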

@PratikBhatti83
Author

@jhen0409 @a-ghorbani Is there any solution for this? Could you please help me solve this problem?

@jhen0409
Member

jhen0409 commented Jan 2, 2025

I can't figure out the cause based on the error message. Could you provide more information, such as Xcode logs or which model you are using?

@PratikBhatti83
Author

PratikBhatti83 commented Jan 2, 2025

@jhen0409 I used the Llama-3.2-1B-Instruct-Q4_K_M.gguf model.

The Xcode log shows only the error below:

Context initialization failed: Error: Failed to load model.

Meanwhile, on other devices such as the iPhone 13 and iPhone 14 Pro I can run this model, but on some devices it fails to load the model during initialization. I think you should update the llama.cpp version or sync the latest code, because as far as I can tell llama.cpp itself doesn't have this kind of initialization issue.

@PratikBhatti83
Author

@jhen0409 Is there any update on this? Please help me solve this issue!

@jhen0409
Member

jhen0409 commented Jan 6, 2025

I can update llama.cpp tomorrow, but I'm not sure it will help you.

You should be able to get Xcode logs like this:

Logs: Load Llama-3.2-1B-Instruct-Q4_K_M.gguf (success)
llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from <PATH>
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 1B
llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   8:                          llama.block_count u32              = 16
llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
llama_model_loader: - kv  18:                          general.file_type u32              = 15
llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   34 tensors
llama_model_loader: - type q4_K:   96 tensors
llama_model_loader: - type q6_K:   17 tensors
llm_load_vocab: control token: 128254 '<|reserved_special_token_246|>' is not marked as EOG
llm_load_vocab: control token: 128252 '<|reserved_special_token_244|>' is not marked as EOG
llm_load_vocab: control token: 128251 '<|reserved_special_token_243|>' is not marked as EOG
llm_load_vocab: control token: 128250 '<|reserved_special_token_242|>' is not marked as EOG
llm_load_vocab: control token: 128248 '<|reserved_special_token_240|>' is not marked as EOG
llm_load_vocab: control token: 128247 '<|reserved_special_token_239|>' is not marked as EOG
llm_load_vocab: control token: 128245 '<|reserved_special_token_237|>' is not marked as EOG
llm_load_vocab: control token: 128244 '<|reserved_special_token_236|>' is not marked as EOG
llm_load_vocab: control token: 128243 '<|reserved_special_token_235|>' is not marked as EOG
llm_load_vocab: control token: 128240 '<|reserved_special_token_232|>' is not marked as EOG
llm_load_vocab: control token: 128238 '<|reserved_special_token_230|>' is not marked as EOG
llm_load_vocab: control token: 128235 '<|reserved_special_token_227|>' is not marked as EOG
llm_load_vocab: control token: 128234 '<|reserved_special_token_226|>' is not marked as EOG
llm_load_vocab: control token: 128229 '<|reserved_special_token_221|>' is not marked as EOG
llm_load_vocab: control token: 128227 '<|reserved_special_token_219|>' is not marked as EOG
llm_load_vocab: control token: 128226 '<|reserved_special_token_218|>' is not marked as EOG
llm_load_vocab: control token: 128224 '<|reserved_special_token_216|>' is not marked as EOG
llm_load_vocab: control token: 128223 '<|reserved_special_token_215|>' is not marked as EOG
llm_load_vocab: control token: 128221 '<|reserved_special_token_213|>' is not marked as EOG
llm_load_vocab: control token: 128219 '<|reserved_special_token_211|>' is not marked as EOG
llm_load_vocab: control token: 128218 '<|reserved_special_token_210|>' is not marked as EOG
llm_load_vocab: control token: 128217 '<|reserved_special_token_209|>' is not marked as EOG
llm_load_vocab: control token: 128216 '<|reserved_special_token_208|>' is not marked as EOG
llm_load_vocab: control token: 128215 '<|reserved_special_token_207|>' is not marked as EOG
llm_load_vocab: control token: 128213 '<|reserved_special_token_205|>' is not marked as EOG
llm_load_vocab: control token: 128211 '<|reserved_special_token_203|>' is not marked as EOG
llm_load_vocab: control token: 128210 '<|reserved_special_token_202|>' is not marked as EOG
llm_load_vocab: control token: 128209 '<|reserved_special_token_201|>' is not marked as EOG
llm_load_vocab: control token: 128208 '<|reserved_special_token_200|>' is not marked as EOG
llm_load_vocab: control token: 128207 '<|reserved_special_token_199|>' is not marked as EOG
llm_load_vocab: control token: 128204 '<|reserved_special_token_196|>' is not marked as EOG
llm_load_vocab: control token: 128202 '<|reserved_special_token_194|>' is not marked as EOG
llm_load_vocab: control token: 128197 '<|reserved_special_token_189|>' is not marked as EOG
llm_load_vocab: control token: 128195 '<|reserved_special_token_187|>' is not marked as EOG
llm_load_vocab: control token: 128194 '<|reserved_special_token_186|>' is not marked as EOG
llm_load_vocab: control token: 128191 '<|reserved_special_token_183|>' is not marked as EOG
llm_load_vocab: control token: 128190 '<|reserved_special_token_182|>' is not marked as EOG
llm_load_vocab: control token: 128188 '<|reserved_special_token_180|>' is not marked as EOG
llm_load_vocab: control token: 128187 '<|reserved_special_token_179|>' is not marked as EOG
llm_load_vocab: control token: 128185 '<|reserved_special_token_177|>' is not marked as EOG
llm_load_vocab: control token: 128184 '<|reserved_special_token_176|>' is not marked as EOG
llm_load_vocab: control token: 128183 '<|reserved_special_token_175|>' is not marked as EOG
llm_load_vocab: control token: 128178 '<|reserved_special_token_170|>' is not marked as EOG
llm_load_vocab: control token: 128177 '<|reserved_special_token_169|>' is not marked as EOG
llm_load_vocab: control token: 128176 '<|reserved_special_token_168|>' is not marked as EOG
llm_load_vocab: control token: 128175 '<|reserved_special_token_167|>' is not marked as EOG
llm_load_vocab: control token: 128174 '<|reserved_special_token_166|>' is not marked as EOG
llm_load_vocab: control token: 128173 '<|reserved_special_token_165|>' is not marked as EOG
llm_load_vocab: control token: 128172 '<|reserved_special_token_164|>' is not marked as EOG
llm_load_vocab: control token: 128169 '<|reserved_special_token_161|>' is not marked as EOG
llm_load_vocab: control token: 128167 '<|reserved_special_token_159|>' is not marked as EOG
llm_load_vocab: control token: 128166 '<|reserved_special_token_158|>' is not marked as EOG
llm_load_vocab: control token: 128160 '<|reserved_special_token_152|>' is not marked as EOG
llm_load_vocab: control token: 128159 '<|reserved_special_token_151|>' is not marked as EOG
llm_load_vocab: control token: 128157 '<|reserved_special_token_149|>' is not marked as EOG
llm_load_vocab: control token: 128156 '<|reserved_special_token_148|>' is not marked as EOG
llm_load_vocab: control token: 128154 '<|reserved_special_token_146|>' is not marked as EOG
llm_load_vocab: control token: 128152 '<|reserved_special_token_144|>' is not marked as EOG
llm_load_vocab: control token: 128151 '<|reserved_special_token_143|>' is not marked as EOG
llm_load_vocab: control token: 128150 '<|reserved_special_token_142|>' is not marked as EOG
llm_load_vocab: control token: 128147 '<|reserved_special_token_139|>' is not marked as EOG
llm_load_vocab: control token: 128144 '<|reserved_special_token_136|>' is not marked as EOG
llm_load_vocab: control token: 128142 '<|reserved_special_token_134|>' is not marked as EOG
llm_load_vocab: control token: 128141 '<|reserved_special_token_133|>' is not marked as EOG
llm_load_vocab: control token: 128140 '<|reserved_special_token_132|>' is not marked as EOG
llm_load_vocab: control token: 128133 '<|reserved_special_token_125|>' is not marked as EOG
llm_load_vocab: control token: 128130 '<|reserved_special_token_122|>' is not marked as EOG
llm_load_vocab: control token: 128128 '<|reserved_special_token_120|>' is not marked as EOG
llm_load_vocab: control token: 128127 '<|reserved_special_token_119|>' is not marked as EOG
llm_load_vocab: control token: 128126 '<|reserved_special_token_118|>' is not marked as EOG
llm_load_vocab: control token: 128125 '<|reserved_special_token_117|>' is not marked as EOG
llm_load_vocab: control token: 128124 '<|reserved_special_token_116|>' is not marked as EOG
llm_load_vocab: control token: 128123 '<|reserved_special_token_115|>' is not marked as EOG
llm_load_vocab: control token: 128122 '<|reserved_special_token_114|>' is not marked as EOG
llm_load_vocab: control token: 128121 '<|reserved_special_token_113|>' is not marked as EOG
llm_load_vocab: control token: 128120 '<|reserved_special_token_112|>' is not marked as EOG
llm_load_vocab: control token: 128119 '<|reserved_special_token_111|>' is not marked as EOG
llm_load_vocab: control token: 128116 '<|reserved_special_token_108|>' is not marked as EOG
llm_load_vocab: control token: 128115 '<|reserved_special_token_107|>' is not marked as EOG
llm_load_vocab: control token: 128114 '<|reserved_special_token_106|>' is not marked as EOG
llm_load_vocab: control token: 128113 '<|reserved_special_token_105|>' is not marked as EOG
llm_load_vocab: control token: 128111 '<|reserved_special_token_103|>' is not marked as EOG
llm_load_vocab: control token: 128110 '<|reserved_special_token_102|>' is not marked as EOG
llm_load_vocab: control token: 128107 '<|reserved_special_token_99|>' is not marked as EOG
llm_load_vocab: control token: 128106 '<|reserved_special_token_98|>' is not marked as EOG
llm_load_vocab: control token: 128105 '<|reserved_special_token_97|>' is not marked as EOG
llm_load_vocab: control token: 128104 '<|reserved_special_token_96|>' is not marked as EOG
llm_load_vocab: control token: 128103 '<|reserved_special_token_95|>' is not marked as EOG
llm_load_vocab: control token: 128100 '<|reserved_special_token_92|>' is not marked as EOG
llm_load_vocab: control token: 128097 '<|reserved_special_token_89|>' is not marked as EOG
llm_load_vocab: control token: 128096 '<|reserved_special_token_88|>' is not marked as EOG
llm_load_vocab: control token: 128094 '<|reserved_special_token_86|>' is not marked as EOG
llm_load_vocab: control token: 128093 '<|reserved_special_token_85|>' is not marked as EOG
llm_load_vocab: control token: 128090 '<|reserved_special_token_82|>' is not marked as EOG
llm_load_vocab: control token: 128089 '<|reserved_special_token_81|>' is not marked as EOG
llm_load_vocab: control token: 128087 '<|reserved_special_token_79|>' is not marked as EOG
llm_load_vocab: control token: 128085 '<|reserved_special_token_77|>' is not marked as EOG
llm_load_vocab: control token: 128080 '<|reserved_special_token_72|>' is not marked as EOG
llm_load_vocab: control token: 128077 '<|reserved_special_token_69|>' is not marked as EOG
llm_load_vocab: control token: 128076 '<|reserved_special_token_68|>' is not marked as EOG
llm_load_vocab: control token: 128073 '<|reserved_special_token_65|>' is not marked as EOG
llm_load_vocab: control token: 128070 '<|reserved_special_token_62|>' is not marked as EOG
llm_load_vocab: control token: 128069 '<|reserved_special_token_61|>' is not marked as EOG
llm_load_vocab: control token: 128067 '<|reserved_special_token_59|>' is not marked as EOG
llm_load_vocab: control token: 128064 '<|reserved_special_token_56|>' is not marked as EOG
llm_load_vocab: control token: 128062 '<|reserved_special_token_54|>' is not marked as EOG
llm_load_vocab: control token: 128061 '<|reserved_special_token_53|>' is not marked as EOG
llm_load_vocab: control token: 128060 '<|reserved_special_token_52|>' is not marked as EOG
llm_load_vocab: control token: 128054 '<|reserved_special_token_46|>' is not marked as EOG
llm_load_vocab: control token: 128045 '<|reserved_special_token_37|>' is not marked as EOG
llm_load_vocab: control token: 128044 '<|reserved_special_token_36|>' is not marked as EOG
llm_load_vocab: control token: 128043 '<|reserved_special_token_35|>' is not marked as EOG
llm_load_vocab: control token: 128042 '<|reserved_special_token_34|>' is not marked as EOG
llm_load_vocab: control token: 128038 '<|reserved_special_token_30|>' is not marked as EOG
llm_load_vocab: control token: 128037 '<|reserved_special_token_29|>' is not marked as EOG
llm_load_vocab: control token: 128035 '<|reserved_special_token_27|>' is not marked as EOG
llm_load_vocab: control token: 128034 '<|reserved_special_token_26|>' is not marked as EOG
llm_load_vocab: control token: 128033 '<|reserved_special_token_25|>' is not marked as EOG
llm_load_vocab: control token: 128032 '<|reserved_special_token_24|>' is not marked as EOG
llm_load_vocab: control token: 128030 '<|reserved_special_token_22|>' is not marked as EOG
llm_load_vocab: control token: 128029 '<|reserved_special_token_21|>' is not marked as EOG
llm_load_vocab: control token: 128028 '<|reserved_special_token_20|>' is not marked as EOG
llm_load_vocab: control token: 128026 '<|reserved_special_token_18|>' is not marked as EOG
llm_load_vocab: control token: 128025 '<|reserved_special_token_17|>' is not marked as EOG
llm_load_vocab: control token: 128024 '<|reserved_special_token_16|>' is not marked as EOG
llm_load_vocab: control token: 128022 '<|reserved_special_token_14|>' is not marked as EOG
llm_load_vocab: control token: 128020 '<|reserved_special_token_12|>' is not marked as EOG
llm_load_vocab: control token: 128017 '<|reserved_special_token_9|>' is not marked as EOG
llm_load_vocab: control token: 128016 '<|reserved_special_token_8|>' is not marked as EOG
llm_load_vocab: control token: 128015 '<|reserved_special_token_7|>' is not marked as EOG
llm_load_vocab: control token: 128014 '<|reserved_special_token_6|>' is not marked as EOG
llm_load_vocab: control token: 128013 '<|reserved_special_token_5|>' is not marked as EOG
llm_load_vocab: control token: 128011 '<|reserved_special_token_3|>' is not marked as EOG
llm_load_vocab: control token: 128010 '<|python_tag|>' is not marked as EOG
llm_load_vocab: control token: 128006 '<|start_header_id|>' is not marked as EOG
llm_load_vocab: control token: 128003 '<|reserved_special_token_1|>' is not marked as EOG
llm_load_vocab: control token: 128002 '<|reserved_special_token_0|>' is not marked as EOG
llm_load_vocab: control token: 128000 '<|begin_of_text|>' is not marked as EOG
llm_load_vocab: control token: 128041 '<|reserved_special_token_33|>' is not marked as EOG
llm_load_vocab: control token: 128063 '<|reserved_special_token_55|>' is not marked as EOG
llm_load_vocab: control token: 128046 '<|reserved_special_token_38|>' is not marked as EOG
llm_load_vocab: control token: 128007 '<|end_header_id|>' is not marked as EOG
llm_load_vocab: control token: 128065 '<|reserved_special_token_57|>' is not marked as EOG
llm_load_vocab: control token: 128171 '<|reserved_special_token_163|>' is not marked as EOG
llm_load_vocab: control token: 128162 '<|reserved_special_token_154|>' is not marked as EOG
llm_load_vocab: control token: 128165 '<|reserved_special_token_157|>' is not marked as EOG
llm_load_vocab: control token: 128057 '<|reserved_special_token_49|>' is not marked as EOG
llm_load_vocab: control token: 128050 '<|reserved_special_token_42|>' is not marked as EOG
llm_load_vocab: control token: 128056 '<|reserved_special_token_48|>' is not marked as EOG
llm_load_vocab: control token: 128230 '<|reserved_special_token_222|>' is not marked as EOG
llm_load_vocab: control token: 128098 '<|reserved_special_token_90|>' is not marked as EOG
llm_load_vocab: control token: 128153 '<|reserved_special_token_145|>' is not marked as EOG
llm_load_vocab: control token: 128084 '<|reserved_special_token_76|>' is not marked as EOG
llm_load_vocab: control token: 128082 '<|reserved_special_token_74|>' is not marked as EOG
llm_load_vocab: control token: 128102 '<|reserved_special_token_94|>' is not marked as EOG
llm_load_vocab: control token: 128253 '<|reserved_special_token_245|>' is not marked as EOG
llm_load_vocab: control token: 128179 '<|reserved_special_token_171|>' is not marked as EOG
llm_load_vocab: control token: 128071 '<|reserved_special_token_63|>' is not marked as EOG
llm_load_vocab: control token: 128135 '<|reserved_special_token_127|>' is not marked as EOG
llm_load_vocab: control token: 128161 '<|reserved_special_token_153|>' is not marked as EOG
llm_load_vocab: control token: 128164 '<|reserved_special_token_156|>' is not marked as EOG
llm_load_vocab: control token: 128134 '<|reserved_special_token_126|>' is not marked as EOG
llm_load_vocab: control token: 128249 '<|reserved_special_token_241|>' is not marked as EOG
llm_load_vocab: control token: 128004 '<|finetune_right_pad_id|>' is not marked as EOG
llm_load_vocab: control token: 128036 '<|reserved_special_token_28|>' is not marked as EOG
llm_load_vocab: control token: 128148 '<|reserved_special_token_140|>' is not marked as EOG
llm_load_vocab: control token: 128181 '<|reserved_special_token_173|>' is not marked as EOG
llm_load_vocab: control token: 128222 '<|reserved_special_token_214|>' is not marked as EOG
llm_load_vocab: control token: 128075 '<|reserved_special_token_67|>' is not marked as EOG
llm_load_vocab: control token: 128241 '<|reserved_special_token_233|>' is not marked as EOG
llm_load_vocab: control token: 128051 '<|reserved_special_token_43|>' is not marked as EOG
llm_load_vocab: control token: 128068 '<|reserved_special_token_60|>' is not marked as EOG
llm_load_vocab: control token: 128149 '<|reserved_special_token_141|>' is not marked as EOG
llm_load_vocab: control token: 128201 '<|reserved_special_token_193|>' is not marked as EOG
llm_load_vocab: control token: 128058 '<|reserved_special_token_50|>' is not marked as EOG
llm_load_vocab: control token: 128146 '<|reserved_special_token_138|>' is not marked as EOG
llm_load_vocab: control token: 128143 '<|reserved_special_token_135|>' is not marked as EOG
llm_load_vocab: control token: 128023 '<|reserved_special_token_15|>' is not marked as EOG
llm_load_vocab: control token: 128039 '<|reserved_special_token_31|>' is not marked as EOG
llm_load_vocab: control token: 128132 '<|reserved_special_token_124|>' is not marked as EOG
llm_load_vocab: control token: 128101 '<|reserved_special_token_93|>' is not marked as EOG
llm_load_vocab: control token: 128212 '<|reserved_special_token_204|>' is not marked as EOG
llm_load_vocab: control token: 128189 '<|reserved_special_token_181|>' is not marked as EOG
llm_load_vocab: control token: 128225 '<|reserved_special_token_217|>' is not marked as EOG
llm_load_vocab: control token: 128129 '<|reserved_special_token_121|>' is not marked as EOG
llm_load_vocab: control token: 128005 '<|reserved_special_token_2|>' is not marked as EOG
llm_load_vocab: control token: 128078 '<|reserved_special_token_70|>' is not marked as EOG
llm_load_vocab: control token: 128163 '<|reserved_special_token_155|>' is not marked as EOG
llm_load_vocab: control token: 128072 '<|reserved_special_token_64|>' is not marked as EOG
llm_load_vocab: control token: 128112 '<|reserved_special_token_104|>' is not marked as EOG
llm_load_vocab: control token: 128186 '<|reserved_special_token_178|>' is not marked as EOG
llm_load_vocab: control token: 128095 '<|reserved_special_token_87|>' is not marked as EOG
llm_load_vocab: control token: 128109 '<|reserved_special_token_101|>' is not marked as EOG
llm_load_vocab: control token: 128099 '<|reserved_special_token_91|>' is not marked as EOG
llm_load_vocab: control token: 128138 '<|reserved_special_token_130|>' is not marked as EOG
llm_load_vocab: control token: 128193 '<|reserved_special_token_185|>' is not marked as EOG
llm_load_vocab: control token: 128199 '<|reserved_special_token_191|>' is not marked as EOG
llm_load_vocab: control token: 128048 '<|reserved_special_token_40|>' is not marked as EOG
llm_load_vocab: control token: 128088 '<|reserved_special_token_80|>' is not marked as EOG
llm_load_vocab: control token: 128192 '<|reserved_special_token_184|>' is not marked as EOG
llm_load_vocab: control token: 128136 '<|reserved_special_token_128|>' is not marked as EOG
llm_load_vocab: control token: 128092 '<|reserved_special_token_84|>' is not marked as EOG
llm_load_vocab: control token: 128158 '<|reserved_special_token_150|>' is not marked as EOG
llm_load_vocab: control token: 128001 '<|end_of_text|>' is not marked as EOG
llm_load_vocab: control token: 128049 '<|reserved_special_token_41|>' is not marked as EOG
llm_load_vocab: control token: 128031 '<|reserved_special_token_23|>' is not marked as EOG
llm_load_vocab: control token: 128255 '<|reserved_special_token_247|>' is not marked as EOG
llm_load_vocab: control token: 128182 '<|reserved_special_token_174|>' is not marked as EOG
llm_load_vocab: control token: 128066 '<|reserved_special_token_58|>' is not marked as EOG
llm_load_vocab: control token: 128180 '<|reserved_special_token_172|>' is not marked as EOG
llm_load_vocab: control token: 128233 '<|reserved_special_token_225|>' is not marked as EOG
llm_load_vocab: control token: 128079 '<|reserved_special_token_71|>' is not marked as EOG
llm_load_vocab: control token: 128081 '<|reserved_special_token_73|>' is not marked as EOG
llm_load_vocab: control token: 128231 '<|reserved_special_token_223|>' is not marked as EOG
llm_load_vocab: control token: 128196 '<|reserved_special_token_188|>' is not marked as EOG
llm_load_vocab: control token: 128047 '<|reserved_special_token_39|>' is not marked as EOG
llm_load_vocab: control token: 128083 '<|reserved_special_token_75|>' is not marked as EOG
llm_load_vocab: control token: 128139 '<|reserved_special_token_131|>' is not marked as EOG
llm_load_vocab: control token: 128131 '<|reserved_special_token_123|>' is not marked as EOG
llm_load_vocab: control token: 128118 '<|reserved_special_token_110|>' is not marked as EOG
llm_load_vocab: control token: 128053 '<|reserved_special_token_45|>' is not marked as EOG
llm_load_vocab: control token: 128220 '<|reserved_special_token_212|>' is not marked as EOG
llm_load_vocab: control token: 128108 '<|reserved_special_token_100|>' is not marked as EOG
llm_load_vocab: control token: 128091 '<|reserved_special_token_83|>' is not marked as EOG
llm_load_vocab: control token: 128203 '<|reserved_special_token_195|>' is not marked as EOG
llm_load_vocab: control token: 128059 '<|reserved_special_token_51|>' is not marked as EOG
llm_load_vocab: control token: 128019 '<|reserved_special_token_11|>' is not marked as EOG
llm_load_vocab: control token: 128170 '<|reserved_special_token_162|>' is not marked as EOG
llm_load_vocab: control token: 128205 '<|reserved_special_token_197|>' is not marked as EOG
llm_load_vocab: control token: 128040 '<|reserved_special_token_32|>' is not marked as EOG
llm_load_vocab: control token: 128200 '<|reserved_special_token_192|>' is not marked as EOG
llm_load_vocab: control token: 128236 '<|reserved_special_token_228|>' is not marked as EOG
llm_load_vocab: control token: 128145 '<|reserved_special_token_137|>' is not marked as EOG
llm_load_vocab: control token: 128168 '<|reserved_special_token_160|>' is not marked as EOG
llm_load_vocab: control token: 128214 '<|reserved_special_token_206|>' is not marked as EOG
llm_load_vocab: control token: 128137 '<|reserved_special_token_129|>' is not marked as EOG
llm_load_vocab: control token: 128232 '<|reserved_special_token_224|>' is not marked as EOG
llm_load_vocab: control token: 128239 '<|reserved_special_token_231|>' is not marked as EOG
llm_load_vocab: control token: 128055 '<|reserved_special_token_47|>' is not marked as EOG
llm_load_vocab: control token: 128228 '<|reserved_special_token_220|>' is not marked as EOG
llm_load_vocab: control token: 128206 '<|reserved_special_token_198|>' is not marked as EOG
llm_load_vocab: control token: 128018 '<|reserved_special_token_10|>' is not marked as EOG
llm_load_vocab: control token: 128012 '<|reserved_special_token_4|>' is not marked as EOG
llm_load_vocab: control token: 128198 '<|reserved_special_token_190|>' is not marked as EOG
llm_load_vocab: control token: 128021 '<|reserved_special_token_13|>' is not marked as EOG
llm_load_vocab: control token: 128086 '<|reserved_special_token_78|>' is not marked as EOG
llm_load_vocab: control token: 128074 '<|reserved_special_token_66|>' is not marked as EOG
llm_load_vocab: control token: 128027 '<|reserved_special_token_19|>' is not marked as EOG
llm_load_vocab: control token: 128242 '<|reserved_special_token_234|>' is not marked as EOG
llm_load_vocab: control token: 128155 '<|reserved_special_token_147|>' is not marked as EOG
llm_load_vocab: control token: 128052 '<|reserved_special_token_44|>' is not marked as EOG
llm_load_vocab: control token: 128246 '<|reserved_special_token_238|>' is not marked as EOG
llm_load_vocab: control token: 128117 '<|reserved_special_token_109|>' is not marked as EOG
llm_load_vocab: control token: 128237 '<|reserved_special_token_229|>' is not marked as EOG
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 16
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 1.24 B
llm_load_print_meta: model size       = 762.81 MiB (5.18 BPW) 
llm_load_print_meta: general.name     = Llama 3.2 1B Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors:   CPU_Mapped model buffer size =   762.81 MiB
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 1024
llama_new_context_with_model: n_ctx_per_seq = 1024
llama_new_context_with_model: n_batch       = 256
llama_new_context_with_model: n_ubatch      = 256
llama_new_context_with_model: flash_attn    = 1
llama_new_context_with_model: freq_base     = 500000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_per_seq (1024) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 1024, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 16
llama_kv_cache_init: layer 0: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 1: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 2: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 3: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 4: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 5: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 6: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 7: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 8: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 9: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 10: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 11: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 12: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 13: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 14: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init: layer 15: n_embd_k_gqa = 512, n_embd_v_gqa = 512
llama_kv_cache_init:        CPU KV buffer size =    32.00 MiB
llama_new_context_with_model: KV self size  =   32.00 MiB, K (f16):   16.00 MiB, V (f16):   16.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB
llama_new_context_with_model:        CPU compute buffer size =   127.25 MiB
llama_new_context_with_model: graph nodes  = 455
llama_new_context_with_model: graph splits = 1
common_init_from_params: setting dry_penalty_last_n to ctx_size = 1024
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

If it fails, there should be some logs that can help us find the reason.

@PratikBhatti83
Author

PratikBhatti83 commented Jan 6, 2025

@jhen0409 Let me share the logs:

LOG Model info (took 64ms): {"alignment": 32, "data_offset": 7831552, "general.architecture": "llama", "general.basename": "Llama-3.2", "general.file_type": "27", "general.finetune": "Instruct", "general.languages": "["en", "de", "fr", "it", "pt", "hi", "es", "th"]", "general.license": "llama3.2", "general.name": "Llama 3.2 1B Instruct", "general.quantization_version": "2", "general.size_label": "1B", "general.tags": "["facebook", "meta", "pytorch", "llama", "llama-3", "text-generation"]", "general.type": "model", "llama.attention.head_count": "32", "llama.attention.head_count_kv": "8", "llama.attention.key_length": "64", "llama.attention.layer_norm_rms_epsilon": "0.000010", "llama.attention.value_length": "64", "llama.block_count": "16", "llama.context_length": "131072", "llama.embedding_length": "2048", "llama.feed_forward_length": "8192", "llama.rope.dimension_count": "64", "llama.rope.freq_base": "500000.000000", "llama.vocab_size": "128256", "quantize.imatrix.chunks_count": "125", "quantize.imatrix.dataset": "/training_dir/calibration_datav3.txt", "quantize.imatrix.entries_count": "112", "quantize.imatrix.file": "/models_out/Llama-3.2-1B-Instruct-GGUF/Llama-3.2-1B-Instruct.imatrix", "tokenizer.chat_template": "{{- bos_token }}
{%- if custom_tools is defined %}
{%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
{%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
{%- if strftime_now is defined %}
{%- set date_string = strftime_now("%d %b %Y") %}
{%- else %}
{%- set date_string = "26 Jul 2024" %}
{%- endif %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = none %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
{%- set system_message = messages[0]['content']|trim %}
{%- set messages = messages[1:] %}
{%- else %}
{%- set system_message = "" %}
{%- endif %}

{#- System message #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
{{- "Environment: ipython\n" }}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\n" }}
{{- "Today Date: " + date_string + "\n\n" }}
{%- if tools is not none and not tools_in_user_message %}
{{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
{{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
{{- "Do not use variables.\n\n" }}
{%- for t in tools %}
{{- t | tojson(indent=4) }}
{{- "\n\n" }}
{%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
{#- Extract the first user message so we can plug it in here #}
{%- if messages | length != 0 %}
{%- set first_user_message = messages[0]['content']|trim %}
{%- set messages = messages[1:] %}
{%- else %}
{{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
{%- endif %}
{{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
{{- "Given the following functions, please respond with a JSON for a function call " }}
{{- "with its proper arguments that best answers the given prompt.\n\n" }}
{{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
{{- "Do not use variables.\n\n" }}
{%- for t in tools %}
{{- t | tojson(indent=4) }}
{{- "\n\n" }}
{%- endfor %}
{{- first_user_message + "<|eot_id|>"}}
{%- endif %}

{%- for message in messages %}
{%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
{{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
{%- elif 'tool_calls' in message %}
{%- if not message.tool_calls|length == 1 %}
{{- raise_exception("This model only supports single tool-calls at once!") }}
{%- endif %}
{%- set tool_call = message.tool_calls[0].function %}
{{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
{{- '{"name": "' + tool_call.name + '", ' }}
{{- '"parameters": ' }}
{{- tool_call.arguments | tojson }}
{{- "}" }}
{{- "<|eot_id|>" }}
{%- elif message.role == "tool" or message.role == "ipython" %}
{{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
{%- if message.content is mapping or message.content is iterable %}
{{- message.content | tojson }}
{%- else %}
{{- message.content }}
{%- endif %}
{{- "<|eot_id|>" }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
", "tokenizer.ggml.bos_token_id": "128000", "tokenizer.ggml.eos_token_id": "128009", "tokenizer.ggml.model": "gpt2", "tokenizer.ggml.pre": "llama-bpe", "version": 3}``

@PratikBhatti83
Author

@jhen0409 Is there any update? Please check my logs and let me know the solution.

@PratikBhatti83
Author

@jhen0409 Is there any update? Please help me solve this issue!

@jhen0409
Member

Can you try v0.4.8?

Let me share the logs: [...logs]

These look like logs from JS, not Xcode logs.

@PratikBhatti83
Author

Bro, I'm still having the same issue.

@jhen0409 added the help wanted label on Jan 24, 2025
@CodeWithKyrian

Same issue here on an iPhone 12 Pro Max. It works well on the simulator but not on the device. Here are the Xcode logs:

llama_model_loader: loaded meta data with 27 key-value pairs and 339 tensors from /private/var/mobile/Containers/Data/Application/3A3413E0-B41D-4517-88CC-3A8AD73473DD/tmp/com.anonymous.tutorwise-Inbox/DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 1.5B
llama_model_loader: - kv   3:                       general.organization str              = Deepseek Ai
llama_model_loader: - kv   4:                           general.basename str              = DeepSeek-R1-Distill-Qwen
llama_model_loader: - kv   5:                         general.size_label str              = 1.5B
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 28
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 131072
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 1536
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 8960
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 12
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = deepseek-r1-qwen
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 151646
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151654
llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                          general.file_type u32              = 17
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q5_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q5_K - Medium
print_info: file size   = 1.19 GiB (5.76 BPW) 
init_tokenizer: initializing tokenizer for type 2
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151646 '<|begin▁of▁sentence|>' is not marked as EOG
load: control token: 151644 '<|User|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: control token: 151647 '<|EOT|>' is not marked as EOG
load: control token: 151643 '<|end▁of▁sentence|>' is not marked as EOG
load: control token: 151645 '<|Assistant|>' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 1536
print_info: n_layer          = 28
print_info: n_head           = 12
print_info: n_head_kv        = 2
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 6
print_info: n_embd_k_gqa     = 256
print_info: n_embd_v_gqa     = 256
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: n_ff             = 8960
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 1.5B
print_info: model params     = 1.78 B
print_info: general.name     = DeepSeek R1 Distill Qwen 1.5B
print_info: vocab type       = BPE
print_info: n_vocab          = 151936
print_info: n_merges         = 151387
print_info: BOS token        = 151646 '<|begin▁of▁sentence|>'
print_info: EOS token        = 151643 '<|end▁of▁sentence|>'
print_info: EOT token        = 151643 '<|end▁of▁sentence|>'
print_info: PAD token        = 151654 '<|vision_pad|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|end▁of▁sentence|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: layer   0 assigned to device CPU
load_tensors: layer   1 assigned to device CPU
load_tensors: layer   2 assigned to device CPU
load_tensors: layer   3 assigned to device CPU
load_tensors: layer   4 assigned to device CPU
load_tensors: layer   5 assigned to device CPU
load_tensors: layer   6 assigned to device CPU
load_tensors: layer   7 assigned to device CPU
load_tensors: layer   8 assigned to device CPU
load_tensors: layer   9 assigned to device CPU
load_tensors: layer  10 assigned to device CPU
load_tensors: layer  11 assigned to device CPU
load_tensors: layer  12 assigned to device CPU
load_tensors: layer  13 assigned to device CPU
load_tensors: layer  14 assigned to device CPU
load_tensors: layer  15 assigned to device CPU
load_tensors: layer  16 assigned to device CPU
load_tensors: layer  17 assigned to device CPU
load_tensors: layer  18 assigned to device CPU
load_tensors: layer  19 assigned to device CPU
load_tensors: layer  20 assigned to device CPU
load_tensors: layer  21 assigned to device CPU
load_tensors: layer  22 assigned to device CPU
load_tensors: layer  23 assigned to device CPU
load_tensors: layer  24 assigned to device CPU
load_tensors: layer  25 assigned to device CPU
load_tensors: layer  26 assigned to device CPU
load_tensors: layer  27 assigned to device CPU
load_tensors: layer  28 assigned to device CPU
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/29 layers to GPU
load_tensors:   CPU_Mapped model buffer size =  1220.27 MiB
llama_init_from_model: n_seq_max     = 1
llama_init_from_model: n_ctx         = 2048
llama_init_from_model: n_ctx_per_seq = 2048
llama_init_from_model: n_batch       = 2048
llama_init_from_model: n_ubatch      = 512
llama_init_from_model: flash_attn    = 0
llama_init_from_model: freq_base     = 10000.0
llama_init_from_model: freq_scale    = 1
llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
lm_ggml_metal_init: allocating
lm_ggml_metal_init: picking default device: Apple A14 GPU
lm_ggml_metal_init: loading '/var/containers/Bundle/Application/70795BBF-F5B7-4F66-8B9A-6DB79D244B9B/tutorwise.app/Frameworks/rnllama.framework/ggml-llama.metallib'
lm_ggml_metal_init: GPU name:   Apple A14 GPU
lm_ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
lm_ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
lm_ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
lm_ggml_metal_init: simdgroup reduction   = true
lm_ggml_metal_init: simdgroup matrix mul. = true
lm_ggml_metal_init: has residency sets    = false
lm_ggml_metal_init: has bfloat            = true
lm_ggml_metal_init: use bfloat            = true
lm_ggml_metal_init: hasUnifiedMemory      = true
lm_ggml_metal_init: recommendedMaxWorkingSetSize  =  4294.98 MB
lm_ggml_metal_init: loaded kernel_add                                            0x0 | th_max =    0 | th_width =    0
lm_ggml_metal_init: error: load pipeline error: Error Domain=AGXMetalA14 Code=3 "Function kernel_add is using language version 3.2 which is incompatible with this OS." UserInfo={NSLocalizedDescription=Function kernel_add is using language version 3.2 which is incompatible with this OS.}
lm_ggml_backend_metal_device_init: error: failed to allocate context
llama_init_from_model: failed to initialize Metal backend
common_init_from_params: failed to create context with model '/private/var/mobile/Containers/Data/Application/3A3413E0-B41D-4517-88CC-3A8AD73473DD/tmp/com.anonymous.tutorwise-Inbox/DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf'

@jhen0409
Member

jhen0409 commented Feb 5, 2025

lm_ggml_metal_init: error: load pipeline error: Error Domain=AGXMetalA14 Code=3 "Function kernel_add is using language version 3.2 which is incompatible with this OS." UserInfo={NSLocalizedDescription=Function kernel_add is using language version 3.2 which is incompatible with this OS.}
lm_ggml_backend_metal_device_init: error: failed to allocate context
llama_init_from_model: failed to initialize Metal backend

It looks like some kernels now require Metal 3.2 (ref), which I hadn't noticed, so you may need to upgrade your iOS version to 18.

@CodeWithKyrian

But I set n_gpu_layers: 0. Shouldn't that prevent the use of Metal entirely?

@jhen0409
Member

jhen0409 commented Feb 5, 2025

But I set n_gpu_layers: 0. Shouldn't that prevent the use of Metal entirely?

Due to llama.cpp changes, the Metal kernels are still loaded even when n_gpu_layers: 0 is set, since the Metal backend can still be used to offload some other compute ops when available.
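For illustration, a CPU-only configuration would look roughly like the sketch below (parameter values are illustrative); note that, per the comment above, this does not skip Metal backend initialization:

```ts
import { initLlama } from 'llama.rn'

async function loadCpuOnly(modelPath: string) {
  // n_gpu_layers: 0 keeps all layers on the CPU, but llama.cpp may still
  // initialize the Metal backend for auxiliary compute ops, so the Metal
  // shading-language error above can still occur on older iOS versions.
  return initLlama({
    model: modelPath,
    n_ctx: 2048,
    n_gpu_layers: 0,
  })
}
```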

@glenau

glenau commented Feb 6, 2025

But I set n_gpu_layers: 0. Shouldn't that prevent the use of Metal entirely?

Due to llama.cpp changes, the Metal kernels are still loaded even when n_gpu_layers: 0 is set, since the Metal backend can still be used to offload some other compute ops when available.

Is it possible to somehow disable Metal entirely? On an iPad 7 (2019) I get this error when starting:

lm_ggml_metal_init: loaded kernel_norm 0x0 | th_max = 0 | th_width = 0
lm_ggml_metal_init: error: load pipeline error: Error Domain=AGXMetalA12 Code=3 "Encountered unlowered function call to air.simd_sum.f32" UserInfo={NSLocalizedDescription=Encountered unlowered function call to air.simd_sum.f32}
lm_ggml_backend_metal_device_init: error: failed to allocate context
llama_new_context_with_model: failed to initialize Metal backend
