I have followed all the steps mentioned. I am able to execute the ./rag_demo --help command, but when I try to run the main command to interact with the model, it fails with "Illegal instruction".
132|OP5155L1:/data/local/tmp/Android/bin $ ./rag_demo --embedding_model ./gte-base-f32.gguf --database_input ./output_chunks.json --database_output ./embedded_document_chunks.json --model_path ./model.ggml --m>
llama_model_loader: loaded meta data with 23 key-value pairs and 197 tensors from ./gte-base-f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = bert
llama_model_loader: - kv 1: general.name str = gte-base
llama_model_loader: - kv 2: bert.block_count u32 = 12
llama_model_loader: - kv 3: bert.context_length u32 = 512
llama_model_loader: - kv 4: bert.embedding_length u32 = 768
llama_model_loader: - kv 5: bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 0
llama_model_loader: - kv 9: bert.attention.causal bool = false
llama_model_loader: - kv 10: bert.pooling_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 12: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 13: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 14: tokenizer.ggml.model str = bert
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 19: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 21: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 22: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 197 tensors
llm_load_vocab: mismatch in special tokens definition ( 7104/30522 vs 5/30522 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = bert
llm_load_print_meta: vocab type = WPM
llm_load_print_meta: n_vocab = 30522
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 512
llm_load_print_meta: n_embd = 768
llm_load_print_meta: n_head = 12
llm_load_print_meta: n_head_kv = 12
llm_load_print_meta: n_layer = 12
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 768
llm_load_print_meta: n_embd_v_gqa = 768
llm_load_print_meta: f_norm_eps = 1.0e-12
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 3072
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 0
llm_load_print_meta: pooling type = 1
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 512
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 109M
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 108.89 M
llm_load_print_meta: model size = 415.39 MiB (32.00 BPW)
llm_load_print_meta: general.name = gte-base
llm_load_print_meta: BOS token = 101 '[CLS]'
llm_load_print_meta: EOS token = 102 '[SEP]'
llm_load_print_meta: UNK token = 100 '[UNK]'
llm_load_print_meta: SEP token = 102 '[SEP]'
llm_load_print_meta: PAD token = 0 '[PAD]'
llm_load_tensors: ggml ctx size = 0.08 MiB
llm_load_tensors: CPU buffer size = 415.39 MiB
........................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 18.00 MiB
llama_new_context_with_model: KV self size = 18.00 MiB, K (f16): 9.00 MiB, V (f16): 9.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.00 MiB
llama_new_context_with_model: CPU compute buffer size = 20.00 MiB
llama_new_context_with_model: graph nodes = 431
llama_new_context_with_model: graph splits = 1
Illegal instruction
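"Illegal instruction" generally means the process executed a CPU instruction that the device does not support. The log above does not confirm that this is the cause here. One possible way to narrow it down is to compare the CPU feature flags the device reports in /proc/cpuinfo against the extensions the binary was built with. The sketch below is only an illustration: the helper names and the `required` list are hypothetical, and the flags your build actually needs depend on how it was compiled.

```python
def parse_cpu_features(cpuinfo_text):
    """Return the flags from the first 'Features' (ARM) or 'flags' (x86) line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("features", "flags"):
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required):
    """Return the required flags that the CPU does not report, sorted."""
    return sorted(set(required) - parse_cpu_features(cpuinfo_text))


if __name__ == "__main__":
    # Hypothetical list: replace it with the extensions your build enables.
    required = ["asimd", "asimddp", "i8mm"]
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux/Android, or the file is unreadable
    print("missing:", missing_features(text, required))
```

If Python is not available on the device, the output of `adb shell cat /proc/cpuinfo` can be saved on the host and passed to `missing_features` as a string. A non-empty result would suggest rebuilding with a lower target architecture, though an empty result does not rule out other causes.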