ValueError: draft_choices was not cut enough / draft_len should not exceed 65 #13

yangbohust · 2024-05-08T09:05:19Z

I adapted the REST solution to the Qwen-7b model and encountered the following problem when testing it on the Human-eval data set:
What may be the cause of this problem and how to solve it?

Traceback (most recent call last):
  File "code/opensource/my/REST/human_eval/rest_test.py", line 223, in <module>
    run_eval(
  File "code/opensource/my/REST/human_eval/rest_test.py", line 67, in run_eval
    candidates, tree_candidates, draft_buffers = generate_candidates_and_draft_buffer(
  File "code/opensource/my/REST/human_eval/../rest/model/utils.py", line 105, in generate_candidates_and_draft_buffer
    retrieved_token_list, _draft_attn_mask, _tree_indices, _draft_position_ids, _retrieve_indices = datastore.search(this_token, choices=max_num_draft)
  File "envs/rest/lib/python3.9/site-packages/draftretriever/__init__.py", line 54, in search
    return self.reader.search(
ValueError: draft_choices was not cut enough

REST/DraftRetriever/src/lib.rs

Lines 350 to 359 in 6aed6ad

    
           // Because multiple nodes in the Trie may have same weights around the threshold, the number of draft tokens may exceed choices 
        
           // We roughly cut nodes to be less than choices in most cases.  
        
           let paths = cut_to_choices(verified, choices); 
        
           let (draft_choices, max_branch) = get_draft_choices(paths.clone()); 
        
           if draft_choices.len() > choices as usize { 
        
               // It might not be cut enough because cut_to_choices() is best effort, as mentioned in the comment above 
        
               return Err(exceptions::PyValueError::new_err("draft_choices was not cut enough")); 
        
           }

Why can't the length of draft_choices be directly truncated to the length specified by the choices parameter?

Thanks,

The text was updated successfully, but these errors were encountered:

zhenyuhe00 · 2024-05-08T11:29:17Z

For efficient purposes, the algorithm in

REST/DraftRetriever/src/lib.rs

Lines 335 to 344 in 6aed6ad

    
           for (k, v) in &cnt { 
        
               if heap.len() < (choices as usize) { 
        
                   heap.push((Reverse(*v), k)); 
        
               } else if let Some(&(Reverse(top_v), _)) = heap.peek() { 
        
                   if *v > top_v { 
        
                       heap.pop(); 
        
                       heap.push((Reverse(*v), k)); 
        
                   } 
        
               } 
        
           }

may deviate when forming a Trie if multiple draft sequences have identical occurrence counts. This can cause the number of 'draft_choices' to be approximately equal to the 'choices' we set, rather than being strictly equal or less.

A more strict implementation would be like:

        for (k, v) in &cnt {
            heap.push((Reverse(*v), k));
            if heap.len() > choices as usize {
                heap.pop();
            }
        }

This new implementation would preserve the original order of the draft sequences when they have identical occurrence counts. You may consider switching to this implementation, although it might be slightly slower.

yangbohust · 2024-05-11T07:26:14Z

I modified the lib.rs code so that it can be applied to the qwen-7b model (the vocab_size of the qwen-7b model is 151936), and then when I ran REST's Draft Retriever to retrieve the draft token from the knowledge datastores, the following accidentally appeared mistake. Is this error related to the implementation of the algorithm you pointed out above that is not strictly accurate? If I switch to the strict implementation you provided, can this problem be solved?

thread '<unnamed>' panicked at src/lib.rs:454:5:
draft_len should not exceed 65
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted (core dumped)

REST/DraftRetriever/src/lib.rs

Lines 449 to 459 in 6aed6ad

    
           fn generate_draft_buffers(draft_choices: Vec<Vec<i32>>, topk: i32) -> (Vec<Vec<i32>>, Vec<i32>, Vec<i32>, Vec<Vec<i32>>) { 
        
               // Sort the draft_choices based on their lengths and then their values 
        
               let mut sorted_draft_choices = draft_choices; 
        
               sorted_draft_choices.sort_by(|a, b| match a.len().cmp(&b.len()) { 
        
                   Ordering::Equal => a.cmp(b), 
        
                   other => other, 
        
               }); 
        
               let draft_len = sorted_draft_choices.len() + 1; 
        
               assert! (draft_len <= 65, "draft_len should not exceed 65");

Thank you very much~

tim-pan · 2024-05-31T01:52:24Z

For efficient purposes, the algorithm in

REST/DraftRetriever/src/lib.rs

Lines 335 to 344 in 6aed6ad

for (k, v) in &cnt {

if heap.len() < (choices as usize) {

heap.push((Reverse(*v), k));

} else if let Some(&(Reverse(top_v), _)) = heap.peek() {

if *v > top_v {

heap.pop();

heap.push((Reverse(*v), k));

}

}

}

may deviate when forming a Trie if multiple draft sequences have identical occurrence counts. This can cause the number of 'draft_choices' to be approximately equal to the 'choices' we set, rather than being strictly equal or less.
A more strict implementation would be like:
        for (k, v) in &cnt {
            heap.push((Reverse(*v), k));
            if heap.len() > choices as usize {
                heap.pop();
            }
        }
This new implementation would preserve the original order of the draft sequences when they have identical occurrence counts. You may consider switching to this implementation, although it might be slightly slower.

Hi! I still met the same ValueError: draft_choices was not cut enough issue after I modifying this part. Do you have any comment?
I just want to reproduce your results, but the only thing I changed is I modified the num_choices to 8.
Thanks!

yangbohust changed the title ~~ValueError: draft_choices was not cut enough~~ ValueError: draft_choices was not cut enough and draft_len should not exceed 65 May 11, 2024

yangbohust changed the title ~~ValueError: draft_choices was not cut enough and draft_len should not exceed 65~~ ValueError: draft_choices was not cut enough / draft_len should not exceed 65 May 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ValueError: draft_choices was not cut enough / draft_len should not exceed 65 #13

ValueError: draft_choices was not cut enough / draft_len should not exceed 65 #13

yangbohust commented May 8, 2024

zhenyuhe00 commented May 8, 2024

yangbohust commented May 11, 2024 •

edited

Loading

tim-pan commented May 31, 2024 •

edited

Loading

ValueError: draft_choices was not cut enough / draft_len should not exceed 65 #13

ValueError: draft_choices was not cut enough / draft_len should not exceed 65 #13

Comments

yangbohust commented May 8, 2024

zhenyuhe00 commented May 8, 2024

yangbohust commented May 11, 2024 • edited Loading

tim-pan commented May 31, 2024 • edited Loading

yangbohust commented May 11, 2024 •

edited

Loading

tim-pan commented May 31, 2024 •

edited

Loading