Dataset in example honest notebook #16
Comments
To extract functions (such as being honest), we're collecting neural activity at every token position in the response, as described in step 2 of the LAT scan in the paper.
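For illustration, collecting the activations at every response token can be sketched as below. This is a minimal toy sketch, not the repo's actual code: the function name, the `prompt_len` parameter, and the plain NumPy array standing in for a layer's hidden states are all assumptions.

```python
import numpy as np

def collect_response_activations(hidden_states, prompt_len):
    """Keep the hidden state at every token position in the response.

    hidden_states: (seq_len, hidden_dim) array for one layer.
    prompt_len: number of prompt tokens to skip, so only the
    response positions are collected.
    """
    return hidden_states[prompt_len:]

# toy example: 10-token sequence, 4-dim hidden states, 6 prompt tokens
acts = collect_response_activations(np.arange(40.0).reshape(10, 4), prompt_len=6)
print(acts.shape)  # (4, 4): one vector per response token
```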
Thank you for your reply! I notice in the data process
It was a design choice, since unfinished sentences don't have a strong indication of honesty/dishonesty. But it might not matter that much.
What is the difference between [true, false] and [false, true] in the labels for the honesty dataset?
It corresponds to the pairs in the training set: we randomly shuffle, so some pairs have the honest statement at index [0] and some have it at index [1].
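The shuffling described above can be sketched like this. The function name and the exact persona wording are approximations for illustration, not the notebook's actual code:

```python
import random

def make_shuffled_pairs(statements, seed=0):
    """Build an (honest, dishonest) prompt pair per statement, then randomly
    swap each pair's order; the label list records which index is honest."""
    rng = random.Random(seed)
    data, labels = [], []
    for s in statements:
        honest = f"Pretend you're an honest person making statements about the world. {s}"
        dishonest = f"Pretend you're a dishonest person making statements about the world. {s}"
        pair, label = [honest, dishonest], [True, False]
        if rng.random() < 0.5:  # randomly swap, so honesty lands at index 0 or 1
            pair, label = pair[::-1], label[::-1]
        data.append(pair)
        labels.append(label)
    return data, labels

pairs, labels = make_shuffled_pairs(["The sky is blue.", "Fire is cold."])
```

So a label of [true, false] just means the honest prompt happens to sit at index 0 for that pair, and [false, true] means it sits at index 1.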
@andyzoujm, I'm trying to understand the dataset in
@shamikbosefj i am not a contributor to this repo, but my guess is that since the true_statements are getting truncated with the functional stimulation paradigm and prefixed with "imagine you are a truthful..." or "imagine you are an untruthful...", it doesn't really matter. In the end, the statement is not completed, so it leaves the door open to whatever completion, depending on whether the LLM is asked to be truthful or untruthful. This is probably better than creating each pair out of a true and a false statement, because it reduces the amount of variability, so the activations are likely to vary only along the honesty/dishonesty axis. Now, they could also have used false statements to create separate pairs (again, truncating the statement and prefixing as described), but they don't need that many pairs, so it probably just wasn't necessary.

As to your other question, I believe they need to shuffle in order to create variability along the axis of interest (honesty/dishonesty). Otherwise, when PCA is done on the differences, there's no variability over the pairs in the direction of that axis. See also: #23 (comment)
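That point about shuffling can be checked numerically: with random order-swapping, each pairwise difference points along +axis or -axis at random, so the honesty direction shows up as *variance* that the top principal component can recover. A minimal sketch on synthetic activations (not the repo's pipeline):

```python
import numpy as np

def top_pc(diffs):
    """First principal component of the pairwise activation differences."""
    centered = diffs - diffs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
axis = rng.standard_normal(8)
axis /= np.linalg.norm(axis)  # the hidden "honesty" direction

# shuffled pairs: each difference points along +axis or -axis at random,
# plus a little noise, so PCA of the differences recovers the direction
signs = rng.choice([-1.0, 1.0], size=(32, 1))
diffs = signs * axis + 0.1 * rng.standard_normal((32, 8))
print(abs(top_pc(diffs) @ axis))  # close to 1
```

Without the sign flips, the differences would all share the same mean direction and centering would remove it, leaving PCA nothing but noise to explain.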
Thanks for the explanation, @joshlevy89
Why does it skip the first and last of the honest and untruthful sets respectively? Is this just a way to ensure that the same text isn't picked as both honest and untruthful?
@shamikbosefj hm, i'm not sure about that line either. i'm not sure what it's trying to accomplish. i think it could probably be replaced by something simpler...
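One hypothetical reading of the line being discussed (this is a guess at the pattern, not the repo's actual code): skipping the first honest item and the last untruthful one offsets the two lists by one, so no pair contains the same underlying statement rendered both ways.

```python
honest = ["h0", "h1", "h2", "h3"]
untruthful = ["u0", "u1", "u2", "u3"]

# offset the two lists by one before zipping, so a pair never holds
# the honest and untruthful renderings of the same statement
pairs = list(zip(honest[1:], untruthful[:-1]))
print(pairs)  # [('h1', 'u0'), ('h2', 'u1'), ('h3', 'u2')]
```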
@joshlevy89 I'm wondering if it's due to the
@justinphan3110 @joshlevy89 @andyzoujm I think there's a bug in the dataset creation of the
I noticed that in the paper and in the example Jupyter code, the output of ASSISTANT (the response), i.e. the statement, is truncated. I would like to know the reason. Thank you so much!