[Summary] Explanation Operations #13

Open
nfelnlp opened this issue Mar 23, 2023 · 2 comments
Labels: `enhancement` (New feature or request), `summary`

Comments


nfelnlp commented Mar 23, 2023

| Operation | Terminals / Prompts | Action | Description | Tools | Status |
| --- | --- | --- | --- | --- | --- |
| nlpattribute | `nlpattribute token \| phrase \| sentence {classes}` | feature_importance | Provides feature importances at the token (default), phrase or sentence level. | Captum (Integrated Gradients) | |
| globaltopk | `important {number} {classes}` | global_topk | Returns the top k most attributed tokens across the entire dataset. | Captum (Integrated Gradients) | |
| nlpcfe | `nlpcfe {number}` | counterfactuals | Returns counterfactual explanations (model predicts another label) for a single instance. | Polyjuice | |
| adversarial | `adversarial {number}` | | Returns adversarial examples (model predicts a wrong label) for a single instance. | OpenAttack | |
| similar | `similar {number}` | similarity | Gets the training data instances that are most similar to the current one. | Sentence Transformers | |
| rules | `rules {number}` | | Outputs the decision rules for the dataset. | Anchors | |
| interact | `interact` | | Gets feature interactions. | HEDGE | |
| rationalize | `rationalize` | rationalize | Explains the prediction for some specified instance in natural language. | Zero-shot prompting with GPTNeo parser | |
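For context on what `nlpattribute` computes, here is a minimal NumPy sketch of the Integrated Gradients idea that Captum implements (the actual tool named above is Captum's `IntegratedGradients`; the linear toy "model" and numerical gradients below are illustrative assumptions, not project code):

```python
import numpy as np

def integrated_gradients(f, x, baseline, steps=50):
    """Approximate Integrated Gradients for a scalar function f
    via a midpoint Riemann sum over the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    eps = 1e-5
    for a in alphas:
        point = baseline + a * (x - baseline)
        # Central-difference numerical gradient of f at the path point
        grad = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = eps
            grad[i] = (f(point + d) - f(point - d)) / (2 * eps)
        total += grad
    # Scale the averaged gradients by (input - baseline)
    return (x - baseline) * total / steps

# Toy "model": a linear score over three 1-d token embeddings
w = np.array([0.5, -1.0, 2.0])
f = lambda z: float(w @ z)
x = np.array([1.0, 1.0, 1.0])
attributions = integrated_gradients(f, x, baseline=np.zeros(3))
# For a linear model, IG reduces exactly to w * (x - baseline)
```

In practice Captum applies this to the embedding layer of the classifier, and the per-dimension attributions are summed per token to yield token-level importances.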

nfelnlp commented Mar 30, 2023

adversarial (via OpenAttack) has more than twice the execution time of Polyjuice, which already takes quite a while. Since CFEs already cover a similar operation, OpenAttack is no longer part of the roadmap. The long-term plan is to train one multi-purpose model that can reasonably perturb text for generating adversarial attacks, counterfactuals and general data augmentation at once.

interact (via HEDGE) cannot be implemented, because hierarchical explanations don't have an obvious natural language representation. Visualizations are not on the agenda as of now.

rules (via Anchors) does not appear to return rules that are inherently meaningful (mostly single tokens) and takes very long to compute.

rationalize (via OpenAI API or a rationalizing LLM) will be implemented soon.


nfelnlp commented Apr 13, 2023

For rationalize, we can do the following:

  1. Design one prompt for each dataset
  2. Insert the input texts into the prompts
  3. Use GPT-3.5 / -4 to generate a few hundred rationales in a zero-shot setup
  4. Fine-tune a T5 for each dataset of rationales
  5. Run inference with the fine-tuned T5 to produce rationales for the rest of the datasets (because using ChatGPT for the tens of thousands of examples in BoolQ, OLID & DD is too expensive)
  6. Store generated rationales as CSVs or JSONs (see pre-computed feature attribution explanations in the cache folder for reference)
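Steps 1, 2 and 6 of the plan above could look roughly like this (a sketch only: the prompt template, dataset key, and JSON layout are illustrative assumptions, not the project's actual format):

```python
import json

# Step 1: one prompt template per dataset (hypothetical template for BoolQ)
PROMPTS = {
    "boolq": (
        "Passage: {passage}\nQuestion: {question}\n"
        "Explain in one sentence why the answer is '{label}'."
    ),
}

def build_prompt(dataset: str, instance: dict) -> str:
    """Step 2: insert the input texts of one instance into the prompt."""
    return PROMPTS[dataset].format(**instance)

def store_rationales(rationales: list, path: str) -> None:
    """Step 6: store generated rationales as JSON for the cache folder."""
    with open(path, "w") as fh:
        json.dump(rationales, fh, indent=2)

instance = {
    "passage": "Cats are mammals.",
    "question": "Are cats mammals?",
    "label": "yes",
}
prompt = build_prompt("boolq", instance)
```

The filled prompt would then be sent to GPT-3.5 / -4 (step 3), and the collected rationales used as fine-tuning data for T5 (steps 4–5).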

@nfelnlp added the `enhancement` (New feature or request) label on May 7, 2023