
TRANSPARENCY

Overview

UniPrompt assists prompt engineers by generating high-accuracy prompt candidates for any task. Given a one-line task description and a representative set of input-output demonstrations, UniPrompt iteratively generates and refines the prompt text to maximize accuracy on a held-out validation set. UniPrompt relies on an auxiliary LLM such as GPT-4 to generate the text edits.
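At a high level, the refinement loop looks like the sketch below. This is a minimal illustration, not the project's actual code: the function names, the mock task LLM, and the mock auxiliary-LLM edit are all assumptions made for this example.

```python
# Minimal sketch of a UniPrompt-style refinement loop. The mock LLM
# calls below are stand-ins for real model APIs, and all names are
# assumptions made for this illustration.

def task_llm(prompt: str, x: str) -> str:
    """Stand-in for the task LLM being prompted."""
    if "sentiment words" in prompt:           # a "better" prompt succeeds
        return "positive" if "good" in x else "negative"
    return "negative"                         # a weak prompt guesses

def auxiliary_llm_edit(prompt: str, failures: list) -> str:
    """Stand-in for the auxiliary LLM (e.g., GPT-4) that proposes an
    edited prompt based on the observed failures."""
    return prompt + " Pay attention to sentiment words such as 'good'."

def accuracy(prompt, examples):
    return sum(task_llm(prompt, x) == y for x, y in examples) / len(examples)

def refine_prompt(task_description, val_set, rounds=5):
    best, best_acc = task_description, accuracy(task_description, val_set)
    for _ in range(rounds):
        # Collect the validation examples the current prompt gets wrong.
        failures = [(x, y) for x, y in val_set if task_llm(best, x) != y]
        if not failures:                      # nothing left to fix
            break
        candidate = auxiliary_llm_edit(best, failures)
        cand_acc = accuracy(candidate, val_set)
        if cand_acc > best_acc:               # keep an edit only if it helps
            best, best_acc = candidate, cand_acc
    return best, best_acc

val = [("This movie was good.", "positive"), ("Terrible plot.", "negative")]
print(refine_prompt("Classify the review's sentiment.", val))
```

A real run replaces the stand-ins with model calls and, as described above, accepts an edit only when it improves accuracy on the held-out validation set.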

Objective

Our objective is to minimize manual effort in developing high-accuracy prompts for LLMs. Prompt engineering is a tedious process: human prompt engineers often spend considerable time identifying errors in a given prompt, considering the different facets of a task (e.g., counter-examples, explanations, analogies) that may fix those errors, including them in the prompt if they improve accuracy, and iterating. UniPrompt mimics this process by iteratively adding or editing prompt text and evaluating the resulting accuracy.

Audience

The UniPrompt algorithm is intended for researchers, AI practitioners, and industry professionals who are interested in optimizing the performance of LLMs on specific tasks.

Intended Uses

UniPrompt can be used to generate prompt candidates for any task. For a new task, UniPrompt requires a short task description, a few hundred input-output demonstrations, an accuracy or quality metric, and access to an auxiliary LLM. As a result, the algorithm is most applicable to tasks whose outputs can be evaluated precisely against the demonstrations, for example, classification or ranking tasks. It can also optimize generation tasks as long as the LLM output can be evaluated programmatically.
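Concretely, the inputs for a new task might be organized as in the sketch below. The dataclass, its field names, and the exact-match metric are assumptions made for this illustration, not the project's actual configuration format.

```python
# Illustrative shape of the inputs UniPrompt needs for a new task;
# all names here are assumptions, not the project's API.

from dataclasses import dataclass
from typing import Callable

@dataclass
class TaskSpec:
    description: str                       # one-line task description
    demonstrations: list[tuple[str, str]]  # a few hundred (input, output) pairs
    metric: Callable[[str, str], float]    # scores a prediction against gold

def exact_match(pred: str, gold: str) -> float:
    """Simple metric for classification-style tasks."""
    return float(pred.strip().lower() == gold.strip().lower())

spec = TaskSpec(
    description="Classify a news headline as business, sports, or tech.",
    demonstrations=[
        ("Stocks rally as interest rates fall.", "business"),
        ("Local team wins the national cup.", "sports"),
    ],
    metric=exact_match,
)
```

For a generation task, the metric would simply be replaced by any programmatic scorer of the output, such as a string-similarity score or a custom validator.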

Out of Scope Uses

The UniPrompt algorithm is not intended to be used to circumvent any policies adopted by LLM providers.

Evaluation

UniPrompt has been shown to generate high-accuracy prompts for a range of tasks, including text classification, semantic relevance, math, and other reasoning tasks. In all these tasks, the prompt generated by the algorithm achieved higher accuracy on an unseen test set than the best-known manually written or auto-generated prompt. For more details, see the paper: https://arxiv.org/abs/2406.10504.

Limitations

  • UniPrompt was tested on popular NLP benchmarks and a real-world semantic relevance task. Performance of the method on other real-world tasks may differ.
  • We tested UniPrompt only for generating prompts in English.
  • The accuracy of UniPrompt's output depends on the diversity and representativeness of the training examples (input-output demonstrations) provided.

Usage

This project is primarily designed for research and experimental purposes. We strongly recommend conducting further testing and validation before considering its application in industrial or real-world scenarios.

Feedback and Collaboration

We welcome feedback and collaboration from our audience. If you have suggestions, questions, or would like to contribute to the project, please feel free to raise an issue or open a pull request.