Commit

Update README.md (#18)
celedue authored Jan 23, 2025
1 parent ec4911f commit a0e1f86
Showing 1 changed file with 13 additions and 15 deletions.
README.md (28 changes: 13 additions & 15 deletions)
@@ -8,35 +8,33 @@
![Python compatibility](https://badgen.net/pypi/python/folktexts)
[![Huggingface dataset](https://img.shields.io/badge/HuggingFace-FDEE21?style=flat&logo=huggingface&logoColor=black&color=%23FFD21E)](https://huggingface.co/datasets/acruz/folktexts)

## A toolbox for evaluating statistical properties of LLMs <!-- omit in toc -->

> This package is the basis for our NeurIPS'24 paper titled ["Evaluating language models as risk scores"](https://arxiv.org/abs/2407.14614)
+Folktexts provides a suite of Q&A datasets for evaluating calibration and accuracy of LLMs
+on prediction tasks with varying outcome uncertainty.

-Folktexts is a suite of Q&A
-datasets with natural outcome uncertainty, aimed at evaluating LLMs' calibration
-on unrealizable tasks.
+The `folktexts` python package provides functionalities to derive prediction tasks from survey data, translates these tasks into natural text prompts and implements different methods to extract _risk scores_ from LLMs.

+With folktexts every LLM can be turned into a score function and the evaluation layer offers tools to compute statistical properties on top of these risk scores by comparing them to the ground truth outcomes.

-The `folktexts` python package enables computing and evaluating classification _risk scores_ for tabular prediction tasks using LLMs.

<!-- ![folktexts-diagram](docs/_static/folktexts-loop-diagram.png) -->
<p align="center">
<img src="docs/_static/folktexts-loop-diagram.png" alt="folktexts-diagram" width="700px">
</p>

-Several benchmark tasks are provided based on data from the American Community Survey.
+**Use folktexts to benchmark your LLM:**

+- Pre-defined benchmark tasks are provided based on data from the American Community Survey.
+Namely, each tabular prediction task from the popular
+[folktables](https://github.com/socialfoundations/folktables) package is made available
+as a natural-language Q&A task.
+- Parsed and ready-to-use versions of each *folktexts* dataset can be found on
+<a href="https://huggingface.co/datasets/acruz/folktexts"> Huggingface</a>.
+- Package documentation can be found [here](https://socialfoundations.github.io/folktexts/).

-Parsed and ready-to-use versions of each *folktexts* dataset can be found on
-<a href="https://huggingface.co/datasets/acruz/folktexts">
-<span style="display: inline-block; vertical-align: middle;">
-<img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="Logo" style="height: 1em; vertical-align: text-bottom;">
-</span>
-Huggingface</a>.
-
-Package documentation can be found [here](https://socialfoundations.github.io/folktexts/).

-**Table of contents:**
+## Table of contents <!-- omit in toc -->
- [Getting started](#getting-started)
- [Installing](#installing)
- [Basic setup](#basic-setup)
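
The benchmark tasks described in the intro above are derived from the tabular prediction tasks in the [folktables](https://github.com/socialfoundations/folktables) package. For reference, here is a minimal sketch of loading one such underlying task (ACSIncome) with folktables itself; the survey year and state are placeholder choices, and this uses the folktables API rather than folktexts:

```python
# Sketch: fetching the tabular ACSIncome task with folktables (the package the
# folktexts benchmark tasks are derived from). Year and state are illustrative.
from folktables import ACSDataSource, ACSIncome

data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_data = data_source.get_data(states=["CA"], download=True)  # downloads ACS PUMS data
features, label, group = ACSIncome.df_to_numpy(acs_data)       # X, y, and group attribute
```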
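The updated intro also describes turning an LLM into a score function by translating tabular records into natural-text prompts and extracting risk scores. Below is a minimal sketch of that idea, not the folktexts API: the model name, prompt wording, and feature fields are illustrative placeholders, and the score is read off the model's next-token probabilities for the answer tokens.

```python
# Sketch only (not the folktexts API): turning a causal LM into a risk-score
# function for a yes/no tabular prediction task. Model name, prompt wording,
# and feature fields below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def risk_score(row: dict) -> float:
    """Probability the model assigns to the positive answer for one survey row."""
    prompt = (
        "Information about a survey respondent:\n"
        + "\n".join(f"- {key}: {value}" for key, value in row.items())
        + "\nQuestion: Does this person earn more than $50,000 per year?\n"
        + "Answer (yes or no):"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(next_token_logits, dim=-1)
    yes_id = tokenizer(" yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer(" no", add_special_tokens=False).input_ids[0]
    p_yes, p_no = probs[yes_id].item(), probs[no_id].item()
    return p_yes / (p_yes + p_no)  # renormalize over the two answer tokens

score = risk_score({"age": 42, "occupation": "software developer", "work hours per week": 40})
```

Calibration and accuracy can then be assessed by comparing such scores against the ground-truth outcomes, which is what the package's evaluation layer automates.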
