Add a beam search implementation and a .generate() method to the model (#83)
epwalsh authored Apr 6, 2023
1 parent 9911b78 commit 5607566
Showing 5 changed files with 2,019 additions and 3 deletions.
24 changes: 23 additions & 1 deletion README.md
@@ -60,11 +60,33 @@ This may require a reservation on the Infiniband cluster.

See the [Beaker documentation](https://beaker-docs.apps.allenai.org/distributed-training.html) for more information on distributed training.

## Generating text

You can use the `generate()` method to produce text using beam search with a variety of options.

For example:

```python
import torch

# Assumes `model` and `tokenizer` have already been loaded.

# Prepare inputs.
# Note: we don't want the EOS token added to the end of the input, hence
# the `add_special_tokens=False`.
input_ids = tokenizer.encode("I'm a large language model, ", add_special_tokens=False)
# `model.generate()` expects a batch.
input_tensor = torch.tensor(input_ids).unsqueeze(0)

# Run beam search.
outputs = model.generate(input_tensor, max_steps=3, beam_size=3)

# The output token IDs have shape (batch_size, beam_size, max_steps).
best_generation = outputs.token_ids[0][0].tolist()
print(tokenizer.decode(best_generation))
```
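
To make the `beam_size` and `max_steps` options more concrete, here is a minimal, dependency-free sketch of beam search over a toy vocabulary. It is an illustration only, not the implementation added in this commit: `beam_search`, `toy_step`, and the three-token probability table are all hypothetical names and values invented for the example, and it handles a single prompt rather than a batch.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a model: maps a token prefix to
# next-token log-probabilities.
StepFn = Callable[[Sequence[int]], Dict[int, float]]


def beam_search(
    prompt: Sequence[int], step_fn: StepFn, max_steps: int, beam_size: int
) -> List[Tuple[List[int], float]]:
    """Return up to `beam_size` continuations of `prompt`, best first."""
    # Each beam is (generated tokens so far, cumulative log-probability).
    beams: List[Tuple[List[int], float]] = [([], 0.0)]
    for _ in range(max_steps):
        candidates: List[Tuple[List[int], float]] = []
        for tokens, score in beams:
            log_probs = step_fn(list(prompt) + tokens)
            for token, log_prob in log_probs.items():
                candidates.append((tokens + [token], score + log_prob))
        # Keep only the `beam_size` highest-scoring candidates.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


def toy_step(prefix: Sequence[int]) -> Dict[int, float]:
    """Toy distribution over a 3-token vocabulary, conditioned on the last token."""
    last = prefix[-1] if prefix else 0
    probs = {
        0: [0.5, 0.4, 0.1],
        1: [0.1, 0.2, 0.7],
        2: [0.3, 0.3, 0.4],
    }[last % 3]
    return {token: math.log(p) for token, p in enumerate(probs)}


results = beam_search(prompt=[0], step_fn=toy_step, max_steps=3, beam_size=3)
best_tokens, best_score = results[0]
print(best_tokens, round(math.exp(best_score), 3))
```

With this toy table, always taking the most likely next token yields `[0, 0, 0]` with probability 0.125, while the beam keeps the initially less likely `1` alive and ends up with `[0, 1, 2]` at probability 0.14. The returned list plays the role of one batch entry's beams, each holding `max_steps` generated tokens.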

## Finding official runs

We keep all of our runs in WandB under [the "ai2-llm" entity](https://wandb.ai/ai2-llm).
We don't store model checkpoints in WandB. Those are in GCS under `gs://allennlp-olmo/<wandb_run_path>`.

### Highlighted models

* 300M parameters, ~70B tokens, a starter model that's not completely random: https://wandb.ai/ai2-llm/LLM-scripts/runs/ed5krfk9