diff --git a/docs/OtterHD.md b/docs/OtterHD.md
index c7740f5e..46a2b4bb 100644
--- a/docs/OtterHD.md
+++ b/docs/OtterHD.md
@@ -26,7 +26,7 @@
-[Technical Report](link) | [Demo](https://huggingface.co/spaces/Otter-AI/OtterHD-8B-demo) | [Benchmarks](https://huggingface.co/spaces/Otter-AI)
+[Technical Report](https://arxiv.org/abs/2311.04219) | [Demo](https://huggingface.co/spaces/Otter-AI/OtterHD-8B-demo) | [Benchmarks](https://huggingface.co/spaces/Otter-AI)
 We introduce OtterHD-8B, a multimodal model fine-tuned from [Fuyu-8B](https://huggingface.co/adept/fuyu-8b) to facilitate a more fine-grained interpretation of high-resolution visual input without requiring a vision encoder. OtterHD-8B also supports flexible input sizes at test time, ensuring adaptability to diverse inference budgets.