From 7f1159a1fe04cfb783dc31d4fbdef3bda0ce19e4 Mon Sep 17 00:00:00 2001
From: Fanyi Pu
Date: Wed, 19 Jun 2024 15:35:05 +0800
Subject: [PATCH] update preparation

---
 tools/make_image_hf_dataset.ipynb | 44 ++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/tools/make_image_hf_dataset.ipynb b/tools/make_image_hf_dataset.ipynb
index e0791f78..b8b9957b 100755
--- a/tools/make_image_hf_dataset.ipynb
+++ b/tools/make_image_hf_dataset.ipynb
@@ -6,13 +6,55 @@
    "source": [
     "# Make Image dataset on Hugging Face Datasets\n",
     "\n",
-    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/EvolvingLMMs-Lab/lmms-eval/blob/main/tools/make_image_hf_dataset.ipynb)\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/EvolvingLMMs-Lab/lmms-eval/blob/pufanyi/hf_dataset_docs/tools/make_image_hf_dataset.ipynb)\n",
     "\n",
     "This notebook will guide you to make correct format of Huggingface dataset, in proper parquet format and visualizable in Huggingface dataset hub.\n",
     "\n",
     "We will take the example of the dataset [`pufanyi/VQAv2_Example`](https://huggingface.co/datasets/lmms-lab/VQAv2) and convert it to the proper format."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Preparation\n",
+    "\n",
+    "We need to install the `datasets` library to create the dataset and `Pillow` to handle images."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "bat"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "!pip install datasets Pillow"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We also need to log in to Hugging Face to upload the dataset. Go to the [Hugging Face website](https://huggingface.co/settings/tokens) to get your API token."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "bat"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "!huggingface-cli login --token hf_YOUR_HF_TOKEN  # replace hf_YOUR_HF_TOKEN with your own Hugging Face token."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {