#### Training dataset (TrainDataset)
The training dataset is based on `saier/unarxive_citrec` [hf](https://huggingface.co/datasets/saier/unarxive_citrec).

*Details*:
```yaml
Train size: 9082
Valid size: 702
Test size: 568
```
All samples are between `128` and `512` characters long (TO-DO: characters -> tokens).\
More details in `notebooks/data/dataset_download.ipynb`.
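For reference, a minimal sketch of the download-and-filter step, assuming the `datasets` library and that the text column is named `text` (the exact code lives in `notebooks/data/dataset_download.ipynb`):

```python
# Minimal sketch of the dataset preparation; the column name "text" is an assumption.
from datasets import load_dataset

dataset = load_dataset("saier/unarxive_citrec")

# Keep only samples between 128 and 512 characters long.
dataset = dataset.filter(lambda s: 128 <= len(s["text"]) <= 512)

print({split: len(ds) for split, ds in dataset.items()})
```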

After collecting the dataset, we carefully translated the samples from English to Russian using the OpenAI API.\
Details in `notebooks/data/dataset_translate.ipynb`
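A hedged sketch of the translation step (the actual prompt and model choice are in the notebook; the model name below is an assumption):

```python
# Sketch of per-sample translation via the OpenAI API (openai>=1.0).
# Model name and prompt are assumptions, not the project's actual settings.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def translate_to_russian(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Translate the user's text from English to Russian."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```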

#### Dataset for model comparison (EvalDataset)
This dataset is based on `turkic_xwmt`, `subset=ru-en`, `split=test` [hf](https://huggingface.co/datasets/turkic_xwmt).
Dataset size: 1000
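Loading it with `datasets` might look like this (a sketch; the actual code may differ):

```python
from datasets import load_dataset

# subset=ru-en, split=test, as described above
eval_dataset = load_dataset("turkic_xwmt", "ru-en", split="test")
print(len(eval_dataset))  # 1000
```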

Model comparison is based on the BLEU score between each model's translations and reference translations produced with OpenAI.

**Models**:\
transformer-en-ru: `Helsinki-NLP/opus-mt-en-ru` [hf](https://huggingface.co/Helsinki-NLP/opus-mt-en-ru)\
nllb-1.3B-distilled: `facebook/nllb-200-distilled-1.3B` [hf](https://huggingface.co/facebook/nllb-200-distilled-1.3B)


**Results**:
```yaml
transformer-en-ru BLEU: 2.58
nllb-1.3B-distilled BLEU: 2.55
```

Even though the difference is not statistically significant, the transformer-en-ru model was chosen since it is faster and smaller.\
Details in `src/finetune/eval_bleu.py`.
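A condensed sketch of what the evaluation script does, assuming `sacrebleu` and the `transformers` pipeline API (the data wiring and example sentences are illustrative):

```python
# Compare a candidate model against OpenAI reference translations with BLEU.
from sacrebleu import corpus_bleu
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ru")

sources = ["Neural networks approximate functions."]       # English inputs
references = [["Нейронные сети аппроксимируют функции."]]  # one reference stream (OpenAI translations)

hypotheses = [out["translation_text"] for out in translator(sources)]
print(corpus_bleu(hypotheses, references).score)
```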

## Model finetuning

Simple seq2seq finetuning of the transformer-en-ru model.\
Details in `notebooks/finetune/finetune.ipynb`.\
Model on [hf](https://huggingface.co/under-tree/transformer-en-ru).
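A condensed sketch of the finetuning setup, assuming the standard `Seq2SeqTrainer` recipe; hyperparameters and column names are assumptions, not the notebook's actual values:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "Helsinki-NLP/opus-mt-en-ru"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

def preprocess(batch):
    # Column names "en"/"ru" are assumptions about the translated dataset.
    inputs = tokenizer(batch["en"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["ru"], max_length=512, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

args = Seq2SeqTrainingArguments(
    output_dir="transformer-en-ru",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    predict_with_generate=True,
)

# Wiring (requires the tokenized dataset from the steps above):
# tokenized = dataset.map(preprocess, batched=True)
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=args,
#     train_dataset=tokenized["train"],
#     eval_dataset=tokenized["validation"],
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()
```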

**Fine-tuned model results**:
```yaml
eval_loss: 0.656
eval_bleu: 67.197
```
(BLEU is suspiciously high)
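The published checkpoint can be tried directly from the Hub, e.g.:

```python
from transformers import pipeline

translator = pipeline("translation", model="under-tree/transformer-en-ru")
print(translator("Attention is all you need.")[0]["translation_text"])
```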


