From 989ca859f37e73401104aa63526b5fbd0730c90f Mon Sep 17 00:00:00 2001
From: sileod
Date: Fri, 30 Jun 2023 11:41:05 +0200
Subject: [PATCH 1/6] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 365985d..a252f0a 100755
--- a/README.md
+++ b/README.md
@@ -44,7 +44,7 @@ You can also leverage [tasksource](https://github.com/sileod/tasksource/) with t
 ```py
 rte = tn.AutoTask("glue/rte")
 ```
-AutoTask guesses a tempalte based on the dataset structure.
+AutoTask guesses a template based on the dataset structure.
 It also accepts a dataset as input, if it fits the template (e.g. after tasksource custom preprocessing).
 ## Sampling
 ```py
 tn.Classification(dataset,nrow=5000,nrows_eval=500 oversampling=2)

From 45de34a5ff5a6e7c47462f20a0309edf4e81ae26 Mon Sep 17 00:00:00 2001
From: sileod
Date: Fri, 30 Jun 2023 11:46:35 +0200
Subject: [PATCH 2/6] Update README.md

---
 README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index a252f0a..b5a6b4a 100755
--- a/README.md
+++ b/README.md
@@ -39,17 +39,18 @@ Tasknet is multitask by design. `model.task_models_list` contains one model per
 ## Installation
 `pip install tasknet`
 
+## Balancing dataset sizes
+```py
+tn.Classification(dataset,nrow=5000,nrows_eval=500 oversampling=2)
+```
+You can balance multiple datasets with `nrows` and `oversampling`. `nrows` is the maximal number of examples. If a dataset has less than `nrows`, it will be oversampled at most `oversampling` times.
+
 ## AutoTask
 You can also leverage [tasksource](https://github.com/sileod/tasksource/) with tn.AutoTask and have one-line access to 600+ datasets, see [implemented tasks](https://github.com/sileod/tasksource/blob/main/README.md).
 ```py
-rte = tn.AutoTask("glue/rte")
+rte = tn.AutoTask("glue/rte",nrow=5000)
 ```
 AutoTask guesses a template based on the dataset structure.
 It also accepts a dataset as input, if it fits the template (e.g. after tasksource custom preprocessing).
-## Sampling
-```py
-tn.Classification(dataset,nrow=5000,nrows_eval=500 oversampling=2)
-```
-You can balance multiple datasets with `nrows` and `oversampling`. `nrows` is the maximal number of examples. If a dataset has less than `nrows`, it will be oversampled at most `oversampling` times.

From 93ece10867054a29203e236e0be4fb838ef7b2b4 Mon Sep 17 00:00:00 2001
From: sileod
Date: Fri, 30 Jun 2023 11:47:10 +0200
Subject: [PATCH 3/6] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index b5a6b4a..1b18811 100755
--- a/README.md
+++ b/README.md
@@ -41,14 +41,14 @@ Tasknet is multitask by design. `model.task_models_list` contains one model per
 ## Balancing dataset sizes
 ```py
-tn.Classification(dataset,nrow=5000,nrows_eval=500 oversampling=2)
+tn.Classification(dataset, nrows=5000, nrows_eval=500, oversampling=2)
 ```
 You can balance multiple datasets with `nrows` and `oversampling`. `nrows` is the maximal number of examples. If a dataset has less than `nrows`, it will be oversampled at most `oversampling` times.
 
 ## AutoTask
 You can also leverage [tasksource](https://github.com/sileod/tasksource/) with tn.AutoTask and have one-line access to 600+ datasets, see [implemented tasks](https://github.com/sileod/tasksource/blob/main/README.md).
 ```py
-rte = tn.AutoTask("glue/rte",nrow=5000)
+rte = tn.AutoTask("glue/rte", nrows=5000)
 ```
 AutoTask guesses a template based on the dataset structure.
 It also accepts a dataset as input, if it fits the template (e.g. after tasksource custom preprocessing).

From e6a5f32d92fc5875d52935bc5e3eda79370c1abf Mon Sep 17 00:00:00 2001
From: sileod
Date: Fri, 30 Jun 2023 12:02:29 +0200
Subject: [PATCH 4/6] Delete .gitattributes

---
 .gitattributes | 1 -
 1 file changed, 1 deletion(-)
 delete mode 100644 .gitattributes

diff --git a/.gitattributes b/.gitattributes
deleted file mode 100644
index b457ff5..0000000
--- a/.gitattributes
+++ /dev/null
@@ -1 +0,0 @@
-*.py linguist-vendored

From 79afb6a4c5d00ddc02eb3f050f0175c168fea02d Mon Sep 17 00:00:00 2001
From: sileod
Date: Fri, 30 Jun 2023 12:14:10 +0200
Subject: [PATCH 5/6] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1b18811..6fe2f19 100755
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ import tasknet as tn; from datasets import load_dataset
 
 rte = tn.Classification(
     dataset=load_dataset("glue", "rte"),
-    s1="sentence1", s2="sentence2", y="label") #s2 is optional
+    s1="sentence1", s2="sentence2", y="label") #s2 is optional # See AutoTask for shorter code
 
 class hparams:
   model_name='microsoft/deberta-v3-base' # deberta models have the best results (and tasknet support)

From 859260b311e81fecf665d5e742fed12f9132897e Mon Sep 17 00:00:00 2001
From: sileod
Date: Fri, 30 Jun 2023 13:51:01 +0200
Subject: [PATCH 6/6] Update README.md

---
 README.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 6fe2f19..2cba434 100755
--- a/README.md
+++ b/README.md
@@ -12,7 +12,9 @@ The task templates follow the same interface. They implement `preprocess_function`, a
 data collator and `compute_metrics`.
 Look at [tasks.py](https://github.com/sileod/tasknet/blob/main/src/tasknet/tasks.py) and use existing templates as a starting point to implement a custom task template.
 
-## Task instances and example
+## Installation and example
+
+`pip install tasknet`
 
 Each task template has fields that should be matched with specific dataset columns. Classification has two text fields `s1`,`s2`, and a label `y`. Pass a dataset to a template, and fill in the mapping between the template fields and the dataset columns to instantiate a task.
 ```py
@@ -36,9 +38,6 @@ p([{'text':x.premise,'text_pair': x.hypothesis}]) # HuggingFace pipeline for inf
 ```
 Tasknet is multitask by design. `model.task_models_list` contains one model per task, with a shared encoder.
 
-## Installation
-`pip install tasknet`
-
 ## Balancing dataset sizes
 ```py
 tn.Classification(dataset, nrows=5000, nrows_eval=500, oversampling=2)
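
The sampling rule introduced in PATCH 2/6 (and whose signature is fixed in PATCH 3/6) reduces to simple arithmetic on the dataset size. The helper below is a hypothetical illustration of that rule as described in the "Balancing dataset sizes" prose; `effective_size` is not part of the tasknet API.

```py
# Hypothetical helper illustrating the `nrows`/`oversampling` rule from the
# "Balancing dataset sizes" section; `effective_size` is NOT a tasknet function.
def effective_size(n_examples: int, nrows: int, oversampling: int) -> int:
    # A dataset contributes at most `nrows` examples, and a dataset smaller
    # than `nrows` is repeated at most `oversampling` times.
    return min(nrows, n_examples * oversampling)

assert effective_size(10_000, nrows=5000, oversampling=2) == 5000  # capped at nrows
assert effective_size(1_000, nrows=5000, oversampling=2) == 2000   # oversampled 2x
assert effective_size(3_000, nrows=5000, oversampling=2) == 5000   # oversampling hits the cap
```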
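Taken together, the README state after PATCH 6/6 describes roughly the following workflow. This is a minimal sketch assuming only the API fragments visible in the diffs above: `tn.Model`, `tn.Trainer`, `trainer.pipeline()`, and the availability of "glue/mrpc" in tasksource do not appear in these patches and are assumptions about the surrounding README code.

```py
# Minimal multitask sketch of the workflow the final README describes.
# Assumes the tasknet API fragments visible in the diffs above; tn.Model,
# tn.Trainer, and trainer.pipeline() are not shown in these patches and
# are assumptions, as is the "glue/mrpc" tasksource id.
import tasknet as tn
from datasets import load_dataset

# Explicit column mapping, as in the PATCH 5/6 context lines
rte = tn.Classification(
    dataset=load_dataset("glue", "rte"),
    s1="sentence1", s2="sentence2", y="label")  # s2 is optional

# Or tasksource one-liners with the corrected `nrows` kwarg (PATCH 3/6);
# tasknet is multitask by design, so several tasks can share one encoder
tasks = [rte, tn.AutoTask("glue/mrpc", nrows=5000)]

class hparams:
    model_name = 'microsoft/deberta-v3-base'

model = tn.Model(tasks, hparams)             # assumption: not shown in the diffs
trainer = tn.Trainer(model, tasks, hparams)  # assumption: not shown in the diffs
trainer.train()

# One model per task, with a shared encoder (see the PATCH 6/6 hunk context)
assert len(model.task_models_list) == len(tasks)

p = trainer.pipeline()  # HuggingFace pipeline for inference
p([{'text': 'A man is sleeping.', 'text_pair': 'Someone is asleep.'}])
```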