@@ -22,7 +22,7 @@ See also [examples](./examples).
BERT model, sequence classification task:
1. Load the pretrained BERT model with `base_model = AutoModelForSequenceClassification.from_pretrained(name_or_path)`
- 2. Initialize the SWAG model with `swag_model = SwagBertForSequenceClassification.from_base(base_model)`
+ 2. Initialize the SWAG model with `swag_model = SwagBertForSequenceClassification.from_base(base_model, no_cov_mat=False)`
3. Initialize the SWAG callback object with `swag_callback = SwagUpdateCallback(swag_model)`
4. Initialize `transformers.Trainer` with `base_model` as the model and `swag_callback` among the callbacks.
5. Train the model (`trainer.train()`)
@@ -35,6 +35,24 @@ For collecting the SWAG parameters, two possible schedules are supported:
* After the end of each training epoch (the default; `collect_steps = 0` for `SwagUpdateCallback`)
* After every N training steps (set `collect_steps > 0` for `SwagUpdateCallback`)
+ ### SWA versus SWAG
+
+ The library supports both SWA (stochastic weight averaging without
+ covariance estimation) and SWAG (stochastic weight averaging with
+ Gaussian covariance estimation). The method is selected by the
+ `no_cov_mat` attribute when initializing the model
+ (e.g. `SwagModel.from_base(model, no_cov_mat=True)`). The default
+ value `True` corresponds to SWA; you need to explicitly set
+ `no_cov_mat=False` to activate SWAG.
+
+ With SWAG, the `max_num_models` option controls the maximum rank of
+ the covariance matrix. The rank increases by one with each parameter
+ collection step until the maximum is reached. The current rank is
+ stored in `model.swag.cov_mat_rank` and automatically copied to
+ `model.config.cov_mat_rank` when using `SwagUpdateCallback`. If you
+ call `model.swag.collect_model()` manually, you should also update the
+ configuration accordingly before saving the model.
+
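The rank bookkeeping described above can be illustrated with a dependency-free toy sketch: each collection step raises the rank by one until `max_num_models` caps it. The function name and the default cap of 20 are illustrative, not part of the library's API.

```python
# Toy illustration (no library dependencies) of how the covariance
# rank grows during SWAG parameter collection: it increases by one per
# collection step and is capped at max_num_models.
def simulate_cov_mat_rank(num_collect_steps, max_num_models=20):
    rank = 0
    for _ in range(num_collect_steps):
        rank = min(rank + 1, max_num_models)
    return rank

print(simulate_cov_mat_rank(5))   # 5: fewer collection steps than the cap
print(simulate_cov_mat_rank(50))  # 20: capped at max_num_models
```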
### Sampling model parameters
After `swag_model` is trained or fine-tuned as described above,