MASS objective + transform #16

TimotheeMickus · 2023-09-21T08:27:24Z

closes #15

Essentially, this adds a labels key to batches to distinguish between decoder-side inputs (tgt and target) and decoder-side labels (currently only used for MASS, but potentially helpful for other decoder-side denoising schemes).
Labels are initialized as a copy of the target unless a transform specifies otherwise.
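
A minimal sketch of the default described above. The helper name `make_batch` and the use of a plain dict with lists are assumptions for illustration; this is not the project's batch class.

```python
def make_batch(src, tgt, labels=None):
    """Hypothetical helper: build a batch, defaulting labels to a copy of tgt.

    tgt is the decoder-side input; labels is what the loss compares against.
    """
    if labels is None:
        # No transform provided labels, so fall back to a copy of tgt.
        labels = list(tgt)
    return {"src": src, "tgt": tgt, "labels": labels}


plain = make_batch(src=["x", "y"], tgt=["a", "b"])
custom = make_batch(src=["x", "y"], tgt=["a", "b"], labels=["a", "<blank>"])
```

In `plain`, the labels equal tgt but are a separate list; in `custom`, the labels supplied by the (hypothetical) transform are kept as given.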

With this change, the MASS noising transform reduces to a slightly more complex formatting step within the BART noising transform.
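
To illustrate why MASS needs labels that differ from the decoder input, here is a simplified sketch on token lists. The token strings, the function name `mass_example`, and the omission of BOS/EOS handling and shifting are all assumptions for illustration; this is not the code in this PR.

```python
MASK = "<mask>"   # assumed mask token string
PAD = "<blank>"   # assumed padding token string, ignored by the loss


def mass_example(tokens, start, end):
    """Build (src, tgt, labels) for one masked span tokens[start:end].

    src:    the span is masked on the encoder side.
    tgt:    the complement, i.e. everything outside the span is masked.
    labels: only the span contributes to the loss; the rest is padding.
    """
    in_span = [start <= i < end for i in range(len(tokens))]
    src = [MASK if masked else tok for tok, masked in zip(tokens, in_span)]
    tgt = [tok if masked else MASK for tok, masked in zip(tokens, in_span)]
    labels = [tok if masked else PAD for tok, masked in zip(tokens, in_span)]
    return src, tgt, labels


src, tgt, labels = mass_example(["a", "b", "c", "d", "e"], start=1, end=3)
```

Here tgt and labels agree only inside the span, which is why a single tgt field cannot serve both as decoder input and as loss reference.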

TimotheeMickus · 2023-09-22T14:25:34Z

Smoke-tested on UNPC. Requires a thorough review.

Waino

It looks like this should work. Some changes would improve clarity.

Waino · 2023-09-25T06:28:54Z

onmt/tests/test_transform.py

@@ -22,22 +22,22 @@ def test_transform_register(self):
            "sentencepiece",
            "bpe",
            "onmt_tokenize",
-            "bart",
+            "ae_noise",


The transform is named ae_noise but the file is named denoising. Would it be better to use only one of the names in both locations?

Waino · 2023-09-25T06:41:45Z

onmt/trainer.py

@@ -461,7 +461,7 @@ def _gradient_accumulation_over_lang_pairs(
            seen_comm_batches.add(comm_batch)
            if self.norm_method == "tokens":
                num_tokens = (
-                    batch.tgt[1:, :, 0].ne(self.train_loss_md[f'trainloss{metadata.tgt_lang}'].padding_idx).sum()
+                    batch.labels[1:, :, 0].ne(self.train_loss_md[f'trainloss{metadata.tgt_lang}'].padding_idx).sum()
                )
                normalization += num_tokens.item()
            else:


Later on line 484 tgt is used: tgt_outer = batch.tgt.

If I read this correctly, the idea is that tgt_outer is used only as the input sequence to the decoder. The whole batch is then given to the loss function self.train_loss_md which will take care of using the label sequence instead.

This is not easy to figure out, because the loss function is wrapped in several layers of onmt scaffolding. A comment would be nice.

TimotheeMickus · 2023-09-25

Yes.

Full disclosure: I made minimal edits. I don't know why the code doesn't use batch.tgt directly on line 490. I try not to fix what isn't broken.
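
The normalization in the diff above counts non-padding positions in the labels after dropping the first time step. A plain-Python sketch of that count, with nested lists standing in for the tensor; the function name and the example ids are assumptions for illustration:

```python
def count_label_tokens(labels, padding_idx):
    """Count non-padding label ids, skipping the first time step.

    `labels` is a list of time steps, each a list of token ids (one per
    batch element): a stand-in for batch.labels[:, :, 0].
    """
    return sum(
        1
        for step in labels[1:]          # mirrors the [1:] slice in the diff
        for token_id in step
        if token_id != padding_idx      # mirrors .ne(padding_idx)
    )


PAD_IDX = 1  # assumed padding id
labels = [[2, 2], [5, 6], [7, PAD_IDX], [PAD_IDX, PAD_IDX]]
num_tokens = count_label_tokens(labels, PAD_IDX)
```

If a transform pads out label positions that should not contribute to the loss, counting on labels rather than tgt keeps the normalizer in line with what the loss actually sees.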

Waino · 2023-09-25T06:50:02Z

onmt/utils/loss.py

@@ -328,6 +330,7 @@ def _make_shard_state(self, batch, output, range_, attns=None):
        shard_state = {
            "output": output,
            "target": batch.tgt[range_start:range_end, :, 0],
+            "labels": batch.labels[range_start:range_end, :, 0],


It may not actually be necessary to have separate "target" and "labels" here. As this is the loss, it is only concerned with comparing the output to the desired labels. The decoder input ("target") shouldn't be needed, and as far as I can tell is not used.

This way is clearer, though. Also, slicing a tensor and then not touching the slice should not incur much of a cost, as it will be just a view, so removing the "target" might not make any difference.

TimotheeMickus · 2023-09-25

I'm strongly in favor of the currently implemented approach here (I'd rather have anything that exits the batchers carry roughly the same attributes).

Waino · 2023-09-25

Yes, consistent naming and structure are desirable, and that is a good enough motivation for keeping target here.

On the topic of consistent naming: I noticed that in some places (at least in the _stats method) the name target is used, but it now actually refers to labels. _stats takes gtruth, which was changed in this PR.

Waino · 2023-09-25T07:01:12Z

onmt/transforms/denoising.py

    def __init__(self, opts):
        super().__init__(opts)
+        self.denoising_objective = opts.denoising_objective


We should validate that mask-random is not used with the MASS objective.

In BART, it is possible to occasionally substitute a random token instead of the mask token. This slightly alleviates the tendency to copy all unmasked tokens verbatim.

In MASS, substituting a random token on the source side would also affect the target sequence: the random token would be complemented into a mask token, and would not contribute to the loss. This does not make sense.

TimotheeMickus · 2023-09-25

So random_ratio > 0.
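
A minimal sketch of the validation requested in this thread. The function name and the option value "mass" are assumptions; only the names denoising_objective and random_ratio come from the discussion above.

```python
def validate_denoising_opts(denoising_objective, random_ratio):
    """Reject random-token substitution when the objective is MASS.

    Hypothetical check: with MASS, a randomly substituted source token
    would be complemented into a mask on the target side and would not
    contribute to the loss, so the combination is refused.
    """
    if denoising_objective == "mass" and random_ratio > 0:
        raise ValueError(
            "random_ratio > 0 is not supported with the MASS objective"
        )


validate_denoising_opts("bart", 0.1)   # allowed: BART may substitute random tokens
validate_denoising_opts("mass", 0.0)   # allowed: no random substitution
```

Calling it with "mass" and a positive random_ratio raises ValueError.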

Waino

LGTM

mass impl start

21d21e1

TimotheeMickus requested a review from Waino September 21, 2023 08:27

Mickus Timothee added 3 commits September 21, 2023 11:36

merging mass and bart

9f3aab8

more generic transform name

e801ea4

naming convention

4a93e0e

TimotheeMickus marked this pull request as ready for review September 22, 2023 14:25

Waino requested changes Sep 25, 2023

Mickus Timothee added 2 commits September 25, 2023 12:11

mass, commented

904a706

lint

d13601e

TimotheeMickus requested a review from Waino September 25, 2023 09:25

fixing names

43d2460

Waino approved these changes Sep 25, 2023

TimotheeMickus merged commit 470d466 into main Sep 25, 2023
4 checks passed

TimotheeMickus deleted the feats/mass branch September 25, 2023 09:59
