
Opts need a major cleanup #60

Closed
TimotheeMickus opened this issue Mar 8, 2024 · 0 comments · Fixed by #61
Labels: bug (Something isn't working), documentation (Improvements or additions to documentation), enhancement (New feature or request)

TimotheeMickus commented Mar 8, 2024

Going through the existing catalogue of options listed in our docs, a number of them seem to not be plugged in.

The list below is most likely not exhaustive.
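As a rough triage aid, something like the sketch below can list flags that are declared in opts.py but never read anywhere else. This is not tied to the actual repo layout: the `--flag` declaration style and the `opt.`/`opts.` attribute-access convention are assumptions, so treat it as a starting point, not a definitive audit.

```python
import re

FLAG_RE = re.compile(r'--(\w+)')        # flags declared as "--name" in opts.py
USE_RE = re.compile(r'\bopts?\.(\w+)')  # usages like opt.name / opts.name

def defined_flags(opts_source: str) -> set:
    """Every --flag string that appears in the opts definitions."""
    return set(FLAG_RE.findall(opts_source))

def used_flags(code_source: str) -> set:
    """Every attribute read off an opt/opts namespace elsewhere."""
    return set(USE_RE.findall(code_source))

def only_in_opts(opts_source: str, code_source: str) -> set:
    """Flags that are declared but never read: candidates for removal."""
    return defined_flags(opts_source) - used_flags(code_source)

# Toy demonstration with made-up sources:
opts_src = "group.add('--n_node', ...); group.add('--enc_layers', ...)"
code_src = "model = build(opt.enc_layers)"
print(sorted(only_in_opts(opts_src, code_src)))  # → ['n_node']
```

In practice one would feed it the concatenated sources of opts.py and of the rest of the package; false positives (flags read via `getattr` or `vars()`) would still need a manual pass.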

Data/Tasks

  • skip_empty_level: would be a useful feature, but it is currently not actually supported. I would suggest replacing it with a transform if we ever end up needing it.
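If we do reintroduce it, a minimal transform sketch could look like the following. The dict-of-token-lists example format and the return-None-to-drop convention are assumptions for illustration, not the actual transform API.

```python
class FilterEmptyTransform:
    """Hypothetical replacement for skip_empty_level: drop examples
    whose source or target side is empty.

    Assumes examples are dicts of token lists and that returning
    None filters the example out of the pipeline.
    """

    def apply(self, example, is_train=False, **kwargs):
        if not example.get("src") or not example.get("tgt"):
            return None
        return example
```

For instance, `FilterEmptyTransform().apply({"src": [], "tgt": ["ok"]})` would drop the example by returning None.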

Vocab

  • share_vocab is likely not functional: it is only referenced in tests, in build_vocab.py (which we no longer use), and in one transform
  • vocab_paths, vocab_size_multiple, and {src,tgt}_min_word_frequency are only listed in opts.py

Pruning

The flags here are not plugged into anything; they seem to partly duplicate the (first) Transform/Filter group.

Embeddings

All flags here have been nuked, and there is a FIXME in modules/embeddings.py. I'd suggest commenting them out with a FIXME and opening a dedicated issue, if they are still relevant.

Transform/Filter

We have multiple opts groups with the same name, which isn't great. They're likely functional, but they should be renamed after the transforms they actually correspond to.
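For reference, argparse itself accepts duplicate group titles without complaint, which is exactly why the generated docs end up ambiguous. A minimal demonstration (the flag names here are illustrative, not the actual opts):

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")

# argparse raises no error for two groups with the same title...
g1 = parser.add_argument_group("Transform/Filter")
g1.add_argument("--src_seq_length", type=int, default=200)
g2 = parser.add_argument_group("Transform/Filter")
g2.add_argument("--word_ratio_threshold", type=float, default=3.0)

# ...so the help output contains two indistinguishable sections.
help_text = parser.format_help()
print(help_text.count("Transform/Filter"))  # → 2
```

Renaming each group after its transform (e.g. one group per transform class) would make both `--help` and the rendered docs unambiguous.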

Model- Embeddings

I am not sure why this is a separate group from Embedding.

  • given that we only support transformers, position_encoding should probably be True by default
  • share_embeddings doesn't seem to be used
  • update_vocab doesn't seem to be used

Model-Embedding Features

We don't support features. This should go.

Model- Task

We don't support CLM / no-decoder models. This should go.

Model- Encoder-Decoder

  • model_type only offers the text option and seems to duplicate the General group's data_type. Do we want to keep it in place given Supporting other data types (e.g. video) #53? Do we want to remove it for now?
  • encoder_type: do we actually still support mean encoders?
  • decoder_type only has one option. Might be relevant for External dependencies for layer architectures #56
  • layers is deprecated. Can we remove it?
  • {enc,dec}_layers: should mention stacks or modules in their help text
  • bridge, n_node, n_steps: misleading and almost certainly only in opts
  • bridge_extra_node, bidir_edges, state_dim, n_edge_types, src_ggnn_size: only in opts

Model- Attention

  • global_attention, global_attention_function: only in opts
  • max_relative_positions and aan_useffn seem like they are used, but I have doubts; at least I'm not familiar with those.

Model - Alignement

There's a typo in the opts group name (took me a while: it's the expected French orthography). Otherwise it seems supported; it would be worth a check later on, as I'm not familiar with these.

Generator

I recall copy_attn being half functional in onmt; it would need a deeper check.

General

  • save_all_gpus: do we want to support that?
  • data_type: cf. above wrt model_type in Encoder-Decoder
  • gpuid is deprecated and should be removed
  • gpu_backend has only one option. Do we want to support anything other than NCCL?

Initialization

  • pre_word_vecs_{enc,dec}: only in opts, same as the Embeddings flags above

Optimization- Type

This group contains flags for batching (batch size, batch type, etc.) and optimizer hyperparameters, as well as miscellanea like early stopping. Not sure why it's separate from the Optimization- Rate group.

  • truncated_decoder is apparently not compatible with gpu_ranks / accum, so it should probably be removed

Logging

  • report_stats_from_parameters= has a trailing equal sign

Dynamic data

Should be merged with batching options

@TimotheeMickus TimotheeMickus added bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request labels Mar 8, 2024
@TimotheeMickus TimotheeMickus self-assigned this Mar 8, 2024