
Opts need a major cleanup #60

Closed
TimotheeMickus opened this issue Mar 8, 2024 · 0 comments · Fixed by #61
Labels: bug (Something isn't working), documentation (Improvements or additions to documentation), enhancement (New feature or request)

TimotheeMickus commented Mar 8, 2024

Going through the existing catalogue of options listed in our docs, a number of them seem to not be plugged in.

The list below is most likely not exhaustive.
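As a rough triage aid, something like the sketch below can list flags that are declared in opts.py but never read anywhere else. This is not tied to the actual repo layout: the `--flag` declaration style and the `opt.`/`opts.` attribute-access convention are assumptions, so treat it as a starting point, not a definitive audit.

```python
import re

FLAG_RE = re.compile(r'--(\w+)')        # flags declared as "--name" in opts.py
USE_RE = re.compile(r'\bopts?\.(\w+)')  # usages like opt.name / opts.name

def defined_flags(opts_source: str) -> set:
    """Every --flag string that appears in the opts definitions."""
    return set(FLAG_RE.findall(opts_source))

def used_flags(code_source: str) -> set:
    """Every attribute read off an opt/opts namespace elsewhere."""
    return set(USE_RE.findall(code_source))

def only_in_opts(opts_source: str, code_source: str) -> set:
    """Flags that are declared but never read: candidates for removal."""
    return defined_flags(opts_source) - used_flags(code_source)

# Toy demonstration with made-up sources:
opts_src = "group.add('--n_node', ...); group.add('--enc_layers', ...)"
code_src = "model = build(opt.enc_layers)"
print(sorted(only_in_opts(opts_src, code_src)))  # → ['n_node']
```

In practice one would feed it the concatenated sources of opts.py and of the rest of the package; false positives (flags read via `getattr` or `vars()`) would still need a manual pass.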

Data/Tasks

  • skip_empty_level: would be a useful feature, but it is currently not actually supported. I would suggest replacing it with a transform if we ever end up needing it.
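If we do reintroduce it, a minimal transform sketch could look like the following. The dict-of-token-lists example format and the return-None-to-drop convention are assumptions for illustration, not the actual transform API.

```python
class FilterEmptyTransform:
    """Hypothetical replacement for skip_empty_level: drop examples
    whose source or target side is empty.

    Assumes examples are dicts of token lists and that returning
    None filters the example out of the pipeline.
    """

    def apply(self, example, is_train=False, **kwargs):
        if not example.get("src") or not example.get("tgt"):
            return None
        return example
```

For instance, `FilterEmptyTransform().apply({"src": [], "tgt": ["ok"]})` would drop the example by returning None.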

Vocab

  • share_vocab is likely not functional: it is only referenced in tests, in build_vocab.py (which we no longer use), and in one transform
  • vocab_paths, vocab_size_multiple, and {src,tgt}_min_word_frequency are only listed in opts.py

Pruning

The flags here are not plugged into anything; they seem to partly duplicate the (first) Transform/Filter group.

Embeddings

All flags here have been nuked, and there is a FIXME in modules/embeddings.py. I'd suggest commenting them out with a FIXME and opening a dedicated issue, if they are still relevant.

Transform/Filter

We have multiple opts groups with the same name, which isn't great. They're likely functional, but they should be renamed after the transforms they actually correspond to.
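For reference, argparse itself accepts duplicate group titles without complaint, which is exactly why the generated docs end up ambiguous. A minimal demonstration (the flag names here are illustrative, not the actual opts):

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")

# argparse raises no error for two groups with the same title...
g1 = parser.add_argument_group("Transform/Filter")
g1.add_argument("--src_seq_length", type=int, default=200)
g2 = parser.add_argument_group("Transform/Filter")
g2.add_argument("--word_ratio_threshold", type=float, default=3.0)

# ...so the help output contains two indistinguishable sections.
help_text = parser.format_help()
print(help_text.count("Transform/Filter"))  # → 2
```

Renaming each group after its transform (e.g. one group per transform class) would make both `--help` and the rendered docs unambiguous.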

Model- Embeddings

I am not sure why this is a separate group from Embedding.

  • given that we only support transformers, position_encoding should probably be True by default
  • share_embeddings doesn't seem to be used
  • update_vocab doesn't seem to be used

Model-Embedding Features

We don't support features. This should go.

Model- Task

We don't support CLM / no-decoder models. This should go.

Model- Encoder-Decoder

  • model_type only offers the text option and seems to duplicate the General group's data_type. Do we want to keep it in place given Supporting other data types (e.g. video) #53? Do we want to remove it for now?
  • encoder_type: do we actually still support mean encoders?
  • decoder_type only has one option. Might be relevant for External dependencies for layer architectures #56
  • layers is deprecated. Can we remove it?
  • {enc,dec}_layers: should mention stacks or modules in their help text
  • bridge, n_node, n_steps: misleading and almost certainly only in opts
  • bridge_extra_node, bidir_edges, state_dim, n_edge_types, src_ggnn_size: only in opts

Model- Attention

  • global_attention, global_attention_function: only in opts
  • max_relative_positions and aan_useffn seem like they are used, but I have doubts; at least I'm not familiar with those.

Model - Alignement

There's a typo in the opts group name (took me a while: it's the expected French orthography). Otherwise it seems supported; it would be worth a check later on, as I'm not familiar with these.

Generator

I recall copy_attn being half functional in onmt; it would need a deeper check.

General

  • save_all_gpus: do we want to support that?
  • data_type: cf. above wrt model_type in Encoder-Decoder
  • gpuid is deprecated and should be removed
  • gpu_backend has only one option. Do we want to support anything other than NCCL?

Initialization

  • pre_word_vecs_{enc,dec}: only in opts, same as the Embeddings flags above

Optimization- Type

This group contains flags for batching (batch size, batch type, etc.) and optimizer hyperparameters, as well as miscellanea like early stopping. Not sure why it's separate from the Optimization- Rate group.

  • truncated_decoder is apparently not compatible with gpu_ranks / accum, so it should probably be removed

Logging

  • report_stats_from_parameters= has a trailing equal sign

Dynamic data

Should be merged with batching options

@TimotheeMickus TimotheeMickus added bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request labels Mar 8, 2024
@TimotheeMickus TimotheeMickus self-assigned this Mar 8, 2024