# Inference & Parameters
Welcome to the Inference Parameters wiki page! This page gives an overview of the parameters used to control the behavior of a Hugging Face Transformers model during inference. These parameters can have a significant impact both on the quality of the generated summary and on the computational cost of the inference process.
By understanding how these parameters work and how to adjust them, you will be able to tune the model's inference for your specific use case and achieve the best possible results.
- `min_length`: The minimum length (in tokens) of the generated summary. Higher values guarantee summaries of at least that length, but may also force longer, less concise summaries.
- `max_length`: The maximum length (in tokens) of the generated summary. Lower values keep summaries concise, but risk cutting them off before all the important information is covered.
- `no_repeat_ngram_size`: The size of n-grams (consecutive token sequences) that may occur only once in the generated summary. For example, a value of 3 means no sequence of three tokens is ever repeated. Smaller values suppress repetition more aggressively, but may also make the summary less fluent.
- `encoder_no_repeat_ngram_size`: The size of n-grams from the input text that cannot be reproduced verbatim in the generated summary. Smaller values push the model toward more abstractive, diverse phrasing, at some cost to fluency.
- `repetition_penalty`: A multiplicative penalty applied to the scores of tokens that have already appeared in the output. Values above 1.0 discourage repetition and make the summary more varied, but overly high values can hurt fluency.
- `num_beams`: The number of beams used during beam search decoding. Higher values explore more candidate sequences and usually improve summary quality, but the computational cost of inference grows roughly in proportion to the beam count.
- `num_beam_groups`: The number of groups the beams are divided into for diverse beam search. Values greater than 1 penalize similarity between groups, producing more diverse candidate outputs at additional computational cost.
- `length_penalty`: An exponent applied to the sequence length when scoring beams: each beam's log-likelihood is divided by its length raised to this power. Values greater than 1.0 favor longer sequences, while values below 1.0 favor shorter, more concise ones, which may sacrifice completeness.
- `early_stopping`: Whether beam search stops as soon as `num_beams` complete candidates have been found, instead of continuing to search for potentially better ones. Setting it to true saves computation time, but may yield slightly less optimal summaries.
- `do_sample`: Whether to sample from the token probability distribution instead of decoding deterministically. Setting it to true produces more diverse but less reproducible outputs.
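Taken together, these options map directly onto keyword arguments of the Hugging Face `model.generate()` API. The sketch below builds the parameter dictionary and shows (in comments) how it would be unpacked into a `generate()` call; the `max_length` value of 512 is an illustrative placeholder, since the project sets it dynamically.

```python
# Generation settings like those described above are passed as keyword
# arguments to Hugging Face's model.generate(). The values below mirror
# the project defaults; max_length is a placeholder because the project
# sets it dynamically with respect to batch size.
inference_params = {
    "min_length": 8,
    "max_length": 512,  # placeholder -- set dynamically in the project
    "no_repeat_ngram_size": 3,
    "encoder_no_repeat_ngram_size": 4,
    "repetition_penalty": 2.5,
    "num_beams": 4,
    "num_beam_groups": 1,
    "length_penalty": 0.8,
    "early_stopping": True,
    "do_sample": False,
}

# With transformers installed, the dict unpacks straight into generate():
#
#   from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
#   name = "pszemraj/long-t5-tglobal-base-16384-book-summary"
#   tokenizer = AutoTokenizer.from_pretrained(name)
#   model = AutoModelForSeq2SeqLM.from_pretrained(name)
#   inputs = tokenizer(long_text, return_tensors="pt", truncation=True)
#   summary_ids = model.generate(**inputs, **inference_params)
#   print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```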
The default model is `pszemraj/long-t5-tglobal-base-16384-book-summary`, and the default parameters reflect an empirically tested tradeoff between summary quality and compute for that model:
```json
{
    "min_length": 8,
    "max_length": <DYNAMICALLY SET w.r.t. BATCH SIZE>,
    "no_repeat_ngram_size": 3,
    "encoder_no_repeat_ngram_size": 4,
    "repetition_penalty": 2.5,
    "num_beams": 4,
    "num_beam_groups": 1,
    "length_penalty": 0.8,
    "early_stopping": true,
    "do_sample": false
}
```
These parameters should be fairly generalizable to other models, but can be updated or reset with the `set_inference_params()` method of the `Summarizer` class.
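To illustrate the override pattern, here is a minimal sketch. Note that the `Summarizer` class below is a simplified stand-in written for this example, not the project's real implementation, and the exact signature of `set_inference_params()` may differ; check the class docstring in the source.

```python
# Simplified stand-in for the project's Summarizer class, included only
# to illustrate how a set_inference_params()-style method can override
# the defaults; the real class also loads the model and runs generation.
class Summarizer:
    DEFAULTS = {
        "min_length": 8,
        "no_repeat_ngram_size": 3,
        "encoder_no_repeat_ngram_size": 4,
        "repetition_penalty": 2.5,
        "num_beams": 4,
        "num_beam_groups": 1,
        "length_penalty": 0.8,
        "early_stopping": True,
        "do_sample": False,
    }

    def __init__(self):
        self.inference_params = dict(self.DEFAULTS)

    def set_inference_params(self, **kwargs):
        # Overwrite only the keys the caller supplies; every other
        # parameter keeps its default value.
        self.inference_params.update(kwargs)


summarizer = Summarizer()
# Trade extra compute for quality: widen the beam and favor longer output.
summarizer.set_inference_params(num_beams=8, length_penalty=1.0)
```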
THIS IS A WIP, MORE TO COME