
Pull requests: NVIDIA/Megatron-LM

CUDA graph fixes for Llama3.1 (#1534, opened Apr 14, 2025 by vasunvidia)
Fix AttributeError in MultiTokenPredictionLayer (#1529, opened Apr 12, 2025 by shenyunhang)
Fix typo on distrib_optimizer.py (#1505, opened Mar 26, 2025 by wplf)
fix: MultiLatentAttention cp_comm_type (#1499, opened Mar 24, 2025 by RandMist)
Fix llama_mistral loader by using args.true_vocab_size (#1491, opened Mar 20, 2025 by zhuzilin)
vscode/cursor devcontainer (#1483, opened Mar 14, 2025 by yzhang123)
Set hashlib.md5 usedforsecurity=False, #1471 (#1472, opened Mar 12, 2025 by jsta)
Draft: Youngeun/a2a hiding (#1460, opened Mar 10, 2025 by lhb8125)
[ENHANCEMENT] add z-loss (improved version) (#1442, opened Feb 28, 2025 by wdevazelhes)
fix seq_aux_loss for DeepSeek-V3 (#1439, opened Feb 27, 2025 by yzlnew)
a proof of concept for Distributed Muon (#1428, opened Feb 24, 2025 by toothacher17)
Fix document regarding GQA (--group-query-attention) argument (#1401, opened Feb 12, 2025 by eagle705) [stale: no activity in 60 days]
Fix issue in converting Mixtral 8x7B checkpoints from HF to MCore and update doc (#1397, opened Feb 11, 2025 by yeahdongcn) [stale: no activity in 60 days]