Pull requests: NVIDIA/Megatron-LM
[BUGFIX] Save dist_checkpointing metadata on all nodes for multi-node training
#1531
opened Apr 13, 2025 by
Pranaykarvi
added fix to avoid overflow with new numpy casting behaviour (Issue: #1519)
#1520
opened Apr 4, 2025 by
Apsod
Add full support for Local mode without Apex/TE, and add support for Open XLA on CUDA
#1510
opened Mar 31, 2025 by
ajayvohra2005
[BUG]: Updating the logic for reducing the load_balancing_loss during logging, such that the correct value is logged while using CUDA Graphs
#1507
opened Mar 27, 2025 by
arjun-choudhry
fix for group_limited_topk: K_r is moe_router_topk instead of moe_router_num_groups
#1502
opened Mar 25, 2025 by
ladyrick
[Bug Fix] fix p2p communication order error and stuck problems when pp 2 and vpp 2 with remove pad
#1495
opened Mar 22, 2025 by
ETOgaosion
Fix llama_mistral loader by using args.true_vocab_size
#1491
opened Mar 20, 2025 by
zhuzilin
Build dataset for all GPUs with tp_rank=0 and pp_rank=0 or -1 in multi-machine training.
#1480
opened Mar 14, 2025 by
wan-nan
Enabling variable_seq_lengths when encoder has Different TP Size
#1470
opened Mar 12, 2025 by
xiaojunjie
fix(moe): the missing argument 'router_dtype' of _DeepepManager.__init__
#1463
opened Mar 11, 2025 by
AsakusaRinne
Replace deprecated numpy.product with numpy.prod to ensure compatibility with NumPy >=2.0
#1440
opened Feb 27, 2025 by
mustious
fix a bug in load balancing loss aggregation when recompute is turned on
#1433
opened Feb 26, 2025 by
lyuwen
fix: return float instead of tensor from get_rotary_seq_len
#1419
opened Feb 20, 2025 by
jasonchiu-codeium
Fix document regarding GQA (--group-query-attention) argument
stale
No activity in 60 days on issue or PR
#1401
opened Feb 12, 2025 by
eagle705
Fix issue in converting Mixtral 8x7B checkpoints from HF to MCore and update doc
stale
No activity in 60 days on issue or PR
#1397
opened Feb 11, 2025 by
yeahdongcn