NVIDIA / Megatron-LM Public

Pull requests: NVIDIA/Megatron-LM

181 Open 282 Closed

Pull requests list

CUDA graph fixes for Llama3.1

#1534 opened Apr 14, 2025 by vasunvidia

[BUGFIX] Save dist_checkpointing metadata on all nodes for multi-node training

#1531 opened Apr 13, 2025 by Pranaykarvi

Fix AttributeError in MultiTokenPredictionLayer

#1529 opened Apr 12, 2025 by shenyunhang

added fix to avoid overflow with new numpy casting behaviour (Issue: #1519)

#1520 opened Apr 4, 2025 by Apsod

Add full support for Local mode without Apex/TE, and add support for Open XLA on CUDA

#1510 opened Mar 31, 2025 by ajayvohra2005

[BUG]: Updating the logic for reducing the load_balancing_loss during logging, such that the correct value is logged while using CUDA Graphs

#1507 opened Mar 27, 2025 by arjun-choudhry

Fix typo on distrib_optimizer.py

#1505 opened Mar 26, 2025 by wplf

fix for group_limited_topk: K_r is moe_router_topk instead of moe_router_num_groups

#1502 opened Mar 25, 2025 by ladyrick

fix: MultiLatentAttention cp_comm_type

#1499 opened Mar 24, 2025 by RandMist

[Bug Fix] fix p2p communication order error and stuck problems when pp 2 and vpp 2 with remove pad

#1495 opened Mar 22, 2025 by ETOgaosion

Fix llama_mistral loader by using args.true_vocab_size

#1491 opened Mar 20, 2025 by zhuzilin

vscode/cursor devcontainer

#1483 opened Mar 14, 2025 by yzhang123

Build dataset for all GPUs with tp_rank=0 and pp_rank=0 or -1 in multi-machine training.

#1480 opened Mar 14, 2025 by wan-nan

Set hashlib.md5 usedforsecurity=False, #1471

#1472 opened Mar 12, 2025 by jsta

Enabling variable_seq_lengths when encoder has Different TP Size

#1470 opened Mar 12, 2025 by xiaojunjie

fix(moe): the missing argument 'router_dtype' of _DeepepManager.__init__

#1463 opened Mar 11, 2025 by AsakusaRinne

Draft: Youngeun/a2a hiding

#1460 opened Mar 10, 2025 by lhb8125

[ENHANCEMENT] add z-loss (improved version)

#1442 opened Feb 28, 2025 by wdevazelhes

Replace deprecated numpy.product with numpy.prod to ensure compatibility with NumPy >=2.0

#1440 opened Feb 27, 2025 by mustious

fix seq_aux_loss for DeepSeek-V3

#1439 opened Feb 27, 2025 by yzlnew

fix a bug in load balancing loss aggregation when recompute is turned on

#1433 opened Feb 26, 2025 by lyuwen

a proof of concept for Distributed Muon

#1428 opened Feb 24, 2025 by toothacher17

fix: return float instead of tensor from get_rotary_seq_len

#1419 opened Feb 20, 2025 by jasonchiu-codeium

Fix document regarding GQA (--group-query-attention) argument

Label: stale (no activity in 60 days on issue or PR)

#1401 opened Feb 12, 2025 by eagle705

Fix issue in converting Mixtral 8x7B checkpoints from HF to MCore and update doc

Label: stale (no activity in 60 days on issue or PR)

#1397 opened Feb 11, 2025 by yeahdongcn
