InternEvo-v0.4.0dev20240403
Released by sunpengsdu on 03 Apr 12:29 · 122 commits to develop since this release
What's Changed
- feat(internlm): refactor code structure based on InternTrain by @huangting4201 in #82
- fix(tokenized/packed_dataset.py): fix packed dataset when train_folder is not None by @huangting4201 in #88
- fix(transformers): fix missing whitespace when chatting with fast tokenizer by @x54-729 in #90
- Fix(TrainState): fix trainstate batch sampler by @zigzagcai in #102
- feat: rm grad profiling by @JiaoPL in #100
- fix(train/pipeline.py): fix nan grad norm by @huangting4201 in #103
- Fix(QA): fix check ckpt loss by @li126com in #89
- improve zero grad communication overlap with pp by @mwiacx in #104
- feat(optimizer/hybrid_zero_optim.py): remove two stage compute norm by @huangting4201 in #106
- fix(embedding.py): fix triton apply_rotary to rotary_emb version by @sallyjunjun in #105
- feat(npu): add Ascend 910B support by @SolenoidWGT in #110
- feat(tokenized/dummy_dataset.py): support fixed seqlen for random dataset samples by @huangting4201 in #119
- Fix(QA): fix test optimizer and no_fa_output by @li126com in #124
- feat(initialize/launch.py): support switch use_packed_dataset by @huangting4201 in #117
- fix apply_rotary_torch not inplace problem by @sallyjunjun in #123
- fix(npu): refactor split_half_float_double and remove str key by @SolenoidWGT in #131
- feat(launch.py): update assert info for use_packed_dataset and fix backend accelerator get error by @huangting4201 in #125
- remove global variable internlm_accelerator by @sallyjunjun in #133
- fix(gpc): remove unused num_processes_on_current_node by @SolenoidWGT in #136
- Fix(support npu): fix some minor bugs in npu support by @li126com in #129
- feat(model): extend dim bsz for packed data for standardizing the sp processing dimension by @huangting4201 in #141
- Fix(device name): use a consistent way to get the device by @li126com in #139
- replace is_cuda with get_accelerator_backend by @sallyjunjun in #143
- Feat(npu): change current_time format to adapt npu profiler by @li126com in #147
- fix INTERNLM2_PUBLIC by @sallyjunjun in #150
- feat(eval): optimize evaluation context and remove DtypeTensor by @huangting4201 in #149
- fix(QA): fix test_forward_output_no_fa by @li126com in #151
- fix(QA): re-adapt some QA code for new version by @li126com in #146
- fix(unpack_data): pad -100 on labels by @sunpengsdu in #154
- feat(attn): support npu flash attention by @SolenoidWGT in #145
- fix(dummy_dataset): fixed_random_dataset_seqlen default is true by @sunpengsdu in #156
- fix(npu): fix attn mask move device by @SolenoidWGT in #159
- fix: minor bug by @JiaoPL in #160
- refactor(moe): expose more interfaces for moe by @blankde in #157
- set dummy data fixed length to false in CI by @sunpengsdu in #163
- feat(mlp): support mlp layer fusion by @SolenoidWGT in #161
- feat(deeplink): add deeplink as new backend by @caikun-pjlab in #168
- fix(optimizer): skip params whose requires_grad is False by @huangting4201 in #169
- fix internlm_accelerator by @sallyjunjun in #166
- remove timer_diagnosis and bench_gpu by @sallyjunjun in #170
- feat(model): support npu with packed data by @huangting4201 in #167
- fix(modules/multi_head_attention.py): fix distributed attn argument err in npu by @huangting4201 in #172
- fix(utils/logger.py): remove uniscale logger in public repo by @huangting4201 in #118
- fix(activation_checkpoint.py): fix rng mode in activation ckpt by @huangting4201 in #177
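Several entries above touch dataset "packing" (#88, #117, #119, #156). As background, the general technique is to concatenate tokenized samples into chunks of a fixed sequence length so every batch is fully utilized. The sketch below illustrates that idea only; `pack_samples` and its parameters are hypothetical names, not InternEvo's actual API.

```python
# Hedged sketch of sequence packing: concatenate tokenized samples and split
# into fixed-length chunks, padding only the final partial chunk.
# Illustrates the general technique, not InternEvo's implementation.
from typing import List


def pack_samples(samples: List[List[int]], seq_len: int, pad_id: int = 0) -> List[List[int]]:
    """Flatten token lists, cut into chunks of exactly seq_len,
    and pad the trailing partial chunk with pad_id."""
    flat: List[int] = []
    for sample in samples:
        flat.extend(sample)
    chunks = [flat[i:i + seq_len] for i in range(0, len(flat), seq_len)]
    if chunks and len(chunks[-1]) < seq_len:
        chunks[-1] = chunks[-1] + [pad_id] * (seq_len - len(chunks[-1]))
    return chunks
```

For example, `pack_samples([[1, 2, 3], [4, 5]], seq_len=4)` yields two full-length chunks, with only the last one padded.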
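On the label padding fix in #154: padding labels with -100 is the standard convention because PyTorch's `CrossEntropyLoss` uses `ignore_index=-100` by default, so padded positions contribute nothing to the loss. A minimal sketch of that convention (the `pad_labels` helper is hypothetical, not InternEvo code):

```python
# Hedged sketch: pad a label sequence with -100 so a default-configured
# PyTorch CrossEntropyLoss (ignore_index=-100) skips the padded positions.
def pad_labels(labels: list, seq_len: int, ignore_index: int = -100) -> list:
    """Right-pad labels to seq_len with ignore_index."""
    return labels + [ignore_index] * (seq_len - len(labels))
```

With this, a 3-token label row padded to length 5 becomes `[t1, t2, t3, -100, -100]`, and only the first three positions are scored.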
New Contributors
- @JiaoPL made their first contribution in #100
- @SolenoidWGT made their first contribution in #110
- @caikun-pjlab made their first contribution in #168
Full Changelog: v0.3.3dev20240315...v0.4.0dev20240403