
[Usage] When quantizing Qwen2.5-7B-Instruct, the loss is very high and the model generates many ! #1042

Open
JeremyGe07 opened this issue Jan 7, 2025 · 4 comments

JeremyGe07 commented Jan 7, 2025

Describe the bug

I'm quantizing Qwen2.5-7B-Instruct to GPTQ-INT4 format using the wikitext demo, and it shows a very high loss. The situation is much the same when I quantize this model with AutoGPTQ on wikitext, so I don't know whether the cause is Qwen2.5 itself or the calibration data.
Here is the log:

INFO - {'layer': 0, 'module': 'self_attn.k_proj', 'loss': '0.33916', 'damp': '0.01000', 'time': '0.977', 'fwd_time': '0.953'}
INFO - {'layer': 0, 'module': 'self_attn.v_proj', 'loss': '0.03693', 'damp': '0.01000', 'time': '0.813', 'fwd_time': '0.953'}
INFO - {'layer': 0, 'module': 'self_attn.q_proj', 'loss': '1.39348', 'damp': '0.01000', 'time': '0.862', 'fwd_time': '0.953'}
INFO - {'layer': 0, 'module': 'self_attn.o_proj', 'loss': '0.22118', 'damp': '0.01000', 'time': '0.845', 'fwd_time': '0.547'}
INFO - {'layer': 0, 'module': 'mlp.up_proj', 'loss': '1.57048', 'damp': '0.01000', 'time': '1.090', 'fwd_time': '0.701'}
INFO - {'layer': 0, 'module': 'mlp.gate_proj', 'loss': '2.79310', 'damp': '0.01000', 'time': '1.069', 'fwd_time': '0.701'}
INFO - {'layer': 0, 'module': 'mlp.down_proj', 'loss': '0.30619', 'damp': '0.01000', 'time': '5.746', 'fwd_time': '4.456'}
INFO - {'layer': 1, 'module': 'self_attn.k_proj', 'loss': '0.19179', 'damp': '0.01000', 'time': '0.810', 'fwd_time': '0.802'}
INFO - {'layer': 1, 'module': 'self_attn.v_proj', 'loss': '0.05828', 'damp': '0.01000', 'time': '0.811', 'fwd_time': '0.802'}
INFO - {'layer': 1, 'module': 'self_attn.q_proj', 'loss': '0.74267', 'damp': '0.01000', 'time': '0.865', 'fwd_time': '0.802'}
INFO - {'layer': 1, 'module': 'self_attn.o_proj', 'loss': '0.05288', 'damp': '0.01000', 'time': '0.852', 'fwd_time': '0.512'}
INFO - {'layer': 1, 'module': 'mlp.up_proj', 'loss': '32.07098', 'damp': '0.01000', 'time': '1.079', 'fwd_time': '0.670'}
INFO - {'layer': 1, 'module': 'mlp.gate_proj', 'loss': '52.45516', 'damp': '0.01000', 'time': '1.076', 'fwd_time': '0.670'}
INFO - {'layer': 1, 'module': 'mlp.down_proj', 'loss': '0.38026', 'damp': '0.01000', 'time': '5.768', 'fwd_time': '4.452'}
INFO - {'layer': 2, 'module': 'self_attn.k_proj', 'loss': '1.01705', 'damp': '0.01000', 'time': '0.814', 'fwd_time': '0.807'}
INFO - {'layer': 2, 'module': 'self_attn.v_proj', 'loss': '0.18351', 'damp': '0.01000', 'time': '0.810', 'fwd_time': '0.807'}
INFO - {'layer': 2, 'module': 'self_attn.q_proj', 'loss': '3.82406', 'damp': '0.01000', 'time': '0.864', 'fwd_time': '0.807'}
INFO - {'layer': 2, 'module': 'self_attn.o_proj', 'loss': '0.10180', 'damp': '0.01000', 'time': '0.853', 'fwd_time': '0.519'}
INFO - {'layer': 2, 'module': 'mlp.up_proj', 'loss': '39.45424', 'damp': '0.01000', 'time': '1.072', 'fwd_time': '0.670'}
INFO - {'layer': 2, 'module': 'mlp.gate_proj', 'loss': '65.73445', 'damp': '0.01000', 'time': '1.069', 'fwd_time': '0.670'}
INFO - {'layer': 2, 'module': 'mlp.down_proj', 'loss': '0.78992', 'damp': '0.01000', 'time': '5.759', 'fwd_time': '4.466'}
INFO - {'layer': 3, 'module': 'self_attn.k_proj', 'loss': '1.14903', 'damp': '0.01000', 'time': '0.811', 'fwd_time': '0.805'}
INFO - {'layer': 3, 'module': 'self_attn.v_proj', 'loss': '0.31933', 'damp': '0.01000', 'time': '0.815', 'fwd_time': '0.805'}
INFO - {'layer': 3, 'module': 'self_attn.q_proj', 'loss': '4.63377', 'damp': '0.01000', 'time': '0.863', 'fwd_time': '0.805'}
INFO - {'layer': 3, 'module': 'self_attn.o_proj', 'loss': '0.27049', 'damp': '0.01000', 'time': '0.862', 'fwd_time': '0.516'}
INFO - {'layer': 3, 'module': 'mlp.up_proj', 'loss': '108.06517', 'damp': '0.01000', 'time': '1.088', 'fwd_time': '0.676'}
INFO - {'layer': 3, 'module': 'mlp.gate_proj', 'loss': '140.61984', 'damp': '0.01000', 'time': '1.094', 'fwd_time': '0.676'}
INFO - {'layer': 3, 'module': 'mlp.down_proj', 'loss': '2.41272', 'damp': '0.01000', 'time': '5.783', 'fwd_time': '4.459'}
INFO - {'layer': 4, 'module': 'self_attn.k_proj', 'loss': '2.59363', 'damp': '0.01000', 'time': '0.821', 'fwd_time': '0.806'}
INFO - {'layer': 4, 'module': 'self_attn.v_proj', 'loss': '0.98566', 'damp': '0.01000', 'time': '0.805', 'fwd_time': '0.806'}
INFO - {'layer': 4, 'module': 'self_attn.q_proj', 'loss': '11.84018', 'damp': '0.01000', 'time': '0.859', 'fwd_time': '0.806'}
INFO - {'layer': 4, 'module': 'self_attn.o_proj', 'loss': '0.22618', 'damp': '0.01000', 'time': '0.865', 'fwd_time': '0.518'}
INFO - {'layer': 4, 'module': 'mlp.up_proj', 'loss': '106.62918', 'damp': '0.01000', 'time': '1.073', 'fwd_time': '0.678'}
INFO - {'layer': 4, 'module': 'mlp.gate_proj', 'loss': '153.13940', 'damp': '0.01000', 'time': '1.089', 'fwd_time': '0.678'}
INFO - {'layer': 4, 'module': 'mlp.down_proj', 'loss': '2.27433', 'damp': '0.01000', 'time': '5.834', 'fwd_time': '4.456'}
INFO - {'layer': 5, 'module': 'self_attn.k_proj', 'loss': '2.21922', 'damp': '0.01000', 'time': '0.806', 'fwd_time': '0.808'}
INFO - {'layer': 5, 'module': 'self_attn.v_proj', 'loss': '0.98093', 'damp': '0.01000', 'time': '0.810', 'fwd_time': '0.808'}
INFO - {'layer': 5, 'module': 'self_attn.q_proj', 'loss': '10.87604', 'damp': '0.01000', 'time': '0.863', 'fwd_time': '0.808'}
INFO - {'layer': 5, 'module': 'self_attn.o_proj', 'loss': '0.21165', 'damp': '0.01000', 'time': '0.850', 'fwd_time': '0.515'}
INFO - {'layer': 5, 'module': 'mlp.up_proj', 'loss': '170.64059', 'damp': '0.01000', 'time': '1.074', 'fwd_time': '0.674'}
INFO - {'layer': 5, 'module': 'mlp.gate_proj', 'loss': '208.53418', 'damp': '0.01000', 'time': '1.092', 'fwd_time': '0.674'}
INFO - {'layer': 5, 'module': 'mlp.down_proj', 'loss': '1.27804', 'damp': '0.01000', 'time': '5.769', 'fwd_time': '4.460'}
INFO - {'layer': 6, 'module': 'self_attn.k_proj', 'loss': '1.45379', 'damp': '0.01000', 'time': '0.804', 'fwd_time': '0.807'}
INFO - {'layer': 6, 'module': 'self_attn.v_proj', 'loss': '0.73029', 'damp': '0.01000', 'time': '0.808', 'fwd_time': '0.807'}
INFO - {'layer': 6, 'module': 'self_attn.q_proj', 'loss': '7.31900', 'damp': '0.01000', 'time': '0.858', 'fwd_time': '0.807'}
INFO - {'layer': 6, 'module': 'self_attn.o_proj', 'loss': '0.38487', 'damp': '0.01000', 'time': '0.848', 'fwd_time': '0.515'}
INFO - {'layer': 6, 'module': 'mlp.up_proj', 'loss': '42.93635', 'damp': '0.01000', 'time': '1.070', 'fwd_time': '0.672'}
INFO - {'layer': 6, 'module': 'mlp.gate_proj', 'loss': '63.74856', 'damp': '0.01000', 'time': '1.062', 'fwd_time': '0.672'}
INFO - {'layer': 6, 'module': 'mlp.down_proj', 'loss': '2.74412', 'damp': '0.01000', 'time': '5.747', 'fwd_time': '4.462'}
INFO - {'layer': 7, 'module': 'self_attn.k_proj', 'loss': '1.57051', 'damp': '0.01000', 'time': '0.812', 'fwd_time': '0.809'}
INFO - {'layer': 7, 'module': 'self_attn.v_proj', 'loss': '1.59149', 'damp': '0.01000', 'time': '0.807', 'fwd_time': '0.809'}
INFO - {'layer': 7, 'module': 'self_attn.q_proj', 'loss': '9.15306', 'damp': '0.01000', 'time': '0.875', 'fwd_time': '0.809'}
INFO - {'layer': 7, 'module': 'self_attn.o_proj', 'loss': '0.92291', 'damp': '0.01000', 'time': '0.891', 'fwd_time': '0.529'}
INFO - {'layer': 7, 'module': 'mlp.up_proj', 'loss': '41.25745', 'damp': '0.01000', 'time': '1.082', 'fwd_time': '0.677'}
INFO - {'layer': 7, 'module': 'mlp.gate_proj', 'loss': '46.96888', 'damp': '0.01000', 'time': '1.078', 'fwd_time': '0.677'}
INFO - {'layer': 7, 'module': 'mlp.down_proj', 'loss': '4.02016', 'damp': '0.01000', 'time': '5.772', 'fwd_time': '4.475'}
INFO - {'layer': 8, 'module': 'self_attn.k_proj', 'loss': '3.00691', 'damp': '0.01000', 'time': '0.818', 'fwd_time': '0.811'}
INFO - {'layer': 8, 'module': 'self_attn.v_proj', 'loss': '1.27559', 'damp': '0.01000', 'time': '0.820', 'fwd_time': '0.811'}
INFO - {'layer': 8, 'module': 'self_attn.q_proj', 'loss': '13.43159', 'damp': '0.01000', 'time': '0.918', 'fwd_time': '0.811'}
INFO - {'layer': 8, 'module': 'self_attn.o_proj', 'loss': '0.70201', 'damp': '0.01000', 'time': '0.852', 'fwd_time': '0.517'}
INFO - {'layer': 8, 'module': 'mlp.up_proj', 'loss': '46.12687', 'damp': '0.01000', 'time': '1.087', 'fwd_time': '0.676'}
INFO - {'layer': 8, 'module': 'mlp.gate_proj', 'loss': '49.03519', 'damp': '0.01000', 'time': '1.075', 'fwd_time': '0.676'}
INFO - {'layer': 8, 'module': 'mlp.down_proj', 'loss': '4.27250', 'damp': '0.01000', 'time': '5.779', 'fwd_time': '4.483'}
INFO - {'layer': 9, 'module': 'self_attn.k_proj', 'loss': '2.09701', 'damp': '0.01000', 'time': '0.808', 'fwd_time': '0.811'}
INFO - {'layer': 9, 'module': 'self_attn.v_proj', 'loss': '2.02709', 'damp': '0.01000', 'time': '0.806', 'fwd_time': '0.811'}
INFO - {'layer': 9, 'module': 'self_attn.q_proj', 'loss': '11.79724', 'damp': '0.01000', 'time': '0.886', 'fwd_time': '0.811'}
INFO - {'layer': 9, 'module': 'self_attn.o_proj', 'loss': '1.44822', 'damp': '0.01000', 'time': '0.854', 'fwd_time': '0.518'}
INFO - {'layer': 9, 'module': 'mlp.up_proj', 'loss': '83.63410', 'damp': '0.01000', 'time': '1.079', 'fwd_time': '0.677'}
INFO - {'layer': 9, 'module': 'mlp.gate_proj', 'loss': '129.03470', 'damp': '0.01000', 'time': '1.093', 'fwd_time': '0.677'}
INFO - {'layer': 9, 'module': 'mlp.down_proj', 'loss': '4.23457', 'damp': '0.01000', 'time': '5.771', 'fwd_time': '4.482'}
INFO - {'layer': 10, 'module': 'self_attn.k_proj', 'loss': '1.92156', 'damp': '0.01000', 'time': '0.811', 'fwd_time': '0.814'}
INFO - {'layer': 10, 'module': 'self_attn.v_proj', 'loss': '1.15658', 'damp': '0.01000', 'time': '0.815', 'fwd_time': '0.814'}
INFO - {'layer': 10, 'module': 'self_attn.q_proj', 'loss': '10.04698', 'damp': '0.01000', 'time': '0.864', 'fwd_time': '0.814'}
INFO - {'layer': 10, 'module': 'self_attn.o_proj', 'loss': '0.93930', 'damp': '0.01000', 'time': '0.855', 'fwd_time': '0.520'}
INFO - {'layer': 10, 'module': 'mlp.up_proj', 'loss': '45.48151', 'damp': '0.01000', 'time': '1.079', 'fwd_time': '0.680'}
INFO - {'layer': 10, 'module': 'mlp.gate_proj', 'loss': '51.18117', 'damp': '0.01000', 'time': '1.086', 'fwd_time': '0.680'}
INFO - {'layer': 10, 'module': 'mlp.down_proj', 'loss': '3.59220', 'damp': '0.01000', 'time': '5.797', 'fwd_time': '4.489'}
INFO - {'layer': 11, 'module': 'self_attn.k_proj', 'loss': '2.50373', 'damp': '0.01000', 'time': '0.812', 'fwd_time': '0.813'}
INFO - {'layer': 11, 'module': 'self_attn.v_proj', 'loss': '1.03661', 'damp': '0.01000', 'time': '0.810', 'fwd_time': '0.813'}
INFO - {'layer': 11, 'module': 'self_attn.q_proj', 'loss': '11.50942', 'damp': '0.01000', 'time': '0.867', 'fwd_time': '0.813'}
INFO - {'layer': 11, 'module': 'self_attn.o_proj', 'loss': '0.91412', 'damp': '0.01000', 'time': '0.853', 'fwd_time': '0.529'}
INFO - {'layer': 11, 'module': 'mlp.up_proj', 'loss': '42.92680', 'damp': '0.01000', 'time': '1.079', 'fwd_time': '0.677'}
INFO - {'layer': 11, 'module': 'mlp.gate_proj', 'loss': '44.78328', 'damp': '0.01000', 'time': '1.086', 'fwd_time': '0.677'}
INFO - {'layer': 11, 'module': 'mlp.down_proj', 'loss': '3.46916', 'damp': '0.01000', 'time': '5.760', 'fwd_time': '4.497'}
INFO - {'layer': 12, 'module': 'self_attn.k_proj', 'loss': '2.66821', 'damp': '0.01000', 'time': '0.812', 'fwd_time': '0.815'}
INFO - {'layer': 12, 'module': 'self_attn.v_proj', 'loss': '1.35205', 'damp': '0.01000', 'time': '0.812', 'fwd_time': '0.815'}
INFO - {'layer': 12, 'module': 'self_attn.q_proj', 'loss': '12.47675', 'damp': '0.01000', 'time': '0.868', 'fwd_time': '0.815'}
INFO - {'layer': 12, 'module': 'self_attn.o_proj', 'loss': '0.98128', 'damp': '0.01000', 'time': '0.853', 'fwd_time': '0.528'}
INFO - {'layer': 12, 'module': 'mlp.up_proj', 'loss': '43.56601', 'damp': '0.01000', 'time': '1.088', 'fwd_time': '0.681'}
INFO - {'layer': 12, 'module': 'mlp.gate_proj', 'loss': '43.05183', 'damp': '0.01000', 'time': '1.079', 'fwd_time': '0.681'}
INFO - {'layer': 12, 'module': 'mlp.down_proj', 'loss': '3.55483', 'damp': '0.01000', 'time': '5.811', 'fwd_time': '4.497'}
INFO - {'layer': 13, 'module': 'self_attn.k_proj', 'loss': '2.44931', 'damp': '0.01000', 'time': '0.813', 'fwd_time': '0.813'}
INFO - {'layer': 13, 'module': 'self_attn.v_proj', 'loss': '1.70639', 'damp': '0.01000', 'time': '0.806', 'fwd_time': '0.813'}
INFO - {'layer': 13, 'module': 'self_attn.q_proj', 'loss': '13.17400', 'damp': '0.01000', 'time': '0.860', 'fwd_time': '0.813'}
INFO - {'layer': 13, 'module': 'self_attn.o_proj', 'loss': '1.45650', 'damp': '0.01000', 'time': '0.850', 'fwd_time': '0.520'}
INFO - {'layer': 13, 'module': 'mlp.up_proj', 'loss': '42.50768', 'damp': '0.01000', 'time': '1.082', 'fwd_time': '0.676'}
INFO - {'layer': 13, 'module': 'mlp.gate_proj', 'loss': '44.99652', 'damp': '0.01000', 'time': '1.086', 'fwd_time': '0.676'}
INFO - {'layer': 13, 'module': 'mlp.down_proj', 'loss': '3.30323', 'damp': '0.01000', 'time': '5.748', 'fwd_time': '4.490'}
INFO - {'layer': 14, 'module': 'self_attn.k_proj', 'loss': '3.48210', 'damp': '0.01000', 'time': '0.804', 'fwd_time': '0.812'}
INFO - {'layer': 14, 'module': 'self_attn.v_proj', 'loss': '1.74438', 'damp': '0.01000', 'time': '0.808', 'fwd_time': '0.812'}
INFO - {'layer': 14, 'module': 'self_attn.q_proj', 'loss': '17.86998', 'damp': '0.01000', 'time': '0.861', 'fwd_time': '0.812'}
INFO - {'layer': 14, 'module': 'self_attn.o_proj', 'loss': '1.06893', 'damp': '0.01000', 'time': '0.852', 'fwd_time': '0.517'}
INFO - {'layer': 14, 'module': 'mlp.up_proj', 'loss': '46.18967', 'damp': '0.01000', 'time': '1.072', 'fwd_time': '0.678'}
INFO - {'layer': 14, 'module': 'mlp.gate_proj', 'loss': '46.00350', 'damp': '0.01000', 'time': '1.075', 'fwd_time': '0.678'}
INFO - {'layer': 14, 'module': 'mlp.down_proj', 'loss': '3.57745', 'damp': '0.01000', 'time': '5.795', 'fwd_time': '4.503'}
INFO - {'layer': 15, 'module': 'self_attn.k_proj', 'loss': '3.03772', 'damp': '0.01000', 'time': '0.808', 'fwd_time': '0.813'}
INFO - {'layer': 15, 'module': 'self_attn.v_proj', 'loss': '1.30402', 'damp': '0.01000', 'time': '0.804', 'fwd_time': '0.813'}
INFO - {'layer': 15, 'module': 'self_attn.q_proj', 'loss': '13.75975', 'damp': '0.01000', 'time': '0.865', 'fwd_time': '0.813'}
INFO - {'layer': 15, 'module': 'self_attn.o_proj', 'loss': '0.93601', 'damp': '0.01000', 'time': '0.850', 'fwd_time': '0.521'}
INFO - {'layer': 15, 'module': 'mlp.up_proj', 'loss': '43.17038', 'damp': '0.01000', 'time': '1.074', 'fwd_time': '0.681'}
INFO - {'layer': 15, 'module': 'mlp.gate_proj', 'loss': '41.63631', 'damp': '0.01000', 'time': '1.073', 'fwd_time': '0.681'}
INFO - {'layer': 15, 'module': 'mlp.down_proj', 'loss': '3.64557', 'damp': '0.01000', 'time': '5.777', 'fwd_time': '4.494'}
INFO - {'layer': 16, 'module': 'self_attn.k_proj', 'loss': '2.93110', 'damp': '0.01000', 'time': '0.839', 'fwd_time': '0.832'}
INFO - {'layer': 16, 'module': 'self_attn.v_proj', 'loss': '2.06739', 'damp': '0.01000', 'time': '0.816', 'fwd_time': '0.832'}
INFO - {'layer': 16, 'module': 'self_attn.q_proj', 'loss': '15.05182', 'damp': '0.01000', 'time': '0.877', 'fwd_time': '0.832'}
INFO - {'layer': 16, 'module': 'self_attn.o_proj', 'loss': '1.30684', 'damp': '0.01000', 'time': '0.881', 'fwd_time': '0.530'}
INFO - {'layer': 16, 'module': 'mlp.up_proj', 'loss': '44.90529', 'damp': '0.01000', 'time': '1.110', 'fwd_time': '0.680'}
INFO - {'layer': 16, 'module': 'mlp.gate_proj', 'loss': '43.39167', 'damp': '0.01000', 'time': '1.108', 'fwd_time': '0.680'}
INFO - {'layer': 16, 'module': 'mlp.down_proj', 'loss': '3.72981', 'damp': '0.01000', 'time': '5.887', 'fwd_time': '4.518'}
INFO - {'layer': 17, 'module': 'self_attn.k_proj', 'loss': '3.02538', 'damp': '0.01000', 'time': '0.819', 'fwd_time': '0.816'}
INFO - {'layer': 17, 'module': 'self_attn.v_proj', 'loss': '2.30172', 'damp': '0.01000', 'time': '0.810', 'fwd_time': '0.816'}
INFO - {'layer': 17, 'module': 'self_attn.q_proj', 'loss': '17.68964', 'damp': '0.01000', 'time': '0.871', 'fwd_time': '0.816'}
INFO - {'layer': 17, 'module': 'self_attn.o_proj', 'loss': '1.16173', 'damp': '0.01000', 'time': '0.855', 'fwd_time': '0.521'}
INFO - {'layer': 17, 'module': 'mlp.up_proj', 'loss': '52.66678', 'damp': '0.01000', 'time': '1.080', 'fwd_time': '0.689'}
INFO - {'layer': 17, 'module': 'mlp.gate_proj', 'loss': '49.82415', 'damp': '0.01000', 'time': '1.085', 'fwd_time': '0.689'}
INFO - {'layer': 17, 'module': 'mlp.down_proj', 'loss': '5.00965', 'damp': '0.01000', 'time': '5.816', 'fwd_time': '4.508'}
INFO - {'layer': 18, 'module': 'self_attn.k_proj', 'loss': '2.40221', 'damp': '0.01000', 'time': '0.850', 'fwd_time': '0.825'}
INFO - {'layer': 18, 'module': 'self_attn.v_proj', 'loss': '2.63599', 'damp': '0.01000', 'time': '0.834', 'fwd_time': '0.825'}
INFO - {'layer': 18, 'module': 'self_attn.q_proj', 'loss': '14.09145', 'damp': '0.01000', 'time': '0.882', 'fwd_time': '0.825'}
INFO - {'layer': 18, 'module': 'self_attn.o_proj', 'loss': '2.54329', 'damp': '0.01000', 'time': '0.866', 'fwd_time': '0.520'}
INFO - {'layer': 18, 'module': 'mlp.up_proj', 'loss': '55.72952', 'damp': '0.01000', 'time': '1.099', 'fwd_time': '0.691'}
INFO - {'layer': 18, 'module': 'mlp.gate_proj', 'loss': '51.83221', 'damp': '0.01000', 'time': '1.100', 'fwd_time': '0.691'}
INFO - {'layer': 18, 'module': 'mlp.down_proj', 'loss': '5.79591', 'damp': '0.01000', 'time': '5.902', 'fwd_time': '4.507'}
INFO - {'layer': 19, 'module': 'self_attn.k_proj', 'loss': '2.57190', 'damp': '0.01000', 'time': '0.810', 'fwd_time': '0.828'}
INFO - {'layer': 19, 'module': 'self_attn.v_proj', 'loss': '3.18076', 'damp': '0.01000', 'time': '0.815', 'fwd_time': '0.828'}
INFO - {'layer': 19, 'module': 'self_attn.q_proj', 'loss': '16.61487', 'damp': '0.01000', 'time': '0.869', 'fwd_time': '0.828'}
INFO - {'layer': 19, 'module': 'self_attn.o_proj', 'loss': '2.55834', 'damp': '0.01000', 'time': '0.850', 'fwd_time': '0.522'}
INFO - {'layer': 19, 'module': 'mlp.up_proj', 'loss': '59.09712', 'damp': '0.01000', 'time': '1.084', 'fwd_time': '0.680'}
INFO - {'layer': 19, 'module': 'mlp.gate_proj', 'loss': '57.89482', 'damp': '0.01000', 'time': '1.084', 'fwd_time': '0.680'}
INFO - {'layer': 19, 'module': 'mlp.down_proj', 'loss': '7.30152', 'damp': '0.01000', 'time': '5.779', 'fwd_time': '4.506'}
INFO - {'layer': 20, 'module': 'self_attn.k_proj', 'loss': '2.43335', 'damp': '0.01000', 'time': '0.812', 'fwd_time': '0.815'}
INFO - {'layer': 20, 'module': 'self_attn.v_proj', 'loss': '3.25151', 'damp': '0.01000', 'time': '0.814', 'fwd_time': '0.815'}
INFO - {'layer': 20, 'module': 'self_attn.q_proj', 'loss': '14.86107', 'damp': '0.01000', 'time': '0.892', 'fwd_time': '0.815'}
INFO - {'layer': 20, 'module': 'self_attn.o_proj', 'loss': '1.92242', 'damp': '0.01000', 'time': '0.851', 'fwd_time': '0.520'}
INFO - {'layer': 20, 'module': 'mlp.up_proj', 'loss': '71.84976', 'damp': '0.01000', 'time': '1.085', 'fwd_time': '0.678'}
INFO - {'layer': 20, 'module': 'mlp.gate_proj', 'loss': '70.50723', 'damp': '0.01000', 'time': '1.078', 'fwd_time': '0.678'}
INFO - {'layer': 20, 'module': 'mlp.down_proj', 'loss': '12.65564', 'damp': '0.01000', 'time': '5.788', 'fwd_time': '4.514'}
INFO - {'layer': 21, 'module': 'self_attn.k_proj', 'loss': '2.55544', 'damp': '0.01000', 'time': '0.809', 'fwd_time': '0.822'}
INFO - {'layer': 21, 'module': 'self_attn.v_proj', 'loss': '4.88172', 'damp': '0.01000', 'time': '0.815', 'fwd_time': '0.822'}
INFO - {'layer': 21, 'module': 'self_attn.q_proj', 'loss': '17.89766', 'damp': '0.01000', 'time': '0.870', 'fwd_time': '0.822'}
INFO - {'layer': 21, 'module': 'self_attn.o_proj', 'loss': '3.67259', 'damp': '0.01000', 'time': '0.857', 'fwd_time': '0.519'}
INFO - {'layer': 21, 'module': 'mlp.up_proj', 'loss': '92.25527', 'damp': '0.01000', 'time': '1.081', 'fwd_time': '0.682'}
INFO - {'layer': 21, 'module': 'mlp.gate_proj', 'loss': '94.65651', 'damp': '0.01000', 'time': '1.079', 'fwd_time': '0.682'}
INFO - {'layer': 21, 'module': 'mlp.down_proj', 'loss': '18.66656', 'damp': '0.01000', 'time': '5.814', 'fwd_time': '4.509'}
INFO - {'layer': 22, 'module': 'self_attn.k_proj', 'loss': '3.25979', 'damp': '0.01000', 'time': '0.819', 'fwd_time': '0.817'}
INFO - {'layer': 22, 'module': 'self_attn.v_proj', 'loss': '7.86918', 'damp': '0.01000', 'time': '0.819', 'fwd_time': '0.817'}
INFO - {'layer': 22, 'module': 'self_attn.q_proj', 'loss': '24.58274', 'damp': '0.01000', 'time': '0.874', 'fwd_time': '0.817'}
INFO - {'layer': 22, 'module': 'self_attn.o_proj', 'loss': '2.46829', 'damp': '0.01000', 'time': '0.866', 'fwd_time': '0.521'}
INFO - {'layer': 22, 'module': 'mlp.up_proj', 'loss': '124.84958', 'damp': '0.01000', 'time': '1.104', 'fwd_time': '0.682'}
INFO - {'layer': 22, 'module': 'mlp.gate_proj', 'loss': '125.82623', 'damp': '0.01000', 'time': '1.110', 'fwd_time': '0.682'}
INFO - {'layer': 22, 'module': 'mlp.down_proj', 'loss': '33.57420', 'damp': '0.01000', 'time': '5.828', 'fwd_time': '4.504'}
INFO - {'layer': 23, 'module': 'self_attn.k_proj', 'loss': '4.11344', 'damp': '0.01000', 'time': '0.813', 'fwd_time': '0.817'}
INFO - {'layer': 23, 'module': 'self_attn.v_proj', 'loss': '11.07268', 'damp': '0.01000', 'time': '0.828', 'fwd_time': '0.817'}
INFO - {'layer': 23, 'module': 'self_attn.q_proj', 'loss': '28.72223', 'damp': '0.01000', 'time': '0.875', 'fwd_time': '0.817'}
INFO - {'layer': 23, 'module': 'self_attn.o_proj', 'loss': '5.86737', 'damp': '0.01000', 'time': '0.856', 'fwd_time': '0.535'}
INFO - {'layer': 23, 'module': 'mlp.up_proj', 'loss': '165.48361', 'damp': '0.01000', 'time': '1.088', 'fwd_time': '0.682'}
INFO - {'layer': 23, 'module': 'mlp.gate_proj', 'loss': '164.49231', 'damp': '0.01000', 'time': '1.084', 'fwd_time': '0.682'}
INFO - {'layer': 23, 'module': 'mlp.down_proj', 'loss': '45.29255', 'damp': '0.01000', 'time': '5.810', 'fwd_time': '4.514'}
INFO - {'layer': 24, 'module': 'self_attn.k_proj', 'loss': '3.46469', 'damp': '0.01000', 'time': '0.823', 'fwd_time': '0.817'}
INFO - {'layer': 24, 'module': 'self_attn.v_proj', 'loss': '11.41009', 'damp': '0.01000', 'time': '0.820', 'fwd_time': '0.817'}
INFO - {'layer': 24, 'module': 'self_attn.q_proj', 'loss': '25.66007', 'damp': '0.01000', 'time': '0.872', 'fwd_time': '0.817'}
INFO - {'layer': 24, 'module': 'self_attn.o_proj', 'loss': '5.21267', 'damp': '0.01000', 'time': '0.861', 'fwd_time': '0.528'}
INFO - {'layer': 24, 'module': 'mlp.up_proj', 'loss': '186.23547', 'damp': '0.01000', 'time': '1.085', 'fwd_time': '0.685'}
INFO - {'layer': 24, 'module': 'mlp.gate_proj', 'loss': '170.98428', 'damp': '0.01000', 'time': '1.084', 'fwd_time': '0.685'}
INFO - {'layer': 24, 'module': 'mlp.down_proj', 'loss': '66.09803', 'damp': '0.01000', 'time': '5.859', 'fwd_time': '4.520'}
INFO - {'layer': 25, 'module': 'self_attn.k_proj', 'loss': '3.48425', 'damp': '0.01000', 'time': '0.808', 'fwd_time': '0.827'}
INFO - {'layer': 25, 'module': 'self_attn.v_proj', 'loss': '18.36866', 'damp': '0.01000', 'time': '0.813', 'fwd_time': '0.827'}
INFO - {'layer': 25, 'module': 'self_attn.q_proj', 'loss': '28.27850', 'damp': '0.01000', 'time': '0.878', 'fwd_time': '0.827'}
INFO - {'layer': 25, 'module': 'self_attn.o_proj', 'loss': '6.77303', 'damp': '0.01000', 'time': '0.854', 'fwd_time': '0.520'}
INFO - {'layer': 25, 'module': 'mlp.up_proj', 'loss': '238.68318', 'damp': '0.01000', 'time': '1.063', 'fwd_time': '0.684'}
INFO - {'layer': 25, 'module': 'mlp.gate_proj', 'loss': '210.09753', 'damp': '0.01000', 'time': '1.066', 'fwd_time': '0.684'}
INFO - {'layer': 25, 'module': 'mlp.down_proj', 'loss': '105.92010', 'damp': '0.01000', 'time': '5.747', 'fwd_time': '4.511'}
INFO - {'layer': 26, 'module': 'self_attn.k_proj', 'loss': '4.48450', 'damp': '0.01000', 'time': '0.842', 'fwd_time': '0.818'}
INFO - {'layer': 26, 'module': 'self_attn.v_proj', 'loss': '35.02821', 'damp': '0.01000', 'time': '0.843', 'fwd_time': '0.818'}
INFO - {'layer': 26, 'module': 'self_attn.q_proj', 'loss': '36.46730', 'damp': '0.01000', 'time': '0.895', 'fwd_time': '0.818'}
INFO - {'layer': 26, 'module': 'self_attn.o_proj', 'loss': '11.41115', 'damp': '0.01000', 'time': '0.868', 'fwd_time': '0.530'}
INFO - {'layer': 26, 'module': 'mlp.up_proj', 'loss': '248.47745', 'damp': '0.01000', 'time': '1.087', 'fwd_time': '0.690'}
INFO - {'layer': 26, 'module': 'mlp.gate_proj', 'loss': '216.40228', 'damp': '0.01000', 'time': '1.112', 'fwd_time': '0.690'}
INFO - {'layer': 26, 'module': 'mlp.down_proj', 'loss': '275.36530', 'damp': '0.01000', 'time': '5.849', 'fwd_time': '4.518'}
INFO - {'layer': 27, 'module': 'self_attn.k_proj', 'loss': '5.46829', 'damp': '0.01000', 'time': '0.829', 'fwd_time': '0.826'}
INFO - {'layer': 27, 'module': 'self_attn.v_proj', 'loss': '42.55381', 'damp': '0.01000', 'time': '0.817', 'fwd_time': '0.826'}
INFO - {'layer': 27, 'module': 'self_attn.q_proj', 'loss': '49.57449', 'damp': '0.01000', 'time': '0.875', 'fwd_time': '0.826'}
INFO - {'layer': 27, 'module': 'self_attn.o_proj', 'loss': '14.80920', 'damp': '0.01000', 'time': '0.876', 'fwd_time': '0.534'}
INFO - {'layer': 27, 'module': 'mlp.up_proj', 'loss': '289.95807', 'damp': '0.01000', 'time': '1.095', 'fwd_time': '0.683'}
INFO - {'layer': 27, 'module': 'mlp.gate_proj', 'loss': '276.49518', 'damp': '0.01000', 'time': '1.080', 'fwd_time': '0.683'}
INFO - {'layer': 27, 'module': 'mlp.down_proj', 'loss': '481.69083', 'damp': '0.01000', 'time': '5.828', 'fwd_time': '4.515'}
INFO - Packing model...
Packing model.layers.27.mlp.down_proj |----------------------------------------| 100.0%
INFO - Model packed.
INFO - Pre-Quantized model size: 14525.67MB, 14.19GB
INFO - Quantized model size: 5317.10MB, 5.19GB
INFO - Size difference: 9208.57MB, 8.99GB - 63.40%

GPU Info

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A800 80GB PCIe          Off |   00000000:35:00.0 Off |                    0 |
| N/A   38C    P0             68W /  300W |   34283MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A800 80GB PCIe          Off |   00000000:36:00.0 Off |                    0 |
| N/A   55C    P0             90W /  300W |    3315MiB /  81920MiB |     39%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

Software Info

I'm using the latest GPTQModel==1.5.4.dev0 installed from source, with torch==2.4.1 and transformers==4.47.
I'm also using CUDA 12.0 (I don't know whether using CUDA 12.0 instead of a more common version like 12.1 or 12.4 caused this problem).
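
For completeness, a quick way to confirm the versions actually in use (just a sanity check, not a fix):

import torch
import transformers

print("torch:", torch.__version__)                  # expect 2.4.1
print("torch built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)    # expect 4.47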

Additional context

My code is

import torch
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig
from transformers import AutoTokenizer

pretrained_model_id = '/sharestorage/gjz/models/Qwen2.5-7B-Instruct'
quantized_model_id = '/sharestorage/gjz/models/GPTQModel/Qwen2.5-7B-Instruct-Int4'


# os.makedirs(quantized_model_id, exist_ok=True)
def get_wikitext2(tokenizer, nsamples, seqlen):
    traindata = load_dataset("wikitext", "wikitext-2-raw-v1", split="train").filter(
        lambda x: len(x["text"]) >= seqlen)

    return [tokenizer(example["text"]) for example in traindata.select(range(nsamples))]


def main():
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_id, use_fast=True)

    traindataset = get_wikitext2(tokenizer, nsamples=256, seqlen=1024)

    quantize_config = QuantizeConfig(
        bits=4,  # quantize model to 4-bit
        group_size=128,  # it is recommended to set the value to 128
        desc_act=False,
    )

    # load the un-quantized model; it is always force-loaded onto CPU first
    model = GPTQModel.load(pretrained_model_id, quantize_config)

    # quantize the model; calibration_dataset should be a list of dicts whose
    # keys are "input_ids" and "attention_mask", with torch.LongTensor values
    model.quantize(traindataset)

    # save the quantized model (safetensors format)
    model.save(quantized_model_id)


if __name__ == "__main__":
    import logging

    logging.basicConfig(
        format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
        level=logging.INFO,
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    main()
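
One thing I'm not sure about (flagging it as a guess, not a known cause): get_wikitext2 filters rows by character count but never truncates the tokenized samples, so the calibration inputs can end up much longer or shorter than seqlen in tokens. A variant that enforces a token-level bound would look like this (get_wikitext2_tokens is a hypothetical name; it reuses the imports from the script above):

def get_wikitext2_tokens(tokenizer, nsamples, seqlen):
    # Same character-length filter as before, but truncate each sample to
    # `seqlen` tokens so every calibration input has a bounded token length.
    traindata = load_dataset("wikitext", "wikitext-2-raw-v1", split="train").filter(
        lambda x: len(x["text"]) >= seqlen)

    return [
        tokenizer(example["text"], truncation=True, max_length=seqlen)
        for example in traindata.select(range(nsamples))
    ]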

When I use the quantized model, it generates many "!" tokens.
Here is the code:

from transformers import AutoTokenizer
from gptqmodel import GPTQModel

import time
model_path = '/sharestorage/gjz/models/GPTQModel/Qwen2.5-7B-Instruct-Int4'

device = "cuda:0"
model = GPTQModel.load(model_path, device=device)

tokenizer = AutoTokenizer.from_pretrained(model_path)

prompt = "Give me a short introduction to large language model."

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)

Here is the result:

python use_GPTQModel.py 
INFO - You passed a model that is compatible with the Marlin kernel. Use `BACKEND.MARLIN` for optimal inference with batching on Nvidia GPU: `model = GPTQModel.load(..., backend=BACKEND.MARLIN)`.
INFO - Auto pick kernel based on compatibility: <class 'gptqmodel.nn_modules.qlinear.marlin.MarlinQuantLinear'>
INFO - Compatibility: converting `checkpoint_format` from `gptq` to `gptq_v2`.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
A! Large!!! Language! Model! (LL!M! for! short! :)) is!!!!! a!!!!!!! type!! of!!!!!!!!!!! artificial!! intelligence!!!!! system!! that!! is!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
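
By the way, the attention-mask warnings above should go away if the mask is passed explicitly, as the warning text suggests. This only silences the warnings; it does not change the "!" output. The minimal tweak to the generate call:

generated_ids = model.generate(
    model_inputs.input_ids,
    attention_mask=model_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
)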
JeremyGe07 added the bug (Something isn't working) label on Jan 7, 2025
JeremyGe07 changed the title from "[BUG] When quantizing Qwen2.5-7B-Instruct, the loss is very high" to "[BUG] When quantizing Qwen2.5-7B-Instruct, the loss is very high and generate many !" on Jan 7, 2025
JeremyGe07 (Author) commented:

Here is the log from using the same wikitext demo to quantize Qwen1.5-7B-Chat to GPTQ-INT4. The loss starts to grow from the middle layers and gets very high in the final layers (see the loss-summary sketch after the log).

INFO - {'layer': 0, 'module': 'self_attn.k_proj', 'loss': '0.52142', 'damp': '0.01000', 'time': '1.168', 'fwd_time': '1.044'}
INFO - {'layer': 0, 'module': 'self_attn.v_proj', 'loss': '0.09466', 'damp': '0.01000', 'time': '1.015', 'fwd_time': '1.044'}
INFO - {'layer': 0, 'module': 'self_attn.q_proj', 'loss': '0.50696', 'damp': '0.01000', 'time': '1.013', 'fwd_time': '1.044'}
INFO - {'layer': 0, 'module': 'self_attn.o_proj', 'loss': '0.01221', 'damp': '0.01000', 'time': '1.015', 'fwd_time': '0.539'}
INFO - {'layer': 0, 'module': 'mlp.up_proj', 'loss': '0.58659', 'damp': '0.01000', 'time': '1.129', 'fwd_time': '0.733'}
INFO - {'layer': 0, 'module': 'mlp.gate_proj', 'loss': '0.63909', 'damp': '0.01000', 'time': '1.121', 'fwd_time': '0.733'}
INFO - {'layer': 0, 'module': 'mlp.down_proj', 'loss': '0.01344', 'damp': '0.01000', 'time': '2.966', 'fwd_time': '1.718'}
INFO - {'layer': 1, 'module': 'self_attn.k_proj', 'loss': '0.59070', 'damp': '0.01000', 'time': '1.022', 'fwd_time': '0.889'}
INFO - {'layer': 1, 'module': 'self_attn.v_proj', 'loss': '0.08821', 'damp': '0.01000', 'time': '1.020', 'fwd_time': '0.889'}
INFO - {'layer': 1, 'module': 'self_attn.q_proj', 'loss': '0.50913', 'damp': '0.01000', 'time': '1.025', 'fwd_time': '0.889'}
INFO - {'layer': 1, 'module': 'self_attn.o_proj', 'loss': '0.00368', 'damp': '0.01000', 'time': '1.016', 'fwd_time': '0.500'}
INFO - {'layer': 1, 'module': 'mlp.up_proj', 'loss': '1.10237', 'damp': '0.01000', 'time': '1.122', 'fwd_time': '0.694'}
INFO - {'layer': 1, 'module': 'mlp.gate_proj', 'loss': '1.20491', 'damp': '0.01000', 'time': '1.122', 'fwd_time': '0.694'}
INFO - {'layer': 1, 'module': 'mlp.down_proj', 'loss': '0.02448', 'damp': '0.01000', 'time': '2.948', 'fwd_time': '1.692'}
INFO - {'layer': 2, 'module': 'self_attn.k_proj', 'loss': '1.52690', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.891'}
INFO - {'layer': 2, 'module': 'self_attn.v_proj', 'loss': '0.22792', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.891'}
INFO - {'layer': 2, 'module': 'self_attn.q_proj', 'loss': '1.35252', 'damp': '0.01000', 'time': '1.006', 'fwd_time': '0.891'}
INFO - {'layer': 2, 'module': 'self_attn.o_proj', 'loss': '0.00891', 'damp': '0.01000', 'time': '1.007', 'fwd_time': '0.500'}
INFO - {'layer': 2, 'module': 'mlp.up_proj', 'loss': '1.85052', 'damp': '0.01000', 'time': '1.129', 'fwd_time': '0.697'}
INFO - {'layer': 2, 'module': 'mlp.gate_proj', 'loss': '2.07158', 'damp': '0.01000', 'time': '1.112', 'fwd_time': '0.697'}
INFO - {'layer': 2, 'module': 'mlp.down_proj', 'loss': '0.04613', 'damp': '0.01000', 'time': '2.943', 'fwd_time': '1.685'}
INFO - {'layer': 3, 'module': 'self_attn.k_proj', 'loss': '5.86716', 'damp': '0.01000', 'time': '1.014', 'fwd_time': '0.894'}
INFO - {'layer': 3, 'module': 'self_attn.v_proj', 'loss': '1.66758', 'damp': '0.01000', 'time': '1.010', 'fwd_time': '0.894'}
INFO - {'layer': 3, 'module': 'self_attn.q_proj', 'loss': '5.56973', 'damp': '0.01000', 'time': '1.010', 'fwd_time': '0.894'}
INFO - {'layer': 3, 'module': 'self_attn.o_proj', 'loss': '0.01230', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.501'}
INFO - {'layer': 3, 'module': 'mlp.up_proj', 'loss': '2.79134', 'damp': '0.01000', 'time': '1.119', 'fwd_time': '0.697'}
INFO - {'layer': 3, 'module': 'mlp.gate_proj', 'loss': '3.19748', 'damp': '0.01000', 'time': '1.115', 'fwd_time': '0.697'}
INFO - {'layer': 3, 'module': 'mlp.down_proj', 'loss': '0.05241', 'damp': '0.01000', 'time': '2.955', 'fwd_time': '1.686'}
INFO - {'layer': 4, 'module': 'self_attn.k_proj', 'loss': '8.54772', 'damp': '0.01000', 'time': '1.010', 'fwd_time': '0.893'}
INFO - {'layer': 4, 'module': 'self_attn.v_proj', 'loss': '2.73727', 'damp': '0.01000', 'time': '1.009', 'fwd_time': '0.893'}
INFO - {'layer': 4, 'module': 'self_attn.q_proj', 'loss': '8.26476', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.893'}
INFO - {'layer': 4, 'module': 'self_attn.o_proj', 'loss': '0.02275', 'damp': '0.01000', 'time': '1.006', 'fwd_time': '0.500'}
INFO - {'layer': 4, 'module': 'mlp.up_proj', 'loss': '4.02876', 'damp': '0.01000', 'time': '1.119', 'fwd_time': '0.698'}
INFO - {'layer': 4, 'module': 'mlp.gate_proj', 'loss': '4.69588', 'damp': '0.01000', 'time': '1.112', 'fwd_time': '0.698'}
INFO - {'layer': 4, 'module': 'mlp.down_proj', 'loss': '0.08908', 'damp': '0.01000', 'time': '2.936', 'fwd_time': '1.685'}
INFO - {'layer': 5, 'module': 'self_attn.k_proj', 'loss': '9.99406', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.895'}
INFO - {'layer': 5, 'module': 'self_attn.v_proj', 'loss': '3.60776', 'damp': '0.01000', 'time': '1.005', 'fwd_time': '0.895'}
INFO - {'layer': 5, 'module': 'self_attn.q_proj', 'loss': '9.71521', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.895'}
INFO - {'layer': 5, 'module': 'self_attn.o_proj', 'loss': '0.03304', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.501'}
INFO - {'layer': 5, 'module': 'mlp.up_proj', 'loss': '5.22676', 'damp': '0.01000', 'time': '1.121', 'fwd_time': '0.698'}
INFO - {'layer': 5, 'module': 'mlp.gate_proj', 'loss': '6.15891', 'damp': '0.01000', 'time': '1.113', 'fwd_time': '0.698'}
INFO - {'layer': 5, 'module': 'mlp.down_proj', 'loss': '0.13915', 'damp': '0.01000', 'time': '2.953', 'fwd_time': '1.688'}
INFO - {'layer': 6, 'module': 'self_attn.k_proj', 'loss': '10.16533', 'damp': '0.01000', 'time': '1.010', 'fwd_time': '0.892'}
INFO - {'layer': 6, 'module': 'self_attn.v_proj', 'loss': '3.67891', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.892'}
INFO - {'layer': 6, 'module': 'self_attn.q_proj', 'loss': '9.96202', 'damp': '0.01000', 'time': '1.006', 'fwd_time': '0.892'}
INFO - {'layer': 6, 'module': 'self_attn.o_proj', 'loss': '0.05319', 'damp': '0.01000', 'time': '1.009', 'fwd_time': '0.501'}
INFO - {'layer': 6, 'module': 'mlp.up_proj', 'loss': '6.59286', 'damp': '0.01000', 'time': '1.119', 'fwd_time': '0.696'}
INFO - {'layer': 6, 'module': 'mlp.gate_proj', 'loss': '7.76001', 'damp': '0.01000', 'time': '1.116', 'fwd_time': '0.696'}
INFO - {'layer': 6, 'module': 'mlp.down_proj', 'loss': '0.20929', 'damp': '0.01000', 'time': '2.957', 'fwd_time': '1.689'}
INFO - {'layer': 7, 'module': 'self_attn.k_proj', 'loss': '11.03183', 'damp': '0.01000', 'time': '1.017', 'fwd_time': '0.892'}
INFO - {'layer': 7, 'module': 'self_attn.v_proj', 'loss': '4.21034', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.892'}
INFO - {'layer': 7, 'module': 'self_attn.q_proj', 'loss': '10.97921', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.892'}
INFO - {'layer': 7, 'module': 'self_attn.o_proj', 'loss': '0.08762', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.502'}
INFO - {'layer': 7, 'module': 'mlp.up_proj', 'loss': '8.12533', 'damp': '0.01000', 'time': '1.122', 'fwd_time': '0.698'}
INFO - {'layer': 7, 'module': 'mlp.gate_proj', 'loss': '9.82936', 'damp': '0.01000', 'time': '1.118', 'fwd_time': '0.698'}
INFO - {'layer': 7, 'module': 'mlp.down_proj', 'loss': '0.33249', 'damp': '0.01000', 'time': '2.942', 'fwd_time': '1.692'}
INFO - {'layer': 8, 'module': 'self_attn.k_proj', 'loss': '11.32015', 'damp': '0.01000', 'time': '1.013', 'fwd_time': '0.895'}
INFO - {'layer': 8, 'module': 'self_attn.v_proj', 'loss': '4.50380', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.895'}
INFO - {'layer': 8, 'module': 'self_attn.q_proj', 'loss': '11.38033', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.895'}
INFO - {'layer': 8, 'module': 'self_attn.o_proj', 'loss': '0.12802', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.510'}
INFO - {'layer': 8, 'module': 'mlp.up_proj', 'loss': '9.87536', 'damp': '0.01000', 'time': '1.121', 'fwd_time': '0.699'}
INFO - {'layer': 8, 'module': 'mlp.gate_proj', 'loss': '11.56413', 'damp': '0.01000', 'time': '1.114', 'fwd_time': '0.699'}
INFO - {'layer': 8, 'module': 'mlp.down_proj', 'loss': '0.93453', 'damp': '0.01000', 'time': '2.948', 'fwd_time': '1.692'}
INFO - {'layer': 9, 'module': 'self_attn.k_proj', 'loss': '12.94000', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.895'}
INFO - {'layer': 9, 'module': 'self_attn.v_proj', 'loss': '5.12518', 'damp': '0.01000', 'time': '1.009', 'fwd_time': '0.895'}
INFO - {'layer': 9, 'module': 'self_attn.q_proj', 'loss': '13.91830', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.895'}
INFO - {'layer': 9, 'module': 'self_attn.o_proj', 'loss': '0.29626', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.504'}
INFO - {'layer': 9, 'module': 'mlp.up_proj', 'loss': '11.22705', 'damp': '0.01000', 'time': '1.120', 'fwd_time': '0.705'}
INFO - {'layer': 9, 'module': 'mlp.gate_proj', 'loss': '12.80832', 'damp': '0.01000', 'time': '1.119', 'fwd_time': '0.705'}
INFO - {'layer': 9, 'module': 'mlp.down_proj', 'loss': '0.59443', 'damp': '0.01000', 'time': '2.945', 'fwd_time': '1.692'}
INFO - {'layer': 10, 'module': 'self_attn.k_proj', 'loss': '13.29867', 'damp': '0.01000', 'time': '1.017', 'fwd_time': '0.894'}
INFO - {'layer': 10, 'module': 'self_attn.v_proj', 'loss': '5.49040', 'damp': '0.01000', 'time': '1.020', 'fwd_time': '0.894'}
INFO - {'layer': 10, 'module': 'self_attn.q_proj', 'loss': '13.87386', 'damp': '0.01000', 'time': '1.004', 'fwd_time': '0.894'}
INFO - {'layer': 10, 'module': 'self_attn.o_proj', 'loss': '0.26038', 'damp': '0.01000', 'time': '1.018', 'fwd_time': '0.505'}
INFO - {'layer': 10, 'module': 'mlp.up_proj', 'loss': '12.24315', 'damp': '0.01000', 'time': '1.122', 'fwd_time': '0.699'}
INFO - {'layer': 10, 'module': 'mlp.gate_proj', 'loss': '13.16537', 'damp': '0.01000', 'time': '1.119', 'fwd_time': '0.699'}
INFO - {'layer': 10, 'module': 'mlp.down_proj', 'loss': '0.64925', 'damp': '0.01000', 'time': '2.940', 'fwd_time': '1.705'}
INFO - {'layer': 11, 'module': 'self_attn.k_proj', 'loss': '13.85116', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.896'}
INFO - {'layer': 11, 'module': 'self_attn.v_proj', 'loss': '6.67272', 'damp': '0.01000', 'time': '1.009', 'fwd_time': '0.896'}
INFO - {'layer': 11, 'module': 'self_attn.q_proj', 'loss': '14.83078', 'damp': '0.01000', 'time': '1.007', 'fwd_time': '0.896'}
INFO - {'layer': 11, 'module': 'self_attn.o_proj', 'loss': '0.29521', 'damp': '0.01000', 'time': '1.013', 'fwd_time': '0.503'}
INFO - {'layer': 11, 'module': 'mlp.up_proj', 'loss': '12.50910', 'damp': '0.01000', 'time': '1.116', 'fwd_time': '0.703'}
INFO - {'layer': 11, 'module': 'mlp.gate_proj', 'loss': '12.51478', 'damp': '0.01000', 'time': '1.109', 'fwd_time': '0.703'}
INFO - {'layer': 11, 'module': 'mlp.down_proj', 'loss': '0.74081', 'damp': '0.01000', 'time': '2.938', 'fwd_time': '1.702'}
INFO - {'layer': 12, 'module': 'self_attn.k_proj', 'loss': '15.52137', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.896'}
INFO - {'layer': 12, 'module': 'self_attn.v_proj', 'loss': '7.78562', 'damp': '0.01000', 'time': '1.007', 'fwd_time': '0.896'}
INFO - {'layer': 12, 'module': 'self_attn.q_proj', 'loss': '16.84501', 'damp': '0.01000', 'time': '1.004', 'fwd_time': '0.896'}
INFO - {'layer': 12, 'module': 'self_attn.o_proj', 'loss': '0.32880', 'damp': '0.01000', 'time': '1.019', 'fwd_time': '0.505'}
INFO - {'layer': 12, 'module': 'mlp.up_proj', 'loss': '13.80448', 'damp': '0.01000', 'time': '1.119', 'fwd_time': '0.700'}
INFO - {'layer': 12, 'module': 'mlp.gate_proj', 'loss': '13.79405', 'damp': '0.01000', 'time': '1.107', 'fwd_time': '0.700'}
INFO - {'layer': 12, 'module': 'mlp.down_proj', 'loss': '0.85139', 'damp': '0.01000', 'time': '2.930', 'fwd_time': '1.694'}
INFO - {'layer': 13, 'module': 'self_attn.k_proj', 'loss': '15.87298', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.896'}
INFO - {'layer': 13, 'module': 'self_attn.v_proj', 'loss': '8.25939', 'damp': '0.01000', 'time': '1.013', 'fwd_time': '0.896'}
INFO - {'layer': 13, 'module': 'self_attn.q_proj', 'loss': '16.58621', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.896'}
INFO - {'layer': 13, 'module': 'self_attn.o_proj', 'loss': '0.41532', 'damp': '0.01000', 'time': '1.014', 'fwd_time': '0.505'}
INFO - {'layer': 13, 'module': 'mlp.up_proj', 'loss': '14.42260', 'damp': '0.01000', 'time': '1.121', 'fwd_time': '0.712'}
INFO - {'layer': 13, 'module': 'mlp.gate_proj', 'loss': '14.25448', 'damp': '0.01000', 'time': '1.126', 'fwd_time': '0.712'}
INFO - {'layer': 13, 'module': 'mlp.down_proj', 'loss': '0.97871', 'damp': '0.01000', 'time': '2.945', 'fwd_time': '1.704'}
INFO - {'layer': 14, 'module': 'self_attn.k_proj', 'loss': '17.97318', 'damp': '0.01000', 'time': '1.014', 'fwd_time': '0.897'}
INFO - {'layer': 14, 'module': 'self_attn.v_proj', 'loss': '9.42839', 'damp': '0.01000', 'time': '1.006', 'fwd_time': '0.897'}
INFO - {'layer': 14, 'module': 'self_attn.q_proj', 'loss': '18.57025', 'damp': '0.01000', 'time': '1.009', 'fwd_time': '0.897'}
INFO - {'layer': 14, 'module': 'self_attn.o_proj', 'loss': '0.36061', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.508'}
INFO - {'layer': 14, 'module': 'mlp.up_proj', 'loss': '15.38669', 'damp': '0.01000', 'time': '1.117', 'fwd_time': '0.706'}
INFO - {'layer': 14, 'module': 'mlp.gate_proj', 'loss': '14.62755', 'damp': '0.01000', 'time': '1.112', 'fwd_time': '0.706'}
INFO - {'layer': 14, 'module': 'mlp.down_proj', 'loss': '1.11825', 'damp': '0.01000', 'time': '2.945', 'fwd_time': '1.700'}
INFO - {'layer': 15, 'module': 'self_attn.k_proj', 'loss': '17.26902', 'damp': '0.01000', 'time': '1.023', 'fwd_time': '0.897'}
INFO - {'layer': 15, 'module': 'self_attn.v_proj', 'loss': '9.85618', 'damp': '0.01000', 'time': '1.017', 'fwd_time': '0.897'}
INFO - {'layer': 15, 'module': 'self_attn.q_proj', 'loss': '17.83921', 'damp': '0.01000', 'time': '1.004', 'fwd_time': '0.897'}
INFO - {'layer': 15, 'module': 'self_attn.o_proj', 'loss': '0.50244', 'damp': '0.01000', 'time': '1.020', 'fwd_time': '0.503'}
INFO - {'layer': 15, 'module': 'mlp.up_proj', 'loss': '16.93336', 'damp': '0.01000', 'time': '1.132', 'fwd_time': '0.703'}
INFO - {'layer': 15, 'module': 'mlp.gate_proj', 'loss': '16.25486', 'damp': '0.01000', 'time': '1.115', 'fwd_time': '0.703'}
INFO - {'layer': 15, 'module': 'mlp.down_proj', 'loss': '1.45094', 'damp': '0.01000', 'time': '2.968', 'fwd_time': '1.696'}
INFO - {'layer': 16, 'module': 'self_attn.k_proj', 'loss': '19.07309', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.897'}
INFO - {'layer': 16, 'module': 'self_attn.v_proj', 'loss': '11.73635', 'damp': '0.01000', 'time': '1.007', 'fwd_time': '0.897'}
INFO - {'layer': 16, 'module': 'self_attn.q_proj', 'loss': '19.46841', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.897'}
INFO - {'layer': 16, 'module': 'self_attn.o_proj', 'loss': '0.51474', 'damp': '0.01000', 'time': '1.010', 'fwd_time': '0.505'}
INFO - {'layer': 16, 'module': 'mlp.up_proj', 'loss': '19.06713', 'damp': '0.01000', 'time': '1.117', 'fwd_time': '0.705'}
INFO - {'layer': 16, 'module': 'mlp.gate_proj', 'loss': '18.08609', 'damp': '0.01000', 'time': '1.109', 'fwd_time': '0.705'}
INFO - {'layer': 16, 'module': 'mlp.down_proj', 'loss': '1.84688', 'damp': '0.01000', 'time': '2.940', 'fwd_time': '1.700'}
INFO - {'layer': 17, 'module': 'self_attn.k_proj', 'loss': '19.06202', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.900'}
INFO - {'layer': 17, 'module': 'self_attn.v_proj', 'loss': '12.06197', 'damp': '0.01000', 'time': '1.009', 'fwd_time': '0.900'}
INFO - {'layer': 17, 'module': 'self_attn.q_proj', 'loss': '19.46583', 'damp': '0.01000', 'time': '1.017', 'fwd_time': '0.900'}
INFO - {'layer': 17, 'module': 'self_attn.o_proj', 'loss': '0.65390', 'damp': '0.01000', 'time': '1.022', 'fwd_time': '0.505'}
INFO - {'layer': 17, 'module': 'mlp.up_proj', 'loss': '21.08129', 'damp': '0.01000', 'time': '1.124', 'fwd_time': '0.702'}
INFO - {'layer': 17, 'module': 'mlp.gate_proj', 'loss': '20.09839', 'damp': '0.01000', 'time': '1.114', 'fwd_time': '0.702'}
INFO - {'layer': 17, 'module': 'mlp.down_proj', 'loss': '2.40124', 'damp': '0.01000', 'time': '2.938', 'fwd_time': '1.696'}
INFO - {'layer': 18, 'module': 'self_attn.k_proj', 'loss': '20.13816', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.909'}
INFO - {'layer': 18, 'module': 'self_attn.v_proj', 'loss': '15.00561', 'damp': '0.01000', 'time': '1.025', 'fwd_time': '0.909'}
INFO - {'layer': 18, 'module': 'self_attn.q_proj', 'loss': '20.52071', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.909'}
INFO - {'layer': 18, 'module': 'self_attn.o_proj', 'loss': '0.69705', 'damp': '0.01000', 'time': '1.010', 'fwd_time': '0.504'}
INFO - {'layer': 18, 'module': 'mlp.up_proj', 'loss': '24.18870', 'damp': '0.01000', 'time': '1.114', 'fwd_time': '0.709'}
INFO - {'layer': 18, 'module': 'mlp.gate_proj', 'loss': '23.03753', 'damp': '0.01000', 'time': '1.109', 'fwd_time': '0.709'}
INFO - {'layer': 18, 'module': 'mlp.down_proj', 'loss': '3.06773', 'damp': '0.01000', 'time': '2.934', 'fwd_time': '1.705'}
INFO - {'layer': 19, 'module': 'self_attn.k_proj', 'loss': '21.15881', 'damp': '0.01000', 'time': '1.027', 'fwd_time': '0.907'}
INFO - {'layer': 19, 'module': 'self_attn.v_proj', 'loss': '16.22458', 'damp': '0.01000', 'time': '1.023', 'fwd_time': '0.907'}
INFO - {'layer': 19, 'module': 'self_attn.q_proj', 'loss': '21.75827', 'damp': '0.01000', 'time': '1.033', 'fwd_time': '0.907'}
INFO - {'layer': 19, 'module': 'self_attn.o_proj', 'loss': '0.74257', 'damp': '0.01000', 'time': '1.025', 'fwd_time': '0.508'}
INFO - {'layer': 19, 'module': 'mlp.up_proj', 'loss': '28.13776', 'damp': '0.01000', 'time': '1.123', 'fwd_time': '0.709'}
INFO - {'layer': 19, 'module': 'mlp.gate_proj', 'loss': '26.70659', 'damp': '0.01000', 'time': '1.132', 'fwd_time': '0.709'}
INFO - {'layer': 19, 'module': 'mlp.down_proj', 'loss': '3.83124', 'damp': '0.01000', 'time': '2.973', 'fwd_time': '1.701'}
INFO - {'layer': 20, 'module': 'self_attn.k_proj', 'loss': '22.89868', 'damp': '0.01000', 'time': '1.024', 'fwd_time': '0.899'}
INFO - {'layer': 20, 'module': 'self_attn.v_proj', 'loss': '18.50813', 'damp': '0.01000', 'time': '1.026', 'fwd_time': '0.899'}
INFO - {'layer': 20, 'module': 'self_attn.q_proj', 'loss': '23.87872', 'damp': '0.01000', 'time': '1.022', 'fwd_time': '0.899'}
INFO - {'layer': 20, 'module': 'self_attn.o_proj', 'loss': '0.83256', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.504'}
INFO - {'layer': 20, 'module': 'mlp.up_proj', 'loss': '32.66272', 'damp': '0.01000', 'time': '1.122', 'fwd_time': '0.702'}
INFO - {'layer': 20, 'module': 'mlp.gate_proj', 'loss': '31.56024', 'damp': '0.01000', 'time': '1.110', 'fwd_time': '0.702'}
INFO - {'layer': 20, 'module': 'mlp.down_proj', 'loss': '4.71069', 'damp': '0.01000', 'time': '2.939', 'fwd_time': '1.698'}
INFO - {'layer': 21, 'module': 'self_attn.k_proj', 'loss': '24.55713', 'damp': '0.01000', 'time': '1.013', 'fwd_time': '0.900'}
INFO - {'layer': 21, 'module': 'self_attn.v_proj', 'loss': '22.65279', 'damp': '0.01000', 'time': '1.010', 'fwd_time': '0.900'}
INFO - {'layer': 21, 'module': 'self_attn.q_proj', 'loss': '25.48935', 'damp': '0.01000', 'time': '1.018', 'fwd_time': '0.900'}
INFO - {'layer': 21, 'module': 'self_attn.o_proj', 'loss': '1.15544', 'damp': '0.01000', 'time': '1.021', 'fwd_time': '0.505'}
INFO - {'layer': 21, 'module': 'mlp.up_proj', 'loss': '38.30571', 'damp': '0.01000', 'time': '1.120', 'fwd_time': '0.712'}
INFO - {'layer': 21, 'module': 'mlp.gate_proj', 'loss': '37.08963', 'damp': '0.01000', 'time': '1.122', 'fwd_time': '0.712'}
INFO - {'layer': 21, 'module': 'mlp.down_proj', 'loss': '6.02415', 'damp': '0.01000', 'time': '2.941', 'fwd_time': '1.699'}
INFO - {'layer': 22, 'module': 'self_attn.k_proj', 'loss': '25.46214', 'damp': '0.01000', 'time': '1.027', 'fwd_time': '0.903'}
INFO - {'layer': 22, 'module': 'self_attn.v_proj', 'loss': '25.23735', 'damp': '0.01000', 'time': '1.031', 'fwd_time': '0.903'}
INFO - {'layer': 22, 'module': 'self_attn.q_proj', 'loss': '27.06219', 'damp': '0.01000', 'time': '1.024', 'fwd_time': '0.903'}
INFO - {'layer': 22, 'module': 'self_attn.o_proj', 'loss': '0.90869', 'damp': '0.01000', 'time': '1.027', 'fwd_time': '0.522'}
INFO - {'layer': 22, 'module': 'mlp.up_proj', 'loss': '43.44689', 'damp': '0.01000', 'time': '1.133', 'fwd_time': '0.703'}
INFO - {'layer': 22, 'module': 'mlp.gate_proj', 'loss': '43.00708', 'damp': '0.01000', 'time': '1.112', 'fwd_time': '0.703'}
INFO - {'layer': 22, 'module': 'mlp.down_proj', 'loss': '7.38065', 'damp': '0.01000', 'time': '2.939', 'fwd_time': '1.700'}
INFO - {'layer': 23, 'module': 'self_attn.k_proj', 'loss': '28.20182', 'damp': '0.01000', 'time': '1.022', 'fwd_time': '0.900'}
INFO - {'layer': 23, 'module': 'self_attn.v_proj', 'loss': '28.54940', 'damp': '0.01000', 'time': '1.021', 'fwd_time': '0.900'}
INFO - {'layer': 23, 'module': 'self_attn.q_proj', 'loss': '29.75095', 'damp': '0.01000', 'time': '1.015', 'fwd_time': '0.900'}
INFO - {'layer': 23, 'module': 'self_attn.o_proj', 'loss': '1.53267', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.508'}
INFO - {'layer': 23, 'module': 'mlp.up_proj', 'loss': '48.24734', 'damp': '0.01000', 'time': '1.128', 'fwd_time': '0.712'}
INFO - {'layer': 23, 'module': 'mlp.gate_proj', 'loss': '47.57780', 'damp': '0.01000', 'time': '1.119', 'fwd_time': '0.712'}
INFO - {'layer': 23, 'module': 'mlp.down_proj', 'loss': '9.48507', 'damp': '0.01000', 'time': '2.949', 'fwd_time': '1.700'}
INFO - {'layer': 24, 'module': 'self_attn.k_proj', 'loss': '29.54028', 'damp': '0.01000', 'time': '1.013', 'fwd_time': '0.911'}
INFO - {'layer': 24, 'module': 'self_attn.v_proj', 'loss': '32.33040', 'damp': '0.01000', 'time': '1.033', 'fwd_time': '0.911'}
INFO - {'layer': 24, 'module': 'self_attn.q_proj', 'loss': '31.04103', 'damp': '0.01000', 'time': '1.008', 'fwd_time': '0.911'}
INFO - {'layer': 24, 'module': 'self_attn.o_proj', 'loss': '1.65086', 'damp': '0.01000', 'time': '1.017', 'fwd_time': '0.506'}
INFO - {'layer': 24, 'module': 'mlp.up_proj', 'loss': '54.99402', 'damp': '0.01000', 'time': '1.123', 'fwd_time': '0.705'}
INFO - {'layer': 24, 'module': 'mlp.gate_proj', 'loss': '54.45904', 'damp': '0.01000', 'time': '1.117', 'fwd_time': '0.705'}
INFO - {'layer': 24, 'module': 'mlp.down_proj', 'loss': '11.70113', 'damp': '0.01000', 'time': '2.973', 'fwd_time': '1.708'}
INFO - {'layer': 25, 'module': 'self_attn.k_proj', 'loss': '33.14951', 'damp': '0.01000', 'time': '1.013', 'fwd_time': '0.907'}
INFO - {'layer': 25, 'module': 'self_attn.v_proj', 'loss': '39.42808', 'damp': '0.01000', 'time': '1.044', 'fwd_time': '0.907'}
INFO - {'layer': 25, 'module': 'self_attn.q_proj', 'loss': '34.91953', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.907'}
INFO - {'layer': 25, 'module': 'self_attn.o_proj', 'loss': '2.63641', 'damp': '0.01000', 'time': '1.020', 'fwd_time': '0.506'}
INFO - {'layer': 25, 'module': 'mlp.up_proj', 'loss': '62.43451', 'damp': '0.01000', 'time': '1.130', 'fwd_time': '0.705'}
INFO - {'layer': 25, 'module': 'mlp.gate_proj', 'loss': '61.15372', 'damp': '0.01000', 'time': '1.118', 'fwd_time': '0.705'}
INFO - {'layer': 25, 'module': 'mlp.down_proj', 'loss': '14.79325', 'damp': '0.01000', 'time': '2.991', 'fwd_time': '1.714'}
INFO - {'layer': 26, 'module': 'self_attn.k_proj', 'loss': '33.59509', 'damp': '0.01000', 'time': '1.024', 'fwd_time': '0.900'}
INFO - {'layer': 26, 'module': 'self_attn.v_proj', 'loss': '40.18967', 'damp': '0.01000', 'time': '1.023', 'fwd_time': '0.900'}
INFO - {'layer': 26, 'module': 'self_attn.q_proj', 'loss': '35.37654', 'damp': '0.01000', 'time': '1.016', 'fwd_time': '0.900'}
INFO - {'layer': 26, 'module': 'self_attn.o_proj', 'loss': '1.81306', 'damp': '0.01000', 'time': '1.017', 'fwd_time': '0.516'}
INFO - {'layer': 26, 'module': 'mlp.up_proj', 'loss': '71.54100', 'damp': '0.01000', 'time': '1.126', 'fwd_time': '0.708'}
INFO - {'layer': 26, 'module': 'mlp.gate_proj', 'loss': '69.37092', 'damp': '0.01000', 'time': '1.127', 'fwd_time': '0.708'}
INFO - {'layer': 26, 'module': 'mlp.down_proj', 'loss': '17.38165', 'damp': '0.01000', 'time': '2.952', 'fwd_time': '1.725'}
INFO - {'layer': 27, 'module': 'self_attn.k_proj', 'loss': '38.19382', 'damp': '0.01000', 'time': '1.016', 'fwd_time': '0.900'}
INFO - {'layer': 27, 'module': 'self_attn.v_proj', 'loss': '46.24634', 'damp': '0.01000', 'time': '1.014', 'fwd_time': '0.900'}
INFO - {'layer': 27, 'module': 'self_attn.q_proj', 'loss': '40.07170', 'damp': '0.01000', 'time': '1.019', 'fwd_time': '0.900'}
INFO - {'layer': 27, 'module': 'self_attn.o_proj', 'loss': '3.17724', 'damp': '0.01000', 'time': '1.019', 'fwd_time': '0.506'}
INFO - {'layer': 27, 'module': 'mlp.up_proj', 'loss': '80.58144', 'damp': '0.01000', 'time': '1.141', 'fwd_time': '0.704'}
INFO - {'layer': 27, 'module': 'mlp.gate_proj', 'loss': '76.71621', 'damp': '0.01000', 'time': '1.128', 'fwd_time': '0.704'}
INFO - {'layer': 27, 'module': 'mlp.down_proj', 'loss': '21.11519', 'damp': '0.01000', 'time': '2.959', 'fwd_time': '1.710'}
INFO - {'layer': 28, 'module': 'self_attn.k_proj', 'loss': '38.74039', 'damp': '0.01000', 'time': '1.020', 'fwd_time': '0.902'}
INFO - {'layer': 28, 'module': 'self_attn.v_proj', 'loss': '53.06177', 'damp': '0.01000', 'time': '1.016', 'fwd_time': '0.902'}
INFO - {'layer': 28, 'module': 'self_attn.q_proj', 'loss': '40.71293', 'damp': '0.01000', 'time': '1.014', 'fwd_time': '0.902'}
INFO - {'layer': 28, 'module': 'self_attn.o_proj', 'loss': '4.16609', 'damp': '0.01000', 'time': '1.016', 'fwd_time': '0.512'}
INFO - {'layer': 28, 'module': 'mlp.up_proj', 'loss': '89.11636', 'damp': '0.01000', 'time': '1.148', 'fwd_time': '0.718'}
INFO - {'layer': 28, 'module': 'mlp.gate_proj', 'loss': '83.74364', 'damp': '0.01000', 'time': '1.143', 'fwd_time': '0.718'}
INFO - {'layer': 28, 'module': 'mlp.down_proj', 'loss': '25.53521', 'damp': '0.01000', 'time': '3.013', 'fwd_time': '1.711'}
INFO - {'layer': 29, 'module': 'self_attn.k_proj', 'loss': '37.80397', 'damp': '0.01000', 'time': '1.041', 'fwd_time': '0.901'}
INFO - {'layer': 29, 'module': 'self_attn.v_proj', 'loss': '55.42223', 'damp': '0.01000', 'time': '1.038', 'fwd_time': '0.901'}
INFO - {'layer': 29, 'module': 'self_attn.q_proj', 'loss': '39.53946', 'damp': '0.01000', 'time': '1.025', 'fwd_time': '0.901'}
INFO - {'layer': 29, 'module': 'self_attn.o_proj', 'loss': '6.58904', 'damp': '0.01000', 'time': '1.021', 'fwd_time': '0.507'}
INFO - {'layer': 29, 'module': 'mlp.up_proj', 'loss': '97.27983', 'damp': '0.01000', 'time': '1.128', 'fwd_time': '0.705'}
INFO - {'layer': 29, 'module': 'mlp.gate_proj', 'loss': '90.20169', 'damp': '0.01000', 'time': '1.127', 'fwd_time': '0.705'}
INFO - {'layer': 29, 'module': 'mlp.down_proj', 'loss': '32.00798', 'damp': '0.01000', 'time': '2.976', 'fwd_time': '1.709'}
INFO - {'layer': 30, 'module': 'self_attn.k_proj', 'loss': '37.61945', 'damp': '0.01000', 'time': '1.027', 'fwd_time': '0.912'}
INFO - {'layer': 30, 'module': 'self_attn.v_proj', 'loss': '51.77380', 'damp': '0.01000', 'time': '1.022', 'fwd_time': '0.912'}
INFO - {'layer': 30, 'module': 'self_attn.q_proj', 'loss': '39.38412', 'damp': '0.01000', 'time': '1.018', 'fwd_time': '0.912'}
INFO - {'layer': 30, 'module': 'self_attn.o_proj', 'loss': '7.93464', 'damp': '0.01000', 'time': '1.019', 'fwd_time': '0.509'}
INFO - {'layer': 30, 'module': 'mlp.up_proj', 'loss': '104.18385', 'damp': '0.01000', 'time': '1.131', 'fwd_time': '0.705'}
INFO - {'layer': 30, 'module': 'mlp.gate_proj', 'loss': '97.08953', 'damp': '0.01000', 'time': '1.119', 'fwd_time': '0.705'}
INFO - {'layer': 30, 'module': 'mlp.down_proj', 'loss': '50.70626', 'damp': '0.01000', 'time': '2.965', 'fwd_time': '1.712'}
INFO - {'layer': 31, 'module': 'self_attn.k_proj', 'loss': '33.36644', 'damp': '0.01000', 'time': '1.009', 'fwd_time': '0.902'}
INFO - {'layer': 31, 'module': 'self_attn.v_proj', 'loss': '36.59711', 'damp': '0.01000', 'time': '1.006', 'fwd_time': '0.902'}
INFO - {'layer': 31, 'module': 'self_attn.q_proj', 'loss': '32.75974', 'damp': '0.01000', 'time': '1.011', 'fwd_time': '0.902'}
INFO - {'layer': 31, 'module': 'self_attn.o_proj', 'loss': '7.34139', 'damp': '0.01000', 'time': '1.012', 'fwd_time': '0.509'}
INFO - {'layer': 31, 'module': 'mlp.up_proj', 'loss': '109.49529', 'damp': '0.01000', 'time': '1.118', 'fwd_time': '0.708'}
INFO - {'layer': 31, 'module': 'mlp.gate_proj', 'loss': '102.16747', 'damp': '0.01000', 'time': '1.121', 'fwd_time': '0.708'}
INFO - {'layer': 31, 'module': 'mlp.down_proj', 'loss': '138.47464', 'damp': '0.01000', 'time': '2.971', 'fwd_time': '1.712'}
INFO - Packing model...
 Packing model.layers.31.mlp.down_proj |----------------------------------------| 100.0%
INFO - Model packed.
INFO - Pre-Quantized model size: 14727.30MB, 14.38GB
INFO - Quantized model size: 5588.34MB, 5.46GB
INFO - Size difference: 9138.96MB, 8.92GB - 62.05%

The quantized model generates kinda-okay responses, but it keeps generating \n at the end of its answers, like this:

A large language model, also known as a neural network-based language generator or artificial intelligence (AI) system, is a complex computational system designed to process and generate human-like language at scale. These models are trained on massive amounts of text data, often from the internet, books, or other sources, to learn the patterns, structures, and relationships in language.

The primary objective of large language models is to understand and generate text in various forms, such as text completions, responses to prompts, or even entire articles. They have become prominent in recent years due to their ability to handle tasks like language translation, text summarization, chatbots, and creative writing. These models are typically deep neural networks, like transformers, which have shown exceptional performance in capturing the context and context-free language understanding.

The size of these models refers to the number of parameters they contain, which determines their capacity to capture intricate language nuances and generate more coherent and diverse output. Some of the most well-known large language models include GPT (by OpenAI) and BERT (by Google), which have set new benchmarks in language understanding and generated a lot of buzz in the field.

[... dozens more blank lines of \n output ...]
@Qubitium
Collaborator

Qubitium commented Jan 7, 2025

@JeremyGe07

  • error_loss increasing as layers increase is normal behavior. The errors slowly propagate the deeper you go.
  • Try using c4; wikitext is not great for calibration. Also increase your seqlen to 2048 (see the sketch after this list).
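
A minimal sketch of that setup, calibrating on an English c4 shard. The API names (`GPTQModel.load`, `QuantizeConfig`, `quantize`, `save`) are assumptions based on current GPTQModel usage, and where seqlen is controlled may differ in your installed version, so treat this as an illustration rather than canonical usage:

```python
# Minimal sketch: 4-bit GPTQ quantization of Qwen2.5-7B-Instruct with c4
# calibration data. API names are assumptions; verify against your version.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "Qwen/Qwen2.5-7B-Instruct"
quant_path = "Qwen2.5-7B-Instruct-gptq-int4"  # hypothetical output directory

# One English c4 shard; keep ~1024 longer documents so calibration sequences
# can actually reach the suggested seqlen of 2048 tokens.
calibration_dataset = (
    load_dataset(
        "allenai/c4",
        data_files="en/c4-train.00001-of-01024.json.gz",
        split="train",
    )
    .filter(lambda x: len(x["text"]) >= 2048)  # characters, a rough proxy
    .select(range(1024))["text"]
)

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=2)
model.save(quant_path)
```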

@Qubitium Qubitium changed the title [BUG] When quantizing Qwen2.5-7B-Instruct, the loss is very high and generate many ! [Usage] When quantizing Qwen2.5-7B-Instruct, the loss is very high and generate many ! Jan 7, 2025
@Qubitium Qubitium removed the bug Something isn't working label Jan 7, 2025
@JeremyGe07
Author

JeremyGe07 commented Jan 8, 2025

@Qubitium
Thanks for your reply. I just finished some other work and am starting the experiment now. About choosing calibration data, I have a few questions; please have a look. Really appreciate it!

  1. I have checked the Chinese text from c4, but the quality looks low: the context is somewhat incoherent (the text looks like it was scraped from websites, and the ordering of the original text is probably random), and there are some strange characters like �. I'm curious whether this calibration data will give a good result. Correct me if I'm wrong, but I will use it for a test quantization later anyway.

  2. I'm quantizing the model to evaluate on a dataset named CMMLU. It is a Chinese dataset, like below:

0,世界市场最终完备是在,19世纪末到20世纪初,18世纪末到19世纪初,17世纪末到18世纪初,20世纪末到21世纪初,A
1,“伦理”区别于其他社会关系的特殊性在于,它包含了人自身各种潜质潜能的全面发展,它包含了好坏善恶的价值标准,它包含了天、人、物的和谐相处,它包含了人与人、人与世界关系的事实,B
2,下列不属于董事的伦理责任的是,独立性与监督性融合,底线思维、客观决策,勤勉监督董事及管理层的工作和行为,专业性与判断性兼具,C
3,下列选项中,制定企业经营决策的重要依据是,会计责任,会计信息,会计工作,会计教育,B
4,跨国公司以东道国的市场需求为基点,进行一系列的产品推广、价格制定、渠道扩、广告促销等经营活动,这一策略属于,营销本土化,人员本土化,利益本土化,采购本土化,A
...

Since I will eventually test the quantized model on this dataset, and just want a higher eval score, would it be better to use this CMMLU dataset to create the calibration data, or to use a more general dataset (like the previously mentioned c4)?

Secondly, if I use the CMMLU dataset: since it is made up of multiple-choice questions, each question, and especially its options, is quite short. If I simply combine all the text into one string to create the calibration data (using a join, roughly as in the sketch below), would it result in poor quantization performance for the model?
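
For concreteness, this is roughly the kind of joining I mean (a sketch only; `cmmlu.csv` and the column order are my guesses from the rows pasted above):

```python
# Sketch: turn short CMMLU multiple-choice rows into longer calibration chunks.
# Assumes a CSV shaped like the sample above: id, question, A, B, C, D, answer.
import csv

texts = []
with open("cmmlu.csv", newline="", encoding="utf-8") as f:
    for _id, question, a, b, c, d, answer in csv.reader(f):
        texts.append(f"{question} A. {a} B. {b} C. {c} D. {d} Answer: {answer}")

# Join everything, then slice into fixed-size character chunks so each
# calibration sample is long instead of a few dozen characters.
joined = "\n".join(texts)
chunk_chars = 4096  # rough character proxy for a 2048-token seqlen
calibration_dataset = [
    joined[i : i + chunk_chars] for i in range(0, len(joined), chunk_chars)
]
```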

Thirdly, if I should use a more general dataset (say, c4): since I plan to evaluate the quantized model's performance on CMMLU, a Chinese text dataset, should I use only Chinese text from c4, or both Chinese and English text?

@Qubitium
Collaborator

Qubitium commented Jan 8, 2025

I would recommend a mixed English-and-Chinese dataset; something like the sketch below is enough to build it. It is also good to verify, as you are doing, that the data values are valid and free of errors.
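
A sketch of the mixing step; `en_texts` and `zh_texts` stand in for whatever English and Chinese sources you pick:

```python
# Sketch: shuffle English and Chinese raw-text samples into one calibration set.
import random

def mixed_calibration(en_texts, zh_texts, n_samples=1024, seed=0):
    # Drop empty strings, pool both languages, and take a deterministic sample.
    pool = [t for t in en_texts + zh_texts if t.strip()]
    random.Random(seed).shuffle(pool)
    return pool[:n_samples]
```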
