Gigabit Ethernet poorly iperf performance #5

calonsoecler · 2023-07-05T11:21:14Z

I got poor iperf results when testing Gigabit Ethernet (ETH0) on the NuMaker-IoT-MA35D1-A1 board.

I could not reach 1 Gbps in the standard iperf test (I got around 750 Mbits/sec), and the bidirectional iperf test was much worse (TX: ~500 Mbits/sec, RX: ~200 Mbits/sec).

In addition, the CPU is heavily loaded during the iperf test (CPU0 100%, CPU1 60%). Using top, I can see that one of the main contributors to the CPU load is ksoftirqd:

Mem: 41876K used, 360564K free, 40K shrd, 432K buff, 8144K cached
CPU:  1.1% usr 17.0% sys  0.0% nic 31.5% idle  0.0% io  0.0% irq 50.1% sirq
Load average: 0.30 0.07 0.02 3/80 356
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    9     2 root     RW       0  0.0   0 49.7 [ksoftirqd/0]
  352   326 root     S     248m 63.1   1 18.0 iperf -c 10.10.10.100 -i 1 -t 30 -d
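
One way to check whether network softirq work is concentrated on a single core is to compare the per-CPU NET_RX and NET_TX counters in /proc/softirqs before and after a test run. The helper below is a minimal sketch: the function name `net_softirqs` is made up for illustration, and the optional file argument exists only so it can be tried on a saved copy.

```shell
# Print the CPU header plus the NET_TX and NET_RX rows of a softirq table.
# Reads /proc/softirqs by default; pass a file path to inspect a saved copy.
# (net_softirqs is a hypothetical helper name, not an existing tool.)
net_softirqs() {
    awk 'NR == 1 || $1 == "NET_RX:" || $1 == "NET_TX:"' "${1:-/proc/softirqs}"
}
```

Running `net_softirqs` once before and once after the iperf run and comparing the CPU0 and CPU1 columns shows how the softirq load is split between the cores.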

Here is the iperf output as well:

/ # iperf -c 10.10.10.1 -i 1 -t 10
------------------------------------------------------------
Client connecting to 10.10.10.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.10.10.100 port 52930 connected with 10.10.10.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  85.6 MBytes   718 Mbits/sec
[  3]  1.0- 2.0 sec  85.9 MBytes   720 Mbits/sec
[  3]  2.0- 3.0 sec  87.0 MBytes   730 Mbits/sec
[  3]  3.0- 4.0 sec  86.9 MBytes   729 Mbits/sec
[  3]  4.0- 5.0 sec  88.1 MBytes   739 Mbits/sec
[  3]  5.0- 6.0 sec  87.0 MBytes   730 Mbits/sec
[  3]  6.0- 7.0 sec  88.8 MBytes   744 Mbits/sec
[  3]  7.0- 8.0 sec  89.0 MBytes   747 Mbits/sec
[  3]  8.0- 9.0 sec  89.0 MBytes   747 Mbits/sec
[  3]  9.0-10.0 sec  89.1 MBytes   748 Mbits/sec
[  3]  0.0-10.0 sec   876 MBytes   734 Mbits/sec

/ # iperf -c 10.10.10.1 -i 1 -t 10 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.10.10.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  4] local 10.10.10.100 port 56938 connected with 10.10.10.1 port 5001
[  5] local 10.10.10.100 port 5001 connected with 10.10.10.1 port 35472
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 1.0 sec  53.6 MBytes   450 Mbits/sec
[  5]  0.0- 1.0 sec  35.1 MBytes   294 Mbits/sec
[  4]  1.0- 2.0 sec  67.9 MBytes   569 Mbits/sec
[  5]  1.0- 2.0 sec  18.6 MBytes   156 Mbits/sec
[  4]  2.0- 3.0 sec  70.8 MBytes   593 Mbits/sec
[  5]  2.0- 3.0 sec  20.6 MBytes   173 Mbits/sec
[  4]  3.0- 4.0 sec  68.8 MBytes   577 Mbits/sec
[  5]  3.0- 4.0 sec  20.5 MBytes   172 Mbits/sec
[  4]  4.0- 5.0 sec  65.5 MBytes   549 Mbits/sec
[  5]  4.0- 5.0 sec  19.2 MBytes   161 Mbits/sec
[  4]  5.0- 6.0 sec  68.5 MBytes   575 Mbits/sec
[  5]  5.0- 6.0 sec  20.1 MBytes   169 Mbits/sec
[  5]  6.0- 7.0 sec  18.8 MBytes   158 Mbits/sec
[  4]  6.0- 7.0 sec  69.2 MBytes   581 Mbits/sec
[  5]  7.0- 8.0 sec  18.8 MBytes   158 Mbits/sec
[  4]  7.0- 8.0 sec  61.1 MBytes   513 Mbits/sec
[  4]  8.0- 9.0 sec  59.9 MBytes   502 Mbits/sec
[  5]  8.0- 9.0 sec  18.8 MBytes   158 Mbits/sec
[  4]  9.0-10.0 sec  61.1 MBytes   513 Mbits/sec
[  4]  0.0-10.0 sec   646 MBytes   542 Mbits/sec
[  5]  9.0-10.0 sec  19.0 MBytes   160 Mbits/sec
[  5]  0.0-10.0 sec   210 MBytes   176 Mbits/sec
[SUM]  0.0-10.0 sec   245 MBytes   205 Mbits/sec
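
When reading these numbers, note that the two columns appear to use different unit bases: the Transfer column counts MBytes of 1048576 bytes, while the Bandwidth column counts Mbits of 1000000 bits. The sketch below reproduces the conversion; `mbytes_to_mbits` is a hypothetical helper name, and the unit bases are inferred from the figures in the output above.

```shell
# Convert a transfer in MBytes (assumed 1 MByte = 1048576 bytes) over a
# duration in seconds to Mbits/sec (assumed 1 Mbit = 1000000 bits),
# rounded to a whole number.
mbytes_to_mbits() {
    awk -v mb="$1" -v sec="$2" \
        'BEGIN { printf "%.0f\n", mb * 1048576 * 8 / sec / 1000000 }'
}
```

For example, `mbytes_to_mbits 646 10` gives 542 and `mbytes_to_mbits 210 10` gives 176, matching the per-stream totals of the bidirectional run.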

I'm using the latest 5.10 kernel from this repository.

Do you know how to fix this issue?


yclu-ntc · 2023-07-06T09:30:22Z

Here is my test result with defconfig:
Setup: IoT board <--> 1G router <--> Windows PC
iperf version on the remote PC: 2.0.10

It looks fine in my environment.

A known way to improve iperf performance is to set the kernel's 'compiler optimization level' to -O2.
The bidirectional test result reaches ~550M/s on both Rx and Tx.
Hope it helps!
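
To see which optimization level a given kernel build selects, the .config file can be checked for the two standard Kconfig choices, CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE (-O2) and CONFIG_CC_OPTIMIZE_FOR_SIZE (-Os). This is only a sketch: `kernel_opt_level` is a hypothetical helper name, the caller supplies the path to the .config file, and the thread does not say which option this repository's defconfig uses.

```shell
# Report the compiler optimization level selected in a kernel .config file.
# $1: path to the .config file (supplied by the caller; location is an assumption).
# (kernel_opt_level is a hypothetical helper name.)
kernel_opt_level() {
    if grep -q '^CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y' "$1"; then
        echo "-O2"
    elif grep -q '^CONFIG_CC_OPTIMIZE_FOR_SIZE=y' "$1"; then
        echo "-Os"
    else
        echo "unknown"
    fi
}
```

If the result is -Os, the option can be switched under "Compiler optimization level" in menuconfig before rebuilding the kernel.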

calonsoecler · 2023-07-06T09:53:04Z

Thank you for your answer.
Your results look better, but I was expecting more than 900 Mbps in the bidirectional test (you are getting less than 500 Mbps), and the CPUs are heavily loaded during the test (the ksoftirqd process adds a lot to the CPU load).
So I'm guessing there may be something related to the kernel.

