Kernel crash when trying to create a datastore from ESXi host #122

Open
naresh1609 opened this issue Jan 20, 2023 · 4 comments

Comments

@naresh1609

Hi,

We are seeing a kernel crash when trying to create a datastore on the SCST LUNs.

OS used: RHEL8.6
SCST version: 3.7

Below is the crash log. Could you please let us know whether this is a known issue?

Thanks!!
Naresh

First we see the messages below, followed by the crash.

[ 3182.061758] [0]: scst: scst_process_active_cmd:4758:CRITICAL ERROR: cmd 0000000035a6257d is in invalid state 15)
[ 3182.061775] BUG at /root/scst-3.7/scst/src/scst_targ.c:4759
[ 3182.061788] WARNING: CPU: 17 PID: 0 at kernel/softirq.c:175 __local_bh_enable_ip+0x35/0x50
[ 3182.061810] Modules linked in: ocs_fc_scst(OE) scst_vdisk(OE) scst(OE) scsi_transport_fc xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc sunrpc vfat fat intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate wmi_bmof dell_smbios iTCO_wdt dell_wmi_descriptor iTCO_vendor_support dcdbas ipmi_ssif intel_uncore pcspkr dlm mei_me joydev i2c_i801 lpc_ich mei wmi acpi_ipmi ipmi_si acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea ahci sysfillrect sysimgblt libahci fb_sys_fops drm crc32c_intel tg3 libata megaraid_sas i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [last unloaded: scsi_transport_fc]
[ 3182.061964] CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G IOE --------- - - 4.18.0-372.9.1.el8.x86_64 #1
[ 3182.061972] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS 2.12.2 07/09/2021
[ 3182.061975] RIP: 0010:__local_bh_enable_ip+0x35/0x50
[ 3182.061985] Code: 59 a9 00 00 0f 00 75 22 83 ee 01 f7 de 65 01 35 89 22 f2 59 65 8b 05 82 22 f2 59 a9 00 ff ff 00 74 0c 65 ff 0d 74 22 f2 59 c3 <0f> 0b eb da 65 66 8b 05 5f 6c f3 59 66 85 c0 74 e7 e8 65 ff ff ff
[ 3182.061991] RSP: 0018:ffffba0ec6ca0e78 EFLAGS: 00010206
[ 3182.061996] RAX: 000000007fffff00 RBX: ffffffffc0ee014e RCX: ffff9c3f4639c200
[ 3182.062001] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffffc0ee014e
[ 3182.062004] RBP: 0000000000000002 R08: 0000000000000000 R09: c0000000ffff7fff
[ 3182.062007] R10: 0000000000000001 R11: ffffba0ec6ca0c90 R12: 0000000400110011
[ 3182.062010] R13: ffffffffc0f16d6d R14: 000000000000000a R15: 0000000000000002
[ 3182.062014] FS: 0000000000000000(0000) GS:ffff9c4eff600000(0000) knlGS:0000000000000000
[ 3182.062018] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3182.062022] CR2: 000055fe044be0a0 CR3: 0000000e02210002 CR4: 00000000007706e0
[ 3182.062026] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3182.062029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3182.062033] PKRU: 55555554
[ 3182.062035] Call Trace:
[ 3182.062040]
[ 3182.062046] scst_process_active_cmd+0x1d8b/0x2050 [scst]
[ 3182.062158] scst_cmd_tasklet+0x9e/0x130 [scst]
[ 3182.062242] tasklet_action_common.isra.17+0x5a/0x100
[ 3182.062254] __do_softirq+0xd7/0x2c4
[ 3182.062265] irq_exit_rcu+0xcb/0xd0
[ 3182.062269] irq_exit+0xa/0x10
[ 3182.062273] do_IRQ+0x7f/0xd0
[ 3182.062280] common_interrupt+0xf/0xf
[ 3182.062286]
[ 3182.062289] RIP: 0010:cpuidle_enter_state+0xda/0x3d0
[ 3182.062302] Code: e8 0b 71 9c ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 a6 02 00 00 31 ff e8 fd 44 a3 ff fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 25 01 00 00 49 63 d6 48 8b 4c 24 10 48 2b 0c 24 48
[ 3182.062307] RSP: 0018:ffffba0ec676fe58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 3182.062312] RAX: ffff9c4eff62ae40 RBX: ffffffffa7cb84e8 RCX: 000000000000001f
[ 3182.062316] RDX: 000002e4e1a9011a RSI: 000000003158b0dc RDI: 0000000000000000
[ 3182.062319] RBP: ffffda0ebf600318 R08: 0000000000000002 R09: 000000000002a680
[ 3182.062321] R10: 00006f8f1954c6f4 R11: ffff9c4eff629b44 R12: 0000000000000003
[ 3182.062324] R13: ffffffffa7cb8380 R14: 0000000000000003 R15: 0000000000000003
[ 3182.062330] ? cpuidle_enter_state+0xb5/0x3d0
[ 3182.062339] cpuidle_enter+0x2c/0x40
[ 3182.062346] do_idle+0x264/0x2c0
[ 3182.062357] cpu_startup_entry+0x6f/0x80
[ 3182.062362] start_secondary+0x1a6/0x1e0
[ 3182.062375] secondary_startup_64_no_verify+0xc2/0xcb
[ 3182.062386] ---[ end trace b4ade325b481dd6b ]---
[ 3208.316863] watchdog: BUG: soft lockup - CPU#17 stuck for 22s! [swapper/17:0]
[ 3208.316865] Modules linked in: ocs_fc_scst(OE) scst_vdisk(OE) scst(OE) scsi_transport_fc xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc sunrpc vfat fat intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate wmi_bmof dell_smbios iTCO_wdt dell_wmi_descriptor iTCO_vendor_support dcdbas ipmi_ssif intel_uncore pcspkr dlm mei_me joydev i2c_i801 lpc_ich mei wmi acpi_ipmi ipmi_si acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea ahci sysfillrect sysimgblt libahci fb_sys_fops drm crc32c_intel tg3 libata megaraid_sas i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [last unloaded: scsi_transport_fc]
[ 3208.316899] CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G W IOE --------- - - 4.18.0-372.9.1.el8.x86_64 #1
[ 3208.316900] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS 2.12.2 07/09/2021
[ 3208.316901] RIP: 0010:__local_bh_enable_ip+0x37/0x50
[ 3208.316905] Code: 00 00 0f 00 75 22 83 ee 01 f7 de 65 01 35 89 22 f2 59 65 8b 05 82 22 f2 59 a9 00 ff ff 00 74 0c 65 ff 0d 74 22 f2 59 c3 0f 0b da 65 66 8b 05 5f 6c f3 59 66 85 c0 74 e7 e8 65 ff ff ff eb e0
[ 3208.316907] RSP: 0018:ffffba0ec6ca0e78 EFLAGS: 00010206 ORIG_RAX: ffffffffffffff13
[ 3208.316908] RAX: 0000000038f7a900 RBX: ffffffffc0ee014e RCX: ffff9c3f4639c200
[ 3208.316910] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffffc0ee014e
[ 3208.316911] RBP: 0000000000000002 R08: 0000000000000000 R09: c0000000ffff7fff
[ 3208.316911] R10: 0000000000000001 R11: ffffba0ec6ca0c90 R12: 0000000400110011
[ 3208.316912] R13: ffffffffc0f16d6d R14: 000000000000000a R15: 0000000000000002
[ 3208.316913] FS: 0000000000000000(0000) GS:ffff9c4eff600000(0000) knlGS:0000000000000000
[ 3208.316914] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3208.316915] CR2: 000055fe044be0a0 CR3: 0000000e02210002 CR4: 00000000007706e0
[ 3208.316916] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3208.316917] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3208.316918] PKRU: 55555554
[ 3208.316918] Call Trace:
[ 3208.316920]
[ 3208.316921] scst_process_active_cmd+0x1d8b/0x2050 [scst]
[ 3208.316954] scst_cmd_tasklet+0x9e/0x130 [scst]
[ 3208.316975] tasklet_action_common.isra.17+0x5a/0x100
[ 3208.316977] __do_softirq+0xd7/0x2c4
[ 3208.316979] irq_exit_rcu+0xcb/0xd0
[ 3208.316981] irq_exit+0xa/0x10
[ 3208.316982] do_IRQ+0x7f/0xd0
[ 3208.316983] common_interrupt+0xf/0xf
[ 3208.316985]
[ 3208.316985] RIP: 0010:cpuidle_enter_state+0xda/0x3d0
[ 3208.316988] Code: e8 0b 71 9c ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 a6 02 00 00 31 ff e8 fd 44 a3 ff fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 25 01 00 00 49 63 d6 48 8b 4c 24 10 48 2b 0c 24 48
[ 3208.316989] RSP: 0018:ffffba0ec676fe58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 3208.316990] RAX: ffff9c4eff62ae40 RBX: ffffffffa7cb84e8 RCX: 000000000000001f
[ 3208.316991] RDX: 000002e4e1a9011a RSI: 000000003158b0dc RDI: 0000000000000000
[ 3208.316992] RBP: ffffda0ebf600318 R08: 0000000000000002 R09: 000000000002a680
[ 3208.316993] R10: 00006f8f1954c6f4 R11: ffff9c4eff629b44 R12: 0000000000000003
[ 3208.316993] R13: ffffffffa7cb8380 R14: 0000000000000003 R15: 0000000000000003
[ 3208.316995] ? cpuidle_enter_state+0xb5/0x3d0
[ 3208.316997] cpuidle_enter+0x2c/0x40
[ 3208.316999] do_idle+0x264/0x2c0
[ 3208.317000] cpu_startup_entry+0x6f/0x80

Crash log:

[ 3351.461407] ------------[ cut here ]------------
[ 3351.461408] kernel BUG at arch/x86/kernel/nmi.c:507!
[ 3351.461409] invalid opcode: 0000 [#1] SMP NOPTI
[ 3351.461410] CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G W IOEL --------- - - 4.18.0-372.9.1.el8.x86_64 #1
[ 3351.461410] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS 2.12.2 07/09/2021
[ 3351.461411] RIP: 0010:do_nmi+0x1f9/0x220
[ 3351.461412] Code: ff ff 31 ff e8 f8 71 1d 00 e9 15 ff ff ff e8 4e d5 96 00 e9 27 fe ff ff 65 c7 05 be db fe 59 02 00 00 00 5b 5d 41 5c 41 5d c3 <0f> 0b 0f 0b 65 48 8b 3d a3 db fe 59 e8 a6 62 04 00 66 90 e9 21 ff
[ 3351.461412] RSP: 0018:fffffe000036ced0 EFLAGS: 00010046
[ 3351.461413] RAX: 0000000000f00000 RBX: 0000000000000007 RCX: 0000000000000000
[ 3351.461414] RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000007
[ 3351.461414] RBP: fffffe000036cef8 R08: 0000000000000000 R09: 0000000000000000
[ 3351.461415] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
[ 3351.461415] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3351.461415] FS: 0000000000000000(0000) GS:ffff9c4eff600000(0000) knlGS:0000000000000000
[ 3351.461416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3351.461416] CR2: 000055fe044be0a0 CR3: 0000000e02210002 CR4: 00000000007706e0
[ 3351.461417] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3351.461417] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3351.461417] PKRU: 55555554
[ 3351.461417] Call Trace:
[ 3351.461418]
[ 3351.461418] end_repeat_nmi+0x16/0x6f
[ 3351.461418] RIP: 0010:report_bug+0x42/0xd0
[ 3351.461419] Code: 00 00 00 49 c7 c0 00 a7 db a7 49 81 f8 d8 d9 dc a7 73 29 48 63 05 4e e0 43 01 4c 01 c0 48 39 c3 75 0d eb 23 49 63 00 4c 01 c0 <48> 39 c3 74 18 49 83 c0 0c 49 81 f8 d8 d9 dc a7 72 e8 48 89 df e8
[ 3351.461419] RSP: 0018:ffffba0ec6ca0d40 EFLAGS: 00000093
[ 3351.461420] RAX: ffffffffa602a4e3 RBX: ffffffffa60f3995 RCX: 0000000000000000
[ 3351.461420] RDX: 0000000000000b01 RSI: ffffffffa60f3997 RDI: ffffba0ec6ca0d30
[ 3351.461420] RBP: ffffba0ec6ca0dc8 R08: ffffffffa7dbaef8 R09: 0000000000000002
[ 3351.461421] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa70c0497
[ 3351.461421] R13: 0000000000000000 R14: 0000000000000006 R15: ffffffffa60f3995
[ 3351.461421] ? __local_bh_enable_ip+0x35/0x50
[ 3351.461421] ? __local_bh_enable_ip+0x35/0x50
[ 3351.461422] ? force_hpet_resume+0x1c3/0x1ca
[ 3351.461422] ? __local_bh_enable_ip+0x37/0x50
[ 3351.461422] ? report_bug+0x42/0xd0
[ 3351.461422] ? report_bug+0x42/0xd0
[ 3351.461423]
[ 3351.461423]
[ 3351.461423] do_error_trap+0x9e/0xd0
[ 3351.461423] do_invalid_op+0x36/0x40
[ 3351.461424] ? __local_bh_enable_ip+0x35/0x50
[ 3351.461424] invalid_op+0x14/0x20
[ 3351.461424] RIP: 0010:__local_bh_enable_ip+0x35/0x50
[ 3351.461425] Code: 59 a9 00 00 0f 00 75 22 83 ee 01 f7 de 65 01 35 89 22 f2 59 65 8b 05 82 22 f2 59 a9 00 ff ff 00 74 0c 65 ff 0d 74 22 f2 59 c3 <0f> 0b eb da 65 66 8b 05 5f 6c f3 59 66 85 c0 74 e7 e8 65 ff ff ff
[ 3351.461425] RSP: 0018:ffffba0ec6ca0e78 EFLAGS: 00010206
[ 3351.461426] RAX: 0000000026f66f00 RBX: ffffffffc0ee014e RCX: ffff9c3f4639c200
[ 3351.461427] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffffc0ee014e
[ 3351.461427] RBP: 0000000000000002 R08: 0000000000000000 R09: c0000000ffff7fff
[ 3351.461428] R10: 0000000000000001 R11: ffffba0ec6ca0c90 R12: 0000000400110011
[ 3351.461428] R13: ffffffffc0f16d6d R14: 000000000000000a R15: 0000000000000002
[ 3351.461428] ? scst_process_active_cmd+0x1d7e/0x2050 [scst]
[ 3351.461429] ? scst_process_active_cmd+0x1d7e/0x2050 [scst]
[ 3351.461429] scst_process_active_cmd+0x1d8b/0x2050 [scst]
[ 3351.461429] scst_cmd_tasklet+0x9e/0x130 [scst]
[ 3351.461430] tasklet_action_common.isra.17+0x5a/0x100
[ 3351.461430] __do_softirq+0xd7/0x2c4
[ 3351.461430] irq_exit_rcu+0xcb/0xd0
[ 3351.461430] irq_exit+0xa/0x10
[ 3351.461431] do_IRQ+0x7f/0xd0
[ 3351.461431] common_interrupt+0xf/0xf
[ 3351.461431]
[ 3351.461431] RIP: 0010:cpuidle_enter_state+0xda/0x3d0
[ 3351.461432] Code: e8 0b 71 9c ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 a6 02 00 00 31 ff e8 fd 44 a3 ff fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 25 01 00 00 49 63 d6 48 8b 4c 24 10 48 2b 0c 24 48
[ 3351.461432] RSP: 0018:ffffba0ec676fe58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 3351.461433] RAX: ffff9c4eff62ae40 RBX: ffffffffa7cb84e8 RCX: 000000000000001f
[ 3351.461433] RDX: 000002e4e1a9011a RSI: 000000003158b0dc RDI: 0000000000000000
[ 3351.461434] RBP: ffffda0ebf600318 R08: 0000000000000002 R09: 000000000002a680
[ 3351.461434] R10: 00006f8f1954c6f4 R11: ffff9c4eff629b44 R12: 0000000000000003
[ 3351.461435] R13: ffffffffa7cb8380 R14: 0000000000000003 R15: 0000000000000003
[ 3351.461435] ? cpuidle_enter_state+0xb5/0x3d0
[ 3351.461435] cpuidle_enter+0x2c/0x40
[ 3351.461435] do_idle+0x264/0x2c0
[ 3351.461436] cpu_startup_entry+0x6f/0x80
[ 3351.461436] start_secondary+0x1a6/0x1e0
[ 3351.461436] secondary_startup_64_no_verify+0xc2/0xcb
[ 3351.461437] Modules linked in: ocs_fc_scst(OE) scst_vdisk(OE) scst(OE) scsi_transport_fc xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc sunrpc vfat fat intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate wmi_bmof dell_smbios iTCO_wdt dell_wmi_descriptor iTCO_vendor_support dcdbas ipmi_ssif intel_uncore pcspkr dlm mei_me joydev i2c_i801 lpc_ich mei wmi acpi_ipmi ipmi_si acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea ahci sysfillrect sysimgblt libahci fb_sys_fops drm crc32c_intel tg3 libata megaraid_sas i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [last unloaded: scsi_transport_fc]
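As a reading aid for the two logs above, the order of events can be pulled out with a short script. This is only a sketch: the function name `failure_timeline`, the stage labels, and the regular expressions are assumptions made for illustration. They are matched against the message formats visible in this report and are not part of SCST or the kernel. The sample text reuses a few lines copied from the logs above.

```python
import re

# Patterns for the messages that mark each stage of the failure.
# Labels are descriptive names chosen for this sketch (hypothetical).
# "kernel BUG" is listed before "BUG assertion" because the second
# pattern would otherwise also match "kernel BUG at ...".
EVENT_PATTERNS = [
    ("scst critical error", re.compile(r"scst: .*CRITICAL ERROR")),
    ("kernel BUG", re.compile(r"\bkernel BUG at \S+:\d+")),
    ("BUG assertion", re.compile(r"\bBUG at \S+:\d+")),
    ("kernel WARNING", re.compile(r"\bWARNING: CPU: \d+")),
    ("soft lockup", re.compile(r"watchdog: BUG: soft lockup")),
]

# dmesg-style prefix: "[ 3182.061758] message"
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def failure_timeline(log_text):
    """Return (timestamp, label, message) for each marker line, in log order."""
    events = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue
        stamp, message = float(match.group(1)), match.group(2)
        for label, pattern in EVENT_PATTERNS:
            if pattern.search(message):
                events.append((stamp, label, message))
                break  # at most one label per line
    return events


SAMPLE = """\
[ 3182.061758] [0]: scst: scst_process_active_cmd:4758:CRITICAL ERROR: cmd 0000000035a6257d is in invalid state 15)
[ 3182.061775] BUG at /root/scst-3.7/scst/src/scst_targ.c:4759
[ 3182.061788] WARNING: CPU: 17 PID: 0 at kernel/softirq.c:175 __local_bh_enable_ip+0x35/0x50
[ 3182.062035] Call Trace:
[ 3208.316863] watchdog: BUG: soft lockup - CPU#17 stuck for 22s! [swapper/17:0]
[ 3351.461408] kernel BUG at arch/x86/kernel/nmi.c:507!
"""

for stamp, label, message in failure_timeline(SAMPLE):
    print(f"{stamp:>12.6f}  {label}")
```

Read this way, the report shows the SCST state check failing first, then the softirq warning at the same timestamp, the soft lockup on CPU 17 roughly 26 seconds later, and finally the BUG in the NMI handler.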

@lnocturno
Contributor

Hi,

Thank you for the report!

As far as I understand, you have built SCST with make 2debug.
Could you re-test the issue with SCST built with make 2release?

I have an idea about this issue, but I need more time to dive deep into it.

Thanks,
Gleb.

@naresh1609
Author

Hi Gleb,

Thanks a lot for your quick reply. We used just "make" and "make install". We will try with make 2release and update you with the results.

Thanks!!
Naresh

@naresh1609
Author

Hi Gleb,

You are right. We tried with 2release and it works fine.

@lnocturno
Contributor

Hi,

Could you provide your SCST config? You can obtain it via scstadmin --write_config /dev/stdout.

Thanks,
Gleb
