2024-06-01 09:37:11

by Thorsten Leemhuis

[permalink] [raw]
Subject: [regression] Bug 218902 - Kernel crash/reboot on cifs copy

Hi, Thorsten here, the Linux kernel's regression tracker.

Steve, I noticed a report about a CIFS regression in bugzilla.kernel.org
you might be interested in.

As many (most?) kernel developers don't keep an eye on the bug tracker,
I decided to write this mail. To quote from
https://bugzilla.kernel.org/show_bug.cgi?id=218902 :

> Computer crashs when copying files from a local drive to a cifs mounted files system.
>
> This bug exists since kernel 6.8.4 and is always present in last kernel 6.9.
>
> Step to reproduce: on AlmaLinux 9
>
> Boot with kernel 6.9.1-2 or 6.8.4
>
> Open a terminal and copy some files on the cifs mounted directory. (Copy can be performed either by cp * ~/COMMUN or using Thunar)
>
> --> The computer freezes and reboots producing an /boot/initram.....kdump.img file
>
>
> The bug is not present in kernel 5.14.0-427 nor in 6.7.9
> The bug does not affect read operations from the cifs mounted directory
>
> Mount options:
> //192.168.1.112/commun /home/olivier/COMMUN cifs noauto,x-systemd.automount,user,nosuid,gid=100,uid=1026,credentials=/home/olivier/passwd 0 0
> -----------------------------------------------------------
>
> The /var/log/messages contains the following:
> May 19 09:41:12 deyme18 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [cifsd:1959]
> [...]
> May 19 09:41:12 deyme18 kernel: CPU: 2 PID: 1959 Comm: cifsd Kdump: loaded Tainted: G E 6.8.4-1.el9.elrepo.x86_64 #1
> May 19 09:41:12 deyme18 kernel: Hardware name: PC Specialist LTD N2x0WU /N2x0WU , BIOS 1.07.18 02/15/2019
> May 19 09:41:12 deyme18 kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x72/0x2d0
> May 19 09:41:12 deyme18 kernel: Code: 08 0f 92 c2 8b 45 00 0f b6 d2 c1 e2 08 30 e4 09 d0 a9 00 01 ff ff 0f 85 f2 01 00 00 85 c0 74 12 0f b6 45 00 84 c0 74 0a f3 90 <0f> b6 45 00 84 c0 75 f6 b8 01 00 00 00 66 89 45 00 5b 5d 41 5c 41
> May 19 09:41:12 deyme18 kernel: RSP: 0018:ffffad7d40d3bd60 EFLAGS: 00000202
> May 19 09:41:12 deyme18 kernel: RAX: 0000000000000001 RBX: ffffffffc1ede988 RCX: 0000000000000000
> May 19 09:41:12 deyme18 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffc1ede988
> May 19 09:41:12 deyme18 kernel: RBP: ffffffffc1ede988 R08: 0000000000000000 R09: ffff9f5a4b100b40
> May 19 09:41:12 deyme18 kernel: R10: ffff9f5a259eb1c8 R11: 01d9e3d51959a6a5 R12: ffff9f5a03db0038
> May 19 09:41:12 deyme18 kernel: R13: ffff9f5a516f4ec0 R14: ffff9f5a259eb000 R15: 0000000000000000
> May 19 09:41:12 deyme18 kernel: FS: 0000000000000000(0000) GS:ffff9f5d5ec80000(0000) knlGS:0000000000000000
> May 19 09:41:12 deyme18 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> May 19 09:41:12 deyme18 kernel: CR2: 000055fa37ada5c0 CR3: 00000002d7a1e001 CR4: 00000000003706f0
> May 19 09:41:12 deyme18 kernel: Call Trace:
> May 19 09:41:12 deyme18 kernel: <IRQ>
> May 19 09:41:12 deyme18 kernel: ? watchdog_timer_fn+0x261/0x2f0
> May 19 09:41:12 deyme18 kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
> May 19 09:41:12 deyme18 kernel: ? __hrtimer_run_queues+0x10f/0x2b0
> May 19 09:41:12 deyme18 kernel: ? hrtimer_interrupt+0x106/0x240
> May 19 09:41:12 deyme18 kernel: ? __sysvec_apic_timer_interrupt+0x6b/0x180
> May 19 09:41:12 deyme18 kernel: ? sysvec_apic_timer_interrupt+0x9d/0xd0
> May 19 09:41:12 deyme18 kernel: </IRQ>
> May 19 09:41:12 deyme18 kernel: <TASK>
> May 19 09:41:12 deyme18 kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> May 19 09:41:12 deyme18 kernel: ? native_queued_spin_lock_slowpath+0x72/0x2d0
> May 19 09:41:12 deyme18 kernel: _raw_spin_lock+0x30/0x40
> May 19 09:41:12 deyme18 kernel: __cifs_put_smb_ses+0x53/0x440 [cifs]
> May 19 09:41:12 deyme18 kernel: smb2_find_smb_tcon+0x61/0xd0 [cifs]
> May 19 09:41:12 deyme18 kernel: smb2_handle_cancelled_mid+0x42/0x90 [cifs]
> May 19 09:41:12 deyme18 kernel: __release_mid+0x8a/0xb0 [cifs]
> May 19 09:41:12 deyme18 kernel: cifs_demultiplex_thread+0x2fc/0x790 [cifs]
> May 19 09:41:12 deyme18 kernel: ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
> May 19 09:41:12 deyme18 kernel: kthread+0xee/0x120
> May 19 09:41:12 deyme18 kernel: ? __pfx_kthread+0x10/0x10
> May 19 09:41:12 deyme18 kernel: ret_from_fork+0x2d/0x50
> May 19 09:41:12 deyme18 kernel: ? __pfx_kthread+0x10/0x10
> May 19 09:41:12 deyme18 kernel: ret_from_fork_asm+0x1b/0x30
> May 19 09:41:12 deyme18 kernel: </TASK>
> May 19 09:41:23 deyme18 systemd-logind[719]: The system will reboot now!
> May 19 09:41:23 deyme18 systemd-logind[719]: System is rebooting.

6.7.y worked fine; I asked for a bisection; a second reporter ran into
the same problem; apparently they all use something RHEL9-like.

See the ticket for more details. Note, you have to use bugzilla to reach
the reporter, as I sadly[1] can not CCed them in mails like this.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"

P.S.: let me use this mail to also add the report to the list of tracked
regressions to ensure it's doesn't fall through the cracks:

#regzbot introduced: v6.7..v6.8
#regzbot title: smb: client:
#regzbot from: Dufgrinder
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218902
#regzbot ignore-activity