2022-11-04 22:40:30

by Mirsad Todorovac

[permalink] [raw]
Subject: INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak

Dear all,

When investigating thermald kmemleak, it occurred that the "cat
/sys/kernel/debug/kmemleak"
and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual amount
of time.

Dmesg output showed expedited stalls that the commands caused NMIs and
NMI backtraces:

[ 8123.263464] rcu: INFO: rcu_preempt detected expedited stalls on
CPUs/tasks: { 0-.... } 26 jiffies s: 3829 root: 0x1/.
[ 8123.263500] rcu: blocking rcu_node structures (internal RCU debug):
[ 8123.263508] Sending NMI from CPU 7 to CPUs 0:
[ 8123.263528] NMI backtrace for cpu 0
[ 8123.263539] CPU: 0 PID: 27898 Comm: cat Not tainted 6.1.0-rc3 #1
[ 8123.263552] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW
03/08/2022
[ 8123.263557] RIP: 0010:kmemleak_seq_start+0x41/0x80
[ 8123.263579] Code: 55 04 a6 00 4c 63 e0 85 c0 78 40 e8 b9 80 db ff 48
8b 05 92 fb 88 01 4c 8d 60 f8 48 3d 30
                                    62 63 92 75 17 eb 32 49 8b 44 24 08
<48> 83 eb 01 4c 8d 60 f8 48 3d 30 62 63 92 74 1d
                                    48 85 db 7f e6 4c
[ 8123.263588] RSP: 0018:ffff9968e400fc30 EFLAGS: 00000206
[ 8123.263598] RAX: ffff8963b7005b58 RBX: 0000000000011cb8 RCX:
0000000000000001
[ 8123.263604] RDX: ffff8963b09c4000 RSI: ffff8963856de028 RDI:
ffffffff926361c0
[ 8123.263608] RBP: ffff9968e400fc40 R08: 0000000000020000 R09:
ffff896380592a80
[ 8123.263613] R10: 0000000000020000 R11: 0000000000000000 R12:
ffff8964114c2390
[ 8123.263617] R13: ffff89639fa25b00 R14: ffff8963856de000 R15:
ffff9968e400fe30
[ 8123.263622] FS:  00007ff217c15740(0000) GS:ffff896528800000(0000)
knlGS:0000000000000000
[ 8123.263630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8123.263636] CR2: 00007f19bd6a8000 CR3: 000000028e392001 CR4:
0000000000770ef0
[ 8123.263642] PKRU: 55555554
[ 8123.263646] Call Trace:
[ 8123.263649]  <TASK>
[ 8123.263656]  seq_read_iter+0x169/0x420
[ 8123.263671]  seq_read+0xad/0xe0
[ 8123.263685]  full_proxy_read+0x59/0x90
[ 8123.263701]  vfs_read+0xb2/0x2e0
[ 8123.263718]  ksys_read+0x61/0xe0
[ 8123.263730]  __x64_sys_read+0x1a/0x20
[ 8123.263741]  do_syscall_64+0x58/0x80
[ 8123.263754]  ? do_syscall_64+0x67/0x80
[ 8123.263767]  ? exit_to_user_mode_prepare+0x15d/0x190
[ 8123.263781]  ? syscall_exit_to_user_mode+0x1b/0x30
[ 8123.263791]  ? do_syscall_64+0x67/0x80
[ 8123.263804]  ? syscall_exit_to_user_mode+0x1b/0x30
[ 8123.263813]  ? do_syscall_64+0x67/0x80
[ 8123.263823]  ? do_syscall_64+0x67/0x80
[ 8123.263833]  ? do_syscall_64+0x67/0x80
[ 8123.263844]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 8123.263857] RIP: 0033:0x7ff217914992
[ 8123.263867] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d
02 00 0f 1f 44 00 00 f3 0f
                                    1e fa 64 8b 04 25 18 00 00 00 85 c0
75 10 0f 05 <48> 3d 00 f0 ff ff 77 56
                                    c3 0f 1f 44 00 00 48 83 ec 28 48 89
54 24
[ 8123.263875] RSP: 002b:00007ffdfcbc3e28 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 8123.263894] RAX: ffffffffffffffda RBX: 0000000000020000 RCX:
00007ff217914992
[ 8123.263899] RDX: 0000000000020000 RSI: 00007ff217b9d000 RDI:
0000000000000003
[ 8123.263904] RBP: 00007ff217b9d000 R08: 00007ff217b9c010 R09:
00007ff217b9c010
[ 8123.263909] R10: 0000000000000022 R11: 0000000000000246 R12:
0000000000022000
[ 8123.263914] R13: 0000000000000003 R14: 0000000000020000 R15:
0000000000020000
[ 8123.263927]  </TASK>

To reproduce:

Enable CONFIG_DEBUG_KMEMLEAK=y in linux_stable 6.1.0-rc3 build.

Then, a stress test on a service known from the previous report is required:

for a in {1..1000}; doRIP: 0010:kmemleak_seq_start+0x41/0x80
         echo $a
         systemctl stop thermald
         sleep 0.5
         systemctl start thermald
         sleep 0.5
done

After that, /sys/kernel/debug/kmemleak indicated 1413 unreferenced objects.

However, dmesg had shown the stalls on CPUs while executing "cat" or
"tail -40" of /sys/kernel/debug/kmemleak.

I've read the
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt, but I was
unable to understand
what is going on except to locate the stall to "RIP:
0010:kmemleak_seq_start+0x41/0x80" and probable
maintainer/developer.

Please find attached the kernel build config, the output of "cat
/sys/kernel/debug/kmemleak" and
dmesg ouput about the expedited stalls. The number of jiffies is not so
large, but IMHO they possibly indicate
greater problems or deadlocks in RCU code in kmemleak.

NOTE:
Please add:

Reported-By: Mirsad Goran Todorovac <[email protected]>

if (or when) the bug is fixed.

Thank you.

-mt

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


Attachments:
config-6.1.0-rc3.gz (64.17 kB)
expedited_stalls_cat_tail_kmemleak.dmesg.gz (24.21 kB)
kmemleak-thermald.txt.gz (30.93 kB)
Download all attachments

2022-11-04 23:05:02

by srinivas pandruvada

[permalink] [raw]
Subject: Re: INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak


On Fri, 2022-11-04 at 23:01 +0100, Mirsad Goran Todorovac wrote:
> Dear all,
>
> When investigating thermald kmemleak, it occurred that the "cat
> /sys/kernel/debug/kmemleak"
> and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual
> amount
> of time.
>
This memleak occurred because of bad element in the BIOS configuration
of _OSC method.
Unfortunately fixing this in ACPI interpreter can have other side
effects as this is not expected. So the ACPICA team
(https://acpica.org/community) is looking further.

Thanks,
Srinivas

> Dmesg output showed expedited stalls that the commands caused NMIs
> and
> NMI backtraces:
>
> [ 8123.263464] rcu: INFO: rcu_preempt detected expedited stalls on
> CPUs/tasks: { 0-.... } 26 jiffies s: 3829 root: 0x1/.
> [ 8123.263500] rcu: blocking rcu_node structures (internal RCU
> debug):
> [ 8123.263508] Sending NMI from CPU 7 to CPUs 0:
> [ 8123.263528] NMI backtrace for cpu 0
> [ 8123.263539] CPU: 0 PID: 27898 Comm: cat Not tainted 6.1.0-rc3 #1
> [ 8123.263552] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW
> 03/08/2022
> [ 8123.263557] RIP: 0010:kmemleak_seq_start+0x41/0x80
> [ 8123.263579] Code: 55 04 a6 00 4c 63 e0 85 c0 78 40 e8 b9 80 db ff
> 48
> 8b 05 92 fb 88 01 4c 8d 60 f8 48 3d 30
>                                      62 63 92 75 17 eb 32 49 8b 44 24
> 08
> <48> 83 eb 01 4c 8d 60 f8 48 3d 30 62 63 92 74 1d
>                                      48 85 db 7f e6 4c
> [ 8123.263588] RSP: 0018:ffff9968e400fc30 EFLAGS: 00000206
> [ 8123.263598] RAX: ffff8963b7005b58 RBX: 0000000000011cb8 RCX:
> 0000000000000001
> [ 8123.263604] RDX: ffff8963b09c4000 RSI: ffff8963856de028 RDI:
> ffffffff926361c0
> [ 8123.263608] RBP: ffff9968e400fc40 R08: 0000000000020000 R09:
> ffff896380592a80
> [ 8123.263613] R10: 0000000000020000 R11: 0000000000000000 R12:
> ffff8964114c2390
> [ 8123.263617] R13: ffff89639fa25b00 R14: ffff8963856de000 R15:
> ffff9968e400fe30
> [ 8123.263622] FS:  00007ff217c15740(0000) GS:ffff896528800000(0000)
> knlGS:0000000000000000
> [ 8123.263630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8123.263636] CR2: 00007f19bd6a8000 CR3: 000000028e392001 CR4:
> 0000000000770ef0
> [ 8123.263642] PKRU: 55555554
> [ 8123.263646] Call Trace:
> [ 8123.263649]  <TASK>
> [ 8123.263656]  seq_read_iter+0x169/0x420
> [ 8123.263671]  seq_read+0xad/0xe0
> [ 8123.263685]  full_proxy_read+0x59/0x90
> [ 8123.263701]  vfs_read+0xb2/0x2e0
> [ 8123.263718]  ksys_read+0x61/0xe0
> [ 8123.263730]  __x64_sys_read+0x1a/0x20
> [ 8123.263741]  do_syscall_64+0x58/0x80
> [ 8123.263754]  ? do_syscall_64+0x67/0x80
> [ 8123.263767]  ? exit_to_user_mode_prepare+0x15d/0x190
> [ 8123.263781]  ? syscall_exit_to_user_mode+0x1b/0x30
> [ 8123.263791]  ? do_syscall_64+0x67/0x80
> [ 8123.263804]  ? syscall_exit_to_user_mode+0x1b/0x30
> [ 8123.263813]  ? do_syscall_64+0x67/0x80
> [ 8123.263823]  ? do_syscall_64+0x67/0x80
> [ 8123.263833]  ? do_syscall_64+0x67/0x80
> [ 8123.263844]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 8123.263857] RIP: 0033:0x7ff217914992
> [ 8123.263867] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5
> 1d
> 02 00 0f 1f 44 00 00 f3 0f
>                                      1e fa 64 8b 04 25 18 00 00 00 85
> c0
> 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56
>                                      c3 0f 1f 44 00 00 48 83 ec 28 48
> 89
> 54 24
> [ 8123.263875] RSP: 002b:00007ffdfcbc3e28 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000000
> [ 8123.263894] RAX: ffffffffffffffda RBX: 0000000000020000 RCX:
> 00007ff217914992
> [ 8123.263899] RDX: 0000000000020000 RSI: 00007ff217b9d000 RDI:
> 0000000000000003
> [ 8123.263904] RBP: 00007ff217b9d000 R08: 00007ff217b9c010 R09:
> 00007ff217b9c010
> [ 8123.263909] R10: 0000000000000022 R11: 0000000000000246 R12:
> 0000000000022000
> [ 8123.263914] R13: 0000000000000003 R14: 0000000000020000 R15:
> 0000000000020000
> [ 8123.263927]  </TASK>
>
> To reproduce:
>
> Enable CONFIG_DEBUG_KMEMLEAK=y in linux_stable 6.1.0-rc3 build.
>
> Then, a stress test on a service known from the previous report is
> required:
>
> for a in {1..1000}; doRIP: 0010:kmemleak_seq_start+0x41/0x80
>           echo $a
>           systemctl stop thermald
>           sleep 0.5
>           systemctl start thermald
>           sleep 0.5
> done
>
> After that, /sys/kernel/debug/kmemleak indicated 1413 unreferenced
> objects.
>
> However, dmesg had shown the stalls on CPUs while executing "cat" or
> "tail -40" of /sys/kernel/debug/kmemleak.
>
> I've read the
> https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt, but I was
> unable to understand
> what is going on except to locate the stall to "RIP:
> 0010:kmemleak_seq_start+0x41/0x80" and probable
> maintainer/developer.
>
> Please find attached the kernel build config, the output of "cat
> /sys/kernel/debug/kmemleak" and
> dmesg ouput about the expedited stalls. The number of jiffies is not
> so
> large, but IMHO they possibly indicate
> greater problems or deadlocks in RCU code in kmemleak.
>
> NOTE:
> Please add:
>
> Reported-By: Mirsad Goran Todorovac <[email protected]>
>
> if (or when) the bug is fixed.
>
> Thank you.
>
> -mt
>
> --
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu


2022-11-07 10:58:28

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak

On 05. 11. 2022. 00:39, Mirsad Goran Todorovac wrote:

> On 05. 11. 2022. 00:00, srinivas pandruvada wrote:
>
>> On Fri, 2022-11-04 at 23:01 +0100, Mirsad Goran Todorovac wrote:
>>> Dear all,
>>>
>>> When investigating thermald kmemleak, it occurred that the "cat
>>> /sys/kernel/debug/kmemleak"
>>> and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual
>>> amount
>>> of time.
>>>
>> This memleak occurred because of bad element in the BIOS configuration
>> of _OSC method.
>> Unfortunately fixing this in ACPI interpreter can have other side
>> effects as this is not expected. So the ACPICA team
>> (https://acpica.org/community) is looking further.
>>
>> Thanks,
>> Srinivas
>
> Dear Mr. Srinivas,
>
> Thank you for the update.
>
> However, the investigation on thermald memleak had lead to discovery
> of the RCU stalls
> in the kmemleak itself. I felt it correct to include you in Cc:
> because thermald caused the
> leaks in question. I suspected myself it is the problem specific to my
> Lenovo Ideapad 3 Intel core
> i5 laptop's BIOS, because I couldn't reproduce it on the other
> (desktop Lenovo) computer.
>
> Thank you very much,
> Mirsad
>
I felt it prudent to report that the issue is still present in 6.1.0-rc4
kernel:

Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004752] rcu:
INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 5-.... } 6
jiffies s: 1021 root: 0x20/.
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004777] rcu:
blocking rcu_node structures (internal RCU debug):
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004782] Sending
NMI from CPU 3 to CPUs 5:
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004796] NMI
backtrace for cpu 5
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004802] CPU: 5
PID: 11912 Comm: tail Not tainted 6.1.0-rc4 #1
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004808] Hardware
name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW 03/08/2022
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004810] RIP:
0010:kmemleak_seq_start+0x41/0x80
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004821] Code: e5
f5 a5 00 4c 63 e0 85 c0 78 40 e8 09 81 db ff 48 8b 05 d2 fe 88 01 4c 8d
60 f8 48 3d b0 63 03 ad 75 17 eb 32 49 8b 44 24 08 <48> 83 eb 01 4c 8d
60 f8 48 3d b0 63 03 ad 74 1d 48 85 db 7f e6 4c
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004825] RSP:
0018:ffffa7e621bc7cd8 EFLAGS: 00000202
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004829] RAX:
ffff8efb493ae008 RBX: 00000000000418cd RCX: 0000000000000001
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004833] RDX:
ffff8efc709da000 RSI: ffff8efb415937a8 RDI: ffffffffad036340
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004835] RBP:
ffffa7e621bc7ce8 R08: 0000000000002000 R09: ffff8efb44502f80
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004837] R10:
0000000000020000 R11: 0000000000000000 R12: ffff8efb493af0a0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004839] R13:
ffff8efb5c6d6200 R14: ffff8efb41593780 R15: ffffa7e621bc7ed8
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004841] FS: 
00007f55020df740(0000) GS:ffff8efce8940000(0000) knlGS:0000000000000000
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004844] CS:  0010
DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004847] CR2:
00001d8300d0c000 CR3: 0000000266aaa003 CR4: 0000000000770ee0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004850] PKRU:
55555554
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004851] Call Trace:
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004853] <TASK>
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004856]
seq_read_iter+0x169/0x420
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004862]  ?
seq_read+0xad/0xe0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004867]
seq_read+0xad/0xe0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004871]
full_proxy_read+0x59/0x90
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004880]
vfs_read+0xb2/0x2e0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004887]
ksys_read+0x61/0xe0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004893]
__x64_sys_read+0x1a/0x20
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004898]
do_syscall_64+0x58/0x80
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004905]  ?
common_interrupt+0x54/0xa0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004911]
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004917] RIP:
0033:0x7f5501f14992
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004922] Code: c0
e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3
0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff
77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004925] RSP:
002b:00007ffef4b4ce18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004929] RAX:
ffffffffffffffda RBX: 0000000000002000 RCX: 00007f5501f14992
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004931] RDX:
0000000000002000 RSI: 000055f43da40160 RDI: 0000000000000003
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004933] RBP:
000055f43da40160 R08: 000055f43da40fba R09: 000055f43da42180
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004935] R10:
0000000000000077 R11: 0000000000000246 R12: 0000000000000e5a
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004937] R13:
0000000000000003 R14: 000055f43da42180 R15: 0000000000000003
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004943] </TASK>

Thank you,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


2022-11-09 11:51:54

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak #forregzbot

[Note: this mail is primarily send for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few templates
paragraphs you might have encountered already already in similar form.]

Hi, this is your Linux kernel regression tracker.

On 04.11.22 23:01, Mirsad Goran Todorovac wrote:
>
> When investigating thermald kmemleak, it occurred that the "cat
> /sys/kernel/debug/kmemleak"
> and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual amount
> of time.
>
> Dmesg output showed expedited stalls that the commands caused NMIs and
> NMI backtraces:
>
> [ 8123.263464] rcu: INFO: rcu_preempt detected expedited stalls on
> CPUs/tasks: { 0-.... } 26 jiffies s: 3829 root: 0x1/.
> [ 8123.263500] rcu: blocking rcu_node structures (internal RCU debug):
> [ 8123.263508] Sending NMI from CPU 7 to CPUs 0:
> [ 8123.263528] NMI backtrace for cpu 0
> [ 8123.263539] CPU: 0 PID: 27898 Comm: cat Not tainted 6.1.0-rc3 #1
> [ 8123.263552] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW
> 03/08/2022

#regzbot ^introduced 28b3ae426598e
#regzbot title rcu: thermald causes a kmemleak
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained for
in the Linux kernel's documentation; above webpage explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

2022-11-29 10:04:01

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak #forregzbot

On 09.11.22 11:48, Thorsten Leemhuis wrote:
> [Note: this mail is primarily send for documentation purposes and/or for
> regzbot, my Linux kernel regression tracking bot. That's why I removed
> most or all folks from the list of recipients, but left any that looked
> like a mailing lists. These mails usually contain '#forregzbot' in the
> subject, to make them easy to spot and filter out.]
>
> On 04.11.22 23:01, Mirsad Goran Todorovac wrote:
>>
>> When investigating thermald kmemleak, it occurred that the "cat
>> /sys/kernel/debug/kmemleak"
>> and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual amount
>> of time.
>>
>> Dmesg output showed expedited stalls that the commands caused NMIs and
>> NMI backtraces:
>>
>> [ 8123.263464] rcu: INFO: rcu_preempt detected expedited stalls on
>> CPUs/tasks: { 0-.... } 26 jiffies s: 3829 root: 0x1/.
>> [ 8123.263500] rcu: blocking rcu_node structures (internal RCU debug):
>> [ 8123.263508] Sending NMI from CPU 7 to CPUs 0:
>> [ 8123.263528] NMI backtrace for cpu 0
>> [ 8123.263539] CPU: 0 PID: 27898 Comm: cat Not tainted 6.1.0-rc3 #1
>> [ 8123.263552] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW
>> 03/08/2022
>
> #regzbot ^introduced 28b3ae426598e
> #regzbot title rcu: thermald causes a kmemleak
> #regzbot ignore-activity

#regzbot invalid: likely not a regression and just a odd config

For details see the discussion ending here:
https://lore.kernel.org/all/20221124201552.GV4001@paulmck-ThinkPad-P17-Gen-1/

Ciao, Thorsten