LinuxLists.cc - [tip:sched/core] [sched] af7f588d8f: WARNING:at_kernel/sched/core.c:#sched_mm_cid_after

2022-12-30 07:01:40

Subject: [tip:sched/core] [sched] af7f588d8f: WARNING:at_kernel/sched/core.c:#sched_mm_cid_after_execve

Greeting,

FYI, we noticed WARNING:at_kernel/sched/core.c:#sched_mm_cid_after_execve due to commit (built with gcc-11):

commit: af7f588d8f7355bc4298dd1962d7826358fc95f0 ("sched: Introduce per-memory-map concurrency ID")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):

[ 17.705597][ T48] ------------[ cut here ]------------
[ 17.706795][ T48] WARNING: CPU: 0 PID: 48 at kernel/sched/core.c:11344 sched_mm_cid_after_execve (??:?)
[ 17.708842][ T48] Modules linked in:
[ 17.709685][ T48] CPU: 0 PID: 48 Comm: kworker/u4:0 Tainted: G T 6.2.0-rc1-00009-gaf7f588d8f73 #1
[ 17.725504][ T48] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[ 17.727337][ T48] RIP: 0010:sched_mm_cid_after_execve (??:?)
[ 17.728520][ T48] Code: 00 20 75 17 4d 85 ed 75 09 48 ff 05 b9 e6 9e 04 eb 09 48 ff 05 b8 e6 9e 04 eb 20 48 ff 05 b7 e6 9e 04 90 48 ff 05 b7 e6 9e 04 <0f> 0b 48 ff 05 b6 e6 9e 04 90 48 ff 05 b6 e6 9e 04 9c 58 48 ff 05
All code
========
0: 00 20 add %ah,(%rax)
2: 75 17 jne 0x1b
4: 4d 85 ed test %r13,%r13
7: 75 09 jne 0x12
9: 48 ff 05 b9 e6 9e 04 incq 0x49ee6b9(%rip) # 0x49ee6c9
10: eb 09 jmp 0x1b
12: 48 ff 05 b8 e6 9e 04 incq 0x49ee6b8(%rip) # 0x49ee6d1
19: eb 20 jmp 0x3b
1b: 48 ff 05 b7 e6 9e 04 incq 0x49ee6b7(%rip) # 0x49ee6d9
22: 90 nop
23: 48 ff 05 b7 e6 9e 04 incq 0x49ee6b7(%rip) # 0x49ee6e1
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 ff 05 b6 e6 9e 04 incq 0x49ee6b6(%rip) # 0x49ee6e9
33: 90 nop
34: 48 ff 05 b6 e6 9e 04 incq 0x49ee6b6(%rip) # 0x49ee6f1
3b: 9c pushfq
3c: 58 pop %rax
3d: 48 rex.W
3e: ff .byte 0xff
3f: 05 .byte 0x5

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 ff 05 b6 e6 9e 04 incq 0x49ee6b6(%rip) # 0x49ee6bf
9: 90 nop
a: 48 ff 05 b6 e6 9e 04 incq 0x49ee6b6(%rip) # 0x49ee6c7
11: 9c pushfq
12: 58 pop %rax
13: 48 rex.W
14: ff .byte 0xff
15: 05 .byte 0x5
[ 17.732461][ T48] RSP: 0000:ffffc900001afea8 EFLAGS: 00010202
[ 17.733671][ T48] RAX: fffffffffffffffe RBX: ffff88810d0fc000 RCX: 0000000000000000
[ 17.735287][ T48] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88810d0fc000
[ 17.736888][ T48] RBP: ffffc900001afec0 R08: 0000000000000000 R09: 0000000000000000
[ 17.738459][ T48] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810d0fc000
[ 17.740095][ T48] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88810d0fc000
[ 17.741661][ T48] FS: 0000000000000000(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
[ 17.743440][ T48] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 17.744623][ T48] CR2: ffff88843ffff000 CR3: 0000000003e24000 CR4: 00000000000406f0
[ 17.746241][ T48] Call Trace:
[ 17.746912][ T48] <TASK>
[ 17.747520][ T48] bprm_execve (exec.c:?)
[ 17.748358][ T48] ? call_usermodehelper_exec_work (umh.c:?)
[ 17.749462][ T48] kernel_execve (??:?)
[ 17.750332][ T48] call_usermodehelper_exec_async (umh.c:?)
[ 17.751363][ T48] ? call_usermodehelper_exec_work (umh.c:?)
[ 17.752163][ T48] ret_from_fork (??:?)
[ 17.752648][ T48] </TASK>
[ 17.752951][ T48] irq event stamp: 395
[ 17.753354][ T48] hardirqs last enabled at (403): __up_console_sem (printk.c:?)
[ 17.754946][ T48] hardirqs last disabled at (410): __up_console_sem (printk.c:?)
[ 17.756385][ T48] softirqs last enabled at (278): __do_softirq (??:?)
[ 17.757317][ T48] softirqs last disabled at (273): __irq_exit_rcu (softirq.c:?)
[ 17.758540][ T48] ---[ end trace 0000000000000000 ]---

If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]

To reproduce:

# build kernel
cd linux
cp config-6.2.0-rc1-00009-gaf7f588d8f73 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp dirs to run from a clean state.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

Attachments:

(No filename) (5.21 kB)
config-6.2.0-rc1-00009-gaf7f588d8f73 (153.42 kB)
job-script (4.75 kB)
dmesg.xz (23.19 kB)

2022-12-30 13:54:55

by Mathieu Desnoyers

Subject: Re: [tip:sched/core] [sched] af7f588d8f: WARNING:at_kernel/sched/core.c:#sched_mm_cid_after_execve

On 2022-12-30 01:48, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed WARNING:at_kernel/sched/core.c:#sched_mm_cid_after_execve due to commit (built with gcc-11):
>
[...]

> [ 17.747520][ T48] bprm_execve (exec.c:?)
> [ 17.748358][ T48] ? call_usermodehelper_exec_work (umh.c:?)
> [ 17.749462][ T48] kernel_execve (??:?)
> [ 17.750332][ T48] call_usermodehelper_exec_async (umh.c:?)
> [ 17.751363][ T48] ? call_usermodehelper_exec_work (umh.c:?)
> [ 17.752163][ T48] ret_from_fork (??:?)

I suspect this check:

void sched_mm_cid_after_execve(struct task_struct *t)
{
	struct mm_struct *mm = t->mm;
	unsigned long flags;

	WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm);

is too strict. AFAIU the usermodehelper thread is a kernel thread, which
happens to have a non-NULL mm after execve. We want to allow
usermodehelper threads to use rseq, so I think the appropriate approach
here would be to just warn if !t->mm:

	WARN_ON_ONCE(!t->mm);

We should probably apply a similar change to the warning in
sched_mm_cid_fork() as well.

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
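
The difference between the two checks discussed in the message above can be sketched in user-space C. This is only a model under stated assumptions: the struct definitions are simplified stand-ins for the kernel types, the function names strict_check_fires() and relaxed_check_fires() are hypothetical, and the PF_KTHREAD value is copied from include/linux/sched.h as best recalled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match the kernel's definition; treat the value as illustrative. */
#define PF_KTHREAD 0x00200000u

/* Simplified stand-ins: only the fields the checks look at. */
struct mm_struct {
	int placeholder;
};

struct task_struct {
	unsigned int flags;
	struct mm_struct *mm;
};

/* Original condition: fires if PF_KTHREAD is set or the task has no mm. */
bool strict_check_fires(const struct task_struct *t)
{
	return (t->flags & PF_KTHREAD) || !t->mm;
}

/* Proposed condition: fires only if the task has no mm. */
bool relaxed_check_fires(const struct task_struct *t)
{
	return !t->mm;
}
```

For a task that still carries PF_KTHREAD but has a non-NULL mm (the usermodehelper case described above), only the strict check fires. Both checks fire when mm is NULL.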

2023-01-02 14:39:27

by Borislav Petkov

Subject: Re: [tip:sched/core] [sched] af7f588d8f: WARNING:at_kernel/sched/core.c:#sched_mm_cid_after_execve

On Fri, Dec 30, 2022 at 08:46:25AM -0500, Mathieu Desnoyers wrote:
> void sched_mm_cid_after_execve(struct task_struct *t)
> {
> 	struct mm_struct *mm = t->mm;
> 	unsigned long flags;
>
> 	WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm);

Yeah, it is that check and it reproduces here trivially in my guest, so much so
that I can't even boot current tip/master in it due to the constant flood
from it.

Also, there's a null ptr deref there:

[ 1.694051] Initialise system trusted keyrings
[ 1.694915] ------------[ cut here ]------------
[ 1.695689] BUG: kernel NULL pointer dereference, address: 000000000000005c
[ 1.695714] #PF: supervisor write access in kernel mode
[ 1.695721] #PF: error_code(0x0002) - not-present page
[ 1.695728] PGD 0 P4D 0
[ 1.695739] Oops: 0002 [#1] PREEMPT SMP
[ 1.695747] CPU: 0 PID: 126 Comm: kworker/u32:1 Not tainted 6.2.0-rc2+ #2
[ 1.695754] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.695760] RIP: 0010:_raw_spin_lock+0x17/0x30
[ 1.702127] WARNING: CPU: 13 PID: 115 at kernel/sched/core.c:11346 sched_mm_cid_after_execve+0xd5/0xf0
[ 1.699309] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 c8 ea 64 7e 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 00 00 00 0f 1f 80 00
[ 1.702857] Modules linked in:
[ 1.699309] RSP: 0018:ffffc900004afe78 EFLAGS: 00010046
[ 1.703670]
[ 1.699309]
[ 1.704665] CPU: 13 PID: 115 Comm: kworker/u32:0 Not tainted 6.2.0-rc2+ #2
[ 1.699309] RAX: 0000000000000000 RBX: ffff88800d323d00 RCX: 0000000000000000
[ 1.699309] RDX: 0000000000000001 RSI: ffff88800d323d00 RDI: 000000000000005c
[ 1.699309] RBP: 000000000000005c R08: 0000000000000064 R09: ffffc900004afb30
[ 1.699309] R10: 0000000000000000 R11: fffffffffffffffe R12: 0000000000000246
[ 1.699309] R13: 0000000000000000 R14: 00000000fffffffe R15: ffff88800d323d00
[ 1.699309] FS: 0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
[ 1.699309] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.699309] CR2: 000000000000005c CR3: 000000000220a000 CR4: 00000000003506f0
[ 1.699309] Call Trace:
[ 1.699309] <TASK>
[ 1.699309] sched_mm_cid_after_execve+0x52/0xf0
[ 1.706650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.699309] bprm_execve+0x323/0x600
[ 1.707390] RIP: 0010:sched_mm_cid_after_execve+0xd5/0xf0
[ 1.699309] kernel_execve+0x15f/0x1c0
[ 1.707967] Code: 00 00 74 04 f0 80 0b 02 48 8b 1c 24 48 8b 6c 24 08 4c 8b 64 24 10 4c 8b 6c 24 18 4c 8b 74 24 20 48 83 c4 28 c3 cc cc cc cc 90 <0f> 0b 90 e9 65 ff ff ff 41 be ff ff ff ff eb 9d 66 66 2e 0f 1f 84
[ 1.699309] call_usermodehelper_exec_async+0xd1/0x190
[ 1.708882] RSP: 0018:ffffc90000457e80 EFLAGS: 00010246
[ 1.699309] ? __pfx_call_usermodehelper_exec_async+0x10/0x10
[ 1.709839]
[ 1.699309] ret_from_fork+0x2c/0x50
[ 1.710739] RAX: fffffffffffffffe RBX: ffff88800cad8f40 RCX: 0000000000000000
[ 1.699309] </TASK>
[ 1.714247] RDX: ffffc90000457dc8 RSI: ffff88800cad8f40 RDI: ffff88800cad8f40
[ 1.699309] Modules linked in:
[ 1.715270] RBP: ffff88800dd35400 R08: 0000000000000064 R09: ffffc90000457b30
[ 1.699309] CR2: 000000000000005c
[ 1.699309] ---[ end trace 0000000000000000 ]---

... flood of the above...

> is too strict. AFAIU the usermodehelper thread is a kernel thread, which
> happens to have a non-NULL mm after execve. We want to allow usermodehelper
> threads to use rseq, so I think the appropriate approach here would be to
> just warn if !t->mm:
>
> 	WARN_ON_ONCE(!t->mm);

You need at least this to avoid the null ptr deref too:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 048ec2417990..5c920c94a6b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11340,10 +11340,13 @@ void sched_mm_cid_before_execve(struct task_struct *t)

 void sched_mm_cid_after_execve(struct task_struct *t)
 {
-	struct mm_struct *mm = t->mm;
+	struct mm_struct *mm;
 	unsigned long flags;

-	WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm);
+	if (WARN_ON_ONCE(!t->mm))
+		return;
+
+	mm = t->mm;

 	local_irq_save(flags);
 	t->mm_cid = mm_cid_get(mm);
---

which gives the below.

I'm not sure, though, what the rules are for those kworker threads
having a ->mm...

[ 1.734104] ------------[ cut here ]------------
[ 1.734144] Initialise system trusted keyrings
[ 1.734553] WARNING: CPU: 9 PID: 109 at kernel/sched/core.c:11346 sched_mm_cid_after_execve+0xcb/0xe0
[ 1.752756] workingset: timestamp_bits=61 max_order=19 bucket_order=0
[ 1.754187] Modules linked in:
[ 1.768160] 9p: Installing v9fs 9p2000 file system support
[ 1.768640]
[ 1.768876] Key type asymmetric registered
[ 1.769048] CPU: 9 PID: 109 Comm: kworker/u32:1 Not tainted 6.2.0-rc2+ #9
[ 1.769207] Asymmetric key parser 'x509' registered
[ 1.769397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.769651] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249)
[ 1.769833] RIP: 0010:sched_mm_cid_after_execve+0xcb/0xe0
[ 1.770162] io scheduler mq-deadline registered
[ 1.770462] Code: 00 00 74 04 f0 80 0b 02 48 8b 1c 24 48 8b 6c 24 08 4c 8b 64 24 10 4c 8b 6c 24 18 4c 8b 74 24 20 48 83 c4 28 c3 cc cc cc cc 90 <0f> 0b 90 eb d9 41 be ff ff ff ff eb a0 0f 1f 84 00 00 00 00 00 90
[ 1.810713] RSP: 0018:ffffc90000427e80 EFLAGS: 00010246
[ 1.823527] RAX: fffffffffffffffe RBX: ffff88800cb88000 RCX: 0000000000000000
[ 1.824425] RDX: ffffc90000427dc8 RSI: ffff88800cb88000 RDI: ffff88800cb88000
[ 1.825266] acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed
[ 1.825564] RBP: ffff88800d2d8200 R08: 0000000000000064 R09: ffffc90000427b30
[ 1.825914] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[ 1.826068] R10: 0000000000000000 R11: fffffffffffffffe R12: fffffffffffffffe
[ 1.839784] ACPI: button: Power Button [PWRF]
[ 1.840327] R13: 0000000000000000 R14: 00000000fffffffe R15: ffff88800cb88000
[ 1.855532] FS: 0000000000000000(0000) GS:ffff88807dc40000(0000) knlGS:0000000000000000
[ 1.856681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.857403] CR2: 0000000000000000 CR3: 000000000220a000 CR4: 00000000003506e0
[ 1.858264] Call Trace:
[ 1.858643] <TASK>
[ 1.871528] bprm_execve+0x323/0x600
[ 1.872027] kernel_execve+0x15f/0x1c0
[ 1.872505] call_usermodehelper_exec_async+0xd1/0x190
[ 1.873120] ? __pfx_call_usermodehelper_exec_async+0x10/0x10
[ 1.873800] ret_from_fork+0x2c/0x50
[ 1.874259] </TASK>
[ 1.874582] ---[ end trace 0000000000000000 ]---

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
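
In the NULL pointer dereference reported in the message above, the fault address (CR2) and RDI are both 0x5c, and the faulting RIP is in _raw_spin_lock called from sched_mm_cid_after_execve. That pattern is consistent with taking the address of a lock member through a NULL mm pointer. The sketch below shows the arithmetic only. The struct fake_mm layout is hypothetical and is padded purely so that the member lands at offset 0x5c; it is not the real struct mm_struct.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the kernel's struct mm_struct. The padding
 * size is chosen only so that 'lock' sits at offset 0x5c.
 */
struct fake_mm {
	char other_fields[0x5c];
	int lock; /* stand-in for a spinlock member */
};

/*
 * Address that a callee would receive for &mm->lock, computed from the
 * base address without dereferencing anything.
 */
uintptr_t lock_address(uintptr_t mm_base)
{
	return mm_base + offsetof(struct fake_mm, lock);
}
```

With a base of 0 (a NULL mm), lock_address() yields 0x5c, which is the address a lock routine would then try to write to.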

2023-01-02 16:45:56

by tip-bot2 for Haifeng Xu

Subject: [tip: sched/core] sched/rseq: Fix concurrency ID handling of usermodehelper kthreads

The following commit has been merged into the sched/core branch of tip:

Commit-ID: bbd0b031509b880b4e9a880bb27ca2a30ad081ab
Gitweb: https://git.kernel.org/tip/bbd0b031509b880b4e9a880bb27ca2a30ad081ab
Author: Mathieu Desnoyers <[email protected]>
AuthorDate: Mon, 02 Jan 2023 10:12:16 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 02 Jan 2023 16:34:12 +01:00

sched/rseq: Fix concurrency ID handling of usermodehelper kthreads

sched_mm_cid_after_execve() does not expect NULL t->mm, but it may happen
if a usermodehelper kthread fails when attempting to execute a binary.

sched_mm_cid_fork() can be issued from a usermodehelper kthread, which
has t->flags PF_KTHREAD set.

Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID")
Reported-by: kernel test robot <[email protected]>
Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/oe-lkp/[email protected]
---
kernel/sched/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 048ec24..f99ee69 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11343,8 +11343,8 @@ void sched_mm_cid_after_execve(struct task_struct *t)
 	struct mm_struct *mm = t->mm;
 	unsigned long flags;

-	WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm);
-
+	if (!mm)
+		return;
 	local_irq_save(flags);
 	t->mm_cid = mm_cid_get(mm);
 	t->mm_cid_active = 1;
@@ -11354,7 +11354,7 @@ void sched_mm_cid_after_execve(struct task_struct *t)

 void sched_mm_cid_fork(struct task_struct *t)
 {
-	WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm || t->mm_cid != -1);
+	WARN_ON_ONCE(!t->mm || t->mm_cid != -1);
 	t->mm_cid_active = 1;
 }
 #endif
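
The control flow of the patched sched_mm_cid_after_execve() can be modeled in user space as follows. This is a minimal sketch, not the kernel code: the structs are simplified stand-ins, model_mm_cid_get() is a hypothetical replacement for mm_cid_get() that just hands out increasing IDs, and the interrupt disabling done by the real function is left out.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types. */
struct mm_struct {
	int next_cid; /* hypothetical counter used only by this model */
};

struct task_struct {
	struct mm_struct *mm;
	int mm_cid;
	int mm_cid_active;
};

/* Hypothetical stand-in for mm_cid_get(): returns 0, 1, 2, ... per mm. */
static int model_mm_cid_get(struct mm_struct *mm)
{
	return mm->next_cid++;
}

/*
 * Mirrors the patched flow: a task without an mm (for example a
 * usermodehelper kthread whose exec failed) is left untouched instead
 * of triggering a warning or a NULL dereference.
 */
void model_after_execve(struct task_struct *t)
{
	struct mm_struct *mm = t->mm;

	if (!mm)
		return;
	t->mm_cid = model_mm_cid_get(mm);
	t->mm_cid_active = 1;
}
```

Calling model_after_execve() on a task whose mm is NULL leaves mm_cid and mm_cid_active unchanged, while a task with a valid mm gets an ID and is marked active.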