2024-03-04 14:31:25

by Sam Sun

Subject: [Bug] WARNING: ODEBUG bug in __mcheck_cpu_init_timer

Dear developers and maintainers,

We encountered a kernel warning with our modified Syzkaller. It was
tested on kernel 6.8.0-rc7. The C reproducer and kernel config are
attached to this email. The bug report is listed below.

```
ODEBUG: init active (active state 0) object: ffff888063a28000 object type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514 debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514
Modules linked in:
CPU: 0 PID: 8120 Comm: syz-executor447 Not tainted 6.7.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514
Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 4c 48 8b 14 dd a0 aa 10 8b 41 56 4c 89 e6 48 c7 c7 a0 9d 10 8b e8 81 32 ef fc 90 <0f> 0b 90 90 58 83 05 28 9a bd 0a 01 48 83 c4 18 5b 5d 41 5c 41 5d
RSP: 0018:ffffc90002aef9a0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff814c00fa
RDX: ffff888108431940 RSI: ffffffff814c0107 RDI: 0000000000000001
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8b10a380
R13: ffffffff8aaf6240 R14: ffffffff81334bf0 R15: ffff88801305ada8
FS: 000055555673c3c0(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000080 CR3: 000000010635d000 CR4: 0000000000750ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
__debug_object_init+0x1cb/0x2a0 lib/debugobjects.c:651
debug_timer_init kernel/time/timer.c:777 [inline]
debug_init kernel/time/timer.c:825 [inline]
init_timer_key+0x31/0x1e0 kernel/time/timer.c:869
__mcheck_cpu_init_timer+0x70/0xf0 arch/x86/kernel/cpu/mce/core.c:2074
mce_cpu_restart arch/x86/kernel/cpu/mce/core.c:2373 [inline]
mce_cpu_restart+0x96/0xb0 arch/x86/kernel/cpu/mce/core.c:2367
csd_do_func kernel/smp.c:133 [inline]
smp_call_function_many_cond+0x121f/0x1570 kernel/smp.c:846
on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:1023
on_each_cpu include/linux/smp.h:71 [inline]
mce_restart arch/x86/kernel/cpu/mce/core.c:2380 [inline]
set_bank+0x22a/0x370 arch/x86/kernel/cpu/mce/core.c:2450
dev_attr_store+0x54/0x80 drivers/base/core.c:2366
sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x96a/0xd80 fs/read_write.c:584
ksys_write+0x122/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f2d0b70277d
Code: c3 e8 a7 1f 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe0266ce18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000003e RCX: 00007f2d0b70277d
RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 00000000000f4240 R08: 00007f2d0b757b75 R09: 00007f2d0b757b75
R10: 00007f2d0b757b75 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffe0266d078 R14: 00007ffe0266ce40 R15: 00007ffe0266ce30
</TASK>
```

It seems that when MCE is restarted, the MCE timer gets initialized
again while it is still active. If you have any questions, please
contact us.

Reported-by: Yue Sun <[email protected]>
Reported-by: xingwei lee <[email protected]>

Best Regards,
Yue


Attachments:
config (240.82 kB)
mcheck_cpu_init_timer.c (3.76 kB)

2024-03-04 17:38:17

by Borislav Petkov

Subject: Re: [Bug] WARNING: ODEBUG bug in __mcheck_cpu_init_timer

On Mon, Mar 04, 2024 at 10:26:28PM +0800, Sam Sun wrote:
> Dear developers and maintainers,
>
> We encountered a kernel warning with our modified Syzkaller. It is
> tested on kernel 6.8.0-rc7. C repro and kernel config are attached to
> this email. Bug report is listed below.

Thanks for the report. I started looking, but I'm seeing more things
fail, so it'll take a while before I get to fixing it properly.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-03-13 14:52:58

by Borislav Petkov

Subject: Re: [Bug] WARNING: ODEBUG bug in __mcheck_cpu_init_timer

On Mon, Mar 04, 2024 at 10:26:28PM +0800, Sam Sun wrote:
> Dear developers and maintainers,
>
> We encountered a kernel warning with our modified Syzkaller. It is
> tested on kernel 6.8.0-rc7. C repro and kernel config are attached to
> this email. Bug report is listed below.

See if that fixes it.

Thx.

---
From: "Borislav Petkov (AMD)" <[email protected]>
Date: Wed, 13 Mar 2024 14:48:27 +0100
Subject: [PATCH] x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Modifying an MCA bank's MCA_CTL bits, which control which error types
are to be reported, is done over the

/sys/devices/system/machinecheck/
├── machinecheck0
│   ├── bank0
│   ├── bank1
│   ├── bank10
│   ├── bank11
...

sysfs nodes by writing the new bit mask of events to enable.
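For illustration, a user-space writer for one of these nodes could look like the minimal C sketch below. The helper name `write_bank_ctl()` is hypothetical, and the sketch assumes the kernel parses the written value with base auto-detection, so a `0x`-prefixed hex mask is accepted. An error returned by the store callback (such as the -EINVAL or -ENODEV visible in set_bank() in the diff) comes back from write(2), which stdio only reports when the buffer is flushed at fclose().

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a new MCA_CTL event mask to one bank node such
 * as /sys/devices/system/machinecheck/machinecheck0/bank0.
 * Assumption: the kernel parses the value with base auto-detection, so a
 * "0x"-prefixed hex string is accepted.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_bank_ctl(const char *path, uint64_t mask)
{
	assert(path != NULL);

	FILE *f = fopen(path, "w");
	if (f == NULL) {
		int err = errno;
		fprintf(stderr, "open %s: %s\n", path, strerror(err));
		return -err;
	}

	/* Format the whole mask into the stdio buffer ... */
	if (fprintf(f, "0x%" PRIx64 "\n", mask) < 0) {
		int err = errno ? errno : EIO;
		fclose(f);
		return -err;
	}

	/*
	 * ... and hand it to the kernel with a single write(2) on close.
	 * A store-callback error is returned by that write, so it only
	 * becomes visible here.
	 */
	if (fclose(f) != 0) {
		int err = errno ? errno : EIO;
		fprintf(stderr, "write %s: %s\n", path, strerror(err));
		return -err;
	}
	return 0;
}
```

As root, on a machine with MCE support, `write_bank_ctl("/sys/devices/system/machinecheck/machinecheck0/bank0", 0xffffffffffffffffULL)` would set every bit of bank 0's mask (an example value only). Each accepted write triggers the timer teardown and reinit described next.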

When the write is accepted, the kernel deletes all current timers and
reinits all banks.

Doing that concurrently from multiple sysfs writers can lead to
initializing a timer which is already armed and in the timer wheel,
i.e., already in use:

ODEBUG: init active (active state 0) object: ffff888063a28000 object
type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514
debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514

Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code
does.
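The race and the fix can be modelled in user space. The following is a toy C sketch, not kernel code, and all names in it are hypothetical: `model_restart()` stands in for a restart after a bank write (delete the timer, re-initialize it, re-arm it), `model_timer_init()` counts the "init on an active object" condition that ODEBUG reports, and a pthread mutex plays the role of mce_sysfs_mutex. Without the mutex, one writer can re-arm the timer between another writer's delete and init; with it, each delete/init/arm sequence runs as a unit, so the count stays at zero.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for an MCE timer: "active" mimics an armed timer. */
static atomic_bool timer_active;
/* Counts "init on an active object", i.e. what ODEBUG warns about. */
static atomic_int odebug_hits;
/* Plays the role of mce_sysfs_mutex. */
static pthread_mutex_t sysfs_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool use_mutex;

static void model_timer_delete(void) { atomic_store(&timer_active, false); }

static void model_timer_init(void)
{
	/* debugobjects complains if an object is initialized while active. */
	if (atomic_load(&timer_active))
		atomic_fetch_add(&odebug_hits, 1);
}

static void model_timer_arm(void) { atomic_store(&timer_active, true); }

/* Delete, then re-init and re-arm, like a restart after a bank write. */
static void model_restart(void)
{
	if (use_mutex)
		pthread_mutex_lock(&sysfs_mutex);
	model_timer_delete();
	sched_yield(); /* widen the race window */
	model_timer_init();
	model_timer_arm();
	if (use_mutex)
		pthread_mutex_unlock(&sysfs_mutex);
}

static void *writer(void *arg)
{
	int iters = *(int *)arg;

	for (int i = 0; i < iters; i++)
		model_restart();
	return NULL;
}

/* Run concurrent writers; return how many ODEBUG-style hits occurred. */
int run_model(int nthreads, int iters, bool serialize)
{
	pthread_t tids[16];
	int started = 0;

	assert(nthreads > 0 && nthreads <= 16);
	use_mutex = serialize; /* set before any thread starts */
	atomic_store(&timer_active, false);
	atomic_store(&odebug_hits, 0);

	for (; started < nthreads; started++)
		if (pthread_create(&tids[started], NULL, writer, &iters) != 0)
			break;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	return atomic_load(&odebug_hits);
}
```

`run_model(4, 2000, true)` always returns 0. `run_model(4, 2000, false)` will usually, but not deterministically, return a non-zero count, because the unserialized interleaving depends on scheduling.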

Reported-by: Yue Sun <[email protected]>
Reported-by: xingwei lee <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b5cc557cfc37..84d41be6d06b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2500,12 +2500,14 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 		return -EINVAL;

 	b = &per_cpu(mce_banks_array, s->id)[bank];
-
 	if (!b->init)
 		return -ENODEV;

 	b->ctl = new;
+
+	mutex_lock(&mce_sysfs_mutex);
 	mce_restart();
+	mutex_unlock(&mce_sysfs_mutex);

 	return size;
 }
--
2.43.0

--
Regards/Gruss,
Boris.


2024-03-13 16:32:43

by Sam Sun

Subject: Re: [Bug] WARNING: ODEBUG bug in __mcheck_cpu_init_timer

On Wed, Mar 13, 2024 at 10:52 PM Borislav Petkov <[email protected]> wrote:
>
> On Mon, Mar 04, 2024 at 10:26:28PM +0800, Sam Sun wrote:
> > Dear developers and maintainers,
> >
> > We encountered a kernel warning with our modified Syzkaller. It is
> > tested on kernel 6.8.0-rc7. C repro and kernel config are attached to
> > this email. Bug report is listed below.
>
> See if that fixes it.
>
> Thx.

I applied this patch to the latest mainline commit, and the C
reproducer could no longer trigger this bug, so I think this patch
fixes it.

Best Regards,
Yue


2024-03-13 17:52:47

by Borislav Petkov

Subject: Re: [Bug] WARNING: ODEBUG bug in __mcheck_cpu_init_timer

On Thu, Mar 14, 2024 at 12:32:20AM +0800, Sam Sun wrote:
> I applied this patch on the latest kernel mainline commit, and the C
> repro could not trigger this bug. I think this bug is fixed by this
> patch.

Yap, it doesn't trigger anymore here either with your reproducer, but
thanks for confirming. I'll queue it after the merge window is over.

Thx.

--
Regards/Gruss,
Boris.
