2024-06-12 09:12:01

by Borislav Petkov

Subject: WARNING: CPU: 7 PID: 0 at kernel/time/timer_migration.c:1332 tmigr_inactive_up+0xd2/0x190

Hi,

one of our testing machines hit this today:

...

resctrl: SMBA allocation detected
resctrl: L3 monitoring detected
IPI shorthand broadcast: enabled
sched_clock: Marking stable (5784002478, 2856728882)->(8961215448, -320484088)
Timer migration: 2 hierarchy levels; 8 children per group; 2 crossnode level
------------[ cut here ]------------
registered taskstats version 1
WARNING: CPU: 7 PID: 0 at kernel/time/timer_migration.c:1332 tmigr_inactive_up+0xd2/0x190
Modules linked in:
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.10.0-rc3-1718152260480 #1
RIP: 0010:tmigr_inactive_up+0xd2/0x190
Code: cf 74 0f 45 84 c9 75 0a 41 0f b6 44 24 60 41 88 45 18 48 b8 ff ff ff ff ff ff ff 7f 49 39 45 08 74 0a 49 83 7c 24 08 00 74 02 <0f> 0b 66 90 48 83 c4 18 44 89 c8 5b 41 5c 41 5d 41 5e 41 5f 5d e9
RSP: 0018:ff53ce854029fd00 EFLAGS: 00010086
RAX: 7fffffffffffffff RBX: 000000000010ff00 RCX: 0000000000000000
RDX: 00000001701e4800 RSI: 0000000000000000 RDI: ff44640202b1e800
RBP: ff53ce854029fd40 R08: ff4464110bda61c0 R09: 0000000000000000
R10: 0000000000000080 R11: 00000000063c38d0 R12: ff44640202b1e800
R13: ff53ce854029fd50 R14: 0000000000000000 R15: ff44640202b1e850
FS: 0000000000000000(0000) GS:ff4464110bd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007403c001 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
<TASK>
? show_regs+0x6d/0x80
? __warn+0x8c/0x140
? tmigr_inactive_up+0xd2/0x190
? report_bug+0x193/0x1a0
? handle_bug+0x46/0x80
? exc_invalid_op+0x1d/0x80
? asm_exc_invalid_op+0x1f/0x30
? tmigr_inactive_up+0xd2/0x190
? __pfx_hrtimer_get_next_event+0x10/0x10
tmigr_cpu_deactivate+0xba/0x180
__get_next_timer_interrupt+0x1e8/0x310
timer_base_try_to_set_idle+0x42/0x60
? srso_alias_return_thunk+0x5/0xfbef5
tick_nohz_idle_stop_tick+0xda/0x380
do_idle+0x1cd/0x240
? complete+0x71/0x80
cpu_startup_entry+0x30/0x40
start_secondary+0x12b/0x160
common_startup_64+0x13e/0x141
</TASK>
---[ end trace 0000000000000000 ]---

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


2024-06-13 14:41:17

by Anna-Maria Behnsen

Subject: Re: WARNING: CPU: 7 PID: 0 at kernel/time/timer_migration.c:1332 tmigr_inactive_up+0xd2/0x190

Hi,

Borislav Petkov writes:

> Hi,
>
> one of our testing machines hit this today:
>

thanks for the report. Is it reproducible? If yes, might it be possible
to enable the timer_migration tracepoints and provide the trace?

I have a vague guess, but a trace output would definitely help.

Thanks,

Anna-Maria


2024-06-13 15:02:02

by Borislav Petkov

Subject: Re: WARNING: CPU: 7 PID: 0 at kernel/time/timer_migration.c:1332 tmigr_inactive_up+0xd2/0x190

Hi Anna-Maria,

On Thu, Jun 13, 2024 at 04:40:33PM +0200, Anna-Maria Behnsen wrote:
> thanks for the report. Is it reproducible?

Narasimhan just tells me that he was NOT able to reproduce it in today's run.
I guess we can wait and see.

> If yes, might it be possible to enable the timer_migration tracepoints and
> provide the trace?

Can you pls give Narasimhan exact instructions what to do the next time it
happens?

@Narasimhan: when you see it next time, you should try to run Anna-Maria's trace
with the same exact kernel and send results.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-06-13 15:52:11

by Anna-Maria Behnsen

Subject: Re: WARNING: CPU: 7 PID: 0 at kernel/time/timer_migration.c:1332 tmigr_inactive_up+0xd2/0x190

Hi Narasimhan,

Borislav Petkov writes:

> Hi Anna-Maria,
>
> On Thu, Jun 13, 2024 at 04:40:33PM +0200, Anna-Maria Behnsen wrote:
>> thanks for the report. Is it reproducible?
>
> Narasimhan just tells me that he was NOT able to reproduce it in today's run.
> I guess we can wait and see.
>
>> If yes, might it be possible to enable the timer_migration tracepoints and
>> provide the trace?
>
> Can you pls give Narasimhan exact instructions what to do the next time it
> happens?

Please add the following to the kernel command line before boot:

traceoff_on_warning trace_event=timer_migration:*

And then everything I would need should be part of the trace
file. Please extract it after the warning occurs with

cat /sys/kernel/debug/tracing/trace > trace.txt

and provide me this file.

Thanks a lot for your help!

Anna-Maria
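
The capture steps described above could be wrapped in a small helper so
that nothing is forgotten the next time the warning fires. This is only a
sketch: the function names and the default output file name are
hypothetical and not part of the thread, and it assumes the tracing
directory is reachable at the debugfs path given in the instructions.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Succeed only if both boot parameters from the instructions are present.
# The command-line file is a parameter (default: /proc/cmdline).
cmdline_has_trace_params() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -qF 'traceoff_on_warning' "$cmdline_file" &&
        grep -qF 'trace_event=timer_migration:*' "$cmdline_file"
}

# Copy the trace buffer to a file, as in the instructions.
# Defaults: the debugfs trace file and ./trace.txt.
save_trace() {
    trace_file="${1:-/sys/kernel/debug/tracing/trace}"
    out_file="${2:-trace.txt}"
    cat "$trace_file" > "$out_file"
}
```

After booting, running `cmdline_has_trace_params || echo "trace parameters missing"`
confirms the kernel was started with the requested options; once the
warning has appeared, `save_trace` (as root) writes trace.txt to the
current directory.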