2023-07-05 16:48:51

by Richard W.M. Jones

[permalink] [raw]
Subject: Re: qemu-x86_64 booting with 8.0.0 stil see int3: when running LTP tracing testing.

On Wed, Jul 05, 2023 at 06:36:43PM +0200, Richard Henderson wrote:
> No, I thought it would be the fix for 8.0.0.

Thanks for the suggestion anyway.

Am I right in thinking that tb_invalidate_phys_page() ought to be
called when the kernel self-modifies its text segment? If there's
some function that we expect to be called in this case then I can
instrument it.

Rich.

> r~
>
> On Wed, 5 July 2023, 18:35 Richard W.M. Jones, <[email protected]> wrote:
>
> On Wed, Jul 05, 2023 at 06:32:30PM +0200, Richard Henderson wrote:
> > https://gitlab.com/qemu-project/qemu/-/commit/
> > 3307e08c6f142bb3d2406cfbc0ee19359748b51a
>
> I've got that patch in my qemu tree.? Do you suspect it's
> wrong and want me to try reverting it?
>
> Rich.
>
> > r~
> >
> > On Wed, 5 July 2023, 18:28 Richard W.M. Jones, <[email protected]> wrote:
> >
> >? ? ?On Tue, Jul 04, 2023 at 08:46:20AM +0100, Richard W.M. Jones wrote:
> >? ? ?> We have been having the same sort of problem
> >? ? ?> (https://bugzilla.redhat.com/show_bug.cgi?id=2216496).? It's
> another
> >? ? ?> of those bugs that requires hundreds or thousands of boot
> iterations
> >? ? ?> before you can see it.? There is a test in comment 27 but it
> requires
> >? ? ?> guestfish and some hacking to work.? I'll try to come up with a
> >? ? ?> cleaner test later.
> >? ? ?>
> >? ? ?> We see stack traces like:
> >? ? ?>
> >? ? ?> [? ? 3.081939] clocksource: acpi_pm: mask: 0xffffff max_cycles:
> 0xffffff,
> >? ? ?max_idle_ns: 2085701024 ns
> >? ? ?> [? ? 3.082266] clocksource: Switched to clocksource acpi_pm
> >? ? ?> [? ? 3.090201] NET: Registered PF_INET protocol family
> >? ? ?> [? ? 3.093098] int3: 0000 [#1] PREEMPT SMP NOPTI
> >? ? ?> [? ? 3.093098] CPU: 3 PID: 1 Comm: swapper/0 Not tainted
> >? ? ?6.4.0-10173-ga901a3568fd2 #8
> >? ? ?> [? ? 3.093098] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS
> >? ? ?rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> >? ? ?> [? ? 3.093098] RIP: 0010:__mod_timer+0x1c3/0x370
> >? ? ?> [? ? 3.093098] Code: 00 00 41 bd ff ff ff ff 31 d2 4c 89 f6 4c 89
> ff e8
> >? ? ?f2 ef ff ff 41 89 c4 85 c0 75 09 83 e3 01 0f 85 54 ff ff ff 41 8b 4f
> 20 66
> >? ? ?<90> f7 c1 00 00 10 00 0f 84 23 01 00 00 48 c7 c3 40 cc 01 00 65 48
> >? ? ?> [? ? 3.093098] RSP: 0018:ffffaf1600013e00 EFLAGS: 00000046
> >? ? ?> [? ? 3.093098] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> >? ? ?0000000000280003
> >? ? ?> [? ? 3.093098] RDX: 0000000000000000 RSI: ffff9aa90fd9dec0 RDI:
> >? ? ?ffffffff8441e4b8
> >? ? ?> [? ? 3.093098] RBP: 00000000fffc200d R08: ffffffff8441e4a0 R09:
> >? ? ?ffffffff8441e4b8
> >? ? ?> [? ? 3.093098] R10: 0000000000000001 R11: 000000000002e990 R12:
> >? ? ?0000000000000000
> >? ? ?> [? ? 3.093098] R13: 00000000ffffffff R14: ffff9aa90fd9dec0 R15:
> >? ? ?ffffffff8441e4b8
> >? ? ?> [? ? 3.093098] FS:? 0000000000000000(0000) GS:ffff9aa90fd80000
> (0000)
> >? ? ?knlGS:0000000000000000
> >? ? ?> [? ? 3.093098] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >? ? ?> [? ? 3.093098] CR2: 0000000000000000 CR3: 000000004e02e000 CR4:
> >? ? ?0000000000750ee0
> >? ? ?> [? ? 3.093098] PKRU: 55555554
> >? ? ?> [? ? 3.093098] Call Trace:
> >? ? ?> [? ? 3.093098]? <TASK>
> >? ? ?> [? ? 3.093098]? ? die+0x31/0x80
> >? ? ?> [? ? 3.093098]? ? exc_int3+0x10e/0x120
> >? ? ?> [? ? 3.093098]? ? asm_exc_int3+0x39/0x40
> >? ? ?> [? ? 3.093098]? ? __mod_timer+0x1c3/0x370
> >? ? ?> [? ? 3.093098]? ? __mod_timer+0x1c3/0x370
> >? ? ?> [? ? 3.093098]? queue_delayed_work_on+0x23/0x30
> >? ? ?> [? ? 3.093098]? neigh_table_init+0x1bb/0x2e0
> >? ? ?> [? ? 3.093098]? arp_init+0x12/0x50
> >? ? ?> [? ? 3.093098]? inet_init+0x15b/0x2f0
> >? ? ?> [? ? 3.093098]? ? __pfx_inet_init+0x10/0x10
> >? ? ?> [? ? 3.093098]? do_one_initcall+0x58/0x230
> >? ? ?> [? ? 3.093098]? kernel_init_freeable+0x199/0x2d0
> >? ? ?> [? ? 3.093098]? ? __pfx_kernel_init+0x10/0x10
> >? ? ?> [? ? 3.093098]? kernel_init+0x15/0x1b0
> >? ? ?> [? ? 3.093098]? ret_from_fork+0x2c/0x50
> >? ? ?> [? ? 3.093098]? </TASK>
> >? ? ?> [? ? 3.093098] Modules linked in:
> >? ? ?> [? ? 3.093098] ---[ end trace 0000000000000000 ]---
> >? ? ?> [? ? 3.093098] RIP: 0010:__mod_timer+0x1c3/0x370
> >? ? ?> [? ? 3.093098] Code: 00 00 41 bd ff ff ff ff 31 d2 4c 89 f6 4c 89
> ff e8
> >? ? ?f2 ef ff ff 41 89 c4 85 c0 75 09 83 e3 01 0f 85 54 ff ff ff 41 8b 4f
> 20 66
> >? ? ?<90> f7 c1 00 00 10 00 0f 84 23 01 00 00 48 c7 c3 40 cc 01 00 65 48
> >? ? ?> [? ? 3.093098] RSP: 0018:ffffaf1600013e00 EFLAGS: 00000046
> >? ? ?> [? ? 3.093098] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> >? ? ?0000000000280003
> >? ? ?> [? ? 3.093098] RDX: 0000000000000000 RSI: ffff9aa90fd9dec0 RDI:
> >? ? ?ffffffff8441e4b8
> >? ? ?> [? ? 3.093098] RBP: 00000000fffc200d R08: ffffffff8441e4a0 R09:
> >? ? ?ffffffff8441e4b8
> >? ? ?> [? ? 3.093098] R10: 0000000000000001 R11: 000000000002e990 R12:
> >? ? ?0000000000000000
> >? ? ?> [? ? 3.093098] R13: 00000000ffffffff R14: ffff9aa90fd9dec0 R15:
> >? ? ?ffffffff8441e4b8
> >? ? ?> [? ? 3.093098] FS:? 0000000000000000(0000) GS:ffff9aa90fd80000
> (0000)
> >? ? ?knlGS:0000000000000000
> >? ? ?> [? ? 3.093098] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >? ? ?> [? ? 3.093098] CR2: 0000000000000000 CR3: 000000004e02e000 CR4:
> >? ? ?0000000000750ee0
> >? ? ?> [? ? 3.093098] PKRU: 55555554
> >? ? ?> [? ? 3.093098] Kernel panic - not syncing: Fatal exception in
> interrupt
> >? ? ?>
> >? ? ?> There are many variations, but the common pattern seems to be
> >? ? ?> <something in the clock or timer code> -> int3 exception
> >? ? ?>
> >? ? ?> It only happens under qemu TCG (software emulation).
> >? ? ?>
> >? ? ?> It goes away if we recompile qemu without MTTCG support.
> >? ? ?>
> >? ? ?> It only happens with -smp enabled, we are using qemu -smp 4
> >? ? ?>
> >? ? ?> We are using qemu-system-x86_64 full system emulation on x86_64
> host
> >? ? ?> (ie. forcing KVM off).
> >? ? ?>
> >? ? ?> It happens with the latest upstream kernel and qemu, compiled from
> >? ? ?> source.
> >
> >? ? ?I got a bit further on this one.
> >
> >? ? ?The bug happens in the code that updates the static branch used for
> at
> >? ? ?least these two keys:
> >
> >? ? ?? static DEFINE_STATIC_KEY_FALSE(__sched_clock_stable);
> >? ? ?? DEFINE_STATIC_KEY_FALSE(timers_migration_enabled);
> >
> >? ? ?There are probably others (it seems a generic problem with how static
> >? ? ?branches are handled by TCG), but I only see the bug for those two.
> >
> >? ? ?When the static branch is updated we end up calling
> >? ? ?arch/x86/kernel/alternative.c:text_poke_bp_batch().? It's best to
> read
> >? ? ?the description of that function to see where int3 is used:
> >
> >? ? ?? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> tree/
> >? ? ?arch/x86/kernel/alternative.c#n2086
> >
> >? ? ?I modified the qemu TCG int3 helper so it dumps the code at %rip when
> >? ? ?the interrupt fires, and I can actually see the changes in the above
> >? ? ?function happen, first int3 being written, then the end of the nop,
> >? ? ?then the int3 being overwritten with the first byte of the nop.
> >
> >? ? ?Unfortunately the int3 still fires after the code has been completely
> >? ? ?rewritten to its final (ie nop) value.
> >
> >? ? ?This seems to indicate to me that neither the self-write to the
> kernel
> >? ? ?text segment, nor sync_core (implemented by a "iret to self" trick)
> >? ? ?invalidates the qemu TCG translation block containing the old int3
> >? ? ?helper call.? Thus we (qemu) never "see" the new nop, we keep
> >? ? ?emulating int3, and that causes the kernel to crash.
> >
> >? ? ?I added print statements inside tb_invalidate_phys_page() and this
> >? ? ?function seems never to be called at all.? It's my understanding that
> >? ? ?at least the kernel writing to its text segment ought to cause
> >? ? ?tb_invalidate_phys_page() to be called, but I'm not super-familiar
> >? ? ?with this qemu code.
> >
> >? ? ?Richard Henderson - do you have any suggestions?
> >
> >? ? ?Rich.
> >
> >? ? ?--
> >? ? ?Richard Jones, Virtualization Group, Red Hat http://people.redhat.com
> /
> >? ? ?~rjones
> >? ? ?Read my programming and virtualization blog: http://
> rwmj.wordpress.com
> >? ? ?libguestfs lets you edit virtual machines.? Supports shell scripting,
> >? ? ?bindings from many languages.? http://libguestfs.org
> >
> >
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/
> ~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-top is 'top' for virtual machines.? Tiny program with many
> powerful monitoring features, net stats, disk stats, logging, etc.
> http://people.redhat.com/~rjones/virt-top
>
>

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html



2023-07-06 06:26:54

by Richard Henderson

[permalink] [raw]
Subject: Re: qemu-x86_64 booting with 8.0.0 stil see int3: when running LTP tracing testing.

On 7/5/23 17:40, Richard W.M. Jones wrote:
> On Wed, Jul 05, 2023 at 06:36:43PM +0200, Richard Henderson wrote:
>> No, I thought it would be the fix for 8.0.0.
>
> Thanks for the suggestion anyway.
>
> Am I right in thinking that tb_invalidate_phys_page() ought to be
> called when the kernel self-modifies its text segment? If there's
> some function that we expect to be called in this case then I can
> instrument it.

All such go through notdirty_write(), which has a convenient tracepoint. Recall there are
migration and vga dirty pages as well, which also go through here, so disabling vga if
possible may help.

Code stuff should then go through tb_invalidate_phys_range_fast(), so not
tb_invalidate_phys_page() at all.


r~