Hi, Thorsten here, the Linux kernel's regression tracker.

I noticed a report about a regression in bugzilla.kernel.org.

Thomas, I wonder if it's caused by your topology changes. But it's just
a wild guess and I might be totally wrong there, so feel free to ignore
this mail. I already asked for a bit more log output and a bisection in
the ticket.

To quote from https://bugzilla.kernel.org/show_bug.cgi?id=218698

> Environment:
>
> Host OS: CentOS 9
> Host kernel: 6.9.0-rc1
> KVM commit: 9bc60f73
> Qemu commit: e5c6528d
> Guest kernel: 6.9-rc2
> Guest commit: 39cd87c4eb2b893354f3b850f916353f2658ae6f
> Guest repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
>
> Bug detail description:
>
> When hot adding a vCPU to the guest, the guest happens Call Trace and reboot.
>
> Latest successful guest kernel version: 6.8.0-rc7 (commit: 90d35da658da8cff0d4ecbb5113f5fac9d00eb72).
>
>
> Reproduce steps:
>
> 1. Create guest:
>
> qemu-system-x86_64 -accel kvm -cpu host -smp 4,maxcpus=128 -drive file=/share/xvs/var/tmp-img_vcpu_hot_add_1712412537,if=none,id=virtio-disk0 -device virtio-blk-pci,drive=virtio-disk0,bootindex=0 -m 4096 -monitor pty -daemonize -vnc :16147 -device virtio-net-pci,netdev=nic0,mac=00:c0:82:16:fa:b0 -netdev tap,id=nic0,br=virbr0,helper=/usr/local/libexec/qemu-bridge-helper,vhost=on
>
> 2. Add vCPU to guest
>
> echo 'device_add driver=host-x86_64-cpu,socket-id=0,core-id=4,thread-id=0' > /dev/pts/2
>
> cat /dev/pts/2
>
>
> Error log:
>
> [ 49.782913] Call Trace:
> [ 49.783039] <TASK>
> [ 49.783147] ? __die+0x24/0x70
> [ 49.783309] ? page_fault_oops+0x82/0x150
> [ 49.783518] ? kernelmode_fixup_or_oops+0x84/0x110
> [ 49.783753] ? exc_page_fault+0xb9/0x160
> [ 49.783948] ? asm_exc_page_fault+0x26/0x30
> [ 49.784144] ? cpu_update_apic+0x1c/0x70
> [ 49.784327] generic_processor_info+0x7e/0x160
> [ 49.784541] acpi_register_lapic+0x19/0x80
> [ 49.784732] acpi_map_cpu+0x26/0x90
> [ 49.784896] acpi_processor_get_info+0x256/0x490
> [ 49.785344] acpi_processor_add+0xb9/0x1f0
> [ 49.785760] acpi_bus_attach+0x13b/0x220
> [ 49.786158] acpi_bus_scan+0x7e/0x1e0
> [ 49.786548] acpi_device_hotplug+0x198/0x2b0
> [ 49.786963] acpi_hotplug_work_fn+0x1e/0x30
> [ 49.787363] process_one_work+0x159/0x370
> [ 49.787790] worker_thread+0x302/0x420
> [ 49.788184] ? __pfx_worker_thread+0x10/0x10
> [ 49.788592] kthread+0xe3/0x120
> [ 49.788955] ? __pfx_kthread+0x10/0x10
> [ 49.789335] ret_from_fork+0x31/0x50
> [ 49.789720] ? __pfx_kthread+0x10/0x10
> [ 49.790100] ret_from_fork_asm+0x1b/0x30
> [ 49.790491] </TASK>
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
P.S.: let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:
#regzbot introduced: v6.8-rc7..v6.9-rc2
#regzbot title: Call Trace when adding vCPU to guest
#regzbot from: "Ma, XiangfeiX" <[email protected]>
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218698
#regzbot ignore-activity
On Wed, Apr 10 2024 at 09:34, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
>
> I noticed a report about a regression in bugzilla.kernel.org.
>
> Thomas, I wonder if it's caused by your topology changes. But it's just
> a wild guess and I might be totally wrong there, so feel free to ignore
> this mail. I already asked for a bit more log output and a bisection in
> the ticket.
>
> To quote from https://bugzilla.kernel.org/show_bug.cgi?id=218698
>
>> Environment:
>>
>> Host OS: CentOS 9
>> Host kernel: 6.9.0-rc1
>> KVM commit: 9bc60f73
>> Qemu commit: e5c6528d
>> Guest kernel: 6.9-rc2
>> Guest commit: 39cd87c4eb2b893354f3b850f916353f2658ae6f
>> Guest repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>
>>
>> Bug detail description:
>>
>> When hot adding a vCPU to the guest, the guest happens Call Trace and reboot.
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=a9025cd1c673a8d6eefc79d911075b8b452eba8f
It'll be in rc4
Thanks,
tglx
On 10.04.24 15:38, Thomas Gleixner wrote:
> On Wed, Apr 10 2024 at 09:34, Linux regression tracking (Thorsten Leemhuis) wrote:
>> To quote from https://bugzilla.kernel.org/show_bug.cgi?id=218698
> [...]
>>>
>>> When hot adding a vCPU to the guest, the guest happens Call Trace and reboot.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=a9025cd1c673a8d6eefc79d911075b8b452eba8f
>
> It'll be in rc4
Ahh, splendid, thx for replying! Ciao, Thorsten
#regzbot fix: a9025cd1c673a8d6eefc79d911075b8b452eb
On Wed, Apr 10 2024 at 15:48, Thorsten Leemhuis wrote:
> On 10.04.24 15:38, Thomas Gleixner wrote:
>> On Wed, Apr 10 2024 at 09:34, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> To quote from https://bugzilla.kernel.org/show_bug.cgi?id=218698
>> [...]
>>>>
>>>> When hot adding a vCPU to the guest, the guest happens Call Trace and reboot.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=a9025cd1c673a8d6eefc79d911075b8b452eba8f
>>
>> It'll be in rc4
>
> Ahh, splendid, thx for replying! Ciao, Thorsten
>
> #regzbot fix: a9025cd1c673a8d6eefc79d911075b8b452eb
Ooops. No!
I just read back and noticed that this is a report against some unknown
kernel:
> [ 49.782913] Call Trace:
> [ 49.783039] <TASK>
> [ 49.783147] ? __die+0x24/0x70
> [ 49.783309] ? page_fault_oops+0x82/0x150
> [ 49.783518] ? kernelmode_fixup_or_oops+0x84/0x110
> [ 49.783753] ? exc_page_fault+0xb9/0x160
> [ 49.783948] ? asm_exc_page_fault+0x26/0x30
> [ 49.784144] ? cpu_update_apic+0x1c/0x70
> [ 49.784327] generic_processor_info+0x7e/0x160
> [ 49.784541] acpi_register_lapic+0x19/0x80
# cd linus/linux
# gcur
master 2c71fdf02a95: Merge tag 'drm-fixes-2024-04-09' of https://gitlab.freedesktop.org/drm/kernel
# git grep generic_processor_info
#
generic_processor_info() was removed during the 6.9 merge window with
the topology rework before v6.9-rc1.
So the guest kernel _cannot_ be v6.9-rc2 at all.
Thanks,
tglx
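
[Editorial sketch, not part of the original mail] The check above (does
every function named in the trace still exist in the tree the reporter
claims to run?) could be scripted roughly as follows. The function names
trace_symbols and missing_in_rev are made up for this illustration; the
only assumptions are a POSIX shell, grep with -o/-E support, and a kernel
git checkout for the second function.

```shell
#!/bin/sh
# Print the unique function names found in an oops call trace on stdin.
# Matches entries of the form symbol+0xOFFSET/0xSIZE, as in the traces
# quoted in this thread; mail quote markers and "?" prefixes are ignored.
trace_symbols() {
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' |
        sed 's/+0x.*//' |
        sort -u
}

# Print every symbol from the trace on stdin that does not occur anywhere
# in the given revision. Must be run inside a kernel git tree; $1 is a tag
# or commit. Compiler-generated suffixes (e.g. ".cold") are not handled.
missing_in_rev() {
    rev=$1
    trace_symbols | while read -r sym; do
        git grep -q -w -e "$sym" "$rev" -- || echo "$sym"
    done
}
```

Feeding the reporter's trace to "missing_in_rev v6.9-rc2" inside a mainline
checkout would be one way to spot a mismatch like the one described above:
any name it prints is absent from that revision's sources, which suggests
the trace came from a different kernel.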
On 10.04.24 16:52, Thomas Gleixner wrote:
> On Wed, Apr 10 2024 at 15:48, Thorsten Leemhuis wrote:
>> On 10.04.24 15:38, Thomas Gleixner wrote:
>>> On Wed, Apr 10 2024 at 09:34, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> To quote from https://bugzilla.kernel.org/show_bug.cgi?id=218698
>>> [...]
>>>>> When hot adding a vCPU to the guest, the guest happens Call Trace and reboot.
>>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=a9025cd1c673a8d6eefc79d911075b8b452eba8f
>>> It'll be in rc4
>> Ahh, splendid, thx for replying! Ciao, Thorsten
>>
>> #regzbot fix: a9025cd1c673a8d6eefc79d911075b8b452eb
>
> Ooops. No!
> [...]
Ha, happens. :-D
> generic_processor_info() was removed during the 6.9 merge window with
> the topology rework before v6.9-rc1.
>
> So the guest kernel _cannot_ be v6.9-rc2 at all.
Ma, XiangfeiX (the reporter) is CCed and might be able to clarify.
Meanwhile Dongli Zhang (now CCed, too) also joined the bug report and
added this comment:
"""
I can reproduce as well. But the callstack is different. It finally
reaches at topo_set_cpuids().
/home/zhang/kvm/qemu-8.2.0/build/qemu-system-x86_64 \
-hda disk.qcow2 -m 8G -smp 4,maxcpus=128 -vnc :5 -enable-kvm -cpu host \
-netdev user,id=user0,hostfwd=tcp::5025-:22 \
-device virtio-net-pci,netdev=user0,id=net0,mac=12:14:10:12:14:16,bus=pci.0,addr=0x3 \
-kernel /home/zhang/img/debug/mainline-linux/arch/x86_64/boot/bzImage \
-append "root=/dev/sda3 init=/sbin/init text loglevel=7 console=ttyS0" \
-monitor stdio
(qemu) device_add driver=host-x86_64-cpu,socket-id=0,core-id=4,thread-id=0
[ 27.060885] BUG: unable to handle page fault for address: ffffffff83a69778
[ 27.061954] #PF: supervisor write access in kernel mode
[ 27.062604] #PF: error_code(0x0003) - permissions violation
[ 27.063286] PGD 40c49067 P4D 40c49067 PUD 40c4a063 PMD 102213063 PTE 8000000040a69021
[ 27.064273] Oops: 0003 [#1] PREEMPT SMP PTI
[ 27.064799] CPU: 2 PID: 39 Comm: kworker/u256:1 Not tainted 6.9.0-rc3 #1
[ 27.065611] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 27.066992] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 27.067669] RIP: 0010:topo_set_cpuids+0x26/0x70
[ 27.068242] Code: 90 90 90 90 48 8b 05 d9 bd da 01 48 85 c0 74 31 89 ff 48 8d 04 b8 89 30 48 8b 05 bd bd da 01 48 85 c0 74 3c 48 8d 04 b8 89 10 <f0> 48 0f ab 3d 79 9e 97 01 f0 48 0f ab 3d 40 03 df 01 c3 cc cc cc
[ 27.070471] RSP: 0018:ffffc3980034bc28 EFLAGS: 00010286
[ 27.071130] RAX: ffffa0bbb6f15160 RBX: 0000000000000004 RCX: 0000000000000040
[ 27.072004] RDX: 0000000000000004 RSI: 0000000000000004 RDI: 0000000000000004
[ 27.072858] RBP: ffffa0ba80d68540 R08: 000000000001d4c0 R09: 0000000000000001
[ 27.073713] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000004
[ 27.074565] R13: ffffa0ba883b6c10 R14: ffffa0ba809a9040 R15: 0000000000000000
[ 27.075418] FS: 0000000000000000(0000) GS:ffffa0bbb6e80000(0000) knlGS:0000000000000000
[ 27.076424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 27.077121] CR2: ffffffff83a69778 CR3: 000000010f946006 CR4: 0000000000370ef0
[ 27.077976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 27.078830] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 27.079685] Call Trace:
[ 27.080031] <TASK>
[ 27.080341] ? __die+0x1f/0x70
[ 27.080755] ? page_fault_oops+0x17b/0x490
[ 27.081305] ? search_exception_tables+0x37/0x50
[ 27.081897] ? exc_page_fault+0xba/0x160
[ 27.082402] ? asm_exc_page_fault+0x26/0x30
[ 27.082929] ? topo_set_cpuids+0x26/0x70
[ 27.083432] topology_hotplug_apic+0x54/0xa0
[ 27.083979] acpi_map_cpu+0x1c/0x80
[ 27.084437] acpi_processor_add+0x361/0x630
[ 27.084968] acpi_bus_attach+0x151/0x230
[ 27.085473] ? __pfx_acpi_dev_for_one_check+0x10/0x10
[ 27.086091] device_for_each_child+0x68/0xb0
[ 27.086638] acpi_dev_for_each_child+0x37/0x60
[ 27.087197] ? __pfx_acpi_bus_attach+0x10/0x10
[ 27.087757] acpi_bus_attach+0x89/0x230
[ 27.088251] acpi_bus_scan+0x77/0x1f0
[ 27.088753] acpi_scan_rescan_bus+0x3c/0x70
[ 27.089300] acpi_device_hotplug+0x3a3/0x480
[ 27.089840] acpi_hotplug_work_fn+0x19/0x30
[ 27.090369] process_one_work+0x14c/0x360
[ 27.090880] worker_thread+0x2c5/0x3d0
[ 27.091387] ? __pfx_worker_thread+0x10/0x10
[ 27.091941] kthread+0xd3/0x100
[ 27.092361] ? __pfx_kthread+0x10/0x10
[ 27.092843] ret_from_fork+0x2f/0x50
[ 27.093309] ? __pfx_kthread+0x10/0x10
[ 27.093788] ret_from_fork_asm+0x1a/0x30
[ 27.094293] </TASK>
[ 27.094601] Modules linked in:
[ 27.095007] CR2: ffffffff83a69778
[ 27.095444] ---[ end trace 0000000000000000 ]---
[ 27.096018] RIP: 0010:topo_set_cpuids+0x26/0x70
[ 27.096590] Code: 90 90 90 90 48 8b 05 d9 bd da 01 48 85 c0 74 31 89 ff 48 8d 04 b8 89 30 48 8b 05 bd bd da 01 48 85 c0 74 3c 48 8d 04 b8 89 10 <f0> 48 0f ab 3d 79 9e 97 01 f0 48 0f ab 3d 40 03 df 01 c3 cc cc cc
[ 27.098808] RSP: 0018:ffffc3980034bc28 EFLAGS: 00010286
[ 27.099452] RAX: ffffa0bbb6f15160 RBX: 0000000000000004 RCX: 0000000000000040
[ 27.100305] RDX: 0000000000000004 RSI: 0000000000000004 RDI: 0000000000000004
[ 27.101153] RBP: ffffa0ba80d68540 R08: 000000000001d4c0 R09: 0000000000000001
[ 27.102141] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000004
[ 27.102995] R13: ffffa0ba883b6c10 R14: ffffa0ba809a9040 R15: 0000000000000000
[ 27.103851] FS: 0000000000000000(0000) GS:ffffa0bbb6e80000(0000) knlGS:0000000000000000
[ 27.104857] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 27.105559] CR2: ffffffff83a69778 CR3: 000000010f946006 CR4: 0000000000370ef0
[ 27.106411] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 27.107264] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------------
I am not able to reproduce with the below:
x86/topology: Don't update cpu_possible_map in topo_set_cpuids()
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=a9025cd1c673a8d6eefc79d911075b8b452eba8f
"""
Ciao, Thorsten (who hopes that people will continue on the list now,
as playing man-in-the-middle is tedious)
On Wed, Apr 10 2024 at 18:25, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 10.04.24 16:52, Thomas Gleixner wrote:
> I can reproduce as well. But the callstack is different. It finally
> reaches at topo_set_cpuids().
That looks about right.
> I am not able to reproduce with the below:
>
> x86/topology: Don't update cpu_possible_map in topo_set_cpuids()
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=a9025cd1c673a8d6eefc79d911075b8b452eba8f
Yes, that's the fix for the crash you saw.