2023-06-16 05:51:04

by Hamza Mahfooz

[permalink] [raw]
Subject: KVM page-fault on Kernel 6.3.8

I am seeing the following page-fault on the latest stable kernel:

BUG: unable to handle page fault for address: ffffb4ff0cd20034
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 10002a067 P4D 10002a067 PUD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 2675 Comm: CPU 7/KVM Not tainted 6.3.8-arch1-1 #1
a1d299e746aebdb27c523dd3bd94aba6f54915c7
Hardware name: ASUS System Product Name/ProArt X670E-CREATOR WIFI, BIOS
1303 04/27/2023
RIP: 0010:try_grab_folio+0x14f/0x370
Code: 83 f8 04 75 6f 44 89 ee 4c 89 e7 e8 6b bc 0b 00 84 c0 74 60 4c 8b
63 08 41 f6 c4 01 0f 85 b0 01 00 00 0f 1f 44 00 00 49 89 dc <41> 8b 44
24 34 85 c0 0f 88 f8 00 00 00 41 8b 44 24 34 85 c0 74 58
RSP: 0018:ffff9fa98504b948 EFLAGS: 00010086
RAX: 0000000000000002 RBX: fffff4ff0cd21480 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000001 RDI: fffff4ff0cd21480
RBP: 0000000000000000 R08: ffff8b2edb510980 R09: 00007f5624253000
R10: 80000003348008e7 R11: 00007f5624253000 R12: ffffb4ff0cd20000
R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000001
FS: 00007f548a7fc6c0(0000) GS:ffff8b35f83c0000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffb4ff0cd20034 CR3: 0000000113e70000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x171/0x4e0
? exc_page_fault+0x172/0x180
? asm_exc_page_fault+0x26/0x30
? try_grab_folio+0x14f/0x370
internal_get_user_pages_fast+0x883/0x1150
__iov_iter_get_pages_alloc+0xdd/0x780
? kmem_cache_alloc+0x16f/0x330
? bio_associate_blkg_from_css+0xcd/0x340
iov_iter_get_pages+0x1d/0x40
bio_iov_iter_get_pages+0xa1/0x480
__blkdev_direct_IO_async+0xc5/0x1b0
blkdev_read_iter+0x127/0x1d0
aio_read+0x132/0x210
? io_submit_one+0x46a/0x8b0
io_submit_one+0x46a/0x8b0
? kvm_arch_vcpu_put+0x128/0x190 [kvm
711ceda1c40511ce22d1f99f4e9e574def76b25e]
? kvm_arch_vcpu_ioctl_run+0x579/0x1770 [kvm
711ceda1c40511ce22d1f99f4e9e574def76b25e]
__x64_sys_io_submit+0xad/0x190
do_syscall_64+0x5d/0x90
? __x64_sys_ioctl+0xac/0xd0
? syscall_exit_to_user_mode+0x1b/0x40
? do_syscall_64+0x6c/0x90
? syscall_exit_to_user_mode+0x1b/0x40
? do_syscall_64+0x6c/0x90
? syscall_exit_to_user_mode+0x1b/0x40
? do_syscall_64+0x6c/0x90
? syscall_exit_to_user_mode+0x1b/0x40
? do_syscall_64+0x6c/0x90
? do_syscall_64+0x6c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f57ac0912ed
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d 3b 7a 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f5427ab97b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
RAX: ffffffffffffffda RBX: 00007f548a7fc1d0 RCX: 00007f57ac0912ed
RDX: 00007f5427ab9800 RSI: 0000000000000001 RDI: 00007f57a9d24000
RBP: 00007f57a9d24000 R08: 0000000000000001 R09: 0000000000000001
R10: 00007f54740044f0 R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000000004 R14: 00007f5427ab9800 R15: 000000000000000e
</TASK>
Modules linked in: hid_playstation led_class_multicolor ff_memless tun
snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE
xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nf_tables libcrc32c nfnetlink bridge stp llc vfat fat
snd_hda_codec_realtek snd_hda_codec_generic mt7921e snd_hda_codec_hdmi
mt7921_common snd_usb_audio intel_rapl_msr mt76_connac_lib
snd_hda_intel intel_rapl_common snd_intel_dspcfg mt76 snd_usbmidi_lib
btusb snd_intel_sdw_acpi edac_mce_amd snd_rawmidi btrtl snd_hda_codec
btbcm snd_seq_device btintel snd_hda_core kvm_amd mc snd_hwdep
eeepc_wmi btmtk snd_pcm asus_wmi kvm mac80211 bluetooth ledtrig_audio
atlantic snd_timer i8042 sparse_keymap libarc4 ecdh_generic rapl
platform_profile serio intel_wmi_thunderbolt i2c_piix4 wmi_bmof pcspkr
k10temp thunderbolt snd igc ucsi_acpi macsec soundcore cfg80211
typec_ucsi mousedev joydev typec roles rfkill gpio_amdpt acpi_cpufreq
gpio_generic mac_hid dm_multipath
crypto_user fuse loop bpf_preload ip_tables x_tables ext4
crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted
asn1_encoder tee dm_mod hid_logitech_hidpp hid_logitech_dj usbhid
amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni
polyval_generic i2c_algo_bit drm_ttm_helper gf128mul nvme
ghash_clmulni_intel ttm sha512_ssse3 drm_buddy aesni_intel gpu_sched
crypto_simd nvme_core drm_display_helper cryptd ccp xhci_pci cec
nvme_common xhci_pci_renesas video wmi vfio_pci vfio_pci_core irqbypass
vfio_iommu_type1 vfio iommufd
CR2: ffffb4ff0cd20034
---[ end trace 0000000000000000 ]---
RIP: 0010:try_grab_folio+0x14f/0x370
Code: 83 f8 04 75 6f 44 89 ee 4c 89 e7 e8 6b bc 0b 00 84 c0 74 60 4c 8b
63 08 41 f6 c4 01 0f 85 b0 01 00 00 0f 1f 44 00 00 49 89 dc <41> 8b 44
24 34 85 c0 0f 88 f8 00 00 00 41 8b 44 24 34 85 c0 74 58
RSP: 0018:ffff9fa98504b948 EFLAGS: 00010086
RAX: 0000000000000002 RBX: fffff4ff0cd21480 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000001 RDI: fffff4ff0cd21480
RBP: 0000000000000000 R08: ffff8b2edb510980 R09: 00007f5624253000
R10: 80000003348008e7 R11: 00007f5624253000 R12: ffffb4ff0cd20000
R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000001
FS: 00007f548a7fc6c0(0000) GS:ffff8b35f83c0000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffb4ff0cd20034 CR3: 0000000113e70000 CR4: 0000000000750ee0
PKRU: 55555554
note: CPU 7/KVM[2675] exited with irqs disabled

It seems to appear randomly, so bisecting it would probably be
difficult. Also, as far as I can tell it seems to be a recent
regression (i.e. it was introduced in one of the 6.3.y releases).




2023-06-16 13:28:21

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: KVM page-fault on Kernel 6.3.8

On Fri, Jun 16, 2023 at 01:25:33AM -0400, Hamza Mahfooz wrote:
> I am seeing the following page-fault on the latest stable kernel:
>
> BUG: unable to handle page fault for address: ffffb4ff0cd20034
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 10002a067 P4D 10002a067 PUD 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 7 PID: 2675 Comm: CPU 7/KVM Not tainted 6.3.8-arch1-1 #1
> a1d299e746aebdb27c523dd3bd94aba6f54915c7
> Hardware name: ASUS System Product Name/ProArt X670E-CREATOR WIFI, BIOS 1303
> 04/27/2023
> RIP: 0010:try_grab_folio+0x14f/0x370
> Code: 83 f8 04 75 6f 44 89 ee 4c 89 e7 e8 6b bc 0b 00 84 c0 74 60 4c 8b 63
> 08 41 f6 c4 01 0f 85 b0 01 00 00 0f 1f 44 00 00 49 89 dc <41> 8b 44 24 34 85
> c0 0f 88 f8 00 00 00 41 8b 44 24 34 85 c0 74 58
> RSP: 0018:ffff9fa98504b948 EFLAGS: 00010086
> RAX: 0000000000000002 RBX: fffff4ff0cd21480 RCX: 0000000000000000
> RDX: 0000000000000003 RSI: 0000000000000001 RDI: fffff4ff0cd21480
> RBP: 0000000000000000 R08: ffff8b2edb510980 R09: 00007f5624253000
> R10: 80000003348008e7 R11: 00007f5624253000 R12: ffffb4ff0cd20000
> R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000001
> FS: 00007f548a7fc6c0(0000) GS:ffff8b35f83c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffb4ff0cd20034 CR3: 0000000113e70000 CR4: 0000000000750ee0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __die+0x23/0x70
> ? page_fault_oops+0x171/0x4e0
> ? exc_page_fault+0x172/0x180
> ? asm_exc_page_fault+0x26/0x30
> ? try_grab_folio+0x14f/0x370
> internal_get_user_pages_fast+0x883/0x1150
> __iov_iter_get_pages_alloc+0xdd/0x780
> ? kmem_cache_alloc+0x16f/0x330
> ? bio_associate_blkg_from_css+0xcd/0x340
> iov_iter_get_pages+0x1d/0x40
> bio_iov_iter_get_pages+0xa1/0x480
> __blkdev_direct_IO_async+0xc5/0x1b0
> blkdev_read_iter+0x127/0x1d0
> aio_read+0x132/0x210
> ? io_submit_one+0x46a/0x8b0
> io_submit_one+0x46a/0x8b0
> ? kvm_arch_vcpu_put+0x128/0x190 [kvm
> 711ceda1c40511ce22d1f99f4e9e574def76b25e]
> ? kvm_arch_vcpu_ioctl_run+0x579/0x1770 [kvm
> 711ceda1c40511ce22d1f99f4e9e574def76b25e]
> __x64_sys_io_submit+0xad/0x190
> do_syscall_64+0x5d/0x90
> ? __x64_sys_ioctl+0xac/0xd0
> ? syscall_exit_to_user_mode+0x1b/0x40
> ? do_syscall_64+0x6c/0x90
> ? syscall_exit_to_user_mode+0x1b/0x40
> ? do_syscall_64+0x6c/0x90
> ? syscall_exit_to_user_mode+0x1b/0x40
> ? do_syscall_64+0x6c/0x90
> ? syscall_exit_to_user_mode+0x1b/0x40
> ? do_syscall_64+0x6c/0x90
> ? do_syscall_64+0x6c/0x90
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7f57ac0912ed
> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7
> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d 3b 7a 0d 00 f7 d8 64 89 01 48
> RSP: 002b:00007f5427ab97b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> RAX: ffffffffffffffda RBX: 00007f548a7fc1d0 RCX: 00007f57ac0912ed
> RDX: 00007f5427ab9800 RSI: 0000000000000001 RDI: 00007f57a9d24000
> RBP: 00007f57a9d24000 R08: 0000000000000001 R09: 0000000000000001
> R10: 00007f54740044f0 R11: 0000000000000246 R12: 0000000000000001
> R13: 0000000000000004 R14: 00007f5427ab9800 R15: 000000000000000e
> </TASK>
> Modules linked in: hid_playstation led_class_multicolor ff_memless tun
> snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack
> ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c nfnetlink
> bridge stp llc vfat fat snd_hda_codec_realtek snd_hda_codec_generic mt7921e
> snd_hda_codec_hdmi mt7921_common snd_usb_audio intel_rapl_msr
> mt76_connac_lib snd_hda_intel intel_rapl_common snd_intel_dspcfg mt76
> snd_usbmidi_lib btusb snd_intel_sdw_acpi edac_mce_amd snd_rawmidi btrtl
> snd_hda_codec btbcm snd_seq_device btintel snd_hda_core kvm_amd mc snd_hwdep
> eeepc_wmi btmtk snd_pcm asus_wmi kvm mac80211 bluetooth ledtrig_audio
> atlantic snd_timer i8042 sparse_keymap libarc4 ecdh_generic rapl
> platform_profile serio intel_wmi_thunderbolt i2c_piix4 wmi_bmof pcspkr
> k10temp thunderbolt snd igc ucsi_acpi macsec soundcore cfg80211 typec_ucsi
> mousedev joydev typec roles rfkill gpio_amdpt acpi_cpufreq gpio_generic
> mac_hid dm_multipath
> crypto_user fuse loop bpf_preload ip_tables x_tables ext4 crc32c_generic
> crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee
> dm_mod hid_logitech_hidpp hid_logitech_dj usbhid amdgpu crct10dif_pclmul
> crc32_pclmul crc32c_intel polyval_clmulni polyval_generic i2c_algo_bit
> drm_ttm_helper gf128mul nvme ghash_clmulni_intel ttm sha512_ssse3 drm_buddy
> aesni_intel gpu_sched crypto_simd nvme_core drm_display_helper cryptd ccp
> xhci_pci cec nvme_common xhci_pci_renesas video wmi vfio_pci vfio_pci_core
> irqbypass vfio_iommu_type1 vfio iommufd
> CR2: ffffb4ff0cd20034
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:try_grab_folio+0x14f/0x370
> Code: 83 f8 04 75 6f 44 89 ee 4c 89 e7 e8 6b bc 0b 00 84 c0 74 60 4c 8b 63
> 08 41 f6 c4 01 0f 85 b0 01 00 00 0f 1f 44 00 00 49 89 dc <41> 8b 44 24 34 85
> c0 0f 88 f8 00 00 00 41 8b 44 24 34 85 c0 74 58
> RSP: 0018:ffff9fa98504b948 EFLAGS: 00010086
> RAX: 0000000000000002 RBX: fffff4ff0cd21480 RCX: 0000000000000000
> RDX: 0000000000000003 RSI: 0000000000000001 RDI: fffff4ff0cd21480
> RBP: 0000000000000000 R08: ffff8b2edb510980 R09: 00007f5624253000
> R10: 80000003348008e7 R11: 00007f5624253000 R12: ffffb4ff0cd20000
> R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000001
> FS: 00007f548a7fc6c0(0000) GS:ffff8b35f83c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffb4ff0cd20034 CR3: 0000000113e70000 CR4: 0000000000750ee0
> PKRU: 55555554
> note: CPU 7/KVM[2675] exited with irqs disabled
>
> It seems to appear randomly, so bisecting it would probably be
> difficult. Also, as far as I can tell it seems to be a recent
> regression (i.e. it was introduced in one of the 6.3.y releases).
>
>

So v6.2.y looks fine (doesn't have this regression)?

--
An old man doll... just what I always wanted! - Clara


Attachments:
(No filename) (6.08 kB)
signature.asc (235.00 B)
Download all attachments

2023-06-16 13:42:21

by Hamza Mahfooz

[permalink] [raw]
Subject: Re: KVM page-fault on Kernel 6.3.8


On Fri, Jun 16 2023 at 07:49:08 PM +07:00:00, Bagas Sanjaya
<[email protected]> wrote:
> On Fri, Jun 16, 2023 at 01:25:33AM -0400, Hamza Mahfooz wrote:
>> I am seeing the following page-fault on the latest stable kernel:
>>
>> BUG: unable to handle page fault for address: ffffb4ff0cd20034
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 10002a067 P4D 10002a067 PUD 0
>> Oops: 0000 [#1] PREEMPT SMP NOPTI
>> CPU: 7 PID: 2675 Comm: CPU 7/KVM Not tainted 6.3.8-arch1-1 #1
>> a1d299e746aebdb27c523dd3bd94aba6f54915c7
>> Hardware name: ASUS System Product Name/ProArt X670E-CREATOR WIFI,
>> BIOS 1303
>> 04/27/2023
>> RIP: 0010:try_grab_folio+0x14f/0x370
>> Code: 83 f8 04 75 6f 44 89 ee 4c 89 e7 e8 6b bc 0b 00 84 c0 74 60
>> 4c 8b 63
>> 08 41 f6 c4 01 0f 85 b0 01 00 00 0f 1f 44 00 00 49 89 dc <41> 8b 44
>> 24 34 85
>> c0 0f 88 f8 00 00 00 41 8b 44 24 34 85 c0 74 58
>> RSP: 0018:ffff9fa98504b948 EFLAGS: 00010086
>> RAX: 0000000000000002 RBX: fffff4ff0cd21480 RCX: 0000000000000000
>> RDX: 0000000000000003 RSI: 0000000000000001 RDI: fffff4ff0cd21480
>> RBP: 0000000000000000 R08: ffff8b2edb510980 R09: 00007f5624253000
>> R10: 80000003348008e7 R11: 00007f5624253000 R12: ffffb4ff0cd20000
>> R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000001
>> FS: 00007f548a7fc6c0(0000) GS:ffff8b35f83c0000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffb4ff0cd20034 CR3: 0000000113e70000 CR4: 0000000000750ee0
>> PKRU: 55555554
>> Call Trace:
>> <TASK>
>> ? __die+0x23/0x70
>> ? page_fault_oops+0x171/0x4e0
>> ? exc_page_fault+0x172/0x180
>> ? asm_exc_page_fault+0x26/0x30
>> ? try_grab_folio+0x14f/0x370
>> internal_get_user_pages_fast+0x883/0x1150
>> __iov_iter_get_pages_alloc+0xdd/0x780
>> ? kmem_cache_alloc+0x16f/0x330
>> ? bio_associate_blkg_from_css+0xcd/0x340
>> iov_iter_get_pages+0x1d/0x40
>> bio_iov_iter_get_pages+0xa1/0x480
>> __blkdev_direct_IO_async+0xc5/0x1b0
>> blkdev_read_iter+0x127/0x1d0
>> aio_read+0x132/0x210
>> ? io_submit_one+0x46a/0x8b0
>> io_submit_one+0x46a/0x8b0
>> ? kvm_arch_vcpu_put+0x128/0x190 [kvm
>> 711ceda1c40511ce22d1f99f4e9e574def76b25e]
>> ? kvm_arch_vcpu_ioctl_run+0x579/0x1770 [kvm
>> 711ceda1c40511ce22d1f99f4e9e574def76b25e]
>> __x64_sys_io_submit+0xad/0x190
>> do_syscall_64+0x5d/0x90
>> ? __x64_sys_ioctl+0xac/0xd0
>> ? syscall_exit_to_user_mode+0x1b/0x40
>> ? do_syscall_64+0x6c/0x90
>> ? syscall_exit_to_user_mode+0x1b/0x40
>> ? do_syscall_64+0x6c/0x90
>> ? syscall_exit_to_user_mode+0x1b/0x40
>> ? do_syscall_64+0x6c/0x90
>> ? syscall_exit_to_user_mode+0x1b/0x40
>> ? do_syscall_64+0x6c/0x90
>> ? do_syscall_64+0x6c/0x90
>> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>> RIP: 0033:0x7f57ac0912ed
>> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8
>> 48 89 f7
>> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
>> f0 ff ff
>> 73 01 c3 48 8b 0d 3b 7a 0d 00 f7 d8 64 89 01 48
>> RSP: 002b:00007f5427ab97b8 EFLAGS: 00000246 ORIG_RAX:
>> 00000000000000d1
>> RAX: ffffffffffffffda RBX: 00007f548a7fc1d0 RCX: 00007f57ac0912ed
>> RDX: 00007f5427ab9800 RSI: 0000000000000001 RDI: 00007f57a9d24000
>> RBP: 00007f57a9d24000 R08: 0000000000000001 R09: 0000000000000001
>> R10: 00007f54740044f0 R11: 0000000000000246 R12: 0000000000000001
>> R13: 0000000000000004 R14: 00007f5427ab9800 R15: 000000000000000e
>> </TASK>
>> Modules linked in: hid_playstation led_class_multicolor ff_memless
>> tun
>> snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE
>> xt_conntrack
>> ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat
>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c
>> nfnetlink
>> bridge stp llc vfat fat snd_hda_codec_realtek snd_hda_codec_generic
>> mt7921e
>> snd_hda_codec_hdmi mt7921_common snd_usb_audio intel_rapl_msr
>> mt76_connac_lib snd_hda_intel intel_rapl_common snd_intel_dspcfg
>> mt76
>> snd_usbmidi_lib btusb snd_intel_sdw_acpi edac_mce_amd snd_rawmidi
>> btrtl
>> snd_hda_codec btbcm snd_seq_device btintel snd_hda_core kvm_amd mc
>> snd_hwdep
>> eeepc_wmi btmtk snd_pcm asus_wmi kvm mac80211 bluetooth
>> ledtrig_audio
>> atlantic snd_timer i8042 sparse_keymap libarc4 ecdh_generic rapl
>> platform_profile serio intel_wmi_thunderbolt i2c_piix4 wmi_bmof
>> pcspkr
>> k10temp thunderbolt snd igc ucsi_acpi macsec soundcore cfg80211
>> typec_ucsi
>> mousedev joydev typec roles rfkill gpio_amdpt acpi_cpufreq
>> gpio_generic
>> mac_hid dm_multipath
>> crypto_user fuse loop bpf_preload ip_tables x_tables ext4
>> crc32c_generic
>> crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder
>> tee
>> dm_mod hid_logitech_hidpp hid_logitech_dj usbhid amdgpu
>> crct10dif_pclmul
>> crc32_pclmul crc32c_intel polyval_clmulni polyval_generic
>> i2c_algo_bit
>> drm_ttm_helper gf128mul nvme ghash_clmulni_intel ttm sha512_ssse3
>> drm_buddy
>> aesni_intel gpu_sched crypto_simd nvme_core drm_display_helper
>> cryptd ccp
>> xhci_pci cec nvme_common xhci_pci_renesas video wmi vfio_pci
>> vfio_pci_core
>> irqbypass vfio_iommu_type1 vfio iommufd
>> CR2: ffffb4ff0cd20034
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:try_grab_folio+0x14f/0x370
>> Code: 83 f8 04 75 6f 44 89 ee 4c 89 e7 e8 6b bc 0b 00 84 c0 74 60
>> 4c 8b 63
>> 08 41 f6 c4 01 0f 85 b0 01 00 00 0f 1f 44 00 00 49 89 dc <41> 8b 44
>> 24 34 85
>> c0 0f 88 f8 00 00 00 41 8b 44 24 34 85 c0 74 58
>> RSP: 0018:ffff9fa98504b948 EFLAGS: 00010086
>> RAX: 0000000000000002 RBX: fffff4ff0cd21480 RCX: 0000000000000000
>> RDX: 0000000000000003 RSI: 0000000000000001 RDI: fffff4ff0cd21480
>> RBP: 0000000000000000 R08: ffff8b2edb510980 R09: 00007f5624253000
>> R10: 80000003348008e7 R11: 00007f5624253000 R12: ffffb4ff0cd20000
>> R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000001
>> FS: 00007f548a7fc6c0(0000) GS:ffff8b35f83c0000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffb4ff0cd20034 CR3: 0000000113e70000 CR4: 0000000000750ee0
>> PKRU: 55555554
>> note: CPU 7/KVM[2675] exited with irqs disabled
>>
>> It seems to appear randomly, so bisecting it would probably be
>> difficult. Also, as far as I can tell it seems to be a recent
>> regression (i.e. it was introduced in one of the 6.3.y releases).
>>
>>
>
> So v6.2.y looks fine (doesn't have this regression)?

Yes, I didn't see this issue on v6.2.y.

>
> --
> An old man doll... just what I always wanted! - Clara



2023-06-17 00:36:18

by Sean Christopherson

[permalink] [raw]
Subject: Re: KVM page-fault on Kernel 6.3.8

On Fri, Jun 16, 2023, Hamza Mahfooz wrote:
> I am seeing the following page-fault on the latest stable kernel:

What makes you think this is KVM related? I don't see anything KVM related in
the splat. The !PRESENT #PF is coming from aio_read(), not from KVM. The
?kvm_arch_vcpu_put line is just mispeculation from the unwinder.

> BUG: unable to handle page fault for address: ffffb4ff0cd20034
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 10002a067 P4D 10002a067 PUD 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 7 PID: 2675 Comm: CPU 7/KVM Not tainted 6.3.8-arch1-1 #1
> a1d299e746aebdb27c523dd3bd94aba6f54915c7
> Hardware name: ASUS System Product Name/ProArt X670E-CREATOR WIFI, BIOS 1303
> 04/27/2023
> RIP: 0010:try_grab_folio+0x14f/0x370
> Code: 83 f8 04 75 6f 44 89 ee 4c 89 e7 e8 6b bc 0b 00 84 c0 74 60 4c 8b 63
> 08 41 f6 c4 01 0f 85 b0 01 00 00 0f 1f 44 00 00 49 89 dc <41> 8b 44 24 34 85
> c0 0f 88 f8 00 00 00 41 8b 44 24 34 85 c0 74 58
> RSP: 0018:ffff9fa98504b948 EFLAGS: 00010086
> RAX: 0000000000000002 RBX: fffff4ff0cd21480 RCX: 0000000000000000
> RDX: 0000000000000003 RSI: 0000000000000001 RDI: fffff4ff0cd21480
> RBP: 0000000000000000 R08: ffff8b2edb510980 R09: 00007f5624253000
> R10: 80000003348008e7 R11: 00007f5624253000 R12: ffffb4ff0cd20000
> R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000001
> FS: 00007f548a7fc6c0(0000) GS:ffff8b35f83c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffb4ff0cd20034 CR3: 0000000113e70000 CR4: 0000000000750ee0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __die+0x23/0x70
> ? page_fault_oops+0x171/0x4e0
> ? exc_page_fault+0x172/0x180
> ? asm_exc_page_fault+0x26/0x30
> ? try_grab_folio+0x14f/0x370
> internal_get_user_pages_fast+0x883/0x1150
> __iov_iter_get_pages_alloc+0xdd/0x780
> ? kmem_cache_alloc+0x16f/0x330
> ? bio_associate_blkg_from_css+0xcd/0x340
> iov_iter_get_pages+0x1d/0x40
> bio_iov_iter_get_pages+0xa1/0x480
> __blkdev_direct_IO_async+0xc5/0x1b0
> blkdev_read_iter+0x127/0x1d0
> aio_read+0x132/0x210
> ? io_submit_one+0x46a/0x8b0
> io_submit_one+0x46a/0x8b0
> ? kvm_arch_vcpu_put+0x128/0x190 [kvm
> 711ceda1c40511ce22d1f99f4e9e574def76b25e]
> ? kvm_arch_vcpu_ioctl_run+0x579/0x1770 [kvm
> 711ceda1c40511ce22d1f99f4e9e574def76b25e]
> __x64_sys_io_submit+0xad/0x190
> do_syscall_64+0x5d/0x90
> ? __x64_sys_ioctl+0xac/0xd0
> ? syscall_exit_to_user_mode+0x1b/0x40
> ? do_syscall_64+0x6c/0x90
> ? syscall_exit_to_user_mode+0x1b/0x40
> ? do_syscall_64+0x6c/0x90
> ? syscall_exit_to_user_mode+0x1b/0x40
> ? do_syscall_64+0x6c/0x90
> ? syscall_exit_to_user_mode+0x1b/0x40
> ? do_syscall_64+0x6c/0x90
> ? do_syscall_64+0x6c/0x90
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7f57ac0912ed
> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7
> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d 3b 7a 0d 00 f7 d8 64 89 01 48
> RSP: 002b:00007f5427ab97b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> RAX: ffffffffffffffda RBX: 00007f548a7fc1d0 RCX: 00007f57ac0912ed
> RDX: 00007f5427ab9800 RSI: 0000000000000001 RDI: 00007f57a9d24000
> RBP: 00007f57a9d24000 R08: 0000000000000001 R09: 0000000000000001
> R10: 00007f54740044f0 R11: 0000000000000246 R12: 0000000000000001
> R13: 0000000000000004 R14: 00007f5427ab9800 R15: 000000000000000e

2023-06-17 20:14:50

by Hamza Mahfooz

[permalink] [raw]
Subject: Re: KVM page-fault on Kernel 6.3.8


On Fri, Jun 16 2023 at 04:59:36 PM -07:00:00, Sean Christopherson
<[email protected]> wrote:
> What makes you think this is KVM related? I don't see anything KVM
> related in

Well, it corresponds a QEMU crash.

> the splat. The !PRESENT #PF is coming from aio_read(), not from KVM.
> The
> ?kvm_arch_vcpu_put line is just mispeculation from the unwinder.