2022-11-19 09:03:39

by Sander Eikelenboom

Subject: Xen-unstable Linux-6.1.0-rc5 BUG: unable to handle page fault for address: ffff8880083374d0

Hi Yu / Juergen,

This night I got a dom0 kernel crash on my new Ryzen box running Xen-unstable and a Linux-6.1.0-rc5 kernel.
I did enable the new and shiny MGLRU, could this be related ?

--
Sander


Nov 19 06:30:11 serveerstertje kernel: [68959.647371] BUG: unable to handle page fault for address: ffff8880083374d0
Nov 19 06:30:11 serveerstertje kernel: [68959.663555] #PF: supervisor write access in kernel mode
Nov 19 06:30:11 serveerstertje kernel: [68959.677542] #PF: error_code(0x0003) - permissions violation
Nov 19 06:30:11 serveerstertje kernel: [68959.691181] PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
Nov 19 06:30:11 serveerstertje kernel: [68959.705084] Oops: 0003 [#1] PREEMPT SMP NOPTI
Nov 19 06:30:11 serveerstertje kernel: [68959.718710] CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr-mac80211debug+ #1
Nov 19 06:30:11 serveerstertje kernel: [68959.732457] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4 R2.0, BIOS P5.60 10/20/2022
Nov 19 06:30:11 serveerstertje kernel: [68959.746391] RIP: e030:pmdp_test_and_clear_young+0x25/0x40
Nov 19 06:30:11 serveerstertje kernel: [68959.760294] Code: 00 00 00 66 90 48 b9 ff ff 1f 00 00 00 f0 ff 48 8b 02 48 be ff 0f 00 00 00 00 f0 ff a8 80 48 0f 44 ce 48 21 c8 83 e0 20 74 0c <f0> 48 0f ba 32 05 0f 92 c0 0f b6 c0 c3 cc cc c
Nov 19 06:30:11 serveerstertje kernel: [68959.787908] RSP: e02b:ffffc9000161f940 EFLAGS: 00010202
Nov 19 06:30:11 serveerstertje kernel: [68959.801637] RAX: 0000000000000020 RBX: 0000000000000000 RCX: fff0000000000fff
Nov 19 06:30:11 serveerstertje kernel: [68959.815243] RDX: ffff8880083374d0 RSI: fff0000000000fff RDI: ffff888010f41000
Nov 19 06:30:11 serveerstertje kernel: [68959.828683] RBP: ffffc9000161fa70 R08: 000ffffffffff000 R09: 00005654134b5000
Nov 19 06:30:11 serveerstertje kernel: [68959.842026] R10: 000000000000689e R11: 0000000000000000 R12: ffff8880083374d0
Nov 19 06:30:11 serveerstertje kernel: [68959.855214] R13: ffff88807fc1a000 R14: ffff8880083374d0 R15: 0000000000000000
Nov 19 06:30:11 serveerstertje kernel: [68959.868118] FS: 0000000000000000(0000) GS:ffff8880801c0000(0000) knlGS:0000000000000000
Nov 19 06:30:11 serveerstertje kernel: [68959.880689] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 19 06:30:11 serveerstertje kernel: [68959.893457] CR2: ffff8880083374d0 CR3: 000000000f33c000 CR4: 0000000000050660
Nov 19 06:30:11 serveerstertje kernel: [68959.906377] Call Trace:
Nov 19 06:30:11 serveerstertje kernel: [68959.919219] <TASK>
Nov 19 06:30:11 serveerstertje kernel: [68959.931844] walk_pmd_range_locked.isra.87+0x2e9/0x4e0
Nov 19 06:30:11 serveerstertje kernel: [68959.944840] walk_pud_range+0x69c/0x980
Nov 19 06:30:11 serveerstertje kernel: [68959.957562] walk_pgd_range+0xe9/0x810
Nov 19 06:30:11 serveerstertje kernel: [68959.970161] ? mt_find+0x1f8/0x3c0
Nov 19 06:30:11 serveerstertje kernel: [68959.982808] __walk_page_range+0x17b/0x180
Nov 19 06:30:11 serveerstertje kernel: [68959.995440] walk_page_range+0x106/0x170
Nov 19 06:30:11 serveerstertje kernel: [68960.008014] try_to_inc_max_seq+0x40a/0x9e0
Nov 19 06:30:11 serveerstertje kernel: [68960.020262] lru_gen_age_node+0x1d3/0x280
Nov 19 06:30:11 serveerstertje kernel: [68960.032222] ? shrink_node+0x294/0x710
Nov 19 06:30:11 serveerstertje kernel: [68960.044129] balance_pgdat+0x1c3/0x650
Nov 19 06:30:11 serveerstertje kernel: [68960.055995] ? prepare_to_wait_event+0x110/0x110
Nov 19 06:30:11 serveerstertje kernel: [68960.068022] kswapd+0x1f0/0x3a0
Nov 19 06:30:11 serveerstertje kernel: [68960.079997] ? prepare_to_


2022-11-21 07:17:34

by Juergen Gross

Subject: Re: Xen-unstable Linux-6.1.0-rc5 BUG: unable to handle page fault for address: ffff8880083374d0

On 19.11.22 09:28, Sander Eikelenboom wrote:
> Hi Yu / Juergen,
>
> This night I got a dom0 kernel crash on my new Ryzen box running Xen-unstable
> and a Linux-6.1.0-rc5 kernel.
> I did enable the new and shiny MGLRU, could this be related ?

It might be related, but I think it could happen independently from it.

> Nov 19 06:30:11 serveerstertje kernel: [68959.647371] BUG: unable to handle page
> fault for address: ffff8880083374d0
> Nov 19 06:30:11 serveerstertje kernel: [68959.663555] #PF: supervisor write
> access in kernel mode
> Nov 19 06:30:11 serveerstertje kernel: [68959.677542] #PF: error_code(0x0003) -
> permissions violation
> Nov 19 06:30:11 serveerstertje kernel: [68959.691181] PGD 3026067 P4D 3026067
> PUD 3027067 PMD 7fee5067 PTE 8010000008337065
> Nov 19 06:30:11 serveerstertje kernel: [68959.705084] Oops: 0003 [#1] PREEMPT
> SMP NOPTI
> Nov 19 06:30:11 serveerstertje kernel: [68959.718710] CPU: 7 PID: 158 Comm:
> kswapd0 Not tainted 6.1.0-rc5-20221118-doflr-mac80211debug+ #1
> Nov 19 06:30:11 serveerstertje kernel: [68959.732457] Hardware name: To Be
> Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4 R2.0, BIOS P5.60 10/20/2022
> Nov 19 06:30:11 serveerstertje kernel: [68959.746391] RIP:
> e030:pmdp_test_and_clear_young+0x25/0x40

The kernel tried to reset the "accessed" bit in the pmd entry.

It does so only since commit eed9a328aa1ae. Before that
pmdp_test_and_clear_young() could be called only for huge pages, which are
disabled in Xen PV guests.

pmdp_test_and_clear_young() does a test_and_clear_bit() of the pmd entry, which
is failing since the hypervisor is emulating pte entry modifications only (pmd
and pud entries can be set via hypercalls only).
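
The fault details quoted above can be decoded by hand to check this reading. The following is a minimal sketch in Python, not part of the original report: it assumes the architectural x86-64 bit layouts for the page-fault error code and for page table entry flags, and the helper names are made up for illustration. Only the two hex values are taken from the oops.

```python
# Hypothetical helpers for illustration; only the two hex values below
# (error code and PTE) come from the oops in this thread.

# x86 page-fault error code bits (architectural).
PF_PROT = 1 << 0   # 1 = protection violation, 0 = page not present
PF_WRITE = 1 << 1  # 1 = write access, 0 = read access
PF_USER = 1 << 2   # 1 = user-mode access, 0 = supervisor-mode access

# x86-64 page table entry flag bits (architectural).
PTE_PRESENT = 1 << 0
PTE_RW = 1 << 1
PTE_USER = 1 << 2
PTE_ACCESSED = 1 << 5
PTE_DIRTY = 1 << 6
PTE_NX = 1 << 63


def decode_error_code(code):
    """Split a page-fault error code into its low three flag bits."""
    return {
        "protection_violation": bool(code & PF_PROT),
        "write": bool(code & PF_WRITE),
        "user_mode": bool(code & PF_USER),
    }


def decode_pte(pte):
    """Report the flag bits of a page table entry that matter here."""
    return {
        "present": bool(pte & PTE_PRESENT),
        "writable": bool(pte & PTE_RW),
        "user": bool(pte & PTE_USER),
        "accessed": bool(pte & PTE_ACCESSED),
        "dirty": bool(pte & PTE_DIRTY),
        "no_execute": bool(pte & PTE_NX),
    }


error_code = 0x0003          # from "#PF: error_code(0x0003)"
pte = 0x8010000008337065     # from "PTE 8010000008337065"

print(decode_error_code(error_code))
print(decode_pte(pte))
```

Under these assumptions the error code describes a supervisor write to a present page, and the PTE covering the faulting address is present but not writable. That is consistent with the page table page being mapped read-only in a PV guest, so a locked bit operation on the pmd entry faults instead of being emulated.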

Could you please test whether the attached patch fixes the issue for you?


Juergen


Attachments:
0001-x86-mm-fix-pmdp_test_and_clear_young-for-Xen-PV-gues.patch (1.95 kB)

2022-11-21 08:42:15

by Juergen Gross

Subject: Re: Xen-unstable Linux-6.1.0-rc5 BUG: unable to handle page fault for address: ffff8880083374d0

On 21.11.22 09:18, Yu Zhao wrote:
> On Mon, Nov 21, 2022 at 12:10 AM Juergen Gross wrote:
>>
>> On 19.11.22 09:28, Sander Eikelenboom wrote:
>>> Hi Yu / Juergen,
>
> Hi Sander / Juergen,
>
> Thanks for the report and the analysis.
>
>>> This night I got a dom0 kernel crash on my new Ryzen box running Xen-unstable
>>> and a Linux-6.1.0-rc5 kernel.
>>> I did enable the new and shiny MGLRU, could this be related ?
>>
>> It might be related, but I think it could happen independently from it.
>
> Yes, I think it's related.
>
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.647371] BUG: unable to handle page
>>> fault for address: ffff8880083374d0
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.663555] #PF: supervisor write
>>> access in kernel mode
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.677542] #PF: error_code(0x0003) -
>>> permissions violation
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.691181] PGD 3026067 P4D 3026067
>>> PUD 3027067 PMD 7fee5067 PTE 8010000008337065
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.705084] Oops: 0003 [#1] PREEMPT
>>> SMP NOPTI
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.718710] CPU: 7 PID: 158 Comm:
>>> kswapd0 Not tainted 6.1.0-rc5-20221118-doflr-mac80211debug+ #1
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.732457] Hardware name: To Be
>>> Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4 R2.0, BIOS P5.60 10/20/2022
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.746391] RIP:
>>> e030:pmdp_test_and_clear_young+0x25/0x40
>>
>> The kernel tried to reset the "accessed" bit in the pmd entry.
>
> Correct.
>
>> It does so only since commit eed9a328aa1ae. Before that
>> pmdp_test_and_clear_young() could be called only for huge pages, which are
>> disabled in Xen PV guests.
>
> Correct. After that commit, we also can clear the accessed bit in
> non-leaf PMD entries (pointing to PTE tables).
>
>> pmdp_test_and_clear_young() does a test_and_clear_bit() of the pmd entry, which
>> is failing since the hypervisor is emulating pte entry modifications only (pmd
>> and pud entries can be set via hypercalls only).
>>
>> Could you please test whether the attached patch fixes the issue for you?
>
> There is a runtime kill switch for ARCH_HAS_NONLEAF_PMD_YOUNG, since I
> wasn't able to verify this capability on all x86 varieties. The following
> should do it:
>
> # cat /sys/kernel/mm/lru_gen/enabled
> 0x0007
> # echo 3 >/sys/kernel/mm/lru_gen/enabled
>
> Details are in Documentation/admin-guide/mm/multigen_lru.rst.
>
> Alternatively, we could make ARCH_HAS_NONLEAF_PMD_YOUNG a runtime
> check similar to arch_has_hw_pte_young() on arm64.

I like this idea.

The patch should be rather trivial. Let me have a try ...


Juergen



2022-11-21 08:54:33

by Yu Zhao

Subject: Re: Xen-unstable Linux-6.1.0-rc5 BUG: unable to handle page fault for address: ffff8880083374d0

On Mon, Nov 21, 2022 at 12:10 AM Juergen Gross wrote:
>
> On 19.11.22 09:28, Sander Eikelenboom wrote:
> > Hi Yu / Juergen,

Hi Sander / Juergen,

Thanks for the report and the analysis.

> > This night I got a dom0 kernel crash on my new Ryzen box running Xen-unstable
> > and a Linux-6.1.0-rc5 kernel.
> > I did enable the new and shiny MGLRU, could this be related ?
>
> It might be related, but I think it could happen independently from it.

Yes, I think it's related.

> > Nov 19 06:30:11 serveerstertje kernel: [68959.647371] BUG: unable to handle page
> > fault for address: ffff8880083374d0
> > Nov 19 06:30:11 serveerstertje kernel: [68959.663555] #PF: supervisor write
> > access in kernel mode
> > Nov 19 06:30:11 serveerstertje kernel: [68959.677542] #PF: error_code(0x0003) -
> > permissions violation
> > Nov 19 06:30:11 serveerstertje kernel: [68959.691181] PGD 3026067 P4D 3026067
> > PUD 3027067 PMD 7fee5067 PTE 8010000008337065
> > Nov 19 06:30:11 serveerstertje kernel: [68959.705084] Oops: 0003 [#1] PREEMPT
> > SMP NOPTI
> > Nov 19 06:30:11 serveerstertje kernel: [68959.718710] CPU: 7 PID: 158 Comm:
> > kswapd0 Not tainted 6.1.0-rc5-20221118-doflr-mac80211debug+ #1
> > Nov 19 06:30:11 serveerstertje kernel: [68959.732457] Hardware name: To Be
> > Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4 R2.0, BIOS P5.60 10/20/2022
> > Nov 19 06:30:11 serveerstertje kernel: [68959.746391] RIP:
> > e030:pmdp_test_and_clear_young+0x25/0x40
>
> The kernel tried to reset the "accessed" bit in the pmd entry.

Correct.

> It does so only since commit eed9a328aa1ae. Before that
> pmdp_test_and_clear_young() could be called only for huge pages, which are
> disabled in Xen PV guests.

Correct. After that commit, we also can clear the accessed bit in
non-leaf PMD entries (pointing to PTE tables).

> pmdp_test_and_clear_young() does a test_and_clear_bit() of the pmd entry, which
> is failing since the hypervisor is emulating pte entry modifications only (pmd
> and pud entries can be set via hypercalls only).
>
> Could you please test whether the attached patch fixes the issue for you?

There is a runtime kill switch for ARCH_HAS_NONLEAF_PMD_YOUNG, since I
wasn't able to verify this capability on all x86 varieties. The following
should do it:

# cat /sys/kernel/mm/lru_gen/enabled
0x0007
# echo 3 >/sys/kernel/mm/lru_gen/enabled

Details are in Documentation/admin-guide/mm/multigen_lru.rst.
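
The bit arithmetic behind those two commands can be spelled out in a short Python sketch. The bit meanings follow the "enabled" table in that document as best understood here (0x0001 main switch, 0x0002 batched clearing of the accessed bit in leaf entries, 0x0004 clearing it in non-leaf entries as well). The constant and function names are hypothetical, chosen for this example only.

```python
# Hypothetical names; the values mirror the bits of
# /sys/kernel/mm/lru_gen/enabled as described in multigen_lru.rst.
MAIN_SWITCH = 0x0001       # the multi-gen LRU itself
LEAF_BATCH_CLEAR = 0x0002  # clear accessed bit in leaf entries in batches
NONLEAF_CLEAR = 0x0004     # also clear accessed bit in non-leaf entries

COMPONENTS = [
    (MAIN_SWITCH, "main switch"),
    (LEAF_BATCH_CLEAR, "leaf accessed-bit clearing"),
    (NONLEAF_CLEAR, "non-leaf accessed-bit clearing"),
]


def describe(mask):
    """List the components that are switched on in a given mask."""
    return [name for bit, name in COMPONENTS if mask & bit]


def without_nonleaf(mask):
    """Return the mask with only the non-leaf component switched off."""
    return mask & ~NONLEAF_CLEAR


current = 0x0007
new = without_nonleaf(current)
print(describe(current))
print(hex(new), describe(new))
```

With the value 0x0007 read back above, masking out 0x0004 leaves 3, which is the value written by the echo command.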

Alternatively, we could make ARCH_HAS_NONLEAF_PMD_YOUNG a runtime
check similar to arch_has_hw_pte_young() on arm64.
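
A rough model of that alternative, written in Python purely for illustration: instead of a build-time constant, the capability becomes a predicate evaluated at run time, so a kernel built with the feature can still skip non-leaf PMD clearing where the platform does not allow it. Everything in the sketch is hypothetical, including the function names and the assumption that "running as a Xen PV guest" is the condition to test. It is not the patch discussed in this thread.

```python
# Toy model only: names and the Xen PV condition are assumptions made
# for this example, not kernel code.
NONLEAF_CLEAR = 0x0004  # bit for non-leaf accessed-bit clearing


def has_nonleaf_pmd_young(compiled_in, is_xen_pv):
    """Runtime capability check replacing a pure build-time constant."""
    return compiled_in and not is_xen_pv


def effective_mask(requested, compiled_in, is_xen_pv):
    """Drop the non-leaf bit when the runtime check says it is unusable."""
    if has_nonleaf_pmd_young(compiled_in, is_xen_pv):
        return requested
    return requested & ~NONLEAF_CLEAR


# Same build, two environments: bare metal keeps the feature,
# a PV guest silently loses the non-leaf component.
print(hex(effective_mask(0x0007, compiled_in=True, is_xen_pv=False)))
print(hex(effective_mask(0x0007, compiled_in=True, is_xen_pv=True)))
```

The design point is that the decision moves from configuration time to the running system, so no administrator action such as writing to the sysfs file is needed on affected guests.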