Not every page is subject to the pgtable check. One example is ZONE_DEVICE
pages: they map PFNs directly, and they don't allocate page_ext at all even
if there is a struct page around. See devm_memremap_pages() for reference.

When both ZONE_DEVICE and page-table-check are enabled, mapping dax memory
reliably triggers a kernel BUG when the kernel tries to inject pfn maps on
the dax device:
kernel BUG at mm/page_table_check.c:55!
Since it is perfectly legal to use set_pxx_at() on ZONE_DEVICE pages when
resolving page faults, skip all the checks in the pgtable checker if
page_ext does not exist. This applies to ZONE_DEVICE pages, and possibly
to others.
Cc: Dan Williams <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
mm/page_table_check.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 4169576bed72..509c6ef8de40 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -73,6 +73,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
+ if (!page_ext)
+ return;
+
BUG_ON(PageSlab(page));
anon = PageAnon(page);
@@ -110,6 +113,9 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
+ if (!page_ext)
+ return;
+
BUG_ON(PageSlab(page));
anon = PageAnon(page);
@@ -140,7 +146,10 @@ void __page_table_check_zero(struct page *page, unsigned int order)
BUG_ON(PageSlab(page));
page_ext = page_ext_get(page);
- BUG_ON(!page_ext);
+
+ if (!page_ext)
+ return;
+
for (i = 0; i < (1ul << order); i++) {
struct page_table_check *ptc = get_page_table_check(page_ext);
--
2.45.0
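The guard added by the patch above can be illustrated with a small userspace model. This is only a sketch: the mock_* types and functions are hypothetical stand-ins invented for illustration, not the kernel's struct page, struct page_ext, or page_ext_get(). The per-page counters are a simplified assumption about what the checker tracks. The point is the early return when no extension data exists, which models a ZONE_DEVICE page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page checker state kept in page_ext. */
struct mock_page_ext {
	int anon_map_count;
	int file_map_count;
};

/* Hypothetical stand-in for struct page. */
struct mock_page {
	bool is_anon;
	bool is_slab;
	struct mock_page_ext *ext;	/* NULL models a page with no page_ext */
};

/* Models page_ext_get(): may legitimately return NULL. */
static struct mock_page_ext *mock_page_ext_get(struct mock_page *page)
{
	return page->ext;
}

/* Models the "set" path; returns true if accounted, false if skipped. */
static bool mock_check_set(struct mock_page *page)
{
	struct mock_page_ext *ext = mock_page_ext_get(page);

	if (!ext)
		return false;	/* nothing to track, skip all checks */

	assert(!page->is_slab);	/* stands in for BUG_ON(PageSlab(page)) */

	if (page->is_anon)
		ext->anon_map_count++;
	else
		ext->file_map_count++;
	return true;
}

/* Models the "clear" path with the same guard. */
static bool mock_check_clear(struct mock_page *page)
{
	struct mock_page_ext *ext = mock_page_ext_get(page);

	if (!ext)
		return false;

	assert(!page->is_slab);

	if (page->is_anon)
		ext->anon_map_count--;
	else
		ext->file_map_count--;
	return true;
}
```

In this model a page whose ext pointer is NULL is skipped by both paths instead of tripping an assertion, which mirrors the intent of replacing BUG_ON(!page_ext) with an early return.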
On Wed, 5 Jun 2024 17:21:46 -0400 Peter Xu <[email protected]> wrote:
> Not all pages may apply to pgtable check. One example is ZONE_DEVICE
> pages: they map PFNs directly, and they don't allocate page_ext at all even
> if there's struct page around. One may reference devm_memremap_pages().
>
> When both ZONE_DEVICE and page-table-check enabled, then try to map some
> dax memories, one can trigger kernel bug constantly now when the kernel was
> trying to inject some pfn maps on the dax device:
>
> kernel BUG at mm/page_table_check.c:55!
>
> While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> fault resolutions, skip all the checks if page_ext doesn't even exist in
> pgtable checker, which applies to ZONE_DEVICE but maybe more.
Do we have a Reported-by: for this one?
And a Fixes? It looks like df4e817b7108?
[ add Alistair ]
Peter Xu wrote:
> Not all pages may apply to pgtable check. One example is ZONE_DEVICE
> pages: they map PFNs directly, and they don't allocate page_ext at all even
> if there's struct page around. One may reference devm_memremap_pages().
>
> When both ZONE_DEVICE and page-table-check enabled, then try to map some
> dax memories, one can trigger kernel bug constantly now when the kernel was
> trying to inject some pfn maps on the dax device:
>
> kernel BUG at mm/page_table_check.c:55!
>
> While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> fault resolutions, skip all the checks if page_ext doesn't even exist in
> pgtable checker, which applies to ZONE_DEVICE but maybe more.
This looks correct to me, and needed in the near term. You can add:
Reviewed-by: Dan Williams <[email protected]>
In the long term, the page_ext check may not be needed. I.e. the reason
I added Alistair was in case his work to make DAX pages behave like
typical pages for reference counting would also make them behave the
same for the presence of page_ext.
Dan Williams <[email protected]> writes:
> [ add Alistair ]
>
> Peter Xu wrote:
>> Not all pages may apply to pgtable check. One example is ZONE_DEVICE
>> pages: they map PFNs directly, and they don't allocate page_ext at all even
>> if there's struct page around. One may reference devm_memremap_pages().
>>
>> When both ZONE_DEVICE and page-table-check enabled, then try to map some
>> dax memories, one can trigger kernel bug constantly now when the kernel was
>> trying to inject some pfn maps on the dax device:
>>
>> kernel BUG at mm/page_table_check.c:55!
>>
>> While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
>> fault resolutions, skip all the checks if page_ext doesn't even exist in
>> pgtable checker, which applies to ZONE_DEVICE but maybe more.
>
> This looks correct to me, and needed in the near term. You can add:
>
> Reviewed-by: Dan Williams <[email protected]>
>
> In the long term, the page_ext check may not be needed. I.e. the reason
> I added Alistair was in case his work to make DAX pages behave like
> typical pages for reference counting would also make them behave the
> same for the presence of page_ext.
It doesn't currently. However I did run into this bug while I was
developing those so please add:
Reviewed-by: Alistair Popple <[email protected]>
On Wed, Jun 5, 2024 at 5:21 PM Peter Xu <[email protected]> wrote:
>
> Not all pages may apply to pgtable check. One example is ZONE_DEVICE
> pages: they map PFNs directly, and they don't allocate page_ext at all even
> if there's struct page around. One may reference devm_memremap_pages().
>
> When both ZONE_DEVICE and page-table-check enabled, then try to map some
> dax memories, one can trigger kernel bug constantly now when the kernel was
> trying to inject some pfn maps on the dax device:
>
> kernel BUG at mm/page_table_check.c:55!
>
> While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> fault resolutions, skip all the checks if page_ext doesn't even exist in
> pgtable checker, which applies to ZONE_DEVICE but maybe more.
Thank you for reporting this bug. A few comments below:
>
> Cc: Dan Williams <[email protected]>
> Cc: Pasha Tatashin <[email protected]>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> mm/page_table_check.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> index 4169576bed72..509c6ef8de40 100644
> --- a/mm/page_table_check.c
> +++ b/mm/page_table_check.c
> @@ -73,6 +73,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
> page = pfn_to_page(pfn);
> page_ext = page_ext_get(page);
>
> + if (!page_ext)
> + return;
I would replace the above with the following, here and in other places:
if (!page_ext) {
WARN_ONCE(!is_zone_device_page(page),
"page_ext is missing for a non-device page\n");
return;
}
> +
> BUG_ON(PageSlab(page));
> anon = PageAnon(page);
>
> @@ -110,6 +113,9 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
> page = pfn_to_page(pfn);
> page_ext = page_ext_get(page);
>
> + if (!page_ext)
> + return;
> +
> BUG_ON(PageSlab(page));
> anon = PageAnon(page);
>
> @@ -140,7 +146,10 @@ void __page_table_check_zero(struct page *page, unsigned int order)
> BUG_ON(PageSlab(page));
>
> page_ext = page_ext_get(page);
> - BUG_ON(!page_ext);
> +
> + if (!page_ext)
> + return;
> +
> for (i = 0; i < (1ul << order); i++) {
> struct page_table_check *ptc = get_page_table_check(page_ext);
>
> --
> 2.45.0
>
Pasha Tatashin wrote:
> On Wed, Jun 5, 2024 at 5:21 PM Peter Xu <[email protected]> wrote:
> >
> > Not all pages may apply to pgtable check. One example is ZONE_DEVICE
> > pages: they map PFNs directly, and they don't allocate page_ext at all even
> > if there's struct page around. One may reference devm_memremap_pages().
> >
> > When both ZONE_DEVICE and page-table-check enabled, then try to map some
> > dax memories, one can trigger kernel bug constantly now when the kernel was
> > trying to inject some pfn maps on the dax device:
> >
> > kernel BUG at mm/page_table_check.c:55!
> >
> > While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> > fault resolutions, skip all the checks if page_ext doesn't even exist in
> > pgtable checker, which applies to ZONE_DEVICE but maybe more.
>
> Thank you for reporting this bug. A few comments below:
>
> >
> > Cc: Dan Williams <[email protected]>
> > Cc: Pasha Tatashin <[email protected]>
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > mm/page_table_check.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> > index 4169576bed72..509c6ef8de40 100644
> > --- a/mm/page_table_check.c
> > +++ b/mm/page_table_check.c
> > @@ -73,6 +73,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
> > page = pfn_to_page(pfn);
> > page_ext = page_ext_get(page);
> >
> > + if (!page_ext)
> > + return;
>
> I would replace the above with the following, here and in other places:
>
> if (!page_ext) {
> WARN_ONCE(!is_zone_device_page(page),
> "page_ext is missing for a non-device page\n");
> return;
> }
Hmm, but this function is silent for the !pfn_valid(@pfn) case, and the
old code has BUG_ON(!page_ext). So we know the caller is not being
careful about @pfn, and existing code is likely avoiding the BUG_ON().
The justification for the WARN_ONCE(), or maybe VM_WARN_ONCE(), would
be if there is a high likelihood that ongoing kernel changes introduce
more pfn_valid() but not page_ext covered pages? Is that a realistic
scenario?
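The two options under discussion, a silent skip versus a one-time warning when a non-device page lacks page_ext, can be contrasted in a userspace sketch. Everything here is hypothetical: mock_warn_once() is a rough stand-in that uses a single global flag and a counter, whereas the kernel's WARN_ONCE() keeps per-call-site state and prints a backtrace. The mock_page fields are illustrative assumptions, not real kernel fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the page properties that matter here. */
struct mock_page {
	bool is_zone_device;
	bool has_page_ext;
};

/* Number of warnings actually reported, so the effect is observable. */
static int warnings_emitted;

/* Rough stand-in for WARN_ONCE(): report at most once when cond is true. */
static void mock_warn_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warnings_emitted++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

/* Returns true if the page should be checked, false if it is skipped. */
static bool mock_should_check(const struct mock_page *page)
{
	assert(page != NULL);

	if (!page->has_page_ext) {
		/* Device pages are expected to lack page_ext: stay silent. */
		mock_warn_once(!page->is_zone_device,
			       "page_ext is missing for a non-device page");
		return false;
	}
	return true;
}
```

Under this model a device page is skipped quietly, while the first non-device page without page_ext produces exactly one warning. Whether that extra signal is worth having depends on how likely such pages are, which is the question raised above.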
On Wed, Jun 5, 2024 at 8:20 PM Dan Williams <[email protected]> wrote:
>
> Pasha Tatashin wrote:
> > On Wed, Jun 5, 2024 at 5:21 PM Peter Xu <[email protected]> wrote:
> > >
> > > Not all pages may apply to pgtable check. One example is ZONE_DEVICE
> > > pages: they map PFNs directly, and they don't allocate page_ext at all even
> > > if there's struct page around. One may reference devm_memremap_pages().
> > >
> > > When both ZONE_DEVICE and page-table-check enabled, then try to map some
> > > dax memories, one can trigger kernel bug constantly now when the kernel was
> > > trying to inject some pfn maps on the dax device:
> > >
> > > kernel BUG at mm/page_table_check.c:55!
> > >
> > > While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> > > fault resolutions, skip all the checks if page_ext doesn't even exist in
> > > pgtable checker, which applies to ZONE_DEVICE but maybe more.
> >
> > Thank you for reporting this bug. A few comments below:
> >
> > >
> > > Cc: Dan Williams <[email protected]>
> > > Cc: Pasha Tatashin <[email protected]>
> > > Signed-off-by: Peter Xu <[email protected]>
> > > ---
> > > mm/page_table_check.c | 11 ++++++++++-
> > > 1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> > > index 4169576bed72..509c6ef8de40 100644
> > > --- a/mm/page_table_check.c
> > > +++ b/mm/page_table_check.c
> > > @@ -73,6 +73,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
> > > page = pfn_to_page(pfn);
> > > page_ext = page_ext_get(page);
> > >
> > > + if (!page_ext)
> > > + return;
> >
> > I would replace the above with the following, here and in other places:
> >
> > if (!page_ext) {
> > WARN_ONCE(!is_zone_device_page(page),
> > "page_ext is missing for a non-device page\n");
> > return;
> > }
>
> Hmm, but this function is silent for the !pfn_valid(@pfn) case, and the
> old code has BUG_ON(!page_ext). So we know the caller is not being
> careful about @pfn, and existing code is likely avoiding the BUG_ON().
>
> The justification for the WARN_ONCE(), or maybe VM_WARN_ONCE(), would
> be if there is a high likelihood that ongoing kernel changes introduce
> more pfn_valid() but not page_ext covered pages? Is that a realistic
> scenario?
Good point, it is unlikely we will have scenarios without page_ext.
Reviewed-by: Pasha Tatashin <[email protected]>
On Wed, Jun 05, 2024 at 03:05:43PM -0700, Andrew Morton wrote:
> On Wed, 5 Jun 2024 17:21:46 -0400 Peter Xu <[email protected]> wrote:
>
> > Not all pages may apply to pgtable check. One example is ZONE_DEVICE
> > pages: they map PFNs directly, and they don't allocate page_ext at all even
> > if there's struct page around. One may reference devm_memremap_pages().
> >
> > When both ZONE_DEVICE and page-table-check enabled, then try to map some
> > dax memories, one can trigger kernel bug constantly now when the kernel was
> > trying to inject some pfn maps on the dax device:
> >
> > kernel BUG at mm/page_table_check.c:55!
> >
> > While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
> > fault resolutions, skip all the checks if page_ext doesn't even exist in
> > pgtable checker, which applies to ZONE_DEVICE but maybe more.
>
> Do we have a Reported-by: for this one?
Nope, I just hit that when I started to look at the dax issues.
>
> And a Fixes? It looks like df4e817b7108?
Yes that commit should be proper.
Thanks,
--
Peter Xu