2021-07-13 15:21:56

by Mel Gorman

Subject: [PATCH 0/4 v2] 5.14-rc1 mm/page_alloc.c stray patches

(This v2 is because I didn't refresh the patches from my git tree properly
before sending, sorry for the noise)

This series contains some fixes that would likely have been included in
the 5.14-rc1 merge window had they been on time. Mail indicates that some
may already have been picked up for mmotm, but the tree is not up to date
yet, so I'm including them just in case.

Three are fixes to the bulk memory allocator and one is fallout from a
warning cleanup that trips up BTF, which expected a symbol to be global.

mm/page_alloc.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)

--
2.26.2


2021-07-13 15:22:37

by Mel Gorman

Subject: [PATCH 1/4] mm/page_alloc: Avoid page allocator recursion with pagesets.lock held

Syzbot is reporting potential deadlocks due to pagesets.lock when
PAGE_OWNER is enabled. One example from Desmond Cheong Zhi Xi is
as follows

  __alloc_pages_bulk()
    local_lock_irqsave(&pagesets.lock, flags) <---- outer lock here
    prep_new_page():
      post_alloc_hook():
        set_page_owner():
          __set_page_owner():
            save_stack():
              stack_depot_save():
                alloc_pages():
                  alloc_page_interleave():
                    __alloc_pages():
                      get_page_from_freelist():
                        rm_queue():
                          rm_queue_pcplist():
                            local_lock_irqsave(&pagesets.lock, flags);
                            *** DEADLOCK ***

Zhang, Qiang also reported

BUG: sleeping function called from invalid context at mm/page_alloc.c:5179
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
.....
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153
prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179
__alloc_pages+0x12f/0x500 mm/page_alloc.c:5375
alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2147
alloc_pages+0x238/0x2a0 mm/mempolicy.c:2270
stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303
save_stack+0x15e/0x1e0 mm/page_owner.c:120
__set_page_owner+0x50/0x290 mm/page_owner.c:181
prep_new_page mm/page_alloc.c:2445 [inline]
__alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313
alloc_pages_bulk_array_node include/linux/gfp.h:557 [inline]
vm_area_alloc_pages mm/vmalloc.c:2775 [inline]
__vmalloc_area_node mm/vmalloc.c:2845 [inline]
__vmalloc_node_range+0x39d/0x960 mm/vmalloc.c:2947
__vmalloc_node mm/vmalloc.c:2996 [inline]
vzalloc+0x67/0x80 mm/vmalloc.c:3066

There are a number of ways it could be fixed. The page owner code could
be audited to strip GFP flags that allow sleeping but it'll impair the
functionality of PAGE_OWNER if allocations fail. The bulk allocator
could add a special case to release/reacquire the lock for prep_new_page
and lookup PCP after the lock is reacquired at the cost of performance.
The pages requiring prep could be tracked using the least significant
bit and looping through the array although it is more complicated for
the list interface. The options are relatively complex and the second
one still incurs a performance penalty when PAGE_OWNER is active so this
patch takes the simple approach -- disable bulk allocation if PAGE_OWNER is
active. The caller will be forced to allocate one page at a time incurring
a performance penalty but PAGE_OWNER is already a performance penalty.

Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock")
Reported-by: Desmond Cheong Zhi Xi <[email protected]>
Reported-by: "Zhang, Qiang" <[email protected]>
Reported-and-tested-by: [email protected]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
---
mm/page_alloc.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b97e17806be..6ef86f338151 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5239,6 +5239,18 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	if (nr_pages - nr_populated == 1)
 		goto failed;

+#ifdef CONFIG_PAGE_OWNER
+	/*
+	 * PAGE_OWNER may recurse into the allocator to allocate space to
+	 * save the stack with pagesets.lock held. Releasing/reacquiring
+	 * removes much of the performance benefit of bulk allocation so
+	 * force the caller to allocate one page at a time as it'll have
+	 * similar performance to added complexity to the bulk allocator.
+	 */
+	if (static_branch_unlikely(&page_owner_inited))
+		goto failed;
+#endif
+
 	/* May set ALLOC_NOFRAGMENT, fragmentation will return 1 page. */
 	gfp &= gfp_allowed_mask;
 	alloc_gfp = gfp;
--
2.26.2
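
The recursion described in this patch, and the effect of bailing out to the single-page path, can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name in it (toy_lock, toy_alloc_page, toy_alloc_bulk, owner_tracking, and so on) is hypothetical and none of it is kernel code. The lock is modelled as a depth counter so that a second acquisition is counted instead of hanging, and the single-page path is assumed to prepare the page only after dropping the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; every name here is hypothetical, not a kernel API. */
static int lock_depth;          /* models the non-recursive pagesets.lock */
static int recursion_detected;  /* times the lock was taken while already held */
static bool owner_tracking;     /* models page_owner_inited being enabled */
static bool in_owner;           /* stops owner tracking from tracking itself */
static int next_page = 1;

static void toy_lock(void)
{
	if (lock_depth++ > 0)
		recursion_detected++;   /* the real lock would deadlock or warn here */
}

static void toy_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

static int toy_alloc_page(void);

/* Models prep_new_page(): owner tracking needs memory for the stack record. */
static void toy_prep_page(void)
{
	if (!owner_tracking || in_owner)
		return;
	in_owner = true;
	(void)toy_alloc_page();
	in_owner = false;
}

/* Single-page path: the lock is dropped before the page is prepared. */
static int toy_alloc_page(void)
{
	int page;

	toy_lock();
	page = next_page++;
	toy_unlock();
	toy_prep_page();
	return page;
}

/* Bulk path: pages are prepared while the lock is still held. */
static size_t toy_alloc_bulk(size_t nr, int *pages, bool apply_fix)
{
	size_t n = 0;

	if (apply_fix && owner_tracking)
		goto failed;            /* mirrors the check added by the patch */

	toy_lock();
	while (n < nr) {
		pages[n++] = next_page++;
		toy_prep_page();        /* re-enters toy_alloc_page() under the lock */
	}
	toy_unlock();
	return n;

failed:
	/* Hand back a single page; the caller has to loop for the rest. */
	if (n < nr)
		pages[n++] = toy_alloc_page();
	return n;
}
```

With owner_tracking set, calling toy_alloc_bulk(4, pages, false) takes the toy lock recursively once per page, while toy_alloc_bulk(4, pages, true) returns a single page and never re-enters the lock. The cost is the one the commit message accepts: the caller has to come back for each remaining page.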

2021-07-13 15:23:13

by Mel Gorman

Subject: [PATCH 2/4] mm/page_alloc: correct return value when failing at preparing

From: Yanfei Xu <[email protected]>

If the array passed in is already partially populated, we should
return "nr_populated" even when failing at the argument preparation stage.

Signed-off-by: Yanfei Xu <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6ef86f338151..803414ce9264 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5255,7 +5255,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	gfp &= gfp_allowed_mask;
 	alloc_gfp = gfp;
 	if (!prepare_alloc_pages(gfp, 0, preferred_nid, nodemask, &ac, &alloc_gfp, &alloc_flags))
-		return 0;
+		return nr_populated;
 	gfp = alloc_gfp;

 	/* Find an allowed local zone that meets the low watermark. */
--
2.26.2

2021-07-13 15:23:38

by Mel Gorman

Subject: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"

From: Matteo Croce <[email protected]>

This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441.

Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:

LD vmlinux
BTFIDS vmlinux
FAILED unresolved symbol should_fail_alloc_page
make: *** [Makefile:1199: vmlinux] Error 255
make: *** Deleting file 'vmlinux'

Fixes: f7173090033c ("mm/page_alloc: make should_fail_alloc_page() static")
Signed-off-by: Matteo Croce <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c66f1e6204c2..3e97e68aef7a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3820,7 +3820,7 @@ static inline bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)

 #endif /* CONFIG_FAIL_PAGE_ALLOC */

-static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
+noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 {
 	return __should_fail_alloc_page(gfp_mask, order);
 }
--
2.26.2

2021-07-13 15:24:12

by Mel Gorman

Subject: [PATCH 3/4] mm/page_alloc: Further fix __alloc_pages_bulk() return value

From: Chuck Lever <[email protected]>

The author of commit b3b64ebd3822 ("mm/page_alloc: do bulk array
bounds check after checking populated elements") was possibly
confused by the mixture of return values throughout the function.

The API contract is clear that the function "Returns the number of
pages on the list or array." It does not list zero as a unique
return value with a special meaning. Therefore zero is a plausible
return value only if @nr_pages is zero or less.

Clean up the return logic to make it clear that the returned value
is always the total number of pages in the array/list, not the
number of pages that were allocated during this call.

The only change in behavior with this patch is the value returned
if prepare_alloc_pages() fails. To match the API contract, the
number of pages currently in the array/list is returned in this
case.

The call site in __page_pool_alloc_pages_slow() also seems to be
confused on this matter. It should be attended to by someone who
is familiar with that code.

[[email protected]: Return nr_populated if 0 pages are requested]
Signed-off-by: Chuck Lever <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
---
mm/page_alloc.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 803414ce9264..c66f1e6204c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5221,9 +5221,6 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
 	int nr_populated = 0, nr_account = 0;

-	if (unlikely(nr_pages <= 0))
-		return 0;
-
 	/*
 	 * Skip populated array elements to determine if any pages need
 	 * to be allocated before disabling IRQs.
@@ -5231,9 +5228,13 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	while (page_array && nr_populated < nr_pages && page_array[nr_populated])
 		nr_populated++;

+	/* No pages requested? */
+	if (unlikely(nr_pages <= 0))
+		goto out;
+
 	/* Already populated array? */
 	if (unlikely(page_array && nr_pages - nr_populated == 0))
-		return nr_populated;
+		goto out;

 	/* Use the single page allocator for one page. */
 	if (nr_pages - nr_populated == 1)
@@ -5255,7 +5256,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	gfp &= gfp_allowed_mask;
 	alloc_gfp = gfp;
 	if (!prepare_alloc_pages(gfp, 0, preferred_nid, nodemask, &ac, &alloc_gfp, &alloc_flags))
-		return nr_populated;
+		goto out;
 	gfp = alloc_gfp;

 	/* Find an allowed local zone that meets the low watermark. */
@@ -5323,6 +5324,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);

+out:
 	return nr_populated;

 failed_irq:
@@ -5338,7 +5340,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		nr_populated++;
 	}

-	return nr_populated;
+	goto out;
 }
 EXPORT_SYMBOL_GPL(__alloc_pages_bulk);

--
2.26.2
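
The return-value contract described in patches 2 and 3 (the result is the total number of populated entries, not the number added by this call) can be sketched in userspace as follows. This is a minimal model under stated assumptions, not the kernel implementation: toy_bulk_fill, toy_fill_all, toy_available and toy_prepare_fails are hypothetical names, the array is assumed to be populated contiguously from index 0, and toy_prepare_fails merely stands in for prepare_alloc_pages() returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; every name here is hypothetical, not a kernel API. */
static char toy_backing[16];     /* gives populated slots a non-NULL address */
static size_t toy_available;     /* pages the toy backend can still hand out */
static bool toy_prepare_fails;   /* models prepare_alloc_pages() failing */

static size_t toy_bulk_fill(size_t nr_pages, void **array)
{
	size_t nr_populated = 0;

	assert(array != NULL);

	/* Skip slots that an earlier call already filled. */
	while (nr_populated < nr_pages && array[nr_populated])
		nr_populated++;

	/* Nothing requested, or nothing left to do. */
	if (nr_pages == 0 || nr_populated == nr_pages)
		goto out;

	/* Setup failure: report what is already there rather than 0. */
	if (toy_prepare_fails)
		goto out;

	while (nr_populated < nr_pages && toy_available > 0) {
		array[nr_populated] = &toy_backing[nr_populated % sizeof(toy_backing)];
		nr_populated++;
		toy_available--;
	}

out:
	return nr_populated;   /* always the total, never "pages added this call" */
}

/* Caller side: retry while progress is made, compare against the total. */
static bool toy_fill_all(size_t nr_pages, void **array)
{
	size_t filled = 0, prev;

	do {
		prev = filled;
		filled = toy_bulk_fill(nr_pages, array);
	} while (filled < nr_pages && filled > prev);

	return filled == nr_pages;
}
```

A caller written this way compares the return value against the requested total, so a partially populated array followed by a setup failure is not mistaken for an empty one. Funnelling every exit through a single label also keeps the "total, not delta" rule in one place, which is what the cleanup in patch 3 is after.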

2021-07-13 15:37:19

by Chuck Lever

Subject: Re: [PATCH 3/4] mm/page_alloc: Further fix __alloc_pages_bulk() return value



> On Jul 13, 2021, at 11:20 AM, Mel Gorman <[email protected]> wrote:
>
> From: Chuck Lever <[email protected]>
>
> The author of commit b3b64ebd3822 ("mm/page_alloc: do bulk array
> bounds check after checking populated elements") was possibly
> confused by the mixture of return values throughout the function.
>
> The API contract is clear that the function "Returns the number of
> pages on the list or array." It does not list zero as a unique
> return value with a special meaning. Therefore zero is a plausible
> return value only if @nr_pages is zero or less.
>
> Clean up the return logic to make it clear that the returned value
> is always the total number of pages in the array/list, not the
> number of pages that were allocated during this call.
>
> The only change in behavior with this patch is the value returned
> if prepare_alloc_pages() fails. To match the API contract, the
> number of pages currently in the array/list is returned in this
> case.
>
> The call site in __page_pool_alloc_pages_slow() also seems to be
> confused on this matter. It should be attended to by someone who
> is familiar with that code.
>
> [[email protected]: Return nr_populated if 0 pages are requested]
> Signed-off-by: Chuck Lever <[email protected]>
> Acked-by: Jesper Dangaard Brouer <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/page_alloc.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 803414ce9264..c66f1e6204c2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5221,9 +5221,6 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> unsigned int alloc_flags = ALLOC_WMARK_LOW;
> int nr_populated = 0, nr_account = 0;
>
> - if (unlikely(nr_pages <= 0))
> - return 0;
> -
> /*
> * Skip populated array elements to determine if any pages need
> * to be allocated before disabling IRQs.
> @@ -5231,9 +5228,13 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> while (page_array && nr_populated < nr_pages && page_array[nr_populated])
> nr_populated++;
>
> + /* No pages requested? */
> + if (unlikely(nr_pages <= 0))
> + goto out;
> +
> /* Already populated array? */
> if (unlikely(page_array && nr_pages - nr_populated == 0))
> - return nr_populated;
> + goto out;
>
> /* Use the single page allocator for one page. */
> if (nr_pages - nr_populated == 1)
> @@ -5255,7 +5256,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> gfp &= gfp_allowed_mask;
> alloc_gfp = gfp;
> if (!prepare_alloc_pages(gfp, 0, preferred_nid, nodemask, &ac, &alloc_gfp, &alloc_flags))
> - return nr_populated;
> + goto out;

:thumbsup: Thanks!


> gfp = alloc_gfp;
>
> /* Find an allowed local zone that meets the low watermark. */
> @@ -5323,6 +5324,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
> zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
>
> +out:
> return nr_populated;
>
> failed_irq:
> @@ -5338,7 +5340,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> nr_populated++;
> }
>
> - return nr_populated;
> + goto out;
> }
> EXPORT_SYMBOL_GPL(__alloc_pages_bulk);
>
> --
> 2.26.2
>

--
Chuck Lever



2021-07-14 07:10:17

by John Hubbard

Subject: Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"

On 7/13/21 8:21 AM, Mel Gorman wrote:
> From: Matteo Croce <[email protected]>
>
> This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441.
>
> Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:
>
> LD vmlinux
> BTFIDS vmlinux
> FAILED unresolved symbol should_fail_alloc_page
> make: *** [Makefile:1199: vmlinux] Error 255
> make: *** Deleting file 'vmlinux'

Yes! I ran into this yesterday. Your patch fixes this build failure
for me, so feel free to add:

Tested-by: John Hubbard <[email protected]>


However, I should add that I'm still seeing another build failure, after
fixing the above:

LD vmlinux
BTFIDS vmlinux
FAILED elf_update(WRITE): no error
make: *** [Makefile:1176: vmlinux] Error 255
make: *** Deleting file 'vmlinux'


...and un-setting CONFIG_DEBUG_INFO_BTF makes that disappear. Maybe someone
who understands the BTFIDS build step can shed some light on that; I'm
not there yet. :)


thanks,
--
John Hubbard
NVIDIA

>
> Fixes: f7173090033c ("mm/page_alloc: make should_fail_alloc_page() static")
> Signed-off-by: Matteo Croce <[email protected]>
> Acked-by: Mel Gorman <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c66f1e6204c2..3e97e68aef7a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3820,7 +3820,7 @@ static inline bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
>
> #endif /* CONFIG_FAIL_PAGE_ALLOC */
>
> -static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
> +noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
> {
> return __should_fail_alloc_page(gfp_mask, order);
> }
>

2021-07-15 10:48:25

by Jesper Dangaard Brouer

Subject: Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"

Cc. Jiri Olsa + Arnaldo

On 14/07/2021 09.06, John Hubbard wrote:
> On 7/13/21 8:21 AM, Mel Gorman wrote:
>> From: Matteo Croce <[email protected]>
>>
>> This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441.
>>
>> Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:
>>
>>    LD      vmlinux
>>    BTFIDS  vmlinux
>> FAILED unresolved symbol should_fail_alloc_page
>> make: *** [Makefile:1199: vmlinux] Error 255
>> make: *** Deleting file 'vmlinux'
>
> Yes! I ran into this yesterday. Your patch fixes this build failure
> for me, so feel free to add:
>
> Tested-by: John Hubbard <[email protected]>
>
>
> However, I should add that I'm still seeing another build failure, after
> fixing the above:
>
> LD      vmlinux
> BTFIDS  vmlinux
> FAILED elf_update(WRITE): no error

This elf_update(WRITE) error is new to me.

> make: *** [Makefile:1176: vmlinux] Error 255
> make: *** Deleting file 'vmlinux'

It is annoying that vmlinux is deleted in this case, because I usually
give Jiri the output from 'resolve_btfids -v' on vmlinux.

$ ./tools/bpf/resolve_btfids/resolve_btfids -v vmlinux.failed

You can do:
$ git diff
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 3b261b0f74f0..02dec10a7d75 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -302,7 +302,8 @@ cleanup()
 	rm -f .tmp_symversions.lds
 	rm -f .tmp_vmlinux*
 	rm -f System.map
-	rm -f vmlinux
+	# rm -f vmlinux
+	mv vmlinux vmlinux.failed
 	rm -f vmlinux.o
 }


>
>
> ...and un-setting CONFIG_DEBUG_INFO_BTF makes that disappear. Maybe someone
> who understands the BTFIDS build step can shed some light on that; I'm
> not there yet. :)

I'm just a user/consumer of the output from the BTFIDS build step. I think
Jiri Olsa owns the resolve_btfids tool, and ACME owns pahole. I've hit a
number of issues in the past that Jiri and ACME helped resolve quickly.
The most efficient solution I've found was to upgrade pahole to a newer
version.

What version of pahole does your build system have?

What is your GCC version?

--Jesper

2021-07-16 00:09:42

by John Hubbard

Subject: Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"

...
>> LD      vmlinux
>> BTFIDS  vmlinux
>> FAILED elf_update(WRITE): no error
>
> This elf_update(WRITE) error is new to me.
>
>> make: *** [Makefile:1176: vmlinux] Error 255
>> make: *** Deleting file 'vmlinux'
>
> It is annoying that vmlinux is deleted in this case, because I usually give Jiri the output from
> 'resolve_btfids -v' on vmlinux.
>
>  $ ./tools/bpf/resolve_btfids/resolve_btfids -v vmlinux.failed
>
> You can do:
> $ git diff
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 3b261b0f74f0..02dec10a7d75 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -302,7 +302,8 @@ cleanup()
>         rm -f .tmp_symversions.lds
>         rm -f .tmp_vmlinux*
>         rm -f System.map
> -       rm -f vmlinux
> +       # rm -f vmlinux
> +       mv vmlinux vmlinux.failed
>         rm -f vmlinux.o
>  }
>
>
>>
>>
>> ...and un-setting CONFIG_DEBUG_INFO_BTF makes that disappear. Maybe someone
>> who understands the BTFIDS build step can shed some light on that; I'm
>> not there yet. :)
>
> I'm just a user/consumer of the output from the BTFIDS build step. I think Jiri Olsa owns the
> resolve_btfids tool, and ACME owns pahole. I've hit a number of issues in the past that Jiri and
> ACME helped resolve quickly.
> The most efficient solution I've found was to upgrade pahole to a newer version.
>
> What version of pahole does your build system have?
>
> What is your GCC version?
>

Just a quick answer first on the versions: this is an up to date Arch Linux system:

gcc: 11.1.0
pahole: 1.21

I'll try to get the other step done later this evening.

thanks,
--
John Hubbard
NVIDIA

2021-07-16 06:05:55

by John Hubbard

Subject: Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"

On 7/15/21 5:04 PM, John Hubbard wrote:
...
>>> ...and un-setting CONFIG_DEBUG_INFO_BTF makes that disappear. Maybe someone
>>> who understands the BTFIDS build step can shed some light on that; I'm
>>> not there yet. :)
>>
>> I'm just a user/consumer of the output from the BTFIDS build step. I think Jiri Olsa owns the
>> resolve_btfids tool, and ACME owns pahole. I've hit a number of issues in the past that Jiri and
>> ACME helped resolve quickly.
>> The most efficient solution I've found was to upgrade pahole to a newer version.
>>
>> What version of pahole does your build system have?
>>
>> What is your GCC version?
>>
>
> Just a quick answer first on the versions: this is an up to date Arch Linux system:
>
> gcc: 11.1.0
> pahole: 1.21
>
> I'll try to get the other step done later this evening.

...and...I've lost the repro completely. The only thing I changed was that I
attempted to update pahole. This caused Arch Linux to reinstall pahole, while
reporting that 1.21 was already the current version.

It acts as if there was something wrong with the pahole installation. This
seems unlikely, given that the system is merely on a routine update schedule.
However, that's the data I have.

If it ever comes up again I'll be able to run resolve_btfids, using your
steps here, so thanks for posting those!


thanks,
--
John Hubbard
NVIDIA