2021-07-05 17:07:04

by Uladzislau Rezki

Subject: [PATCH v2 1/2] mm/vmalloc: Use batched page requests in bulk-allocator

In the case of simultaneous vmalloc allocations, for example 1GB allocations
from 12 CPUs, my system is able to hit "BUG: soft lockup" on a !CONFIG_PREEMPT
kernel.

<snip>
[ 62.512621] RIP: 0010:__alloc_pages_bulk+0xa9f/0xbb0
[ 62.512628] Code: ff 8b 44 24 48 44 29 f8 83 f8 01 0f 84 ea fe ff ff e9 07 f6 ff ff 48 8b 44 24 60 48 89 28 e9 00 f9 ff ff fb 66 0f 1f 44 00 00 <e9> e8 fd ff ff 65 48 01 51 10 e9 3e fe ff ff 48 8b 44 24 78 4d 89
[ 62.512629] RSP: 0018:ffffa7bfc29ffd20 EFLAGS: 00000206
[ 62.512631] RAX: 0000000000000200 RBX: ffffcd5405421888 RCX: ffff8c36ffdeb928
[ 62.512632] RDX: 0000000000040000 RSI: ffffa896f06b2ff8 RDI: ffffcd5405421880
[ 62.512633] RBP: ffffcd5405421880 R08: 000000000000007d R09: ffffffffffffffff
[ 62.512634] R10: ffffffff9d63c084 R11: 00000000ffffffff R12: ffff8c373ffaeb80
[ 62.512635] R13: ffff8c36ffdf65f8 R14: ffff8c373ffaeb80 R15: 0000000000040000
[ 62.512637] FS: 0000000000000000(0000) GS:ffff8c36ffdc0000(0000) knlGS:0000000000000000
[ 62.512638] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.512639] CR2: 000055c8e2fe8610 CR3: 0000000c13e10000 CR4: 00000000000006e0
[ 62.512641] Call Trace:
[ 62.512646] __vmalloc_node_range+0x11c/0x2d0
[ 62.512649] ? full_fit_alloc_test+0x140/0x140 [test_vmalloc]
[ 62.512654] __vmalloc_node+0x4b/0x70
[ 62.512656] ? fix_size_alloc_test+0x44/0x60 [test_vmalloc]
[ 62.512659] fix_size_alloc_test+0x44/0x60 [test_vmalloc]
[ 62.512662] test_func+0xe7/0x1f0 [test_vmalloc]
[ 62.512666] ? fix_align_alloc_test+0x50/0x50 [test_vmalloc]
[ 62.512668] kthread+0x11a/0x140
[ 62.512671] ? set_kthread_struct+0x40/0x40
[ 62.512672] ret_from_fork+0x22/0x30
<snip>

To address this issue, invoke the bulk-allocator many times until all pages
are obtained, i.e. do batched page requests, adding a cond_resched() in between
so the CPU can reschedule. The batch size is hard-coded at 100 pages per call.

Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
---
mm/vmalloc.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index aaad569e8963..5297958ac7c5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
* to fails, fallback to a single page allocator that is
* more permissive.
*/
- if (!order)
- nr_allocated = alloc_pages_bulk_array_node(
- gfp, nid, nr_pages, pages);
- else
+ if (!order) {
+ while (nr_allocated < nr_pages) {
+ int nr, nr_pages_request;
+
+ /*
+ * A maximum allowed request is hard-coded and is 100
+ * pages per call. That is done in order to prevent a
+ * long preemption off scenario in the bulk-allocator
+ * so the range is [1:100].
+ */
+ nr_pages_request = min(100, (int)(nr_pages - nr_allocated));
+
+ nr = alloc_pages_bulk_array_node(gfp, nid,
+ nr_pages_request, pages + nr_allocated);
+
+ nr_allocated += nr;
+ cond_resched();
+
+ /*
+ * If zero or pages were obtained partly,
+ * fallback to a single page allocator.
+ */
+ if (nr != nr_pages_request)
+ break;
+ }
+ } else
/*
* Compound pages required for remap_vmalloc_page if
* high-order pages.
--
2.20.1


2021-07-05 17:07:17

by Uladzislau Rezki

Subject: [PATCH v2 2/2] mm/vmalloc: Remove gfpflags_allow_blocking() check

Get rid of the gfpflags_allow_blocking() check in the vmalloc() path, as the
path is supposed to be sleepable anyway. Thus remove the check from
alloc_vmap_area() as well as from vm_area_alloc_pages().
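
For reference, gfpflags_allow_blocking() just tests whether the gfp mask
permits direct reclaim, which is what makes sleeping legal. The userspace mock
below illustrates that semantics; the flag bit values here are illustrative
stand-ins, not the kernel's actual bit positions.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mock of the kernel's gfp plumbing (illustrative values). */
typedef unsigned int gfp_t;

#define __GFP_DIRECT_RECLAIM	(1u << 0)
#define GFP_KERNEL		__GFP_DIRECT_RECLAIM	/* may sleep */
#define GFP_ATOMIC		0u			/* must not sleep */

/* True if the caller may block (direct reclaim is allowed). */
static bool gfpflags_allow_blocking(gfp_t gfp)
{
	return !!(gfp & __GFP_DIRECT_RECLAIM);
}
```

Since vmalloc() is only ever called from sleepable context (effectively
GFP_KERNEL-like masks), the check is always true there, so guarding the
notifier call and cond_resched() behind it adds nothing.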

Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
---
mm/vmalloc.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5297958ac7c5..93a9cbdba905 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1479,6 +1479,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
int node, gfp_t gfp_mask)
{
struct vmap_area *va;
+ unsigned long freed;
unsigned long addr;
int purged = 0;
int ret;
@@ -1542,13 +1543,12 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
goto retry;
}

- if (gfpflags_allow_blocking(gfp_mask)) {
- unsigned long freed = 0;
- blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
- if (freed > 0) {
- purged = 0;
- goto retry;
- }
+ freed = 0;
+ blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
+
+ if (freed > 0) {
+ purged = 0;
+ goto retry;
}

if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
@@ -2834,9 +2834,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
for (i = 0; i < (1U << order); i++)
pages[nr_allocated + i] = page + i;

- if (gfpflags_allow_blocking(gfp))
- cond_resched();
-
+ cond_resched();
nr_allocated += 1U << order;
}

--
2.20.1

2021-07-06 20:28:14

by Andrew Morton

Subject: Re: [PATCH v2 1/2] mm/vmalloc: Use batched page requests in bulk-allocator

On Mon, 5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <[email protected]> wrote:

> In case of simultaneous vmalloc allocations, for example it is 1GB and
> 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> kernel.
>
> <snip>
> ...
>
> are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> to reschedule. Batched value is hard-coded and is 100 pages per call.
>
> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>

Can we please have a Fixes: for this?

Is this fix important enough for 4.14-rcx? I think so...

> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> * to fails, fallback to a single page allocator that is
> * more permissive.
> */
> - if (!order)
> - nr_allocated = alloc_pages_bulk_array_node(
> - gfp, nid, nr_pages, pages);
> - else
> + if (!order) {
> + while (nr_allocated < nr_pages) {
> + int nr, nr_pages_request;
> +
> + /*
> + * A maximum allowed request is hard-coded and is 100
> + * pages per call. That is done in order to prevent a
> + * long preemption off scenario in the bulk-allocator
> + * so the range is [1:100].
> + */
> + nr_pages_request = min(100, (int)(nr_pages - nr_allocated));

Yes, the types are all over the place.

nr_pages: unsigned long
nr_allocated: unsigned int
nr, nr_pages_request: int

Can we please choose the most appropriate type and use that
consistently?
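
One possible way to resolve this, sketched as a hypothetical userspace helper
(not the patch's actual code): use unsigned long for all the page counts, which
matches nr_pages and drops the (int) cast inside min() entirely.

```c
#include <assert.h>

#define BULK_BATCH 100UL /* the patch's hard-coded per-call cap */

/*
 * Hypothetical helper: size of the next bulk request, with every
 * operand kept as unsigned long so no cast is needed and no
 * signed/unsigned promotion can occur in the comparison.
 */
static unsigned long request_size(unsigned long nr_pages,
				  unsigned long nr_allocated)
{
	unsigned long remaining = nr_pages - nr_allocated;

	return remaining < BULK_BATCH ? remaining : BULK_BATCH;
}
```

With consistent types the comparison is well defined by construction, which is
exactly the hazard the kernel's type-checked min() macro exists to catch.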


2021-07-07 08:43:15

by Michal Hocko

Subject: Re: [PATCH v2 1/2] mm/vmalloc: Use batched page requests in bulk-allocator

On Tue 06-07-21 13:26:53, Andrew Morton wrote:
> On Mon, 5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <[email protected]> wrote:
>
> > In case of simultaneous vmalloc allocations, for example it is 1GB and
> > 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> > kernel.
> >
> > <snip>
> > ...
> >
> > are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> > to reschedule. Batched value is hard-coded and is 100 pages per call.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
>
> Can we please have a Fixes: for this?

Is this a fix for any actual real life problem? I mean allocating 1GB of
vmalloc space back and forth sounds like a stretch to me.

> Is this fix important enough for 4.14-rcx? I think so...

I do not think so. This is an improvement so that vmalloc behaves more
sanely for those abusers...
--
Michal Hocko
SUSE Labs

2021-07-07 08:52:13

by Michal Hocko

Subject: Re: [PATCH v2 1/2] mm/vmalloc: Use batched page requests in bulk-allocator

On Mon 05-07-21 19:05:36, Uladzislau Rezki (Sony) wrote:
> In case of simultaneous vmalloc allocations, for example it is 1GB and
> 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> kernel.
>
> <snip>
> [ 62.512621] RIP: 0010:__alloc_pages_bulk+0xa9f/0xbb0
> [ 62.512628] Code: ff 8b 44 24 48 44 29 f8 83 f8 01 0f 84 ea fe ff ff e9 07 f6 ff ff 48 8b 44 24 60 48 89 28 e9 00 f9 ff ff fb 66 0f 1f 44 00 00 <e9> e8 fd ff ff 65 48 01 51 10 e9 3e fe ff ff 48 8b 44 24 78 4d 89
> [ 62.512629] RSP: 0018:ffffa7bfc29ffd20 EFLAGS: 00000206
> [ 62.512631] RAX: 0000000000000200 RBX: ffffcd5405421888 RCX: ffff8c36ffdeb928
> [ 62.512632] RDX: 0000000000040000 RSI: ffffa896f06b2ff8 RDI: ffffcd5405421880
> [ 62.512633] RBP: ffffcd5405421880 R08: 000000000000007d R09: ffffffffffffffff
> [ 62.512634] R10: ffffffff9d63c084 R11: 00000000ffffffff R12: ffff8c373ffaeb80
> [ 62.512635] R13: ffff8c36ffdf65f8 R14: ffff8c373ffaeb80 R15: 0000000000040000
> [ 62.512637] FS: 0000000000000000(0000) GS:ffff8c36ffdc0000(0000) knlGS:0000000000000000
> [ 62.512638] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 62.512639] CR2: 000055c8e2fe8610 CR3: 0000000c13e10000 CR4: 00000000000006e0
> [ 62.512641] Call Trace:
> [ 62.512646] __vmalloc_node_range+0x11c/0x2d0
> [ 62.512649] ? full_fit_alloc_test+0x140/0x140 [test_vmalloc]
> [ 62.512654] __vmalloc_node+0x4b/0x70
> [ 62.512656] ? fix_size_alloc_test+0x44/0x60 [test_vmalloc]
> [ 62.512659] fix_size_alloc_test+0x44/0x60 [test_vmalloc]
> [ 62.512662] test_func+0xe7/0x1f0 [test_vmalloc]
> [ 62.512666] ? fix_align_alloc_test+0x50/0x50 [test_vmalloc]
> [ 62.512668] kthread+0x11a/0x140
> [ 62.512671] ? set_kthread_struct+0x40/0x40
> [ 62.512672] ret_from_fork+0x22/0x30
> <snip>
>
> To address this issue invoke a bulk-allocator many times until all pages
> are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> to reschedule. Batched value is hard-coded and is 100 pages per call.

Yes, this makes perfect sense to me. I would just be more explicit that this
is an artificially created problem, likely not a problem at the moment, but
why not prepare for the future.

> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>

Acked-by: Michal Hocko <[email protected]>


Thanks!
> ---
> mm/vmalloc.c | 30 ++++++++++++++++++++++++++----
> 1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index aaad569e8963..5297958ac7c5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> * to fails, fallback to a single page allocator that is
> * more permissive.
> */
> - if (!order)
> - nr_allocated = alloc_pages_bulk_array_node(
> - gfp, nid, nr_pages, pages);
> - else
> + if (!order) {
> + while (nr_allocated < nr_pages) {
> + int nr, nr_pages_request;
> +
> + /*
> + * A maximum allowed request is hard-coded and is 100
> + * pages per call. That is done in order to prevent a
> + * long preemption off scenario in the bulk-allocator
> + * so the range is [1:100].
> + */
> + nr_pages_request = min(100, (int)(nr_pages - nr_allocated));
> +
> + nr = alloc_pages_bulk_array_node(gfp, nid,
> + nr_pages_request, pages + nr_allocated);
> +
> + nr_allocated += nr;
> + cond_resched();
> +
> + /*
> + * If zero or pages were obtained partly,
> + * fallback to a single page allocator.
> + */
> + if (nr != nr_pages_request)
> + break;
> + }
> + } else
> /*
> * Compound pages required for remap_vmalloc_page if
> * high-order pages.
> --
> 2.20.1

--
Michal Hocko
SUSE Labs

2021-07-07 08:53:10

by Michal Hocko

Subject: Re: [PATCH v2 2/2] mm/vmalloc: Remove gfpflags_allow_blocking() check

On Mon 05-07-21 19:05:37, Uladzislau Rezki (Sony) wrote:
> Get rid of gfpflags_allow_blocking() check from the vmalloc() path
> as it is supposed to be sleepable anyway. Thus remove it from the
> alloc_vmap_area() as well as from the vm_area_alloc_pages().
>
> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/vmalloc.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 5297958ac7c5..93a9cbdba905 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1479,6 +1479,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> int node, gfp_t gfp_mask)
> {
> struct vmap_area *va;
> + unsigned long freed;
> unsigned long addr;
> int purged = 0;
> int ret;
> @@ -1542,13 +1543,12 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> goto retry;
> }
>
> - if (gfpflags_allow_blocking(gfp_mask)) {
> - unsigned long freed = 0;
> - blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
> - if (freed > 0) {
> - purged = 0;
> - goto retry;
> - }
> + freed = 0;
> + blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
> +
> + if (freed > 0) {
> + purged = 0;
> + goto retry;
> }
>
> if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
> @@ -2834,9 +2834,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> for (i = 0; i < (1U << order); i++)
> pages[nr_allocated + i] = page + i;
>
> - if (gfpflags_allow_blocking(gfp))
> - cond_resched();
> -
> + cond_resched();
> nr_allocated += 1U << order;
> }
>
> --
> 2.20.1

--
Michal Hocko
SUSE Labs

2021-07-07 14:52:47

by Uladzislau Rezki

Subject: Re: [PATCH v2 1/2] mm/vmalloc: Use batched page requests in bulk-allocator

On Tue, Jul 06, 2021 at 01:26:53PM -0700, Andrew Morton wrote:
> On Mon, 5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <[email protected]> wrote:
>
> > In case of simultaneous vmalloc allocations, for example it is 1GB and
> > 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> > kernel.
> >
> > <snip>
> > ...
> >
> > are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> > to reschedule. Batched value is hard-coded and is 100 pages per call.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
>
> Can we please have a Fixes: for this?
>
> Is this fix important enough for 4.14-rcx? I think so...
>
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> > * to fails, fallback to a single page allocator that is
> > * more permissive.
> > */
> > - if (!order)
> > - nr_allocated = alloc_pages_bulk_array_node(
> > - gfp, nid, nr_pages, pages);
> > - else
> > + if (!order) {
> > + while (nr_allocated < nr_pages) {
> > + int nr, nr_pages_request;
> > +
> > + /*
> > + * A maximum allowed request is hard-coded and is 100
> > + * pages per call. That is done in order to prevent a
> > + * long preemption off scenario in the bulk-allocator
> > + * so the range is [1:100].
> > + */
> > + nr_pages_request = min(100, (int)(nr_pages - nr_allocated));
>
> Yes, they types are all over the place.
>
> nr_pages: unsigned long
> nr_allocated: unsigned int
> nr, nr_pages_request: int
>
> Can we please choose the most appropriate type and use that
> consistently?
>
Let me think it over and see what I can do.

--
Vlad Rezki

2021-07-07 15:36:53

by Uladzislau Rezki

Subject: Re: [PATCH v2 2/2] mm/vmalloc: Remove gfpflags_allow_blocking() check

On Wed, Jul 07, 2021 at 10:51:22AM +0200, Michal Hocko wrote:
> On Mon 05-07-21 19:05:37, Uladzislau Rezki (Sony) wrote:
> > Get rid of gfpflags_allow_blocking() check from the vmalloc() path
> > as it is supposed to be sleepable anyway. Thus remove it from the
> > alloc_vmap_area() as well as from the vm_area_alloc_pages().
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
>
> Acked-by: Michal Hocko <[email protected]>
>

Thanks!

--
Vlad Rezki

2021-07-07 15:36:53

by Uladzislau Rezki

Subject: Re: [PATCH v2 1/2] mm/vmalloc: Use batched page requests in bulk-allocator

On Wed, Jul 07, 2021 at 10:42:29AM +0200, Michal Hocko wrote:
> On Tue 06-07-21 13:26:53, Andrew Morton wrote:
> > On Mon, 5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <[email protected]> wrote:
> >
> > > In case of simultaneous vmalloc allocations, for example it is 1GB and
> > > 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> > > kernel.
> > >
> > > <snip>
> > > ...
> > >
> > > are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> > > to reschedule. Batched value is hard-coded and is 100 pages per call.
> > >
> > > Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
> >
> > Can we please have a Fixes: for this?
>
> Is this a fix for any actual real life problem? I mean allocating 1GB of
> vmalloc space back and forth sounds like a stretch to me.
>
It is not a real scenario; I simulated it with the stress-suite tests. So the
Fixes tag is not needed, IMHO.

> > Is this fix important enough for 4.14-rcx? I think so...
>
> I do not think so. This is an improvement so that vmalloc behaves more
> sanely for those abusers...
>
The bulk-allocator has only recently been introduced, so 4.x kernels do not
have it, i.e. this change is not applicable there and 4.x kernels do not
suffer from the problem.

--
Vlad Rezki