From: Joonsoo Kim <[email protected]>
slub uses a higher order allocation than it actually needs. In this case,
we don't want to do direct reclaim to produce such a high-order page,
since it causes a big latency for the user. Instead, we would like to
fall back to the lower order allocation that it actually needs.

However, we also want to get this higher order page the next time around
in order to get the best performance, and that is the role of background
threads like kswapd and kcompactd. To wake them up, we should not clear
__GFP_KSWAPD_RECLAIM.

Contrary to this intention, the current code clears __GFP_KSWAPD_RECLAIM,
so fix it. The slub part of this unintended change was introduced by
Mel's commit 444eb2a449ef ("mm: thp: set THP defrag by default to madvise
and add a stall-free defrag option"). It removed a special case in
__alloc_pages_slowpath() where having __GFP_THISNODE and lacking
__GFP_DIRECT_RECLAIM effectively also meant lacking __GFP_KSWAPD_RECLAIM.
However, slub doesn't use __GFP_THISNODE, so that case does not apply here
and partially reverting this code in slub doesn't hurt Mel's intention.

Note that this patch does some clean up, too:
__GFP_NOFAIL is cleared twice, so remove one of the clearings.
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/slub.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 163352c..45f4a4b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1578,8 +1578,12 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
* so we fall-back to the minimum order allocation.
*/
alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
- if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
- alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~(__GFP_RECLAIM|__GFP_NOFAIL);
+ if (oo_order(oo) > oo_order(s->min)) {
+ if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
+ alloc_gfp |= __GFP_NOMEMALLOC;
+ alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
+ }
+ }
page = alloc_slab_page(s, alloc_gfp, node, oo);
if (unlikely(!page)) {
--
2.7.4
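[Editorial note: an illustrative walk-through, not part of the posted patch,
of what the optimistic high-order attempt ends up requesting for a GFP_KERNEL
caller after this change; the flag expansions in the comments are from
include/linux/gfp.h.]

/*
 * GFP_KERNEL    == __GFP_RECLAIM | __GFP_IO | __GFP_FS
 * __GFP_RECLAIM == __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM
 */
gfp_t flags = GFP_KERNEL;
gfp_t alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;

if (oo_order(oo) > oo_order(s->min)) {
	if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
		alloc_gfp |= __GFP_NOMEMALLOC;
		alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
	}
}

/*
 * alloc_gfp is now __GFP_KSWAPD_RECLAIM | __GFP_IO | __GFP_FS |
 * __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC: the high-order try
 * can fail quickly without direct reclaim, but kswapd and kcompactd are
 * still woken so that a high-order page may be available next time.
 */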
From: Joonsoo Kim <[email protected]>
High-order atomic allocations are hard to satisfy since we cannot
reclaim anything in this context. So, we reserve a pageblock for
this kind of request.

In slub, we try to allocate a higher-order page than is actually
needed in order to get the best performance. If this optimistic try is
made with GFP_ATOMIC, alloc_flags will include ALLOC_HARDER and
the pageblock reserved for high-order atomic allocations may be used.
Moreover, if it succeeds, this request would reserve a MIGRATE_HIGHATOMIC
pageblock to prepare for further requests. Using a MIGRATE_HIGHATOMIC
pageblock this way is not good for fragmentation management, since
unreserving the pageblock unconditionally sets its migratetype to the
request's migratetype, without considering the migratetype of the pages
already in use in the pageblock.

This is not what we intend, so fix it by unconditionally masking
out __GFP_ATOMIC in order not to set ALLOC_HARDER.

It is also undesirable to use reserved memory for the optimistic try,
so mask out __GFP_HIGH as well. This patch also adds __GFP_NOMEMALLOC since
we don't want to use the reserved memory for the optimistic try even if
the caller has the PF_MEMALLOC flag.
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/gfp.h | 1 +
mm/page_alloc.c | 8 ++++++++
mm/slub.c | 6 ++----
3 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f780718..1f5658e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -568,6 +568,7 @@ extern gfp_t gfp_allowed_mask;
/* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
+gfp_t gfp_drop_reserves(gfp_t gfp_mask);
extern void pm_restrict_gfp_mask(void);
extern void pm_restore_gfp_mask(void);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6dbc49e..0f34356 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3720,6 +3720,14 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
return !!__gfp_pfmemalloc_flags(gfp_mask);
}
+gfp_t gfp_drop_reserves(gfp_t gfp_mask)
+{
+ gfp_mask &= ~(__GFP_HIGH | __GFP_ATOMIC);
+ gfp_mask |= __GFP_NOMEMALLOC;
+
+ return gfp_mask;
+}
+
/*
* Checks whether it makes sense to retry the reclaim to make a forward progress
* for the given allocation request.
diff --git a/mm/slub.c b/mm/slub.c
index 45f4a4b..3d75d30 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1579,10 +1579,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
*/
alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
if (oo_order(oo) > oo_order(s->min)) {
- if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
- alloc_gfp |= __GFP_NOMEMALLOC;
- alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
- }
+ alloc_gfp = gfp_drop_reserves(alloc_gfp);
+ alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
}
page = alloc_slab_page(s, alloc_gfp, node, oo);
--
2.7.4
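[Editorial note: an illustrative walk-through, not part of the posted patch,
of a GFP_ATOMIC caller's optimistic high-order attempt once
gfp_drop_reserves() is applied; the flag expansion in the comment is from
include/linux/gfp.h.]

/*
 * GFP_ATOMIC == __GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM
 */
gfp_t flags = GFP_ATOMIC;
gfp_t alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;

if (oo_order(oo) > oo_order(s->min)) {
	/* drops __GFP_HIGH and __GFP_ATOMIC, adds __GFP_NOMEMALLOC */
	alloc_gfp = gfp_drop_reserves(alloc_gfp);
	/* a no-op for GFP_ATOMIC, which never has __GFP_DIRECT_RECLAIM */
	alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
}

/*
 * alloc_gfp is now __GFP_KSWAPD_RECLAIM | __GFP_NOWARN | __GFP_NORETRY |
 * __GFP_NOMEMALLOC: no ALLOC_HARDER, no highatomic reserve and no
 * PF_MEMALLOC reserves are used for the optimistic try; on failure slub
 * retries at the minimum order with the caller's original GFP_ATOMIC.
 */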
On 09/06/2017 06:37 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> slub uses a higher order allocation than it actually needs. In this case,
> we don't want to do direct reclaim to produce such a high-order page,
> since it causes a big latency for the user. Instead, we would like to
> fall back to the lower order allocation that it actually needs.
>
> However, we also want to get this higher order page the next time around
> in order to get the best performance, and that is the role of background
> threads like kswapd and kcompactd. To wake them up, we should not clear
> __GFP_KSWAPD_RECLAIM.
>
> Contrary to this intention, the current code clears __GFP_KSWAPD_RECLAIM,
> so fix it. The slub part of this unintended change was introduced by
> Mel's commit 444eb2a449ef ("mm: thp: set THP defrag by default to madvise
> and add a stall-free defrag option"). It removed a special case in
> __alloc_pages_slowpath() where having __GFP_THISNODE and lacking
> __GFP_DIRECT_RECLAIM effectively also meant lacking __GFP_KSWAPD_RECLAIM.
> However, slub doesn't use __GFP_THISNODE, so that case does not apply here
> and partially reverting this code in slub doesn't hurt Mel's intention.
>
> Note that this patch does some clean up, too:
> __GFP_NOFAIL is cleared twice, so remove one of the clearings.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
> ---
> mm/slub.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 163352c..45f4a4b 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1578,8 +1578,12 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> * so we fall-back to the minimum order allocation.
> */
> alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
> - if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
> - alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~(__GFP_RECLAIM|__GFP_NOFAIL);
> + if (oo_order(oo) > oo_order(s->min)) {
> + if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
> + alloc_gfp |= __GFP_NOMEMALLOC;
> + alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> + }
> + }
>
> page = alloc_slab_page(s, alloc_gfp, node, oo);
> if (unlikely(!page)) {
>
On 09/06/2017 06:37 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> High-order atomic allocations are hard to satisfy since we cannot
> reclaim anything in this context. So, we reserve a pageblock for
> this kind of request.
>
> In slub, we try to allocate a higher-order page than is actually
> needed in order to get the best performance. If this optimistic try is
> made with GFP_ATOMIC, alloc_flags will include ALLOC_HARDER and
> the pageblock reserved for high-order atomic allocations may be used.
> Moreover, if it succeeds, this request would reserve a MIGRATE_HIGHATOMIC
> pageblock to prepare for further requests. Using a MIGRATE_HIGHATOMIC
> pageblock this way is not good for fragmentation management, since
> unreserving the pageblock unconditionally sets its migratetype to the
> request's migratetype, without considering the migratetype of the pages
> already in use in the pageblock.
>
> This is not what we intend, so fix it by unconditionally masking
> out __GFP_ATOMIC in order not to set ALLOC_HARDER.
>
> It is also undesirable to use reserved memory for the optimistic try,
> so mask out __GFP_HIGH as well. This patch also adds __GFP_NOMEMALLOC since
> we don't want to use the reserved memory for the optimistic try even if
> the caller has the PF_MEMALLOC flag.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> include/linux/gfp.h | 1 +
> mm/page_alloc.c | 8 ++++++++
> mm/slub.c | 6 ++----
> 3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index f780718..1f5658e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -568,6 +568,7 @@ extern gfp_t gfp_allowed_mask;
>
> /* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
> bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
> +gfp_t gfp_drop_reserves(gfp_t gfp_mask);
>
> extern void pm_restrict_gfp_mask(void);
> extern void pm_restore_gfp_mask(void);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6dbc49e..0f34356 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3720,6 +3720,14 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> return !!__gfp_pfmemalloc_flags(gfp_mask);
> }
>
> +gfp_t gfp_drop_reserves(gfp_t gfp_mask)
> +{
> + gfp_mask &= ~(__GFP_HIGH | __GFP_ATOMIC);
> + gfp_mask |= __GFP_NOMEMALLOC;
> +
> + return gfp_mask;
> +}
> +
I think it's wasteful to do a function call for this; an inline definition
in the header would be better (gfp_pfmemalloc_allowed() is different, as it
relies on the rather heavyweight __gfp_pfmemalloc_flags()).
> /*
> * Checks whether it makes sense to retry the reclaim to make a forward progress
> * for the given allocation request.
> diff --git a/mm/slub.c b/mm/slub.c
> index 45f4a4b..3d75d30 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1579,10 +1579,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> */
> alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
> if (oo_order(oo) > oo_order(s->min)) {
> - if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
> - alloc_gfp |= __GFP_NOMEMALLOC;
> - alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> - }
> + alloc_gfp = gfp_drop_reserves(alloc_gfp);
> + alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> }
>
> page = alloc_slab_page(s, alloc_gfp, node, oo);
>
On Wed 06-09-17 10:10:22, Vlastimil Babka wrote:
> On 09/06/2017 06:37 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > High-order atomic allocations are hard to satisfy since we cannot
> > reclaim anything in this context. So, we reserve a pageblock for
> > this kind of request.
> >
> > In slub, we try to allocate a higher-order page than is actually
> > needed in order to get the best performance. If this optimistic try is
> > made with GFP_ATOMIC, alloc_flags will include ALLOC_HARDER and
> > the pageblock reserved for high-order atomic allocations may be used.
> > Moreover, if it succeeds, this request would reserve a MIGRATE_HIGHATOMIC
> > pageblock to prepare for further requests. Using a MIGRATE_HIGHATOMIC
> > pageblock this way is not good for fragmentation management, since
> > unreserving the pageblock unconditionally sets its migratetype to the
> > request's migratetype, without considering the migratetype of the pages
> > already in use in the pageblock.
> >
> > This is not what we intend, so fix it by unconditionally masking
> > out __GFP_ATOMIC in order not to set ALLOC_HARDER.
> >
> > It is also undesirable to use reserved memory for the optimistic try,
> > so mask out __GFP_HIGH as well. This patch also adds __GFP_NOMEMALLOC since
> > we don't want to use the reserved memory for the optimistic try even if
> > the caller has the PF_MEMALLOC flag.
> >
> > Signed-off-by: Joonsoo Kim <[email protected]>
> > ---
> > include/linux/gfp.h | 1 +
> > mm/page_alloc.c | 8 ++++++++
> > mm/slub.c | 6 ++----
> > 3 files changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index f780718..1f5658e 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -568,6 +568,7 @@ extern gfp_t gfp_allowed_mask;
> >
> > /* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
> > bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
> > +gfp_t gfp_drop_reserves(gfp_t gfp_mask);
> >
> > extern void pm_restrict_gfp_mask(void);
> > extern void pm_restore_gfp_mask(void);
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 6dbc49e..0f34356 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3720,6 +3720,14 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> > return !!__gfp_pfmemalloc_flags(gfp_mask);
> > }
> >
> > +gfp_t gfp_drop_reserves(gfp_t gfp_mask)
> > +{
> > + gfp_mask &= ~(__GFP_HIGH | __GFP_ATOMIC);
> > + gfp_mask |= __GFP_NOMEMALLOC;
> > +
> > + return gfp_mask;
> > +}
> > +
>
> I think it's wasteful to do a function call for this; an inline definition
> in the header would be better (gfp_pfmemalloc_allowed() is different, as it
> relies on the rather heavyweight __gfp_pfmemalloc_flags()).
Agreed. If you do that, feel free to add
Acked-by: Michal Hocko <[email protected]>
--
Michal Hocko
SUSE Labs
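[Editorial note: for illustration, the static inline variant suggested above
might look roughly like this in include/linux/gfp.h. This is a sketch only;
the posted patch defines the function out of line in mm/page_alloc.c.]

/*
 * Strip the flags that grant access to memory reserves, so an
 * optimistic/speculative allocation cannot deplete them.
 */
static inline gfp_t gfp_drop_reserves(gfp_t gfp_mask)
{
	gfp_mask &= ~(__GFP_HIGH | __GFP_ATOMIC);
	gfp_mask |= __GFP_NOMEMALLOC;

	return gfp_mask;
}

With this, the separate declaration in gfp.h and the out-of-line definition
in mm/page_alloc.c would no longer be needed.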
On Wed, 6 Sep 2017, Vlastimil Babka wrote:
> I think it's wasteful to do a function call for this; an inline definition
> in the header would be better (gfp_pfmemalloc_allowed() is different, as it
> relies on the rather heavyweight __gfp_pfmemalloc_flags()).
Right.
On Wed, 6 Sep 2017, [email protected] wrote:
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1578,8 +1578,12 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> * so we fall-back to the minimum order allocation.
> */
> alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
> - if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
> - alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~(__GFP_RECLAIM|__GFP_NOFAIL);
> + if (oo_order(oo) > oo_order(s->min)) {
> + if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
> + alloc_gfp |= __GFP_NOMEMALLOC;
> + alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> + }
> + }
>
Can we come up with another inline function in gfp.h for this as well?
Well, needing these functions to manipulate flags actually indicates
that we may need a cleanup of the GFP flags at some point. There is a bunch
of flags that disable things and some that enable things.
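[Editorial note: if the slub hunk quoted above is what is meant, one possible
shape for such a helper could be the sketch below. The name and placement are
purely hypothetical and not part of any posted patch.]

/* Clear direct reclaim but keep __GFP_KSWAPD_RECLAIM, so kswapd and
 * kcompactd are still woken for a later high-order attempt. */
static inline gfp_t gfp_drop_direct_reclaim(gfp_t gfp_mask)
{
	return gfp_mask & ~__GFP_DIRECT_RECLAIM;
}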
On Wed 06-09-17 10:59:09, Christoph Lameter wrote:
> On Wed, 6 Sep 2017, [email protected] wrote:
>
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1578,8 +1578,12 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > * so we fall-back to the minimum order allocation.
> > */
> > alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
> > - if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
> > - alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~(__GFP_RECLAIM|__GFP_NOFAIL);
> > + if (oo_order(oo) > oo_order(s->min)) {
> > + if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
> > + alloc_gfp |= __GFP_NOMEMALLOC;
> > + alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> > + }
> > + }
> >
>
> Can we come up with another inline function in gfp.h for this as well?
What do you mean? The oo_order thing?
> Well, needing these functions to manipulate flags actually indicates
> that we may need a cleanup of the GFP flags at some point. There is a bunch
> of flags that disable things and some that enable things.
Good luck with that
--
Michal Hocko
SUSE Labs