From: Joonsoo Kim <[email protected]>
SLUB optimistically tries a higher-order allocation than it actually
needs. In this case, we don't want to do direct reclaim to make such a
high-order page, since that adds significant latency for the caller.
Instead, we would like to fall back to the lower-order allocation that
is actually needed.

However, we still want the higher-order page to be available next time
for the best performance, and producing it is the job of background
threads such as kswapd and kcompactd. To wake them up, we must not
clear __GFP_KSWAPD_RECLAIM.

Contrary to this intention, the current code clears
__GFP_KSWAPD_RECLAIM, so fix it.

Note that this patch also does some cleanup: __GFP_NOFAIL is cleared
twice, so remove one of them.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/slub.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 0dc7397..e1e442c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1578,8 +1578,12 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
* so we fall-back to the minimum order allocation.
*/
alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
- if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
- alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~(__GFP_RECLAIM|__GFP_NOFAIL);
+ if (oo_order(oo) > oo_order(s->min)) {
+ if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
+ alloc_gfp |= __GFP_NOMEMALLOC;
+ alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
+ }
+ }
page = alloc_slab_page(s, alloc_gfp, node, oo);
if (unlikely(!page)) {
--
2.7.4
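
The effect of the change on the gfp mask can be checked outside the
kernel. The following is a minimal userspace sketch, not kernel code:
the bit values are made up for illustration (the real definitions live
in include/linux/gfp.h), and the function names optimistic_gfp_old(),
optimistic_gfp_new() and check_kswapd_bit() are hypothetical. The
higher_order argument stands in for oo_order(oo) > oo_order(s->min).

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; the real ones are in include/linux/gfp.h. */
#define __GFP_DIRECT_RECLAIM	0x01u
#define __GFP_KSWAPD_RECLAIM	0x02u
#define __GFP_RECLAIM		(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define __GFP_NOMEMALLOC	0x04u
#define __GFP_NOFAIL		0x08u
#define __GFP_NOWARN		0x10u
#define __GFP_NORETRY		0x20u

/* Mask used for the optimistic attempt before this patch. */
static gfp_t optimistic_gfp_old(gfp_t flags, int higher_order)
{
	gfp_t alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;

	if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && higher_order)
		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) &
			    ~(__GFP_RECLAIM | __GFP_NOFAIL);
	return alloc_gfp;
}

/* Mask used for the optimistic attempt after this patch. */
static gfp_t optimistic_gfp_new(gfp_t flags, int higher_order)
{
	gfp_t alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;

	if (higher_order && (alloc_gfp & __GFP_DIRECT_RECLAIM)) {
		alloc_gfp |= __GFP_NOMEMALLOC;
		alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
	}
	return alloc_gfp;
}

/* Usage: a caller that allows both direct and background reclaim. */
static void check_kswapd_bit(void)
{
	gfp_t flags = __GFP_RECLAIM;

	/* Old code: the kswapd wakeup is lost along with direct reclaim. */
	assert(!(optimistic_gfp_old(flags, 1) & __GFP_KSWAPD_RECLAIM));
	/* New code: only direct reclaim is dropped. */
	assert(optimistic_gfp_new(flags, 1) & __GFP_KSWAPD_RECLAIM);
	assert(!(optimistic_gfp_new(flags, 1) & __GFP_DIRECT_RECLAIM));
}
```

With both reclaim bits set by the caller, the old computation clears
both of them, while the new one keeps __GFP_KSWAPD_RECLAIM so the
background threads can still be woken.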
From: Joonsoo Kim <[email protected]>
High-order atomic allocations are difficult to satisfy since we cannot
reclaim anything in this context. So, we reserve a pageblock for this
kind of request.

In slub, we try to allocate a higher-order page than is actually needed
in order to get the best performance. If this optimistic attempt is
made with GFP_ATOMIC, alloc_flags will include ALLOC_HARDER and the
pageblock reserved for high-order atomic allocations will be used.
Moreover, if it succeeds, this request will reserve a
MIGRATE_HIGHATOMIC pageblock to prepare for further requests. Using a
MIGRATE_HIGHATOMIC pageblock this way is bad for fragmentation
management: when the pageblock is unreserved, its migratetype is
unconditionally set to the request's migratetype, without considering
the migratetype of the pages already in use in the pageblock.

This is not what we intend, so fix it by unconditionally setting
__GFP_NOMEMALLOC so that ALLOC_HARDER is not set.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/slub.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index e1e442c..fd8dd89 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1579,10 +1579,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
*/
alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
if (oo_order(oo) > oo_order(s->min)) {
- if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
- alloc_gfp |= __GFP_NOMEMALLOC;
- alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
- }
+ alloc_gfp |= __GFP_NOMEMALLOC;
+ alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
}
page = alloc_slab_page(s, alloc_gfp, node, oo);
--
2.7.4
On 08/28/2017 03:11 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> slub uses higher order allocation than it actually needs. In this case,
> we don't want to do direct reclaim to make such a high order page since
> it causes a big latency to the user. Instead, we would like to fallback
> lower order allocation that it actually needs.
>
> However, we also want to get this higher order page in the next time
> in order to get the best performance and it would be a role of
> the background thread like as kswapd and kcompactd. To wake up them,
> we should not clear __GFP_KSWAPD_RECLAIM.
>
> Unlike this intention, current code clears __GFP_KSWAPD_RECLAIM so fix it.
>
> Note that this patch does some clean up, too.
> __GFP_NOFAIL is cleared twice so remove one.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
Hm, so this seems to revert Mel's 444eb2a449ef ("mm: thp: set THP defrag
by default to madvise and add a stall-free defrag option") wrt the slub
allocate_slab() part. AFAICS the intention in Mel's patch was to
remove a special case in __alloc_pages_slowpath() where including
__GFP_THISNODE and lacking __GFP_DIRECT_RECLAIM effectively also meant
lacking __GFP_KSWAPD_RECLAIM. The commit log claims that slab/slub might
change behavior, so he moved the removal of __GFP_KSWAPD_RECLAIM to them.

But AFAICS, only slab uses __GFP_THISNODE, while slub doesn't. So your
patch would indeed revert an unintentional change of Mel's commit. Is
that right, or am I missing something?
> ---
> mm/slub.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 0dc7397..e1e442c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1578,8 +1578,12 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> * so we fall-back to the minimum order allocation.
> */
> alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
> - if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
> - alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~(__GFP_RECLAIM|__GFP_NOFAIL);
> + if (oo_order(oo) > oo_order(s->min)) {
> + if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
> + alloc_gfp |= __GFP_NOMEMALLOC;
> + alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> + }
> + }
>
> page = alloc_slab_page(s, alloc_gfp, node, oo);
> if (unlikely(!page)) {
>
On 08/28/2017 03:11 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> High-order atomic allocation is difficult to succeed since we cannot
> reclaim anything in this context. So, we reserves the pageblock for
> this kind of request.
>
> In slub, we try to allocate higher-order page more than it actually
> needs in order to get the best performance. If this optimistic try is
> used with GFP_ATOMIC, alloc_flags will be set as ALLOC_HARDER and
> the pageblock reserved for high-order atomic allocation would be used.
> Moreover, this request would reserve the MIGRATE_HIGHATOMIC pageblock
> ,if succeed, to prepare further request. It would not be good to use
> MIGRATE_HIGHATOMIC pageblock in terms of fragmentation management
> since it unconditionally set a migratetype to request's migratetype
> when unreserving the pageblock without considering the migratetype of
> used pages in the pageblock.
>
> This is not what we don't intend so fix it by unconditionally setting
> __GFP_NOMEMALLOC in order to not set ALLOC_HARDER.
I wonder if it would be more robust to strip GFP_ATOMIC from alloc_gfp.
E.g. __GFP_NOMEMALLOC does seem to prevent ALLOC_HARDER, but not
ALLOC_HIGH. Or maybe we should adjust __GFP_NOMEMALLOC implementation
and document it more thoroughly? CC Michal Hocko
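
To make the ALLOC_HARDER vs. ALLOC_HIGH point concrete, here is a
simplified userspace model of the two checks being discussed. It is a
sketch under assumptions, not the kernel's actual gfp-to-alloc-flags
conversion: the bit values are made up, model_alloc_flags() and
check_nomemalloc_vs_strip() are hypothetical names, and everything
else the real conversion does (watermark selection, cpuset handling,
rt tasks) is left out.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only, not the kernel's. */
#define __GFP_HIGH		0x01u
#define __GFP_ATOMIC		0x02u
#define __GFP_NOMEMALLOC	0x04u

#define ALLOC_HARDER		0x10u
#define ALLOC_HIGH		0x20u

/* Simplified model: only the two checks discussed here are reproduced. */
static unsigned int model_alloc_flags(gfp_t gfp_mask)
{
	unsigned int alloc_flags = 0;

	/* __GFP_HIGH maps to ALLOC_HIGH regardless of __GFP_NOMEMALLOC. */
	if (gfp_mask & __GFP_HIGH)
		alloc_flags |= ALLOC_HIGH;

	/* __GFP_ATOMIC gives ALLOC_HARDER unless __GFP_NOMEMALLOC is set. */
	if ((gfp_mask & __GFP_ATOMIC) && !(gfp_mask & __GFP_NOMEMALLOC))
		alloc_flags |= ALLOC_HARDER;

	return alloc_flags;
}

/* Usage: compare the two ways of weakening an atomic-style request. */
static void check_nomemalloc_vs_strip(void)
{
	gfp_t atomic = __GFP_HIGH | __GFP_ATOMIC;

	/* __GFP_NOMEMALLOC suppresses ALLOC_HARDER but leaves ALLOC_HIGH. */
	assert(model_alloc_flags(atomic | __GFP_NOMEMALLOC) == ALLOC_HIGH);
	/* Stripping both bits leaves neither flag set. */
	assert(model_alloc_flags(atomic & ~(__GFP_HIGH | __GFP_ATOMIC)) == 0);
}
```

In this model, adding __GFP_NOMEMALLOC alone still leaves the request
with ALLOC_HIGH, whereas stripping the atomic bits removes both flags.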
Also, were these 2 patches done via code inspection, or did you notice
suboptimal behavior which got fixed? Thanks.
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> mm/slub.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index e1e442c..fd8dd89 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1579,10 +1579,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> */
> alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
> if (oo_order(oo) > oo_order(s->min)) {
> - if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
> - alloc_gfp |= __GFP_NOMEMALLOC;
> - alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> - }
> + alloc_gfp |= __GFP_NOMEMALLOC;
> + alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> }
>
> page = alloc_slab_page(s, alloc_gfp, node, oo);
>
On Mon 28-08-17 13:29:29, Vlastimil Babka wrote:
> On 08/28/2017 03:11 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > High-order atomic allocation is difficult to succeed since we cannot
> > reclaim anything in this context. So, we reserves the pageblock for
> > this kind of request.
> >
> > In slub, we try to allocate higher-order page more than it actually
> > needs in order to get the best performance. If this optimistic try is
> > used with GFP_ATOMIC, alloc_flags will be set as ALLOC_HARDER and
> > the pageblock reserved for high-order atomic allocation would be used.
> > Moreover, this request would reserve the MIGRATE_HIGHATOMIC pageblock
> > ,if succeed, to prepare further request. It would not be good to use
> > MIGRATE_HIGHATOMIC pageblock in terms of fragmentation management
> > since it unconditionally set a migratetype to request's migratetype
> > when unreserving the pageblock without considering the migratetype of
> > used pages in the pageblock.
> >
> > This is not what we don't intend so fix it by unconditionally setting
> > __GFP_NOMEMALLOC in order to not set ALLOC_HARDER.
>
> I wonder if it would be more robust to strip GFP_ATOMIC from alloc_gfp.
> E.g. __GFP_NOMEMALLOC does seem to prevent ALLOC_HARDER, but not
> ALLOC_HIGH. Or maybe we should adjust __GFP_NOMEMALLOC implementation
> and document it more thoroughly? CC Michal Hocko
Yeah, __GFP_NOMEMALLOC is rather inconsistent. It was added to
override __GFP_MEMALLOC and PF_MEMALLOC, AFAIK. In this particular
case I would agree that dropping __GFP_HIGH and __GFP_ATOMIC would
be more precise. I am not sure we want to touch the existing semantics
of __GFP_NOMEMALLOC, though. That would require auditing all the
existing users (something tells me that quite a few of those will be
incorrect...)
> Also, were these 2 patches done via code inspection or you noticed
> suboptimal behavior which got fixed? Thanks.
The patch description is not very clear to me either, but I guess that
Joonsoo sees too many larger-order pages backing slab objects when the
system is not under heavy memory pressure, and that increases internal
fragmentation?
> > Signed-off-by: Joonsoo Kim <[email protected]>
> > ---
> > mm/slub.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index e1e442c..fd8dd89 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1579,10 +1579,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > */
> > alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
> > if (oo_order(oo) > oo_order(s->min)) {
> > - if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
> > - alloc_gfp |= __GFP_NOMEMALLOC;
> > - alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> > - }
> > + alloc_gfp |= __GFP_NOMEMALLOC;
> > + alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> > }
> >
> > page = alloc_slab_page(s, alloc_gfp, node, oo);
> >
--
Michal Hocko
SUSE Labs
On Mon, Aug 28, 2017 at 12:04:41PM +0200, Vlastimil Babka wrote:
> On 08/28/2017 03:11 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > slub uses higher order allocation than it actually needs. In this case,
> > we don't want to do direct reclaim to make such a high order page since
> > it causes a big latency to the user. Instead, we would like to fallback
> > lower order allocation that it actually needs.
> >
> > However, we also want to get this higher order page in the next time
> > in order to get the best performance and it would be a role of
> > the background thread like as kswapd and kcompactd. To wake up them,
> > we should not clear __GFP_KSWAPD_RECLAIM.
> >
> > Unlike this intention, current code clears __GFP_KSWAPD_RECLAIM so fix it.
> >
> > Note that this patch does some clean up, too.
> > __GFP_NOFAIL is cleared twice so remove one.
> >
> > Signed-off-by: Joonsoo Kim <[email protected]>
>
> Hm, so this seems to revert Mel's 444eb2a449ef ("mm: thp: set THP defrag
> by default to madvise and add a stall-free defrag option") wrt the slub
> allocate_slab() part. AFAICS the intention in Mel's patch was that he
> removed a special case in __alloc_page_slowpath() where including
> __GFP_THISNODE and lacking ~__GFP_DIRECT_RECLAIM effectively means also
> lacking __GFP_KSWAPD_RECLAIM. The commit log claims that slab/slub might
> change behavior so he moved the removal of __GFP_KSWAPD_RECLAIM to them.
>
> But AFAICS, only slab uses __GFP_THISNODE, while slub doesn't. So your
> patch would indeed revert an unintentional change of Mel's commit. Is it
> right or do I miss something?
I didn't look at that patch. What I tried to do here is just restore
the original intention of this code. I now realize that Mel did it for
a specific purpose. Thanks for pointing it out.

Anyway, your analysis looks correct: this change doesn't hurt Mel's
intention and restores the original behaviour of the code. I will add
your analysis to the commit description and resubmit it. Is that okay
with you?

Thanks.
On Mon, Aug 28, 2017 at 03:08:29PM +0200, Michal Hocko wrote:
> On Mon 28-08-17 13:29:29, Vlastimil Babka wrote:
> > On 08/28/2017 03:11 AM, [email protected] wrote:
> > > From: Joonsoo Kim <[email protected]>
> > >
> > > High-order atomic allocation is difficult to succeed since we cannot
> > > reclaim anything in this context. So, we reserves the pageblock for
> > > this kind of request.
> > >
> > > In slub, we try to allocate higher-order page more than it actually
> > > needs in order to get the best performance. If this optimistic try is
> > > used with GFP_ATOMIC, alloc_flags will be set as ALLOC_HARDER and
> > > the pageblock reserved for high-order atomic allocation would be used.
> > > Moreover, this request would reserve the MIGRATE_HIGHATOMIC pageblock
> > > ,if succeed, to prepare further request. It would not be good to use
> > > MIGRATE_HIGHATOMIC pageblock in terms of fragmentation management
> > > since it unconditionally set a migratetype to request's migratetype
> > > when unreserving the pageblock without considering the migratetype of
> > > used pages in the pageblock.
> > >
> > > This is not what we don't intend so fix it by unconditionally setting
> > > __GFP_NOMEMALLOC in order to not set ALLOC_HARDER.
> >
> > I wonder if it would be more robust to strip GFP_ATOMIC from alloc_gfp.
> > E.g. __GFP_NOMEMALLOC does seem to prevent ALLOC_HARDER, but not
> > ALLOC_HIGH. Or maybe we should adjust __GFP_NOMEMALLOC implementation
> > and document it more thoroughly? CC Michal Hocko
>
> Yeah, __GFP_NOMEMALLOC is rather inconsistent. It has been added to
> override __GFP_MEMALLOC resp. PF_MEMALLOC AFAIK. In this particular
> case I would agree that dropping __GFP_HIGH and __GFP_ATOMIC would
> be more precise. I am not sure we want to touch the existing semantic of
> __GFP_NOMEMALLOC though. This would require auditing all the existing
> users (something tells me that quite some of those will be incorrect...)
Hmm... now I realize that there is another reason why we need to use
__GFP_NOMEMALLOC. Even if this allocation comes from a PF_MEMALLOC
user, this optimistic attempt should not use the reserved memory below
the watermark. That is, it should not use ALLOC_NO_WATERMARKS. That can
only be accomplished by using __GFP_NOMEMALLOC.
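
As a rough illustration of why stripping __GFP_HIGH and __GFP_ATOMIC
alone is not enough here, consider the following simplified userspace
model. It is a sketch under assumptions rather than the kernel's real
logic: the bit values are made up, model_no_watermarks() and
check_pf_memalloc_case() are hypothetical names, and the context
checks (interrupt context, OOM victims) are reduced to one boolean
that says whether the current task has PF_MEMALLOC set.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only, not the kernel's. */
#define __GFP_HIGH		0x01u
#define __GFP_ATOMIC		0x02u
#define __GFP_NOMEMALLOC	0x04u
#define __GFP_MEMALLOC		0x08u

/*
 * Simplified model of "may this request ignore the watermarks?".
 * __GFP_NOMEMALLOC wins over both __GFP_MEMALLOC and PF_MEMALLOC.
 */
static bool model_no_watermarks(gfp_t gfp_mask, bool task_is_pf_memalloc)
{
	if (gfp_mask & __GFP_NOMEMALLOC)
		return false;
	if (gfp_mask & __GFP_MEMALLOC)
		return true;
	return task_is_pf_memalloc;
}

/* Usage: an optimistic attempt issued by a PF_MEMALLOC task. */
static void check_pf_memalloc_case(void)
{
	gfp_t atomic = __GFP_HIGH | __GFP_ATOMIC;
	gfp_t stripped = atomic & ~(__GFP_HIGH | __GFP_ATOMIC);

	/* Stripping the atomic bits does not stop PF_MEMALLOC. */
	assert(model_no_watermarks(stripped, true));
	/* Only __GFP_NOMEMALLOC keeps the attempt above the watermark. */
	assert(!model_no_watermarks(stripped | __GFP_NOMEMALLOC, true));
}
```

In this model the task flag is consulted only when the mask carries
neither __GFP_NOMEMALLOC nor __GFP_MEMALLOC, which is why the mask has
to carry __GFP_NOMEMALLOC for the optimistic attempt.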
>
> > Also, were these 2 patches done via code inspection or you noticed
> > suboptimal behavior which got fixed? Thanks.
>
> The patch description is not very clear to me either but I guess that
> Joonsoo sees to many larger order pages to back slab objects when the
> system is not under heavy memory pressure and that increases internal
> fragmentation?
Your guess is right. I found this problem when I checked the
fragmentation ratio in a benchmark some months ago. I don't
remember the detailed system state in that benchmark.
Thanks.
On 08/29/2017 02:22 AM, Joonsoo Kim wrote:
> On Mon, Aug 28, 2017 at 12:04:41PM +0200, Vlastimil Babka wrote:
>>
>> Hm, so this seems to revert Mel's 444eb2a449ef ("mm: thp: set THP defrag
>> by default to madvise and add a stall-free defrag option") wrt the slub
>> allocate_slab() part. AFAICS the intention in Mel's patch was that he
>> removed a special case in __alloc_page_slowpath() where including
>> __GFP_THISNODE and lacking ~__GFP_DIRECT_RECLAIM effectively means also
>> lacking __GFP_KSWAPD_RECLAIM. The commit log claims that slab/slub might
>> change behavior so he moved the removal of __GFP_KSWAPD_RECLAIM to them.
>>
>> But AFAICS, only slab uses __GFP_THISNODE, while slub doesn't. So your
>> patch would indeed revert an unintentional change of Mel's commit. Is it
>> right or do I miss something?
>
> I didn't look at that patch. What I tried here is just restoring first
> intention of this code. I now realize that Mel did it for specific
> purpose. Thanks for notifying it.
>
> Anyway, your analysis looks correct and this change doesn't hurt Mel's
> intention and restores original behaviour of the code. I will add your
> analysis on the commit description and resubmit it. Is it okay to you?
Yeah, no problem.
> Thanks.
>
On Tue, Aug 29, 2017 at 09:33:44AM +0900, Joonsoo Kim wrote:
> On Mon, Aug 28, 2017 at 03:08:29PM +0200, Michal Hocko wrote:
> > On Mon 28-08-17 13:29:29, Vlastimil Babka wrote:
> > > On 08/28/2017 03:11 AM, [email protected] wrote:
> > > > From: Joonsoo Kim <[email protected]>
> > > >
> > > > High-order atomic allocation is difficult to succeed since we cannot
> > > > reclaim anything in this context. So, we reserves the pageblock for
> > > > this kind of request.
> > > >
> > > > In slub, we try to allocate higher-order page more than it actually
> > > > needs in order to get the best performance. If this optimistic try is
> > > > used with GFP_ATOMIC, alloc_flags will be set as ALLOC_HARDER and
> > > > the pageblock reserved for high-order atomic allocation would be used.
> > > > Moreover, this request would reserve the MIGRATE_HIGHATOMIC pageblock
> > > > ,if succeed, to prepare further request. It would not be good to use
> > > > MIGRATE_HIGHATOMIC pageblock in terms of fragmentation management
> > > > since it unconditionally set a migratetype to request's migratetype
> > > > when unreserving the pageblock without considering the migratetype of
> > > > used pages in the pageblock.
> > > >
> > > > This is not what we don't intend so fix it by unconditionally setting
> > > > __GFP_NOMEMALLOC in order to not set ALLOC_HARDER.
> > >
> > > I wonder if it would be more robust to strip GFP_ATOMIC from alloc_gfp.
> > > E.g. __GFP_NOMEMALLOC does seem to prevent ALLOC_HARDER, but not
> > > ALLOC_HIGH. Or maybe we should adjust __GFP_NOMEMALLOC implementation
> > > and document it more thoroughly? CC Michal Hocko
> >
> > Yeah, __GFP_NOMEMALLOC is rather inconsistent. It has been added to
> > override __GFP_MEMALLOC resp. PF_MEMALLOC AFAIK. In this particular
> > case I would agree that dropping __GFP_HIGH and __GFP_ATOMIC would
> > be more precise. I am not sure we want to touch the existing semantic of
> > __GFP_NOMEMALLOC though. This would require auditing all the existing
> > users (something tells me that quite some of those will be incorrect...)
>
> Hmm... now I realize that there is another reason that we need to use
> __GFP_NOMEMALLOC. Even if this allocation comes from PF_MEMALLOC user,
> this optimistic try should not use the reserved memory below the
> watermark. That is, it should not use ALLOC_NO_WATERMARKS. It can
> only be accomplished by using __GFP_NOMEMALLOC.
Michal, Vlastimil, any thoughts?
Thanks.
On Thu 31-08-17 10:42:41, Joonsoo Kim wrote:
> On Tue, Aug 29, 2017 at 09:33:44AM +0900, Joonsoo Kim wrote:
> > On Mon, Aug 28, 2017 at 03:08:29PM +0200, Michal Hocko wrote:
> > > On Mon 28-08-17 13:29:29, Vlastimil Babka wrote:
> > > > On 08/28/2017 03:11 AM, [email protected] wrote:
> > > > > From: Joonsoo Kim <[email protected]>
> > > > >
> > > > > High-order atomic allocation is difficult to succeed since we cannot
> > > > > reclaim anything in this context. So, we reserves the pageblock for
> > > > > this kind of request.
> > > > >
> > > > > In slub, we try to allocate higher-order page more than it actually
> > > > > needs in order to get the best performance. If this optimistic try is
> > > > > used with GFP_ATOMIC, alloc_flags will be set as ALLOC_HARDER and
> > > > > the pageblock reserved for high-order atomic allocation would be used.
> > > > > Moreover, this request would reserve the MIGRATE_HIGHATOMIC pageblock
> > > > > ,if succeed, to prepare further request. It would not be good to use
> > > > > MIGRATE_HIGHATOMIC pageblock in terms of fragmentation management
> > > > > since it unconditionally set a migratetype to request's migratetype
> > > > > when unreserving the pageblock without considering the migratetype of
> > > > > used pages in the pageblock.
> > > > >
> > > > > This is not what we don't intend so fix it by unconditionally setting
> > > > > __GFP_NOMEMALLOC in order to not set ALLOC_HARDER.
> > > >
> > > > I wonder if it would be more robust to strip GFP_ATOMIC from alloc_gfp.
> > > > E.g. __GFP_NOMEMALLOC does seem to prevent ALLOC_HARDER, but not
> > > > ALLOC_HIGH. Or maybe we should adjust __GFP_NOMEMALLOC implementation
> > > > and document it more thoroughly? CC Michal Hocko
> > >
> > > Yeah, __GFP_NOMEMALLOC is rather inconsistent. It has been added to
> > > override __GFP_MEMALLOC resp. PF_MEMALLOC AFAIK. In this particular
> > > case I would agree that dropping __GFP_HIGH and __GFP_ATOMIC would
> > > be more precise. I am not sure we want to touch the existing semantic of
> > > __GFP_NOMEMALLOC though. This would require auditing all the existing
> > > users (something tells me that quite some of those will be incorrect...)
> >
> > Hmm... now I realize that there is another reason that we need to use
> > __GFP_NOMEMALLOC. Even if this allocation comes from PF_MEMALLOC user,
> > this optimistic try should not use the reserved memory below the
> > watermark. That is, it should not use ALLOC_NO_WATERMARKS. It can
> > only be accomplished by using __GFP_NOMEMALLOC.
>
> Michal, Vlastimil, Any thought?
Hmm, I would go with a helper like the one below and use it in slub:

gfp_t gfp_drop_reserves(gfp_t mask)
{
	mask &= ~(__GFP_HIGH|__GFP_ATOMIC);
	mask |= __GFP_NOMEMALLOC;
	return mask;
}
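
A possible way to wire such a helper into the optimistic attempt is
sketched below as a standalone userspace program. This is only an
illustration: the bit values are made up, and optimistic_slab_gfp()
and check_drop_reserves() are hypothetical names standing in for the
higher-order branch of allocate_slab(), not existing kernel functions.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; the real ones are in include/linux/gfp.h. */
#define __GFP_HIGH		0x01u
#define __GFP_ATOMIC		0x02u
#define __GFP_NOMEMALLOC	0x04u
#define __GFP_DIRECT_RECLAIM	0x08u
#define __GFP_KSWAPD_RECLAIM	0x10u

/* Drop every bit that grants access to memory reserves. */
static gfp_t gfp_drop_reserves(gfp_t mask)
{
	mask &= ~(__GFP_HIGH | __GFP_ATOMIC);
	mask |= __GFP_NOMEMALLOC;
	return mask;
}

/*
 * Hypothetical stand-in for the higher-order branch of allocate_slab():
 * no reserves, no direct reclaim, but background reclaim stays enabled.
 */
static gfp_t optimistic_slab_gfp(gfp_t alloc_gfp)
{
	alloc_gfp = gfp_drop_reserves(alloc_gfp);
	alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
	return alloc_gfp;
}

/* Usage: an atomic-style request that may wake kswapd. */
static void check_drop_reserves(void)
{
	gfp_t atomic = __GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM;
	gfp_t gfp = optimistic_slab_gfp(atomic);

	assert(!(gfp & (__GFP_HIGH | __GFP_ATOMIC)));
	assert(gfp & __GFP_NOMEMALLOC);
	assert(gfp & __GFP_KSWAPD_RECLAIM);
}
```

Keeping the reserve handling in one helper means the caller only has
to add the reclaim policy on top of it.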
--
Michal Hocko
SUSE Labs