2017-09-26 09:02:47

by Hui Zhu

Subject: [RFC 0/2] Use HighAtomic against long-term fragmentation

Currently, HighAtomic only handles high-order atomic page allocations.
But I found that also using it for normal unmovable contiguous page
allocations helps against long-term fragmentation.

Using highatomic for normal page allocations is odd, but I got some good
results with our internal tests and mmtests.

Do you think it is worth working on?

The patches were tested with mmtests stress-highalloc, modified to do
GFP_KERNEL order-4 allocations, on 4.14.0-rc1+ in a 2-CPU VBox guest with
1G of memory.
                                  orig          ch
Minor Faults                  45659477    43315623
Major Faults                       319         371
Swap Ins                             0           0
Swap Outs                            0           0
Allocation stalls                    0           0
DMA allocs                       93518       18345
DMA32 allocs                  42395699    40406865
Normal allocs                        0           0
Movable allocs                       0           0
Direct pages scanned              7056       16232
Kswapd pages scanned            946174      961750
Kswapd pages reclaimed          945077      942821
Direct pages reclaimed            7022       16170
Kswapd efficiency                  99%         98%
Kswapd velocity               1576.352    1567.977
Direct efficiency                  99%         99%
Direct velocity                 11.755      26.464
Percentage direct scans             0%          1%
Zone normal velocity          1588.108    1594.441
Zone dma32 velocity              0.000       0.000
Zone dma velocity                0.000       0.000
Page writes by reclaim           0.000       0.000
Page writes file                     0           0
Page writes anon                     0           0
Page reclaim immediate             405       16429
Sector Reads                   2027848     2109324
Sector Writes                  3386260     3299388
Page rescued immediate               0           0
Slabs scanned                   867805      877005
Direct inode steals                337        2072
Kswapd inode steals              33911       41777
Kswapd skipped wait                  0           0
THP fault alloc                     30          84
THP collapse alloc                 188         244
THP splits                           0           0
THP fault fallback                  67          51
THP collapse fail                    6           4
Compaction stalls                  111          49
Compaction success                  81          35
Compaction failures                 30          14
Page migrate success             57962       43921
Page migrate failure                67         183
Compaction pages isolated       117473       88823
Compaction migrate scanned       75548       50403
Compaction free scanned        1454638      672310
Compaction cost                     62          47
NUMA alloc hit                42129493    40018326
NUMA alloc miss                      0           0
NUMA interleave hit                  0           0
NUMA alloc local              42129493    40018326
NUMA base PTE updates                0           0
NUMA huge PMD updates                0           0
NUMA page range updates              0           0
NUMA hint faults                     0           0
NUMA hint local faults               0           0
NUMA hint local percent            100         100
NUMA pages migrated                  0           0
AutoNUMA cost                       0%          0%

Hui Zhu (2):
Try to use HighAtomic if try to alloc umovable page that order is not 0
Change limit of HighAtomic from 1% to 10%

page_alloc.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)


2017-09-26 09:02:53

by Hui Zhu

Subject: [RFC 1/2] Try to use HighAtomic if try to alloc umovable page that order is not 0

The patch adds a new condition so that gfp_to_alloc_flags returns
alloc_flags with ALLOC_HARDER set if the order is not 0 and the
migratetype is MIGRATE_UNMOVABLE.

Unmovable page allocations with a non-zero order will then try to use
HighAtomic.

Signed-off-by: Hui Zhu <[email protected]>
---
mm/page_alloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c841af8..b54e94a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3642,7 +3642,7 @@ static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
 }

 static inline unsigned int
-gfp_to_alloc_flags(gfp_t gfp_mask)
+gfp_to_alloc_flags(gfp_t gfp_mask, int order, int migratetype)
 {
 	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;

@@ -3671,6 +3671,8 @@ static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
 		alloc_flags &= ~ALLOC_CPUSET;
 	} else if (unlikely(rt_task(current)) && !in_interrupt())
 		alloc_flags |= ALLOC_HARDER;
+	else if (order > 0 && migratetype == MIGRATE_UNMOVABLE)
+		alloc_flags |= ALLOC_HARDER;

 #ifdef CONFIG_CMA
 	if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
@@ -3903,7 +3905,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	 * kswapd needs to be woken up, and to avoid the cost of setting up
 	 * alloc_flags precisely. So we do that now.
 	 */
-	alloc_flags = gfp_to_alloc_flags(gfp_mask);
+	alloc_flags = gfp_to_alloc_flags(gfp_mask, order, ac->migratetype);

 	/*
 	 * We need to recalculate the starting point for the zonelist iterator
--
1.9.1

2017-09-26 09:02:54

by Hui Zhu

Subject: [RFC 2/2] Change limit of HighAtomic from 1% to 10%

After "Try to use HighAtomic if try to alloc umovable page that order
is not 0", the results are still not very good because the HighAtomic
limit prevents the kernel from reserving more pageblocks for HighAtomic.

This patch changes max_managed from 1% to 10% of the zone so that
HighAtomic can get more pageblocks.

Signed-off-by: Hui Zhu <[email protected]>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b54e94a..9322458 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2101,7 +2101,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 	 * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
 	 * Check is race-prone but harmless.
 	 */
-	max_managed = (zone->managed_pages / 100) + pageblock_nr_pages;
+	max_managed = (zone->managed_pages / 10) + pageblock_nr_pages;
 	if (zone->nr_reserved_highatomic >= max_managed)
 		return;

--
1.9.1

2017-09-26 09:51:31

by Mel Gorman

Subject: Re: [RFC 0/2] Use HighAtomic against long-term fragmentation

On Tue, Sep 26, 2017 at 04:46:42PM +0800, Hui Zhu wrote:
> Current HighAtomic just to handle the high atomic page alloc.
> But I found that use it handle the normal unmovable continuous page
> alloc will help to against long-term fragmentation.
>

This is not wise. High-order atomic allocations do not always have a
smooth recovery path such as network drivers with large MTUs that have no
choice but to drop the traffic and hope for a retransmit. That's why they
have the highatomic reserve. If the reserve is used for normal unmovable
allocations then allocation requests that could have waited for reclaim
may cause high-order atomic allocations to fail. Changing it may
improve latencies in some limited cases while causing functional failures
in others. If there is a special case where there are a large number of
other high-order allocations then I would suggest increasing min_free_kbytes
instead as a workaround.

--
Mel Gorman
SUSE Labs

2017-09-26 10:04:47

by Hui Zhu

Subject: Re: [RFC 0/2] Use HighAtomic against long-term fragmentation

2017-09-26 17:51 GMT+08:00 Mel Gorman <[email protected]>:
> On Tue, Sep 26, 2017 at 04:46:42PM +0800, Hui Zhu wrote:
>> Current HighAtomic just to handle the high atomic page alloc.
>> But I found that use it handle the normal unmovable continuous page
>> alloc will help to against long-term fragmentation.
>>
>
> This is not wise. High-order atomic allocations do not always have a
> smooth recovery path such as network drivers with large MTUs that have no
> choice but to drop the traffic and hope for a retransmit. That's why they
> have the highatomic reserve. If the reserve is used for normal unmovable
> allocations then allocation requests that could have waited for reclaim
> may cause high-order atomic allocations to fail. Changing it may allow
> improve latencies in some limited cases while causing functional failures
> in others. If there is a special case where there are a large number of
> other high-order allocations then I would suggest increasing min_free_kbytes
> instead as a workaround.

I think letting order-0 unmovable page allocations and higher-order
unmovable page allocations use different migrate types would help against
long-term fragmentation.

Do you think the kernel could add a special migrate type for unmovable
page allocations with an order greater than 0?

Thanks,
Hui

>
> --
> Mel Gorman
> SUSE Labs

2017-09-26 10:43:20

by Mel Gorman

Subject: Re: [RFC 0/2] Use HighAtomic against long-term fragmentation

On Tue, Sep 26, 2017 at 06:04:04PM +0800, Hui Zhu wrote:
> 2017-09-26 17:51 GMT+08:00 Mel Gorman <[email protected]>:
> > On Tue, Sep 26, 2017 at 04:46:42PM +0800, Hui Zhu wrote:
> >> Current HighAtomic just to handle the high atomic page alloc.
> >> But I found that use it handle the normal unmovable continuous page
> >> alloc will help to against long-term fragmentation.
> >>
> >
> > This is not wise. High-order atomic allocations do not always have a
> > smooth recovery path such as network drivers with large MTUs that have no
> > choice but to drop the traffic and hope for a retransmit. That's why they
> > have the highatomic reserve. If the reserve is used for normal unmovable
> > allocations then allocation requests that could have waited for reclaim
> > may cause high-order atomic allocations to fail. Changing it may allow
> > improve latencies in some limited cases while causing functional failures
> > in others. If there is a special case where there are a large number of
> > other high-order allocations then I would suggest increasing min_free_kbytes
> > instead as a workaround.
>
> I think let 0 order unmovable page alloc and other order unmovable pages
> alloc use different migrate types will help against long-term
> fragmentation.
>

That can already happen through the migratetype fallback lists.

> Do you think kernel can add a special migrate type for big than 0 order
> unmovable pages alloc?
>

Technically, yes but the barrier to entry will be high as you'll have to
explain carefully why it is necessary including information on why order-0
pages cannot be used, back it up with data showing what is improved as a
result and justify why potentially forcing normal workloads to reclaim due
to being unable to use the high-order reserve is ok. If it's a limitation
of a specific driver then it'll be asked why that driver does not have a
dedicated pool (which is functionally similar to having a dedicated reserve).

--
Mel Gorman
SUSE Labs

2017-09-26 10:47:56

by Michal Hocko

Subject: Re: [RFC 1/2] Try to use HighAtomic if try to alloc umovable page that order is not 0

On Tue 26-09-17 16:46:43, Hui Zhu wrote:
> The page add a new condition to let gfp_to_alloc_flags return
> alloc_flags with ALLOC_HARDER if the order is not 0 and migratetype is
> MIGRATE_UNMOVABLE.

Apart from what Mel has already said, this changelog is really lacking
the crucial information. It says what but it doesn't explain why we need
this and why it is safe to do. What kind of workload will benefit from
this change, and by how much? What about those users who currently rely
on high atomic reserves and would now need to share them with other
users?

Without knowing all that background and from a quick look this looks
like a very crude hack to me, to be completely honest.

> Then alloc umovable page that order is not 0 will try to use HighAtomic.
>
> Signed-off-by: Hui Zhu <[email protected]>
> ---
> mm/page_alloc.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c841af8..b54e94a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3642,7 +3642,7 @@ static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
>  }
>
>  static inline unsigned int
> -gfp_to_alloc_flags(gfp_t gfp_mask)
> +gfp_to_alloc_flags(gfp_t gfp_mask, int order, int migratetype)
>  {
>  	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
>
> @@ -3671,6 +3671,8 @@ static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
>  		alloc_flags &= ~ALLOC_CPUSET;
>  	} else if (unlikely(rt_task(current)) && !in_interrupt())
>  		alloc_flags |= ALLOC_HARDER;
> +	else if (order > 0 && migratetype == MIGRATE_UNMOVABLE)
> +		alloc_flags |= ALLOC_HARDER;
>
>  #ifdef CONFIG_CMA
>  	if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> @@ -3903,7 +3905,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  	 * kswapd needs to be woken up, and to avoid the cost of setting up
>  	 * alloc_flags precisely. So we do that now.
>  	 */
> -	alloc_flags = gfp_to_alloc_flags(gfp_mask);
> +	alloc_flags = gfp_to_alloc_flags(gfp_mask, order, ac->migratetype);
>
>  	/*
>  	 * We need to recalculate the starting point for the zonelist iterator
> --
> 1.9.1
>

--
Michal Hocko
SUSE Labs