When studying page stealing, I noticed some weird-looking decisions in
try_to_steal_freepages(). The first one I assume is a bug (Patch 1); the
following two patches were driven by evaluation.
Testing was done with stress-highalloc of mmtests, using the
mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how often
page stealing occurs for individual migratetypes, and what migratetypes are
used for fallbacks. Arguably, the worst case of page stealing is when an
UNMOVABLE allocation steals from a MOVABLE pageblock. A RECLAIMABLE allocation
stealing from a MOVABLE pageblock is also not ideal, so the goal is to minimize
these two cases.
For some reason, the first patch increased the number of page stealing events
for MOVABLE allocations, and I am still not sure why. In theory these events
are not as bad, and the third patch does more than just correct this.
Here are the results; the baseline (column 26) is 3.17-rc7 with the compaction
patches from -mm. First, the results with the benchmark set to mimic
non-THP-like whole-pageblock allocations. Discussion below:
stress-highalloc
3.17-rc7 3.17-rc7 3.17-rc7 3.17-rc7
26-nothp 27-nothp 28-nothp 29-nothp
Success 1 Min 20.00 ( 0.00%) 31.00 (-55.00%) 33.00 (-65.00%) 23.00 (-15.00%)
Success 1 Mean 32.70 ( 0.00%) 39.00 (-19.27%) 39.10 (-19.57%) 35.80 ( -9.48%)
Success 1 Max 42.00 ( 0.00%) 44.00 ( -4.76%) 46.00 ( -9.52%) 45.00 ( -7.14%)
Success 2 Min 20.00 ( 0.00%) 33.00 (-65.00%) 36.00 (-80.00%) 24.00 (-20.00%)
Success 2 Mean 33.90 ( 0.00%) 41.30 (-21.83%) 41.70 (-23.01%) 36.80 ( -8.55%)
Success 2 Max 44.00 ( 0.00%) 49.00 (-11.36%) 49.00 (-11.36%) 45.00 ( -2.27%)
Success 3 Min 84.00 ( 0.00%) 86.00 ( -2.38%) 86.00 ( -2.38%) 85.00 ( -1.19%)
Success 3 Mean 86.40 ( 0.00%) 87.20 ( -0.93%) 87.20 ( -0.93%) 86.80 ( -0.46%)
Success 3 Max 88.00 ( 0.00%) 89.00 ( -1.14%) 89.00 ( -1.14%) 88.00 ( 0.00%)
3.17-rc7 3.17-rc7 3.17-rc7 3.17-rc7
26-nothp 27-nothp 28-nothp 29-nothp
User 6818.93 6775.23 6759.60 6783.81
System 1055.97 1056.31 1055.37 1057.36
Elapsed 2150.18 2211.63 2196.91 2201.93
3.17-rc7 3.17-rc7 3.17-rc7 3.17-rc7
26-nothp 27-nothp 28-nothp 29-nothp
Minor Faults 198162003 197936707 197750617 198414323
Major Faults 462 511 533 490
Swap Ins 29 31 42 21
Swap Outs 2142 2225 2616 2276
Allocation stalls 6030 7716 6856 6175
DMA allocs 112 102 128 73
DMA32 allocs 124578777 124503016 124372538 124840569
Normal allocs 59157970 59165895 59160083 59154005
Movable allocs 0 0 0 0
Direct pages scanned 353190 424846 395619 359421
Kswapd pages scanned 2201775 2221571 2223699 2254336
Kswapd pages reclaimed 2196630 2216042 2218175 2242737
Direct pages reclaimed 352402 423989 394801 358321
Kswapd efficiency 99% 99% 99% 99%
Kswapd velocity 1011.483 1019.369 1016.418 1010.895
Direct efficiency 99% 99% 99% 99%
Direct velocity 162.253 194.941 180.832 161.173
Percentage direct scans 13% 16% 15% 13%
Zone normal velocity 381.505 402.030 393.093 376.382
Zone dma32 velocity 792.218 812.269 804.143 795.679
Zone dma velocity 0.012 0.011 0.014 0.007
Page writes by reclaim 2316.900 2366.600 2791.300 2492.700
Page writes file 174 141 174 216
Page writes anon 2142 2225 2616 2276
Page reclaim immediate 1381 1586 1314 8126
Sector Reads 4703932 4775640 4750501 4747452
Sector Writes 12758092 12720075 12695676 12790100
Page rescued immediate 0 0 0 0
Slabs scanned 1750170 1871811 1847197 1822608
Direct inode steals 14468 14838 14872 14241
Kswapd inode steals 38766 40510 40353 40442
Kswapd skipped wait 0 0 0 0
THP fault alloc 262 221 239 239
THP collapse alloc 506 494 535 491
THP splits 14 12 14 14
THP fault fallback 7 33 10 39
THP collapse fail 17 18 16 18
Compaction stalls 2746 3359 3185 2981
Compaction success 1025 1188 1153 1097
Compaction failures 1721 2170 2032 1884
Page migrate success 3889927 4512417 4340044 4128768
Page migrate failure 14551 17660 17096 14686
Compaction pages isolated 8058458 9337143 8974871 8554984
Compaction migrate scanned 156216179 187390755 178241572 163503245
Compaction free scanned 317797413 388387641 361523988 341521402
Compaction cost 5284 6173 5923 5592
NUMA alloc hit 181314344 181142494 180975258 181531369
NUMA alloc miss 0 0 0 0
NUMA interleave hit 0 0 0 0
NUMA alloc local 181314344 181142494 180975258 181531369
NUMA base PTE updates 0 0 0 0
NUMA huge PMD updates 0 0 0 0
NUMA page range updates 0 0 0 0
NUMA hint faults 0 0 0 0
NUMA hint local faults 0 0 0 0
NUMA hint local percent 100 100 100 100
NUMA pages migrated 0 0 0 0
AutoNUMA cost 0% 0% 0% 0%
3.17-rc7 3.17-rc7 3.17-rc7 3.17-rc7
26-nothp 27-nothp 28-nothp 29-nothp
Page alloc extfrag event 7223461 10651213 10274135 3785074
Extfrag fragmenting 7221775 10648719 10272431 3782605
Extfrag fragmenting for unmovable 20264 16784 2668 2768
Extfrag fragmenting unmovable stealing from movable 10814 7531 2231 2091
Extfrag fragmenting for reclaimable 1937 1114 1138 1268
Extfrag fragmenting reclaimable stealing from movable 1731 882 914 973
Extfrag fragmenting for movable 7199574 10630821 10268625 3778569
As can be seen, the success rates are not much affected, or perhaps the first
patch improves them slightly. But the reduction of extfrag events is quite
prominent, especially for unmovable allocations polluting (potentially
permanently) movable pageblocks.
For completeness, the results with the benchmark set to mimic THP allocations
are below. It's not so different, so no extra discussion is needed.
stress-highalloc
3.17-rc7 3.17-rc7 3.17-rc7 3.17-rc7
26-thp 27-thp 28-thp 29-thp
Success 1 Min 20.00 ( 0.00%) 27.00 (-35.00%) 26.00 (-30.00%) 22.00 (-10.00%)
Success 1 Mean 28.90 ( 0.00%) 33.00 (-14.19%) 31.90 (-10.38%) 29.60 ( -2.42%)
Success 1 Max 36.00 ( 0.00%) 40.00 (-11.11%) 39.00 ( -8.33%) 35.00 ( 2.78%)
Success 2 Min 20.00 ( 0.00%) 28.00 (-40.00%) 30.00 (-50.00%) 23.00 (-15.00%)
Success 2 Mean 31.20 ( 0.00%) 36.70 (-17.63%) 35.20 (-12.82%) 32.50 ( -4.17%)
Success 2 Max 39.00 ( 0.00%) 43.00 (-10.26%) 42.00 ( -7.69%) 43.00 (-10.26%)
Success 3 Min 85.00 ( 0.00%) 86.00 ( -1.18%) 87.00 ( -2.35%) 86.00 ( -1.18%)
Success 3 Mean 86.90 ( 0.00%) 87.30 ( -0.46%) 87.70 ( -0.92%) 87.20 ( -0.35%)
Success 3 Max 88.00 ( 0.00%) 88.00 ( 0.00%) 90.00 ( -2.27%) 89.00 ( -1.14%)
3.17-rc7 3.17-rc7 3.17-rc7 3.17-rc7
26-thp 27-thp 28-thp 29-thp
User 6819.54 6791.98 6817.78 6780.39
System 1060.01 1061.72 1059.55 1060.22
Elapsed 2143.61 2169.23 2151.94 2164.37
3.17-rc7 3.17-rc7 3.17-rc7 3.17-rc7
26-thp 27-thp 28-thp 29-thp
Minor Faults 197991650 197731531 197676212 198108344
Major Faults 467 517 485 463
Swap Ins 55 42 55 37
Swap Outs 2743 2628 2848 2423
Allocation stalls 5674 6859 5830 5430
DMA allocs 21 19 18 20
DMA32 allocs 124822788 124717762 124599426 124998427
Normal allocs 58689613 58661322 58715465 58613337
Movable allocs 0 0 0 0
Direct pages scanned 425873 497589 437964 440959
Kswapd pages scanned 2106472 2092938 2123314 2137886
Kswapd pages reclaimed 2100750 2087313 2117523 2124031
Direct pages reclaimed 424875 496616 437006 439572
Kswapd efficiency 99% 99% 99% 99%
Kswapd velocity 986.439 999.617 1016.928 984.321
Direct efficiency 99% 99% 99% 99%
Direct velocity 199.432 237.656 209.756 203.025
Percentage direct scans 16% 19% 17% 17%
Zone normal velocity 396.728 411.978 412.730 391.261
Zone dma32 velocity 789.143 825.294 813.954 796.086
Zone dma velocity 0.000 0.000 0.000 0.000
Page writes by reclaim 2963.000 2735.600 2981.900 2640.500
Page writes file 219 107 133 217
Page writes anon 2743 2628 2848 2423
Page reclaim immediate 1504 1609 1622 9672
Sector Reads 4638068 4700778 4687436 4690935
Sector Writes 12744701 12689336 12685726 12742547
Page rescued immediate 0 0 0 0
Slabs scanned 1612929 1704964 1659159 1670590
Direct inode steals 15564 17989 16063 17179
Kswapd inode steals 31322 31013 31563 31266
Kswapd skipped wait 0 0 0 0
THP fault alloc 250 227 246 223
THP collapse alloc 517 515 504 487
THP splits 15 13 14 11
THP fault fallback 10 24 5 38
THP collapse fail 17 18 16 18
Compaction stalls 2482 2794 2687 2608
Compaction success 894 1016 995 972
Compaction failures 1588 1778 1692 1636
Page migrate success 2306759 2283240 2298373 2228802
Page migrate failure 10645 12648 10681 10023
Compaction pages isolated 4906442 4878707 4907827 4768580
Compaction migrate scanned 40396525 46362656 44372629 42315303
Compaction free scanned 134008519 146858466 131814222 132434783
Compaction cost 2770 2787 2789 2700
NUMA alloc hit 181150856 180941682 180895401 181254771
NUMA alloc miss 0 0 0 0
NUMA interleave hit 0 0 0 0
NUMA alloc local 181150856 180941682 180895401 181254771
NUMA base PTE updates 0 0 0 0
NUMA huge PMD updates 0 0 0 0
NUMA page range updates 0 0 0 0
NUMA hint faults 0 0 0 0
NUMA hint local faults 0 0 0 0
NUMA hint local percent 100 100 100 100
NUMA pages migrated 0 0 0 0
AutoNUMA cost 0% 0% 0% 0%
3.17-rc7 3.17-rc7 3.17-rc7 3.17-rc7
26-thp 27-thp 28-thp 29-thp
Page alloc extfrag event 4270316 5661910 5018754 2062787
Extfrag fragmenting 4268643 5660158 5016977 2061077
Extfrag fragmenting for unmovable 21632 17627 1985 1984
Extfrag fragmenting unmovable placed with movable 12428 9011 1663 1506
Extfrag fragmenting for reclaimable 1682 1106 1290 1401
Extfrag fragmenting reclaimable placed with movable 1480 917 1072 1132
Extfrag fragmenting for movable 4245329 5641425 5013702 2057692
Vlastimil Babka (3):
mm: when stealing freepages, also take pages created by splitting
buddy page
mm: more aggressive page stealing for UNMOVABLE allocations
mm: always steal split buddies in fallback allocations
mm/page_alloc.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
--
2.1.2
When __rmqueue_fallback() is called to allocate a page of order X, it will
find a page of order Y >= X of a fallback migratetype, which is different from
the desired migratetype. With the help of try_to_steal_freepages(), it may
change the migratetype (to the desired one) also of:
1) all currently free pages in the pageblock containing the fallback page
2) the fallback pageblock itself
3) buddy pages created by splitting the fallback page (when Y > X)
These decisions take the order Y into account, as well as the desired
migratetype, with the goal of preventing multiple fallback allocations that
could e.g. distribute UNMOVABLE allocations among multiple pageblocks.
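For reference, here is a simplified sketch of try_to_steal_freepages() at this
series' baseline. The inner part matches the hunks below; the CMA and
order >= pageblock_order branches are abridged from memory of the baseline
code, so treat those as assumptions:

static int try_to_steal_freepages(struct zone *zone, struct page *page,
                                  int start_type, int fallback_type)
{
        int current_order = page_order(page);

        /* CMA pageblocks never change ownership */
        if (is_migrate_cma(fallback_type))
                return fallback_type;

        /* Take ownership for orders >= pageblock_order */
        if (current_order >= pageblock_order) {
                change_pageblock_range(page, current_order, start_type);
                return start_type;
        }

        if (current_order >= pageblock_order / 2 ||
            start_type == MIGRATE_RECLAIMABLE ||
            page_group_by_mobility_disabled) {
                int pages;

                /* case 1): move all currently free pages in the pageblock */
                pages = move_freepages_block(zone, page, start_type);

                /*
                 * case 2): claim the whole block if over half of it is free,
                 * e.g. at least 256 of 512 pages when pageblock_order is 9
                 */
                if (pages >= (1 << (pageblock_order - 1)) ||
                    page_group_by_mobility_disabled) {
                        set_pageblock_migratetype(page, start_type);
                        /* case 3) happens in the caller, via the return value */
                        return start_type;
                }
        }

        return fallback_type;
}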
Originally, the decision for 1) implied the decision for 3). Commit
47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
(probably unintentionally) so that the buddy pages in case 3) are always
changed to the desired migratetype, except for CMA pageblocks.
Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and
fix a bug") did some refactoring and added a comment that the case of 3) is
intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect
pageblock type") removed the comment and tried to restore the original behavior
where 1) implies 3), but due to the previous refactoring, the result is instead
that only 2) implies 3) - and the conditions for 2) are less frequently met
than conditions for 1). This may increase fragmentation in situations where the
code decides to steal all free pages from the pageblock (case 1)), but then
gives back the buddy pages produced by splitting.
This patch restores the original intended logic where 1) implies 3). During
testing with stress-highalloc from mmtests, this has been shown to decrease the
number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE
pageblocks, which can lead to permanent fragmentation. It has increased the
number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE
pageblocks, but these are fixable by sync compaction and thus less harmful.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/page_alloc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 616a2c9..548b072 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1105,12 +1105,10 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
/* Claim the whole block if over half of it is free */
if (pages >= (1 << (pageblock_order-1)) ||
- page_group_by_mobility_disabled) {
-
+ page_group_by_mobility_disabled)
set_pageblock_migratetype(page, start_type);
- return start_type;
- }
+ return start_type;
}
return fallback_type;
--
2.1.2
When allocation falls back to stealing free pages of another migratetype,
it can decide to steal extra pages, or even the whole pageblock in order to
reduce fragmentation, which could happen if further allocation fallbacks
pick a different pageblock. In try_to_steal_freepages(), one of the situations
where extra pages are stolen happens when we are trying to allocate a
MIGRATE_RECLAIMABLE page.
However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
spreading such allocation over multiple fallback pageblocks is arguably even
worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
should minimize the number of such fallbacks, and thus steal as much as is
possible from each fallback pageblock.
This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
extra free pages. When evaluating with stress-highalloc from mmtests, this has
reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6. The number
of these fallbacks stealing from MIGRATE_MOVABLE block is reduced to 1/3.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/page_alloc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 548b072..a14249c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1098,6 +1098,7 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
if (current_order >= pageblock_order / 2 ||
start_type == MIGRATE_RECLAIMABLE ||
+ start_type == MIGRATE_UNMOVABLE ||
page_group_by_mobility_disabled) {
int pages;
--
2.1.2
When allocation falls back to another migratetype, it will steal a page with
highest available order, and (depending on this order and desired migratetype),
it might also steal the rest of free pages from the same pageblock.
Given the preference of highest available order, it is likely that it will be
higher than the desired order, and result in the stolen buddy page being split.
The remaining pages after split are currently stolen only when the rest of the
free pages are stolen. This can however lead to situations where for MOVABLE
allocations we split e.g. order-4 fallback UNMOVABLE page, but steal only
order-0 page. Then on the next MOVABLE allocation (which may be batched to
fill the pcplists) we split another order-3 or higher page, etc. By stealing
all pages that we have split, we can avoid further stealing.
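To illustrate where the split buddies end up, this is roughly how
__rmqueue_fallback() consumes the return value at this baseline (abridged from
memory of the surrounding code, so details may be slightly off):

        new_type = try_to_steal_freepages(zone, page, start_migratetype,
                                          migratetype);

        /* Remove the page from the freelists */
        list_del(&page->lru);
        rmv_page_order(page);

        /*
         * expand() returns the buddies left over from splitting the
         * high-order page down to the requested order to the free_list
         * of new_type, so returning start_type above is what steals them
         */
        expand(zone, page, order, current_order, area, new_type);
        set_freepage_migratetype(page, new_type);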
This patch therefore adjusts the page stealing so that buddy pages created by
split are always stolen. This has effect only on MOVABLE allocations, as
RECLAIMABLE and UNMOVABLE allocations already always do that in addition to
stealing the rest of free pages from the pageblock.
Note that commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
has already performed this change (unintentionally), but was reverted by commit
0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
Neither included evaluation. My evaluation with stress-highalloc from mmtests
shows about 2.5x reduction of page stealing events for MOVABLE allocations,
without affecting the page stealing events for other allocation migratetypes.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/page_alloc.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a14249c..82096a6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1108,11 +1108,9 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
if (pages >= (1 << (pageblock_order-1)) ||
page_group_by_mobility_disabled)
set_pageblock_migratetype(page, start_type);
-
- return start_type;
}
- return fallback_type;
+ return start_type;
}
/* Remove an element from the buddy allocator from the fallback list */
--
2.1.2
On Thu, Dec 04, 2014 at 06:12:56PM +0100, Vlastimil Babka wrote:
> When __rmqueue_fallback() is called to allocate a page of order X, it will
> find a page of order Y >= X of a fallback migratetype, which is different from
> the desired migratetype. With the help of try_to_steal_freepages(), it may
> change the migratetype (to the desired one) also of:
>
> 1) all currently free pages in the pageblock containing the fallback page
> 2) the fallback pageblock itself
> 3) buddy pages created by splitting the fallback page (when Y > X)
>
> These decisions take the order Y into account, as well as the desired
> migratetype, with the goal of preventing multiple fallback allocations that
> could e.g. distribute UNMOVABLE allocations among multiple pageblocks.
>
> Originally, the decision for 1) implied the decision for 3). Commit
> 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
> (probably unintentionally) so that the buddy pages in case 3) are always
> changed to the desired migratetype, except for CMA pageblocks.
>
> Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and
> fix a bug") did some refactoring and added a comment that the case of 3) is
> intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect
> pageblock type") removed the comment and tried to restore the original behavior
> where 1) implies 3), but due to the previous refactoring, the result is instead
> that only 2) implies 3) - and the conditions for 2) are less frequently met
> than conditions for 1). This may increase fragmentation in situations where the
> code decides to steal all free pages from the pageblock (case 1)), but then
> gives back the buddy pages produced by splitting.
>
> This patch restores the original intended logic where 1) implies 3). During
> testing with stress-highalloc from mmtests, this has been shown to decrease the
> number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE
> pageblocks, which can lead to permanent fragmentation. It has increased the
> number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE
> pageblocks, but these are fixable by sync compaction and thus less harmful.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/page_alloc.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 616a2c9..548b072 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1105,12 +1105,10 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>
> /* Claim the whole block if over half of it is free */
> if (pages >= (1 << (pageblock_order-1)) ||
> - page_group_by_mobility_disabled) {
> -
> + page_group_by_mobility_disabled)
> set_pageblock_migratetype(page, start_type);
> - return start_type;
> - }
>
> + return start_type;
> }
>
> return fallback_type;
The change_ownership field of the tracepoint will be wrong with this change.
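For reference, how the tracepoint derives that field, abridged from my reading
of mm_page_alloc_extfrag in include/trace/events/kmem.h at this baseline (the
exact expression is from memory, please double-check):

        /*
         * With this patch, try_to_steal_freepages() can return start_type
         * without having called set_pageblock_migratetype(), so this would
         * report a pageblock ownership change that did not happen.
         */
        __entry->change_ownership = (new_migratetype == alloc_migratetype);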
Thanks.
On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> When allocation falls back to stealing free pages of another migratetype,
> it can decide to steal extra pages, or even the whole pageblock in order to
> reduce fragmentation, which could happen if further allocation fallbacks
> pick a different pageblock. In try_to_steal_freepages(), one of the situations
> where extra pages are stolen happens when we are trying to allocate a
> MIGRATE_RECLAIMABLE page.
>
> However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> spreading such allocation over multiple fallback pageblocks is arguably even
> worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> should minimize the number of such fallbacks, and thus steal as much as is
> possible from each fallback pageblock.
I'm not sure that this change is good. If we steal order 0 pages,
this may be good. But, sometimes, we try to steal high order page
and, in this case, there would be many order 0 freepages and blindly
stealing freepages in that pageblock makes the system more fragmented.
MIGRATE_RECLAIMABLE is different case than MIGRATE_UNMOVABLE, because
it can be reclaimed so excessive migratetype movement doesn't result
in permanent fragmentation.
What I'd like to do to prevent fragmentation is
1) check whether we can steal all or almost all freepages and change
migratetype of pageblock.
2) If above condition isn't met, deny allocation and invoke compaction.
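A very rough sketch of the idea (illustrative only; free_pages_in_block() is a
hypothetical counting helper, not an existing function):

static bool try_fallback_steal(struct zone *zone, struct page *page,
                               int start_type)
{
        /* 1) steal only if (almost) the whole pageblock can be claimed */
        if (free_pages_in_block(zone, page) >= (1 << (pageblock_order - 1))) {
                move_freepages_block(zone, page, start_type);
                set_pageblock_migratetype(page, start_type);
                return true;
        }

        /* 2) otherwise deny the fallback; the caller invokes compaction */
        return false;
}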
Maybe a knob to control the behaviour would be needed.
How about it?
Thanks.
>
> This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
> extra free pages. When evaluating with stress-highalloc from mmtests, this has
> reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6. The number
> of these fallbacks stealing from MIGRATE_MOVABLE block is reduced to 1/3.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/page_alloc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 548b072..a14249c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1098,6 +1098,7 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>
> if (current_order >= pageblock_order / 2 ||
> start_type == MIGRATE_RECLAIMABLE ||
> + start_type == MIGRATE_UNMOVABLE ||
> page_group_by_mobility_disabled) {
> int pages;
>
> --
> 2.1.2
>
On Thu, Dec 04, 2014 at 06:12:58PM +0100, Vlastimil Babka wrote:
> When allocation falls back to another migratetype, it will steal a page with
> highest available order, and (depending on this order and desired migratetype),
> it might also steal the rest of free pages from the same pageblock.
>
> Given the preference of highest available order, it is likely that it will be
> higher than the desired order, and result in the stolen buddy page being split.
> The remaining pages after split are currently stolen only when the rest of the
> free pages are stolen. This can however lead to situations where for MOVABLE
> allocations we split e.g. order-4 fallback UNMOVABLE page, but steal only
> order-0 page. Then on the next MOVABLE allocation (which may be batched to
> fill the pcplists) we split another order-3 or higher page, etc. By stealing
> all pages that we have split, we can avoid further stealing.
>
> This patch therefore adjusts the page stealing so that buddy pages created by
> split are always stolen. This has effect only on MOVABLE allocations, as
> RECLAIMABLE and UNMOVABLE allocations already always do that in addition to
> stealing the rest of free pages from the pageblock.
In fact, CMA also has the same problem and this patch skips fixing it.
If a movable allocation steals a page in a CMA reserved area, the remaining split
freepages are always linked to the original CMA buddy list. And then, the next
fallback allocation repeatedly selects the highest-order freepage in the CMA
area and splits it.
IMO, it'd be better to reconsider the whole fragmentation avoidance logic.
Thanks.
>
> Note that commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
> has already performed this change (unintentionally), but was reverted by commit
> 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
> Neither included evaluation. My evaluation with stress-highalloc from mmtests
> shows about 2.5x reduction of page stealing events for MOVABLE allocations,
> without affecting the page stealing events for other allocation migratetypes.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/page_alloc.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a14249c..82096a6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1108,11 +1108,9 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
> if (pages >= (1 << (pageblock_order-1)) ||
> page_group_by_mobility_disabled)
> set_pageblock_migratetype(page, start_type);
> -
> - return start_type;
> }
>
> - return fallback_type;
> + return start_type;
> }
>
> /* Remove an element from the buddy allocator from the fallback list */
> --
> 2.1.2
>
On 12/08/2014 08:11 AM, Joonsoo Kim wrote:
> On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
>> When allocation falls back to stealing free pages of another migratetype,
>> it can decide to steal extra pages, or even the whole pageblock in order to
>> reduce fragmentation, which could happen if further allocation fallbacks
>> pick a different pageblock. In try_to_steal_freepages(), one of the situations
>> where extra pages are stolen happens when we are trying to allocate a
>> MIGRATE_RECLAIMABLE page.
>>
>> However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
>> spreading such allocation over multiple fallback pageblocks is arguably even
>> worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
>> should minimize the number of such fallbacks, and thus steal as much as is
>> possible from each fallback pageblock.
>
> I'm not sure that this change is good. If we steal order 0 pages,
> this may be good. But, sometimes, we try to steal high order page
> and, in this case, there would be many order 0 freepages and blindly
> stealing freepages in that pageblock makes the system more fragmented.
I don't understand. If we try to steal high order page (current_order >=
pageblock_order / 2), then nothing changes, the condition for extra
stealing is the same.
> MIGRATE_RECLAIMABLE is different case than MIGRATE_UNMOVABLE, because
> it can be reclaimed so excessive migratetype movement doesn't result
> in permanent fragmentation.
There's two kinds of "fragmentation" IMHO. First, inside a pageblock,
unmovable allocations can prevent merging of lower orders. This can get
worse if we steal multiple pages from a single pageblock, but the
pageblock itself is not marked as unmovable.
Second kind of fragmentation is when unmovable allocations spread over
multiple pageblocks. Lower order allocations within each such pageblock
might be still possible, but less pageblocks are able to compact to have
whole pageblock free.
I think the second kind is worse, so when we do have to pollute a movable
pageblock with unmovable allocation, we better take as much as possible,
so we prevent polluting other pageblocks.
> What I'd like to do to prevent fragmentation is
> 1) check whether we can steal all or almost all freepages and change
> migratetype of pageblock.
> 2) If above condition isn't met, deny allocation and invoke compaction.
Could work to some extent, but we also need to prevent excessive compaction.
We could also introduce a new pageblock migratetype, something like
MIGRATE_MIXED. The idea is that once pageblock isn't used purely by
MOVABLE allocations, it's marked as MIXED, until it either becomes
marked UNMOVABLE or RECLAIMABLE by the existing mechanisms, or is fully
freed. In more detail:
- MIXED is preferred for fallback before any other migratetypes
- if RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE
pageblock and cannot mark pageblock as RECLAIMABLE/UNMOVABLE (by current
rules), it marks it as MIXED instead.
- if MOVABLE allocation is stealing from UNMOVABLE/RECLAIMABLE
pageblocks, it will only mark it as MOVABLE if it was fully free.
Otherwise, if current rules would result in marking it as MOVABLE (i.e.
most of it was stolen, but not all) it will mark it as MIXED instead.
This could in theory leave more MOVABLE pageblocks unspoiled by
UNMOVABLE allocations.
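A purely illustrative sketch of these rules, to make the idea concrete
(MIGRATE_MIXED and this helper are hypothetical; nothing like it exists today):

static int pick_pageblock_type(int start_type, int old_type,
                               int pages_stolen, bool was_fully_free)
{
        /* current rule: claim the block if over half of it was stolen */
        bool claim = pages_stolen >= (1 << (pageblock_order - 1));

        if (start_type == MIGRATE_UNMOVABLE ||
            start_type == MIGRATE_RECLAIMABLE) {
                if (claim)
                        return start_type;
                /* stealing from MOVABLE without claiming: mark it MIXED */
                if (old_type == MIGRATE_MOVABLE)
                        return MIGRATE_MIXED;
        } else if (start_type == MIGRATE_MOVABLE) {
                /* only a fully free block becomes MOVABLE again */
                if (was_fully_free)
                        return MIGRATE_MOVABLE;
                /* mostly stolen, but not all: MIXED instead of MOVABLE */
                if (claim)
                        return MIGRATE_MIXED;
        }
        return old_type;
}

(plus making MIGRATE_MIXED the first fallback in the fallbacks[] array)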
> Maybe a knob to control the behaviour would be needed.
> How about it?
Adding new knobs is not a good solution.
> Thanks.
>
>>
>> This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
>> extra free pages. When evaluating with stress-highalloc from mmtests, this has
>> reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6. The number
>> of these fallbacks stealing from MIGRATE_MOVABLE block is reduced to 1/3.
>>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> ---
>> mm/page_alloc.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 548b072..a14249c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1098,6 +1098,7 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>>
>> if (current_order >= pageblock_order / 2 ||
>> start_type == MIGRATE_RECLAIMABLE ||
>> + start_type == MIGRATE_UNMOVABLE ||
>> page_group_by_mobility_disabled) {
>> int pages;
>>
>> --
>> 2.1.2
>>
On 12/08/2014 08:36 AM, Joonsoo Kim wrote:
> On Thu, Dec 04, 2014 at 06:12:58PM +0100, Vlastimil Babka wrote:
>> When allocation falls back to another migratetype, it will steal a page with
>> highest available order, and (depending on this order and desired migratetype),
>> it might also steal the rest of free pages from the same pageblock.
>>
>> Given the preference of highest available order, it is likely that it will be
>> higher than the desired order, and result in the stolen buddy page being split.
>> The remaining pages after split are currently stolen only when the rest of the
>> free pages are stolen. This can however lead to situations where for MOVABLE
>> allocations we split e.g. order-4 fallback UNMOVABLE page, but steal only
>> order-0 page. Then on the next MOVABLE allocation (which may be batched to
>> fill the pcplists) we split another order-3 or higher page, etc. By stealing
>> all pages that we have split, we can avoid further stealing.
>>
>> This patch therefore adjusts the page stealing so that buddy pages created by
>> split are always stolen. This has effect only on MOVABLE allocations, as
>> RECLAIMABLE and UNMOVABLE allocations already always do that in addition to
>> stealing the rest of free pages from the pageblock.
>
> In fact, CMA also has the same problem and this patch skips fixing it.
> If a movable allocation steals a page in a CMA reserved area, the remaining split
> freepages are always linked to the original CMA buddy list. And then, the next
> fallback allocation repeatedly selects the highest-order freepage in the CMA
> area and splits it.
Hm yeah, for CMA it would make more sense to steal a page of the lowest
available order, not the highest.
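Something like this (illustrative, untested) instead of the usual
highest-order-first scan:

        /*
         * For CMA fallback, take the smallest sufficient page so we don't
         * keep splitting the largest CMA buddy over and over.
         */
        for (current_order = order; current_order < MAX_ORDER; current_order++) {
                area = &zone->free_area[current_order];
                if (list_empty(&area->free_list[MIGRATE_CMA]))
                        continue;
                page = list_entry(area->free_list[MIGRATE_CMA].next,
                                  struct page, lru);
                break;
        }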
> IMO, it'd be better to reconsider the whole fragmentation avoidance logic.
>
> Thanks.
>
>>
>> Note that commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
>> has already performed this change (unintentionally), but was reverted by commit
>> 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
>> Neither included evaluation. My evaluation with stress-highalloc from mmtests
>> shows about 2.5x reduction of page stealing events for MOVABLE allocations,
>> without affecting the page stealing events for other allocation migratetypes.
>>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> ---
>> mm/page_alloc.c | 4 +---
>> 1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a14249c..82096a6 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1108,11 +1108,9 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>> if (pages >= (1 << (pageblock_order-1)) ||
>> page_group_by_mobility_disabled)
>> set_pageblock_migratetype(page, start_type);
>> -
>> - return start_type;
>> }
>>
>> - return fallback_type;
>> + return start_type;
>> }
>>
>> /* Remove an element from the buddy allocator from the fallback list */
>> --
>> 2.1.2
>>
On Thu, Dec 04, 2014 at 06:12:56PM +0100, Vlastimil Babka wrote:
> When __rmqueue_fallback() is called to allocate a page of order X, it will
> find a page of order Y >= X of a fallback migratetype, which is different from
> the desired migratetype. With the help of try_to_steal_freepages(), it may
> change the migratetype (to the desired one) also of:
>
> 1) all currently free pages in the pageblock containing the fallback page
> 2) the fallback pageblock itself
> 3) buddy pages created by splitting the fallback page (when Y > X)
>
> These decisions take the order Y into account, as well as the desired
> migratetype, with the goal of preventing multiple fallback allocations that
> could e.g. distribute UNMOVABLE allocations among multiple pageblocks.
>
> Originally, the decision for 1) implied the decision for 3). Commit
> 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
> (probably unintentionally) so that the buddy pages in case 3) are always
> changed to the desired migratetype, except for CMA pageblocks.
>
> Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and
> fix a bug") did some refactoring and added a comment that the case of 3) is
> intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect
> pageblock type") removed the comment and tried to restore the original behavior
> where 1) implies 3), but due to the previous refactoring, the result is instead
> that only 2) implies 3) - and the conditions for 2) are less frequently met
> than conditions for 1). This may increase fragmentation in situations where the
> code decides to steal all free pages from the pageblock (case 1)), but then
> gives back the buddy pages produced by splitting.
>
> This patch restores the original intended logic where 1) implies 3). During
> testing with stress-highalloc from mmtests, this has been shown to decrease the
> number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE
> pageblocks, which can lead to permanent fragmentation. It has increased the
> number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE
> pageblocks, but these are fixable by sync compaction and thus less harmful.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Assuming the tracepoint issue Joonsoo pointed out gets corrected;
Acked-by: Mel Gorman <[email protected]>
I'm kicking myself that I missed the effect of 47118af076f6 when I was
reviewing it. I knew allocation success rates were worse than they used
to be but had been blaming changes in aggression of reclaim and
compaction.
--
Mel Gorman
SUSE Labs
On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> When allocation falls back to stealing free pages of another migratetype,
> it can decide to steal extra pages, or even the whole pageblock in order to
> reduce fragmentation, which could happen if further allocation fallbacks
> pick a different pageblock. In try_to_steal_freepages(), one of the situations
> where extra pages are stolen happens when we are trying to allocate a
> MIGRATE_RECLAIMABLE page.
>
> However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> spreading such allocation over multiple fallback pageblocks is arguably even
> worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> should minimize the number of such fallbacks, and thus steal as much as is
> possible from each fallback pageblock.
>
> This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
> extra free pages. When evaluating with stress-highalloc from mmtests, this has
> reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6. The number
> of these fallbacks stealing from MIGRATE_MOVABLE block is reduced to 1/3.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Note that this is a slightly tricky tradeoff. UNMOVABLE allocations will now
be stealing more of a pageblock during fallback events. This will reduce the
probability that unmovable fallbacks will happen in the future. However,
it also increases the probability that a movable allocation will fallback
in the future. This is particularly true for kernel-build stress workloads
as the likelihood is that unmovable allocations are stealing from movable
pageblocks. The reason this happens is that the movable free lists are
smaller after an unmovable fallback event so a movable fallback event
happens sooner than it would have otherwise.
Movable fallback events are less severe than unmovable fallback events as
they can be moved or freed later, so the patch heads in the right direction. The
side-effect is simply interesting to note.
--
Mel Gorman
SUSE Labs
On Thu, Dec 04, 2014 at 06:12:58PM +0100, Vlastimil Babka wrote:
> When allocation falls back to another migratetype, it will steal a page with
> highest available order, and (depending on this order and desired migratetype),
> it might also steal the rest of free pages from the same pageblock.
>
> Given the preference of highest available order, it is likely that it will be
> higher than the desired order, and result in the stolen buddy page being split.
> The remaining pages after split are currently stolen only when the rest of the
> free pages are stolen.
The original intent was that the stolen fallback buddy page would be
added to the requested migratetype freelists. This was independent of
whether all other free pages in the pageblock were moved or whether the
pageblock migratetype was updated.
> This can however lead to situations where for MOVABLE
> allocations we split e.g. order-4 fallback UNMOVABLE page, but steal only
> order-0 page. Then on the next MOVABLE allocation (which may be batched to
> fill the pcplists) we split another order-3 or higher page, etc. By stealing
> all pages that we have split, we can avoid further stealing.
>
> This patch therefore adjusts the page stealing so that buddy pages created by
> split are always stolen. This has effect only on MOVABLE allocations, as
> RECLAIMABLE and UNMOVABLE allocations already always do that in addition to
> stealing the rest of free pages from the pageblock.
>
This restores the intended behaviour.
> Note that commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
> has already performed this change (unintentionally), but was reverted by commit
> 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
> Neither included evaluation. My evaluation with stress-highalloc from mmtests
> shows about 2.5x reduction of page stealing events for MOVABLE allocations,
> without affecting the page stealing events for other allocation migratetypes.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
--
Mel Gorman
SUSE Labs
On Thu, Dec 04, 2014 at 06:12:56PM +0100, Vlastimil Babka wrote:
> When __rmqueue_fallback() is called to allocate a page of order X, it will
> find a page of order Y >= X of a fallback migratetype, which is different from
> the desired migratetype. With the help of try_to_steal_freepages(), it may
> change the migratetype (to the desired one) also of:
>
> 1) all currently free pages in the pageblock containing the fallback page
> 2) the fallback pageblock itself
> 3) buddy pages created by splitting the fallback page (when Y > X)
>
> These decisions take the order Y into account, as well as the desired
> migratetype, with the goal of preventing multiple fallback allocations that
> could e.g. distribute UNMOVABLE allocations among multiple pageblocks.
>
> Originally, the decision for 1) implied the decision for 3). Commit
> 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
> (probably unintentionally) so that the buddy pages in case 3) are always
> changed to the desired migratetype, except for CMA pageblocks.
>
> Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and
> fix a bug") did some refactoring and added a comment that the case of 3) is
> intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect
> pageblock type") removed the comment and tried to restore the original behavior
> where 1) implies 3), but due to the previous refactoring, the result is instead
> that only 2) implies 3) - and the conditions for 2) are less frequently met
> than conditions for 1). This may increase fragmentation in situations where the
> code decides to steal all free pages from the pageblock (case 1)), but then
> gives back the buddy pages produced by splitting.
>
> This patch restores the original intended logic where 1) implies 3). During
> testing with stress-highalloc from mmtests, this has been shown to decrease the
> number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE
> pageblocks, which can lead to permanent fragmentation. It has increased the
> number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE
> pageblocks, but these are fixable by sync compaction and thus less harmful.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Minchan Kim <[email protected]>
I expect you will Cc -stable when you respin with the fix pointed out
by Joonsoo.
--
Kind regards,
Minchan Kim
On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> When allocation falls back to stealing free pages of another migratetype,
> it can decide to steal extra pages, or even the whole pageblock in order to
> reduce fragmentation, which could happen if further allocation fallbacks
> pick a different pageblock. In try_to_steal_freepages(), one of the situations
> where extra pages are stolen happens when we are trying to allocate a
> MIGRATE_RECLAIMABLE page.
>
> However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> spreading such allocation over multiple fallback pageblocks is arguably even
> worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> should minimize the number of such fallbacks, and thus steal as much as is
> possible from each fallback pageblock.
Fair enough.
>
> This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
> extra free pages. When evaluating with stress-highalloc from mmtests, this has
> reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6. The number
> of these fallbacks stealing from MIGRATE_MOVABLE block is reduced to 1/3.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Nit:
Please fix the comment on try_to_steal_freepages().
We don't bias MIGRATE_RECLAIMABLE alone any more, so remove that. Instead,
put in some words about the policy and why.
Thanks.
> ---
> mm/page_alloc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 548b072..a14249c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1098,6 +1098,7 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>
> if (current_order >= pageblock_order / 2 ||
> start_type == MIGRATE_RECLAIMABLE ||
> + start_type == MIGRATE_UNMOVABLE ||
> page_group_by_mobility_disabled) {
> int pages;
>
> --
> 2.1.2
>
--
Kind regards,
Minchan Kim
On Thu, Dec 04, 2014 at 06:12:58PM +0100, Vlastimil Babka wrote:
> When allocation falls back to another migratetype, it will steal a page with
> highest available order, and (depending on this order and desired migratetype),
> it might also steal the rest of free pages from the same pageblock.
>
> Given the preference of highest available order, it is likely that it will be
> higher than the desired order, and result in the stolen buddy page being split.
> The remaining pages after split are currently stolen only when the rest of the
> free pages are stolen. This can however lead to situations where for MOVABLE
> allocations we split e.g. order-4 fallback UNMOVABLE page, but steal only
> order-0 page. Then on the next MOVABLE allocation (which may be batched to
> fill the pcplists) we split another order-3 or higher page, etc. By stealing
> all pages that we have split, we can avoid further stealing.
>
> This patch therefore adjusts the page stealing so that buddy pages created by
> split are always stolen. This has effect only on MOVABLE allocations, as
> RECLAIMABLE and UNMOVABLE allocations already always do that in addition to
> stealing the rest of free pages from the pageblock.
>
> Note that commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
> has already performed this change (unintentionally), but was reverted by commit
> 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
> Neither included evaluation. My evaluation with stress-highalloc from mmtests
> shows about 2.5x reduction of page stealing events for MOVABLE allocations,
> without affecting the page stealing events for other allocation migratetypes.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Nit:
From this patch, try_to_steal_freepages() always returns start_type except in
the CMA case, so we could factor the CMA case out of try_to_steal_freepages()
and put the check right before calling try_to_steal_freepages().
The benefits are that we could make try_to_steal_freepages()'s return type void
and remove the fallback_type argument (i.e., make the function simpler).
Additionally, we could move set_freepage_migratetype() into
try_to_steal_freepages() so that we could remove the new_type variable
in __rmqueue_fallback().
trace_mm_page_alloc_extfrag could work without new_type by using
get_pageblock_migratetype().
Thanks.
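Something along these lines, perhaps (illustrative, untested):

static void try_to_steal_freepages(struct zone *zone, struct page *page,
                                   int start_type)
{
        int current_order = page_order(page);

        /* Take ownership for orders >= pageblock_order */
        if (current_order >= pageblock_order) {
                change_pageblock_range(page, current_order, start_type);
                return;
        }

        if (current_order >= pageblock_order / 2 ||
            start_type == MIGRATE_RECLAIMABLE ||
            start_type == MIGRATE_UNMOVABLE ||
            page_group_by_mobility_disabled) {
                int pages = move_freepages_block(zone, page, start_type);

                /* Claim the whole block if over half of it is free */
                if (pages >= (1 << (pageblock_order - 1)) ||
                    page_group_by_mobility_disabled)
                        set_pageblock_migratetype(page, start_type);
        }
}

with __rmqueue_fallback() doing the CMA check itself, e.g.
calling it only under if (!is_migrate_cma(migratetype)).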
> ---
> mm/page_alloc.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a14249c..82096a6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1108,11 +1108,9 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
> if (pages >= (1 << (pageblock_order-1)) ||
> page_group_by_mobility_disabled)
> set_pageblock_migratetype(page, start_type);
> -
> - return start_type;
> }
>
> - return fallback_type;
> + return start_type;
> }
>
> /* Remove an element from the buddy allocator from the fallback list */
> --
> 2.1.2
>
--
Kind regards,
Minchan Kim
On Mon, Dec 08, 2014 at 11:27:27AM +0100, Vlastimil Babka wrote:
> On 12/08/2014 08:11 AM, Joonsoo Kim wrote:
> >On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> >>When allocation falls back to stealing free pages of another migratetype,
> >>it can decide to steal extra pages, or even the whole pageblock in order to
> >>reduce fragmentation, which could happen if further allocation fallbacks
> >>pick a different pageblock. In try_to_steal_freepages(), one of the situations
> >>where extra pages are stolen happens when we are trying to allocate a
> >>MIGRATE_RECLAIMABLE page.
> >>
> >>However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> >>spreading such allocation over multiple fallback pageblocks is arguably even
> >>worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> >>should minimize the number of such fallbacks, and thus steal as much as is
> >>possible from each fallback pageblock.
> >
> >I'm not sure that this change is good. If we steal order 0 pages,
> >this may be good. But, sometimes, we try to steal high order page
> >and, in this case, there would be many order 0 freepages and blindly
> >stealing freepages in that pageblock makes the system more fragmented.
>
> I don't understand. If we try to steal high order page
> (current_order >= pageblock_order / 2), then nothing changes, the
> condition for extra stealing is the same.
More accurately, I mean a mid order page (current_order <
pageblock_order / 2), but not order 0, such as order 2,3,4(?).
In this case, perhaps, the system has enough unmovable order 0 freepages,
so we don't need to worry about the second kind of fragmentation you
mentioned below. Stealing one mid order freepage is enough to satisfy
the request.
>
> >MIGRATE_RECLAIMABLE is different case than MIGRATE_UNMOVABLE, because
> >it can be reclaimed so excessive migratetype movement doesn't result
> >in permanent fragmentation.
>
> There's two kinds of "fragmentation" IMHO. First, inside a
> pageblock, unmovable allocations can prevent merging of lower
> orders. This can get worse if we steal multiple pages from a single
> pageblock, but the pageblock itself is not marked as unmovable.
So, what's the intention of not marking the pageblock itself as unmovable?
I guess that if many pages are moved to unmovable, they can't be easily
moved back and this pageblock is highly fragmented. So, processing more unmovable
requests from this pageblock by changing the pageblock migratetype makes more
sense to me.
> Second kind of fragmentation is when unmovable allocations spread
> over multiple pageblocks. Lower order allocations within each such
> pageblock might be still possible, but less pageblocks are able to
> compact to have whole pageblock free.
>
> I think the second kind is worse, so when we do have to pollute a
> movable pageblock with unmovable allocation, we better take as much
> as possible, so we prevent polluting other pageblocks.
I agree.
>
> >What I'd like to do to prevent fragmentation is
> >1) check whether we can steal all or almost all freepages and change
> >migratetype of pageblock.
> >2) If above condition isn't met, deny allocation and invoke compaction.
>
> Could work to some extent, but we also need to prevent excessive compaction.
So, I suggest a knob to control the behaviour. In a small memory system,
fragmentation occurs frequently so the system can't handle just an order 2
request. In that system, excessive compaction is acceptable because
it is better than the system going down.
>
> We could also introduce a new pageblock migratetype, something like
> MIGRATE_MIXED. The idea is that once pageblock isn't used purely by
> MOVABLE allocations, it's marked as MIXED, until it either becomes
> marked UNMOVABLE or RECLAIMABLE by the existing mechanisms, or is
> fully freed. In more detail:
>
> - MIXED is preferred for fallback before any other migratetypes
> - if RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE
> pageblock and cannot mark pageblock as RECLAIMABLE/UNMOVABLE (by
> current rules), it marks it as MIXED instead.
> - if MOVABLE allocation is stealing from UNMOVABLE/RECLAIMABLE
> pageblocks, it will only mark it as MOVABLE if it was fully free.
> Otherwise, if current rules would result in marking it as MOVABLE
> (i.e. most of it was stolen, but not all) it will mark it as MIXED
> instead.
>
> This could in theory leave more MOVABLE pageblocks unspoiled by
> UNMOVABLE allocations.
I guess that we can do it without introducing new migratetype pageblock.
Just always marking it as RECLAIMABLE/UNMOVABLE when
RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE would
have same effect.
Thanks.
> >Maybe a knob to control the behaviour would be needed.
> >How about it?
>
> Adding new knobs is not a good solution.
On 12/09/2014 09:28 AM, Joonsoo Kim wrote:
> On Mon, Dec 08, 2014 at 11:27:27AM +0100, Vlastimil Babka wrote:
>> On 12/08/2014 08:11 AM, Joonsoo Kim wrote:
>>>
>>> I'm not sure that this change is good. If we steal order 0 pages,
>>> this may be good. But, sometimes, we try to steal high order page
>>> and, in this case, there would be many order 0 freepages and blindly
>>> stealing freepages in that pageblock makes the system more fragmented.
>>
>> I don't understand. If we try to steal high order page
>> (current_order >= pageblock_order / 2), then nothing changes, the
>> condition for extra stealing is the same.
>
> More accurately, I mean a mid order page (current_order <
> pageblock_order / 2), but not order 0, such as order 2,3,4(?).
> In this case, perhaps, the system has enough unmovable order 0 freepages,
> so we don't need to worry about the second kind of fragmentation you
> mentioned below. Stealing one mid order freepage is enough to satisfy
> the request.
OK.
>>
>>> MIGRATE_RECLAIMABLE is different case than MIGRATE_UNMOVABLE, because
>>> it can be reclaimed so excessive migratetype movement doesn't result
>>> in permanent fragmentation.
>>
>> There's two kinds of "fragmentation" IMHO. First, inside a
>> pageblock, unmovable allocations can prevent merging of lower
>> orders. This can get worse if we steal multiple pages from a single
>> pageblock, but the pageblock itself is not marked as unmovable.
>
> So, what's the intention of not marking the pageblock itself as unmovable?
> I guess that if many pages are moved to unmovable, they can't be easily
> moved back and this pageblock is highly fragmented. So, processing more unmovable
> requests from this pageblock by changing the pageblock migratetype makes more
> sense to me.
There's the danger that we mark too many pageblocks as unmovable in some
unmovable allocation spike and even if the number of unmovable allocated
pages later decreases, they will keep being allocated from many
unmovable-marked pageblocks, and neither will become empty enough to be
remarked back. If we don't mark pageblocks unmovable as aggressively,
it's possible that the unmovable allocations in a partially-stolen
pageblock will be eventually freed, and no more unmovable allocations
will occur in that pageblock if it's not marked as unmovable.
>> Second kind of fragmentation is when unmovable allocations spread
>> over multiple pageblocks. Lower order allocations within each such
>> pageblock might be still possible, but less pageblocks are able to
>> compact to have whole pageblock free.
>>
>> I think the second kind is worse, so when we do have to pollute a
>> movable pageblock with unmovable allocation, we better take as much
>> as possible, so we prevent polluting other pageblocks.
>
> I agree.
>
>>
>>> What I'd like to do to prevent fragmentation is
>>> 1) check whether we can steal all or almost all freepages and change
>>> migratetype of pageblock.
>>> 2) If above condition isn't met, deny allocation and invoke compaction.
>>
>> Could work to some extent, but we also need to prevent excessive compaction.
>
> So, I suggest a knob to control the behaviour. In a small memory system,
> fragmentation occurs frequently so the system can't handle just an order 2
> request. In that system, excessive compaction is acceptable because
> it is better than the system going down.
So you say that in these systems, order 2 requests fail because of page
stealing?
>>
>> We could also introduce a new pageblock migratetype, something like
>> MIGRATE_MIXED. The idea is that once pageblock isn't used purely by
>> MOVABLE allocations, it's marked as MIXED, until it either becomes
>> marked UNMOVABLE or RECLAIMABLE by the existing mechanisms, or is
>> fully freed. In more detail:
>>
>> - MIXED is preferred for fallback before any other migratetypes
>> - if RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE
>> pageblock and cannot mark pageblock as RECLAIMABLE/UNMOVABLE (by
>> current rules), it marks it as MIXED instead.
>> - if MOVABLE allocation is stealing from UNMOVABLE/RECLAIMABLE
>> pageblocks, it will only mark it as MOVABLE if it was fully free.
>> Otherwise, if current rules would result in marking it as MOVABLE
>> (i.e. most of it was stolen, but not all) it will mark it as MIXED
>> instead.
>>
>> This could in theory leave more MOVABLE pageblocks unspoiled by
>> UNMOVABLE allocations.
>
> I guess that we can do it without introducing new migratetype pageblock.
> Just always marking it as RECLAIMABLE/UNMOVABLE when
> RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE would
> have same effect.
See the argument above. The difference with MIXED marking is that new
unmovable allocations would take from these pageblocks only as a
fallback. Primarily it would try to reuse a more limited number of
unmovable-marked pageblocks.
But this is just an idea not related to the series at hand. Yes, it
could be better, these are all heuristics and any change is a potential
tradeoff.
Also we need to keep in mind that ultimately, anything we devise cannot
prevent fragmentation 100%. We cannot predict the future, so we don't
know which unmovable allocations will be freed soon, and which will stay
for a longer time. To minimize fragmentation, we would need to recognize
those longer-lived unmovable allocations, so we could put them together
in as few pageblocks as possible.
> Thanks.
>
>>> Maybe knob to control behaviour would be needed.
>>> How about it?
>>
>> Adding new knobs is not a good solution.
>
On Tue, Dec 09, 2014 at 12:09:40PM +0900, Minchan Kim wrote:
> On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> > When allocation falls back to stealing free pages of another migratetype,
> > it can decide to steal extra pages, or even the whole pageblock in order to
> > reduce fragmentation, which could happen if further allocation fallbacks
> > pick a different pageblock. In try_to_steal_freepages(), one of the situations
> > where extra pages are stolen happens when we are trying to allocate a
> > MIGRATE_RECLAIMABLE page.
> >
> > However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> > spreading such allocations over multiple fallback pageblocks is arguably even
> > worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> > should minimize the number of such fallbacks, and thus steal as much as
> > possible from each fallback pageblock.
>
> Fair enough.
>
Just to be absolutely sure, check that data and see what the number of
MIGRATE_UNMOVABLE blocks looks like over time. Make sure it's not just
continually growing. MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE blocks were
expected to be freed if the system was aggressively reclaimed, but the same
is not true of MIGRATE_UNMOVABLE. Even if all processes are
aggressively reclaimed, for example, the page tables are still there.
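A crude way to watch that is to periodically sample the "Number of blocks
type" section of /proc/pagetypeinfo over the benchmark run, e.g. with
something like this quick, untested sketch:

	/* Periodically dump per-migratetype pageblock counts from
	 * /proc/pagetypeinfo, to watch for a continually growing
	 * Unmovable column. */
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		char line[256];

		for (;;) {
			FILE *f = fopen("/proc/pagetypeinfo", "r");
			int in_blocks = 0;

			if (!f)
				return 1;
			while (fgets(line, sizeof(line), f)) {
				/* The per-migratetype block counts follow
				 * the "Number of blocks type" header. */
				if (strstr(line, "Number of blocks type"))
					in_blocks = 1;
				if (in_blocks)
					fputs(line, stdout);
			}
			fclose(f);
			puts("----");
			sleep(10);
		}
	}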
--
Mel Gorman
SUSE Labs
On Tue, Dec 09, 2014 at 10:12:15AM +0100, Vlastimil Babka wrote:
> On 12/09/2014 09:28 AM, Joonsoo Kim wrote:
> >On Mon, Dec 08, 2014 at 11:27:27AM +0100, Vlastimil Babka wrote:
> >>On 12/08/2014 08:11 AM, Joonsoo Kim wrote:
> >>>
> >>>I'm not sure that this change is good. If we steal order 0 pages,
> >>>this may be good. But, sometimes, we try to steal a high order page
> >>>and, in this case, there would be many order 0 freepages and blindly
> >>>stealing freepages in that pageblock makes the system more fragmented.
> >>
> >>I don't understand. If we try to steal a high order page
> >>(current_order >= pageblock_order / 2), then nothing changes; the
> >>condition for extra stealing is the same.
> >
> >More accurately, I mean mid order pages (current_order <
> >pageblock_order / 2), but not order 0; such as order 2, 3, 4(?).
> >In this case, perhaps the system has enough unmovable order 0 freepages,
> >so we don't need to worry about the second kind of fragmentation you
> >mentioned below. Stealing one mid order freepage is enough to satisfy
> >the request.
>
> OK.
>
> >>
> >>>MIGRATE_RECLAIMABLE is a different case than MIGRATE_UNMOVABLE, because
> >>>it can be reclaimed, so excessive migratetype movement doesn't result
> >>>in permanent fragmentation.
> >>
> >>There are two kinds of "fragmentation" IMHO. First, inside a
> >>pageblock, unmovable allocations can prevent merging of lower
> >>orders. This can get worse if we steal multiple pages from a single
> >>pageblock, but the pageblock itself is not marked as unmovable.
> >
> >So, what's the intention of not marking the pageblock itself as unmovable?
> >I guess that if many pages are moved to unmovable, they can't be easily
> >moved back and this pageblock is left highly fragmented. So, processing
> >more unmovable requests from this pageblock by changing the pageblock
> >migratetype makes more sense to me.
>
> There's the danger that we mark too many pageblocks as unmovable during
> some unmovable allocation spike, and even if the number of unmovable
> allocated pages later decreases, they will keep being allocated from
> the many unmovable-marked pageblocks, and none of them will become
> empty enough to be remarked back. If we don't mark pageblocks unmovable
> as aggressively, it's possible that the unmovable allocations in a
> partially-stolen pageblock will eventually be freed, and no more
> unmovable allocations will occur in that pageblock if it's not
> marked as unmovable.
Hmm... yes, but it seems to be really workload-dependent. I'll check
the effect of changing the pageblock migratetype aggressively on my test bed.
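Something like the below inside try_to_steal_freepages(), just as a quick
hack for the experiment (not a proposed patch):

	/* Aggressive variant for testing: claim the pageblock on any
	 * fallback steal, dropping the over-half-free condition. */
	move_freepages_block(zone, page, start_type);
	set_pageblock_migratetype(page, start_type);
	return start_type;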
>
> >>Second kind of fragmentation is when unmovable allocations spread
> >>over multiple pageblocks. Lower order allocations within each such
> >>pageblock might still be possible, but fewer pageblocks can be
> >>compacted to have the whole pageblock free.
> >>
> >>I think the second kind is worse, so when we do have to pollute a
> >>movable pageblock with an unmovable allocation, we had better take as
> >>much as possible, so we prevent polluting other pageblocks.
> >
> >I agree.
> >
> >>
> >>>What I'd like to do to prevent fragmentation is
> >>>1) check whether we can steal all or almost all freepages and change
> >>>the migratetype of the pageblock.
> >>>2) If the above condition isn't met, deny the allocation and invoke compaction.
> >>
> >>Could work to some extent, but we also need to prevent excessive compaction.
> >
> >So, I suggest a knob to control the behaviour. In a small memory system,
> >fragmentation occurs frequently, so the system can't handle even an order 2
> >request. In such a system, excessive compaction is acceptable because
> >it is better than the system going down.
>
> So you say that in these systems, order 2 requests fail because of
> page stealing?
Yes. At some point, system memory becomes highly fragmented and order 2
requests fail. It is likely caused by page stealing, but I didn't analyze it.
> >>
> >>We could also introduce a new pageblock migratetype, something like
> >>MIGRATE_MIXED. The idea is that once a pageblock isn't used purely by
> >>MOVABLE allocations, it's marked as MIXED, until it either becomes
> >>marked UNMOVABLE or RECLAIMABLE by the existing mechanisms, or is
> >>fully freed. In more detail:
> >>
> >>- MIXED is preferred for fallback before any other migratetype
> >>- if a RECLAIMABLE/UNMOVABLE page allocation is stealing from a MOVABLE
> >>pageblock and cannot mark the pageblock as RECLAIMABLE/UNMOVABLE (by the
> >>current rules), it marks it as MIXED instead.
> >>- if a MOVABLE allocation is stealing from UNMOVABLE/RECLAIMABLE
> >>pageblocks, it will only mark one as MOVABLE if it was fully free.
> >>Otherwise, if the current rules would result in marking it as MOVABLE
> >>(i.e. most of it was stolen, but not all), it will mark it as MIXED
> >>instead.
> >>
> >>This could in theory leave more MOVABLE pageblocks unspoiled by
> >>UNMOVABLE allocations.
> >
> >I guess that we can do it without introducing a new pageblock migratetype.
> >Just always marking it as RECLAIMABLE/UNMOVABLE when a
> >RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE would
> >have the same effect.
>
> See the argument above. The difference with MIXED marking is that
> new unmovable allocations would take from these pageblocks only as a
> fallback. Primarily it would try to reuse a more limited number of
> unmovable-marked pageblocks.
Ah, I understand now. Looks like a good idea.
> But this is just an idea, not related to the series at hand. Yes, it
> could be better, but these are all heuristics and any change is a
> potential tradeoff.
>
> Also we need to keep in mind that ultimately, anything we devise cannot
> prevent fragmentation 100%. We cannot predict the future, so we don't
> know which unmovable allocations will be freed soon, and which will
> stay for a longer time. To minimize fragmentation, we would need to
> recognize those longer-lived unmovable allocations, so we could put
> them together in as few pageblocks as possible.
>
> >Thanks.
> >
> >>>Maybe knob to control behaviour would be needed.
> >>>How about it?
> >>
> >>Adding new knobs is not a good solution.
> >
>