2015-02-03 06:48:07

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] compaction: changing initial position of scanners

On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
> Even after all the patches compaction received in last several versions, it
> turns out that its effectivneess degrades considerably as the system ages
> after reboot. For example, see how success rates of stress-highalloc from
> mmtests degrades when we re-execute it several times, first time being after
> fresh reboot:
> 3.19-rc4 3.19-rc4 3.19-rc4
> 4-nothp-1 4-nothp-2 4-nothp-3
> Success 1 Min 25.00 ( 0.00%) 13.00 ( 48.00%) 9.00 ( 64.00%)
> Success 1 Mean 36.20 ( 0.00%) 23.40 ( 35.36%) 16.40 ( 54.70%)
> Success 1 Max 41.00 ( 0.00%) 34.00 ( 17.07%) 25.00 ( 39.02%)
> Success 2 Min 25.00 ( 0.00%) 15.00 ( 40.00%) 10.00 ( 60.00%)
> Success 2 Mean 37.20 ( 0.00%) 25.00 ( 32.80%) 17.20 ( 53.76%)
> Success 2 Max 44.00 ( 0.00%) 36.00 ( 18.18%) 25.00 ( 43.18%)
> Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 78.00 ( 7.14%)
> Success 3 Mean 85.80 ( 0.00%) 82.80 ( 3.50%) 80.40 ( 6.29%)
> Success 3 Max 87.00 ( 0.00%) 84.00 ( 3.45%) 82.00 ( 5.75%)
>
> Wouldn't it be much better, if it looked like this?
>
> 3.18 3.18 3.18
> 3.19-rc4 3.19-rc4 3.19-rc4
> 5-nothp-1 5-nothp-2 5-nothp-3
> Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%)
> Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%)
> Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%)
> Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%)
> Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%)
> Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%)
> Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%)
> Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%)
> Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
>
> In my humble opinion, it would :) Much lower degradation, and a nice
> improvement in the first iteration as a bonus.
>
> So what sorcery is this? Nothing much, just a fundamental change of the
> compaction scanners operation...
>
> As everyone knows [1] the migration scanner starts at the first pageblock
> of a zone, and goes towards the end, and the free scanner starts at the
> last pageblock and goes towards the beginning. Somewhere in the middle of the
> zone, the scanners meet:
>
> zone_start zone_end
> | |
> -------------------------------------------------------------
> MMMMMMMMMMMMM| => <= |FFFFFFFFFFFF
> migrate_pfn free_pfn
>
> In my tests, the scanners meet around the middle of the pageblock on the first
> iteration, and around the 1/3 on subsequent iterations. Which means the
> migration scanner doesn't see the larger part of the zone at all. For more
> details why it's bad, see Patch 4 description.
>
> To make sure we eventually scan the whole zone with the migration scanner, we
> could e.g. reverse the directions after each run. But that would still be
> biased, and with 1/3 of zone reachable from each side, we would still omit the
> middle 1/3 of a zone.
>
> Or we could stop terminating compaction when the scanners meet, and let them
> continue to scan the whole zone. Mel told me it used to be the case long ago,
> but that approach would result in migrating pages back and forth during single
> compaction run, which wouldn't be cool.
>
> So the approach taken by this patchset is to let scanners start at any
> so-called pivot pfn within the zone, and keep their direction:
>
> zone_start pivot zone_end
> | | |
> -------------------------------------------------------------
> <= |FFFFFFFMMMMMM| =>
> free_pfn migrate_pfn
>
> Eventually, one of the scanners will reach the zone boundary and wrap around,
> e.g. the in the case of the free scanner:
>
> zone_start pivot zone_end
> | | |
> -------------------------------------------------------------
> FFFFFFFFFFFFFFFFFFFFFFFFFFFFFMMMMMMMMMMMM| => <= |F
> migrate_pfn free_pfn
>
> Compaction will again terminate when the scanners meet.
>
>
> As you can imagine, the required code changes made the termination detection
> and the scanners themselves quite hairy. There are lots of corner cases and
> the code is often hard to wrap one's head around [puns intended]. The scanner
> functions isolate_migratepages() and isolate_freepages() were recently cleaned
> up a lot, and this makes them messy again, as they can no longer rely on the
> fact that they will meet the other scanner and not the zone boundary.
>
> But the improvements seem to make these complications worth, and I hope
> somebody can suggest more elegant solutions to the various parts of the code.
> So here it is as a RFC. Patches 1-3 are cleanups that could be applied in any
> case. Patch 4 implements the main changes, but leaves the pivot to be the
> first zone's pfn, so that free scanner wraps immediately and there's no
> actual change. Patch 5 updates the pivot in a conservative way, as explained
> in the changelog.

Hello,

I don't have any elegant idea, but, have some humble opinion.

The point is that migrate scanner should scan whole zone.
Although your pivot approach makes some sense and it can scan whole zone,
it could cause back and forth migration in a very short term whenever
both scanners get toward and passed each other. I think that if we permit
overlap of scanner, we don't need to adhere to reverse linear scanning
in freepage scanner since reverse liner scan doesn't prevent back and
forth migration from now on.

There are two solutions on this problem.
One is that free scanner scans pfn in same direction where migrate scanner
goes with having proper interval.

|=========================|
MMM==> <Interval> FFF==>

Enough interval guarantees to prevent back and forth migration,
at least, in a very short period.

Or, we could make free scanner totally different with linear scan.
Linear scanning to get freepage wastes much time if system memory
is really big and most of it is used. If we takes freepage from the
buddy, we can eliminate this scanning overhead. With additional
logic, that is, comparing position of freepage with migrate scanner
and selectively taking it, we can avoid back and forth migration
in a very short period.

Thanks.


2015-02-03 09:05:26

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] compaction: changing initial position of scanners

On 02/03/2015 07:49 AM, Joonsoo Kim wrote:
> On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
>
> Hello,
>
> I don't have any elegant idea, but, have some humble opinion.
>
> The point is that migrate scanner should scan whole zone.
> Although your pivot approach makes some sense and it can scan whole zone,
> it could cause back and forth migration in a very short term whenever
> both scanners get toward and passed each other.

I don't understand the scenario you suggest? The scanners don't overlap in any
single run, that doesn't change. If they meet, compaction terminates. They can
"overlap" if you compare the current run with previous run, after pivot change.
The it's true that e.g. migration scanner will operate on pageblocks where the
free scanner has operated on previously. But pivot changes are only done after
the full defer cycle, which is not short term.

> I think that if we permit
> overlap of scanner, we don't need to adhere to reverse linear scanning
> in freepage scanner since reverse liner scan doesn't prevent back and
> forth migration from now on.

I believe that we still don't permit overlap, but anyway...

> There are two solutions on this problem.
> One is that free scanner scans pfn in same direction where migrate scanner
> goes with having proper interval.
>
> |=========================|
> MMM==> <Interval> FFF==>
>
> Enough interval guarantees to prevent back and forth migration,
> at least, in a very short period.

That would depend on the termination criteria and what to do after restart.
You would have to terminate as soon as one scanner approaches the position where
the other started. Otherwise you overlap and migrate back in a single run. So
you terminate and that will typically mean one of the scanners did not finish
its part fully, so there are pageblocks scanned by neither one. You could adjust
the interval to find the optimal one. But you shouldn't do it immediately next
run, as that would overlap the previous run too soon. Or maybe adjust it only a
little... I don't know if that's simpler than my approach, it seems more quirky.

> Or, we could make free scanner totally different with linear scan.
> Linear scanning to get freepage wastes much time if system memory
> is really big and most of it is used. If we takes freepage from the
> buddy, we can eliminate this scanning overhead. With additional
> logic, that is, comparing position of freepage with migrate scanner
> and selectively taking it, we can avoid back and forth migration
> in a very short period.

I think the metric we should be looking is the ration between free pages scanned
and migrate pages scanned. It's true that after this series in the
allocate-as-thp scenario, it was more than 10 in the first run after reboot.
So maybe it could be more efficient to search the buddy lists. But then again,
when to terminate in this case? The free lists are changing continuously. And to
compare position, you also need to predict how much the migrate scanner will
progress in the single run, because you don't want to take pages from there.

> Thanks.
>

2015-02-03 15:00:57

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] compaction: changing initial position of scanners

2015-02-03 18:05 GMT+09:00 Vlastimil Babka <[email protected]>:
> On 02/03/2015 07:49 AM, Joonsoo Kim wrote:
>> On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
>>
>> Hello,
>>
>> I don't have any elegant idea, but, have some humble opinion.
>>
>> The point is that migrate scanner should scan whole zone.
>> Although your pivot approach makes some sense and it can scan whole zone,
>> it could cause back and forth migration in a very short term whenever
>> both scanners get toward and passed each other.
>
> I don't understand the scenario you suggest? The scanners don't overlap in any
> single run, that doesn't change. If they meet, compaction terminates. They can
> "overlap" if you compare the current run with previous run, after pivot change.

Yeah, I mean this case.

I think that we should regard single run as whole zone scan rather than just
terminating criteria we have artificially defined and try to avoid
back and forth
problem as much as possible in this scale. Not overlapping in a single run you
mentioned doesn't solve this problem in this scale.

> The it's true that e.g. migration scanner will operate on pageblocks where the
> free scanner has operated on previously. But pivot changes are only done after
> the full defer cycle, which is not short term.

I don't think it's not short term. After successful run, if next high
order request
comes immediately, migrate scanner will immediately restart at the position
where previous free scanner has operated.

>
>> I think that if we permit
>> overlap of scanner, we don't need to adhere to reverse linear scanning
>> in freepage scanner since reverse liner scan doesn't prevent back and
>> forth migration from now on.
>
> I believe that we still don't permit overlap, but anyway...
>
>> There are two solutions on this problem.
>> One is that free scanner scans pfn in same direction where migrate scanner
>> goes with having proper interval.
>>
>> |=========================|
>> MMM==> <Interval> FFF==>
>>
>> Enough interval guarantees to prevent back and forth migration,
>> at least, in a very short period.
>
> That would depend on the termination criteria and what to do after restart.
> You would have to terminate as soon as one scanner approaches the position where
> the other started. Otherwise you overlap and migrate back in a single run. So
> you terminate and that will typically mean one of the scanners did not finish
> its part fully, so there are pageblocks scanned by neither one. You could adjust
> the interval to find the optimal one. But you shouldn't do it immediately next
> run, as that would overlap the previous run too soon. Or maybe adjust it only a
> little... I don't know if that's simpler than my approach, it seems more quirky.

Yeah, the idea comes from quick thought so it's not perfect.
In fact, if we regard single run as whole zone scan, back and forth problem is
inevitable. What we can do best is reducing bad effect of that problem. With
interval, we don't try to migrate page which we immediately use for freepage
in a very short period.

I think that we can break relationship of free scanner and migrate scanner.
It's not necessary that summation of scanned range of both scanner is whole
zone. The point is migrate scanner should scan whole zone. Free scanner
would adjust scanning position based on the position where
migrate scanner is, whenever necessary.

>> Or, we could make free scanner totally different with linear scan.
>> Linear scanning to get freepage wastes much time if system memory
>> is really big and most of it is used. If we takes freepage from the
>> buddy, we can eliminate this scanning overhead. With additional
>> logic, that is, comparing position of freepage with migrate scanner
>> and selectively taking it, we can avoid back and forth migration
>> in a very short period.
>
> I think the metric we should be looking is the ration between free pages scanned
> and migrate pages scanned. It's true that after this series in the
> allocate-as-thp scenario, it was more than 10 in the first run after reboot.
> So maybe it could be more efficient to search the buddy lists. But then again,
> when to terminate in this case? The free lists are changing continuously. And to
> compare position, you also need to predict how much the migrate scanner will
> progress in the single run, because you don't want to take pages from there.

We can terminate when whole zone is scanned. As I mentioned above, we can't
avoid back and forth problem in case of whole zone range and I'd like to do what
we can do our best, avoiding back and forth in a very short term. Maybe, taking
freepage positioned zone_range/2 far from with migrated scanner at that time
would prevent the problem occurrence in a very short term.

Thanks.

2015-02-03 15:51:53

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] compaction: changing initial position of scanners

On 02/03/2015 04:00 PM, Joonsoo Kim wrote:
> 2015-02-03 18:05 GMT+09:00 Vlastimil Babka <[email protected]>:
>> On 02/03/2015 07:49 AM, Joonsoo Kim wrote:
>>> On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
>>>
>>> Hello,
>>>
>>> I don't have any elegant idea, but, have some humble opinion.
>>>
>>> The point is that migrate scanner should scan whole zone.
>>> Although your pivot approach makes some sense and it can scan whole zone,
>>> it could cause back and forth migration in a very short term whenever
>>> both scanners get toward and passed each other.
>>
>> I don't understand the scenario you suggest? The scanners don't overlap in any
>> single run, that doesn't change. If they meet, compaction terminates. They can
>> "overlap" if you compare the current run with previous run, after pivot change.
>
> Yeah, I mean this case.
>
> I think that we should regard single run as whole zone scan rather than just
> terminating criteria we have artificially defined and try to avoid
> back and forth
> problem as much as possible in this scale. Not overlapping in a single run you
> mentioned doesn't solve this problem in this scale.
>
>> The it's true that e.g. migration scanner will operate on pageblocks where the
>> free scanner has operated on previously. But pivot changes are only done after
>> the full defer cycle, which is not short term.
>
> I don't think it's not short term. After successful run, if next high
> order request
> comes immediately, migrate scanner will immediately restart at the position
> where previous free scanner has operated.

Ah, I think I see where the misunderstanding comes from now. So to clarify,
let's consider

1. single compaction run - single invocation of compact_zone(). It can start
from cached pfn's from previous run, or zone boundaries (or pivot, after this
series), and terminate with scanners meeting or not meeting.

2. full zone compaction - consists one or more compaction runs, where the first
run starts at boundaries (pivot). It ends when scanners meet -
compact_finished() returns COMPACT_COMPLETE

3. compaction after full defer cycle - this is full zone compaction, where
compaction_restarting() returns true in its first run

My understanding is that you think pivot changing occurs after each full zone
compaction (definition 2), but in fact it occurs only each defer cycle
(definition 3). See patch 5 for detailed reasoning. I don't think it's short
term. It means full zone compactions (def 2) already failed many times and then
was deferred for further time, using the same unchanged pivot.

I think any of the alternatives you suggested below where migrate scanner
processes whole zone during full zone compaction (2), would necessarily result
in shorter-term back and forth migration than this scheme. On the other hand,
the pivot changing proposed here might be too long-term. But it's a first
attempt, and the frequency can be further tuned.

2015-02-03 17:07:14

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] compaction: changing initial position of scanners

2015-02-04 0:51 GMT+09:00 Vlastimil Babka <[email protected]>:
> On 02/03/2015 04:00 PM, Joonsoo Kim wrote:
>> 2015-02-03 18:05 GMT+09:00 Vlastimil Babka <[email protected]>:
>>> On 02/03/2015 07:49 AM, Joonsoo Kim wrote:
>>>> On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
>>>>
>>>> Hello,
>>>>
>>>> I don't have any elegant idea, but, have some humble opinion.
>>>>
>>>> The point is that migrate scanner should scan whole zone.
>>>> Although your pivot approach makes some sense and it can scan whole zone,
>>>> it could cause back and forth migration in a very short term whenever
>>>> both scanners get toward and passed each other.
>>>
>>> I don't understand the scenario you suggest? The scanners don't overlap in any
>>> single run, that doesn't change. If they meet, compaction terminates. They can
>>> "overlap" if you compare the current run with previous run, after pivot change.
>>
>> Yeah, I mean this case.
>>
>> I think that we should regard single run as whole zone scan rather than just
>> terminating criteria we have artificially defined and try to avoid
>> back and forth
>> problem as much as possible in this scale. Not overlapping in a single run you
>> mentioned doesn't solve this problem in this scale.
>>
>>> The it's true that e.g. migration scanner will operate on pageblocks where the
>>> free scanner has operated on previously. But pivot changes are only done after
>>> the full defer cycle, which is not short term.
>>
>> I don't think it's not short term. After successful run, if next high
>> order request
>> comes immediately, migrate scanner will immediately restart at the position
>> where previous free scanner has operated.
>
> Ah, I think I see where the misunderstanding comes from now. So to clarify,
> let's consider
>
> 1. single compaction run - single invocation of compact_zone(). It can start
> from cached pfn's from previous run, or zone boundaries (or pivot, after this
> series), and terminate with scanners meeting or not meeting.
>
> 2. full zone compaction - consists one or more compaction runs, where the first
> run starts at boundaries (pivot). It ends when scanners meet -
> compact_finished() returns COMPACT_COMPLETE
>
> 3. compaction after full defer cycle - this is full zone compaction, where
> compaction_restarting() returns true in its first run
>
> My understanding is that you think pivot changing occurs after each full zone
> compaction (definition 2), but in fact it occurs only each defer cycle
> (definition 3). See patch 5 for detailed reasoning. I don't think it's short
> term. It means full zone compactions (def 2) already failed many times and then
> was deferred for further time, using the same unchanged pivot.

Ah... thanks for clarifying. I actually think pivot changing occurs at
definition 2
as you guess. :)

> I think any of the alternatives you suggested below where migrate scanner
> processes whole zone during full zone compaction (2), would necessarily result
> in shorter-term back and forth migration than this scheme. On the other hand,
> the pivot changing proposed here might be too long-term. But it's a first
> attempt, and the frequency can be further tuned.

Yes, your proposal would be less problematic on back and forth problem than
my suggestion.

Hmm...nevertheless, I can't completely agree with pivot approach.

I'd like to remove dependency of migrate scanner and free scanner such as
termination criteria at this chance. Meeting position of both scanner is roughly
determined by on amount of free memory in the zone. If 200 MB is free in
the zone, migrate scanner can scan at maximum 200 MB from the start pfn
of the pivot. Without changing pivot quickly, we can scan only
this region regardless zone size so it cause bad effect to high order
allocation for a long time.

In stress-highalloc test, it doesn't matter since we try to attempt a lot of
allocations. This bad effect would not appear easily. Although middle of
allocation attempts are failed, latter attempts would succeed
since pivot would be changed in the middle of attempts.

But, in real world scenario, all allocation attempts are precise and
it'd be better
first come high order allocation request to succeed and this is another problem
than allocation success rate in stress-highalloc test. To accomplish it, we
need to change pivot as soon as possible. Without it, we could miss some
precise allocation attempt until pivot is changed. For this purpose, we should
remove defer logic or change it more loosely and then, resetting pivot would
occur soon so we could encounter back and forth problem frequently.

Therefore, it's better to change compaction logic more fundamentally.

Thanks.

2015-02-04 14:39:35

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] compaction: changing initial position of scanners

On 02/03/2015 06:07 PM, Joonsoo Kim wrote:
> 2015-02-04 0:51 GMT+09:00 Vlastimil Babka <[email protected]>:
>> Ah, I think I see where the misunderstanding comes from now. So to clarify,
>> let's consider
>>
>> 1. single compaction run - single invocation of compact_zone(). It can start
>> from cached pfn's from previous run, or zone boundaries (or pivot, after this
>> series), and terminate with scanners meeting or not meeting.
>>
>> 2. full zone compaction - consists one or more compaction runs, where the first
>> run starts at boundaries (pivot). It ends when scanners meet -
>> compact_finished() returns COMPACT_COMPLETE
>>
>> 3. compaction after full defer cycle - this is full zone compaction, where
>> compaction_restarting() returns true in its first run
>>
>> My understanding is that you think pivot changing occurs after each full zone
>> compaction (definition 2), but in fact it occurs only each defer cycle
>> (definition 3). See patch 5 for detailed reasoning. I don't think it's short
>> term. It means full zone compactions (def 2) already failed many times and then
>> was deferred for further time, using the same unchanged pivot.
>
> Ah... thanks for clarifying. I actually think pivot changing occurs at
> definition 2
> as you guess. :)

Great it's clarified :)

>> I think any of the alternatives you suggested below where migrate scanner
>> processes whole zone during full zone compaction (2), would necessarily result
>> in shorter-term back and forth migration than this scheme. On the other hand,
>> the pivot changing proposed here might be too long-term. But it's a first
>> attempt, and the frequency can be further tuned.
>
> Yes, your proposal would be less problematic on back and forth problem than
> my suggestion.
>
> Hmm...nevertheless, I can't completely agree with pivot approach.
>
> I'd like to remove dependency of migrate scanner and free scanner such as
> termination criteria at this chance. Meeting position of both scanner is roughly

Well at some point compaction should terminate if it scanned the whole
zone, and failed. How else to check that than using the scanner positions?

> determined by on amount of free memory in the zone. If 200 MB is free in
> the zone, migrate scanner can scan at maximum 200 MB from the start pfn
> of the pivot. Without changing pivot quickly, we can scan only
> this region regardless zone size so it cause bad effect to high order
> allocation for a long time.
>
> In stress-highalloc test, it doesn't matter since we try to attempt a lot of
> allocations. This bad effect would not appear easily. Although middle of
> allocation attempts are failed, latter attempts would succeed
> since pivot would be changed in the middle of attempts.

OK, that might be true. It's not a perfect benchmark.

> But, in real world scenario, all allocation attempts are precise and
> it'd be better
> first come high order allocation request to succeed and this is another problem
> than allocation success rate in stress-highalloc test. To accomplish it, we
> need to change pivot as soon as possible. Without it, we could miss some
> precise allocation attempt until pivot is changed. For this purpose, we should
> remove defer logic or change it more loosely and then, resetting pivot would
> occur soon so we could encounter back and forth problem frequently.

It seems to me that you can't have both the "migration scanner should
try scanning whole zone during single compaction (or during relatively
few attempts)" and "we shouldn't migrate pages that we have just
(relatively recently) migrated", in any scheme including the two you
proposed in previous mail. These features just go against each other.

In any scheme you should divide the zone between part that's scanned for
pages to migrate from, and part that scanned for free pages as migration
targets. If you don't divide, then you end up migrating back and forth
instantly, which would be bad.

So what happens after you don't have any free pages in the part that was
for the free scanner (this is what happen in current scheme when
scanners meet). If you wanted to continue with the migration scanner,
the only free pages are in the part which the migration scanner just
processed. And funnily enough, the pivot changing scheme will put the
free scanner just in the position to scan this part. But doing that
immediately could mean excessive migration.

Your alternative scheme where free scanner follows the migration scanner
at some distance is not very different in this sense. If you manage to
tune the distance properly, you will also scan for free pages the part
that was just processed by the migration scanner. It might be more
efficient in that you don't rescan the part that the migration scanner
didn't reach both before and after pivot change. But fundamentally, it
means migrating pages that were recently migrated.

> Therefore, it's better to change compaction logic more fundamentally.

Maybe it's indeed better to excessively migrate than keep rescanning the
same pageblocks and then defer compaction. But we shouldn't forget that
immediate success rate is not the only criteria. We should also keep the
overhead sane. That's why there's pageblock suitability bitfield,
deferred compaction etc, which I'm not sure how those would fit into the
"continuously progressing migration scanner" scheme.

So what I think should precede such increase in compact aggressivity:
- on direct compaction, only try migrate when successfully isolated all
pages needed for merging the desired order page. I've had such patch
already in one series last year, but it affected the anti-fragmentation
effects of compaction.
- no more THP page faults (also for other good reasons), leave
collapsing to khugepaged, or rather task_work, leaving only the
expensive sync compaction to dedicated per-node daemons. These should
hopefully solve the anti-fragmentation issue as well.

> Thanks.
>

2015-02-10 08:52:16

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] compaction: changing initial position of scanners

On Wed, Feb 04, 2015 at 03:39:25PM +0100, Vlastimil Babka wrote:
> On 02/03/2015 06:07 PM, Joonsoo Kim wrote:
> >2015-02-04 0:51 GMT+09:00 Vlastimil Babka <[email protected]>:
> >>Ah, I think I see where the misunderstanding comes from now. So to clarify,
> >>let's consider
> >>
> >>1. single compaction run - single invocation of compact_zone(). It can start
> >>from cached pfn's from previous run, or zone boundaries (or pivot, after this
> >>series), and terminate with scanners meeting or not meeting.
> >>
> >>2. full zone compaction - consists one or more compaction runs, where the first
> >>run starts at boundaries (pivot). It ends when scanners meet -
> >>compact_finished() returns COMPACT_COMPLETE
> >>
> >>3. compaction after full defer cycle - this is full zone compaction, where
> >>compaction_restarting() returns true in its first run
> >>
> >>My understanding is that you think pivot changing occurs after each full zone
> >>compaction (definition 2), but in fact it occurs only each defer cycle
> >>(definition 3). See patch 5 for detailed reasoning. I don't think it's short
> >>term. It means full zone compactions (def 2) already failed many times and then
> >>was deferred for further time, using the same unchanged pivot.
> >
> >Ah... thanks for clarifying. I actually think pivot changing occurs at
> >definition 2
> >as you guess. :)
>
> Great it's clarified :)
>
> >>I think any of the alternatives you suggested below where migrate scanner
> >>processes whole zone during full zone compaction (2), would necessarily result
> >>in shorter-term back and forth migration than this scheme. On the other hand,
> >>the pivot changing proposed here might be too long-term. But it's a first
> >>attempt, and the frequency can be further tuned.
> >
> >Yes, your proposal would be less problematic on back and forth problem than
> >my suggestion.
> >
> >Hmm...nevertheless, I can't completely agree with pivot approach.
> >
> >I'd like to remove dependency of migrate scanner and free scanner such as
> >termination criteria at this chance. Meeting position of both scanner is roughly
>
> Well at some point compaction should terminate if it scanned the
> whole zone, and failed. How else to check that than using the
> scanner positions?

We can count number of pageblock we are scanning and if it matches
with zone span we can terminate.

> >determined by on amount of free memory in the zone. If 200 MB is free in
> >the zone, migrate scanner can scan at maximum 200 MB from the start pfn
> >of the pivot. Without changing pivot quickly, we can scan only
> >this region regardless zone size so it cause bad effect to high order
> >allocation for a long time.
> >
> >In stress-highalloc test, it doesn't matter since we try to attempt a lot of
> >allocations. This bad effect would not appear easily. Although middle of
> >allocation attempts are failed, latter attempts would succeed
> >since pivot would be changed in the middle of attempts.
>
> OK, that might be true. It's not a perfect benchmark.
>
> >But, in real world scenario, all allocation attempts are precise and
> >it'd be better
> >first come high order allocation request to succeed and this is another problem
> >than allocation success rate in stress-highalloc test. To accomplish it, we
> >need to change pivot as soon as possible. Without it, we could miss some
> >precise allocation attempt until pivot is changed. For this purpose, we should
> >remove defer logic or change it more loosely and then, resetting pivot would
> >occur soon so we could encounter back and forth problem frequently.
>
> It seems to me that you can't have both the "migration scanner
> should try scanning whole zone during single compaction (or during
> relatively few attempts)" and "we shouldn't migrate pages that we
> have just (relatively recently) migrated", in any scheme including
> the two you proposed in previous mail. These features just go
> against each other.
>
> In any scheme you should divide the zone between part that's scanned
> for pages to migrate from, and part that scanned for free pages as
> migration targets. If you don't divide, then you end up migrating
> back and forth instantly, which would be bad.

Okay. My proposal isn't perfect but it is just quick thought. :)
I hope that it is a seed for better idea, not the solution.
It may be true that we need to divide region for each scanner.

>
> So what happens after you don't have any free pages in the part that
> was for the free scanner (this is what happen in current scheme when
> scanners meet). If you wanted to continue with the migration
> scanner, the only free pages are in the part which the migration
> scanner just processed. And funnily enough, the pivot changing
> scheme will put the free scanner just in the position to scan this
> part. But doing that immediately could mean excessive migration.
>
> Your alternative scheme where free scanner follows the migration
> scanner at some distance is not very different in this sense. If you
> manage to tune the distance properly, you will also scan for free
> pages the part that was just processed by the migration scanner. It
> might be more efficient in that you don't rescan the part that the
> migration scanner didn't reach both before and after pivot change.
> But fundamentally, it means migrating pages that were recently
> migrated.

What I'm worrying about in pivot approach is that if pivot is changed
frequently due to burst of failed high order allocation or we decide
to make compaction aggressive, immediate migration back and forth
could happen. Isn't it the problem we need to consider here?

One of main difference of my approach is the direction of free scanner
and if we scan in same direction with migration scanner, we can
guarantees that certain migrated pages are migrated again as late as
possible. But, with reverse direction, it's not possible to keep this
order. See following example.

PIVOT(REVERSE DIRECTION)
|---1---2---3-----|------------------|
=> after compaction
|-----------------|---3---2---1------|
=> after pivot changed, #3 page which is migrated recently is migrated
first.

SAME DIRECTION
|---1---2---3-----|------------------|
=> after compaction
|-----------------|---1---2---3------|
=> we can keep the order of migratable pages.

I know that if we want to scan whole zone range, we can't avoid back
and forth migration. Best thing we can do is something like this that
preserve order of migration and my proposal aims at this purpose.

Thanks.

> >Therefore, it's better to change compaction logic more fundamentally.
>
> Maybe it's indeed better to excessively migrate than keep rescanning
> the same pageblocks and then defer compaction. But we shouldn't
> forget that immediate success rate is not the only criteria. We
> should also keep the overhead sane. That's why there's pageblock
> suitability bitfield, deferred compaction etc, which I'm not sure
> how those would fit into the "continuously progressing migration
> scanner" scheme.
> So what I think should precede such increase in compact aggressivity:
> - on direct compaction, only try migrate when successfully isolated
> all pages needed for merging the desired order page. I've had such
> patch already in one series last year, but it affected the
> anti-fragmentation effects of compaction.
> - no more THP page faults (also for other good reasons), leave
> collapsing to khugepaged, or rather task_work, leaving only the
> expensive sync compaction to dedicated per-node daemons. These
> should hopefully solve the anti-fragmentation issue as well.
>
> >Thanks.
> >
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>