On Wed, 26 Oct 2022 20:24:38 +0900 NARIBAYASHI Akira <[email protected]> wrote:
> Depending on the memory configuration, isolate_freepages_block() may
> scan pages out of the target range and causes panic.
>
> The problem is that pfn as argument of fast_isolate_around() could
> be out of the target range. Therefore we should consider the case
> where pfn < start_pfn, and also the case where end_pfn < pfn.
>
> This problem should have been addressd by the commit 6e2b7044c199
> ("mm, compaction: make fast_isolate_freepages() stay within zone")
> but there was an oversight.
>
> Case1: pfn < start_pfn
>
> <at memory compaction for node Y>
> | node X's zone | node Y's zone
> +-----------------+------------------------------...
> pageblock ^ ^ ^
> +-----------+-----------+-----------+-----------+...
> ^ ^ ^
> ^ ^ end_pfn
> ^ start_pfn = cc->zone->zone_start_pfn
> pfn
> <---------> scanned range by "Scan After"
>
> Case2: end_pfn < pfn
>
> <at memory compaction for node X>
> | node X's zone | node Y's zone
> +-----------------+------------------------------...
> pageblock ^ ^ ^
> +-----------+-----------+-----------+-----------+...
> ^ ^ ^
> ^ ^ pfn
> ^ end_pfn
> start_pfn
> <---------> scanned range by "Scan Before"
>
> It seems that there is no good reason to skip nr_isolated pages
> just after given pfn. So let perform simple scan from start to end
> instead of dividing the scan into "Before" and "After".
Under what circumstances will this panic occur? I assume those
circumstnces are pretty rare, give that 6e2b7044c1992 was nearly two
years ago.
Did you consider the desirability of backporting this fix into earlier
kernels?
On Thu, Oct 27, 2022 at 01:25:57PM -0700, Andrew Morton wrote:
> On Wed, 26 Oct 2022 20:24:38 +0900 NARIBAYASHI Akira <[email protected]> wrote:
>
> > Depending on the memory configuration, isolate_freepages_block() may
> > scan pages out of the target range and causes panic.
> >
> > The problem is that pfn as argument of fast_isolate_around() could
> > be out of the target range. Therefore we should consider the case
> > where pfn < start_pfn, and also the case where end_pfn < pfn.
> >
> > This problem should have been addressd by the commit 6e2b7044c199
> > ("mm, compaction: make fast_isolate_freepages() stay within zone")
> > but there was an oversight.
> >
> > Case1: pfn < start_pfn
> >
> > <at memory compaction for node Y>
> > | node X's zone | node Y's zone
> > +-----------------+------------------------------...
> > pageblock ^ ^ ^
> > +-----------+-----------+-----------+-----------+...
> > ^ ^ ^
> > ^ ^ end_pfn
> > ^ start_pfn = cc->zone->zone_start_pfn
> > pfn
> > <---------> scanned range by "Scan After"
> >
> > Case2: end_pfn < pfn
> >
> > <at memory compaction for node X>
> > | node X's zone | node Y's zone
> > +-----------------+------------------------------...
> > pageblock ^ ^ ^
> > +-----------+-----------+-----------+-----------+...
> > ^ ^ ^
> > ^ ^ pfn
> > ^ end_pfn
> > start_pfn
> > <---------> scanned range by "Scan Before"
> >
> > It seems that there is no good reason to skip nr_isolated pages
> > just after given pfn. So let perform simple scan from start to end
> > instead of dividing the scan into "Before" and "After".
>
> Under what circumstances will this panic occur?
I'd also like to see a warning or oops report combined with the
/proc/zoneinfo file of the machine affected. This is to confirm it's an
actual bug and not a suspicion based on code inspection and a simplification
of the code. The answer determines whether this is a -stable candidate
or not.
Both Case 1 and 2 require that the initial pfn started outside the zone
which is unexpected. The clamping on zone boundary in fast_isolate_aropund()
is happening due to pageblock alignment as there is no guarantee that zones
are aligned on a hugepage boundary. pfn itself should have been fine as
it is the PFN of a page that was recently isolated.
The Scan After logic should also be ok. In the context it's called,
nr_isolated is the number of pages that were just isolated so
pfn + nr_isolated is the end of the free page that was just isolated.
The patch itself should be functionally fine but it rescans a region that has
already been isolated which is a little wasteful but it is straight-forward
and the overhead is probably negligible.
--
Mel Gorman
SUSE Labs