2022-04-07 15:59:59

by Mel Gorman

Subject: Re: [PATCH] mm, page_alloc: fix build_zonerefs_node()

On Thu, Apr 07, 2022 at 01:17:19PM +0200, Juergen Gross wrote:
> On 07.04.22 13:07, Michal Hocko wrote:
> > On Thu 07-04-22 12:45:41, Juergen Gross wrote:
> > > On 07.04.22 12:34, Michal Hocko wrote:
> > > > Ccing Mel
> > > >
> > > > On Thu 07-04-22 11:32:21, Juergen Gross wrote:
> > > > > Since commit 9d3be21bf9c0 ("mm, page_alloc: simplify zonelist
> > > > > initialization") only zones with free memory are included in a built
> > > > > zonelist. This is problematic when e.g. all memory of a zone has been
> > > > > ballooned out.
> > > >
> > > > What is the actual problem there?
> > >
> > > When running as a Xen guest, newly hotplugged memory will not be onlined
> > > automatically, but only on special request. This is done in order to
> > > support adding e.g. the possibility to use another GB of memory, while
> > > adding only a part of that memory initially.
> > >
> > > In case adding that memory is populating a new zone, the page allocator
> > > won't be able to use this memory when it is onlined, as the zone wasn't
> > > added to the zonelist, due to managed_zone() returning 0.
> >
> > How is that memory onlined? Because "regular" onlining (online_pages())
> > does rebuild zonelists if their zone hasn't been populated before.
>
> The Xen balloon driver has its own callback for onlining pages. The pages
> are just added to the ballooned-out page list without handing them to the
> allocator. This is done only when the guest is ballooned up.
>

Is this new behaviour? I ask because keeping !managed_zones out of the
zonelist and reclaim paths is the behaviour that makes sense. Elsewhere you
state "zone can always happen to have no free memory left" and this is true
but it's usually a transient event. The difference between a populated
vs managed zone is usually a permanent event where no memory will ever be
placed on the buddy lists because the memory was reserved early in boot
or a similar reason. The patch is probably harmless but it has the
potential to waste CPU time allocating or reclaiming from zones that will
never succeed.
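
To make the populated vs managed distinction concrete, here is a minimal
userspace sketch, not kernel code. It assumes that populated_zone() tests
present_pages and managed_zone() tests managed_pages, as the kernel helpers
do. The toy_* names are hypothetical and exist only for illustration. The
sketch models a zonelist build that walks zones from highest to lowest and
keeps those accepted by a filter.

```c
#include <stdbool.h>

#define TOY_MAX_ZONES 4

/* Toy stand-in for struct zone: only the two counters under discussion. */
struct toy_zone {
	const char *name;
	unsigned long present_pages;	/* pages physically present in the zone */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

/* Models populated_zone(): true if the zone has any present pages. */
static bool toy_populated_zone(const struct toy_zone *z)
{
	return z->present_pages != 0;
}

/* Models managed_zone(): true if the buddy allocator manages any pages. */
static bool toy_managed_zone(const struct toy_zone *z)
{
	return z->managed_pages != 0;
}

typedef bool (*toy_zone_filter)(const struct toy_zone *);

/*
 * Walk the zones from highest to lowest index and record every zone the
 * filter accepts. Returns the number of entries written to refs[].
 */
static int toy_build_zonerefs(const struct toy_zone *zones, int nr,
			      toy_zone_filter filter,
			      const struct toy_zone **refs)
{
	int nr_refs = 0;

	for (int i = nr - 1; i >= 0; i--) {
		if (filter(&zones[i]))
			refs[nr_refs++] = &zones[i];
	}
	return nr_refs;
}
```

With a zone whose present_pages is nonzero but whose managed_pages is 0
(for example, one that is fully ballooned out), the managed filter skips
the zone while the populated filter keeps it.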

--
Mel Gorman
SUSE Labs


2022-04-07 21:05:50

by Jürgen Groß

Subject: Re: [PATCH] mm, page_alloc: fix build_zonerefs_node()

On 07.04.22 14:32, Mel Gorman wrote:
> On Thu, Apr 07, 2022 at 01:17:19PM +0200, Juergen Gross wrote:
>> On 07.04.22 13:07, Michal Hocko wrote:
>>> On Thu 07-04-22 12:45:41, Juergen Gross wrote:
>>>> On 07.04.22 12:34, Michal Hocko wrote:
>>>>> Ccing Mel
>>>>>
>>>>> On Thu 07-04-22 11:32:21, Juergen Gross wrote:
>>>>>> Since commit 9d3be21bf9c0 ("mm, page_alloc: simplify zonelist
>>>>>> initialization") only zones with free memory are included in a built
>>>>>> zonelist. This is problematic when e.g. all memory of a zone has been
>>>>>> ballooned out.
>>>>>
>>>>> What is the actual problem there?
>>>>
>>>> When running as a Xen guest, newly hotplugged memory will not be onlined
>>>> automatically, but only on special request. This is done in order to
>>>> support adding e.g. the possibility to use another GB of memory, while
>>>> adding only a part of that memory initially.
>>>>
>>>> In case adding that memory is populating a new zone, the page allocator
>>>> won't be able to use this memory when it is onlined, as the zone wasn't
>>>> added to the zonelist, due to managed_zone() returning 0.
>>>
>>> How is that memory onlined? Because "regular" onlining (online_pages())
>>> does rebuild zonelists if their zone hasn't been populated before.
>>
>> The Xen balloon driver has its own callback for onlining pages. The pages
>> are just added to the ballooned-out page list without handing them to the
>> allocator. This is done only when the guest is ballooned up.
>>
>
> Is this new behaviour? I ask because keeping !managed_zones out of the

For some time (since kernel 5.9) Xen has been using the zone device
functionality with memremap_pages() and pgmap->type = MEMORY_DEVICE_GENERIC.

> zonelist and reclaim paths is the behaviour that makes sense. Elsewhere you
> state "zone can always happen to have no free memory left" and this is true
> but it's usually a transient event. The difference between a populated

And if this "transient event" happens just when the zonelists are being
rebuilt, the zone will stay off the lists, possibly forever.

> vs managed zone is usually a permanent event where no memory will ever be
> placed on the buddy lists because the memory was reserved early in boot
> or a similar reason. The patch is probably harmless but it has the
> potential to waste CPU time allocating or reclaiming from zones that will
> never succeed.

I'd recommend having an explicit per-zone flag for this case if you
really care about that. This would be much cleaner than inferring, from
no free page being present at a specific point in time, that the zone
will never be subject to memory allocation.
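
As a rough illustration of that suggestion, a userspace sketch follows.
The never_allocatable field is hypothetical: no such flag exists in
struct zone, and the struct below is a toy model rather than the kernel's.
The idea is that zonelist membership would depend on an explicit,
permanent property instead of on the managed page count at rebuild time.

```c
#include <stdbool.h>

/* Toy model only; never_allocatable is a hypothetical field. */
struct flagged_zone {
	unsigned long present_pages;	/* pages physically present */
	unsigned long managed_pages;	/* pages currently with the allocator */
	bool never_allocatable;		/* hypothetical: set once, e.g. when all
					 * memory was reserved early in boot */
};

/*
 * Decide zonelist membership from the explicit flag rather than from the
 * momentary managed page count: a populated zone stays on the list unless
 * it has been marked as never usable for allocation.
 */
static bool flagged_zone_in_zonelist(const struct flagged_zone *z)
{
	return z->present_pages != 0 && !z->never_allocatable;
}
```

Under this model a ballooned-out zone (managed_pages == 0, flag clear)
stays on the zonelist, while a zone marked never_allocatable is skipped
regardless of its page counts.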


Juergen

