2021-06-02 15:22:20

by Oscar Salvador

Subject: [PATCH v2 0/3] Memory hotplug locking cleanup

Hi all,

I decided to go one step further and completely rip out zone's span_seqlock
and all related functions, since we should be fine using {get,put}_online_mems()
on the reader side, given that memory-hotplug is the only user fiddling with
those values.
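
For reference, the serialization this relies on boils down to the
following pattern (a simplified sketch of mm/memory_hotplug.c):

        /* writer side: every hotplug operation runs inside this */
        void mem_hotplug_begin(void)
        {
                cpus_read_lock();
                percpu_down_write(&mem_hotplug_lock);
        }

        /* reader side: what this series switches the readers to */
        void get_online_mems(void)
        {
                percpu_down_read(&mem_hotplug_lock);
        }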

Patch#1 and patch#2 could probably be squashed, but I decided to keep them
separate so the intention becomes clearer.
Patch#3 only removes declarations that never seem to be used.

Given that this is a much bigger surgery, I decided to drop any Acked-by/
Reviewed-by.

Oscar Salvador (3):
mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values
mm,memory_hotplug: Drop unneeded locking
mm,memory_hotplug: Remove unneeded declarations

include/linux/memory_hotplug.h | 38 --------------------------------------
include/linux/mmzone.h | 23 +++++------------------
mm/memory_hotplug.c | 16 +---------------
mm/page_alloc.c | 15 ++++++---------
4 files changed, 12 insertions(+), 80 deletions(-)

--
2.16.3


2021-06-02 15:23:43

by Oscar Salvador

Subject: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

Currently, page_outside_zone_boundaries() takes zone's span_seqlock
when reading zone_start_pfn and spanned_pages so those values are
stable vs. memory hotplug operations.
move_pfn_range_to_zone() and remove_pfn_range_from_zone(), which are
the functions that can change those zone values, are serialized by
mem_hotplug_lock via mem_hotplug_{begin,done}(), so we can just use
{get,put}_online_mems() on the reader side.

This will allow us to completely kill the span_seqlock, as no users
will remain after this series.

Signed-off-by: Oscar Salvador <[email protected]>
---
mm/page_alloc.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..296cb00802b4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -582,17 +582,15 @@ void set_pageblock_migratetype(struct page *page, int migratetype)
static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
{
int ret = 0;
- unsigned seq;
unsigned long pfn = page_to_pfn(page);
unsigned long sp, start_pfn;

- do {
- seq = zone_span_seqbegin(zone);
- start_pfn = zone->zone_start_pfn;
- sp = zone->spanned_pages;
- if (!zone_spans_pfn(zone, pfn))
- ret = 1;
- } while (zone_span_seqretry(zone, seq));
+ get_online_mems();
+ start_pfn = zone->zone_start_pfn;
+ sp = zone->spanned_pages;
+ if (!zone_spans_pfn(zone, pfn))
+ ret = 1;
+ put_online_mems();

if (ret)
pr_err("page 0x%lx outside node %d zone %s [ 0x%lx - 0x%lx ]\n",
--
2.16.3

2021-06-02 15:23:47

by Oscar Salvador

Subject: [PATCH v2 2/3] mm,memory_hotplug: Drop unneeded locking

Currently, memory-hotplug code takes zone's span_writelock
and pgdat's resize_lock when resizing the node/zone's spanned
pages via {move_pfn_range_to_zone(),remove_pfn_range_from_zone()}
and when resizing node and zone's present pages via
adjust_present_page_count().

These locks are also taken during the initialization of the system
at boot time, where they protect parallel struct page initialization,
but they should not really be needed in memory-hotplug, where all
operations are a) synchronized on the device level and b) serialized by
mem_hotplug_lock.

Given that there are no users of span_seqlock, rip out all related
functions.

Signed-off-by: Oscar Salvador <[email protected]>
---
include/linux/memory_hotplug.h | 35 -----------------------------------
include/linux/mmzone.h | 23 +++++------------------
mm/memory_hotplug.c | 16 +---------------
mm/page_alloc.c | 1 -
4 files changed, 6 insertions(+), 69 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 28f32fd00fe9..0d837ce6ec11 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -79,31 +79,7 @@ struct range mhp_get_pluggable_range(bool need_mapping);

/*
* Zone resizing functions
- *
- * Note: any attempt to resize a zone should has pgdat_resize_lock()
- * zone_span_writelock() both held. This ensure the size of a zone
- * can't be changed while pgdat_resize_lock() held.
*/
-static inline unsigned zone_span_seqbegin(struct zone *zone)
-{
- return read_seqbegin(&zone->span_seqlock);
-}
-static inline int zone_span_seqretry(struct zone *zone, unsigned iv)
-{
- return read_seqretry(&zone->span_seqlock, iv);
-}
-static inline void zone_span_writelock(struct zone *zone)
-{
- write_seqlock(&zone->span_seqlock);
-}
-static inline void zone_span_writeunlock(struct zone *zone)
-{
- write_sequnlock(&zone->span_seqlock);
-}
-static inline void zone_seqlock_init(struct zone *zone)
-{
- seqlock_init(&zone->span_seqlock);
-}
extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages);
extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
@@ -248,17 +224,6 @@ void mem_hotplug_done(void);
___page; \
})

-static inline unsigned zone_span_seqbegin(struct zone *zone)
-{
- return 0;
-}
-static inline int zone_span_seqretry(struct zone *zone, unsigned iv)
-{
- return 0;
-}
-static inline void zone_span_writelock(struct zone *zone) {}
-static inline void zone_span_writeunlock(struct zone *zone) {}
-static inline void zone_seqlock_init(struct zone *zone) {}

static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
{
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0d53eba1c383..29cd230a383c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -13,7 +13,6 @@
#include <linux/threads.h>
#include <linux/numa.h>
#include <linux/init.h>
-#include <linux/seqlock.h>
#include <linux/nodemask.h>
#include <linux/pageblock-flags.h>
#include <linux/page-flags-layout.h>
@@ -528,18 +527,11 @@ struct zone {
*
* Locking rules:
*
- * zone_start_pfn and spanned_pages are protected by span_seqlock.
- * It is a seqlock because it has to be read outside of zone->lock,
- * and it is done in the main allocator path. But, it is written
- * quite infrequently.
- *
- * The span_seq lock is declared along with zone->lock because it is
- * frequently read in proximity to zone->lock. It's good to
- * give them a chance of being in the same cacheline.
- *
- * Write access to present_pages at runtime should be protected by
- * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
- * present_pages should get_online_mems() to get a stable value.
+ * Besides system initialization functions, memory-hotplug is the only
+ * user that can change zone's {spanned,present} pages at runtime, and
+ * it does so by holding the mem_hotplug_lock lock. Any readers who
+ * can't tolerate drift values should use {get,put}_online_mems to get
+ * a stable value.
*/
atomic_long_t managed_pages;
unsigned long spanned_pages;
@@ -559,11 +551,6 @@ struct zone {
unsigned long nr_isolate_pageblock;
#endif

-#ifdef CONFIG_MEMORY_HOTPLUG
- /* see spanned/present_pages for more description */
- seqlock_t span_seqlock;
-#endif
-
int initialized;

/* Write-intensive fields used from the page allocator */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 70620d0dd923..62d5dc2c01de 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -445,7 +445,6 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
unsigned long pfn;
int nid = zone_to_nid(zone);

- zone_span_writelock(zone);
if (zone->zone_start_pfn == start_pfn) {
/*
* If the section is smallest section in the zone, it need
@@ -478,7 +477,6 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
zone->spanned_pages = 0;
}
}
- zone_span_writeunlock(zone);
}

static void update_pgdat_span(struct pglist_data *pgdat)
@@ -515,7 +513,7 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
{
const unsigned long end_pfn = start_pfn + nr_pages;
struct pglist_data *pgdat = zone->zone_pgdat;
- unsigned long pfn, cur_nr_pages, flags;
+ unsigned long pfn, cur_nr_pages;

/* Poison struct pages because they are now uninitialized again. */
for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
@@ -540,10 +538,8 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,

clear_zone_contiguous(zone);

- pgdat_resize_lock(zone->zone_pgdat, &flags);
shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
update_pgdat_span(pgdat);
- pgdat_resize_unlock(zone->zone_pgdat, &flags);

set_zone_contiguous(zone);
}
@@ -750,19 +746,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
{
struct pglist_data *pgdat = zone->zone_pgdat;
int nid = pgdat->node_id;
- unsigned long flags;

clear_zone_contiguous(zone);

- /* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
- pgdat_resize_lock(pgdat, &flags);
- zone_span_writelock(zone);
if (zone_is_empty(zone))
init_currently_empty_zone(zone, start_pfn, nr_pages);
resize_zone_range(zone, start_pfn, nr_pages);
- zone_span_writeunlock(zone);
resize_pgdat_range(pgdat, start_pfn, nr_pages);
- pgdat_resize_unlock(pgdat, &flags);

/*
* Subsection population requires care in pfn_to_online_page().
@@ -852,12 +842,8 @@ struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
*/
void adjust_present_page_count(struct zone *zone, long nr_pages)
{
- unsigned long flags;
-
zone->present_pages += nr_pages;
- pgdat_resize_lock(zone->zone_pgdat, &flags);
zone->zone_pgdat->node_present_pages += nr_pages;
- pgdat_resize_unlock(zone->zone_pgdat, &flags);
}

int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 296cb00802b4..27483245384c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7156,7 +7156,6 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
zone->name = zone_names[idx];
zone->zone_pgdat = NODE_DATA(nid);
spin_lock_init(&zone->lock);
- zone_seqlock_init(zone);
zone_pcp_init(zone);
}

--
2.16.3

2021-06-02 18:39:45

by David Hildenbrand

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On 02.06.21 11:14, Oscar Salvador wrote:
> Currently, page_outside_zone_boundaries() takes zone's span_seqlock
> when reading zone_start_pfn and spanned_pages so those values are
> stable vs. memory hotplug operations.
> move_pfn_range_to_zone() and remove_pfn_range_from_zone(), which are
> the functions that can change those zone values, are serialized by
> mem_hotplug_lock via mem_hotplug_{begin,done}(), so we can just use
> {get,put}_online_mems() on the reader side.
>
> This will allow us to completely kill the span_seqlock, as no users
> will remain after this series.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> ---
> mm/page_alloc.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index aaa1655cf682..296cb00802b4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -582,17 +582,15 @@ void set_pageblock_migratetype(struct page *page, int migratetype)
> static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
> {
> int ret = 0;
> - unsigned seq;
> unsigned long pfn = page_to_pfn(page);
> unsigned long sp, start_pfn;
>
> - do {
> - seq = zone_span_seqbegin(zone);
> - start_pfn = zone->zone_start_pfn;
> - sp = zone->spanned_pages;
> - if (!zone_spans_pfn(zone, pfn))
> - ret = 1;
> - } while (zone_span_seqretry(zone, seq));
> + get_online_mems();
> + start_pfn = zone->zone_start_pfn;
> + sp = zone->spanned_pages;
> + if (!zone_spans_pfn(zone, pfn))
> + ret = 1;
> + put_online_mems();
>
> if (ret)
> pr_err("page 0x%lx outside node %d zone %s [ 0x%lx - 0x%lx ]\n",
>

It's worth noting that memory offlining might hold the memory hotplug
lock for quite some time. It's not a lightweight lock, compared to the
seqlock we have here.

I can see that page_outside_zone_boundaries() is only called from
bad_range(), and bad_range() is only called under VM_BUG_ON_PAGE(). Still,
are you sure that it's even valid to block e.g., __free_one_page() and
others for what could be an eternity? And I think that we might just call
it from atomic context, where we cannot even sleep.
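
Roughly, the problematic nesting would be something like this
(an illustrative sketch, not an exact call chain):

        /* e.g. __free_one_page() runs with zone->lock held, a spinlock */
        spin_lock_irqsave(&zone->lock, flags);
        VM_BUG_ON_PAGE(bad_range(zone, page), page);
                /* -> page_outside_zone_boundaries()
                 *    -> get_online_mems()
                 *       -> percpu_down_read(&mem_hotplug_lock), may sleep
                 */
        spin_unlock_irqrestore(&zone->lock, flags);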

Long story short, using get_online_mems() looks wrong.

Maybe the current lightweight reader/writer protection does serve a purpose?

--
Thanks,

David / dhildenb

2021-06-02 19:49:31

by Oscar Salvador

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On 2021-06-02 20:37, David Hildenbrand wrote:
> Long story short, using get_online_mems() looks wrong.
>
> Maybe the current lightweight reader/writer protection does serve a
> purpose?

It was too nice and easy to be true I guess.
Yeah, I missed the point that blocking might be an issue here, and
hotplug operations can take really long, so not an option.
I must have switched my brain off back there, because now it is just too
obvious.

Then I guess that the seqlock must stay and the only thing that can go is
the pgdat resize lock from the hotplug code.

Meh.

--
Oscar Salvador
SUSE L3

2021-06-03 02:20:17

by kernel test robot

Subject: [mm,page_alloc] acb5758bf4: BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h



Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: acb5758bf41a116c65f7b3fdfd3d79b54ebfad9f ("[PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values")
url: https://github.com/0day-ci/linux/commits/Oscar-Salvador/Memory-hotplug-locking-cleanup/20210602-232017


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+-----------------------------------------------------------------------------------+-----------+------------+
| | v5.13-rc4 | acb5758bf4 |
+-----------------------------------------------------------------------------------+-----------+------------+
| boot_successes | 167 | 0 |
| boot_failures | 0 | 6 |
| BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h | 0 | 6 |
| WARNING:at_kernel/sched/core.c:#__might_sleep | 0 | 1 |
| RIP:__might_sleep | 0 | 1 |
| RIP:smpboot_thread_fn | 0 | 1 |
+-----------------------------------------------------------------------------------+-----------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 0.951423] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[ 0.953745] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2, name: kthreadd
[ 0.953745] CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.13.0-rc4-00001-gacb5758bf41a #1
[ 0.953745] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 0.953745] Call Trace:
[ 0.953745] dump_stack (kbuild/src/consumer/lib/dump_stack.c:122)
[ 0.953745] ___might_sleep (kbuild/src/consumer/kernel/sched/core.c:8339 kbuild/src/consumer/kernel/sched/core.c:8296)
[ 0.953745] get_online_mems (kbuild/src/consumer/include/linux/percpu-rwsem.h:49 kbuild/src/consumer/mm/memory_hotplug.c:69)
[ 0.953745] bad_range (kbuild/src/consumer/mm/page_alloc.c:589 kbuild/src/consumer/mm/page_alloc.c:617)
[ 0.953745] expand (kbuild/src/consumer/mm/page_alloc.c:2231)
[ 0.953745] rmqueue_bulk+0x161/0x359
[ 0.953745] __rmqueue_pcplist (kbuild/src/consumer/mm/page_alloc.c:3471 (discriminator 3))
[ 0.953745] rmqueue (kbuild/src/consumer/mm/page_alloc.c:3499 kbuild/src/consumer/mm/page_alloc.c:3527)
[ 0.953745] ? __alloc_pages (kbuild/src/consumer/mm/page_alloc.c:5198)
[ 0.953745] ? __cond_resched (kbuild/src/consumer/kernel/sched/core.c:6994)
[ 0.953745] ? post_alloc_hook (kbuild/src/consumer/arch/x86/include/asm/atomic.h:29 kbuild/src/consumer/include/asm-generic/atomic-instrumented.h:28 kbuild/src/consumer/include/linux/jump_label.h:254 kbuild/src/consumer/include/linux/mm.h:2959 kbuild/src/consumer/mm/page_alloc.c:2345)
[ 0.953745] get_page_from_freelist (kbuild/src/consumer/mm/page_alloc.c:3989)
[ 0.953745] __alloc_pages (kbuild/src/consumer/mm/page_alloc.c:5198)
[ 0.953745] __vmalloc_area_node+0x133/0x1fd
[ 0.953745] __vmalloc_node_range (kbuild/src/consumer/mm/vmalloc.c:2916)
[ 0.953745] copy_process (kbuild/src/consumer/kernel/fork.c:245 kbuild/src/consumer/kernel/fork.c:869 kbuild/src/consumer/kernel/fork.c:1947)
[ 0.953745] ? kernel_clone (kbuild/src/consumer/kernel/fork.c:2503)
[ 0.953745] kernel_clone (kbuild/src/consumer/kernel/fork.c:2503)
[ 0.953745] kernel_thread (kbuild/src/consumer/kernel/fork.c:2556)
[ 0.953745] ? kthread_flush_worker (kbuild/src/consumer/kernel/kthread.c:266)
[ 0.953745] kthreadd (kbuild/src/consumer/kernel/kthread.c:337 kbuild/src/consumer/kernel/kthread.c:679)
[ 0.953745] ? kthread_is_per_cpu (kbuild/src/consumer/kernel/kthread.c:652)
[ 0.953745] ret_from_fork (kbuild/src/consumer/arch/x86/entry/entry_64.S:300)


To reproduce:

# build kernel
cd linux
cp config-5.13.0-rc4-00001-gacb5758bf41a .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (12.82 kB)
config-5.13.0-rc4-00001-gacb5758bf41a (121.88 kB)
job-script (4.43 kB)
dmesg.xz (16.06 kB)

2021-06-03 08:41:58

by Oscar Salvador

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On Wed, Jun 02, 2021 at 09:45:58PM +0200, Oscar Salvador wrote:
> It was too nice and easy to be true I guess.
> Yeah, I missed the point that blocking might be an issue here, and hotplug
> operations can take really long, so not an option.
> I must have switched my brain off back there, because now it is just too
> obvious.
>
> Then I guess that the seqlock must stay and the only thing that can go is the
> pgdat resize lock from the hotplug code.

So, I have been looking into this again.
Of course, the approach taken here is outrageously wrong, but there are
some other things that are a bit confusing.

As pointed out, bad_range() (the function that ends up calling
page_outside_zone_boundaries) is called from different functions via VM_BUG_ON_*.
page_outside_zone_boundaries() takes care of taking the seqlock to avoid
reading stale values if we race with memory-hotplug
operations.
page_outside_zone_boundaries() calls zone_spans_pfn() to check that.

Now for the funny part.

We do have several places happily calling zone_spans_pfn() without
holding zone's seqlock, e.g.:

set_pageblock_migratetype
set_pfnblock_flags_mask
zone_spans_pfn

move_freepages_block
zone_spans_pfn

alloc_contig_pages
zone_spans_last_pfn
zone_spans_pfn

Those places hold zone->lock, while move_pfn_range_to_zone() and
remove_pfn_range_from_zone() hold zone->seqlock, so AFAICS, those places
could read a stale value and proceed thinking the range is within the
zone while it is not.

So I guess my question is: should we force those places to take the
seqlock read side as we do in page_outside_zone_boundaries() (or maybe
just move the seqlock handling into zone_spans_pfn())?

Because it does not make much sense to take it in a VM_DEBUG context and
not in "real life".

Thoughts?

--
Oscar Salvador
SUSE L3

2021-06-03 12:47:36

by Michal Hocko

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On Thu 03-06-21 10:38:24, Oscar Salvador wrote:
> On Wed, Jun 02, 2021 at 09:45:58PM +0200, Oscar Salvador wrote:
> > It was too nice and easy to be true I guess.
> > Yeah, I missed the point that blocking might be an issue here, and hotplug
> > operations can take really long, so not an option.
> > I must have switched my brain off back there, because now it is just too
> > obvious.
> >
> > Then I guess that the seqlock must stay and the only thing that can go is the
> > pgdat resize lock from the hotplug code.
>
> So, I have been looking into this again.
> Of course, the approach taken here is outrageously wrong, but there are
> some other things that are a bit confusing.
>
> As pointed out, bad_range() (the function that ends up calling
> page_outside_zone_boundaries) is called from different functions via VM_BUG_ON_*.
> page_outside_zone_boundaries() takes care of taking the seqlock to avoid
> reading stale values if we race with memory-hotplug
> operations.
> page_outside_zone_boundaries() calls zone_spans_pfn() to check that.
>
> Now for the funny part.
>
> We do have several places happily calling zone_spans_pfn() without
> holding zone's seqlock, e.g.:
>
> set_pageblock_migratetype
> set_pfnblock_flags_mask
> zone_spans_pfn
>
> move_freepages_block
> zone_spans_pfn
>
> alloc_contig_pages
> zone_spans_last_pfn
> zone_spans_pfn
>
> Those places hold zone->lock, while move_pfn_range_to_zone() and
> remove_pfn_range_from_zone() hold zone->seqlock, so AFAICS, those places
> could read a stale value and proceed thinking the range is within the
> zone while it is not.
>
> So I guess my question is: should we force those places to take the
> seqlock read side as we do in page_outside_zone_boundaries() (or maybe
> just move the seqlock handling into zone_spans_pfn())?

I believe we need to define the purpose of the locking first. The
existing locking doesn't serve much purpose, does it? The state might
change right after the lock is released and the caller cannot really
rely on the result. So aside from the current implementation, I would
argue that any locking has to be done on the caller layer.

But the primary question is whether anybody actually cares about
potential races in the first place.
--
Michal Hocko
SUSE Labs

2021-06-03 12:54:32

by Michal Hocko

Subject: Re: [PATCH v2 2/3] mm,memory_hotplug: Drop unneeded locking

On Wed 02-06-21 11:14:56, Oscar Salvador wrote:
> Currently, memory-hotplug code takes zone's span_writelock
> and pgdat's resize_lock when resizing the node/zone's spanned
> pages via {move_pfn_range_to_zone(),remove_pfn_range_from_zone()}
> and when resizing node and zone's present pages via
> adjust_present_page_count().
>
> These locks are also taken during the initialization of the system
> at boot time, where they protect parallel struct page initialization,
> but they should not really be needed in memory-hotplug, where all
> operations are a) synchronized on the device level and b) serialized by
> mem_hotplug_lock.
>
> Given that there are no users of span_seqlock, rip out all related
> functions.
>
> Signed-off-by: Oscar Salvador <[email protected]>

Yes, I like this much! The sequence lock is just dubious unless I am
missing something and the resize lock doesn't seem to be serving
any purpose anymore. I haven't checked whether it used to at some point
in time.

Acked-by: Michal Hocko <[email protected]>

> ---
> include/linux/memory_hotplug.h | 35 -----------------------------------
> include/linux/mmzone.h | 23 +++++------------------
> mm/memory_hotplug.c | 16 +---------------
> mm/page_alloc.c | 1 -
> 4 files changed, 6 insertions(+), 69 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 28f32fd00fe9..0d837ce6ec11 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -79,31 +79,7 @@ struct range mhp_get_pluggable_range(bool need_mapping);
>
> /*
> * Zone resizing functions
> - *
> - * Note: any attempt to resize a zone should has pgdat_resize_lock()
> - * zone_span_writelock() both held. This ensure the size of a zone
> - * can't be changed while pgdat_resize_lock() held.
> */
> -static inline unsigned zone_span_seqbegin(struct zone *zone)
> -{
> - return read_seqbegin(&zone->span_seqlock);
> -}
> -static inline int zone_span_seqretry(struct zone *zone, unsigned iv)
> -{
> - return read_seqretry(&zone->span_seqlock, iv);
> -}
> -static inline void zone_span_writelock(struct zone *zone)
> -{
> - write_seqlock(&zone->span_seqlock);
> -}
> -static inline void zone_span_writeunlock(struct zone *zone)
> -{
> - write_sequnlock(&zone->span_seqlock);
> -}
> -static inline void zone_seqlock_init(struct zone *zone)
> -{
> - seqlock_init(&zone->span_seqlock);
> -}
> extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages);
> extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
> extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
> @@ -248,17 +224,6 @@ void mem_hotplug_done(void);
> ___page; \
> })
>
> -static inline unsigned zone_span_seqbegin(struct zone *zone)
> -{
> - return 0;
> -}
> -static inline int zone_span_seqretry(struct zone *zone, unsigned iv)
> -{
> - return 0;
> -}
> -static inline void zone_span_writelock(struct zone *zone) {}
> -static inline void zone_span_writeunlock(struct zone *zone) {}
> -static inline void zone_seqlock_init(struct zone *zone) {}
>
> static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
> {
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 0d53eba1c383..29cd230a383c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -13,7 +13,6 @@
> #include <linux/threads.h>
> #include <linux/numa.h>
> #include <linux/init.h>
> -#include <linux/seqlock.h>
> #include <linux/nodemask.h>
> #include <linux/pageblock-flags.h>
> #include <linux/page-flags-layout.h>
> @@ -528,18 +527,11 @@ struct zone {
> *
> * Locking rules:
> *
> - * zone_start_pfn and spanned_pages are protected by span_seqlock.
> - * It is a seqlock because it has to be read outside of zone->lock,
> - * and it is done in the main allocator path. But, it is written
> - * quite infrequently.
> - *
> - * The span_seq lock is declared along with zone->lock because it is
> - * frequently read in proximity to zone->lock. It's good to
> - * give them a chance of being in the same cacheline.
> - *
> - * Write access to present_pages at runtime should be protected by
> - * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
> - * present_pages should get_online_mems() to get a stable value.
> + * Besides system initialization functions, memory-hotplug is the only
> + * user that can change zone's {spanned,present} pages at runtime, and
> + * it does so by holding the mem_hotplug_lock lock. Any readers who
> + * can't tolerate drift values should use {get,put}_online_mems to get
> + * a stable value.
> */
> atomic_long_t managed_pages;
> unsigned long spanned_pages;
> @@ -559,11 +551,6 @@ struct zone {
> unsigned long nr_isolate_pageblock;
> #endif
>
> -#ifdef CONFIG_MEMORY_HOTPLUG
> - /* see spanned/present_pages for more description */
> - seqlock_t span_seqlock;
> -#endif
> -
> int initialized;
>
> /* Write-intensive fields used from the page allocator */
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 70620d0dd923..62d5dc2c01de 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -445,7 +445,6 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> unsigned long pfn;
> int nid = zone_to_nid(zone);
>
> - zone_span_writelock(zone);
> if (zone->zone_start_pfn == start_pfn) {
> /*
> * If the section is smallest section in the zone, it need
> @@ -478,7 +477,6 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> zone->spanned_pages = 0;
> }
> }
> - zone_span_writeunlock(zone);
> }
>
> static void update_pgdat_span(struct pglist_data *pgdat)
> @@ -515,7 +513,7 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
> {
> const unsigned long end_pfn = start_pfn + nr_pages;
> struct pglist_data *pgdat = zone->zone_pgdat;
> - unsigned long pfn, cur_nr_pages, flags;
> + unsigned long pfn, cur_nr_pages;
>
> /* Poison struct pages because they are now uninitialized again. */
> for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
> @@ -540,10 +538,8 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
>
> clear_zone_contiguous(zone);
>
> - pgdat_resize_lock(zone->zone_pgdat, &flags);
> shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
> update_pgdat_span(pgdat);
> - pgdat_resize_unlock(zone->zone_pgdat, &flags);
>
> set_zone_contiguous(zone);
> }
> @@ -750,19 +746,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> {
> struct pglist_data *pgdat = zone->zone_pgdat;
> int nid = pgdat->node_id;
> - unsigned long flags;
>
> clear_zone_contiguous(zone);
>
> - /* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
> - pgdat_resize_lock(pgdat, &flags);
> - zone_span_writelock(zone);
> if (zone_is_empty(zone))
> init_currently_empty_zone(zone, start_pfn, nr_pages);
> resize_zone_range(zone, start_pfn, nr_pages);
> - zone_span_writeunlock(zone);
> resize_pgdat_range(pgdat, start_pfn, nr_pages);
> - pgdat_resize_unlock(pgdat, &flags);
>
> /*
> * Subsection population requires care in pfn_to_online_page().
> @@ -852,12 +842,8 @@ struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
> */
> void adjust_present_page_count(struct zone *zone, long nr_pages)
> {
> - unsigned long flags;
> -
> zone->present_pages += nr_pages;
> - pgdat_resize_lock(zone->zone_pgdat, &flags);
> zone->zone_pgdat->node_present_pages += nr_pages;
> - pgdat_resize_unlock(zone->zone_pgdat, &flags);
> }
>
> int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 296cb00802b4..27483245384c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7156,7 +7156,6 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
> zone->name = zone_names[idx];
> zone->zone_pgdat = NODE_DATA(nid);
> spin_lock_init(&zone->lock);
> - zone_seqlock_init(zone);
> zone_pcp_init(zone);
> }
>
> --
> 2.16.3

--
Michal Hocko
SUSE Labs

2021-06-04 07:43:46

by Oscar Salvador

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On Thu, Jun 03, 2021 at 02:45:13PM +0200, Michal Hocko wrote:
> I believe we need to define the purpose of the locking first. The

If you ask me, this locking would be meant to make sure zone's zone_start_pfn
or spanned_pages do not change under us, in case we __need__ the value to be
stable.

> existing locking doesn't serve much purpose, does it? The state might

Well, half-way. Currently, the locking is taken in write mode whenever
the zone is expanded or shrunk, and in read mode when called from
bad_range()->page_outside_zone_boundaries() (only on VM_DEBUG).

But as you pointed out, such state might change right after the locking is
released and all the work would be for nothing.
So indeed, the __whole__ operation should be enclosed by the lock in the
caller. The way it stands right now is not optimal.

> change right after the lock is released and the caller cannot really
> rely on the result. So aside from the current implementation, I would
> argue that any locking has to be done on the caller layer.
>
> But the primary question is whether anybody actually cares about
> potential races in the first place.

I have been checking move_freepages_block() and alloc_contig_pages(), which
are two of the functions that call zone_spans_pfn().

move_freepages_block() uses it in a way to align the given pfn to pageblock
top and bottom, and then check that aligned pfns are still within the same zone.
From a memory-hotplug perspective that's ok as we know that we are offlining
PAGES_PER_SECTION (which implies whole pageblocks).
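
For reference, the check in move_freepages_block() is roughly (paraphrased):

        start_pfn = pfn & ~(pageblock_nr_pages - 1);
        end_pfn = start_pfn + pageblock_nr_pages - 1;

        /* Do not cross zone boundaries */
        if (!zone_spans_pfn(zone, start_pfn))
                start_pfn = pfn;
        if (!zone_spans_pfn(zone, end_pfn))
                return 0;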

alloc_contig_pages() (used by the hugetlb gigantic allocator) runs through a
node's zonelist and checks whether zone->zone_start_pfn + nr_pages stays within
the same zone.
IMHO, the race with zone_spans_last_pfn() vs mem-hotplug would not be that bad,
as it will be caught afterwards by e.g. __alloc_contig_pages() when pages
cannot be isolated because they are offline, etc.
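
zone_spans_last_pfn() being just a thin wrapper, roughly:

        static bool zone_spans_last_pfn(const struct zone *zone,
                                        unsigned long start_pfn,
                                        unsigned long nr_pages)
        {
                unsigned long last_pfn = start_pfn + nr_pages - 1;

                return zone_spans_pfn(zone, last_pfn);
        }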

So, I would say we do not really need the lock, but I might be missing something.
But if we choose to care about this, then the locking should be done right, not
half-way as it is right now.


--
Oscar Salvador
SUSE L3

2021-06-07 07:53:52

by Oscar Salvador

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On Fri, Jun 04, 2021 at 09:41:45AM +0200, Oscar Salvador wrote:
> On Thu, Jun 03, 2021 at 02:45:13PM +0200, Michal Hocko wrote:
> > I believe we need to define the purpose of the locking first. The
>
> If you ask me, this locking would be meant to make sure zone's zone_start_pfn
> or spanned_pages do not change under us, in case we __need__ the value to be
> stable.
>
> > existing locking doesn't serve much purpose, does it? The state might
>
> Well, half-way. Currently, the locking is taken in write mode whenever
> the zone is expanded or shrunk, and in read mode when called from
> bad_range()->page_outside_zone_boundaries() (only on VM_DEBUG).
>
> But as you pointed out, such state might change right after the locking is
> released and all the work would be for nothing.
> So indeed, the __whole__ operation should be enclosed by the lock in the
> caller. The way it stands right now is not optimal.
>
> > change right after the lock is released and the caller cannot really
> > rely on the result. So aside from the current implementation, I would
> > argue that any locking has to be done on the caller layer.
> >
> > But the primary question is whether anybody actually cares about
> > potential races in the first place.
>
> I have been checking move_freepages_block() and alloc_contig_pages(), which
> are two of the functions that call zone_spans_pfn().
>
> move_freepages_block() uses it in a way to align the given pfn to pageblock
> top and bottom, and then check that aligned pfns are still within the same zone.
> From a memory-hotplug perspective that's ok as we know that we are offlining
> PAGES_PER_SECTION (which implies whole pageblocks).
>
> alloc_contig_pages() (used by the hugetlb gigantic allocator) runs through a
> node's zonelist and checks whether zone->zone_start_pfn + nr_pages stays within
> the same zone.
> IMHO, the race with zone_spans_last_pfn() vs mem-hotplug would not be that bad,
> as it will be caught afterwards by e.g. __alloc_contig_pages() when pages
> cannot be isolated because they are offline, etc.
>
> So, I would say we do not really need the lock, but I might be missing something.
> But if we choose to care about this, then the locking should be done right, not
> half-way as it is right now.


Any thoughts on this? :-)



--
Oscar Salvador
SUSE L3

2021-06-07 08:45:46

by Michal Hocko

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On Fri 04-06-21 09:41:45, Oscar Salvador wrote:
> On Thu, Jun 03, 2021 at 02:45:13PM +0200, Michal Hocko wrote:
[...]
> > But the primary question is whether anybody actually cares about
> > potential races in the first place.
>
> I have been checking move_freepages_block() and alloc_contig_pages(), which
> are two of the functions that call zone_spans_pfn().
>
> move_freepages_block() uses it in a way to align the given pfn to pageblock
> top and bottom, and then check that aligned pfns are still within the same zone.
> From a memory-hotplug perspective that's ok as we know that we are offlining
> PAGES_PER_SECTION (which implies whole pageblocks).
>
> alloc_contig_pages() (used by the hugetlb gigantic allocator) runs through a
> node's zonelist and checks whether zone->zone_start_pfn + nr_pages stays within
> the same zone.
> IMHO, the race with zone_spans_last_pfn() vs mem-hotplug would not be that bad,
> as it will be caught afterwards by e.g. __alloc_contig_pages() when pages
> cannot be isolated because they are offline, etc.
>
> So, I would say we do not really need the lock, but I might be missing something.
> But if we choose to care about this, then the locking should be done right, not
> half-way as it is right now.

That is my understanding as well.
--
Michal Hocko
SUSE Labs

2021-06-07 08:51:00

by David Hildenbrand

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On 07.06.21 09:52, Oscar Salvador wrote:
> On Fri, Jun 04, 2021 at 09:41:45AM +0200, Oscar Salvador wrote:
>> On Thu, Jun 03, 2021 at 02:45:13PM +0200, Michal Hocko wrote:
>>> I believe we need to define the purpose of the locking first. The
>>
>> If you ask me, this locking would be meant to make sure zone's zone_start_pfn
>> or spanned_pages do not change under us, in case we __need__ the value to be
>> stable.
>>
>>> existing locking doesn't serve much purpose, does it? The state might
>>
>> Well, half-way. Currently, the locking is taken in write mode whenever
>> the zone is expanded or shrunk, and in read mode when called from
>> bad_range()->page_outside_zone_boundaries() (only on VM_DEBUG).
>>
>> But as you pointed out, such state might change right after the locking is
>> released and all the work would be for nothing.
>> So indeed, the __whole__ operation should be enclosed by the lock in the
>> caller. The way it stands right now is not optimal.
>>
>>> change right after the lock is released and the caller cannot really
>>> rely on the result. So aside from the current implementation, I would
>>> argue that any locking has to be done on the caller layer.
>>>
>>> But the primary question is whether anybody actually cares about
>>> potential races in the first place.
>>
>> I have been checking move_freepages_block() and alloc_contig_pages(), which
>> are two of the functions that call zone_spans_pfn().
>>
>> move_freepages_block() uses it in a way to align the given pfn to pageblock
>> top and bottom, and then check that aligned pfns are still within the same zone.
>> From a memory-hotplug perspective that's ok as we know that we are offlining
>> PAGES_PER_SECTION (which implies whole pageblocks).
>>
>> alloc_contig_pages() (used by the hugetlb gigantic allocator) runs through a
>> node's zonelist and checks whether zone->zone_start_pfn + nr_pages stays within
>> the same zone.
>> IMHO, the race with zone_spans_last_pfn() vs mem-hotplug would not be that bad,
>> as it will be caught afterwards by e.g. __alloc_contig_pages() when pages
>> cannot be isolated because they are offline, etc.
>>
>> So, I would say we do not really need the lock, but I might be missing something.
>> But if we choose to care about this, then the locking should be done right, not
>> half-way as it is right now.
>
>
> Any thoughts on this? :-)

I'd like to point out that I think the seqlock is not in place to
synchronize with actual growing/shrinking but to get consistent zone
ranges -- like using atomics, but we have two inter-dependent values here.

If you obtain the zone ranges that way and properly use
pfn_to_online_page(), there is hardly anything that can go wrong in
practice. If the zone grew in the meantime, most probably you can just
live with not processing that part for now. If the zone shrunk in the
meantime, pfn_to_online_page() will make you skip that part (it was
offlined either way, so you most probably don't really care about that
part).

[pfn_to_online_page() is racy as well, but the race window is very small
and we never saw a problem in practice really]
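
The reader pattern referred to above, sketched (illustrative only):

        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
                struct page *page = pfn_to_online_page(pfn);

                if (!page)
                        continue;       /* offline (or never onlined): skip */
                /* ... process page ... */
        }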

Without the seqlock, you might just get a garbage zone range and have
false positives/negatives when just testing for a simple range not
in a hot(un)plugged range [which is the usual case when talking about
compaction etc.].

--
Thanks,

David / dhildenb

2021-06-07 10:25:14

by Oscar Salvador

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On Mon, Jun 07, 2021 at 10:49:01AM +0200, David Hildenbrand wrote:
> I'd like to point out that I think the seqlock is not in place to
> synchronize with actual growing/shrinking but to get consistent zone ranges
> -- like using atomics, but we have two inter-dependent values here.

I guess so, at least that's what it should do.
But the way it is placed right now is misleading.

If we really want to get consistent zone ranges, we should start using
zone's seqlock where it matters and that is pretty much all those
places that use zone_spans_pfn().
Otherwise there is no way you can be sure the pfn you're checking is
within the limits. Moreover, as Michal pointed out earlier, if we really
want to go down that road the locking should be done in the caller
enclosing the whole operation, otherwise things might change once the lock
is dropped and you're working with a wrong assumption.

I can see arguments for both ripping it out and doing it right (but none for
the way it is right now).
For ripping it out, one could say that those races might not be fatal,
as usually the pfn you're working with (the one you want to check falls
within a certain range) is known to be valid, so the worst that can happen is
you get false positives/negatives, and those might or might not be detected
further down. How bad false positives/negatives are depends on the
situation, I guess, but we already live with that right now.
The zone_spans_pfn() from page_outside_zone_boundaries() is the only one using
locking right now, so, well, if we survived this long without locks in the other
places using zone_spans_pfn(), it makes one wonder if it is that bad.

On the other hand, one could argue that for correctness sake, we should be holding
zone's seqlock whenever checking for zone_spans_pfn() to avoid any inconsistency.


--
Oscar Salvador
SUSE L3

2021-06-08 23:42:02

by Oscar Salvador

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On Mon, Jun 07, 2021 at 12:23:25PM +0200, Oscar Salvador wrote:
> I can see arguments for both ripping it out and doing it right (but none for
> the way it is right now).
> For ripping it out, one could say that those races might not be fatal,
> as usually the pfn you're working with (the one you want to check falls
> within a certain range) is known to be valid, so the worst that can happen is
> you get false positives/negatives, and those might or might not be detected
> further down. How bad false positives/negatives are depends on the
> situation, I guess, but we already live with that right now.
> The zone_spans_pfn() from page_outside_zone_boundaries() is the only one using
> locking right now, so, well, if we survived this long without locks in the other
> places using zone_spans_pfn(), it makes one wonder if it is that bad.

Given that

a) all current users of bad_range() are coming from VM_BUG_ON* callers
b) we only care when removing memory as the page would not lie in the
zone anymore. But for that to happen the whole offline_pages() operation
needs to succeed. bad_range() is called from rmqueue(), __free_one_page()
and expand(). If offline_pages() succeeds for the range our page lies on,
we would not be doing those operations on that page anyway?

So, I cannot find any strong reason to keep the seqlock (maybe in the future
we will need to re-add it because of some use case).

Any objection to removing it?


--
Oscar Salvador
SUSE L3

2021-06-09 02:16:06

by David Hildenbrand

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On 07.06.21 12:23, Oscar Salvador wrote:
> On Mon, Jun 07, 2021 at 10:49:01AM +0200, David Hildenbrand wrote:
>> I'd like to point out that I think the seqlock is not in place to
>> synchronize with actual growing/shrinking but to get consistent zone ranges
>> -- like using atomics, but we have two inter-dependent values here.
>
> I guess so, at least that's what it should do.
> But the way it is placed right now is misleading.
>
> If we really want to get consistent zone ranges, we should start using
> zone's seqlock where it matters and that is pretty much all those
> places that use zone_spans_pfn().

Right, or even only zone_end_pfn() to get a consistent value.

> Otherwise there is no way you can be sure the pfn you're checking is
> within the limits. Moreover, as Michal pointed out earlier, if we really
> want to go down that road the locking should be done in the caller
> enclosing the whole operation, otherwise things might change once the lock
> is dropped and you're working with a wrong assumption.
>
> I can see arguments for both ripping it out and doing it right (but none for
> the way it is right now).
> For ripping it out, one could say that those races might not be fatal,
> as usually the pfn you're working with (the one you want to check falls
> within a certain range) is known to be valid, so the worst that can happen is
> you get false positives/negatives, and those might or might not be detected
> further down. How bad false positives/negatives are depends on the
> situation, I guess, but we already live with that right now.
> The zone_spans_pfn() from page_outside_zone_boundaries() is the only one using
> locking right now, so, well, if we survived this long without locks in the other
> places using zone_spans_pfn(), it makes one wonder if it is that bad.
>
> On the other hand, one could argue that for correctness sake, we should be holding
> zone's seqlock whenever checking for zone_spans_pfn() to avoid any inconsistency.
>
>

IMHO, as we know the race exists and we have a tool to handle it in
place, we should maybe fix the obvious cases if possible.

Code that uses zone->zone_start_pfn directly is unlikely to be broken on
most architectures. We will usually read/write via a single instruction
and won't get inconsistencies, for example, when shrinking or growing
the zone. We most probably don't want to use an atomic for that right now.

Code that uses zone->spanned_pages to detect the zone end, however, is
more likely to be broken. I don't think we have any relevant ones around
anymore. Everything was converted to zone_end_pfn().

I feel like we should just make zone_end_pfn() take the seqlock in read.
Then, we at least get a consistent value, for example, while growing a zone.

Just imagine the following case when we grow a section to the front when
onlining memory:

zone->zone_start_pfn -= new_pages;
zone->spanned_pages += new_pages;

Note that compilers/CPUs might reshuffle as they like. If someone (e.g.,
zone_spans_pfn()) races with that code, it might get new
zone->zone_start_pfn but old zone->spanned_pages. zone_end_pfn() will
report a "too small zone" and trigger false negatives in zone_spans_pfn().
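
A sketch of what I have in mind (untested; today zone_end_pfn() does the
addition with no locking at all):

        static inline unsigned long zone_end_pfn(const struct zone *zone)
        {
                unsigned seq;
                unsigned long end_pfn;

                do {
                        seq = zone_span_seqbegin(zone);
                        end_pfn = zone->zone_start_pfn + zone->spanned_pages;
                } while (zone_span_seqretry(zone, seq));

                return end_pfn;
        }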

--
Thanks,

David / dhildenb

2021-06-09 17:14:53

by David Hildenbrand

Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

On 08.06.21 17:00, David Hildenbrand wrote:
> On 07.06.21 12:23, Oscar Salvador wrote:
>> On Mon, Jun 07, 2021 at 10:49:01AM +0200, David Hildenbrand wrote:
>>> I'd like to point out that I think the seqlock is not in place to
>>> synchronize with actual growing/shrinking but to get consistent zone ranges
>>> -- like using atomics, but we have two inter-dependent values here.
>>
>> I guess so, at least that's what it should do.
>> But the way it is placed right now is misleading.
>>
>> If we really want to get consistent zone ranges, we should start using
>> zone's seqlock where it matters and that is pretty much all those
>> places that use zone_spans_pfn().
>
> Right, or even only zone_end_pfn() to get a consistent value.
>
>> Otherwise there is no way you can be sure the pfn you're checking is
>> within the limits. Moreover, as Michal pointed out earlier, if we really
>> want to go down that road the locking should be done in the caller
>> enclosing the whole operation, otherwise things might change once the lock
>> is dropped and you're working with a wrong assumption.
>>
>> I can see arguments for both ripping it out and doing it right (but none for
>> the way it is right now).
>> For ripping it out, one could say that those races might not be fatal,
>> as usually the pfn you're working with (the one you want to check falls
>> within a certain range) is known to be valid, so the worst that can happen is
>> you get false positives/negatives, and those might or might not be detected
>> further down. How bad false positives/negatives are depends on the
>> situation, I guess, but we already live with that right now.
>> The zone_spans_pfn() from page_outside_zone_boundaries() is the only one using
>> locking right now, so, well, if we survived this long without locks in the other
>> places using zone_spans_pfn(), it makes one wonder if it is that bad.
>>
>> On the other hand, one could argue that for correctness sake, we should be holding
>> zone's seqlock whenever checking for zone_spans_pfn() to avoid any inconsistency.
>>
>>
>
> IMHO, as we know the race exists and we have a tool to handle it in
> place, we should maybe fix the obvious cases if possible.
>
> Code that uses zone->zone_start_pfn directly is unlikely to be broken on
> most architectures. We will usually read/write via a single instruction
> and won't get inconsistencies, for example, when shrinking or growing
> the zone. We most probably don't want to use an atomic for that right now.
>
> Code that uses zone->spanned_pages to detect the zone end, however, is
> more likely to be broken. I don't think we have any relevant ones around
> anymore. Everything was converted to zone_end_pfn().
>
> I feel like we should just make zone_end_pfn() take the seqlock in read.
> Then, we at least get a consistent value, for example, while growing a zone.
>
> Just imagine the following case when we grow a section to the front when
> onlining memory:
>
> zone->zone_start_pfn -= new_pages;
> zone->spanned_pages += new_pages;
>
> Note that compilers/CPUs might reshuffle as they like. If someone (e.g.,
> zone_spans_pfn()) races with that code, it might get new
> zone->zone_start_pfn but old zone->spanned_pages. zone_end_pfn() will
> report a "too small zone" and trigger false negatives in zone_spans_pfn().
>

Thinking again, we could of course also simply convert to storing
zone->zone_start_pfn + zone->zone_end_pfn. Places that need
spanned_pages() would have the same issue, but I think they are rather a
corner case.
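
I.e. something like this hypothetical sketch (field and helper names made
up for illustration):

        struct zone {
                ...
                unsigned long   zone_start_pfn;
                unsigned long   zone_end_pfn;   /* replaces spanned_pages */
                ...
        };

        /* the span becomes derived state */
        static inline unsigned long zone_spanned_pages(const struct zone *zone)
        {
                return zone->zone_end_pfn - zone->zone_start_pfn;
        }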

--
Thanks,

David / dhildenb