2023-07-28 22:36:57

by Peter Xu

Subject: Re: [PATCH v1 0/4] smaps / mm/gup: fix gup_can_follow_protnone fallout

On Fri, Jul 28, 2023 at 11:02:46PM +0200, David Hildenbrand wrote:
> Can we get a simple revert in first (without that FOLL_FORCE special casing
> and ideally with a better name) to handle stable backports, and I'll
> follow-up with more documentation and letting GUP callers pass in that flag
> instead?
>
> That would help a lot. Then we also have more time to let that "move it to
> GUP callers" mature a bit in -next, to see if we find any surprises?

As I said in the other thread, I still worry that NUMA users can be
affected by this change. After all, NUMA isn't that uncommon; at least
Fedora / RHEL set CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y, and I strongly
suspect that's also true for other major distros. Meanwhile, kernel
modules all use GUP.

I'd say we can go ahead and try if we want, but I really don't see how
moving it to the callers helps in any way, given the risk of breaking
someone.

Logically, it should also always be better to migrate earlier rather than
later, not only because the page will be local sooner, but also, as I
discussed in the other thread, because GUP can hold a reference to the
page, which could prevent NUMA balancing from succeeding later.

Thanks,

--
Peter Xu



2023-07-28 23:32:34

by David Hildenbrand

Subject: Re: [PATCH v1 0/4] smaps / mm/gup: fix gup_can_follow_protnone fallout

On 28.07.23 23:20, Peter Xu wrote:
> On Fri, Jul 28, 2023 at 11:02:46PM +0200, David Hildenbrand wrote:
>> Can we get a simple revert in first (without that FOLL_FORCE special casing
>> and ideally with a better name) to handle stable backports, and I'll
>> follow-up with more documentation and letting GUP callers pass in that flag
>> instead?
>>
>> That would help a lot. Then we also have more time to let that "move it to
>> GUP callers" mature a bit in -next, to see if we find any surprises?
>
> As I said in the other thread, I still worry that NUMA users can be
> affected by this change. After all, NUMA isn't that uncommon; at least
> Fedora / RHEL set CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y, and I strongly
> suspect that's also true for other major distros. Meanwhile, kernel
> modules all use GUP.
>
> I'd say we can go ahead and try if we want, but I really don't see how
> moving it to the callers helps in any way, given the risk of breaking
> someone.
>

Indeed, that's why I suggest being a bit careful, especially with stable.

> Logically, it should also always be better to migrate earlier rather than
> later, not only because the page will be local sooner, but also, as I
> discussed in the other thread, because GUP can hold a reference to the
> page, which could prevent NUMA balancing from succeeding later.

I get your point, but I also see the following cases (QEMU/KVM as example):

* User space triggers O_DIRECT. It will be short-lived. But is it really
  an access from that CPU (NUMA node) to that page? At least for KVM,
  you would much rather let KVM trigger the NUMA fault on an actual
  memory access from a guest VCPU, not from a QEMU iothread when pinning
  the page.

* vfio triggers FOLL_PIN|FOLL_LONGTERM from a random QEMU thread.
  Where should we migrate that page to? Would it actually be
  counterproductive to migrate it to the NUMA node of the setup thread?
  The longterm pin will turn the page unmovable, yes, but where to
  migrate it to?

--
Cheers,

David / dhildenb