2024-03-12 08:22:41

by David Hildenbrand

Subject: Re: [PATCH] arm64/mm: adds soft dirty page tracking

On 12.03.24 02:16, Shivansh Vij wrote:

Hi,

> Checkpoint-Restore in Userspace (CRIU) needs to be able
> to track a memory page's changes if we want to enable
> pre-dumping, which is important for live migrations.
>
> The PTE_DIRTY bit (defined in pgtable-prot.h) is already
> used to track software dirty pages, and the PTE_WRITE and
> PTE_READ bits are used to track hardware dirty pages.
>
> This patch enables full soft dirty page tracking
> (including swap PTE support) for arm64 systems, and is
> based very closely on the x86 implementation.
>
> It is based on an unfinished patch by
> Bin Lu ([email protected]) from 2017
> (https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/),
> but has been updated for newer 6.x kernels as well as
> tested on various 5.x kernels.

There has also been more recently:

https://lore.kernel.org/lkml/[email protected]/#r

I recall that we are short on SW PTE bits:

"
So if you need software dirty, it can only be done with another software
PTE bit. The problem is that we are short of such bits (only one left if
we move PTE_PROT_NONE to a different location). The userfaultfd people
also want such bit.

Personally I'd reuse the four PBHA bits but I keep hearing that they may
be used with some out of tree patches.
"

https://lore.kernel.org/lkml/[email protected]/

--
Cheers,

David / dhildenb



2024-03-12 10:28:08

by Joey Gouly

Subject: Re: [PATCH] arm64/mm: adds soft dirty page tracking

On Tue, Mar 12, 2024 at 09:22:25AM +0100, David Hildenbrand wrote:
> On 12.03.24 02:16, Shivansh Vij wrote:
>
> Hi,
>
> > Checkpoint-Restore in Userspace (CRIU) needs to be able
> > to track a memory page's changes if we want to enable
> > pre-dumping, which is important for live migrations.
> >
> > The PTE_DIRTY bit (defined in pgtable-prot.h) is already
> > used to track software dirty pages, and the PTE_WRITE and
> > PTE_READ bits are used to track hardware dirty pages.
> >
> > This patch enables full soft dirty page tracking
> > (including swap PTE support) for arm64 systems, and is
> > based very closely on the x86 implementation.
> >
> > It is based on an unfinished patch by
> > Bin Lu ([email protected]) from 2017
> > (https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/),
> > but has been updated for newer 6.x kernels as well as
> > tested on various 5.x kernels.
>
> There has also been more recently:
>
> https://lore.kernel.org/lkml/[email protected]/#r
>
> I recall that we are short on SW PTE bits:
>
> "
> So if you need software dirty, it can only be done with another software
> PTE bit. The problem is that we are short of such bits (only one left if
> we move PTE_PROT_NONE to a different location). The userfaultfd people
> also want such bit.
>
> Personally I'd reuse the four PBHA bits but I keep hearing that they may
> be used with some out of tree patches.
> "
>
> https://lore.kernel.org/lkml/[email protected]/
>

I have some patches on the list (Permission Overlay) that also use bit 60:

series: https://lore.kernel.org/linux-arm-kernel/[email protected]/
commit: https://lore.kernel.org/linux-arm-kernel/[email protected]/

I will be sending out a v4 of that in several weeks.

Thanks,
Joey

2024-03-12 22:33:38

by Shivansh Vij

Subject: Re: [PATCH] arm64/mm: adds soft dirty page tracking

Hi David,

On Tue, Mar 12, 2024 at 09:22:25AM +0100, David Hildenbrand wrote:
> On 12.03.24 02:16, Shivansh Vij wrote:
>
> Hi,
>
> > Checkpoint-Restore in Userspace (CRIU) needs to be able
> > to track a memory page's changes if we want to enable
> > pre-dumping, which is important for live migrations.
> >
> > The PTE_DIRTY bit (defined in pgtable-prot.h) is already
> > used to track software dirty pages, and the PTE_WRITE and
> > PTE_READ bits are used to track hardware dirty pages.
> >
> > This patch enables full soft dirty page tracking
> > (including swap PTE support) for arm64 systems, and is
> > based very closely on the x86 implementation.
> >
> > It is based on an unfinished patch by
> > Bin Lu ([email protected]) from 2017
> > (https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/),
> > but has been updated for newer 6.x kernels as well as
> > tested on various 5.x kernels.
>
> There has also been more recently:
>
> https://lore.kernel.org/lkml/[email protected]/#r
>
> I recall that we are short on SW PTE bits:
>
> "
> So if you need software dirty, it can only be done with another software
> PTE bit. The problem is that we are short of such bits (only one left if
> we move PTE_PROT_NONE to a different location). The userfaultfd people
> also want such bit.
>
> Personally I'd reuse the four PBHA bits but I keep hearing that they may
> be used with some out of tree patches.
> "
>
> https://lore.kernel.org/lkml/[email protected]/

If I'm understanding the previous discussion (https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/) correctly, the core issue is that we actually do need to use a special SW PTE bit (like the PTE_SOFT_DIRTY that's in this patch) - but at the same time, the PTE bits are highly contentious so it would be ideal if we could reuse an existing bit (maybe one of the PBHA bits like you suggested) instead of creating a new one.

Is my understanding correct?

Thanks,
Shivansh

2024-03-15 09:30:46

by David Hildenbrand

Subject: Re: [PATCH] arm64/mm: adds soft dirty page tracking

On 12.03.24 23:32, Shivansh Vij wrote:
> Hi David,
>
> On Tue, Mar 12, 2024 at 09:22:25AM +0100, David Hildenbrand wrote:
>> On 12.03.24 02:16, Shivansh Vij wrote:
>>
>> Hi,
>>
>>> Checkpoint-Restore in Userspace (CRIU) needs to be able
>>> to track a memory page's changes if we want to enable
>>> pre-dumping, which is important for live migrations.
>>>
>>> The PTE_DIRTY bit (defined in pgtable-prot.h) is already
>>> used to track software dirty pages, and the PTE_WRITE and
>>> PTE_READ bits are used to track hardware dirty pages.
>>>
>>> This patch enables full soft dirty page tracking
>>> (including swap PTE support) for arm64 systems, and is
>>> based very closely on the x86 implementation.
>>>
>>> It is based on an unfinished patch by
>>> Bin Lu ([email protected]) from 2017
>>> (https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/),
>>> but has been updated for newer 6.x kernels as well as
>>> tested on various 5.x kernels.
>>
>> There has also been more recently:
>>
>> https://lore.kernel.org/lkml/[email protected]/#r
>>
>> I recall that we are short on SW PTE bits:
>>
>> "
>> So if you need software dirty, it can only be done with another software
>> PTE bit. The problem is that we are short of such bits (only one left if
>> we move PTE_PROT_NONE to a different location). The userfaultfd people
>> also want such bit.
>>
>> Personally I'd reuse the four PBHA bits but I keep hearing that they may
>> be used with some out of tree patches.
>> "
>>
>> https://lore.kernel.org/lkml/[email protected]/
>
> If I'm understanding the previous discussion (https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/) correctly, the core issue is that we actually do need to use a special SW PTE bit (like the PTE_SOFT_DIRTY that's in this patch) - but at the same time, the PTE bits are highly contentious so it would be ideal if we could reuse an existing bit (maybe one of the PBHA bits like you suggested) instead of creating a new one.
>
> Is my understanding correct?

Yes, that matches my understanding. As Joey noted, the bit you chose is
defined by HW and might soon get used.

As Catalin wrote, some OOT patches might use the PBHA bits; although I
am not sure what the latest state on that is and if we really should
care about OOT patches. Maybe it would be good enough to allow driver
use only in PFNMAP mappings, and simply not use the bit for
softdirty/uffd-wp in there.

I don't know much about PBHA; [1] never got merged but is an
interesting read. We are certainly short on sw bits in any case.

There were recently some discussions around why soft-dirty tracking is
not suitable (unfixable) for some cases, buried in previous iterations
of [2]. The outcome of that was a new UFFD_FEATURE_WP_ASYNC mode as a
replacement for soft-dirty tracking.

So long-term, avoiding introducing soft-dirty tracking and instead
supporting uffd-wp might be the better choice on arm64.
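
For readers less familiar with the userspace side, here is a minimal
sketch of how a tool might read the per-page flags under discussion. It
assumes the bit positions documented for /proc/<pid>/pagemap (bit 55
soft-dirty, bit 57 uffd-wp, bit 62 swapped, bit 63 present) and the
documented "write 4 to clear_refs" interface for resetting soft-dirty
bits. The function names are made up for illustration, and on a kernel
built without soft-dirty support the soft-dirty flag is not expected to
carry useful information.

```python
import os
import sys

# Bit positions as documented for /proc/<pid>/pagemap entries.
SOFT_DIRTY_BIT = 55
UFFD_WP_BIT = 57
SWAPPED_BIT = 62
PRESENT_BIT = 63


def decode_pagemap_entry(entry):
    """Decode the flag bits of one 64-bit pagemap entry (hypothetical helper)."""
    return {
        "present": bool((entry >> PRESENT_BIT) & 1),
        "swapped": bool((entry >> SWAPPED_BIT) & 1),
        "uffd_wp": bool((entry >> UFFD_WP_BIT) & 1),
        "soft_dirty": bool((entry >> SOFT_DIRTY_BIT) & 1),
    }


def read_pagemap_entry(vaddr, pid="self"):
    """Read the raw pagemap entry covering virtual address vaddr."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (vaddr // page_size) * 8  # one 8-byte entry per virtual page
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        data = f.read(8)
    if len(data) != 8:
        raise OSError("short read from pagemap")
    return int.from_bytes(data, sys.byteorder)


def clear_soft_dirty(pid="self"):
    """Reset the soft-dirty bits of a task by writing "4" to clear_refs."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")
```

A pre-dump pass in the CRIU style would roughly call clear_soft_dirty(),
let the task run, and then collect the pages whose decoded entry has
soft_dirty set. The uffd_wp flag is shown only because the same pagemap
entry reports it; this sketch does not set up userfaultfd.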

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lore.kernel.org/all/[email protected]/

--
Cheers,

David / dhildenb