2022-11-22 12:42:46

by Muhammad Usama Anjum

[permalink] [raw]
Subject: [PATCH v2] mm: set the vma flags dirty before testing if it is mergeable

The VM_SOFTDIRTY should be set in the vma flags to be tested if new
allocation should be merged in previous vma or not. With this patch,
the new allocations are merged in the previous VMAs.

I've tested it by reverting the commit 34228d473efe ("mm: ignore
VM_SOFTDIRTY on VMA merging") and after adding this following patch,
I'm seeing that all the new allocations done through mmap() are merged
in the previous VMAs. The number of VMAs doesn't increase drastically
which had contributed to the crash of gimp. If I run the same test after
reverting and not including this patch, the number of VMAs keep on
increasing with every mmap() syscall which proves this patch.

The commit 34228d473efe ("mm: ignore VM_SOFTDIRTY on VMA merging")
seems like a workaround. But it lets the soft-dirty and non-soft-dirty
VMA to get merged. It helps in avoiding the creation of too many VMAs.
But it creates the problem while adding the feature of clearing the
soft-dirty status of only a part of the memory region.

Cc: <[email protected]>
Fixes: d9104d1ca966 ("mm: track vma changes with VM_SOFTDIRTY bit")
Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
We need more testing of this patch.

While implementing clear soft-dirty bit for a range of address space, I'm
facing an issue. The non-soft dirty VMA gets merged sometimes with the soft
dirty VMA. Thus the non-soft dirty VMA become dirty which is undesirable.
When discussed with the some other developers they consider it the
regression. Why the non-soft dirty page should appear as soft dirty when it
isn't soft dirty in reality? I agree with them. Should we revert
34228d473efe or find a workaround in the IOCTL?

* Revert may cause the VMAs to expand in uncontrollable situation where the
soft dirty bit of a lot of memory regions or the whole address space is
being cleared again and again. AFAIK normal process must either be only
clearing a few memory regions. So the applications should be okay. There is
still chance of regressions if some applications are already using the
soft-dirty bit. I'm not sure how to test it.

* Add a flag in the IOCTL to ignore the dirtiness of VMA. The user will
surely lose the functionality to detect reused memory regions. But the
extraneous soft-dirty pages would not appear. I'm trying to do this in the
patch series [1]. Some discussion is going on that this fails with some
mprotect use case [2]. I still need to have a look at the mprotect selftest
to see how and why this fails. I think this can be implemented after some
more work probably in mprotect side.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

Changes in v2:
- Rebase on top of next-20221122
---
mm/mmap.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index f4e2989be5ff..031d23bc43c4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2552,6 +2552,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vm_flags |= VM_ACCOUNT;
}

+ /*
+ * New (or expanded) vma always get soft dirty status.
+ * Otherwise user-space soft-dirty page tracker won't
+ * be able to distinguish situation when vma area unmapped,
+ * then new mapped in-place (which must be aimed as
+ * a completely new data area).
+ */
+ vm_flags |= VM_SOFTDIRTY;
+
next = mas_next(&mas, ULONG_MAX);
prev = mas_prev(&mas, 0);
if (vm_flags & VM_SPECIAL)
@@ -2724,15 +2733,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (file)
uprobe_mmap(vma);

- /*
- * New (or expanded) vma always get soft dirty status.
- * Otherwise user-space soft-dirty page tracker won't
- * be able to distinguish situation when vma area unmapped,
- * then new mapped in-place (which must be aimed as
- * a completely new data area).
- */
- vma->vm_flags |= VM_SOFTDIRTY;
-
vma_set_page_prot(vma);

validate_mm(mm);
@@ -2974,7 +2974,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
vma->vm_start = addr;
vma->vm_end = addr + len;
vma->vm_pgoff = addr >> PAGE_SHIFT;
- vma->vm_flags = flags;
+ vma->vm_flags = flags | VM_SOFTDIRTY;
vma->vm_page_prot = vm_get_page_prot(flags);
mas_set_range(mas, vma->vm_start, addr + len - 1);
if (mas_store_gfp(mas, vma, GFP_KERNEL))
@@ -2987,7 +2987,6 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
mm->data_vm += len >> PAGE_SHIFT;
if (flags & VM_LOCKED)
mm->locked_vm += (len >> PAGE_SHIFT);
- vma->vm_flags |= VM_SOFTDIRTY;
validate_mm(mm);
return 0;

@@ -3021,6 +3020,8 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
if ((flags & (~VM_EXEC)) != 0)
return -EINVAL;

+ flags |= VM_SOFTDIRTY;
+
ret = check_brk_limits(addr, len);
if (ret)
goto limits_failed;
--
2.30.2


2022-11-29 07:49:39

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH v2] mm: set the vma flags dirty before testing if it is mergeable

On Tue, Nov 22, 2022 at 04:50:07PM +0500, Muhammad Usama Anjum wrote:
> The VM_SOFTDIRTY should be set in the vma flags to be tested if new
> allocation should be merged in previous vma or not. With this patch,
> the new allocations are merged in the previous VMAs.

Hi Muhammad! Thanks for the patch and sorry for late reply. Here is a moment
I don't understand -- when we test for vma merge we use is_mergeable_vma() helper
which excludes VM_SOFTDIRTY flag from comarision, so setting this flag earlier
should not change the behaviour. Or I miss something obvious?

> I've tested it by reverting the commit 34228d473efe ("mm: ignore
> VM_SOFTDIRTY on VMA merging") and after adding this following patch,
> I'm seeing that all the new allocations done through mmap() are merged
> in the previous VMAs. The number of VMAs doesn't increase drastically
> which had contributed to the crash of gimp. If I run the same test after
> reverting and not including this patch, the number of VMAs keep on
> increasing with every mmap() syscall which proves this patch.

The is_mergeable_vma is key function here, either we should setup VM_SOFTDIRTY
explicitly as your patch does and drop VM_SOFTDIRTY from is_mergeable_vma,
or we continue excluding this flag in such low level helper as is.

> The commit 34228d473efe ("mm: ignore VM_SOFTDIRTY on VMA merging")
> seems like a workaround. But it lets the soft-dirty and non-soft-dirty
> VMA to get merged. It helps in avoiding the creation of too many VMAs.
> But it creates the problem while adding the feature of clearing the
> soft-dirty status of only a part of the memory region.

So you need an extended functionality, could you please put this
changelog snippet somewhere on top? Otherwise srat reading this patch
I simply didn't get what we're trying to achieve.

>
> Cc: <[email protected]>
> Fixes: d9104d1ca966 ("mm: track vma changes with VM_SOFTDIRTY bit")

Wait, is there some critical bug or error that needs stable@ to be
patched? The way softdirty has been implemented in first place is
to reach minimum needs for dirty page tracking. More precise tracking
(such as partial cleanup of memory region) will require at least other
structures to remember which part of vma is cleared and which one is
dirty after their merge. And I don't think this is possible to implement
without extending vma structure itself (which is big enough already).

Or maybe I'm blind and not see obvious problem here, sorry then :)

> Signed-off-by: Muhammad Usama Anjum <[email protected]>
> ---
> We need more testing of this patch.
>
> While implementing clear soft-dirty bit for a range of address space, I'm
> facing an issue. The non-soft dirty VMA gets merged sometimes with the soft
> dirty VMA. Thus the non-soft dirty VMA become dirty which is undesirable.
>
> When discussed with the some other developers they consider it the
> regression. Why the non-soft dirty page should appear as soft dirty when it
> isn't soft dirty in reality? I agree with them. Should we revert
> 34228d473efe or find a workaround in the IOCTL?

Well, this is not the regression, it is been designed this way because
there is no place to keep subflags on regions covered by one VMA and non
merging them cause vma fragmentation (I've seen massive vma fragmentations
especially in db engines). So no, reverting it is not an option but rather
will cause problems in real applications I fear.

>
> * Revert may cause the VMAs to expand in uncontrollable situation where the
> soft dirty bit of a lot of memory regions or the whole address space is
> being cleared again and again. AFAIK normal process must either be only
> clearing a few memory regions. So the applications should be okay. There is
> still chance of regressions if some applications are already using the
> soft-dirty bit. I'm not sure how to test it.

Main purpose of this dirty functionality came from containers c/r procedure.
As far as I remember we've been clearing vmas for the whole container, though
it's been a while and i'm not involved into c/r development right now so may
miss something from my memory.

> * Add a flag in the IOCTL to ignore the dirtiness of VMA. The user will
> surely lose the functionality to detect reused memory regions. But the
> extraneous soft-dirty pages would not appear. I'm trying to do this in the
> patch series [1]. Some discussion is going on that this fails with some
> mprotect use case [2]. I still need to have a look at the mprotect selftest
> to see how and why this fails. I think this can be implemented after some
> more work probably in mprotect side.

ioctl might be an option indeed

>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/

2022-11-29 13:55:56

by Muhammad Usama Anjum

[permalink] [raw]
Subject: Re: [PATCH v2] mm: set the vma flags dirty before testing if it is mergeable

On 11/29/22 12:27 PM, Cyrill Gorcunov wrote:
> On Tue, Nov 22, 2022 at 04:50:07PM +0500, Muhammad Usama Anjum wrote:
>> The VM_SOFTDIRTY should be set in the vma flags to be tested if new
>> allocation should be merged in previous vma or not. With this patch,
>> the new allocations are merged in the previous VMAs.
>
> Hi Muhammad! Thanks for the patch and sorry for late reply. Here is a moment
> I don't understand -- when we test for vma merge we use is_mergeable_vma() helper
> which excludes VM_SOFTDIRTY flag from comarision, so setting this flag earlier
> should not change the behaviour. Or I miss something obvious?
Yeah, it doesn't change the behavior until we also revert the 34228d473efe.
is_mergeable_vma() started ignoring VM_SOFTDIRTY flag when every new
allocation was creating a new VMA. So this patch proposes the correct fix
that instead of ignoring the VM_SOFTDIRTY, the newly allocated should be
marked VM_SOFTDIRTY and then is_mergeable_vma() should be called.

>
>> I've tested it by reverting the commit 34228d473efe ("mm: ignore
>> VM_SOFTDIRTY on VMA merging") and after adding this following patch,
>> I'm seeing that all the new allocations done through mmap() are merged
>> in the previous VMAs. The number of VMAs doesn't increase drastically
>> which had contributed to the crash of gimp. If I run the same test after
>> reverting and not including this patch, the number of VMAs keep on
>> increasing with every mmap() syscall which proves this patch.
>
> The is_mergeable_vma is key function here, either we should setup VM_SOFTDIRTY
> explicitly as your patch does and drop VM_SOFTDIRTY from is_mergeable_vma,
> or we continue excluding this flag in such low level helper as is.
Agreed.

>
>> The commit 34228d473efe ("mm: ignore VM_SOFTDIRTY on VMA merging")
>> seems like a workaround. But it lets the soft-dirty and non-soft-dirty
>> VMA to get merged. It helps in avoiding the creation of too many VMAs.
>> But it creates the problem while adding the feature of clearing the
>> soft-dirty status of only a part of the memory region.
>
> So you need an extended functionality, could you please put this
> changelog snippet somewhere on top? Otherwise srat reading this patch
> I simply didn't get what we're trying to achieve.
I'm referring to the changeset in [1]
(https://lore.kernel.org/all/[email protected]/)
which is trying to add the extended functionality.

>>
>> Cc: <[email protected]>
>> Fixes: d9104d1ca966 ("mm: track vma changes with VM_SOFTDIRTY bit")
>
> Wait, is there some critical bug or error that needs stable@ to be
> patched? The way softdirty has been implemented in first place is
No there is no bug or any thing. I've added fixes tag considering that the
original patch tracking vma changes was written such that every new
allocated VMA was not being merged in the previous one.

> to reach minimum needs for dirty page tracking. More precise tracking
> (such as partial cleanup of memory region) will require at least other
> structures to remember which part of vma is cleared and which one is
> dirty after their merge. And I don't think this is possible to implement
> without extending vma structure itself (which is big enough already).
We'll have to think about it if I'm unable to make the IOCTL work without
considering VM_SOFTDIRTY.

>
> Or maybe I'm blind and not see obvious problem here, sorry then :)
>
>> Signed-off-by: Muhammad Usama Anjum <[email protected]>
>> ---
>> We need more testing of this patch.
>>
>> While implementing clear soft-dirty bit for a range of address space, I'm
>> facing an issue. The non-soft dirty VMA gets merged sometimes with the soft
>> dirty VMA. Thus the non-soft dirty VMA become dirty which is undesirable.
>>
>> When discussed with the some other developers they consider it the
>> regression. Why the non-soft dirty page should appear as soft dirty when it
>> isn't soft dirty in reality? I agree with them. Should we revert
>> 34228d473efe or find a workaround in the IOCTL?
>
> Well, this is not the regression, it is been designed this way because
> there is no place to keep subflags on regions covered by one VMA and non
> merging them cause vma fragmentation (I've seen massive vma fragmentations
> especially in db engines). So no, reverting it is not an option but rather
> will cause problems in real applications I fear.
Yeah, makes sense.

>
>>
>> * Revert may cause the VMAs to expand in uncontrollable situation where the
>> soft dirty bit of a lot of memory regions or the whole address space is
>> being cleared again and again. AFAIK normal process must either be only
>> clearing a few memory regions. So the applications should be okay. There is
>> still chance of regressions if some applications are already using the
>> soft-dirty bit. I'm not sure how to test it.
>
> Main purpose of this dirty functionality came from containers c/r procedure.
> As far as I remember we've been clearing vmas for the whole container, though
> it's been a while and i'm not involved into c/r development right now so may
> miss something from my memory.
>
>> * Add a flag in the IOCTL to ignore the dirtiness of VMA. The user will
>> surely lose the functionality to detect reused memory regions. But the
>> extraneous soft-dirty pages would not appear. I'm trying to do this in the
>> patch series [1]. Some discussion is going on that this fails with some
>> mprotect use case [2]. I still need to have a look at the mprotect selftest
>> to see how and why this fails. I think this can be implemented after some
>> more work probably in mprotect side.
>
> ioctl might be an option indeed
Thank you for supporting this. I'll track down the issue caused by
remapping and mprotect mentioned here:
https://lore.kernel.org/all/[email protected]/
and we can proceed with this.

>
>>
>> [1] https://lore.kernel.org/all/[email protected]/
>> [2] https://lore.kernel.org/all/[email protected]/

Thank you so much for the review.

--
BR,
Muhammad Usama Anjum

2022-11-30 22:10:15

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH v2] mm: set the vma flags dirty before testing if it is mergeable

On Tue, Nov 29, 2022 at 06:49:53PM +0500, Muhammad Usama Anjum wrote:
> > ioctl might be an option indeed
> Thank you for supporting this. I'll track down the issue caused by
> remapping and mprotect mentioned here:
> https://lore.kernel.org/all/[email protected]/
> and we can proceed with this.
>
> >
> >>
> >> [1] https://lore.kernel.org/all/[email protected]/
> >> [2] https://lore.kernel.org/all/[email protected]/

Hi Muhammad! Hopefully I'll find some time soon to read all these conversation,
so for now my replies might be simply out of context. Initially the vma softdirty
was needed to catch a case where memory remapped inplace and what is worse it might
have _same_ ptes dirty after clear_refs call. IOW, the process allocated vma and
write some data into. Then we (page tracker process) come in, read pagemap and clear
softdirty bits, and page traker process terminates. While we're not watching the program
unmaps vma, maps new one with same size and what is worse it writes data to the same pages
as we saw at last scan time. So without VM_SOFTDIRTY we won't be able to find that this
VMA is actually carrying new pages which were not yet dumped.

And similar scenario can be for merging: say former vma has been 4 pages, we scan it
and clear dirty pages at low and hight address. Then process splits this VMA to two with
gap inbetween and then map new area which merge them all into one vma, and process can
write again pages to same address so we have to mark this new VMA as softdirty. If only
I rememeber correctly about the initial idea :)