2024-02-21 09:15:29

by Yajun Deng

[permalink] [raw]
Subject: [PATCH v2] mm/mmap: return early if it can't merge in vma_merge()

In most cases, the range of the area is valid. But in do_mprotect_pkey(),
the minimum value of end and vma->vm_end is passed to mprotect_fixup().
This will lead to the end is less than the end of prev.

In this case, the curr will be NULL, but the next will be equal to the
prev. So it will attempt to merge before, the vm_pgoff check will cause
this case to fail.

To avoid the process described above and reduce unnecessary operations.
Add a check to immediately return NULL if the end is less than the end of
prev.

Signed-off-by: Yajun Deng <[email protected]>
---
v2: remove the case label.
v1: https://lore.kernel.org/all/[email protected]/
---
mm/mmap.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 0fccd23f056e..7668854d2246 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -890,6 +890,9 @@ static struct vm_area_struct
if (vm_flags & VM_SPECIAL)
return NULL;

+ if (prev && end < prev->vm_end)
+ return NULL;
+
/* Does the input range span an existing VMA? (cases 5 - 8) */
curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);

--
2.25.1



2024-02-21 15:39:06

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH v2] mm/mmap: return early if it can't merge in vma_merge()

* Yajun Deng <[email protected]> [240221 04:15]:
> In most cases, the range of the area is valid. But in do_mprotect_pkey(),
> the minimum value of end and vma->vm_end is passed to mprotect_fixup().
> This will lead to the end is less than the end of prev.
>
> In this case, the curr will be NULL, but the next will be equal to the
> prev. So it will attempt to merge before, the vm_pgoff check will cause
> this case to fail.
>
> To avoid the process described above and reduce unnecessary operations.
> Add a check to immediately return NULL if the end is less than the end of
> prev.

If it's only one caller, could we stop that caller instead of checking
an almost never case for all callers? Would this better fit in
vma_modify()? Although that's not just for this caller at this point.
Maybe there isn't a good place?

Or are there other reasons this may happen and is better done in this
function?

Often, this is called the "punch a hole" scenario; where an operation
creates two entries from the old data and either leaves an empty space
or fills the space with a new VMA.

>
> Signed-off-by: Yajun Deng <[email protected]>
> ---
> v2: remove the case label.
> v1: https://lore.kernel.org/all/[email protected]/
> ---
> mm/mmap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0fccd23f056e..7668854d2246 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -890,6 +890,9 @@ static struct vm_area_struct
> if (vm_flags & VM_SPECIAL)
> return NULL;
>
> + if (prev && end < prev->vm_end)
> + return NULL;
> +
> /* Does the input range span an existing VMA? (cases 5 - 8) */
> curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
>
> --
> 2.25.1
>

2024-02-21 20:44:07

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v2] mm/mmap: return early if it can't merge in vma_merge()

On Wed, Feb 21, 2024 at 10:38:27AM -0500, Liam R. Howlett wrote:
> * Yajun Deng <[email protected]> [240221 04:15]:
> > In most cases, the range of the area is valid. But in do_mprotect_pkey(),
> > the minimum value of end and vma->vm_end is passed to mprotect_fixup().
> > This will lead to the end is less than the end of prev.
> >
> > In this case, the curr will be NULL, but the next will be equal to the
> > prev. So it will attempt to merge before, the vm_pgoff check will cause
> > this case to fail.
> >
> > To avoid the process described above and reduce unnecessary operations.
> > Add a check to immediately return NULL if the end is less than the end of
> > prev.
>
> If it's only one caller, could we stop that caller instead of checking
> an almost never case for all callers? Would this better fit in
> vma_modify()? Although that's not just for this caller at this point.
> Maybe there isn't a good place?

I definitely agree with Liam that this should not be in vma_merge(), as
it's not going to be relevant to _most_ callers.

I am not sure vma_modify() is much better, this would be the only early
exit check in that function and makes what is very simple and
straightforward now more confusing.

And I think this is the crux of it - it's confusing that we special case
this one particular non-merge scenario, but no others (all of which we then
deem ok to be caught by the usual rules).

I think it's simpler (and more efficient) to just keep things the way they
are.

>
> Or are there other reasons this may happen and is better done in this
> function?
>
> Often, this is called the "punch a hole" scenario; where an operation
> creates two entries from the old data and either leaves an empty space
> or fills the space with a new VMA.
>
> >
> > Signed-off-by: Yajun Deng <[email protected]>
> > ---
> > v2: remove the case label.
> > v1: https://lore.kernel.org/all/[email protected]/
> > ---
> > mm/mmap.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 0fccd23f056e..7668854d2246 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -890,6 +890,9 @@ static struct vm_area_struct
> > if (vm_flags & VM_SPECIAL)
> > return NULL;
> >
> > + if (prev && end < prev->vm_end)
> > + return NULL;
> > +
> > /* Does the input range span an existing VMA? (cases 5 - 8) */
> > curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
> >
> > --
> > 2.25.1
> >

So overall I don't think this check makes much sense anywhere.

I think a better solution would be to prevent it happening _at source_ if
you can find a logical way of doing so.

I do plan to do some cleanup passes over this stuff once again so maybe I
can figure something out that better handles non-merge scenarios.

This is a great find though overall even if a patch doesn't make sense
Yajun, thanks for this, it's really made me think about this case (+ others
like it) :)

2024-02-22 07:47:27

by Yajun Deng

[permalink] [raw]
Subject: Re: [PATCH v2] mm/mmap: return early if it can't merge in vma_merge()


On 2024/2/22 04:41, Lorenzo Stoakes wrote:
> On Wed, Feb 21, 2024 at 10:38:27AM -0500, Liam R. Howlett wrote:
>> * Yajun Deng <[email protected]> [240221 04:15]:
>>> In most cases, the range of the area is valid. But in do_mprotect_pkey(),
>>> the minimum value of end and vma->vm_end is passed to mprotect_fixup().
>>> This will lead to the end is less than the end of prev.
>>>
>>> In this case, the curr will be NULL, but the next will be equal to the
>>> prev. So it will attempt to merge before, the vm_pgoff check will cause
>>> this case to fail.
>>>
>>> To avoid the process described above and reduce unnecessary operations.
>>> Add a check to immediately return NULL if the end is less than the end of
>>> prev.
>> If it's only one caller, could we stop that caller instead of checking
>> an almost never case for all callers? Would this better fit in
>> vma_modify()? Although that's not just for this caller at this point.
>> Maybe there isn't a good place?
> I definitely agree with Liam that this should not be in vma_merge(), as
> it's not going to be relevant to _most_ callers.
>
> I am not sure vma_modify() is much better, this would be the only early
> exit check in that function and makes what is very simple and
> straightforward now more confusing.


There are two paths that will cause this case. One is in
mprotect_fixup(), the other is in

madvise_update_vma().


One way is to separate out the split_vma() from vma_modify(). And create
a new helper function.

We can call it directly at source, but we need to do this in both
paths.  It's more complicated.


The other way is to add a check in vma_modify(). Like the following:

diff --git a/mm/mmap.c b/mm/mmap.c
index 0fccd23f056e..f93f1d3939f2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2431,11 +2431,15 @@ struct vm_area_struct *vma_modify(struct
vma_iterator *vmi,
        pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >>
PAGE_SHIFT);
        struct vm_area_struct *merged;

+       if (prev && end < prev->vm_end)
+               goto cannot_merge;
+
        merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
                           pgoff, policy, uffd_ctx, anon_name);
        if (merged)
                return merged;

+cannot_merge:
        if (vma->vm_start < start) {
                int err = split_vma(vmi, vma, start, 1);


> And I think this is the crux of it - it's confusing that we special case
> this one particular non-merge scenario, but no others (all of which we then
> deem ok to be caught by the usual rules).
>
> I think it's simpler (and more efficient) to just keep things the way they
> are.
>
>> Or are there other reasons this may happen and is better done in this
>> function?
>>
>> Often, this is called the "punch a hole" scenario; where an operation
>> creates two entries from the old data and either leaves an empty space
>> or fills the space with a new VMA.
>>
>>> Signed-off-by: Yajun Deng <[email protected]>
>>> ---
>>> v2: remove the case label.
>>> v1: https://lore.kernel.org/all/[email protected]/
>>> ---
>>> mm/mmap.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>> index 0fccd23f056e..7668854d2246 100644
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -890,6 +890,9 @@ static struct vm_area_struct
>>> if (vm_flags & VM_SPECIAL)
>>> return NULL;
>>>
>>> + if (prev && end < prev->vm_end)
>>> + return NULL;
>>> +
>>> /* Does the input range span an existing VMA? (cases 5 - 8) */
>>> curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
>>>
>>> --
>>> 2.25.1
>>>
> So overall I don't think this check makes much sense anywhere.
>
> I think a better solution would be to prevent it happening _at source_ if
> you can find a logical way of doing so.
>
> I do plan to do some cleanup passes over this stuff once again so maybe I
> can figure something out that better handles non-merge scenarios.
>
> This is a great find though overall even if a patch doesn't make sense
> Yajun, thanks for this, it's really made me think about this case (+ others
> like it) :)

2024-02-22 08:35:11

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v2] mm/mmap: return early if it can't merge in vma_merge()

On Thu, Feb 22, 2024 at 03:47:04PM +0800, Yajun Deng wrote:
>
> On 2024/2/22 04:41, Lorenzo Stoakes wrote:
> > On Wed, Feb 21, 2024 at 10:38:27AM -0500, Liam R. Howlett wrote:
> > > * Yajun Deng <[email protected]> [240221 04:15]:
> > > > In most cases, the range of the area is valid. But in do_mprotect_pkey(),
> > > > the minimum value of end and vma->vm_end is passed to mprotect_fixup().
> > > > This will lead to the end is less than the end of prev.
> > > >
> > > > In this case, the curr will be NULL, but the next will be equal to the
> > > > prev. So it will attempt to merge before, the vm_pgoff check will cause
> > > > this case to fail.
> > > >
> > > > To avoid the process described above and reduce unnecessary operations.
> > > > Add a check to immediately return NULL if the end is less than the end of
> > > > prev.
> > > If it's only one caller, could we stop that caller instead of checking
> > > an almost never case for all callers? Would this better fit in
> > > vma_modify()? Although that's not just for this caller at this point.
> > > Maybe there isn't a good place?
> > I definitely agree with Liam that this should not be in vma_merge(), as
> > it's not going to be relevant to _most_ callers.
> >
> > I am not sure vma_modify() is much better, this would be the only early
> > exit check in that function and makes what is very simple and
> > straightforward now more confusing.
>
>
> There are two paths that will cause this case. One is in mprotect_fixup(),
> the other is in
>
> madvise_update_vma().
>
>
> One way is to separate out the split_vma() from vma_modify(). And create a
> new helper function.

Absolutely not. I wrote the vma_modify() patch series explicitly to expose
_less_ not more.

>
> We can call it directly at source, but we need to do this in both paths.
> It's more complicated.
>
>
> The other way is to add a check in vma_modify(). Like the following:

As I said above, I really don't think this is a good idea, you're just
special casing one non-merge case but not any others + adding an
unnecessary branch.

>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0fccd23f056e..f93f1d3939f2 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2431,11 +2431,15 @@ struct vm_area_struct *vma_modify(struct
> vma_iterator *vmi,
> ??????? pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >>
> PAGE_SHIFT);
> ??????? struct vm_area_struct *merged;
>
> +?????? if (prev && end < prev->vm_end)
> +?????????????? goto cannot_merge;
> +
> ??????? merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
> ?????????????????????????? pgoff, policy, uffd_ctx, anon_name);
> ??????? if (merged)
> ??????????????? return merged;
>
> +cannot_merge:
> ??????? if (vma->vm_start < start) {
> ??????????????? int err = split_vma(vmi, vma, start, 1);
>
>
> > And I think this is the crux of it - it's confusing that we special case
> > this one particular non-merge scenario, but no others (all of which we then
> > deem ok to be caught by the usual rules).
> >
> > I think it's simpler (and more efficient) to just keep things the way they
> > are.
> >
> > > Or are there other reasons this may happen and is better done in this
> > > function?
> > >
> > > Often, this is called the "punch a hole" scenario; where an operation
> > > creates two entries from the old data and either leaves an empty space
> > > or fills the space with a new VMA.
> > >
> > > > Signed-off-by: Yajun Deng <[email protected]>
> > > > ---
> > > > v2: remove the case label.
> > > > v1: https://lore.kernel.org/all/[email protected]/
> > > > ---
> > > > mm/mmap.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 0fccd23f056e..7668854d2246 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -890,6 +890,9 @@ static struct vm_area_struct
> > > > if (vm_flags & VM_SPECIAL)
> > > > return NULL;
> > > >
> > > > + if (prev && end < prev->vm_end)
> > > > + return NULL;
> > > > +
> > > > /* Does the input range span an existing VMA? (cases 5 - 8) */
> > > > curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
> > > >
> > > > --
> > > > 2.25.1
> > > >
> > So overall I don't think this check makes much sense anywhere.
> >
> > I think a better solution would be to prevent it happening _at source_ if
> > you can find a logical way of doing so.
> >
> > I do plan to do some cleanup passes over this stuff once again so maybe I
> > can figure something out that better handles non-merge scenarios.
> >
> > This is a great find though overall even if a patch doesn't make sense
> > Yajun, thanks for this, it's really made me think about this case (+ others
> > like it) :)

I guess maybe again I've not been clear enough on this, so unless
compelling reasoning can otherwise be provided, I feel this check should
not be added _anywhere_.

Therefore, NACK.

2024-02-22 08:38:06

by Yajun Deng

[permalink] [raw]
Subject: Re: [PATCH v2] mm/mmap: return early if it can't merge in vma_merge()


On 2024/2/22 16:31, Lorenzo Stoakes wrote:
> On Thu, Feb 22, 2024 at 03:47:04PM +0800, Yajun Deng wrote:
>> On 2024/2/22 04:41, Lorenzo Stoakes wrote:
>>> On Wed, Feb 21, 2024 at 10:38:27AM -0500, Liam R. Howlett wrote:
>>>> * Yajun Deng <[email protected]> [240221 04:15]:
>>>>> In most cases, the range of the area is valid. But in do_mprotect_pkey(),
>>>>> the minimum value of end and vma->vm_end is passed to mprotect_fixup().
>>>>> This will lead to the end is less than the end of prev.
>>>>>
>>>>> In this case, the curr will be NULL, but the next will be equal to the
>>>>> prev. So it will attempt to merge before, the vm_pgoff check will cause
>>>>> this case to fail.
>>>>>
>>>>> To avoid the process described above and reduce unnecessary operations.
>>>>> Add a check to immediately return NULL if the end is less than the end of
>>>>> prev.
>>>> If it's only one caller, could we stop that caller instead of checking
>>>> an almost never case for all callers? Would this better fit in
>>>> vma_modify()? Although that's not just for this caller at this point.
>>>> Maybe there isn't a good place?
>>> I definitely agree with Liam that this should not be in vma_merge(), as
>>> it's not going to be relevant to _most_ callers.
>>>
>>> I am not sure vma_modify() is much better, this would be the only early
>>> exit check in that function and makes what is very simple and
>>> straightforward now more confusing.
>>
>> There are two paths that will cause this case. One is in mprotect_fixup(),
>> the other is in
>>
>> madvise_update_vma().
>>
>>
>> One way is to separate out the split_vma() from vma_modify(). And create a
>> new helper function.
> Absolutely not. I wrote the vma_modify() patch series explicitly to expose
> _less_ not more.
>
>> We can call it directly at source, but we need to do this in both paths.
>> It's more complicated.
>>
>>
>> The other way is to add a check in vma_modify(). Like the following:
> As I said above, I really don't think this is a good idea, you're just
> special casing one non-merge case but not any others + adding an
> unnecessary branch.
>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 0fccd23f056e..f93f1d3939f2 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2431,11 +2431,15 @@ struct vm_area_struct *vma_modify(struct
>> vma_iterator *vmi,
>>         pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >>
>> PAGE_SHIFT);
>>         struct vm_area_struct *merged;
>>
>> +       if (prev && end < prev->vm_end)
>> +               goto cannot_merge;
>> +
>>         merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
>>                            pgoff, policy, uffd_ctx, anon_name);
>>         if (merged)
>>                 return merged;
>>
>> +cannot_merge:
>>         if (vma->vm_start < start) {
>>                 int err = split_vma(vmi, vma, start, 1);
>>
>>
>>> And I think this is the crux of it - it's confusing that we special case
>>> this one particular non-merge scenario, but no others (all of which we then
>>> deem ok to be caught by the usual rules).
>>>
>>> I think it's simpler (and more efficient) to just keep things the way they
>>> are.
>>>
>>>> Or are there other reasons this may happen and is better done in this
>>>> function?
>>>>
>>>> Often, this is called the "punch a hole" scenario; where an operation
>>>> creates two entries from the old data and either leaves an empty space
>>>> or fills the space with a new VMA.
>>>>
>>>>> Signed-off-by: Yajun Deng <[email protected]>
>>>>> ---
>>>>> v2: remove the case label.
>>>>> v1: https://lore.kernel.org/all/[email protected]/
>>>>> ---
>>>>> mm/mmap.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>> index 0fccd23f056e..7668854d2246 100644
>>>>> --- a/mm/mmap.c
>>>>> +++ b/mm/mmap.c
>>>>> @@ -890,6 +890,9 @@ static struct vm_area_struct
>>>>> if (vm_flags & VM_SPECIAL)
>>>>> return NULL;
>>>>>
>>>>> + if (prev && end < prev->vm_end)
>>>>> + return NULL;
>>>>> +
>>>>> /* Does the input range span an existing VMA? (cases 5 - 8) */
>>>>> curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
>>>>>
>>>>> --
>>>>> 2.25.1
>>>>>
>>> So overall I don't think this check makes much sense anywhere.
>>>
>>> I think a better solution would be to prevent it happening _at source_ if
>>> you can find a logical way of doing so.
>>>
>>> I do plan to do some cleanup passes over this stuff once again so maybe I
>>> can figure something out that better handles non-merge scenarios.
>>>
>>> This is a great find though overall even if a patch doesn't make sense
>>> Yajun, thanks for this, it's really made me think about this case (+ others
>>> like it) :)
> I guess maybe again I've not been clear enough on this, so unless
> compelling reasoning can otherwise be provided, I feel this check should
> not be added _anywhere_.


Okay, I got it. Thank you!

> Therefore, NACK.

2024-02-22 08:44:56

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v2] mm/mmap: return early if it can't merge in vma_merge()

On Thu, Feb 22, 2024 at 04:37:44PM +0800, Yajun Deng wrote:
>
[snip]>
>
> Okay, I got it. Thank you!
>

Please do keep contributing though, you have really raised something
interesting here and I know you have a maple tree patch going also, it's
all very much appreciated :)

Thanks, Lorenzo