LinuxLists.cc - [RFC] mm: change find

2015-12-14 11:04:18

Subject: [RFC] mm: change find_vma() function

change find_vma() to break ealier when found the adderss
is not in any vma, don't need loop to search all vma.

Signed-off-by: yalin wang <[email protected]>
---
mm/mmap.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index b513f20..8294c9b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2064,6 +2064,9 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
vma = tmp;
if (tmp->vm_start <= addr)
break;
+ if (!tmp->vm_prev || tmp->vm_prev->vm_end <= addr)
+ break;
+
rb_node = rb_node->rb_left;
} else
rb_node = rb_node->rb_right;
--
1.9.1

2015-12-14 12:11:12

by Kirill A. Shutemov

[permalink] [raw]

Subject: Re: [RFC] mm: change find_vma() function

On Mon, Dec 14, 2015 at 07:02:25PM +0800, yalin wang wrote:
> change find_vma() to break ealier when found the adderss
> is not in any vma, don't need loop to search all vma.
>
> Signed-off-by: yalin wang <[email protected]>
> ---
> mm/mmap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b513f20..8294c9b 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2064,6 +2064,9 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> vma = tmp;
> if (tmp->vm_start <= addr)
> break;
> + if (!tmp->vm_prev || tmp->vm_prev->vm_end <= addr)
> + break;
> +

This 'break' would return 'tmp' as found vma.

Have you even tried to test the code?

> rb_node = rb_node->rb_left;
> } else
> rb_node = rb_node->rb_right;
> --
> 1.9.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Kirill A. Shutemov

2015-12-14 17:54:50

by Oleg Nesterov

[permalink] [raw]

Subject: Re: [RFC] mm: change find_vma() function

On 12/14, Kirill A. Shutemov wrote:
>
> On Mon, Dec 14, 2015 at 07:02:25PM +0800, yalin wang wrote:
> > change find_vma() to break ealier when found the adderss
> > is not in any vma, don't need loop to search all vma.
> >
> > Signed-off-by: yalin wang <[email protected]>
> > ---
> > mm/mmap.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index b513f20..8294c9b 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2064,6 +2064,9 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> > vma = tmp;
> > if (tmp->vm_start <= addr)
> > break;
> > + if (!tmp->vm_prev || tmp->vm_prev->vm_end <= addr)
> > + break;
> > +
>
> This 'break' would return 'tmp' as found vma.

But this would be right?

Not that I think this optimization makes sense, I simply do not know,
but to me this change looks technically correct at first glance...

But the changelog is wrong or I missed something. This change can stop
the main loop earlier; if "tmp" is the first vma, or if the previous one
is below the address. Or perhaps I just misread that "not in any vma"
note in the changelog.

No?

Oleg.

2015-12-14 21:11:41

by Kirill A. Shutemov

[permalink] [raw]

Subject: Re: [RFC] mm: change find_vma() function

On Mon, Dec 14, 2015 at 06:55:09PM +0100, Oleg Nesterov wrote:
> On 12/14, Kirill A. Shutemov wrote:
> >
> > On Mon, Dec 14, 2015 at 07:02:25PM +0800, yalin wang wrote:
> > > change find_vma() to break ealier when found the adderss
> > > is not in any vma, don't need loop to search all vma.
> > >
> > > Signed-off-by: yalin wang <[email protected]>
> > > ---
> > > mm/mmap.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index b513f20..8294c9b 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2064,6 +2064,9 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> > > vma = tmp;
> > > if (tmp->vm_start <= addr)
> > > break;
> > > + if (!tmp->vm_prev || tmp->vm_prev->vm_end <= addr)
> > > + break;
> > > +
> >
> > This 'break' would return 'tmp' as found vma.
>
> But this would be right?

Hm. Right. Sorry for my tone.

I think the right condition is 'tmp->vm_prev->vm_end < addr', not '<=' as
vm_end is the first byte after the vma. But it's equivalent in practice
here.

Anyway, I don't think it's possible to gain anything measurable from this
optimization.

>
> Not that I think this optimization makes sense, I simply do not know,
> but to me this change looks technically correct at first glance...
>
> But the changelog is wrong or I missed something. This change can stop
> the main loop earlier; if "tmp" is the first vma,

For the first vma, we don't get anything comparing to what we have now:
check for !rb_node on the next iteration would have the same trade off and
effect as the proposed check.

> or if the previous one is below the address.

Yes, but would it compensate additional check on each 'tmp->vm_end > addr'
iteration to the point? That's not obvious.

> Or perhaps I just misread that "not in any vma" note in the changelog.
>
> No?
>
> Oleg.
>

--
Kirill A. Shutemov

2015-12-15 06:41:36

by yalin wang

[permalink] [raw]

Subject: Re: [RFC] mm: change find_vma() function

> On Dec 15, 2015, at 05:11, Kirill A. Shutemov <[email protected]> wrote:
>
> On Mon, Dec 14, 2015 at 06:55:09PM +0100, Oleg Nesterov wrote:
>> On 12/14, Kirill A. Shutemov wrote:
>>>
>>> On Mon, Dec 14, 2015 at 07:02:25PM +0800, yalin wang wrote:
>>>> change find_vma() to break ealier when found the adderss
>>>> is not in any vma, don't need loop to search all vma.
>>>>
>>>> Signed-off-by: yalin wang <[email protected]>
>>>> ---
>>>> mm/mmap.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>> index b513f20..8294c9b 100644
>>>> --- a/mm/mmap.c
>>>> +++ b/mm/mmap.c
>>>> @@ -2064,6 +2064,9 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>>>> vma = tmp;
>>>> if (tmp->vm_start <= addr)
>>>> break;
>>>> + if (!tmp->vm_prev || tmp->vm_prev->vm_end <= addr)
>>>> + break;
>>>> +
>>>
>>> This 'break' would return 'tmp' as found vma.
>>
>> But this would be right?
>
> Hm. Right. Sorry for my tone.
>
> I think the right condition is 'tmp->vm_prev->vm_end < addr', not '<=' as
> vm_end is the first byte after the vma. But it's equivalent in practice
> here.
>
this should be <= here,
because vma’s effect address space doesn’t include vm_end add,
so if an address vm_end <= add , this means this addr don’t belong to this vma,

> Anyway, I don't think it's possible to gain anything measurable from this
> optimization.
>
the advantage is that if addr don’t belong to any vma, we don’t need loop all vma,
we can break earlier if we found the most closest vma which vma->end_add > addr,
>>
>> Not that I think this optimization makes sense, I simply do not know,
>> but to me this change looks technically correct at first glance...
>>
>> But the changelog is wrong or I missed something. This change can stop
>> the main loop earlier; if "tmp" is the first vma,
>
> For the first vma, we don't get anything comparing to what we have now:
> check for !rb_node on the next iteration would have the same trade off and
> effect as the proposed check.
Yes
>
>> or if the previous one is below the address.
>
> Yes, but would it compensate additional check on each 'tmp->vm_end > addr'
> iteration to the point? That's not obvious.
>
>> Or perhaps I just misread that "not in any vma" note in the changelog.
>>
>> No?
>>
>> Oleg.
>>

i have test it, it works fine. :)
Thanks

2015-12-15 11:47:19

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: [RFC] mm: change find_vma() function

On Tue, Dec 15, 2015 at 02:41:21PM +0800, yalin wang wrote:
>
> > On Dec 15, 2015, at 05:11, Kirill A. Shutemov <[email protected]> wrote:
> >
> > On Mon, Dec 14, 2015 at 06:55:09PM +0100, Oleg Nesterov wrote:
> >> On 12/14, Kirill A. Shutemov wrote:
> >>>
> >>> On Mon, Dec 14, 2015 at 07:02:25PM +0800, yalin wang wrote:
> >>>> change find_vma() to break ealier when found the adderss
> >>>> is not in any vma, don't need loop to search all vma.
> >>>>
> >>>> Signed-off-by: yalin wang <[email protected]>
> >>>> ---
> >>>> mm/mmap.c | 3 +++
> >>>> 1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/mm/mmap.c b/mm/mmap.c
> >>>> index b513f20..8294c9b 100644
> >>>> --- a/mm/mmap.c
> >>>> +++ b/mm/mmap.c
> >>>> @@ -2064,6 +2064,9 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> >>>> vma = tmp;
> >>>> if (tmp->vm_start <= addr)
> >>>> break;
> >>>> + if (!tmp->vm_prev || tmp->vm_prev->vm_end <= addr)
> >>>> + break;
> >>>> +
> >>>
> >>> This 'break' would return 'tmp' as found vma.
> >>
> >> But this would be right?
> >
> > Hm. Right. Sorry for my tone.
> >
> > I think the right condition is 'tmp->vm_prev->vm_end < addr', not '<=' as
> > vm_end is the first byte after the vma. But it's equivalent in practice
> > here.
> >
> this should be <= here,
> because vma’s effect address space doesn’t include vm_end add,
> so if an address vm_end <= add , this means this addr don’t belong to this vma,

We need to return the vma with lowest vm_end that satisifes "addr <
vm_end", so if vm_prev has "addr >= vm_end" would imply we can return
"tmp", that includes if addr == vm_prev->vm_end yes. If we'd spend CPU
for this it's worth to optimize for the case of tmp->vm_start ==
tmp->vm_prev->vm_end too and stop in such case too.

>
> > Anyway, I don't think it's possible to gain anything measurable from this
> > optimization.
> >
> the advantage is that if addr don’t belong to any vma, we don’t need loop all vma,
> we can break earlier if we found the most closest vma which vma->end_add > addr,

But that costs CPU for all cases were we cannot stop: it would waste
cachelines to check vmas that we would otherwise not even touch at
all. All those vmas are in different addresses, they're not
contiguous, plus the vma is pretty large object anyway (even if they
were contiguous).

So this will requires 2 cachelines instead of 1, for each vma we
encounter during the rbtree lookup. And for a large tree we may not
have to rescan those vmas of the vm_prev while moving down, so even if
the CPU cache could fit those extra vm_prev cachelines they would be
just useless as we continue the lookup down the tree.

So if a tree is very large and this optimization only allows us to
skip the last few level of tree walk, it'll still double up the
cachline cost of all the upper level lookups. Which may greatly exceed
the last few levels we skept.

Proper (probably non trivial) math could calculate the exact
probability this ends up being an optimization vs a slowdown. My gut
feeling is that for a small tree this increases performance, for a
large tree this decreases performance and the breakpoint is somewhere
in the middle. However a small tree runs fast anyway.

Last but not the least, one easy thing we can trivially tell already,
is that the worst case latency for the worst possible slowest lookup
would definitely be worsened and it would almost double with this
patch applied. So this is enough for me to be against applying this
patch.

All special cases we already hard optimize it with vmacache_find so if
that is about a special case with userland knowledge that should go
elsewhere and we already do those kind of fast path optimizations.

The patched code is the one that runs if we're not in a special
userland fast path case and to me the highest priority for this piece
of code is to optimize to guarantee the least harmful worst case for
the most inefficient lookup, and this patch would make the worst case
twice as slow as it is now for the worst case.

I think keeping the worst case as fast as possible is the highest
priority here.

Thanks,
Andrea

2015-12-15 11:53:49

by Kirill A. Shutemov

[permalink] [raw]

Subject: Re: [RFC] mm: change find_vma() function

On Tue, Dec 15, 2015 at 02:41:21PM +0800, yalin wang wrote:
> > On Dec 15, 2015, at 05:11, Kirill A. Shutemov <[email protected]> wrote:
> > Anyway, I don't think it's possible to gain anything measurable from this
> > optimization.
> >
> the advantage is that if addr don’t belong to any vma, we don’t need loop all vma,
> we can break earlier if we found the most closest vma which vma->end_add > addr,

Do you have any workload which can demonstrate the advantage?

--
Kirill A. Shutemov

2015-12-22 05:31:33

by yalin wang

[permalink] [raw]

Subject: Re: [RFC] mm: change find_vma() function

> On Dec 15, 2015, at 19:53, Kirill A. Shutemov <[email protected]> wrote:
>
> On Tue, Dec 15, 2015 at 02:41:21PM +0800, yalin wang wrote:
>>> On Dec 15, 2015, at 05:11, Kirill A. Shutemov <[email protected]> wrote:
>>> Anyway, I don't think it's possible to gain anything measurable from this
>>> optimization.
>>>
>> the advantage is that if addr don’t belong to any vma, we don’t need loop all vma,
>> we can break earlier if we found the most closest vma which vma->end_add > addr,
>
> Do you have any workload which can demonstrate the advantage?
>
> —
i add the log in find_vma() to see the call stack ,
it is very efficient in mmap() / munmap / do_execve() / get_unmaped_area() /
mem_cgroup_move_task()->walk_page_range()->find_vma() call ,

in most time the loop will break after search about 7 vm,
i don’t consider the cache pollution problem in this patch,
yeah, this patch will check the vm_prev->vm_end for every loop,
but this only happened when tmp->vm_end > addr ,
if you don’t not check this , you will continue to loop to check next rb ,
this will also pollute the cache ,

so the question is which one is better ?
i don’t have a better method to test this .
Any good ideas about this ?
how to test it ?

Thanks