2009-06-05 14:36:15

by Minchan Kim

[permalink] [raw]
Subject: [RFC] remove page_table_lock in anon_vma_prepare

As I looked over the page_table_lock, it related to page table not anon_vma

I think anon_vma->lock can protect race against threads.
Do I miss something ?

If I am right, we can remove unnecessary page_table_lock holding
in anon_vma_prepare. We can get performance benefit.

Signed-off-by: Minchan Kim <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Nick Piggin <[email protected]>
---
mm/rmap.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index b5c6e12..65b4877 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -113,14 +113,11 @@ int anon_vma_prepare(struct vm_area_struct *vma)
}
spin_lock(&anon_vma->lock);

- /* page_table_lock to protect against threads */
- spin_lock(&mm->page_table_lock);
if (likely(!vma->anon_vma)) {
vma->anon_vma = anon_vma;
list_add_tail(&vma->anon_vma_node, &anon_vma->head);
allocated = NULL;
}
- spin_unlock(&mm->page_table_lock);

spin_unlock(&anon_vma->lock);
if (unlikely(allocated))
--
1.5.6.5


2009-06-05 18:42:58

by Hugh Dickins

[permalink] [raw]
Subject: Re: [RFC] remove page_table_lock in anon_vma_prepare

On Fri, 5 Jun 2009, Minchan Kim wrote:

> As I looked over the page_table_lock, it related to page table not anon_vma
>
> I think anon_vma->lock can protect race against threads.
> Do I miss something ?
>
> If I am right, we can remove unnecessary page_table_lock holding
> in anon_vma_prepare. We can get performance benefit.
>
> Signed-off-by: Minchan Kim <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Nick Piggin <[email protected]>

No, NAK to this one. Look above the context shown in the patch:

anon_vma = find_mergeable_anon_vma(vma);
allocated = NULL;
if (!anon_vma) {
anon_vma = anon_vma_alloc();
if (unlikely(!anon_vma))
return -ENOMEM;
allocated = anon_vma;
}
spin_lock(&anon_vma->lock);

So if find_mergeable_anon_vma failed to find a suitable neighbouring
vma to share with, we'll have got the anon_vma from anon_vma_alloc().

Two threads could perfectly well do that concurrently (mmap_sem is
held only for reading), each allocating a separate fresh anon_vma,
then they'd each do spin_lock(&anon_vma->lock), but on _different_
anon_vmas, so wouldn't exclude each other at all: we need a common
lock to exclude that race, and abuse page_table_lock for the purpose.

(As I expect you've noticed, we used not to bother with the spin_lock
on anon_vma->lock when we'd freshly allocated the anon_vma, it looks
as if it's unnecessary. But in fact Nick and Linus found there's a
subtle reason why it is necessary even then - hopefully the git log
explains it, or I could look up the mails if you want, but at this
moment the details escape me.

And do we need the page_table_lock even when find_mergeable_anon_vma
succeeds? That also looks as if it's unnecessary, but I've the ghost
of a memory that it's needed even for that case: I seem to remember
that there can be a benign race where find_mergeable_anon_vma called
by concurrent threads could actually return different anon_vmas.
That also is something I don't want to think too deeply into at
this instant, but beg me if you wish!)

Hugh

> ---
> mm/rmap.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b5c6e12..65b4877 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -113,14 +113,11 @@ int anon_vma_prepare(struct vm_area_struct *vma)
> }
> spin_lock(&anon_vma->lock);
>
> - /* page_table_lock to protect against threads */
> - spin_lock(&mm->page_table_lock);
> if (likely(!vma->anon_vma)) {
> vma->anon_vma = anon_vma;
> list_add_tail(&vma->anon_vma_node, &anon_vma->head);
> allocated = NULL;
> }
> - spin_unlock(&mm->page_table_lock);
>
> spin_unlock(&anon_vma->lock);
> if (unlikely(allocated))
> --
> 1.5.6.5

2009-06-07 15:16:24

by Minchan Kim

[permalink] [raw]
Subject: Re: [RFC] remove page_table_lock in anon_vma_prepare

Hi, Hugh.

On Sat, Jun 6, 2009 at 3:26 AM, Hugh Dickins<[email protected]> wrote:
> On Fri, 5 Jun 2009, Minchan Kim wrote:
>
>> As I looked over the page_table_lock, it related to page table not anon_vma
>>
>> I think anon_vma->lock can protect race against threads.
>> Do I miss something ?
>>
>> If I am right, we can remove unnecessary page_table_lock holding
>> in anon_vma_prepare. We can get performance benefit.
>>
>> Signed-off-by: Minchan Kim <[email protected]>
>> Cc: Hugh Dickins <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Nick Piggin <[email protected]>
>
> No, NAK to this one.  Look above the context shown in the patch:
>
>                anon_vma = find_mergeable_anon_vma(vma);
>                allocated = NULL;
>                if (!anon_vma) {
>                        anon_vma = anon_vma_alloc();
>                        if (unlikely(!anon_vma))
>                                return -ENOMEM;
>                        allocated = anon_vma;
>                }
>                spin_lock(&anon_vma->lock);
>
> So if find_mergeable_anon_vma failed to find a suitable neighbouring
> vma to share with, we'll have got the anon_vma from anon_vma_alloc().
>
> Two threads could perfectly well do that concurrently (mmap_sem is
> held only for reading), each allocating a separate fresh anon_vma,
> then they'd each do spin_lock(&anon_vma->lock), but on _different_
> anon_vmas, so wouldn't exclude each other at all: we need a common
> lock to exclude that race, and abuse page_table_lock for the purpose.

Indeed!
I have missed it until now.
In fact, I expected whoever expert like you point me out.


> (As I expect you've noticed, we used not to bother with the spin_lock
> on anon_vma->lock when we'd freshly allocated the anon_vma, it looks
> as if it's unnecessary.  But in fact Nick and Linus found there's a
> subtle reason why it is necessary even then - hopefully the git log
> explains it, or I could look up the mails if you want, but at this
> moment the details escape me.

Hmm. I didn't follow up that at that time.

After you noticed me, I found that.
commit d9d332e0874f46b91d8ac4604b68ee42b8a7a2c6
Author: Linus Torvalds <[email protected]>
Date: Sun Oct 19 10:32:20 2008 -0700

anon_vma_prepare: properly lock even newly allocated entries

It's subtle race so I can't digest it fully but I can understand that
following as.

If we don't hold lock at fresh anon_vma, it can be removed and
reallocated by other threads since other cpu's can find it, free,
reallocate before first thread which call anon_vma_prepare adds
anon_vma to list after vma->anon_vma = anon_vma

I hope my above explanation is right :)

> And do we need the page_table_lock even when find_mergeable_anon_vma
> succeeds?  That also looks as if it's unnecessary, but I've the ghost
> of a memory that it's needed even for that case: I seem to remember
> that there can be a benign race where find_mergeable_anon_vma called
> by concurrent threads could actually return different anon_vmas.
> That also is something I don't want to think too deeply into at
> this instant, but beg me if you wish!)

Unfortunately I can't found this issue mail or changelog.
Hugh. Could you explain this issue more detail in your convenient time ?
I don't mind you ignore me. I don't want you to be busy from me. :)

I always thanks for your kind explanation and learns lots of thing from you. :)
Thanks again.

--
Kinds regards,
Minchan Kim

2009-06-07 16:28:41

by Hugh Dickins

[permalink] [raw]
Subject: Re: [RFC] remove page_table_lock in anon_vma_prepare

On Mon, 8 Jun 2009, Minchan Kim wrote:
> On Sat, Jun 6, 2009 at 3:26 AM, Hugh Dickins<[email protected]> wrote:
> > On Fri, 5 Jun 2009, Minchan Kim wrote:
>
> > (As I expect you've noticed, we used not to bother with the spin_lock
> > on anon_vma->lock when we'd freshly allocated the anon_vma, it looks
> > as if it's unnecessary.  But in fact Nick and Linus found there's a
> > subtle reason why it is necessary even then - hopefully the git log

Actually, Linus put a lot of his git comment into the comment above
anon_vma_prepare(); but it doesn't pin down the case Nick identified
as well as Nick's original mail.

> > explains it, or I could look up the mails if you want, but at this
> > moment the details escape me.
>
> Hmm. I didn't follow up that at that time.
>
> After you noticed me, I found that.
> commit d9d332e0874f46b91d8ac4604b68ee42b8a7a2c6
> Author: Linus Torvalds <[email protected]>
> Date: Sun Oct 19 10:32:20 2008 -0700
>
> anon_vma_prepare: properly lock even newly allocated entries
>
> It's subtle race so I can't digest it fully but I can understand that
> following as.
>
> If we don't hold lock at fresh anon_vma, it can be removed and
> reallocated by other threads since other cpu's can find it, free,
> reallocate before first thread which call anon_vma_prepare adds
> anon_vma to list after vma->anon_vma = anon_vma
>
> I hope my above explanation is right :)

Not really: I don't think there was a risk of it getting freed at
that point, but there was a risk of its list head getting dereferenced
before we'd initialized it.

Here's a link to Nick's 16oct08 linux-mm mail on the subject, then you
can follow the thread from there. In brief, IIRC, Nick found a race
which he proposed to fix with barriers, but in the end we were all
much happier just taking the anon_vma lock in all cases.

http://marc.info/?l=linux-mm&m=122413030612659&w=2

>
> > And do we need the page_table_lock even when find_mergeable_anon_vma
> > succeeds?  That also looks as if it's unnecessary, but I've the ghost
> > of a memory that it's needed even for that case: I seem to remember
> > that there can be a benign race where find_mergeable_anon_vma called
> > by concurrent threads could actually return different anon_vmas.
> > That also is something I don't want to think too deeply into at
> > this instant, but beg me if you wish!)
>
> Unfortunately I can't found this issue mail or changelog.
> Hugh. Could you explain this issue more detail in your convenient time ?

Sure, I remembered it once I went to bed that night, it's an easy one;
wasn't ever discussed on list, just something I'd been aware of.

Remember that anon_vma_prepare() gets called at fault time, when we
have only down_read of mmap_sem, so there may well be concurrent faults.

find_mergeable_anon_vma looks at the vma on either side of our faulting
vma, to see if the neighbouring vma already has an anon_vma, which we'd
be wise to use if that vma could plausibly be merged with our vma later
e.g. mprotect may have temporarily split ours from the next, but another
mprotect may make them mergeable - it would be a pity to be prevented
from merging them just because we'd already attached distinct anon_vmas.

But, as I said, there may well be concurrent faults, on ours and on
neighbouring vmas: so one call to find_mergeable_anon_vma on our vma
may find that the next vma has no anon_vma yet, but the prev has one,
so it returns the prev's anon_vma; but a racing fault on the next
vma immediately gives it an anon_vma, and a racing fault on our vma
finds that, so its find_mergeable_anon_vma returns the next's anon_vma.

So the two faults on our vma could both be in anon_vma_prepare(),
doing the spin_lock(&anon_vma->lock) on find_mergeable_anon_vma's
anon_vma, but those could still be different anon_vmas: but if
both lock the page_table_lock, we can be sure to catch that case.

When I said the race was benign, I meant that it doesn't matter in
such a case which one we choose; but we don't want to choose both!

Hugh

2009-06-07 23:50:19

by Minchan Kim

[permalink] [raw]
Subject: Re: [RFC] remove page_table_lock in anon_vma_prepare

On Mon, Jun 8, 2009 at 1:28 AM, Hugh Dickins<[email protected]> wrote:
> On Mon, 8 Jun 2009, Minchan Kim wrote:
>> On Sat, Jun 6, 2009 at 3:26 AM, Hugh Dickins<[email protected]> wrote:
>> > On Fri, 5 Jun 2009, Minchan Kim wrote:
>>
>> > (As I expect you've noticed, we used not to bother with the spin_lock
>> > on anon_vma->lock when we'd freshly allocated the anon_vma, it looks
>> > as if it's unnecessary.  But in fact Nick and Linus found there's a
>> > subtle reason why it is necessary even then - hopefully the git log
>
> Actually, Linus put a lot of his git comment into the comment above
> anon_vma_prepare(); but it doesn't pin down the case Nick identified
> as well as Nick's original mail.
>
>> > explains it, or I could look up the mails if you want, but at this
>> > moment the details escape me.
>>
>> Hmm. I didn't follow up that at that time.
>>
>> After you noticed me, I found that.
>> commit d9d332e0874f46b91d8ac4604b68ee42b8a7a2c6
>> Author: Linus Torvalds <[email protected]>
>> Date:   Sun Oct 19 10:32:20 2008 -0700
>>
>>     anon_vma_prepare: properly lock even newly allocated entries
>>
>> It's subtle race so I can't digest it fully but I can understand that
>> following as.
>>
>> If we don't hold lock at fresh anon_vma, it can be removed and
>> reallocated by other threads since other cpu's can find it, free,
>> reallocate before first thread which call anon_vma_prepare adds
>> anon_vma to list after vma->anon_vma = anon_vma
>>
>> I hope my above explanation is right :)
>
> Not really: I don't think there was a risk of it getting freed at
> that point, but there was a risk of its list head getting dereferenced
> before we'd initialized it.
>
> Here's a link to Nick's 16oct08 linux-mm mail on the subject, then you
> can follow the thread from there.  In brief, IIRC, Nick found a race
> which he proposed to fix with barriers, but in the end we were all
> much happier just taking the anon_vma lock in all cases.
>
> http://marc.info/?l=linux-mm&m=122413030612659&w=2

Huge long.
Thanks for searching it for me.
I will read the thread and digest it. ;-)

>>
>> > And do we need the page_table_lock even when find_mergeable_anon_vma
>> > succeeds?  That also looks as if it's unnecessary, but I've the ghost
>> > of a memory that it's needed even for that case: I seem to remember
>> > that there can be a benign race where find_mergeable_anon_vma called
>> > by concurrent threads could actually return different anon_vmas.
>> > That also is something I don't want to think too deeply into at
>> > this instant, but beg me if you wish!)
>>
>> Unfortunately I can't found this issue mail or changelog.
>> Hugh. Could you explain this issue more detail in your convenient time ?
>
> Sure, I remembered it once I went to bed that night, it's an easy one;
> wasn't ever discussed on list, just something I'd been aware of.
>
> Remember that anon_vma_prepare() gets called at fault time, when we
> have only down_read of mmap_sem, so there may well be concurrent faults.
>
> find_mergeable_anon_vma looks at the vma on either side of our faulting
> vma, to see if the neighbouring vma already has an anon_vma, which we'd
> be wise to use if that vma could plausibly be merged with our vma later
> e.g. mprotect may have temporarily split ours from the next, but another
> mprotect may make them mergeable - it would be a pity to be prevented
> from merging them just because we'd already attached distinct anon_vmas.

Absolutely.

> But, as I said, there may well be concurrent faults, on ours and on
> neighbouring vmas: so one call to find_mergeable_anon_vma on our vma
> may find that the next vma has no anon_vma yet, but the prev has one,
> so it returns the prev's anon_vma; but a racing fault on the next
> vma immediately gives it an anon_vma, and a racing fault on our vma
> finds that, so its find_mergeable_anon_vma returns the next's anon_vma.
>
> So the two faults on our vma could both be in anon_vma_prepare(),
> doing the spin_lock(&anon_vma->lock) on find_mergeable_anon_vma's
> anon_vma, but those could still be different anon_vmas: but if
> both lock the page_table_lock, we can be sure to catch that case.

I can understand it completely.
Thanks for quick replay and good explanation.

I expect this thread can help other some day. :)

>
> When I said the race was benign, I meant that it doesn't matter in
> such a case which one we choose; but we don't want to choose both!
>
> Hugh



--
Kinds regards,
Minchan Kim