2013-03-31 23:46:47

by Minchan Kim

[permalink] [raw]
Subject: [PATCH] THP: Use explicit memory barrier

__do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
spinlock for making sure that clear_huge_page write become visible
after set set_pmd_at() write.

But lru_cache_add_lru uses pagevec so it could miss spinlock
easily so above rule was broken so user may see inconsistent data.

This patch fixes it with using explict barrier rather than depending
on lru spinlock.

Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/huge_memory.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bfa142e..fad800e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
pmd_t entry;
entry = mk_huge_pmd(page, vma);
/*
- * The spinlocking to take the lru_lock inside
- * page_add_new_anon_rmap() acts as a full memory
- * barrier to be sure clear_huge_page writes become
- * visible after the set_pmd_at() write.
+ * clear_huge_page write become visible after the
+ * set_pmd_at() write.
*/
+ smp_wmb();
page_add_new_anon_rmap(page, vma, haddr);
set_pmd_at(mm, haddr, pmd, entry);
pgtable_trans_huge_deposit(mm, pgtable);
--
1.8.2


2013-04-01 01:02:23

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH] THP: Use explicit memory barrier

(2013/04/01 8:45), Minchan Kim wrote:
> __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> spinlock for making sure that clear_huge_page write become visible
> after set set_pmd_at() write.
>
> But lru_cache_add_lru uses pagevec so it could miss spinlock
> easily so above rule was broken so user may see inconsistent data.
> This patch fixes it with using explict barrier rather than depending
> on lru spinlock.
>

Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
Users can see non-zero value after page fault in theory ?

Thanks,
-Kame

2013-04-01 04:45:19

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH] THP: Use explicit memory barrier

Hey Kame,

On Mon, Apr 01, 2013 at 10:01:49AM +0900, Kamezawa Hiroyuki wrote:
> (2013/04/01 8:45), Minchan Kim wrote:
> > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > spinlock for making sure that clear_huge_page write become visible
> > after set set_pmd_at() write.
> >
> > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > easily so above rule was broken so user may see inconsistent data.
> > This patch fixes it with using explict barrier rather than depending
> > on lru spinlock.
> >
>
> Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> Users can see non-zero value after page fault in theory ?

Maybe, but as you know well, we didn't see any report about that until now
so I'm not sure. Ccing people in this thread could have clear answer rather
than me.

In THP, I just found inconsistency between Andrea's comment and code.
That's why I sent a patch.

If your concern turns out truth, Ya, you might find ancient bug. :)
Thanks.

>
> Thanks,
> -Kame
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Kind regards,
Minchan Kim

2013-04-01 23:35:42

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] THP: Use explicit memory barrier

On Mon, 1 Apr 2013, Minchan Kim wrote:

> __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> spinlock for making sure that clear_huge_page write become visible
> after set set_pmd_at() write.
>
> But lru_cache_add_lru uses pagevec so it could miss spinlock
> easily so above rule was broken so user may see inconsistent data.
>
> This patch fixes it with using explict barrier rather than depending
> on lru spinlock.
>

Is this the same issue that Andrea responded to in the "thp and memory
barrier assumptions" thread at http://marc.info/?t=134333512700004 ?

2013-04-02 00:37:50

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH] THP: Use explicit memory barrier

On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> On Mon, 1 Apr 2013, Minchan Kim wrote:
>
> > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > spinlock for making sure that clear_huge_page write become visible
> > after set set_pmd_at() write.
> >
> > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > easily so above rule was broken so user may see inconsistent data.
> >
> > This patch fixes it with using explict barrier rather than depending
> > on lru spinlock.
> >
>
> Is this the same issue that Andrea responded to in the "thp and memory
> barrier assumptions" thread at http://marc.info/?t=134333512700004 ?

Yes and Peter pointed out further step.
Thanks for pointing out.
Not that I know that Andrea alreay noticed it, I don't care about this
patch.

Remaining question is Kame's one.
Isn't there anyone could answer it?

>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Kind regards,
Minchan Kim

2013-04-02 19:20:46

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] THP: Use explicit memory barrier

On Tue, 2 Apr 2013, Minchan Kim wrote:

> Yes and Peter pointed out further step.
> Thanks for pointing out.
> Not that I know that Andrea alreay noticed it, I don't care about this
> patch.
>

Andrea, do you have time to send

c08e0c9ee786 ("thp: add memory barrier to __do_huge_pmd_anonymous_page")
b08d75a595ec ("thp: document barrier() in wrprotect THP fault path")

from the master branch of aa.git to Andrew? I would do it, but one isn't
signed-off in your tree.

2013-04-02 19:30:37

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] THP: Use explicit memory barrier

On Tue, 2 Apr 2013, Minchan Kim wrote:
> On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> > On Mon, 1 Apr 2013, Minchan Kim wrote:
> >
> > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > > spinlock for making sure that clear_huge_page write become visible
> > > after set set_pmd_at() write.
> > >
> > > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > > easily so above rule was broken so user may see inconsistent data.
> > >
> > > This patch fixes it with using explict barrier rather than depending
> > > on lru spinlock.
> > >
> >
> > Is this the same issue that Andrea responded to in the "thp and memory
> > barrier assumptions" thread at http://marc.info/?t=134333512700004 ?
>
> Yes and Peter pointed out further step.
> Thanks for pointing out.
> Not that I know that Andrea alreay noticed it, I don't care about this
> patch.
>
> Remaining question is Kame's one.
> > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> > Users can see non-zero value after page fault in theory ?
> Isn't there anyone could answer it?

See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us

static inline void __SetPageUptodate(struct page *page)
{
smp_wmb();
__set_bit(PG_uptodate, &(page)->flags);
}

So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe
to me already, though the huge_memory one could do with a fixed comment.

Hugh

2013-04-03 00:14:07

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH] THP: Use explicit memory barrier

On Tue, Apr 02, 2013 at 12:30:15PM -0700, Hugh Dickins wrote:
> On Tue, 2 Apr 2013, Minchan Kim wrote:
> > On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> > > On Mon, 1 Apr 2013, Minchan Kim wrote:
> > >
> > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > > > spinlock for making sure that clear_huge_page write become visible
> > > > after set set_pmd_at() write.
> > > >
> > > > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > > > easily so above rule was broken so user may see inconsistent data.
> > > >
> > > > This patch fixes it with using explict barrier rather than depending
> > > > on lru spinlock.
> > > >
> > >
> > > Is this the same issue that Andrea responded to in the "thp and memory
> > > barrier assumptions" thread at http://marc.info/?t=134333512700004 ?
> >
> > Yes and Peter pointed out further step.
> > Thanks for pointing out.
> > Not that I know that Andrea alreay noticed it, I don't care about this
> > patch.
> >
> > Remaining question is Kame's one.
> > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> > > Users can see non-zero value after page fault in theory ?
> > Isn't there anyone could answer it?
>
> See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us
>
> static inline void __SetPageUptodate(struct page *page)
> {
> smp_wmb();
> __set_bit(PG_uptodate, &(page)->flags);
> }
>
> So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe
> to me already, though the huge_memory one could do with a fixed comment.

Thanks you very much!
That's one everybody are really missing.

Here it goes!

==================== 8< =====================

>From fb0b9f3df698547bfb70f81d85e0d1e00f19e1fc Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Wed, 3 Apr 2013 08:53:27 +0900
Subject: [PATCH] THP: fix comment about memory barrier

Now, memory barrier in __do_huge_pmd_anonymous_page doesn't work.
Because lru_cache_add_lru uses pagevec so it could miss spinlock
easily so above rule was broken so user might see inconsistent data.

I was not first person who pointed out the problem. Mel and Peter
pointed out a few months ago and Peter pointed out further that
even spin_lock/unlock can't make sure it.
http://marc.info/?t=134333512700004

In particular:

*A = a;
LOCK
UNLOCK
*B = b;

may occur as:

LOCK, STORE *B, STORE *A, UNLOCK

At last, Hugh pointed out that even we don't need memory barrier
in there because __SetPageUpdate already have done it from
Nick's [1] explicitly.

So this patch fixes comment on THP and adds same comment for
do_anonymous_page, too because everybody except Hugh was missing
that. It means we needs COMMENT about that.

[1] 0ed361dec "mm: fix PageUptodate data race"

Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kamezawa Hiroyuki <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/huge_memory.c | 11 +++++------
mm/memory.c | 5 +++++
2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e2f7f5aa..f2f17ff 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -713,6 +713,11 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
return VM_FAULT_OOM;

clear_huge_page(page, haddr, HPAGE_PMD_NR);
+ /*
+ * The memory barrier inside __SetPageUptodate makes sure that
+ * clear_huge_page writes become visible after the set_pmd_at()
+ * write.
+ */
__SetPageUptodate(page);

spin_lock(&mm->page_table_lock);
@@ -724,12 +729,6 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
} else {
pmd_t entry;
entry = mk_huge_pmd(page, vma);
- /*
- * The spinlocking to take the lru_lock inside
- * page_add_new_anon_rmap() acts as a full memory
- * barrier to be sure clear_huge_page writes become
- * visible after the set_pmd_at() write.
- */
page_add_new_anon_rmap(page, vma, haddr);
set_pmd_at(mm, haddr, pmd, entry);
pgtable_trans_huge_deposit(mm, pgtable);
diff --git a/mm/memory.c b/mm/memory.c
index 494526a..d0da51e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3196,6 +3196,11 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = alloc_zeroed_user_highpage_movable(vma, address);
if (!page)
goto oom;
+ /*
+ * The memory barrier inside __SetPageUptodate makes sure that
+ * preceeding stores to the page contents become visible after
+ * the set_pte_at() write.
+ */
__SetPageUptodate(page);

if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))
--
1.8.2

--
Kind regards,
Minchan Kim

2013-04-04 02:45:34

by Simon Jeons

[permalink] [raw]
Subject: Re: [PATCH] THP: Use explicit memory barrier

Hi Minchan,
On 04/01/2013 07:45 AM, Minchan Kim wrote:
> __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> spinlock for making sure that clear_huge_page write become visible
> after set set_pmd_at() write.

1. There are no pte modify, why take page_table_lock here?
2. What's the meaning of "clear_huge_page write become visible after set
set_pmd_at() write"?

>
> But lru_cache_add_lru uses pagevec so it could miss spinlock
> easily so above rule was broken so user may see inconsistent data.
>
> This patch fixes it with using explict barrier rather than depending
> on lru spinlock.
>
> Cc: Mel Gorman <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/huge_memory.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bfa142e..fad800e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
> pmd_t entry;
> entry = mk_huge_pmd(page, vma);
> /*
> - * The spinlocking to take the lru_lock inside
> - * page_add_new_anon_rmap() acts as a full memory
> - * barrier to be sure clear_huge_page writes become
> - * visible after the set_pmd_at() write.
> + * clear_huge_page write become visible after the
> + * set_pmd_at() write.
> */
> + smp_wmb();
> page_add_new_anon_rmap(page, vma, haddr);
> set_pmd_at(mm, haddr, pmd, entry);
> pgtable_trans_huge_deposit(mm, pgtable);

2013-04-04 13:46:46

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] THP: Use explicit memory barrier

On Wed, Apr 03, 2013 at 09:14:01AM +0900, Minchan Kim wrote:
> clear_huge_page(page, haddr, HPAGE_PMD_NR);
> + /*
> + * The memory barrier inside __SetPageUptodate makes sure that
> + * clear_huge_page writes become visible after the set_pmd_at()

s/after/before/

> + * write.
> + */
> __SetPageUptodate(page);
>
> spin_lock(&mm->page_table_lock);
> @@ -724,12 +729,6 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
> } else {
> pmd_t entry;
> entry = mk_huge_pmd(page, vma);
> - /*
> - * The spinlocking to take the lru_lock inside
> - * page_add_new_anon_rmap() acts as a full memory
> - * barrier to be sure clear_huge_page writes become
> - * visible after the set_pmd_at() write.
> - */
> page_add_new_anon_rmap(page, vma, haddr);
> set_pmd_at(mm, haddr, pmd, entry);
> pgtable_trans_huge_deposit(mm, pgtable);
> diff --git a/mm/memory.c b/mm/memory.c
> index 494526a..d0da51e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3196,6 +3196,11 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
> page = alloc_zeroed_user_highpage_movable(vma, address);
> if (!page)
> goto oom;
> + /*
> + * The memory barrier inside __SetPageUptodate makes sure that
> + * preceeding stores to the page contents become visible after
> + * the set_pte_at() write.
> + */

s/after/before/

After the above correction it looks nice cleanup, thanks!

Acked-by: Andrea Arcangeli <[email protected]>