2015-05-11 07:51:39

by Vladimir Davydov

Subject: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list

Hi,

I've been arguing with Minchan for a while about whether store-tearing
is possible while setting page->mapping in __page_set_anon_rmap and
friends, see

http://thread.gmane.org/gmane.linux.kernel.mm/131949/focus=132132

This patch is intended to draw attention to this discussion. It fixes a
race that could happen if store-tearing were possible. The race is as
follows.

In do_wp_page() we can call page_move_anon_rmap(), which sets
page->mapping as follows:

	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
	page->mapping = (struct address_space *) anon_vma;

The page in question may be on an LRU list, because nowhere in
do_wp_page() do we remove it from the list, nor do we take any
LRU-related locks. Although the page is locked, shrink_active_list() can
still call page_referenced() on it concurrently, because the latter does
not require an anonymous page to be locked.
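
For reference, the check that lets page_referenced() skip the page lock
for ordinary anon pages looks roughly like this (paraphrasing mm/rmap.c
from memory, not a verbatim quote):

	/* page_referenced(): only file-backed or KSM pages get locked here */
	if (!is_locked && (!PageAnon(page) || PageKsm(page))) {
		we_locked = trylock_page(page);
		if (!we_locked)
			return 1;
	}
	/* ... and then it proceeds to rmap_walk() without holding the
	 * page lock for a plain anonymous page */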

If the store tearing described in the thread were possible, we could
face the following race, resulting in a kernel panic:

CPU0                                  CPU1
----                                  ----
do_wp_page                            shrink_active_list
 lock_page                             page_referenced
                                        PageAnon->yes, so skip trylock_page
 page_move_anon_rmap
  page->mapping = anon_vma
                                        rmap_walk
                                         PageAnon->no
                                          rmap_walk_file
                                           BUG
  page->mapping += PAGE_MAPPING_ANON
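
Spelled out in C, the diagram assumes the compiler would effectively
rewrite the single combined store as two stores, along these lines
(purely hypothetical -- I know of no compiler that emits this, it is
just what the race relies on; here anon_vma denotes the raw pointer
before the flag is folded in):

	page->mapping = (struct address_space *) anon_vma;	/* (1) bare pointer */
	page->mapping = (struct address_space *)
			((void *) page->mapping + PAGE_MAPPING_ANON);	/* (2) flag set */

A PageAnon() check that runs between (1) and (2) sees the flag clear and
treats the page as file-backed.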

This patch fixes the race by explicitly forbidding the compiler to split
the page->mapping store in __page_set_anon_rmap() and friends, and the
corresponding load in PageAnon(), with the aid of WRITE_ONCE/READ_ONCE.
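
For reference, on a pointer-sized field these macros boil down to
volatile accesses, roughly as follows (simplified sketch; the real
definitions in include/linux/compiler.h do size dispatching and more):

	#define WRITE_ONCE_sketch(x, val)	(*(volatile typeof(x) *)&(x) = (val))
	#define READ_ONCE_sketch(x)		(*(volatile typeof(x) *)&(x))

A volatile access to a naturally aligned, machine-word-sized object is
not expected to be torn, re-read or dropped by the compiler, which is
exactly the property needed here.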

Personally, I don't believe that this can ever happen on any sane
compiler, because such an "optimization" would only result in two stores
vs one (note, anon_vma is not a constant), but since I can be mistaken I
would like to hear from synchronization experts what they think about
it.

Thanks,
Vladimir
---
include/linux/page-flags.h | 3 ++-
mm/rmap.c | 6 +++---
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5e7c4f50a644..a529e0a35fe9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -320,7 +320,8 @@ PAGEFLAG(Idle, idle)
 
 static inline int PageAnon(struct page *page)
 {
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return ((unsigned long)READ_ONCE(page->mapping) &
+		PAGE_MAPPING_ANON) != 0;
 }
 
 #ifdef CONFIG_KSM
diff --git a/mm/rmap.c b/mm/rmap.c
index eca7416f55d7..aa60c63704e6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -958,7 +958,7 @@ void page_move_anon_rmap(struct page *page,
 	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
 
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
-	page->mapping = (struct address_space *) anon_vma;
+	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 }
 
 /**
@@ -987,7 +987,7 @@ static void __page_set_anon_rmap(struct page *page,
 		anon_vma = anon_vma->root;
 
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
-	page->mapping = (struct address_space *) anon_vma;
+	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 	page->index = linear_page_index(vma, address);
 }
 
@@ -1579,7 +1579,7 @@ static void __hugepage_set_anon_rmap(struct page *page,
 		anon_vma = anon_vma->root;
 
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
-	page->mapping = (struct address_space *) anon_vma;
+	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 	page->index = linear_page_index(vma, address);
 }

--
1.7.10.4


2015-05-11 08:59:30

by yalin wang

Subject: Re: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list

2015-05-11 15:51 GMT+08:00 Vladimir Davydov <[email protected]>:
> [...]

I am confused by your analysis.
For the race stack:

CPU0                                  CPU1
----                                  ----
do_wp_page                            shrink_active_list
 lock_page                             page_referenced
                                        PageAnon->yes, so skip trylock_page
 page_move_anon_rmap
  page->mapping = anon_vma
                                        rmap_walk
                                         PageAnon->no
                                          rmap_walk_file
                                           BUG
  page->mapping += PAGE_MAPPING_ANON

The page must change from PageAnon() to !PageAnon() for the crash to
happen, but page_move_anon_rmap() changes a page from !PageAnon()
(a swapcache page) to PageAnon().
How could this race condition cause a crash?

BRs,
Yalin

2015-05-11 09:39:59

by Kirill A. Shutemov

Subject: Re: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list

On Mon, May 11, 2015 at 10:51:17AM +0300, Vladimir Davydov wrote:
> [...]
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5e7c4f50a644..a529e0a35fe9 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -320,7 +320,8 @@ PAGEFLAG(Idle, idle)
>
> static inline int PageAnon(struct page *page)
> {
> - return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
> + return ((unsigned long)READ_ONCE(page->mapping) &
> + PAGE_MAPPING_ANON) != 0;

Why do we need this? The write side should be enough to make this
deterministic.

> }
>
> #ifdef CONFIG_KSM
> diff --git a/mm/rmap.c b/mm/rmap.c
> index eca7416f55d7..aa60c63704e6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -958,7 +958,7 @@ void page_move_anon_rmap(struct page *page,
> VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
>
> anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> - page->mapping = (struct address_space *) anon_vma;
> + WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
> }
>
> /**
> @@ -987,7 +987,7 @@ static void __page_set_anon_rmap(struct page *page,
> anon_vma = anon_vma->root;
>
> anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> - page->mapping = (struct address_space *) anon_vma;
> + WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
> page->index = linear_page_index(vma, address);

No need: we don't hit this code if page is already PageAnon().

> }
>
> @@ -1579,7 +1579,7 @@ static void __hugepage_set_anon_rmap(struct page *page,
> anon_vma = anon_vma->root;
>
> anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> - page->mapping = (struct address_space *) anon_vma;
> + WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);

Ditto.

> page->index = linear_page_index(vma, address);
> }

--
Kirill A. Shutemov

2015-05-11 14:29:43

by Paul E. McKenney

Subject: Re: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list

On Mon, May 11, 2015 at 10:51:17AM +0300, Vladimir Davydov wrote:
> [...]
> Personally, I don't believe that this can ever happen on any sane
> compiler, because such an "optimization" would only result in two stores
> vs one (note, anon_vma is not a constant), but since I can be mistaken I
> would like to hear from synchronization experts what they think about
> it.

An example "insane" compiler might notice that the value set cannot be
safely observed without multiple CPUs accessing that variable at the
same time. A paper entitled "No Sane Compiler Would Optimize Atomics"
has some examples:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html

If this paper doesn't scare you, then you didn't read it carefully enough.
And yes, I did give the author a very hard time about the need to suppress
some of these optimizations in order to correctly compile old code, and
will continue to do so. However, a READ_ONCE() would be a most excellent
and very cheap way to future-proof this code, and is highly recommended.
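
As a generic illustration (not taken from the paper, and not a claim
about what the rmap code currently compiles to): without READ_ONCE(),
the compiler is free to reload a plain access, so a check and a later
use of the same field need not agree. Assuming a struct page *page in
scope:

	if ((unsigned long)page->mapping & PAGE_MAPPING_ANON) {		/* load #1 */
		struct anon_vma *av = (struct anon_vma *)
			((unsigned long)page->mapping - PAGE_MAPPING_ANON);	/* load #2 */
		/* nothing stops the compiler from reloading page->mapping
		 * for load #2, so av may be built from a different value
		 * than the one that passed the check above */
	}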

Thanx, Paul


2015-05-12 08:35:01

by Vladimir Davydov

Subject: Re: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list

On Mon, May 11, 2015 at 04:59:27PM +0800, yalin wang wrote:
> I am confused by your analysis.
> For the race stack:
>
> [...]
>
> The page must change from PageAnon() to !PageAnon() for the crash to
> happen, but page_move_anon_rmap() changes a page from !PageAnon()
> (a swapcache page) to PageAnon().

A swapcache page is not necessarily !PageAnon. In do_wp_page() old_page
*is* PageAnon. It may or may not be on the swapcache though, which does
not really matter.
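
To be clear about the path: do_wp_page() only calls page_move_anon_rmap()
on a page that has already passed a PageAnon() check and has been locked,
roughly (paraphrasing mm/memory.c from memory, not verbatim):

	old_page = vm_normal_page(vma, address, orig_pte);
	...
	if (PageAnon(old_page) && !PageKsm(old_page)) {
		/* old_page gets locked here (trylock_page()/lock_page()) */
		if (reuse_swap_page(old_page)) {
			/*
			 * The page is all ours, move it to our anon_vma so
			 * rmap will not search our parent or siblings.
			 */
			page_move_anon_rmap(old_page, vma, address);
		}
		unlock_page(old_page);
		/* ... */
	}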

> How could this race condition cause a crash?

It never happened. It might theoretically happen due to a compiler
"optimization" I described above.

Thanks,
Vladimir

2015-05-12 09:30:37

by Vladimir Davydov

Subject: Re: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list

On Mon, May 11, 2015 at 12:36:52PM +0300, Kirill A. Shutemov wrote:
> On Mon, May 11, 2015 at 10:51:17AM +0300, Vladimir Davydov wrote:
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 5e7c4f50a644..a529e0a35fe9 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -320,7 +320,8 @@ PAGEFLAG(Idle, idle)
> >
> > static inline int PageAnon(struct page *page)
> > {
> > - return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
> > + return ((unsigned long)READ_ONCE(page->mapping) &
> > + PAGE_MAPPING_ANON) != 0;
>
> Why do we need this? The write side should be enough to make this
> deterministic.

Yeah, this seems to be completely redundant, my bad.

>
> > }
> >
> > #ifdef CONFIG_KSM
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index eca7416f55d7..aa60c63704e6 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -958,7 +958,7 @@ void page_move_anon_rmap(struct page *page,
> > VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
> >
> > anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> > - page->mapping = (struct address_space *) anon_vma;
> > + WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
> > }
> >
> > /**
> > @@ -987,7 +987,7 @@ static void __page_set_anon_rmap(struct page *page,
> > anon_vma = anon_vma->root;
> >
> > anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> > - page->mapping = (struct address_space *) anon_vma;
> > + WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
> > page->index = linear_page_index(vma, address);
>
> No need: we don't hit this code if page is already PageAnon().

Agree.

>
> > }
> >
> > @@ -1579,7 +1579,7 @@ static void __hugepage_set_anon_rmap(struct page *page,
> > anon_vma = anon_vma->root;
> >
> > anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> > - page->mapping = (struct address_space *) anon_vma;
> > + WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
>
> Ditto.

Agree.

So we do need this after all, don't we? Frankly, I doubted that, because
the fact that a compiler can do such wicked things really scares
me :-/

All right then, I'll resend the patch with your comments addressed.
Thank you for spending your time reviewing it.
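
For clarity, the resend should boil down to this single hunk (a sketch of
the planned v2, not posted or tested yet):

--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -958,7 +958,7 @@ void page_move_anon_rmap(struct page *page,
 	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
 
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
-	page->mapping = (struct address_space *) anon_vma;
+	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 }
 
 /**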

Thanks,
Vladimir

2015-05-12 09:31:53

by Vladimir Davydov

Subject: Re: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list

On Mon, May 11, 2015 at 07:24:02AM -0700, Paul E. McKenney wrote:
> On Mon, May 11, 2015 at 10:51:17AM +0300, Vladimir Davydov wrote:
> > [...]
>
> An example "insane" compiler might notice that the value set cannot be
> safely observed without multiple CPUs accessing that variable at the
> same time. A paper entitled "No Sane Compiler Would Optimize Atomics"
> has some examples:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
>
> If this paper doesn't scare you, then you didn't read it carefully enough.
> And yes, I did give the author a very hard time about the need to suppress
> some of these optimizations in order to correctly compile old code, and
> will continue to do so. However, a READ_ONCE() would be a most excellent
> and very cheap way to future-proof this code, and is highly recommended.

Really interesting paper (although scary :-). I think I'm now convinced
that a compiler may be really wicked at times. Thank you for sharing the
link.

Thanks,
Vladimir

2015-05-17 12:48:31

by yalin wang

Subject: Re: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list


> On Mon, May 11, 2015 at 04:59:27PM +0800, yalin wang wrote:
>> I am confused by your analysis.
>> For the race stack:
>>
>> [...]
>>
>> The page must change from PageAnon() to !PageAnon() for the crash to
>> happen, but page_move_anon_rmap() changes a page from !PageAnon()
>> (a swapcache page) to PageAnon().
> A swapcache page is not necessarily !PageAnon. In do_wp_page() old_page
> *is* PageAnon. It may or may not be on the swapcache though, which does
> not really matter.
>
>> How could this race condition cause a crash?
> It never happened. It might theoretically happen due to a compiler
> "optimization" I described above.
I see.
Thanks for your explanation!