2020-02-13 18:40:57

by Qian Cai

[permalink] [raw]
Subject: [PATCH -next v2] mm: annotate a data race in page_zonenum()

BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page

write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
page_cpupid_xchg_last+0x51/0x80
page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
wp_page_reuse+0x3e/0xc0
wp_page_reuse at mm/memory.c:2453
do_wp_page+0x472/0x7b0
do_wp_page at mm/memory.c:2798
__handle_mm_fault+0xcb0/0xd00
handle_pte_fault at mm/memory.c:4049
(inlined by) __handle_mm_fault at mm/memory.c:4163
handle_mm_fault+0xfc/0x2f0
handle_mm_fault at mm/memory.c:4200
do_page_fault+0x263/0x6f9
do_user_addr_fault at arch/x86/mm/fault.c:1465
(inlined by) do_page_fault at arch/x86/mm/fault.c:1539
page_fault+0x34/0x40

read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
put_page+0x15a/0x1f0
page_zonenum at include/linux/mm.h:923
(inlined by) is_zone_device_page at include/linux/mm.h:929
(inlined by) page_is_devmap_managed at include/linux/mm.h:948
(inlined by) put_page at include/linux/mm.h:1023
wp_page_copy+0x571/0x930
wp_page_copy at mm/memory.c:2615
do_wp_page+0x107/0x7b0
__handle_mm_fault+0xcb0/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40

Reported by Kernel Concurrency Sanitizer on:
CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

A page never changes its zone number. The zone number happens to be
stored in the same word as other bits which are modified, but the zone
number bits will never be modified by any other write, so it can accept
a reload of the zone bits after an intervening write and it don't need
to use READ_ONCE(). Thus, annotate this data race using
ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
writes to it.

Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
---

v2: use ASSERT_EXCLUSIVE_BITS().

BTW, not sure if it is easier for Andrew with Paul to pick this up (with
Andrew's ACK), since ASSERT_EXCLUSIVE_BITS() is in -rcu tree only (or likely
tomorrow's -next tree).

include/linux/mm.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..0d70fafd055c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -920,6 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,

static inline enum zone_type page_zonenum(const struct page *page)
{
+ ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
}

--
1.8.3.1


2020-02-13 19:27:48

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH -next v2] mm: annotate a data race in page_zonenum()

On Thu, Feb 13, 2020 at 01:38:09PM -0500, Qian Cai wrote:
> BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
>
> write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> page_cpupid_xchg_last+0x51/0x80
> page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> wp_page_reuse+0x3e/0xc0
> wp_page_reuse at mm/memory.c:2453
> do_wp_page+0x472/0x7b0
> do_wp_page at mm/memory.c:2798
> __handle_mm_fault+0xcb0/0xd00
> handle_pte_fault at mm/memory.c:4049
> (inlined by) __handle_mm_fault at mm/memory.c:4163
> handle_mm_fault+0xfc/0x2f0
> handle_mm_fault at mm/memory.c:4200
> do_page_fault+0x263/0x6f9
> do_user_addr_fault at arch/x86/mm/fault.c:1465
> (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> page_fault+0x34/0x40
>
> read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> put_page+0x15a/0x1f0
> page_zonenum at include/linux/mm.h:923
> (inlined by) is_zone_device_page at include/linux/mm.h:929
> (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> (inlined by) put_page at include/linux/mm.h:1023
> wp_page_copy+0x571/0x930
> wp_page_copy at mm/memory.c:2615
> do_wp_page+0x107/0x7b0
> __handle_mm_fault+0xcb0/0xd00
> handle_mm_fault+0xfc/0x2f0
> do_page_fault+0x263/0x6f9
> page_fault+0x34/0x40
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
>
> A page never changes its zone number. The zone number happens to be
> stored in the same word as other bits which are modified, but the zone
> number bits will never be modified by any other write, so it can accept
> a reload of the zone bits after an intervening write and it don't need
> to use READ_ONCE(). Thus, annotate this data race using
> ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> writes to it.
>
> Suggested-by: Marco Elver <[email protected]>
> Signed-off-by: Qian Cai <[email protected]>
> ---
>
> v2: use ASSERT_EXCLUSIVE_BITS().
>
> BTW, not sure if it is easier for Andrew with Paul to pick this up (with
> Andrew's ACK), since ASSERT_EXCLUSIVE_BITS() is in -rcu tree only (or likely
> tomorrow's -next tree).

Here are the options I know of, any of which work for me:

1. I take the patch given appropriate acks/reviews.

2. Someone hangs onto the patch until the KCSAN infrastructure
hits mainline and then sends it up via whatever path.

3. One way to do #2 is to merge the -rcu tree's "kcsan" branch and
then queue this patch on top of that, again sending this patch
along once KCSAN hits mainline. Unusually for the -rcu tree,
the "kcsan" branch is not supposed to be rebased.

Either way, just let me know!

Thanx, Paul

> include/linux/mm.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 52269e56c514..0d70fafd055c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -920,6 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>
> static inline enum zone_type page_zonenum(const struct page *page)
> {
> + ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> }
>
> --
> 1.8.3.1
>

2020-02-13 21:47:57

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH -next v2] mm: annotate a data race in page_zonenum()

On 2/13/20 10:38 AM, Qian Cai wrote:
> BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
>
> write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> page_cpupid_xchg_last+0x51/0x80
> page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> wp_page_reuse+0x3e/0xc0
> wp_page_reuse at mm/memory.c:2453
> do_wp_page+0x472/0x7b0
> do_wp_page at mm/memory.c:2798
> __handle_mm_fault+0xcb0/0xd00
> handle_pte_fault at mm/memory.c:4049
> (inlined by) __handle_mm_fault at mm/memory.c:4163
> handle_mm_fault+0xfc/0x2f0
> handle_mm_fault at mm/memory.c:4200
> do_page_fault+0x263/0x6f9
> do_user_addr_fault at arch/x86/mm/fault.c:1465
> (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> page_fault+0x34/0x40
>
> read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> put_page+0x15a/0x1f0
> page_zonenum at include/linux/mm.h:923
> (inlined by) is_zone_device_page at include/linux/mm.h:929
> (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> (inlined by) put_page at include/linux/mm.h:1023
> wp_page_copy+0x571/0x930
> wp_page_copy at mm/memory.c:2615
> do_wp_page+0x107/0x7b0
> __handle_mm_fault+0xcb0/0xd00
> handle_mm_fault+0xfc/0x2f0
> do_page_fault+0x263/0x6f9
> page_fault+0x34/0x40
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
>
> A page never changes its zone number. The zone number happens to be
> stored in the same word as other bits which are modified, but the zone
> number bits will never be modified by any other write, so it can accept
> a reload of the zone bits after an intervening write and it don't need
> to use READ_ONCE(). Thus, annotate this data race using
> ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> writes to it.
>
> Suggested-by: Marco Elver <[email protected]>
> Signed-off-by: Qian Cai <[email protected]>
> ---
>
> v2: use ASSERT_EXCLUSIVE_BITS().


Much cleaner, thanks to this new macro. You can add:


Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

>
> BTW, not sure if it is easier for Andrew with Paul to pick this up (with
> Andrew's ACK), since ASSERT_EXCLUSIVE_BITS() is in -rcu tree only (or likely
> tomorrow's -next tree).
>
> include/linux/mm.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 52269e56c514..0d70fafd055c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -920,6 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>
> static inline enum zone_type page_zonenum(const struct page *page)
> {
> + ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> }
>
>

2020-02-14 14:32:52

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH -next v2] mm: annotate a data race in page_zonenum()

On Thu, 2020-02-13 at 11:26 -0800, Paul E. McKenney wrote:
> On Thu, Feb 13, 2020 at 01:38:09PM -0500, Qian Cai wrote:
> > BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> >
> > write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> > page_cpupid_xchg_last+0x51/0x80
> > page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> > wp_page_reuse+0x3e/0xc0
> > wp_page_reuse at mm/memory.c:2453
> > do_wp_page+0x472/0x7b0
> > do_wp_page at mm/memory.c:2798
> > __handle_mm_fault+0xcb0/0xd00
> > handle_pte_fault at mm/memory.c:4049
> > (inlined by) __handle_mm_fault at mm/memory.c:4163
> > handle_mm_fault+0xfc/0x2f0
> > handle_mm_fault at mm/memory.c:4200
> > do_page_fault+0x263/0x6f9
> > do_user_addr_fault at arch/x86/mm/fault.c:1465
> > (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> > page_fault+0x34/0x40
> >
> > read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> > put_page+0x15a/0x1f0
> > page_zonenum at include/linux/mm.h:923
> > (inlined by) is_zone_device_page at include/linux/mm.h:929
> > (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> > (inlined by) put_page at include/linux/mm.h:1023
> > wp_page_copy+0x571/0x930
> > wp_page_copy at mm/memory.c:2615
> > do_wp_page+0x107/0x7b0
> > __handle_mm_fault+0xcb0/0xd00
> > handle_mm_fault+0xfc/0x2f0
> > do_page_fault+0x263/0x6f9
> > page_fault+0x34/0x40
> >
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6
> > Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> >
> > A page never changes its zone number. The zone number happens to be
> > stored in the same word as other bits which are modified, but the zone
> > number bits will never be modified by any other write, so it can accept
> > a reload of the zone bits after an intervening write and it don't need
> > to use READ_ONCE(). Thus, annotate this data race using
> > ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> > writes to it.
> >
> > Suggested-by: Marco Elver <[email protected]>
> > Signed-off-by: Qian Cai <[email protected]>
> > ---
> >
> > v2: use ASSERT_EXCLUSIVE_BITS().
> >
> > BTW, not sure if it is easier for Andrew with Paul to pick this up (with
> > Andrew's ACK), since ASSERT_EXCLUSIVE_BITS() is in -rcu tree only (or likely
> > tomorrow's -next tree).
>
> Here are the options I know of, any of which work for me:
>
> 1. I take the patch given appropriate acks/reviews.
>
> 2. Someone hangs onto the patch until the KCSAN infrastructure
> hits mainline and then sends it up via whatever path.
>
> 3. One way to do #2 is to merge the -rcu tree's "kcsan" branch and
> then queue this patch on top of that, again sending this patch
> along once KCSAN hits mainline. Unusually for the -rcu tree,
> the "kcsan" branch is not supposed to be rebased.
>
> Either way, just let me know!

I suppose it is up to Andrew what he prefers. I may have another pile of patches
depends on some of those KCSAN things (except the data_race() macro) are coming,
so I'd like to find a way to get into -next first rather than waiting to send
them in April (once KCSAN hit the mainline) if possible.

>
> Thanx, Paul
>
> > include/linux/mm.h | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 52269e56c514..0d70fafd055c 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -920,6 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
> >
> > static inline enum zone_type page_zonenum(const struct page *page)
> > {
> > + ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> > return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> > }
> >
> > --
> > 1.8.3.1
> >

2020-02-14 16:18:38

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH -next v2] mm: annotate a data race in page_zonenum()

On Fri, Feb 14, 2020 at 09:30:26AM -0500, Qian Cai wrote:
> On Thu, 2020-02-13 at 11:26 -0800, Paul E. McKenney wrote:
> > On Thu, Feb 13, 2020 at 01:38:09PM -0500, Qian Cai wrote:
> > > BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> > >
> > > write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> > > page_cpupid_xchg_last+0x51/0x80
> > > page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> > > wp_page_reuse+0x3e/0xc0
> > > wp_page_reuse at mm/memory.c:2453
> > > do_wp_page+0x472/0x7b0
> > > do_wp_page at mm/memory.c:2798
> > > __handle_mm_fault+0xcb0/0xd00
> > > handle_pte_fault at mm/memory.c:4049
> > > (inlined by) __handle_mm_fault at mm/memory.c:4163
> > > handle_mm_fault+0xfc/0x2f0
> > > handle_mm_fault at mm/memory.c:4200
> > > do_page_fault+0x263/0x6f9
> > > do_user_addr_fault at arch/x86/mm/fault.c:1465
> > > (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> > > page_fault+0x34/0x40
> > >
> > > read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> > > put_page+0x15a/0x1f0
> > > page_zonenum at include/linux/mm.h:923
> > > (inlined by) is_zone_device_page at include/linux/mm.h:929
> > > (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> > > (inlined by) put_page at include/linux/mm.h:1023
> > > wp_page_copy+0x571/0x930
> > > wp_page_copy at mm/memory.c:2615
> > > do_wp_page+0x107/0x7b0
> > > __handle_mm_fault+0xcb0/0xd00
> > > handle_mm_fault+0xfc/0x2f0
> > > do_page_fault+0x263/0x6f9
> > > page_fault+0x34/0x40
> > >
> > > Reported by Kernel Concurrency Sanitizer on:
> > > CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6
> > > Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > >
> > > A page never changes its zone number. The zone number happens to be
> > > stored in the same word as other bits which are modified, but the zone
> > > number bits will never be modified by any other write, so it can accept
> > > a reload of the zone bits after an intervening write and it don't need
> > > to use READ_ONCE(). Thus, annotate this data race using
> > > ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> > > writes to it.
> > >
> > > Suggested-by: Marco Elver <[email protected]>
> > > Signed-off-by: Qian Cai <[email protected]>
> > > ---
> > >
> > > v2: use ASSERT_EXCLUSIVE_BITS().
> > >
> > > BTW, not sure if it is easier for Andrew with Paul to pick this up (with
> > > Andrew's ACK), since ASSERT_EXCLUSIVE_BITS() is in -rcu tree only (or likely
> > > tomorrow's -next tree).
> >
> > Here are the options I know of, any of which work for me:
> >
> > 1. I take the patch given appropriate acks/reviews.
> >
> > 2. Someone hangs onto the patch until the KCSAN infrastructure
> > hits mainline and then sends it up via whatever path.
> >
> > 3. One way to do #2 is to merge the -rcu tree's "kcsan" branch and
> > then queue this patch on top of that, again sending this patch
> > along once KCSAN hits mainline. Unusually for the -rcu tree,
> > the "kcsan" branch is not supposed to be rebased.
> >
> > Either way, just let me know!
>
> I suppose it is up to Andrew what he prefers. I may have another pile of patches
> depends on some of those KCSAN things (except the data_race() macro) are coming,
> so I'd like to find a way to get into -next first rather than waiting to send
> them in April (once KCSAN hit the mainline) if possible.

Any of the three options above would get you into -next quickly.

So just let me know how you would like to proceed.

Thanx, Paul

> > > include/linux/mm.h | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 52269e56c514..0d70fafd055c 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -920,6 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
> > >
> > > static inline enum zone_type page_zonenum(const struct page *page)
> > > {
> > > + ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> > > return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> > > }
> > >
> > > --
> > > 1.8.3.1
> > >

2020-02-18 12:43:18

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH -next v2] mm: annotate a data race in page_zonenum()



> On Feb 14, 2020, at 11:16 AM, Paul E. McKenney <[email protected]> wrote:
>
> Any of the three options above would get you into -next quickly.
>
> So just let me know how you would like to proceed.

Unless Andrew has any objection, I think it makes sense that you pick this up as it is pretty much a low risk patch.

2020-02-19 21:57:18

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH -next v2] mm: annotate a data race in page_zonenum()

Andrew, since you had picked the similar one which also depends
on ASSERT_EXCLUSIVE_*, can you pick up this patch as well?

On Thu, 2020-02-13 at 13:20 -0800, John Hubbard wrote:
> On 2/13/20 10:38 AM, Qian Cai wrote:
> > BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> >
> > write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> > page_cpupid_xchg_last+0x51/0x80
> > page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> > wp_page_reuse+0x3e/0xc0
> > wp_page_reuse at mm/memory.c:2453
> > do_wp_page+0x472/0x7b0
> > do_wp_page at mm/memory.c:2798
> > __handle_mm_fault+0xcb0/0xd00
> > handle_pte_fault at mm/memory.c:4049
> > (inlined by) __handle_mm_fault at mm/memory.c:4163
> > handle_mm_fault+0xfc/0x2f0
> > handle_mm_fault at mm/memory.c:4200
> > do_page_fault+0x263/0x6f9
> > do_user_addr_fault at arch/x86/mm/fault.c:1465
> > (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> > page_fault+0x34/0x40
> >
> > read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> > put_page+0x15a/0x1f0
> > page_zonenum at include/linux/mm.h:923
> > (inlined by) is_zone_device_page at include/linux/mm.h:929
> > (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> > (inlined by) put_page at include/linux/mm.h:1023
> > wp_page_copy+0x571/0x930
> > wp_page_copy at mm/memory.c:2615
> > do_wp_page+0x107/0x7b0
> > __handle_mm_fault+0xcb0/0xd00
> > handle_mm_fault+0xfc/0x2f0
> > do_page_fault+0x263/0x6f9
> > page_fault+0x34/0x40
> >
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6
> > Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> >
> > A page never changes its zone number. The zone number happens to be
> > stored in the same word as other bits which are modified, but the zone
> > number bits will never be modified by any other write, so it can accept
> > a reload of the zone bits after an intervening write and it don't need
> > to use READ_ONCE(). Thus, annotate this data race using
> > ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> > writes to it.
> >
> > Suggested-by: Marco Elver <[email protected]>
> > Signed-off-by: Qian Cai <[email protected]>
> > ---
> >
> > v2: use ASSERT_EXCLUSIVE_BITS().
>
>
> Much cleaner, thanks to this new macro. You can add:
>
>
> Reviewed-by: John Hubbard <[email protected]>
>
>
> thanks,

2020-02-19 23:48:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH -next v2] mm: annotate a data race in page_zonenum()

On Wed, Feb 19, 2020 at 04:55:40PM -0500, Qian Cai wrote:
> Andrew, since you had picked the similar one which also depends
> on?ASSERT_EXCLUSIVE_*,?can you pick up this patch as well?
>
> On Thu, 2020-02-13 at 13:20 -0800, John Hubbard wrote:
> > On 2/13/20 10:38 AM, Qian Cai wrote:
> > > BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> > >
> > > write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> > > page_cpupid_xchg_last+0x51/0x80
> > > page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> > > wp_page_reuse+0x3e/0xc0
> > > wp_page_reuse at mm/memory.c:2453
> > > do_wp_page+0x472/0x7b0
> > > do_wp_page at mm/memory.c:2798
> > > __handle_mm_fault+0xcb0/0xd00
> > > handle_pte_fault at mm/memory.c:4049
> > > (inlined by) __handle_mm_fault at mm/memory.c:4163
> > > handle_mm_fault+0xfc/0x2f0
> > > handle_mm_fault at mm/memory.c:4200
> > > do_page_fault+0x263/0x6f9
> > > do_user_addr_fault at arch/x86/mm/fault.c:1465
> > > (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> > > page_fault+0x34/0x40
> > >
> > > read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> > > put_page+0x15a/0x1f0
> > > page_zonenum at include/linux/mm.h:923
> > > (inlined by) is_zone_device_page at include/linux/mm.h:929
> > > (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> > > (inlined by) put_page at include/linux/mm.h:1023
> > > wp_page_copy+0x571/0x930
> > > wp_page_copy at mm/memory.c:2615
> > > do_wp_page+0x107/0x7b0
> > > __handle_mm_fault+0xcb0/0xd00
> > > handle_mm_fault+0xfc/0x2f0
> > > do_page_fault+0x263/0x6f9
> > > page_fault+0x34/0x40
> > >
> > > Reported by Kernel Concurrency Sanitizer on:
> > > CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6
> > > Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > >
> > > A page never changes its zone number. The zone number happens to be
> > > stored in the same word as other bits which are modified, but the zone
> > > number bits will never be modified by any other write, so it can accept
> > > a reload of the zone bits after an intervening write and it don't need
> > > to use READ_ONCE(). Thus, annotate this data race using
> > > ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> > > writes to it.
> > >
> > > Suggested-by: Marco Elver <[email protected]>
> > > Signed-off-by: Qian Cai <[email protected]>
> > > ---
> > >
> > > v2: use ASSERT_EXCLUSIVE_BITS().
> >
> >
> > Much cleaner, thanks to this new macro. You can add:
> >
> >
> > Reviewed-by: John Hubbard <[email protected]>

It looks like Andrew already pulled this one in, but let me know if
he drops it for whatever reason.

Thanx, Paul