2020-02-19 14:29:18

by Qian Cai

[permalink] [raw]
Subject: [PATCH -next v2] fork: annotate a data race in vm_area_dup()

struct vm_area_struct could be accessed concurrently as noticed by
KCSAN,

write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
vma_interval_tree_insert+0x101/0x150:
rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
(inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
__vma_link_file+0x6e/0xe0
__vma_link_file at mm/mmap.c:629
vma_link+0xa2/0x120
mmap_region+0x753/0xb90
do_mmap+0x45c/0x710
vm_mmap_pgoff+0xc0/0x130
ksys_mmap_pgoff+0x1d1/0x300
__x64_sys_mmap+0x33/0x40
do_syscall_64+0x91/0xc44
entry_SYSCALL_64_after_hwframe+0x49/0xbe

read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
vm_area_dup+0x6a/0xe0
vm_area_dup at kernel/fork.c:362
__split_vma+0x72/0x2a0
__split_vma at mm/mmap.c:2661
split_vma+0x5a/0x80
mprotect_fixup+0x368/0x3f0
do_mprotect_pkey+0x263/0x420
__x64_sys_mprotect+0x51/0x70
do_syscall_64+0x91/0xc44
entry_SYSCALL_64_after_hwframe+0x49/0xbe

vm_area_dup() blindly copies all fields of original VMA to the new one.
This includes coping vm_area_struct::shared.rb which is normally
protected by i_mmap_lock. But this is fine because the read value will
be overwritten on the following __vma_link_file() under proper
protection. Thus, mark it as an intentional data race and insert a few
assertions for the fields that should not be modified concurrently.

Signed-off-by: Qian Cai <[email protected]>
---

v2: Insert a few assertions thanks to Marco.
Update the commit log thanks to Kirill.

kernel/fork.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index cb2ae49e497e..67a5d691ffa8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -359,7 +359,13 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);

if (new) {
- *new = *orig;
+ ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
+ ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
+ /*
+ * orig->shared.rb may be modified concurrently, but the clone
+ * will be reinitialized.
+ */
+ *new = data_race(*orig);
INIT_LIST_HEAD(&new->anon_vma_chain);
new->vm_next = new->vm_prev = NULL;
}
--
1.8.3.1


2020-02-19 19:04:46

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH -next v2] fork: annotate a data race in vm_area_dup()

On Wed, Feb 19, 2020 at 09:28:15AM -0500, Qian Cai wrote:
> struct vm_area_struct could be accessed concurrently as noticed by
> KCSAN,
>
> write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
> vma_interval_tree_insert+0x101/0x150:
> rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
> (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
> __vma_link_file+0x6e/0xe0
> __vma_link_file at mm/mmap.c:629
> vma_link+0xa2/0x120
> mmap_region+0x753/0xb90
> do_mmap+0x45c/0x710
> vm_mmap_pgoff+0xc0/0x130
> ksys_mmap_pgoff+0x1d1/0x300
> __x64_sys_mmap+0x33/0x40
> do_syscall_64+0x91/0xc44
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
> vm_area_dup+0x6a/0xe0
> vm_area_dup at kernel/fork.c:362
> __split_vma+0x72/0x2a0
> __split_vma at mm/mmap.c:2661
> split_vma+0x5a/0x80
> mprotect_fixup+0x368/0x3f0
> do_mprotect_pkey+0x263/0x420
> __x64_sys_mprotect+0x51/0x70
> do_syscall_64+0x91/0xc44
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> vm_area_dup() blindly copies all fields of original VMA to the new one.
> This includes coping vm_area_struct::shared.rb which is normally
> protected by i_mmap_lock. But this is fine because the read value will
> be overwritten on the following __vma_link_file() under proper
> protection. Thus, mark it as an intentional data race and insert a few
> assertions for the fields that should not be modified concurrently.
>
> Signed-off-by: Qian Cai <[email protected]>

Queued for safekeeping on -rcu. I had to adjust a bit to get it to
apply on -rcu, please see below. In my experience, git should have
no trouble figuring it out. ;-)

Thanx, Paul

------------------------------------------------------------------------

commit 1228aca56f2a25b67876d8a819437b620a6e1cee
Author: Qian Cai <[email protected]>
Date: Wed Feb 19 11:00:54 2020 -0800

fork: Annotate a data race in vm_area_dup()

struct vm_area_struct could be accessed concurrently as noticed by
KCSAN,

write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
vma_interval_tree_insert+0x101/0x150:
rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
(inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
__vma_link_file+0x6e/0xe0
__vma_link_file at mm/mmap.c:629
vma_link+0xa2/0x120
mmap_region+0x753/0xb90
do_mmap+0x45c/0x710
vm_mmap_pgoff+0xc0/0x130
ksys_mmap_pgoff+0x1d1/0x300
__x64_sys_mmap+0x33/0x40
do_syscall_64+0x91/0xc44
entry_SYSCALL_64_after_hwframe+0x49/0xbe

read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
vm_area_dup+0x6a/0xe0
vm_area_dup at kernel/fork.c:362
__split_vma+0x72/0x2a0
__split_vma at mm/mmap.c:2661
split_vma+0x5a/0x80
mprotect_fixup+0x368/0x3f0
do_mprotect_pkey+0x263/0x420
__x64_sys_mprotect+0x51/0x70
do_syscall_64+0x91/0xc44
entry_SYSCALL_64_after_hwframe+0x49/0xbe

vm_area_dup() blindly copies all fields of original VMA to the new one.
This includes coping vm_area_struct::shared.rb which is normally
protected by i_mmap_lock. But this is fine because the read value will
be overwritten on the following __vma_link_file() under proper
protection. Thus, mark it as an intentional data race and insert a few
assertions for the fields that should not be modified concurrently.

Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/fork.c b/kernel/fork.c
index 60a1295..e592e6f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -359,7 +359,13 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);

if (new) {
- *new = *orig;
+ ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
+ ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
+ /*
+ * orig->shared.rb may be modified concurrently, but the clone
+ * will be reinitialized.
+ */
+ *new = data_race(*orig);
INIT_LIST_HEAD(&new->anon_vma_chain);
}
return new;

2020-02-20 12:56:14

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH -next v2] fork: annotate a data race in vm_area_dup()

On Wed, Feb 19, 2020 at 09:28:15AM -0500, Qian Cai wrote:
> struct vm_area_struct could be accessed concurrently as noticed by
> KCSAN,
>
> write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
> vma_interval_tree_insert+0x101/0x150:
> rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
> (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
> __vma_link_file+0x6e/0xe0
> __vma_link_file at mm/mmap.c:629
> vma_link+0xa2/0x120
> mmap_region+0x753/0xb90
> do_mmap+0x45c/0x710
> vm_mmap_pgoff+0xc0/0x130
> ksys_mmap_pgoff+0x1d1/0x300
> __x64_sys_mmap+0x33/0x40
> do_syscall_64+0x91/0xc44
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
> vm_area_dup+0x6a/0xe0
> vm_area_dup at kernel/fork.c:362
> __split_vma+0x72/0x2a0
> __split_vma at mm/mmap.c:2661
> split_vma+0x5a/0x80
> mprotect_fixup+0x368/0x3f0
> do_mprotect_pkey+0x263/0x420
> __x64_sys_mprotect+0x51/0x70
> do_syscall_64+0x91/0xc44
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> vm_area_dup() blindly copies all fields of original VMA to the new one.
> This includes coping vm_area_struct::shared.rb which is normally
> protected by i_mmap_lock. But this is fine because the read value will
> be overwritten on the following __vma_link_file() under proper
> protection. Thus, mark it as an intentional data race and insert a few
> assertions for the fields that should not be modified concurrently.
>
> Signed-off-by: Qian Cai <[email protected]>

Acked-by: Kirill A. Shutemov <[email protected]>

--
Kirill A. Shutemov