find_extend_vma assumes the caller holds mmap_sem as a reader (explained
in expand_downwards()). The path when we are extending the stack VMA to
accomodate argv[] pointers happens without the lock.
I was not able to cause an mm_struct corruption but
BUG_ON(!rwsem_is_locked(&mm->mmap_sem)) in find_extend_vma could be
triggered as
# <bigfile xargs echo
xargs: echo: terminated by signal 11
(bigfile needs to have more than RLIMIT_STACK / sizeof(char *) rows)
Other accesses to mm_struct in exec path are protected by mmap_sem, so
conservatively, protect also this one. Besides that, explain why we omit
mm_struct.arg_lock in the exec(2) path.
Cc: Cyrill Gorcunov <[email protected]>
Signed-off-by: Michal Koutný <[email protected]>
---
When I was attempting to reduce usage of mmap_sem I came across this
unprotected access and increased number of its holders :-/
I'm not sure whether there is a real concurrent writer at this early
stages (I considered khugepaged especially as setup_arg_pages invokes
khugepaged_enter_vma_merge but we're lucky because khugepaged skips it
because of VM_STACK_INCOMPLETE_SETUP).
A nicer approach would perhaps be to do all this exec setup when the
mm_struct is still not exposed via current->mm (and hence no need to
synchronize via mmap_sem). But I didn't look enough into binfmt specific
whether it is even doable and worth it.
So I'm sending this for a discussion.
fs/binfmt_elf.c | 10 +++++++++-
fs/exec.c | 3 ++-
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 8264b468f283..48e169760a9c 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -299,7 +299,11 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
* Grow the stack manually; some architectures have a limit on how
* far ahead a user-space access may be in order to grow the stack.
*/
+ if (down_read_killable(¤t->mm->mmap_sem))
+ return -EINTR;
vma = find_extend_vma(current->mm, bprm->p);
+ up_read(¤t->mm->mmap_sem);
+
if (!vma)
return -EFAULT;
@@ -1123,11 +1127,15 @@ static int load_elf_binary(struct linux_binprm *bprm)
goto out;
#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
+ /*
+ * Don't take mm->arg_lock. The concurrent change might happen only
+ * from prctl_set_mm but after de_thread we are certainly alone here.
+ */
retval = create_elf_tables(bprm, &loc->elf_ex,
load_addr, interp_load_addr);
if (retval < 0)
goto out;
- /* N.B. passed_fileno might not be initialized? */
+
current->mm->end_code = end_code;
current->mm->start_code = start_code;
current->mm->start_data = start_data;
diff --git a/fs/exec.c b/fs/exec.c
index 89a500bb897a..d5b55c92019a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -212,7 +212,8 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
/*
* We are doing an exec(). 'current' is the process
- * doing the exec and bprm->mm is the new process's mm.
+ * doing the exec and bprm->mm is the new process's mm that is not
+ * shared yet, so no synchronization on mmap_sem.
*/
ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
&page, NULL, NULL);
--
2.21.0
On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutn? wrote:
> - /* N.B. passed_fileno might not be initialized? */
> +
Why did you delete this comment?
On Wed, Jun 12, 2019 at 10:00:34AM -0700, Matthew Wilcox <[email protected]> wrote:
> On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutn? wrote:
> > - /* N.B. passed_fileno might not be initialized? */
> > +
>
> Why did you delete this comment?
The variable got removed in
d20894a23708 ("Remove a.out interpreter support in ELF loader")
so it is not relevant anymore.
On Wed, Jun 12, 2019 at 07:29:15PM +0200, Michal Koutn? wrote:
> On Wed, Jun 12, 2019 at 10:00:34AM -0700, Matthew Wilcox <[email protected]> wrote:
> > On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutn? wrote:
> > > - /* N.B. passed_fileno might not be initialized? */
> > > +
> >
> > Why did you delete this comment?
> The variable got removed in
> d20894a23708 ("Remove a.out interpreter support in ELF loader")
> so it is not relevant anymore.
Better put that in the changelog for v2 then. or even make it a
separate patch.
On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutn? wrote:
> find_extend_vma assumes the caller holds mmap_sem as a reader (explained
> in expand_downwards()). The path when we are extending the stack VMA to
> accomodate argv[] pointers happens without the lock.
>
> I was not able to cause an mm_struct corruption but
> BUG_ON(!rwsem_is_locked(&mm->mmap_sem)) in find_extend_vma could be
> triggered as
>
> # <bigfile xargs echo
> xargs: echo: terminated by signal 11
>
> (bigfile needs to have more than RLIMIT_STACK / sizeof(char *) rows)
>
> Other accesses to mm_struct in exec path are protected by mmap_sem, so
> conservatively, protect also this one. Besides that, explain why we omit
> mm_struct.arg_lock in the exec(2) path.
>
> Cc: Cyrill Gorcunov <[email protected]>
> Signed-off-by: Michal Koutn? <[email protected]>
> ---
>
> When I was attempting to reduce usage of mmap_sem I came across this
> unprotected access and increased number of its holders :-/
>
> I'm not sure whether there is a real concurrent writer at this early
> stages (I considered khugepaged especially as setup_arg_pages invokes
> khugepaged_enter_vma_merge but we're lucky because khugepaged skips it
> because of VM_STACK_INCOMPLETE_SETUP).
>
> A nicer approach would perhaps be to do all this exec setup when the
> mm_struct is still not exposed via current->mm (and hence no need to
> synchronize via mmap_sem). But I didn't look enough into binfmt specific
> whether it is even doable and worth it.
>
> So I'm sending this for a discussion.
>
> fs/binfmt_elf.c | 10 +++++++++-
> fs/exec.c | 3 ++-
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 8264b468f283..48e169760a9c 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -299,7 +299,11 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
> * Grow the stack manually; some architectures have a limit on how
> * far ahead a user-space access may be in order to grow the stack.
> */
> + if (down_read_killable(¤t->mm->mmap_sem))
> + return -EINTR;
> vma = find_extend_vma(current->mm, bprm->p);
> + up_read(¤t->mm->mmap_sem);
> +
Good catch, Michal! Actually the loader code is heavy on its own so
I think having readlock taken here should not cause any perf problems
but worth having for consistency.
Reviewed-by: Cyrill Gorcunov <[email protected]>
On Wed, Jun 12, 2019 at 10:51:59AM -0700, Matthew Wilcox wrote:
> On Wed, Jun 12, 2019 at 07:29:15PM +0200, Michal Koutn? wrote:
> > On Wed, Jun 12, 2019 at 10:00:34AM -0700, Matthew Wilcox <[email protected]> wrote:
> > > On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutn? wrote:
> > > > - /* N.B. passed_fileno might not be initialized? */
> > > > +
> > >
> > > Why did you delete this comment?
> > The variable got removed in
> > d20894a23708 ("Remove a.out interpreter support in ELF loader")
> > so it is not relevant anymore.
>
> Better put that in the changelog for v2 then. or even make it a
> separate patch.
Just updated changelog should be fine, I guess. A separate commit
just to remove an obsolete comment is too much.
find_extend_vma assumes the caller holds mmap_sem as a reader (explained
in expand_downwards()). The path when we are extending the stack VMA to
accommodate argv[] pointers happens without the lock.
I was not able to cause an mm_struct corruption but an inserted
BUG_ON(!rwsem_is_locked(&mm->mmap_sem)) in find_extend_vma could be
triggered as
# <bigfile xargs echo
xargs: echo: terminated by signal 11
(bigfile needs to have more than RLIMIT_STACK / sizeof(char *) rows)
Other accesses to mm_struct in exec path are protected by mmap_sem, so
conservatively, protect also this one.
Besides that, explain in comments why we omit mm_struct.arg_lock in the
exec(2) path and drop an obsolete comment about removed passed_fileno.
v2: Updated changelog
Cc: Cyrill Gorcunov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Signed-off-by: Michal Koutný <[email protected]>
---
fs/binfmt_elf.c | 10 +++++++++-
fs/exec.c | 3 ++-
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 8264b468f283..48e169760a9c 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -299,7 +299,11 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
* Grow the stack manually; some architectures have a limit on how
* far ahead a user-space access may be in order to grow the stack.
*/
+ if (down_read_killable(¤t->mm->mmap_sem))
+ return -EINTR;
vma = find_extend_vma(current->mm, bprm->p);
+ up_read(¤t->mm->mmap_sem);
+
if (!vma)
return -EFAULT;
@@ -1123,11 +1127,15 @@ static int load_elf_binary(struct linux_binprm *bprm)
goto out;
#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
+ /*
+ * Don't take mm->arg_lock. The concurrent change might happen only
+ * from prctl_set_mm but after de_thread we are certainly alone here.
+ */
retval = create_elf_tables(bprm, &loc->elf_ex,
load_addr, interp_load_addr);
if (retval < 0)
goto out;
- /* N.B. passed_fileno might not be initialized? */
+
current->mm->end_code = end_code;
current->mm->start_code = start_code;
current->mm->start_data = start_data;
diff --git a/fs/exec.c b/fs/exec.c
index 89a500bb897a..d5b55c92019a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -212,7 +212,8 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
/*
* We are doing an exec(). 'current' is the process
- * doing the exec and bprm->mm is the new process's mm.
+ * doing the exec and bprm->mm is the new process's mm that is not
+ * shared yet, so no synchronization on mmap_sem.
*/
ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
&page, NULL, NULL);
--
2.21.0