2019-04-17 12:06:43

by Michal Koutný

[permalink] [raw]
Subject: [PATCH] mm: get_cmdline use arg_lock instead of mmap_sem

The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
semaphore taken.") added synchronization of reading argument/environment
boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
arg_lock to protect arg_start|end and env_start|end in mm_struct")
avoided the coarse use of mmap_sem in similar situations.

get_cmdline can also use arg_lock instead of mmap_sem when it reads the
boundaries.

Signed-off-by: Michal Koutný <[email protected]>
---
mm/util.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index d559bde497a9..568575cceefc 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -758,12 +758,12 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
if (!mm->arg_end)
goto out_mm; /* Shh! No looking before we're done */

- down_read(&mm->mmap_sem);
+ spin_lock(&mm->arg_lock);
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
env_end = mm->env_end;
- up_read(&mm->mmap_sem);
+ spin_unlock(&mm->arg_lock);

len = arg_end - arg_start;

--
2.16.4


2019-04-17 13:43:16

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm: get_cmdline use arg_lock instead of mmap_sem

On Wed 17-04-19 14:03:47, Michal Koutny wrote:
> The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
> semaphore taken.") added synchronization of reading argument/environment
> boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
> arg_lock to protect arg_start|end and env_start|end in mm_struct")
> avoided the coarse use of mmap_sem in similar situations.
>
> get_cmdline can also use arg_lock instead of mmap_sem when it reads the
> boundaries.

Don't we need to use the lock in prctl_set_mm as well then?

> Signed-off-by: Michal Koutn? <[email protected]>
> ---
> mm/util.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index d559bde497a9..568575cceefc 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -758,12 +758,12 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
> if (!mm->arg_end)
> goto out_mm; /* Shh! No looking before we're done */
>
> - down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
> arg_start = mm->arg_start;
> arg_end = mm->arg_end;
> env_start = mm->env_start;
> env_end = mm->env_end;
> - up_read(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
>
> len = arg_end - arg_start;
>
> --
> 2.16.4

--
Michal Hocko
SUSE Labs

2019-04-17 14:43:15

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH] mm: get_cmdline use arg_lock instead of mmap_sem

On Wed, Apr 17, 2019 at 03:41:52PM +0200, Michal Hocko <[email protected]> wrote:
> Don't we need to use the lock in prctl_set_mm as well then?

Correct. The patch alone just moves the race from
get_cmdline/prctl_set_mm_map to get_cmdline/prctl_set_mm.

arg_lock could be used in prctl_set_mm but the better idea (IMO) is
complete removal of that code in favor of prctl_set_mm_map [1].

Michal

[1] https://lore.kernel.org/lkml/[email protected]/

2019-04-17 14:57:00

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm: get_cmdline use arg_lock instead of mmap_sem

On Wed 17-04-19 16:41:42, Michal Koutny wrote:
> On Wed, Apr 17, 2019 at 03:41:52PM +0200, Michal Hocko <[email protected]> wrote:
> > Don't we need to use the lock in prctl_set_mm as well then?
>
> Correct. The patch alone just moves the race from
> get_cmdline/prctl_set_mm_map to get_cmdline/prctl_set_mm.
>
> arg_lock could be used in prctl_set_mm but the better idea (IMO) is
> complete removal of that code in favor of prctl_set_mm_map [1].

Ohh, I have missed that patch. As long as both are merged together then
no objections from me and you can add

Acked-by: Michal Hocko <[email protected]>

> Michal
>
> [1] https://lore.kernel.org/lkml/[email protected]/

--
Michal Hocko
SUSE Labs

2019-04-18 13:52:29

by Michal Koutný

[permalink] [raw]
Subject: [PATCH] prctl_set_mm: downgrade mmap_sem to read lock

I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
[1], so at least downgrade the write acquisition of mmap_sem as in the
patch below (that should be stacked on the previous one or squashed).

Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
it supposed to be [2]? That could be an alternative to this patch after
some refreshments and clarifications.


[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/

========

Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
arg_start|end and env_start|end in mm_struct") we use arg_lock for
boundaries modifications. Synchronize prctl_set_mm with this lock and
keep mmap_sem for reading only (analogous to what we already do in
prctl_set_mm_map).

Also, save few cycles by looking up VMA only after performing basic
arguments validation.

Signed-off-by: Michal Koutný <[email protected]>
---
kernel/sys.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..bbce0f26d707 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2125,8 +2125,12 @@ static int prctl_set_mm(int opt, unsigned long addr,

error = -EINVAL;

- down_write(&mm->mmap_sem);
- vma = find_vma(mm, addr);
+ /*
+ * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
+ * a) concurrent sys_brk, b) finding VMA for addr validation.
+ */
+ down_read(&mm->mmap_sem);
+ spin_lock(&mm->arg_lock);

prctl_map.start_code = mm->start_code;
prctl_map.end_code = mm->end_code;
@@ -2185,6 +2189,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
if (error)
goto out;

+ vma = find_vma(mm, addr);
switch (opt) {
/*
* If command line arguments and environment
@@ -2218,7 +2223,8 @@ static int prctl_set_mm(int opt, unsigned long addr,

error = 0;
out:
- up_write(&mm->mmap_sem);
+ spin_unlock(&mm->arg_lock);
+ up_read(&mm->mmap_sem);
return error;
}

--
2.16.4

2019-04-18 14:12:35

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH] prctl_set_mm: downgrade mmap_sem to read lock

On Thu, Apr 18, 2019 at 03:50:39PM +0200, Michal Koutn? wrote:
> I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
> [1], so at least downgrade the write acquisition of mmap_sem as in the
> patch below (that should be stacked on the previous one or squashed).
>
> Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
> it supposed to be [2]? That could be an alternative to this patch after
> some refreshments and clarifications.

Yes, seems so. From a glance the patch shold be ok. Michal will review
it more carefully today. Thanks!

2019-04-18 14:16:24

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] prctl_set_mm: downgrade mmap_sem to read lock

On Thu 18-04-19 15:50:39, Michal Koutny wrote:
> I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
> [1], so at least downgrade the write acquisition of mmap_sem as in the
> patch below (that should be stacked on the previous one or squashed).
>
> Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
> it supposed to be [2]? That could be an alternative to this patch after
> some refreshments and clarifications.
>
>
> [1] https://lore.kernel.org/lkml/[email protected]/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> ========
>
> Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
> arg_start|end and env_start|end in mm_struct") we use arg_lock for
> boundaries modifications. Synchronize prctl_set_mm with this lock and
> keep mmap_sem for reading only (analogous to what we already do in
> prctl_set_mm_map).
>
> Also, save few cycles by looking up VMA only after performing basic
> arguments validation.
>
> Signed-off-by: Michal Koutn? <[email protected]>

Looks good to me. Please send both patches in one series once you get a
review feedback from other people.

> ---
> kernel/sys.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 12df0e5434b8..bbce0f26d707 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2125,8 +2125,12 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = -EINVAL;
>
> - down_write(&mm->mmap_sem);
> - vma = find_vma(mm, addr);
> + /*
> + * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
> + * a) concurrent sys_brk, b) finding VMA for addr validation.
> + */
> + down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
>
> prctl_map.start_code = mm->start_code;
> prctl_map.end_code = mm->end_code;
> @@ -2185,6 +2189,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> if (error)
> goto out;
>
> + vma = find_vma(mm, addr);
> switch (opt) {
> /*
> * If command line arguments and environment
> @@ -2218,7 +2223,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = 0;
> out:
> - up_write(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
> + up_read(&mm->mmap_sem);
> return error;
> }
>
> --
> 2.16.4

--
Michal Hocko
SUSE Labs

2019-04-18 14:29:46

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH] prctl_set_mm: downgrade mmap_sem to read lock

Le 18/04/2019 à 15:50, Michal Koutný a écrit :
> I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
> [1], so at least downgrade the write acquisition of mmap_sem as in the
> patch below (that should be stacked on the previous one or squashed).
>
> Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
> it supposed to be [2]? That could be an alternative to this patch after
> some refreshments and clarifications.
>
>
> [1] https://lore.kernel.org/lkml/[email protected]/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> ========
>
> Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
> arg_start|end and env_start|end in mm_struct") we use arg_lock for
> boundaries modifications. Synchronize prctl_set_mm with this lock and
> keep mmap_sem for reading only (analogous to what we already do in
> prctl_set_mm_map).
>
> Also, save few cycles by looking up VMA only after performing basic
> arguments validation.
>
> Signed-off-by: Michal Koutný <[email protected]>
> ---
> kernel/sys.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 12df0e5434b8..bbce0f26d707 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2125,8 +2125,12 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = -EINVAL;
>
> - down_write(&mm->mmap_sem);
> - vma = find_vma(mm, addr);
> + /*
> + * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
> + * a) concurrent sys_brk, b) finding VMA for addr validation.
> + */
> + down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
>
> prctl_map.start_code = mm->start_code;
> prctl_map.end_code = mm->end_code;
> @@ -2185,6 +2189,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> if (error)
> goto out;
>
> + vma = find_vma(mm, addr);

Why is find_vma() called while holding the arg_lock ?

To limit the time the spinlock is held, would it be better to
read_lock(mmap_sem)
find_vma()
spin_lock(arg_lock)
..
out:
spin_unlock()
up_read(mmap_sem)

Not sure this would change a lot the performance anyway.

> switch (opt) {
> /*
> * If command line arguments and environment
> @@ -2218,7 +2223,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = 0;
> out:
> - up_write(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
> + up_read(&mm->mmap_sem);
> return error;
> }
>
>

2019-04-18 18:25:10

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH] prctl_set_mm: downgrade mmap_sem to read lock

On Thu, Apr 18, 2019 at 03:50:39PM +0200, Michal Koutn? wrote:
> I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
> [1], so at least downgrade the write acquisition of mmap_sem as in the
> patch below (that should be stacked on the previous one or squashed).
>
> Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
> it supposed to be [2]? That could be an alternative to this patch after
> some refreshments and clarifications.
>
>
> [1] https://lore.kernel.org/lkml/[email protected]/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> ========
>
> Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
> arg_start|end and env_start|end in mm_struct") we use arg_lock for
> boundaries modifications. Synchronize prctl_set_mm with this lock and
> keep mmap_sem for reading only (analogous to what we already do in
> prctl_set_mm_map).
>
> Also, save few cycles by looking up VMA only after performing basic
> arguments validation.
>
> Signed-off-by: Michal Koutn? <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>

As Laurent mentioned we might move vma lookup before the spinlock,
but this might be done on top of the series.

2019-04-30 08:20:36

by Michal Koutný

[permalink] [raw]
Subject: [PATCH 0/3] Reduce mmap_sem usage for args manipulation

Hello.

(apologies for late reply) I've aggregated the two previously discussed patches
into one series and based on responses made some changes summed below.

v2
- insert a patch refactoring validate_prctl_map
- move find_vma out of the arg_lock critical section


Michal Koutný (3):
mm: get_cmdline use arg_lock instead of mmap_sem
prctl_set_mm: Refactor checks from validate_prctl_map
prctl_set_mm: downgrade mmap_sem to read lock

kernel/sys.c | 55 ++++++++++++++++++++++++++++---------------------------
mm/util.c | 4 ++--
2 files changed, 30 insertions(+), 29 deletions(-)

--
2.16.4

2019-04-30 08:20:39

by Michal Koutný

[permalink] [raw]
Subject: [PATCH 1/3] mm: get_cmdline use arg_lock instead of mmap_sem

The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
semaphore taken.") added synchronization of reading argument/environment
boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
arg_lock to protect arg_start|end and env_start|end in mm_struct")
avoided the coarse use of mmap_sem in similar situations.

get_cmdline can also use arg_lock instead of mmap_sem when it reads the
boundaries.

Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct")
Cc: Yang Shi <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Signed-off-by: Michal Koutný <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
---
mm/util.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 43a2984bccaa..5cf0e84a0823 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -758,12 +758,12 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
if (!mm->arg_end)
goto out_mm; /* Shh! No looking before we're done */

- down_read(&mm->mmap_sem);
+ spin_lock(&mm->arg_lock);
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
env_end = mm->env_end;
- up_read(&mm->mmap_sem);
+ spin_unlock(&mm->arg_lock);

len = arg_end - arg_start;

--
2.16.4

2019-04-30 08:20:51

by Michal Koutný

[permalink] [raw]
Subject: [PATCH 2/3] prctl_set_mm: Refactor checks from validate_prctl_map

Despite comment of validate_prctl_map claims there are no capability
checks, it is not completely true since commit 4d28df6152aa ("prctl:
Allow local CAP_SYS_ADMIN changing exe_file"). Extract the check out of
the function and make the function perform purely arithmetic checks.

This patch should not change any behavior, it is mere refactoring for
following patch.

CC: Kirill Tkhai <[email protected]>
CC: Cyrill Gorcunov <[email protected]>
Signed-off-by: Michal Koutný <[email protected]>
---
kernel/sys.c | 45 ++++++++++++++++++++-------------------------
1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..e1acb444d7b0 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1882,10 +1882,12 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
}

/*
+ * Check arithmetic relations of passed addresses.
+ *
* WARNING: we don't require any capability here so be very careful
* in what is allowed for modification from userspace.
*/
-static int validate_prctl_map(struct prctl_mm_map *prctl_map)
+static int validate_prctl_map_addr(struct prctl_mm_map *prctl_map)
{
unsigned long mmap_max_addr = TASK_SIZE;
struct mm_struct *mm = current->mm;
@@ -1949,24 +1951,6 @@ static int validate_prctl_map(struct prctl_mm_map *prctl_map)
prctl_map->start_data))
goto out;

- /*
- * Someone is trying to cheat the auxv vector.
- */
- if (prctl_map->auxv_size) {
- if (!prctl_map->auxv || prctl_map->auxv_size > sizeof(mm->saved_auxv))
- goto out;
- }
-
- /*
- * Finally, make sure the caller has the rights to
- * change /proc/pid/exe link: only local sys admin should
- * be allowed to.
- */
- if (prctl_map->exe_fd != (u32)-1) {
- if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
- goto out;
- }
-
error = 0;
out:
return error;
@@ -1993,11 +1977,17 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
return -EFAULT;

- error = validate_prctl_map(&prctl_map);
+ error = validate_prctl_map_addr(&prctl_map);
if (error)
return error;

if (prctl_map.auxv_size) {
+ /*
+ * Someone is trying to cheat the auxv vector.
+ */
+ if (!prctl_map.auxv || prctl_map.auxv_size > sizeof(mm->saved_auxv))
+ return -EINVAL;
+
memset(user_auxv, 0, sizeof(user_auxv));
if (copy_from_user(user_auxv,
(const void __user *)prctl_map.auxv,
@@ -2010,6 +2000,14 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
}

if (prctl_map.exe_fd != (u32)-1) {
+ /*
+ * Make sure the caller has the rights to
+ * change /proc/pid/exe link: only local sys admin should
+ * be allowed to.
+ */
+ if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+ return -EINVAL;
+
error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
if (error)
return error;
@@ -2097,7 +2095,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
unsigned long arg4, unsigned long arg5)
{
struct mm_struct *mm = current->mm;
- struct prctl_mm_map prctl_map;
+ struct prctl_mm_map prctl_map = { .auxv = NULL, .auxv_size = 0, .exe_fd = -1 };
struct vm_area_struct *vma;
int error;

@@ -2139,9 +2137,6 @@ static int prctl_set_mm(int opt, unsigned long addr,
prctl_map.arg_end = mm->arg_end;
prctl_map.env_start = mm->env_start;
prctl_map.env_end = mm->env_end;
- prctl_map.auxv = NULL;
- prctl_map.auxv_size = 0;
- prctl_map.exe_fd = -1;

switch (opt) {
case PR_SET_MM_START_CODE:
@@ -2181,7 +2176,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
goto out;
}

- error = validate_prctl_map(&prctl_map);
+ error = validate_prctl_map_addr(&prctl_map);
if (error)
goto out;

--
2.16.4

2019-04-30 08:21:27

by Michal Koutný

[permalink] [raw]
Subject: [PATCH 3/3] prctl_set_mm: downgrade mmap_sem to read lock

Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
arg_start|end and env_start|end in mm_struct") we use arg_lock for
boundaries modifications. Synchronize prctl_set_mm with this lock and
keep mmap_sem for reading only (analogous to what we already do in
prctl_set_mm_map).

v2: call find_vma without arg_lock held

CC: Cyrill Gorcunov <[email protected]>
CC: Laurent Dufour <[email protected]>
Signed-off-by: Michal Koutný <[email protected]>
---
kernel/sys.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index e1acb444d7b0..641fda756575 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2123,9 +2123,14 @@ static int prctl_set_mm(int opt, unsigned long addr,

error = -EINVAL;

- down_write(&mm->mmap_sem);
+ /*
+ * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
+ * a) concurrent sys_brk, b) finding VMA for addr validation.
+ */
+ down_read(&mm->mmap_sem);
vma = find_vma(mm, addr);

+ spin_lock(&mm->arg_lock);
prctl_map.start_code = mm->start_code;
prctl_map.end_code = mm->end_code;
prctl_map.start_data = mm->start_data;
@@ -2213,7 +2218,8 @@ static int prctl_set_mm(int opt, unsigned long addr,

error = 0;
out:
- up_write(&mm->mmap_sem);
+ spin_unlock(&mm->arg_lock);
+ up_read(&mm->mmap_sem);
return error;
}

--
2.16.4

2019-04-30 08:58:21

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH 3/3] prctl_set_mm: downgrade mmap_sem to read lock

On 30.04.2019 11:18, Michal Koutný wrote:
> Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
> arg_start|end and env_start|end in mm_struct") we use arg_lock for
> boundaries modifications. Synchronize prctl_set_mm with this lock and
> keep mmap_sem for reading only (analogous to what we already do in
> prctl_set_mm_map).
>
> v2: call find_vma without arg_lock held
>
> CC: Cyrill Gorcunov <[email protected]>
> CC: Laurent Dufour <[email protected]>
> Signed-off-by: Michal Koutný <[email protected]>
> ---
> kernel/sys.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index e1acb444d7b0..641fda756575 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2123,9 +2123,14 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = -EINVAL;
>
> - down_write(&mm->mmap_sem);
> + /*
> + * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
> + * a) concurrent sys_brk, b) finding VMA for addr validation.
> + */
> + down_read(&mm->mmap_sem);
> vma = find_vma(mm, addr);
>
> + spin_lock(&mm->arg_lock);
> prctl_map.start_code = mm->start_code;
> prctl_map.end_code = mm->end_code;
> prctl_map.start_data = mm->start_data;
> @@ -2213,7 +2218,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = 0;
> out:
> - up_write(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
> + up_read(&mm->mmap_sem);
> return error;

Hm, shouldn't spin_lock()/spin_unlock() pair go as a fixup to existing code
in a separate patch?

Without them, the existing code has a problem at least in get_mm_cmdline().

2019-04-30 09:09:18

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 3/3] prctl_set_mm: downgrade mmap_sem to read lock

On Tue, Apr 30, 2019 at 11:55:45AM +0300, Kirill Tkhai wrote:
> > - up_write(&mm->mmap_sem);
> > + spin_unlock(&mm->arg_lock);
> > + up_read(&mm->mmap_sem);
> > return error;
>
> Hm, shouldn't spin_lock()/spin_unlock() pair go as a fixup to existing code
> in a separate patch?
>
> Without them, the existing code has a problem at least in get_mm_cmdline().

Seems reasonable to merge it into patch 1.

2019-04-30 09:12:33

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: get_cmdline use arg_lock instead of mmap_sem

On 30.04.2019 11:18, Michal Koutný wrote:
> The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
> semaphore taken.") added synchronization of reading argument/environment
> boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
> arg_lock to protect arg_start|end and env_start|end in mm_struct")
> avoided the coarse use of mmap_sem in similar situations.
>
> get_cmdline can also use arg_lock instead of mmap_sem when it reads the
> boundaries.
>
> Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct")
> Cc: Yang Shi <[email protected]>
> Cc: Mateusz Guzik <[email protected]>
> Signed-off-by: Michal Koutný <[email protected]>
> Signed-off-by: Laurent Dufour <[email protected]>
> ---
> mm/util.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 43a2984bccaa..5cf0e84a0823 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -758,12 +758,12 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
> if (!mm->arg_end)
> goto out_mm; /* Shh! No looking before we're done */
>
> - down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
> arg_start = mm->arg_start;
> arg_end = mm->arg_end;
> env_start = mm->env_start;
> env_end = mm->env_end;
> - up_read(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
>
> len = arg_end - arg_start;

This looks OK for me.

But speaking about existing code it's a secret for me, why we ignore arg_lock
in binfmt code, e.g. in load_elf_binary().

2019-04-30 09:13:17

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH 3/3] prctl_set_mm: downgrade mmap_sem to read lock

On 30.04.2019 12:08, Cyrill Gorcunov wrote:
> On Tue, Apr 30, 2019 at 11:55:45AM +0300, Kirill Tkhai wrote:
>>> - up_write(&mm->mmap_sem);
>>> + spin_unlock(&mm->arg_lock);
>>> + up_read(&mm->mmap_sem);
>>> return error;
>>
>> Hm, shouldn't spin_lock()/spin_unlock() pair go as a fixup to existing code
>> in a separate patch?
>>
>> Without them, the existing code has a problem at least in get_mm_cmdline().
>
> Seems reasonable to merge it into patch 1.

Sounds sensibly.

2019-04-30 09:29:22

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH 2/3] prctl_set_mm: Refactor checks from validate_prctl_map

On 30.04.2019 11:18, Michal Koutný wrote:
> Despite comment of validate_prctl_map claims there are no capability
> checks, it is not completely true since commit 4d28df6152aa ("prctl:
> Allow local CAP_SYS_ADMIN changing exe_file"). Extract the check out of
> the function and make the function perform purely arithmetic checks.
>
> This patch should not change any behavior, it is mere refactoring for
> following patch.
>
> CC: Kirill Tkhai <[email protected]>
> CC: Cyrill Gorcunov <[email protected]>
> Signed-off-by: Michal Koutný <[email protected]>

Reviewed-by: Kirill Tkhai <[email protected]>

> ---
> kernel/sys.c | 45 ++++++++++++++++++++-------------------------
> 1 file changed, 20 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 12df0e5434b8..e1acb444d7b0 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1882,10 +1882,12 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
> }
>
> /*
> + * Check arithmetic relations of passed addresses.
> + *
> * WARNING: we don't require any capability here so be very careful
> * in what is allowed for modification from userspace.
> */
> -static int validate_prctl_map(struct prctl_mm_map *prctl_map)
> +static int validate_prctl_map_addr(struct prctl_mm_map *prctl_map)
> {
> unsigned long mmap_max_addr = TASK_SIZE;
> struct mm_struct *mm = current->mm;
> @@ -1949,24 +1951,6 @@ static int validate_prctl_map(struct prctl_mm_map *prctl_map)
> prctl_map->start_data))
> goto out;
>
> - /*
> - * Someone is trying to cheat the auxv vector.
> - */
> - if (prctl_map->auxv_size) {
> - if (!prctl_map->auxv || prctl_map->auxv_size > sizeof(mm->saved_auxv))
> - goto out;
> - }
> -
> - /*
> - * Finally, make sure the caller has the rights to
> - * change /proc/pid/exe link: only local sys admin should
> - * be allowed to.
> - */
> - if (prctl_map->exe_fd != (u32)-1) {
> - if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> - goto out;
> - }
> -
> error = 0;
> out:
> return error;
> @@ -1993,11 +1977,17 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
> if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
> return -EFAULT;
>
> - error = validate_prctl_map(&prctl_map);
> + error = validate_prctl_map_addr(&prctl_map);
> if (error)
> return error;
>
> if (prctl_map.auxv_size) {
> + /*
> + * Someone is trying to cheat the auxv vector.
> + */
> + if (!prctl_map.auxv || prctl_map.auxv_size > sizeof(mm->saved_auxv))
> + return -EINVAL;
> +
> memset(user_auxv, 0, sizeof(user_auxv));
> if (copy_from_user(user_auxv,
> (const void __user *)prctl_map.auxv,
> @@ -2010,6 +2000,14 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
> }
>
> if (prctl_map.exe_fd != (u32)-1) {
> + /*
> + * Make sure the caller has the rights to
> + * change /proc/pid/exe link: only local sys admin should
> + * be allowed to.
> + */
> + if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> + return -EINVAL;
> +
> error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
> if (error)
> return error;
> @@ -2097,7 +2095,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> unsigned long arg4, unsigned long arg5)
> {
> struct mm_struct *mm = current->mm;
> - struct prctl_mm_map prctl_map;
> + struct prctl_mm_map prctl_map = { .auxv = NULL, .auxv_size = 0, .exe_fd = -1 };
> struct vm_area_struct *vma;
> int error;
>
> @@ -2139,9 +2137,6 @@ static int prctl_set_mm(int opt, unsigned long addr,
> prctl_map.arg_end = mm->arg_end;
> prctl_map.env_start = mm->env_start;
> prctl_map.env_end = mm->env_end;
> - prctl_map.auxv = NULL;
> - prctl_map.auxv_size = 0;
> - prctl_map.exe_fd = -1;
>
> switch (opt) {
> case PR_SET_MM_START_CODE:
> @@ -2181,7 +2176,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> goto out;
> }
>
> - error = validate_prctl_map(&prctl_map);
> + error = validate_prctl_map_addr(&prctl_map);
> if (error)
> goto out;
>
>

2019-04-30 09:41:03

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: get_cmdline use arg_lock instead of mmap_sem

On Tue, Apr 30, 2019 at 12:09:57PM +0300, Kirill Tkhai wrote:
>
> This looks OK for me.
>
> But speaking about existing code it's a secret for me, why we ignore arg_lock
> in binfmt code, e.g. in load_elf_binary().

Well, strictly speaking we probably should but you know setup of
the @arg_start by kernel's elf loader doesn't cause any side
effects as far as I can tell (its been working this lockless
way for years, mmap_sem is taken later in the loader code).
Though for consistency sake we probably should set it up
under the spinlock.

2019-04-30 09:57:06

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: get_cmdline use arg_lock instead of mmap_sem

On 30.04.2019 12:38, Cyrill Gorcunov wrote:
> On Tue, Apr 30, 2019 at 12:09:57PM +0300, Kirill Tkhai wrote:
>>
>> This looks OK for me.
>>
>> But speaking about existing code it's a secret for me, why we ignore arg_lock
>> in binfmt code, e.g. in load_elf_binary().
>
> Well, strictly speaking we probably should but you know setup of
> the @arg_start by kernel's elf loader doesn't cause any side
> effects as far as I can tell (its been working this lockless
> way for years, mmap_sem is taken later in the loader code).
> Though for consistency sake we probably should set it up
> under the spinlock.

Ok, so elf loader doesn't change these parameters. Thanks for the explanation.

2019-04-30 10:48:30

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: get_cmdline use arg_lock instead of mmap_sem

On Tue, Apr 30, 2019 at 12:53:51PM +0300, Kirill Tkhai wrote:
> >
> > Well, strictly speaking we probably should but you know setup of
> > the @arg_start by kernel's elf loader doesn't cause any side
> > effects as far as I can tell (its been working this lockless
> > way for years, mmap_sem is taken later in the loader code).
> > Though for consistency sake we probably should set it up
> > under the spinlock.
>
> Ok, so elf loader doesn't change these parameters.
> Thanks for the explanation.

It setups these parameters unconditionally. I need to revisit
this moment. Technically (if only I'm not missing something
obvious) we might have a race here with prctl setting up new
params, but this should be harmless since most of them (except
stack setup) are purely informative data.

2019-04-30 10:58:31

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: get_cmdline use arg_lock instead of mmap_sem

On Tue, Apr 30, 2019 at 01:45:17PM +0300, Cyrill Gorcunov <[email protected]> wrote:
> It setups these parameters unconditionally. I need to revisit
> this moment. Technically (if only I'm not missing something
> obvious) we might have a race here with prctl setting up new
> params, but this should be harmless since most of them (except
> stack setup) are purely informative data.
FTR, when I reviewed that usage, I noticed it was missing the
synchronization. My understanding was that the mm_struct isn't yet
shared at this moment. I can see some of the operations take place after
flush_old_exec (where current->mm = mm_struct), so potentially it is
shared since then. OTOH, I guess there aren't concurrent parties that
could access the field at this stage of exec.

My 2 cents,
Michal


Attachments:
(No filename) (806.00 B)
signature.asc (849.00 B)
Digital signature
Download all attachments

2019-04-30 13:25:15

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: get_cmdline use arg_lock instead of mmap_sem

On Tue, Apr 30, 2019 at 12:56:10PM +0200, Michal Koutn? wrote:
> On Tue, Apr 30, 2019 at 01:45:17PM +0300, Cyrill Gorcunov <[email protected]> wrote:
> > It setups these parameters unconditionally. I need to revisit
> > this moment. Technically (if only I'm not missing something
> > obvious) we might have a race here with prctl setting up new
> > params, but this should be harmless since most of them (except
> > stack setup) are purely informative data.
>
> FTR, when I reviewed that usage, I noticed it was missing the
> synchronization. My understanding was that the mm_struct isn't yet
> shared at this moment. I can see some of the operations take place after
> flush_old_exec (where current->mm = mm_struct), so potentially it is
> shared since then. OTOH, I guess there aren't concurrent parties that
> could access the field at this stage of exec.

Just revisited this code -- we're either executing prctl, either execve.
Since both operates with current task we're safe.

2019-05-02 12:54:19

by Michal Koutný

[permalink] [raw]
Subject: [PATCH v3 0/2] Reduce mmap_sem usage for args manipulation

Hello.

Just merged the two dicussed patches and fixed an overlooked warning.

v2
- insert a patch refactoring validate_prctl_map
- move find_vma out of the arg_lock critical section

v3
- squash get_cmdline and prctl_set_mm patches (to keep locking
consistency)
- validate_prctl_map_addr: remove unused variable mm

Michal Koutný (2):
prctl_set_mm: Refactor checks from validate_prctl_map
prctl_set_mm: downgrade mmap_sem to read lock

kernel/sys.c | 56 ++++++++++++++++++++++++++++----------------------------
mm/util.c | 4 ++--
2 files changed, 30 insertions(+), 30 deletions(-)

--
2.16.4