The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
semaphore taken.") added synchronization of reading argument/environment
boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
arg_lock to protect arg_start|end and env_start|end in mm_struct")
avoided the coarse use of mmap_sem in similar situations.
get_cmdline can also use arg_lock instead of mmap_sem when it reads the
boundaries.
Signed-off-by: Michal Koutný <[email protected]>
---
mm/util.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/util.c b/mm/util.c
index d559bde497a9..568575cceefc 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -758,12 +758,12 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
if (!mm->arg_end)
goto out_mm; /* Shh! No looking before we're done */
- down_read(&mm->mmap_sem);
+ spin_lock(&mm->arg_lock);
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
env_end = mm->env_end;
- up_read(&mm->mmap_sem);
+ spin_unlock(&mm->arg_lock);
len = arg_end - arg_start;
--
2.16.4
On Wed 17-04-19 14:03:47, Michal Koutny wrote:
> The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
> semaphore taken.") added synchronization of reading argument/environment
> boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
> arg_lock to protect arg_start|end and env_start|end in mm_struct")
> avoided the coarse use of mmap_sem in similar situations.
>
> get_cmdline can also use arg_lock instead of mmap_sem when it reads the
> boundaries.
Don't we need to use the lock in prctl_set_mm as well then?
> Signed-off-by: Michal Koutn? <[email protected]>
> ---
> mm/util.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index d559bde497a9..568575cceefc 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -758,12 +758,12 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
> if (!mm->arg_end)
> goto out_mm; /* Shh! No looking before we're done */
>
> - down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
> arg_start = mm->arg_start;
> arg_end = mm->arg_end;
> env_start = mm->env_start;
> env_end = mm->env_end;
> - up_read(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
>
> len = arg_end - arg_start;
>
> --
> 2.16.4
--
Michal Hocko
SUSE Labs
On Wed, Apr 17, 2019 at 03:41:52PM +0200, Michal Hocko <[email protected]> wrote:
> Don't we need to use the lock in prctl_set_mm as well then?
Correct. The patch alone just moves the race from
get_cmdline/prctl_set_mm_map to get_cmdline/prctl_set_mm.
arg_lock could be used in prctl_set_mm but the better idea (IMO) is
complete removal of that code in favor of prctl_set_mm_map [1].
Michal
[1] https://lore.kernel.org/lkml/[email protected]/
On Wed 17-04-19 16:41:42, Michal Koutny wrote:
> On Wed, Apr 17, 2019 at 03:41:52PM +0200, Michal Hocko <[email protected]> wrote:
> > Don't we need to use the lock in prctl_set_mm as well then?
>
> Correct. The patch alone just moves the race from
> get_cmdline/prctl_set_mm_map to get_cmdline/prctl_set_mm.
>
> arg_lock could be used in prctl_set_mm but the better idea (IMO) is
> complete removal of that code in favor of prctl_set_mm_map [1].
Ohh, I have missed that patch. As long as both are merged together then
no objections from me and you can add
Acked-by: Michal Hocko <[email protected]>
> Michal
>
> [1] https://lore.kernel.org/lkml/[email protected]/
--
Michal Hocko
SUSE Labs
I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
[1], so at least downgrade the write acquisition of mmap_sem as in the
patch below (that should be stacked on the previous one or squashed).
Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
it supposed to be [2]? That could be an alternative to this patch after
some refreshments and clarifications.
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/
========
Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
arg_start|end and env_start|end in mm_struct") we use arg_lock for
boundaries modifications. Synchronize prctl_set_mm with this lock and
keep mmap_sem for reading only (analogous to what we already do in
prctl_set_mm_map).
Also, save few cycles by looking up VMA only after performing basic
arguments validation.
Signed-off-by: Michal Koutný <[email protected]>
---
kernel/sys.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..bbce0f26d707 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2125,8 +2125,12 @@ static int prctl_set_mm(int opt, unsigned long addr,
error = -EINVAL;
- down_write(&mm->mmap_sem);
- vma = find_vma(mm, addr);
+ /*
+ * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
+ * a) concurrent sys_brk, b) finding VMA for addr validation.
+ */
+ down_read(&mm->mmap_sem);
+ spin_lock(&mm->arg_lock);
prctl_map.start_code = mm->start_code;
prctl_map.end_code = mm->end_code;
@@ -2185,6 +2189,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
if (error)
goto out;
+ vma = find_vma(mm, addr);
switch (opt) {
/*
* If command line arguments and environment
@@ -2218,7 +2223,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
error = 0;
out:
- up_write(&mm->mmap_sem);
+ spin_unlock(&mm->arg_lock);
+ up_read(&mm->mmap_sem);
return error;
}
--
2.16.4
On Thu, Apr 18, 2019 at 03:50:39PM +0200, Michal Koutn? wrote:
> I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
> [1], so at least downgrade the write acquisition of mmap_sem as in the
> patch below (that should be stacked on the previous one or squashed).
>
> Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
> it supposed to be [2]? That could be an alternative to this patch after
> some refreshments and clarifications.
Yes, seems so. From a glance the patch shold be ok. Michal will review
it more carefully today. Thanks!
On Thu 18-04-19 15:50:39, Michal Koutny wrote:
> I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
> [1], so at least downgrade the write acquisition of mmap_sem as in the
> patch below (that should be stacked on the previous one or squashed).
>
> Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
> it supposed to be [2]? That could be an alternative to this patch after
> some refreshments and clarifications.
>
>
> [1] https://lore.kernel.org/lkml/[email protected]/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> ========
>
> Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
> arg_start|end and env_start|end in mm_struct") we use arg_lock for
> boundaries modifications. Synchronize prctl_set_mm with this lock and
> keep mmap_sem for reading only (analogous to what we already do in
> prctl_set_mm_map).
>
> Also, save few cycles by looking up VMA only after performing basic
> arguments validation.
>
> Signed-off-by: Michal Koutn? <[email protected]>
Looks good to me. Please send both patches in one series once you get a
review feedback from other people.
> ---
> kernel/sys.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 12df0e5434b8..bbce0f26d707 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2125,8 +2125,12 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = -EINVAL;
>
> - down_write(&mm->mmap_sem);
> - vma = find_vma(mm, addr);
> + /*
> + * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
> + * a) concurrent sys_brk, b) finding VMA for addr validation.
> + */
> + down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
>
> prctl_map.start_code = mm->start_code;
> prctl_map.end_code = mm->end_code;
> @@ -2185,6 +2189,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> if (error)
> goto out;
>
> + vma = find_vma(mm, addr);
> switch (opt) {
> /*
> * If command line arguments and environment
> @@ -2218,7 +2223,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = 0;
> out:
> - up_write(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
> + up_read(&mm->mmap_sem);
> return error;
> }
>
> --
> 2.16.4
--
Michal Hocko
SUSE Labs
Le 18/04/2019 à 15:50, Michal Koutný a écrit :
> I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
> [1], so at least downgrade the write acquisition of mmap_sem as in the
> patch below (that should be stacked on the previous one or squashed).
>
> Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
> it supposed to be [2]? That could be an alternative to this patch after
> some refreshments and clarifications.
>
>
> [1] https://lore.kernel.org/lkml/[email protected]/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> ========
>
> Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
> arg_start|end and env_start|end in mm_struct") we use arg_lock for
> boundaries modifications. Synchronize prctl_set_mm with this lock and
> keep mmap_sem for reading only (analogous to what we already do in
> prctl_set_mm_map).
>
> Also, save few cycles by looking up VMA only after performing basic
> arguments validation.
>
> Signed-off-by: Michal Koutný <[email protected]>
> ---
> kernel/sys.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 12df0e5434b8..bbce0f26d707 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2125,8 +2125,12 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = -EINVAL;
>
> - down_write(&mm->mmap_sem);
> - vma = find_vma(mm, addr);
> + /*
> + * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
> + * a) concurrent sys_brk, b) finding VMA for addr validation.
> + */
> + down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
>
> prctl_map.start_code = mm->start_code;
> prctl_map.end_code = mm->end_code;
> @@ -2185,6 +2189,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> if (error)
> goto out;
>
> + vma = find_vma(mm, addr);
Why is find_vma() called while holding the arg_lock ?
To limit the time the spinlock is held, would it be better to
read_lock(mmap_sem)
find_vma()
spin_lock(arg_lock)
..
out:
spin_unlock()
up_read(mmap_sem)
Not sure this would change a lot the performance anyway.
> switch (opt) {
> /*
> * If command line arguments and environment
> @@ -2218,7 +2223,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = 0;
> out:
> - up_write(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
> + up_read(&mm->mmap_sem);
> return error;
> }
>
>
On Thu, Apr 18, 2019 at 03:50:39PM +0200, Michal Koutn? wrote:
> I learnt, it's, alas, too late to drop the non PRCTL_SET_MM_MAP calls
> [1], so at least downgrade the write acquisition of mmap_sem as in the
> patch below (that should be stacked on the previous one or squashed).
>
> Cyrill, you mentioned lock changes in [1] but the link seems empty. Is
> it supposed to be [2]? That could be an alternative to this patch after
> some refreshments and clarifications.
>
>
> [1] https://lore.kernel.org/lkml/[email protected]/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> ========
>
> Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
> arg_start|end and env_start|end in mm_struct") we use arg_lock for
> boundaries modifications. Synchronize prctl_set_mm with this lock and
> keep mmap_sem for reading only (analogous to what we already do in
> prctl_set_mm_map).
>
> Also, save few cycles by looking up VMA only after performing basic
> arguments validation.
>
> Signed-off-by: Michal Koutn? <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
As Laurent mentioned we might move vma lookup before the spinlock,
but this might be done on top of the series.
Hello.
(apologies for late reply) I've aggregated the two previously discussed patches
into one series and based on responses made some changes summed below.
v2
- insert a patch refactoring validate_prctl_map
- move find_vma out of the arg_lock critical section
Michal Koutný (3):
mm: get_cmdline use arg_lock instead of mmap_sem
prctl_set_mm: Refactor checks from validate_prctl_map
prctl_set_mm: downgrade mmap_sem to read lock
kernel/sys.c | 55 ++++++++++++++++++++++++++++---------------------------
mm/util.c | 4 ++--
2 files changed, 30 insertions(+), 29 deletions(-)
--
2.16.4
The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
semaphore taken.") added synchronization of reading argument/environment
boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
arg_lock to protect arg_start|end and env_start|end in mm_struct")
avoided the coarse use of mmap_sem in similar situations.
get_cmdline can also use arg_lock instead of mmap_sem when it reads the
boundaries.
Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct")
Cc: Yang Shi <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Signed-off-by: Michal Koutný <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
---
mm/util.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/util.c b/mm/util.c
index 43a2984bccaa..5cf0e84a0823 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -758,12 +758,12 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
if (!mm->arg_end)
goto out_mm; /* Shh! No looking before we're done */
- down_read(&mm->mmap_sem);
+ spin_lock(&mm->arg_lock);
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
env_end = mm->env_end;
- up_read(&mm->mmap_sem);
+ spin_unlock(&mm->arg_lock);
len = arg_end - arg_start;
--
2.16.4
Despite comment of validate_prctl_map claims there are no capability
checks, it is not completely true since commit 4d28df6152aa ("prctl:
Allow local CAP_SYS_ADMIN changing exe_file"). Extract the check out of
the function and make the function perform purely arithmetic checks.
This patch should not change any behavior, it is mere refactoring for
following patch.
CC: Kirill Tkhai <[email protected]>
CC: Cyrill Gorcunov <[email protected]>
Signed-off-by: Michal Koutný <[email protected]>
---
kernel/sys.c | 45 ++++++++++++++++++++-------------------------
1 file changed, 20 insertions(+), 25 deletions(-)
diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..e1acb444d7b0 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1882,10 +1882,12 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
}
/*
+ * Check arithmetic relations of passed addresses.
+ *
* WARNING: we don't require any capability here so be very careful
* in what is allowed for modification from userspace.
*/
-static int validate_prctl_map(struct prctl_mm_map *prctl_map)
+static int validate_prctl_map_addr(struct prctl_mm_map *prctl_map)
{
unsigned long mmap_max_addr = TASK_SIZE;
struct mm_struct *mm = current->mm;
@@ -1949,24 +1951,6 @@ static int validate_prctl_map(struct prctl_mm_map *prctl_map)
prctl_map->start_data))
goto out;
- /*
- * Someone is trying to cheat the auxv vector.
- */
- if (prctl_map->auxv_size) {
- if (!prctl_map->auxv || prctl_map->auxv_size > sizeof(mm->saved_auxv))
- goto out;
- }
-
- /*
- * Finally, make sure the caller has the rights to
- * change /proc/pid/exe link: only local sys admin should
- * be allowed to.
- */
- if (prctl_map->exe_fd != (u32)-1) {
- if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
- goto out;
- }
-
error = 0;
out:
return error;
@@ -1993,11 +1977,17 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
return -EFAULT;
- error = validate_prctl_map(&prctl_map);
+ error = validate_prctl_map_addr(&prctl_map);
if (error)
return error;
if (prctl_map.auxv_size) {
+ /*
+ * Someone is trying to cheat the auxv vector.
+ */
+ if (!prctl_map.auxv || prctl_map.auxv_size > sizeof(mm->saved_auxv))
+ return -EINVAL;
+
memset(user_auxv, 0, sizeof(user_auxv));
if (copy_from_user(user_auxv,
(const void __user *)prctl_map.auxv,
@@ -2010,6 +2000,14 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
}
if (prctl_map.exe_fd != (u32)-1) {
+ /*
+ * Make sure the caller has the rights to
+ * change /proc/pid/exe link: only local sys admin should
+ * be allowed to.
+ */
+ if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+ return -EINVAL;
+
error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
if (error)
return error;
@@ -2097,7 +2095,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
unsigned long arg4, unsigned long arg5)
{
struct mm_struct *mm = current->mm;
- struct prctl_mm_map prctl_map;
+ struct prctl_mm_map prctl_map = { .auxv = NULL, .auxv_size = 0, .exe_fd = -1 };
struct vm_area_struct *vma;
int error;
@@ -2139,9 +2137,6 @@ static int prctl_set_mm(int opt, unsigned long addr,
prctl_map.arg_end = mm->arg_end;
prctl_map.env_start = mm->env_start;
prctl_map.env_end = mm->env_end;
- prctl_map.auxv = NULL;
- prctl_map.auxv_size = 0;
- prctl_map.exe_fd = -1;
switch (opt) {
case PR_SET_MM_START_CODE:
@@ -2181,7 +2176,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
goto out;
}
- error = validate_prctl_map(&prctl_map);
+ error = validate_prctl_map_addr(&prctl_map);
if (error)
goto out;
--
2.16.4
Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
arg_start|end and env_start|end in mm_struct") we use arg_lock for
boundaries modifications. Synchronize prctl_set_mm with this lock and
keep mmap_sem for reading only (analogous to what we already do in
prctl_set_mm_map).
v2: call find_vma without arg_lock held
CC: Cyrill Gorcunov <[email protected]>
CC: Laurent Dufour <[email protected]>
Signed-off-by: Michal Koutný <[email protected]>
---
kernel/sys.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel/sys.c b/kernel/sys.c
index e1acb444d7b0..641fda756575 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2123,9 +2123,14 @@ static int prctl_set_mm(int opt, unsigned long addr,
error = -EINVAL;
- down_write(&mm->mmap_sem);
+ /*
+ * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
+ * a) concurrent sys_brk, b) finding VMA for addr validation.
+ */
+ down_read(&mm->mmap_sem);
vma = find_vma(mm, addr);
+ spin_lock(&mm->arg_lock);
prctl_map.start_code = mm->start_code;
prctl_map.end_code = mm->end_code;
prctl_map.start_data = mm->start_data;
@@ -2213,7 +2218,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
error = 0;
out:
- up_write(&mm->mmap_sem);
+ spin_unlock(&mm->arg_lock);
+ up_read(&mm->mmap_sem);
return error;
}
--
2.16.4
On 30.04.2019 11:18, Michal Koutný wrote:
> Since commit 88aa7cc688d4 ("mm: introduce arg_lock to protect
> arg_start|end and env_start|end in mm_struct") we use arg_lock for
> boundaries modifications. Synchronize prctl_set_mm with this lock and
> keep mmap_sem for reading only (analogous to what we already do in
> prctl_set_mm_map).
>
> v2: call find_vma without arg_lock held
>
> CC: Cyrill Gorcunov <[email protected]>
> CC: Laurent Dufour <[email protected]>
> Signed-off-by: Michal Koutný <[email protected]>
> ---
> kernel/sys.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index e1acb444d7b0..641fda756575 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2123,9 +2123,14 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = -EINVAL;
>
> - down_write(&mm->mmap_sem);
> + /*
> + * arg_lock protects concurent updates of arg boundaries, we need mmap_sem for
> + * a) concurrent sys_brk, b) finding VMA for addr validation.
> + */
> + down_read(&mm->mmap_sem);
> vma = find_vma(mm, addr);
>
> + spin_lock(&mm->arg_lock);
> prctl_map.start_code = mm->start_code;
> prctl_map.end_code = mm->end_code;
> prctl_map.start_data = mm->start_data;
> @@ -2213,7 +2218,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
>
> error = 0;
> out:
> - up_write(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
> + up_read(&mm->mmap_sem);
> return error;
Hm, shouldn't spin_lock()/spin_unlock() pair go as a fixup to existing code
in a separate patch?
Without them, the existing code has a problem at least in get_mm_cmdline().
On Tue, Apr 30, 2019 at 11:55:45AM +0300, Kirill Tkhai wrote:
> > - up_write(&mm->mmap_sem);
> > + spin_unlock(&mm->arg_lock);
> > + up_read(&mm->mmap_sem);
> > return error;
>
> Hm, shouldn't spin_lock()/spin_unlock() pair go as a fixup to existing code
> in a separate patch?
>
> Without them, the existing code has a problem at least in get_mm_cmdline().
Seems reasonable to merge it into patch 1.
On 30.04.2019 11:18, Michal Koutný wrote:
> The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
> semaphore taken.") added synchronization of reading argument/environment
> boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
> arg_lock to protect arg_start|end and env_start|end in mm_struct")
> avoided the coarse use of mmap_sem in similar situations.
>
> get_cmdline can also use arg_lock instead of mmap_sem when it reads the
> boundaries.
>
> Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct")
> Cc: Yang Shi <[email protected]>
> Cc: Mateusz Guzik <[email protected]>
> Signed-off-by: Michal Koutný <[email protected]>
> Signed-off-by: Laurent Dufour <[email protected]>
> ---
> mm/util.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 43a2984bccaa..5cf0e84a0823 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -758,12 +758,12 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
> if (!mm->arg_end)
> goto out_mm; /* Shh! No looking before we're done */
>
> - down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
> arg_start = mm->arg_start;
> arg_end = mm->arg_end;
> env_start = mm->env_start;
> env_end = mm->env_end;
> - up_read(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
>
> len = arg_end - arg_start;
This looks OK for me.
But speaking about existing code it's a secret for me, why we ignore arg_lock
in binfmt code, e.g. in load_elf_binary().
On 30.04.2019 12:08, Cyrill Gorcunov wrote:
> On Tue, Apr 30, 2019 at 11:55:45AM +0300, Kirill Tkhai wrote:
>>> - up_write(&mm->mmap_sem);
>>> + spin_unlock(&mm->arg_lock);
>>> + up_read(&mm->mmap_sem);
>>> return error;
>>
>> Hm, shouldn't spin_lock()/spin_unlock() pair go as a fixup to existing code
>> in a separate patch?
>>
>> Without them, the existing code has a problem at least in get_mm_cmdline().
>
> Seems reasonable to merge it into patch 1.
Sounds sensibly.
On 30.04.2019 11:18, Michal Koutný wrote:
> Despite comment of validate_prctl_map claims there are no capability
> checks, it is not completely true since commit 4d28df6152aa ("prctl:
> Allow local CAP_SYS_ADMIN changing exe_file"). Extract the check out of
> the function and make the function perform purely arithmetic checks.
>
> This patch should not change any behavior, it is mere refactoring for
> following patch.
>
> CC: Kirill Tkhai <[email protected]>
> CC: Cyrill Gorcunov <[email protected]>
> Signed-off-by: Michal Koutný <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
> ---
> kernel/sys.c | 45 ++++++++++++++++++++-------------------------
> 1 file changed, 20 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 12df0e5434b8..e1acb444d7b0 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1882,10 +1882,12 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
> }
>
> /*
> + * Check arithmetic relations of passed addresses.
> + *
> * WARNING: we don't require any capability here so be very careful
> * in what is allowed for modification from userspace.
> */
> -static int validate_prctl_map(struct prctl_mm_map *prctl_map)
> +static int validate_prctl_map_addr(struct prctl_mm_map *prctl_map)
> {
> unsigned long mmap_max_addr = TASK_SIZE;
> struct mm_struct *mm = current->mm;
> @@ -1949,24 +1951,6 @@ static int validate_prctl_map(struct prctl_mm_map *prctl_map)
> prctl_map->start_data))
> goto out;
>
> - /*
> - * Someone is trying to cheat the auxv vector.
> - */
> - if (prctl_map->auxv_size) {
> - if (!prctl_map->auxv || prctl_map->auxv_size > sizeof(mm->saved_auxv))
> - goto out;
> - }
> -
> - /*
> - * Finally, make sure the caller has the rights to
> - * change /proc/pid/exe link: only local sys admin should
> - * be allowed to.
> - */
> - if (prctl_map->exe_fd != (u32)-1) {
> - if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> - goto out;
> - }
> -
> error = 0;
> out:
> return error;
> @@ -1993,11 +1977,17 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
> if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
> return -EFAULT;
>
> - error = validate_prctl_map(&prctl_map);
> + error = validate_prctl_map_addr(&prctl_map);
> if (error)
> return error;
>
> if (prctl_map.auxv_size) {
> + /*
> + * Someone is trying to cheat the auxv vector.
> + */
> + if (!prctl_map.auxv || prctl_map.auxv_size > sizeof(mm->saved_auxv))
> + return -EINVAL;
> +
> memset(user_auxv, 0, sizeof(user_auxv));
> if (copy_from_user(user_auxv,
> (const void __user *)prctl_map.auxv,
> @@ -2010,6 +2000,14 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
> }
>
> if (prctl_map.exe_fd != (u32)-1) {
> + /*
> + * Make sure the caller has the rights to
> + * change /proc/pid/exe link: only local sys admin should
> + * be allowed to.
> + */
> + if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> + return -EINVAL;
> +
> error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
> if (error)
> return error;
> @@ -2097,7 +2095,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> unsigned long arg4, unsigned long arg5)
> {
> struct mm_struct *mm = current->mm;
> - struct prctl_mm_map prctl_map;
> + struct prctl_mm_map prctl_map = { .auxv = NULL, .auxv_size = 0, .exe_fd = -1 };
> struct vm_area_struct *vma;
> int error;
>
> @@ -2139,9 +2137,6 @@ static int prctl_set_mm(int opt, unsigned long addr,
> prctl_map.arg_end = mm->arg_end;
> prctl_map.env_start = mm->env_start;
> prctl_map.env_end = mm->env_end;
> - prctl_map.auxv = NULL;
> - prctl_map.auxv_size = 0;
> - prctl_map.exe_fd = -1;
>
> switch (opt) {
> case PR_SET_MM_START_CODE:
> @@ -2181,7 +2176,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> goto out;
> }
>
> - error = validate_prctl_map(&prctl_map);
> + error = validate_prctl_map_addr(&prctl_map);
> if (error)
> goto out;
>
>
On Tue, Apr 30, 2019 at 12:09:57PM +0300, Kirill Tkhai wrote:
>
> This looks OK for me.
>
> But speaking about existing code it's a secret for me, why we ignore arg_lock
> in binfmt code, e.g. in load_elf_binary().
Well, strictly speaking we probably should but you know setup of
the @arg_start by kernel's elf loader doesn't cause any side
effects as far as I can tell (its been working this lockless
way for years, mmap_sem is taken later in the loader code).
Though for consistency sake we probably should set it up
under the spinlock.
On 30.04.2019 12:38, Cyrill Gorcunov wrote:
> On Tue, Apr 30, 2019 at 12:09:57PM +0300, Kirill Tkhai wrote:
>>
>> This looks OK for me.
>>
>> But speaking about existing code it's a secret for me, why we ignore arg_lock
>> in binfmt code, e.g. in load_elf_binary().
>
> Well, strictly speaking we probably should but you know setup of
> the @arg_start by kernel's elf loader doesn't cause any side
> effects as far as I can tell (its been working this lockless
> way for years, mmap_sem is taken later in the loader code).
> Though for consistency sake we probably should set it up
> under the spinlock.
Ok, so elf loader doesn't change these parameters. Thanks for the explanation.
On Tue, Apr 30, 2019 at 12:53:51PM +0300, Kirill Tkhai wrote:
> >
> > Well, strictly speaking we probably should but you know setup of
> > the @arg_start by kernel's elf loader doesn't cause any side
> > effects as far as I can tell (its been working this lockless
> > way for years, mmap_sem is taken later in the loader code).
> > Though for consistency sake we probably should set it up
> > under the spinlock.
>
> Ok, so elf loader doesn't change these parameters.
> Thanks for the explanation.
It setups these parameters unconditionally. I need to revisit
this moment. Technically (if only I'm not missing something
obvious) we might have a race here with prctl setting up new
params, but this should be harmless since most of them (except
stack setup) are purely informative data.
On Tue, Apr 30, 2019 at 01:45:17PM +0300, Cyrill Gorcunov <[email protected]> wrote:
> It setups these parameters unconditionally. I need to revisit
> this moment. Technically (if only I'm not missing something
> obvious) we might have a race here with prctl setting up new
> params, but this should be harmless since most of them (except
> stack setup) are purely informative data.
FTR, when I reviewed that usage, I noticed it was missing the
synchronization. My understanding was that the mm_struct isn't yet
shared at this moment. I can see some of the operations take place after
flush_old_exec (where current->mm = mm_struct), so potentially it is
shared since then. OTOH, I guess there aren't concurrent parties that
could access the field at this stage of exec.
My 2 cents,
Michal
On Tue, Apr 30, 2019 at 12:56:10PM +0200, Michal Koutn? wrote:
> On Tue, Apr 30, 2019 at 01:45:17PM +0300, Cyrill Gorcunov <[email protected]> wrote:
> > It setups these parameters unconditionally. I need to revisit
> > this moment. Technically (if only I'm not missing something
> > obvious) we might have a race here with prctl setting up new
> > params, but this should be harmless since most of them (except
> > stack setup) are purely informative data.
>
> FTR, when I reviewed that usage, I noticed it was missing the
> synchronization. My understanding was that the mm_struct isn't yet
> shared at this moment. I can see some of the operations take place after
> flush_old_exec (where current->mm = mm_struct), so potentially it is
> shared since then. OTOH, I guess there aren't concurrent parties that
> could access the field at this stage of exec.
Just revisited this code -- we're either executing prctl, either execve.
Since both operates with current task we're safe.
Hello.
Just merged the two dicussed patches and fixed an overlooked warning.
v2
- insert a patch refactoring validate_prctl_map
- move find_vma out of the arg_lock critical section
v3
- squash get_cmdline and prctl_set_mm patches (to keep locking
consistency)
- validate_prctl_map_addr: remove unused variable mm
Michal Koutný (2):
prctl_set_mm: Refactor checks from validate_prctl_map
prctl_set_mm: downgrade mmap_sem to read lock
kernel/sys.c | 56 ++++++++++++++++++++++++++++----------------------------
mm/util.c | 4 ++--
2 files changed, 30 insertions(+), 30 deletions(-)
--
2.16.4