Keno Fischer reported that when a binary is loaded via
ld-linux-x, prctl(PR_SET_MM_MAP) doesn't allow setting
the brk value because it lies before mm:end_data.

For example, a test program shows
| # ~/t
|
| start_code 401000
| end_code 401a15
| start_stack 7ffce4577dd0
| start_data 403e10
| end_data 40408c
| start_brk b5b000
| sbrk(0) b5b000
and when executed via ld-linux
| # /lib64/ld-linux-x86-64.so.2 ~/t
|
| start_code 7fc25b0a4000
| end_code 7fc25b0c4524
| start_stack 7fffcc6b2400
| start_data 7fc25b0ce4c0
| end_data 7fc25b0cff98
| start_brk 55555710c000
| sbrk(0) 55555710c000
This of course prevents criu from restoring such programs.
Looking into how the kernel operates on brk/start_brk inside
the brk() syscall, I don't see any problem if we allow setting
brk/start_brk without checking against end_data. Even if someone
passes some weird address here on purpose, the worst
possible result is an unexpected unmapping of an existing
vma (the caller's own vma, since prctl works on the caller's memory),
but the test for RLIMIT_DATA is still valid and a user won't be able
to gain more memory by expanding VMAs via new values
shipped with the prctl call.
Reported-by: Keno Fischer <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Dmitry Safonov <[email protected]>
CC: Andrey Vagin <[email protected]>
CC: Kirill Tkhai <[email protected]>
CC: Eric W. Biederman <[email protected]>
---
Guys, please take a look once time permits. Hopefully I didn't
miss anything, because I made this patch via code reading only.

Andrey, we still have a criu container which tests new kernels,
right? It would be great to run the criu tests with this patch applied
to make sure everything is intact.
kernel/sys.c | 7 -------
1 file changed, 7 deletions(-)
Index: linux-tip.git/kernel/sys.c
===================================================================
--- linux-tip.git.orig/kernel/sys.c
+++ linux-tip.git/kernel/sys.c
@@ -1943,13 +1943,6 @@ static int validate_prctl_map_addr(struc
 	error = -EINVAL;
 
 	/*
-	 * @brk should be after @end_data in traditional maps.
-	 */
-	if (prctl_map->start_brk <= prctl_map->end_data ||
-	    prctl_map->brk <= prctl_map->end_data)
-		goto out;
-
-	/*
 	 * Neither we should allow to override limits if they set.
 	 */
 	if (check_data_rlimit(rlimit(RLIMIT_DATA), prctl_map->brk,
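
The test program behind the layout listings in the commit message is not part of this thread. As a rough sketch only, similar output can be produced by reading the numeric fields of /proc/self/stat (fields 26-28 and 45-47 as documented in proc(5)) and calling sbrk(0). The helper names parse_stat_field() and print_layout() are made up for this illustration and are not taken from the original program.

```c
#define _DEFAULT_SOURCE		/* for sbrk() under strict -std= modes */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Extract the 1-based numeric field `field` (>= 3) from a /proc/<pid>/stat
 * line. Field 2 (comm) may contain spaces, so counting starts after the
 * last ')'. Returns 0 on success, -1 if the field is missing or not numeric.
 */
static int parse_stat_field(const char *line, int field, unsigned long long *out)
{
	const char *p = strrchr(line, ')');
	int cur = 2;

	if (!p || field < 3)
		return -1;
	p++;
	while (*p) {
		while (*p == ' ')
			p++;
		if (!*p || *p == '\n')
			break;
		if (++cur == field) {
			char *end;

			*out = strtoull(p, &end, 10);
			return end == p ? -1 : 0;
		}
		while (*p && *p != ' ')
			p++;
	}
	return -1;
}

/* Print the same values as the listings, taken from the calling process. */
static int print_layout(void)
{
	static const struct { int field; const char *name; } f[] = {
		{ 26, "start_code" }, { 27, "end_code" }, { 28, "start_stack" },
		{ 45, "start_data" }, { 46, "end_data" }, { 47, "start_brk" },
	};
	char buf[4096];
	unsigned long long v;
	size_t i;
	FILE *fp = fopen("/proc/self/stat", "r");

	if (!fp)
		return -1;
	if (!fgets(buf, sizeof(buf), fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	for (i = 0; i < sizeof(f) / sizeof(f[0]); i++) {
		if (parse_stat_field(buf, f[i].field, &v))
			return -1;
		printf("%-12s %llx\n", f[i].name, v);
	}
	printf("%-12s %llx\n", "sbrk(0)", (unsigned long long)(uintptr_t)sbrk(0));
	return 0;
}
```

Calling print_layout() from main() in a binary started directly and then via the dynamic loader should show the two different brk placements, assuming /proc is mounted.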
On Thu, Jan 21, 2021 at 2:12 PM Cyrill Gorcunov <[email protected]> wrote:
>
> Keno Fischer reported that when a binray loaded via
> ld-linux-x the prctl(PR_SET_MM_MAP) doesn't allow to
> setup brk value because it lays before mm:end_data.
>
> For example a test program shows
>
> | # ~/t
> |
> | start_code 401000
> | end_code 401a15
> | start_stack 7ffce4577dd0
> | start_data 403e10
> | end_data 40408c
> | start_brk b5b000
> | sbrk(0) b5b000
>
> and when executed via ld-linux
>
> | # /lib64/ld-linux-x86-64.so.2 ~/t
> |
> | start_code 7fc25b0a4000
> | end_code 7fc25b0c4524
> | start_stack 7fffcc6b2400
> | start_data 7fc25b0ce4c0
> | end_data 7fc25b0cff98
> | start_brk 55555710c000
> | sbrk(0) 55555710c000
>
> This of course prevent criu from restoring such programs.
> Looking into how kernel operates with brk/start_brk inside
> brk() syscall I don't see any problem if we allow to setup
> brk/start_brk without checking for end_data. Even if someone
> pass some weird address here on a purpose then the worst
> possible result will be an unexpected unmapping of existing
> vma (own vma, since prctl works with the callers memory) but
> test for RLIMIT_DATA is still valid and a user won't be able
> to gain more memory in case of expanding VMAs via new values
> shipped with prctl call.
>
> Reported-by: Keno Fischer <[email protected]>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: Dmitry Safonov <[email protected]>
> CC: Andrey Vagin <[email protected]>
Acked-by: Andrey Vagin <[email protected]>
Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
> CC: Kirill Tkhai <[email protected]>
> CC: Eric W. Biederman <[email protected]>
> ---
> Guys, take a look please once time permit. Hopefully I didn't
> miss something 'cause made this patch via code reading only.
>
> Andrey, do we still have a criu container which tests new kernels,
> right? Would be great to run criu tests with this patch applied
> to make sure everything is intact.
Sorry for the delay. I ran the tests and everything works as expected.
Thanks,
Andrei
On Tue, Jul 20, 2021 at 12:33:11AM -0700, Andrei Vagin wrote:
> >
> > Reported-by: Keno Fischer <[email protected]>
> > Signed-off-by: Cyrill Gorcunov <[email protected]>
> > CC: Andrew Morton <[email protected]>
> > CC: Dmitry Safonov <[email protected]>
> > CC: Andrey Vagin <[email protected]>
>
> Acked-by: Andrey Vagin <[email protected]>
> Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing
> direct loader exec")
Thanks for the review, Andrei! I revisited this patch again recently and
indeed we still need it.
On Fri, 22 Jan 2021 01:12:07 +0300 Cyrill Gorcunov <[email protected]> wrote:
> Keno Fischer reported that when a binray loaded via
> ld-linux-x the prctl(PR_SET_MM_MAP) doesn't allow to
> setup brk value because it lays before mm:end_data.
>
> For example a test program shows
>
> | # ~/t
> |
> | start_code 401000
> | end_code 401a15
> | start_stack 7ffce4577dd0
> | start_data 403e10
> | end_data 40408c
> | start_brk b5b000
> | sbrk(0) b5b000
>
> and when executed via ld-linux
>
> | # /lib64/ld-linux-x86-64.so.2 ~/t
> |
> | start_code 7fc25b0a4000
> | end_code 7fc25b0c4524
> | start_stack 7fffcc6b2400
> | start_data 7fc25b0ce4c0
> | end_data 7fc25b0cff98
> | start_brk 55555710c000
> | sbrk(0) 55555710c000
>
> This of course prevent criu from restoring such programs.
> Looking into how kernel operates with brk/start_brk inside
> brk() syscall I don't see any problem if we allow to setup
> brk/start_brk without checking for end_data. Even if someone
> pass some weird address here on a purpose then the worst
> possible result will be an unexpected unmapping of existing
> vma (own vma, since prctl works with the callers memory) but
> test for RLIMIT_DATA is still valid and a user won't be able
> to gain more memory in case of expanding VMAs via new values
> shipped with prctl call.
So... do you recall why you added that test originally?
This is under prctl(CAP_SET_MM), yes? What capabilities does this
require?
On Tue, Jul 20, 2021 at 02:51:48PM -0700, Andrew Morton wrote:
> >
> > This of course prevent criu from restoring such programs.
> > Looking into how kernel operates with brk/start_brk inside
> > brk() syscall I don't see any problem if we allow to setup
> > brk/start_brk without checking for end_data. Even if someone
> > pass some weird address here on a purpose then the worst
> > possible result will be an unexpected unmapping of existing
> > vma (own vma, since prctl works with the callers memory) but
> > test for RLIMIT_DATA is still valid and a user won't be able
> > to gain more memory in case of expanding VMAs via new values
> > shipped with prctl call.
>
> So... do you recall why you added that test originally?
To be honest, when I added this test in the first place I simply forgot
about ET_DYN executables, because we usually run executables via a
traditional exec call (where the brk area sits after end_data),
not via the loader. That's why I didn't hit this problem
before and why it was revealed only after a couple of years:
this is simply rarely used.
>
> This is under prctl(CAP_SET_MM), yes? What capabilities does this
> require?
Yes, it is for prctl(PR_SET_MM_MAP) and requires no additional
caps. The most important thing here is the check_data_rlimit() function,
which is called at the end of memory map verification: we make sure
the user won't get more memory than has been granted by the RLIMIT_DATA
limit even if they pass some bad brk value here on purpose.
	/*
	 * Neither we should allow to override limits if they set.
	 */
	if (check_data_rlimit(rlimit(RLIMIT_DATA), prctl_map->brk,
			      prctl_map->start_brk, prctl_map->end_data,
			      prctl_map->start_data))
		goto out;
where check_data_rlimit() is (I wrapped the code to make it a bit more readable)

	static inline int check_data_rlimit(unsigned long rlim,
					    unsigned long new,
					    unsigned long start,
					    unsigned long end_data,
					    unsigned long start_data)
	{
		/*
		 * At the call site above: rlim = rlimit(RLIMIT_DATA),
		 * new = prctl_map->brk, start = prctl_map->start_brk,
		 * end_data/start_data = prctl_map->end_data/start_data.
		 */
		if (rlim < RLIM_INFINITY) {
			if (((new - start) + (end_data - start_data)) > rlim)
				return -ENOSPC;
		}

		return 0;
	}
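
To make the difference between the two checks concrete, here is a small userspace sketch that applies both the removed ordering test and the RLIMIT_DATA arithmetic to the two layouts from the commit message. It is an illustration only, not kernel code: struct layout, old_brk_check() and data_rlimit_check() are hypothetical names, and UINT64_MAX stands in for RLIM_INFINITY.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the prctl_mm_map fields that matter here. */
struct layout {
	uint64_t start_data, end_data, start_brk, brk;
};

/* The test the patch removes: brk and start_brk must lie above end_data. */
static int old_brk_check(const struct layout *m)
{
	if (m->start_brk <= m->end_data || m->brk <= m->end_data)
		return -EINVAL;
	return 0;
}

/*
 * Same arithmetic as check_data_rlimit(): heap size plus data size
 * must not exceed the limit. UINT64_MAX plays the role of RLIM_INFINITY.
 */
static int data_rlimit_check(uint64_t rlim, const struct layout *m)
{
	if (rlim < UINT64_MAX) {
		if ((m->brk - m->start_brk) + (m->end_data - m->start_data) > rlim)
			return -ENOSPC;
	}
	return 0;
}

/* Values copied from the "# ~/t" listing (direct exec). */
static const struct layout direct_exec = {
	.start_data = 0x403e10, .end_data = 0x40408c,
	.start_brk = 0xb5b000, .brk = 0xb5b000,
};

/* Values copied from the "# /lib64/ld-linux-x86-64.so.2 ~/t" listing. */
static const struct layout via_loader = {
	.start_data = 0x7fc25b0ce4c0, .end_data = 0x7fc25b0cff98,
	.start_brk = 0x55555710c000, .brk = 0x55555710c000,
};
```

With these values old_brk_check(&direct_exec) returns 0 while old_brk_check(&via_loader) returns -EINVAL, which is the failure Keno reported. The rlimit check does not depend on the ordering: for via_loader the sum is 0 + 0x1ad8 bytes, so it passes for any limit of at least 0x1ad8 and returns -ENOSPC below that.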