process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process. It effectively removes the security boundary between the
two processes (in one direction). Granting ptrace attach capability
even to a system process is considered dangerous since it creates an
attack surface. This severely limits the usage of this API.

The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data
is physically located (and therefore, how fast it can be accessed).
What we want is the ability for one process to influence another process
in order to optimize performance across the entire system while leaving
the security boundary intact.

Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
and CAP_SYS_NICE for influencing process performance.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
mm/madvise.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 6a660858784b..c2d600386902 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		goto release_task;
 	}

-	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+	/* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
+	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
 	if (IS_ERR_OR_NULL(mm)) {
 		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
 		goto release_task;
 	}

+	/*
+	 * Require CAP_SYS_NICE for influencing process performance. Note that
+	 * only non-destructive hints are currently supported.
+	 */
+	if (!capable(CAP_SYS_NICE)) {
+		ret = -EPERM;
+		goto release_task;
+	}
+
 	total_len = iov_iter_count(&iter);

 	while (iov_iter_count(&iter)) {
--
2.30.0.284.gd98b1dd5eaa7-goog
On Fri, Jan 08, 2021 at 12:58:57PM -0800, Suren Baghdasaryan wrote:
> process_madvise currently requires ptrace attach capability.
> PTRACE_MODE_ATTACH gives one process complete control over another
> process. It effectively removes the security boundary between the
> two processes (in one direction). Granting ptrace attach capability
> even to a system process is considered dangerous since it creates an
> attack surface. This severely limits the usage of this API.
> The operations process_madvise can perform do not affect the correctness
> of the operation of the target process; they only affect where the data
> is physically located (and therefore, how fast it can be accessed).
> What we want is the ability for one process to influence another process
> in order to optimize performance across the entire system while leaving
> the security boundary intact.
> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> and CAP_SYS_NICE for influencing process performance.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
It sounds logical to me.
If the security folks don't see any concerns, and with the fix below,
Acked-by: Minchan Kim <[email protected]>
> @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> goto release_task;
> }
>
> - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> if (IS_ERR_OR_NULL(mm)) {
> ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> goto release_task;
> }
>
> + /*
> + * Require CAP_SYS_NICE for influencing process performance. Note that
> + * only non-destructive hints are currently supported.
> + */
> + if (!capable(CAP_SYS_NICE)) {
> + ret = -EPERM;
> + goto release_task;
mmput?
> + }
> +
> total_len = iov_iter_count(&iter);
>
> while (iov_iter_count(&iter)) {
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
>
On Fri, Jan 8, 2021 at 2:15 PM Minchan Kim <[email protected]> wrote:
>
> On Fri, Jan 08, 2021 at 12:58:57PM -0800, Suren Baghdasaryan wrote:
> > process_madvise currently requires ptrace attach capability.
> > PTRACE_MODE_ATTACH gives one process complete control over another
> > process. It effectively removes the security boundary between the
> > two processes (in one direction). Granting ptrace attach capability
> > even to a system process is considered dangerous since it creates an
> > attack surface. This severely limits the usage of this API.
> > The operations process_madvise can perform do not affect the correctness
> > of the operation of the target process; they only affect where the data
> > is physically located (and therefore, how fast it can be accessed).
> > What we want is the ability for one process to influence another process
> > in order to optimize performance across the entire system while leaving
> > the security boundary intact.
> > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> > and CAP_SYS_NICE for influencing process performance.
> >
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> It sounds logical to me.
> If the security folks don't see any concerns, and with the fix below,
>
> Acked-by: Minchan Kim <[email protected]>
>
> > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > goto release_task;
> > }
> >
> > - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > if (IS_ERR_OR_NULL(mm)) {
> > ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > goto release_task;
> > }
> >
> > + /*
> > + * Require CAP_SYS_NICE for influencing process performance. Note that
> > + * only non-destructive hints are currently supported.
> > + */
> > + if (!capable(CAP_SYS_NICE)) {
> > + ret = -EPERM;
> > + goto release_task;
>
> mmput?
Ouch! Thanks for pointing it out! Will include in the next respin.
>
> > + }
> > +
> > total_len = iov_iter_count(&iter);
> >
> > while (iov_iter_count(&iter)) {
> > --
> > 2.30.0.284.gd98b1dd5eaa7-goog
> >
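The missing mmput() noted above can be modelled outside the kernel. The
following is a minimal userspace sketch, not the actual respin: fake_mm,
fake_mm_access(), fake_mmput() and fake_process_madvise() are hypothetical
stand-ins for struct mm_struct, mm_access(), mmput() and the syscall body,
and the release_mm label is one assumed way to drop the reference on the
-EPERM path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mm_struct: only the user refcount. */
struct fake_mm {
	int users;
};

/* Models mm_access(): takes a reference on success. */
static struct fake_mm *fake_mm_access(struct fake_mm *mm)
{
	mm->users++;
	return mm;
}

/* Models mmput(): drops the reference taken by fake_mm_access(). */
static void fake_mmput(struct fake_mm *mm)
{
	assert(mm->users > 0);
	mm->users--;
}

/*
 * Models the tail of process_madvise() with the capability check placed
 * after mm_access(). A failed check jumps to release_mm rather than
 * skipping it, so the mm reference is dropped on every path.
 */
static int fake_process_madvise(struct fake_mm *mm, bool has_sys_nice)
{
	struct fake_mm *ref;
	int ret = 0;

	ref = fake_mm_access(mm);

	if (!has_sys_nice) {
		ret = -EPERM;
		goto release_mm;
	}

	/* ... the advice loop would run here ... */

release_mm:
	fake_mmput(ref);
	return ret;
}
```

Had the failing branch bypassed release_mm, as the v1 hunk does by jumping
to release_task, the count in this model would stay at one after the call.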
On Fri, 8 Jan 2021, Suren Baghdasaryan wrote:
> > > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > > goto release_task;
> > > }
> > >
> > > - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > > + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > > + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > > if (IS_ERR_OR_NULL(mm)) {
> > > ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > > goto release_task;
> > > }
> > >
> > > + /*
> > > + * Require CAP_SYS_NICE for influencing process performance. Note that
> > > + * only non-destructive hints are currently supported.
> > > + */
> > > + if (!capable(CAP_SYS_NICE)) {
> > > + ret = -EPERM;
> > > + goto release_task;
> >
> > mmput?
>
> Ouch! Thanks for pointing it out! Will include in the next respin.
>
With the fix, feel free to add:
Acked-by: David Rientjes <[email protected]>
Thanks Suren!
On Fri, Jan 8, 2021 at 5:02 PM David Rientjes <[email protected]> wrote:
>
> On Fri, 8 Jan 2021, Suren Baghdasaryan wrote:
>
> > > > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > > > goto release_task;
> > > > }
> > > >
> > > > - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > > > + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > > > + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > > > if (IS_ERR_OR_NULL(mm)) {
> > > > ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > > > goto release_task;
> > > > }
> > > >
> > > > + /*
> > > > + * Require CAP_SYS_NICE for influencing process performance. Note that
> > > > + * only non-destructive hints are currently supported.
> > > > + */
> > > > + if (!capable(CAP_SYS_NICE)) {
> > > > + ret = -EPERM;
> > > > + goto release_task;
> > >
> > > mmput?
> >
> > Ouch! Thanks for pointing it out! Will include in the next respin.
> >
>
> With the fix, feel free to add:
>
> Acked-by: David Rientjes <[email protected]>
Thanks! Will post a new version with the fix on Monday.
>
> Thanks Suren!
* Suren Baghdasaryan:
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6a660858784b..c2d600386902 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> goto release_task;
> }
>
> - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> if (IS_ERR_OR_NULL(mm)) {
> ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> goto release_task;
> }
Shouldn't this depend on the requested behavior? Several operations
directly result in observable changes, and go beyond performance tuning.
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
On Mon, Jan 11, 2021 at 2:20 AM Florian Weimer <[email protected]> wrote:
>
> * Suren Baghdasaryan:
>
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 6a660858784b..c2d600386902 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > goto release_task;
> > }
> >
> > - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > if (IS_ERR_OR_NULL(mm)) {
> > ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > goto release_task;
> > }
>
> Shouldn't this depend on the requested behavior? Several operations
> directly result in observable changes, and go beyond performance tuning.
Thanks for the comment, Florian.
process_madvise supports only the MADV_COLD and MADV_PAGEOUT hints, which
are both non-destructive (see the process_madvise_behavior_valid()
function). If you meant something else by "observable changes",
please clarify.
Thanks,
Suren.
>
> Thanks,
> Florian
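The restriction described above can be sketched as a simple allow-list.
This is a userspace approximation, assuming that
process_madvise_behavior_valid() accepts only the two hints named in the
reply; the name behavior_valid_sketch() is hypothetical, and the fallback
constant values are the asm-generic ones, supplied only in case the
installed libc headers predate them.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Fallbacks for older headers (asm-generic values). */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/*
 * Hypothetical userspace mirror of the kernel-side filter: only the
 * non-destructive reclaim hints pass, every other advice is rejected.
 */
static bool behavior_valid_sketch(int behavior)
{
	switch (behavior) {
	case MADV_COLD:
	case MADV_PAGEOUT:
		return true;
	default:
		return false;
	}
}
```

Under this assumption a destructive hint such as MADV_DONTNEED never
reaches the target mm, which is the basis for the argument that
CAP_SYS_NICE is sufficient.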
On Mon, Jan 11, 2021 at 9:05 AM Suren Baghdasaryan <[email protected]> wrote:
>
> On Mon, Jan 11, 2021 at 2:20 AM Florian Weimer <[email protected]> wrote:
> >
> > * Suren Baghdasaryan:
> >
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index 6a660858784b..c2d600386902 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > > goto release_task;
> > > }
> > >
> > > - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > > + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > > + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > > if (IS_ERR_OR_NULL(mm)) {
> > > ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > > goto release_task;
> > > }
> >
> > Shouldn't this depend on the requested behavior? Several operations
> > directly result in observable changes, and go beyond performance tuning.
>
> Thanks for the comment Florian.
> process_madvise supports only MADV_COLD and MADV_PAGEOUT hints which
> are both non-destructive (see process_madvise_behavior_valid()
> function). Maybe you meant something else by "observable changes", if
> so please clarify.
> Thanks,
> Suren.
>
V2 with Minchan's fix is posted at:
https://lore.kernel.org/lkml/[email protected]/T/#u
> >
> > Thanks,
> > Florian