Hello.
As was agreed[1][2], clone3 should fail if the provided exit_signal
value fails valid_signal() check, hence the new version.
Changees since v2[3][4]:
- Rewrite the check to check exit_signal against valid_signal().
Changes since v1[5]:
- Check changed to comparison against negated CSIGNAL to address
the bug reported by Oleg[6].
- Added a comment to _do_fork that exit_signal has to be checked
by the caller.
[1] https://lkml.org/lkml/2019/9/11/503
[2] https://lkml.org/lkml/2019/9/11/518
[3] https://lkml.org/lkml/2019/9/10/764
[4] https://lkml.org/lkml/2019/9/10/765
[5] https://lkml.org/lkml/2019/9/10/411
[6] https://lkml.org/lkml/2019/9/10/467
Eugene Syromiatnikov (1):
fork: check exit_signal passed in clone3() call
kernel/fork.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--
2.1.4
Previously, higher 32 bits of exit_signal fields were lost when
copied to the kernel args structure (that uses int as a type for the
respective field). Moreover, as Oleg has noted[1], exit_signal is used
unchecked, so it has to be checked for sanity before use; for the legacy
syscalls, applying CSIGNAL mask guarantees that it is at least non-negative;
however, there's no such thing is done in clone3() code path, and that can
break at least thread_group_leader.
Adding checks that user-passed exit_signal fits into int and passes
valid_signal() check solves both of these problems.
[1] https://lkml.org/lkml/2019/9/10/467
* kernel/fork.c (copy_clone_args_from_user): Fail with -EINVAL if
args.exit_signal is greater than UINT_MAX or is not a valid signal.
(_do_fork): Note that exit_signal is expected to be checked for the
sanity by the caller.
Fixes: 7f192e3cd316 ("fork: add clone3")
Reported-by: Oleg Nesterov <[email protected]>
Co-authored-by: Oleg Nesterov <[email protected]>
Co-authored-by: Dmitry V. Levin <[email protected]>
Signed-off-by: Eugene Syromiatnikov <[email protected]>
---
kernel/fork.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index 2852d0e..f98314b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2338,6 +2338,8 @@ struct mm_struct *copy_init_mm(void)
*
* It copies the process, and if successful kick-starts
* it and waits for it to finish using the VM if required.
+ *
+ * args->exit_signal is expected to be checked for sanity by the caller.
*/
long _do_fork(struct kernel_clone_args *args)
{
@@ -2562,6 +2564,15 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
if (copy_from_user(&args, uargs, size))
return -EFAULT;
+ /*
+ * Two separate checks are needed, as valid_signal() takes unsigned long
+ * as an argument, and struct kernel_clone_args uses int type
+ * for the exit_signal field.
+ */
+ if (unlikely((args.exit_signal > UINT_MAX) ||
+ !valid_signal(args.exit_signal)))
+ return -EINVAL;
+
*kargs = (struct kernel_clone_args){
.flags = args.flags,
.pidfd = u64_to_user_ptr(args.pidfd),
--
2.1.4
On 09/11, Eugene Syromiatnikov wrote:
>
> @@ -2562,6 +2564,15 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
> if (copy_from_user(&args, uargs, size))
> return -EFAULT;
>
> + /*
> + * Two separate checks are needed, as valid_signal() takes unsigned long
> + * as an argument, and struct kernel_clone_args uses int type
> + * for the exit_signal field.
> + */
> + if (unlikely((args.exit_signal > UINT_MAX) ||
> + !valid_signal(args.exit_signal)))
> + return -EINVAL;
OK, I equally agree with this version. Although I'd simply do
if (args.exit_signal > _NSIG)
return -EINVAL;
but this is cosmetic.
Acked-by: Oleg Nesterov <[email protected]>
On Wed, Sep 11, 2019 at 06:45:40PM +0100, Eugene Syromiatnikov wrote:
> Previously, higher 32 bits of exit_signal fields were lost when
> copied to the kernel args structure (that uses int as a type for the
> respective field). Moreover, as Oleg has noted[1], exit_signal is used
> unchecked, so it has to be checked for sanity before use; for the legacy
> syscalls, applying CSIGNAL mask guarantees that it is at least non-negative;
> however, there's no such thing is done in clone3() code path, and that can
> break at least thread_group_leader.
>
> Adding checks that user-passed exit_signal fits into int and passes
> valid_signal() check solves both of these problems.
>
> [1] https://lkml.org/lkml/2019/9/10/467
>
> * kernel/fork.c (copy_clone_args_from_user): Fail with -EINVAL if
> args.exit_signal is greater than UINT_MAX or is not a valid signal.
> (_do_fork): Note that exit_signal is expected to be checked for the
> sanity by the caller.
>
> Fixes: 7f192e3cd316 ("fork: add clone3")
> Reported-by: Oleg Nesterov <[email protected]>
> Co-authored-by: Oleg Nesterov <[email protected]>
> Co-authored-by: Dmitry V. Levin <[email protected]>
> Signed-off-by: Eugene Syromiatnikov <[email protected]>
For the sake of posterity I appended a reproducer to the patch (cf. [1])
that I pushed to mainline which illustrates how without this patch you
can cause a crash. I'll also explain how I think this happens here.
By passing in a negative signal you can cause a segfault in
proc_flush_task(). The reason is that at process creation time
static inline bool thread_group_leader(struct task_struct *p)
{
return p->exit_signal >= 0;
}
will return false even though it should return true because exit_signal
overflowed in copy_clone_args_from_user. This means, the kernel
thinks this is a thread and doesn't make the process a thread-group
leader. In proc_flush_task() the kernel will then try to retrieve the
thread group leader but there'll be none ultimately causing a segfault.
Specifically:
void proc_flush_task(struct task_struct *task)
{
int i;
struct pid *pid, *tgid;
struct upid *upid;
pid = task_pid(task);
tgid = task_tgid(task); #### tgid is NULL ####
for (i = 0; i <= pid->level; i++) {
upid = &pid->numbers[i];
proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
tgid->numbers[i].nr); #### NULL pointer deref ####
}
}
[1]:
#define _GNU_SOURCE
#include <linux/sched.h>
#include <linux/types.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
pid_t pid = -1;
struct clone_args args = {0};
args.exit_signal = -1;
pid = syscall(__NR_clone3, &args, sizeof(struct clone_args));
if (pid < 0)
exit(EXIT_FAILURE);
if (pid == 0)
exit(EXIT_SUCCESS);
wait(NULL);
exit(EXIT_SUCCESS);
}
Christian