2022-09-13 10:41:03

by Andrei Vagin

[permalink] [raw]
Subject: [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec"

This reverts commits:
133e2d3e81de ("fs/exec: allow to unshare a time namespace on vfork+exec")
6342140db660 ("selftests/timens: add a test for vfork+exit")

Alexey pointed out a few undesirable side effects of the reverted change.
First, it doesn't take into account that CLONE_VFORK can be used with
CLONE_THREAD. Second, a child process doesn't enter a target time name-space,
if its parent dies before the child calls exec. It happens because the parent
clears vfork_done.

Eric W. Biederman suggests installing a time namespace as a task gets a new mm.
It includes all new processes cloned without CLONE_VM and all tasks that call
exec(). This is an user API change, but we think there aren't users that depend
on the old behavior.

It is too late to make such changes in this release, so let's roll back
this patch and introduce the right one in the next release.

Andrei Vagin (2):
Revert "selftests/timens: add a test for vfork+exit"
Revert "fs/exec: allow to unshare a time namespace on vfork+exec"

fs/exec.c | 7 --
kernel/fork.c | 5 +-
kernel/nsproxy.c | 3 +-
tools/testing/selftests/timens/Makefile | 2 +-
tools/testing/selftests/timens/vfork_exec.c | 90 ---------------------
5 files changed, 3 insertions(+), 104 deletions(-)
delete mode 100644 tools/testing/selftests/timens/vfork_exec.c

--
2.37.2.789.g6183377224-goog


2022-09-13 10:51:49

by Andrei Vagin

[permalink] [raw]
Subject: [PATCH 2/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec"

From: Andrei Vagin <[email protected]>

This reverts commit 133e2d3e81de5d9706cab2dd1d52d231c27382e5.

Alexey pointed out a few undesirable side effects of the reverted change.
First, it doesn't take into account that CLONE_VFORK can be used with
CLONE_THREAD. Second, a child process doesn't enter a target time name-space,
if its parent dies before the child calls exec. It happens because the parent
clears vfork_done.

Eric W. Biederman suggests installing a time namespace as a task gets a new mm.
It includes all new processes cloned without CLONE_VM and all tasks that call
exec(). This is an user API change, but we think there aren't users that depend
on the old behavior.

It is too late to make such changes in this release, so let's roll back
this patch and introduce the right one in the next release.

Cc: Alexey Izbyshev <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
---
fs/exec.c | 7 -------
kernel/fork.c | 5 +----
kernel/nsproxy.c | 3 +--
3 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 9a5ca7b82bfc..d046dbb9cbd0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -65,7 +65,6 @@
#include <linux/io_uring.h>
#include <linux/syscall_user_dispatch.h>
#include <linux/coredump.h>
-#include <linux/time_namespace.h>

#include <linux/uaccess.h>
#include <asm/mmu_context.h>
@@ -979,12 +978,10 @@ static int exec_mmap(struct mm_struct *mm)
{
struct task_struct *tsk;
struct mm_struct *old_mm, *active_mm;
- bool vfork;
int ret;

/* Notify parent that we're no longer interested in the old VM */
tsk = current;
- vfork = !!tsk->vfork_done;
old_mm = current->mm;
exec_mm_release(tsk, old_mm);
if (old_mm)
@@ -1029,10 +1026,6 @@ static int exec_mmap(struct mm_struct *mm)
tsk->mm->vmacache_seqnum = 0;
vmacache_flush(tsk);
task_unlock(tsk);
-
- if (vfork)
- timens_on_fork(tsk->nsproxy, tsk);
-
if (old_mm) {
mmap_read_unlock(old_mm);
BUG_ON(active_mm != old_mm);
diff --git a/kernel/fork.c b/kernel/fork.c
index 8a9e92068b15..2b6bd511c6ed 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2047,11 +2047,8 @@ static __latent_entropy struct task_struct *copy_process(
/*
* If the new process will be in a different time namespace
* do not allow it to share VM or a thread group with the forking task.
- *
- * On vfork, the child process enters the target time namespace only
- * after exec.
*/
- if ((clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM) {
+ if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
if (nsp->time_ns != nsp->time_ns_for_children)
return ERR_PTR(-EINVAL);
}
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index b4cbb406bc28..eec72ca962e2 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -179,8 +179,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
if (IS_ERR(new_ns))
return PTR_ERR(new_ns);

- if ((flags & CLONE_VM) == 0)
- timens_on_fork(new_ns, tsk);
+ timens_on_fork(new_ns, tsk);

tsk->nsproxy = new_ns;
return 0;
--
2.37.2.789.g6183377224-goog

2022-09-13 10:59:19

by Andrei Vagin

[permalink] [raw]
Subject: [PATCH 1/2] Revert "selftests/timens: add a test for vfork+exit"

The next patch reverts the code that this test verified.

This reverts commit 6342140db6609a0c7d34f68c52b2947468e0e630.

Signed-off-by: Andrei Vagin <[email protected]>
---
tools/testing/selftests/timens/Makefile | 2 +-
tools/testing/selftests/timens/vfork_exec.c | 90 ---------------------
2 files changed, 1 insertion(+), 91 deletions(-)
delete mode 100644 tools/testing/selftests/timens/vfork_exec.c

diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index f0d51d4d2c87..3a5936cc10ab 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -1,4 +1,4 @@
-TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex vfork_exec
+TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex
TEST_GEN_PROGS_EXTENDED := gettime_perf

CFLAGS := -Wall -Werror -pthread
diff --git a/tools/testing/selftests/timens/vfork_exec.c b/tools/testing/selftests/timens/vfork_exec.c
deleted file mode 100644
index e6ccd900f30a..000000000000
--- a/tools/testing/selftests/timens/vfork_exec.c
+++ /dev/null
@@ -1,90 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#define _GNU_SOURCE
-#include <errno.h>
-#include <fcntl.h>
-#include <sched.h>
-#include <stdio.h>
-#include <stdbool.h>
-#include <sys/stat.h>
-#include <sys/syscall.h>
-#include <sys/types.h>
-#include <sys/wait.h>
-#include <time.h>
-#include <unistd.h>
-#include <string.h>
-
-#include "log.h"
-#include "timens.h"
-
-#define OFFSET (36000)
-
-int main(int argc, char *argv[])
-{
- struct timespec now, tst;
- int status, i;
- pid_t pid;
-
- if (argc > 1) {
- if (sscanf(argv[1], "%ld", &now.tv_sec) != 1)
- return pr_perror("sscanf");
-
- for (i = 0; i < 2; i++) {
- _gettime(CLOCK_MONOTONIC, &tst, i);
- if (abs(tst.tv_sec - now.tv_sec) > 5)
- return pr_fail("%ld %ld\n", now.tv_sec, tst.tv_sec);
- }
- return 0;
- }
-
- nscheck();
-
- ksft_set_plan(1);
-
- clock_gettime(CLOCK_MONOTONIC, &now);
-
- if (unshare_timens())
- return 1;
-
- if (_settime(CLOCK_MONOTONIC, OFFSET))
- return 1;
-
- for (i = 0; i < 2; i++) {
- _gettime(CLOCK_MONOTONIC, &tst, i);
- if (abs(tst.tv_sec - now.tv_sec) > 5)
- return pr_fail("%ld %ld\n",
- now.tv_sec, tst.tv_sec);
- }
-
- pid = vfork();
- if (pid < 0)
- return pr_perror("fork");
-
- if (pid == 0) {
- char now_str[64];
- char *cargv[] = {"exec", now_str, NULL};
- char *cenv[] = {NULL};
-
- // Check that we are still in the source timens.
- for (i = 0; i < 2; i++) {
- _gettime(CLOCK_MONOTONIC, &tst, i);
- if (abs(tst.tv_sec - now.tv_sec) > 5)
- return pr_fail("%ld %ld\n",
- now.tv_sec, tst.tv_sec);
- }
-
- /* Check for proper vvar offsets after execve. */
- snprintf(now_str, sizeof(now_str), "%ld", now.tv_sec + OFFSET);
- execve("/proc/self/exe", cargv, cenv);
- return pr_perror("execve");
- }
-
- if (waitpid(pid, &status, 0) != pid)
- return pr_perror("waitpid");
-
- if (status)
- ksft_exit_fail();
-
- ksft_test_result_pass("exec\n");
- ksft_exit_pass();
- return 0;
-}
--
2.37.2.789.g6183377224-goog

2022-09-13 12:19:02

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec"

On Tue, Sep 13, 2022 at 03:25:49AM -0700, Andrei Vagin wrote:
> This reverts commits:
> 133e2d3e81de ("fs/exec: allow to unshare a time namespace on vfork+exec")
> 6342140db660 ("selftests/timens: add a test for vfork+exit")
>
> Alexey pointed out a few undesirable side effects of the reverted change.
> First, it doesn't take into account that CLONE_VFORK can be used with
> CLONE_THREAD. Second, a child process doesn't enter a target time name-space,
> if its parent dies before the child calls exec. It happens because the parent
> clears vfork_done.
>
> Eric W. Biederman suggests installing a time namespace as a task gets a new mm.
> It includes all new processes cloned without CLONE_VM and all tasks that call
> exec(). This is an user API change, but we think there aren't users that depend
> on the old behavior.

Can we include that patch here as well?

> It is too late to make such changes in this release, so let's roll back
> this patch and introduce the right one in the next release.

Do you mean you'd like this revert to land for v6.0, and we should wait
for the new API for later?

--
Kees Cook

2022-09-13 17:56:40

by Andrei Vagin

[permalink] [raw]
Subject: Re: [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec"

On Tue, Sep 13, 2022 at 5:18 AM Kees Cook <[email protected]> wrote:
>
> On Tue, Sep 13, 2022 at 03:25:49AM -0700, Andrei Vagin wrote:
> > This reverts commits:
> > 133e2d3e81de ("fs/exec: allow to unshare a time namespace on vfork+exec")
> > 6342140db660 ("selftests/timens: add a test for vfork+exit")
> >
> > Alexey pointed out a few undesirable side effects of the reverted change.
> > First, it doesn't take into account that CLONE_VFORK can be used with
> > CLONE_THREAD. Second, a child process doesn't enter a target time name-space,
> > if its parent dies before the child calls exec. It happens because the parent
> > clears vfork_done.
> >
> > Eric W. Biederman suggests installing a time namespace as a task gets a new mm.
> > It includes all new processes cloned without CLONE_VM and all tasks that call
> > exec(). This is an user API change, but we think there aren't users that depend
> > on the old behavior.
>
> Can we include that patch here as well?

It is attached. I need to test it and then I will send it properly.

>
> > It is too late to make such changes in this release, so let's roll back
> > this patch and introduce the right one in the next release.
>
> Do you mean you'd like this revert to land for v6.0, and we should wait
> for the new API for later?

Yes, I mean this. I think we should merge the new patch in v6.1-rc1 so
it sits there for a while.

Thanks,
Andrei


Attachments:
0001-fs-exec-switch-timens-when-a-task-gets-a-new-mm.patch (5.18 kB)

2022-09-13 18:31:15

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec"

On Tue, 13 Sep 2022 03:25:49 -0700, Andrei Vagin wrote:
> This reverts commits:
> 133e2d3e81de ("fs/exec: allow to unshare a time namespace on vfork+exec")
> 6342140db660 ("selftests/timens: add a test for vfork+exit")
>
> Alexey pointed out a few undesirable side effects of the reverted change.
> First, it doesn't take into account that CLONE_VFORK can be used with
> CLONE_THREAD. Second, a child process doesn't enter a target time name-space,
> if its parent dies before the child calls exec. It happens because the parent
> clears vfork_done.
>
> [...]

Applied to for-linus/execve, thanks!

[1/2] Revert "selftests/timens: add a test for vfork+exit"
https://git.kernel.org/kees/c/2b1e8921fc35
[2/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec"
https://git.kernel.org/kees/c/33a2d6bc3480

--
Kees Cook