Date: Tue, 20 Sep 2022 17:31:19 -0700
Message-ID: <20220921003120.209637-1-avagin@google.com>
Subject: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm
From: Andrei Vagin
To: Kees Cook
Cc: linux-kernel@vger.kernel.org, Andrei Vagin, Alexey Izbyshev,
    Christian Brauner, Dmitry Safonov <0x7f454c46@gmail.com>,
    "Eric W. Biederman", Florian Weimer

From: Andrei Vagin

Changing a time namespace requires remapping a vvar page, so we don't want
to allow doing that if any other tasks can use the same mm.

Currently, we install a time namespace when a task is created with a new vm.
exec() is another case where a task gets a new mm and so it can switch a
time namespace safely, but it isn't handled now.

One more issue with the current interface is that clone() with CLONE_VM
isn't allowed if the current task has unshared a time namespace
(timens_for_children doesn't match the current timens).

Both issues are inconvenient for users. For example, Alexey and Florian
reported that posix_spawn() uses vfork+exec and that this pattern doesn't
work with time namespaces because of both issues described above. LXC
needed to work around the exec() issue by calling setns.

In commit 133e2d3e81de5 ("fs/exec: allow to unshare a time namespace on
vfork+exec"), we tried to fix these issues with minimal impact on UAPI, but
that adds extra complexity and some undesirable side effects. Eric
suggested fixing the issues properly because there is every reason to
suppose that no users depend on the old behavior.

Cc: Alexey Izbyshev
Cc: Christian Brauner
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman"
Cc: Florian Weimer
Cc: Kees Cook
Suggested-by: "Eric W. Biederman"
Origin-author: "Eric W. Biederman"
Signed-off-by: Andrei Vagin
---
 fs/exec.c               |  5 +++++
 include/linux/nsproxy.h |  1 +
 kernel/fork.c           |  9 ---------
 kernel/nsproxy.c        | 23 +++++++++++++++++++++--
 4 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index d046dbb9cbd0..71284188b96d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -65,6 +65,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -1296,6 +1297,10 @@ int begin_new_exec(struct linux_binprm * bprm)
 
 	bprm->mm = NULL;
 
+	retval = exec_task_namespaces();
+	if (retval)
+		goto out_unlock;
+
 #ifdef CONFIG_POSIX_TIMERS
 	spin_lock_irq(&me->sighand->siglock);
 	posix_cpu_timers_exit(me);
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index cdb171efc7cb..fee881cded01 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -94,6 +94,7 @@ static inline struct cred *nsset_cred(struct nsset *set)
 int copy_namespaces(unsigned long flags, struct task_struct *tsk);
 void exit_task_namespaces(struct task_struct *tsk);
 void switch_task_namespaces(struct task_struct *tsk, struct nsproxy *new);
+int exec_task_namespaces(void);
 void free_nsproxy(struct nsproxy *ns);
 int unshare_nsproxy_namespaces(unsigned long, struct nsproxy **,
 	struct cred *, struct fs_struct *);
diff --git a/kernel/fork.c b/kernel/fork.c
index 2b6bd511c6ed..4eb803f75225 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2044,15 +2044,6 @@ static __latent_entropy struct task_struct *copy_process(
 			return ERR_PTR(-EINVAL);
 	}
 
-	/*
-	 * If the new process will be in a different time namespace
-	 * do not allow it to share VM or a thread group with the forking task.
-	 */
-	if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
-		if (nsp->time_ns != nsp->time_ns_for_children)
-			return ERR_PTR(-EINVAL);
-	}
-
 	if (clone_flags & CLONE_PIDFD) {
 		/*
 		 * - CLONE_DETACHED is blocked so that we can potentially
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index eec72ca962e2..a487ff24129b 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -157,7 +157,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
 			      CLONE_NEWPID | CLONE_NEWNET |
 			      CLONE_NEWCGROUP | CLONE_NEWTIME)))) {
-		if (likely(old_ns->time_ns_for_children == old_ns->time_ns)) {
+		if ((flags & CLONE_VM) ||
+		    likely(old_ns->time_ns_for_children == old_ns->time_ns)) {
 			get_nsproxy(old_ns);
 			return 0;
 		}
@@ -179,7 +180,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	if (IS_ERR(new_ns))
 		return PTR_ERR(new_ns);
 
-	timens_on_fork(new_ns, tsk);
+	if ((flags & CLONE_VM) == 0)
+		timens_on_fork(new_ns, tsk);
 
 	tsk->nsproxy = new_ns;
 	return 0;
@@ -254,6 +256,23 @@ void exit_task_namespaces(struct task_struct *p)
 	switch_task_namespaces(p, NULL);
 }
 
+int exec_task_namespaces(void)
+{
+	struct task_struct *tsk = current;
+	struct nsproxy *new;
+
+	if (tsk->nsproxy->time_ns_for_children == tsk->nsproxy->time_ns)
+		return 0;
+
+	new = create_new_namespaces(0, tsk, current_user_ns(), tsk->fs);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
+
+	timens_on_fork(new, tsk);
+	switch_task_namespaces(tsk, new);
+	return 0;
+}
+
 static int check_setns_flags(unsigned long flags)
 {
 	if (!flags || (flags & ~(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-- 
2.37.3.968.ga6b4b080e4-goog
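
For illustration only, the policy after this patch can be summarised with a
small userspace model. This is a sketch, not kernel code: struct
model_nsproxy, the model_* functions and the integer namespace ids are
hypothetical, and it assumes that timens_on_fork() simply makes
time_ns_for_children the active time namespace. It covers only the case
where no CLONE_NEW* flag is passed.

```c
#include <assert.h>

/* Same numeric value as CLONE_VM in <sched.h>; renamed to mark it as part of the model. */
#define MODEL_CLONE_VM 0x00000100UL

/* Hypothetical stand-in for the two time namespace pointers in struct nsproxy. */
struct model_nsproxy {
	int time_ns;              /* id of the namespace the task currently uses */
	int time_ns_for_children; /* id of the namespace queued by unshare() */
};

/*
 * Models copy_namespaces() after the patch: a child that shares the mm
 * keeps the parent's active time namespace; a child with its own mm
 * switches to time_ns_for_children.
 */
static struct model_nsproxy model_copy_namespaces(struct model_nsproxy parent,
						  unsigned long flags)
{
	struct model_nsproxy child = parent;

	if ((flags & MODEL_CLONE_VM) == 0)
		child.time_ns = child.time_ns_for_children;
	return child;
}

/* Models exec_task_namespaces(): the new mm allows the pending switch. */
static struct model_nsproxy model_exec_task_namespaces(struct model_nsproxy ns)
{
	ns.time_ns = ns.time_ns_for_children;
	return ns;
}

/* Walks through vfork() followed by exec() and returns the final active id. */
static int model_vfork_then_exec(struct model_nsproxy parent)
{
	struct model_nsproxy child = model_copy_namespaces(parent, MODEL_CLONE_VM);

	/* While the mm is shared, the child still sees the parent's namespace. */
	assert(child.time_ns == parent.time_ns);

	child = model_exec_task_namespaces(child);
	return child.time_ns;
}
```

With a parent of { .time_ns = 1, .time_ns_for_children = 2 }, the model
keeps a CLONE_VM child in namespace 1 until it execs, while a plain fork()
child starts in namespace 2 right away.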