Received: by 2002:a05:6a10:d5a5:0:0:0:0 with SMTP id gn37csp291878pxb; Wed, 6 Oct 2021 05:13:37 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwqGCPv/g/hAp2n7VaTVNknolw2IF2ISbe4ONFMv9JPKVPPZh1BQuUHx/PUf+EbVhk0ovAY X-Received: by 2002:a17:902:760b:b0:13b:122:5ff0 with SMTP id k11-20020a170902760b00b0013b01225ff0mr10688728pll.22.1633522417380; Wed, 06 Oct 2021 05:13:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1633522417; cv=none; d=google.com; s=arc-20160816; b=z6fK99/VrLPoxV+HRk9df6CbcvJoLDElFwswPOMV9LLt/mXgbLfYAuvwGaV0PoUFRl 842WPmeCj/Pk5EgvOnGZm96C81tFTUWBwjH8t9tzhaOyP/xoj3VcGNVeVtjHS9F+eg3n 6/uO1ggViOkvb31amBfxrBESOCGdEQZeolL7gj+pTmRaVtrKbtoeuqoMZmnc/bqRXVHB UMjJJWNZROhqW7DvnplQfa40WoYWmXXPfKI/BFJvzYqJRtMl84C7exWSJiW1Pm2paFYp 16W0QkLEatHI9qxn6OkXaSO2tNh/UhYJwfDwI999vOkrWmeN3X2BnnXFA1/u1x08hYJz Ww+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=dAJaVdZCYTqasw7i898yV4YzunezijYYApzT5KBVsGs=; b=pvU2KjuNrMdw2SOVNyfCszMMl7znEeXZTxORIVp7/7sbAknNe19ts4q1IvRkGsFTso 1ufrreC3G+H2vpDWR4U9IBO+6SBCV7weXsFNhoRMlQGEqGf1DwN073H7g1rmCrTxe6jh FWrXiZAMB+e5Jnuh06/X94Uvh2lwvqd9og/B6w07ehLbY3Twz8BFl+UYlBeQ4UCW8+qU +wNrh1U9ApgT2deUTUBfMEjGOZSa9LVUW4WbCRiEMgxl2ZkY2MmYa6uxk+kkVirOUG5v zq1EsX3mqghe/96uPnxh3mPk73ROt1FVtV/+PLgEk/SQHBKd6Iah8Iu+E6QZVsbloHNs WqPA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ot14si4750957pjb.102.2021.10.06.05.13.20; Wed, 06 Oct 2021 05:13:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238211AbhJFMN6 (ORCPT + 99 others); Wed, 6 Oct 2021 08:13:58 -0400 Received: from mail.kernel.org ([198.145.29.99]:51274 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238117AbhJFMN5 (ORCPT ); Wed, 6 Oct 2021 08:13:57 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id CA3D861076; Wed, 6 Oct 2021 12:12:02 +0000 (UTC) Date: Wed, 6 Oct 2021 14:12:00 +0200 From: Christian Brauner To: Mike Christie Cc: geert@linux-m68k.org, vverma@digitalocean.com, hdanton@sina.com, hch@infradead.org, stefanha@redhat.com, jasowang@redhat.com, mst@redhat.com, sgarzare@redhat.com, virtualization@lists.linux-foundation.org, axboe@kernel.dk, linux-kernel@vger.kernel.org Subject: Re: [PATCH V3 5/9] fork: add helper to clone a process Message-ID: <20211006121200.udx2skkwllmjor4v@wittgenstein> References: <20211004192128.381453-1-michael.christie@oracle.com> <20211004192128.381453-6-michael.christie@oracle.com> <20211005125018.5nzmhwdxivyye5on@wittgenstein> <00d724df-5781-035f-54ad-e0432ec92646@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <00d724df-5781-035f-54ad-e0432ec92646@oracle.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 05, 2021 at 12:10:55PM -0500, Mike Christie wrote: > On 10/5/21 7:50 AM, Christian Brauner wrote: > > On Mon, Oct 04, 2021 at 02:21:24PM -0500, Mike Christie wrote: > >> The vhost layer has similar requirements as io_uring where its worker > >> threads need to access the userspace thread's memory, want to inherit the > >> parents's cgroups and namespaces, and be checked against the parent's > >> RLIMITs. Right now, the vhost layer uses the kthread API which has > >> kthread_use_mm for mem access, and those threads can use > >> cgroup_attach_task_all for v1 cgroups, but there are no helpers for the > >> other items. > >> > >> This adds a helper to clone a process so we can inherit everything we > >> want in one call. It's a more generic version of create_io_thread which > >> will be used by the vhost layer and io_uring in later patches in this set. > >> > >> Signed-off-by: Mike Christie > >> Acked-by: Christian Brauner > >> --- > >> include/linux/sched/task.h | 6 ++++- > >> kernel/fork.c | 48 ++++++++++++++++++++++++++++++++++++++ > >> 2 files changed, 53 insertions(+), 1 deletion(-) > >> > >> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h > >> index e165cc67fd3c..ba0499b6627c 100644 > >> --- a/include/linux/sched/task.h > >> +++ b/include/linux/sched/task.h > >> @@ -87,7 +87,11 @@ extern void exit_files(struct task_struct *); > >> extern void exit_itimers(struct signal_struct *); > >> > >> extern pid_t kernel_clone(struct kernel_clone_args *kargs); > >> -struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node); > >> +struct task_struct *create_io_thread(int (*fn)(void *i), void *arg, int node); > >> +struct task_struct *kernel_worker(int (*fn)(void *), void *arg, int node, > >> + unsigned long clone_flags, u32 worker_flags); > >> +__printf(2, 3) > >> +void kernel_worker_start(struct task_struct *tsk, const char namefmt[], ...); > >> struct task_struct *fork_idle(int); > >> struct mm_struct *copy_init_mm(void); > >> extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); > >> diff --git a/kernel/fork.c b/kernel/fork.c > >> index 98264cf1d6a6..3f3fcabffa5f 100644 > >> --- a/kernel/fork.c > >> +++ b/kernel/fork.c > >> @@ -2540,6 +2540,54 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node) > >> return copy_process(NULL, 0, node, &args); > >> } > >> > >> +/** > >> + * kernel_worker - create a copy of a process to be used by the kernel > >> + * @fn: thread stack > >> + * @arg: data to be passed to fn > >> + * @node: numa node to allocate task from > >> + * @clone_flags: CLONE flags > >> + * @worker_flags: KERN_WORKER flags > >> + * > >> + * This returns a created task, or an error pointer. The returned task is > >> + * inactive, and the caller must fire it up through kernel_worker_start(). If > >> + * this is an PF_IO_WORKER all singals but KILL and STOP are blocked. > >> + */ > >> +struct task_struct *kernel_worker(int (*fn)(void *), void *arg, int node, > >> + unsigned long clone_flags, u32 worker_flags) > >> +{ > >> + struct kernel_clone_args args = { > >> + .flags = ((lower_32_bits(clone_flags) | CLONE_VM | > >> + CLONE_UNTRACED) & ~CSIGNAL), > >> + .exit_signal = (lower_32_bits(clone_flags) & CSIGNAL), > >> + .stack = (unsigned long)fn, > >> + .stack_size = (unsigned long)arg, > >> + .worker_flags = KERN_WORKER_USER | worker_flags, > >> + }; > >> + > >> + return copy_process(NULL, 0, node, &args); > >> +} > >> +EXPORT_SYMBOL_GPL(kernel_worker); > >> + > >> +/** > >> + * kernel_worker_start - Start a task created with kernel_worker > >> + * @tsk: task to wake up > >> + * @namefmt: printf-style format string for the thread name > >> + * @arg: arguments for @namefmt > >> + */ > >> +void kernel_worker_start(struct task_struct *tsk, const char namefmt[], ...) > >> +{ > >> + char name[TASK_COMM_LEN]; > >> + va_list args; > > > > You could think about reporting an error from this function if > > KERN_WORK_USER isn't set or only call the below when KERN_WORK_USER is > > set. Both options are fine. > > > > I'm not sure how to handle this comment, because I might have misread > an older comment or made it up in my head. > > KERN_WORK_USER is only set on the kernel_clone_args, so at this point we > don't have that struct available anymore. Ah, right. > > I didn't add a new PF_KTHREAD_WORK_USER flag to sched.h, because I thought > I had got a review comment to not add another PF flag for this. However, I > can't seem to find that comment now so I'm not sure if maybe I misread a > comment or made it up. > > If it's ok I could add a PF_KTHREAD_WORK_USER, then do a: > > WARN_ON(!(tsk->flags & PF_KTHREAD_WORK_USER) > > so future developers get loud feedback they are doing the > wrong thing right away. I think a PF_USER_WORKER might just do fine as it fits with the naming of PF_IO_WORKER. Christian