Received: by 2002:a05:6a11:4021:0:0:0:0 with SMTP id ky33csp2282918pxb; Fri, 17 Sep 2021 06:28:36 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxISuLit3Y9PdFgVK1cHxnMuT0/w5KLH/d1k/nBvQnojl7OY3hzyc0EmFL+z5D80/XVsDVb X-Received: by 2002:a05:6402:2919:: with SMTP id ee25mr12404484edb.395.1631885316359; Fri, 17 Sep 2021 06:28:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1631885316; cv=none; d=google.com; s=arc-20160816; b=LxVTvmS2SaUUuB6bcEJRA9QglNYNuxJqLcQZOeXK+wzb/i6TsPb7NcajHNVy4a32RL eBM+JTMC9KVTCx0WHGGbJyRYdH/5rtJRIQ1OGab4npkwL6E5UgT4elF4hZ86W5zh/Wf7 f8cWn3PeotUINyTrm4BG2oE/+Mb89E9SyIIfWEcZOPFdwhnQGQpwwKH2mO563deSQ+WE sI6tzxHadkwuxCWroQBtjL6neNrIoIBskO2PpOYKgqIuYAVjIaG5h5OiKwcIYwCX0xzk nSPPLAlkJQxHYsWrkNB6GxJvzNI481YS/vpr8F53MGMNJuDOMT9wtE7mapiA1xsEls92 Qxuw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=evpI/34uFhrThXhRiJF+HjyOxlj0dZVYKoxB5GmoxNs=; b=ZOhys4Uax0KGSlvFqxbLGdSLgVTkW0TqHwVGzH3FQgihO75gXZfmjp/rc7xIzzl8+z DHhasefP1O3JGafvV1M3qiShIxTc4kgMgwiHcnZvGXzq5cBiGEQPygXtYsetjE4MyeSl Y1oinJs1qRnl3cwdshOI31+dpPa7WSYNl1LFrQ2gzPH/sGZDzEsYt2iiusbyf5xq1Lvf x6yi2mGoclq0ZAJu3Pwcg/PeodDo8n4CiDxQ/ShXMv2antoaDcxWsf56rUpU5vOBzGHf AsPR2KK2dc8HcZfC1K82gB7yVe7Kz7h8tcSRzWK7u3bnlXs1jI/pPCGit/iQ/YomJuVo v4og== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id o23si6760027eds.422.2021.09.17.06.28.12; Fri, 17 Sep 2021 06:28:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245541AbhIQI4V (ORCPT + 99 others); Fri, 17 Sep 2021 04:56:21 -0400 Received: from mail.kernel.org ([198.145.29.99]:58536 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233979AbhIQI4U (ORCPT ); Fri, 17 Sep 2021 04:56:20 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 8754161056; Fri, 17 Sep 2021 08:54:57 +0000 (UTC) Date: Fri, 17 Sep 2021 10:54:55 +0200 From: Christian Brauner To: Mike Christie Cc: stefanha@redhat.com, jasowang@redhat.com, mst@redhat.com, sgarzare@redhat.com, virtualization@lists.linux-foundation.org, axboe@kernel.dk, linux-kernel@vger.kernel.org Subject: Re: [PATCH 3/8] fork: add option to not clone or dup files Message-ID: <20210917085455.xggfayhxdq6eejz4@wittgenstein> References: <20210916212051.6918-1-michael.christie@oracle.com> <20210916212051.6918-4-michael.christie@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20210916212051.6918-4-michael.christie@oracle.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 16, 2021 at 04:20:46PM -0500, Mike Christie wrote: > Each vhost device gets a thread that is used to perform IO and management > operations. Instead of a thread that is accessing a device, the thread is > part of the device, so when it calls kernel_copy_process we can't dup or > clone the parent's (Qemu thread that does the VHOST_SET_OWNER ioctl) > files/FDS because it would do an extra increment on ourself. > > Later, when we do: > > Qemu process exits: > do_exit -> exit_files -> put_files_struct -> close_files > > we would leak the device's resources because of that extra refcount > on the fd or file_struct. > > This patch adds a no_files option so these worker threads can prevent > taking an extra refcount on themselves. > > Signed-off-by: Mike Christie > --- > include/linux/sched/task.h | 3 ++- > kernel/fork.c | 14 +++++++++++--- > 2 files changed, 13 insertions(+), 4 deletions(-) > > diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h > index c55f1eb69d41..d0b0872f56cc 100644 > --- a/include/linux/sched/task.h > +++ b/include/linux/sched/task.h > @@ -32,6 +32,7 @@ struct kernel_clone_args { > size_t set_tid_size; > int cgroup; > int io_thread; > + int no_files; > struct cgroup *cgrp; > struct css_set *cset; > }; > @@ -86,7 +87,7 @@ extern pid_t kernel_clone(struct kernel_clone_args *kargs); > struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node); > struct task_struct *kernel_copy_process(int (*fn)(void *), void *arg, int node, > unsigned long clone_flags, > - int io_thread); > + int io_thread, int no_files); > struct task_struct *fork_idle(int); > struct mm_struct *copy_init_mm(void); > extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); > diff --git a/kernel/fork.c b/kernel/fork.c > index cec7b6011beb..a0468e30b27e 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1532,7 +1532,8 @@ static int copy_fs(unsigned long clone_flags, struct task_struct *tsk) > return 0; > } > > -static int copy_files(unsigned long clone_flags, struct task_struct *tsk) > +static int copy_files(unsigned long clone_flags, struct task_struct *tsk, > + int no_files) > { > struct files_struct *oldf, *newf; > int error = 0; > @@ -1544,6 +1545,11 @@ static int copy_files(unsigned long clone_flags, struct task_struct *tsk) > if (!oldf) > goto out; > > + if (no_files) { > + tsk->files = NULL; > + goto out; > + } > + > if (clone_flags & CLONE_FILES) { > atomic_inc(&oldf->count); > goto out; > @@ -2179,7 +2185,7 @@ static __latent_entropy struct task_struct *copy_process( > retval = copy_semundo(clone_flags, p); > if (retval) > goto bad_fork_cleanup_security; > - retval = copy_files(clone_flags, p); > + retval = copy_files(clone_flags, p, args->no_files); > if (retval) > goto bad_fork_cleanup_semundo; > retval = copy_fs(clone_flags, p); > @@ -2539,6 +2545,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node) > * @node: numa node to allocate task from > * @clone_flags: CLONE flags > * @io_thread: 1 if this will be a PF_IO_WORKER else 0. > + * @no_files: Do not duplicate or copy the parent's open files. > * > * This returns a created task, or an error pointer. The returned task is > * inactive, and the caller must fire it up through wake_up_new_task(p). If > @@ -2546,7 +2553,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node) > */ > struct task_struct *kernel_copy_process(int (*fn)(void *), void *arg, int node, > unsigned long clone_flags, > - int io_thread) > + int io_thread, int no_files) I think that the addition of no_files together with io_thread might be a sign to introduce a set of kernel internal clone flags in kernel_clone_args.