Date: Wed, 29 Jan 2020 14:27:19 +0100
From: Michal Koutný
To: Christian Brauner
Cc: linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, Tejun Heo,
    Oleg Nesterov, Ingo Molnar, Johannes Weiner, Li Zefan, Peter Zijlstra,
    cgroups@vger.kernel.org
Subject: Re: [PATCH v5 5/6] clone3: allow spawning processes into cgroups
Message-ID: <20200129132719.GD11384@blackbody.suse.cz>
References: <20200121154844.411-1-christian.brauner@ubuntu.com>
 <20200121154844.411-6-christian.brauner@ubuntu.com>
In-Reply-To: <20200121154844.411-6-christian.brauner@ubuntu.com>

Hello.
On Tue, Jan 21, 2020 at 04:48:43PM +0100, Christian Brauner wrote:
> +static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
> +        __acquires(&cgroup_mutex) __acquires(&cgroup_threadgroup_rwsem)
> +{
> +        int ret;
> +        struct cgroup *dst_cgrp = NULL;
> +        struct css_set *cset;
> +        struct super_block *sb;
> +        struct file *f;
> +
> +        if (kargs->flags & CLONE_INTO_CGROUP)
> +                mutex_lock(&cgroup_mutex);
> +
> +        cgroup_threadgroup_change_begin(current);
> +
> +        spin_lock_irq(&css_set_lock);
> +        cset = task_css_set(current);
> +        get_css_set(cset);
> +        spin_unlock_irq(&css_set_lock);
> +
> +        if (!(kargs->flags & CLONE_INTO_CGROUP)) {
> +                kargs->cset = cset;
Where is this css_set put when CLONE_INTO_CGROUP isn't used? (Aha, it's
passed to the child's tsk->cgroups, but see my other note below.)

> +        dst_cgrp = cgroup_get_from_file(f);
> +        if (IS_ERR(dst_cgrp)) {
> +                ret = PTR_ERR(dst_cgrp);
> +                dst_cgrp = NULL;
> +                goto err;
> +        }
> +
> +        /*
> +         * Verify that we the target cgroup is writable for us. This is
> +         * usually done by the vfs layer but since we're not going through
> +         * the vfs layer here we need to do it "manually".
> +         */
> +        ret = cgroup_may_write(dst_cgrp, sb);
> +        if (ret)
> +                goto err;
> +
> +        ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
> +                                        !!(kargs->flags & CLONE_THREAD));
> +        if (ret)
> +                goto err;
> +
> +        kargs->cset = find_css_set(cset, dst_cgrp);
> +        if (!kargs->cset) {
> +                ret = -ENOMEM;
> +                goto err;
> +        }
> +
> +        if (cgroup_is_dead(dst_cgrp)) {
> +                ret = -ENODEV;
> +                goto err;
> +        }
I'd move this check right after cgroup_get_from_file. The fork-migration
path is synchronized with cgroup_destroy_locked via cgroup_mutex, and
there's no need to check permissions on a cgroup that's going away anyway.

> +static void cgroup_css_set_put_fork(struct kernel_clone_args *kargs)
> +        __releases(&cgroup_threadgroup_rwsem) __releases(&cgroup_mutex)
> +{
> +        cgroup_threadgroup_change_end(current);
> +
> +        if (kargs->flags & CLONE_INTO_CGROUP) {
> +                struct cgroup *cgrp = kargs->cgrp;
> +                struct css_set *cset = kargs->cset;
> +
> +                mutex_unlock(&cgroup_mutex);
> +
> +                if (cset) {
> +                        put_css_set(cset);
> +                        kargs->cset = NULL;
> +                }
> +
> +                if (cgrp) {
> +                        cgroup_put(cgrp);
> +                        kargs->cgrp = NULL;
> +                }
> +        }
I don't see any functional problem with this ordering; however, I'd prefer
symmetry with the "allocation" path (in cgroup_css_set_fork), i.e.
cgroup_put, put_css_set and lastly mutex_unlock.

> +void cgroup_post_fork(struct task_struct *child,
> +                      struct kernel_clone_args *kargs)
> +        __releases(&cgroup_threadgroup_rwsem) __releases(&cgroup_mutex)
>  {
>          struct cgroup_subsys *ss;
> -        struct css_set *cset;
> +        struct css_set *cset = kargs->cset;
>          int i;
>
>          spin_lock_irq(&css_set_lock);
>
>          WARN_ON_ONCE(!list_empty(&child->cg_list));
> -        cset = task_css_set(current); /* current is @child's parent */
> -        get_css_set(cset);
>          cset->nr_tasks++;
>          css_set_move_task(child, NULL, cset, false);
So, the reference is passed over from kargs->cset to task->cgroups. I think
it's necessary to zero kargs->cset in order to prevent dropping the
reference in cgroup_css_set_put_fork. Perhaps a general comment about
css_set whereabouts during fork and kargs passing would be useful.
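
For illustration, roughly what I have in mind at the top of
cgroup_post_fork (just an untested sketch on top of your patch, not a
tested change):

        struct css_set *cset = kargs->cset;

        /*
         * The kargs reference is transferred to child->cgroups below
         * (via css_set_move_task), so make sure cgroup_css_set_put_fork()
         * won't drop it a second time.
         */
        kargs->cset = NULL;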

> @@ -6016,6 +6146,17 @@ void cgroup_post_fork(struct task_struct *child)
>          } while_each_subsys_mask();
>
>          cgroup_threadgroup_change_end(current);
> +
> +        if (kargs->flags & CLONE_INTO_CGROUP) {
> +                mutex_unlock(&cgroup_mutex);
> +
> +                cgroup_put(kargs->cgrp);
> +                kargs->cgrp = NULL;
> +        }
> +
> +        /* Make the new cset the root_cset of the new cgroup namespace. */
> +        if (kargs->flags & CLONE_NEWCGROUP)
> +                child->nsproxy->cgroup_ns->root_cset = cset;
The root_cset reference (from copy_cgroup_ns) seems to be leaked here, and
where is the additional reference to the new cset obtained?

Thanks,
Michal
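
P.S. To illustrate the reference counting I'd expect for the root_cset
hand-over (untested sketch only, just to show what I mean):

        /* Make the new cset the root_cset of the new cgroup namespace. */
        if (kargs->flags & CLONE_NEWCGROUP) {
                struct css_set *rcset = child->nsproxy->cgroup_ns->root_cset;

                get_css_set(cset);
                child->nsproxy->cgroup_ns->root_cset = cset;
                put_css_set(rcset);
        }

i.e. take an extra reference for the namespace and drop the one obtained
in copy_cgroup_ns.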