Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp913149pxb; Wed, 6 Apr 2022 04:06:24 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx/oWIinLcyeFlZ7GIIIwMinIFKRUE8TlD+2FAMzEqzmgdTxaqQMp9ko1keeYu0QHfJuck/ X-Received: by 2002:a17:90b:4f86:b0:1c9:b52d:9713 with SMTP id qe6-20020a17090b4f8600b001c9b52d9713mr9224097pjb.98.1649243184511; Wed, 06 Apr 2022 04:06:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649243184; cv=none; d=google.com; s=arc-20160816; b=IO8CcLFx+llC9VH0obV2WqOpTGmJTHmnBiuS+ylS1T+vzhnaYM6dTm6K9dlTYoKa+a HGPx5sAwoyXZ84UWuyKZm0kyX0Zp55XsrALwreN9Rafi+2YLvbNWf5BbHNks6rLh9QCb Hz6q0ArfDiLBcAf4cwVhq00dR106lynqNz9L3zi/4gueG01xPJccSWks+iTOZJnn64sS +iISI+1kUWHWi88qg4pBexHGyPpmtG1Rqd0bBY9n1ZyxPFfU5I2sLCD9pUVs5hPDKFSv IOmvknYJ7ZyRR55ZLsSeEo5QXbHB4bOjazHUnUAFe3PSvjwRwI2INzWEhwb8YZ0UswGe HP8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=iE/Y9WscNLHu6w2KEIZCPXwIElVoVHM0BzES9hjgoSw=; b=WgMqcp5pLHyAzxUGZpq3wD0I7X+UX3j/pQkznD6D8U+oHpi1OpwCbhq8NVgtIxsUob yFRs9XTqjK98B3V7A8zQTGJLWNmZ5YUdEAcYokNE4++YXqZIq68d4C/c+o1oDPqllDsy L5u1fxzlgie6Hl8leAN3fazMXTs+ELqAf+IIS8B9yMnNK8s9kYftoSB/Mm/+3RUb7dAN NWqngJsuYTKrW9R9LLzBjhJWuw977DuguSHLWyPIfEduHkgNxzzsabHoCU0w5xGInMAc hlt1jTI3qzfkbSWjsMvPUQYJJUBe1oBaBIKvytrUQDFNtaBwk2mEaOGSwAm/nr2POTfs ClbA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=phlbuxVt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id bc17-20020a656d91000000b0039945a301d1si6113800pgb.347.2022.04.06.04.06.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Apr 2022 04:06:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=phlbuxVt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id DF32C5C2792; Wed, 6 Apr 2022 02:28:20 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1574882AbiDEXBm (ORCPT + 99 others); Tue, 5 Apr 2022 19:01:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44148 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1452913AbiDEPzm (ORCPT ); Tue, 5 Apr 2022 11:55:42 -0400 Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E0CB6E29DC; Tue, 5 Apr 2022 07:58:07 -0700 (PDT) Received: by mail-pl1-x62d.google.com with SMTP id m16so409786plx.3; Tue, 05 Apr 2022 07:58:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:date:mime-version:user-agent:subject:content-language:to :cc:references:from:in-reply-to:content-transfer-encoding; bh=iE/Y9WscNLHu6w2KEIZCPXwIElVoVHM0BzES9hjgoSw=; b=phlbuxVtZrq+ny/cBitrnv1EdI2+ifY+dU7QMLrWR4vTUYhhs5DmvkW8pGnjEt00dY aeztRgO/rJiLIbsJOMkp2X5p+zBpqygo3MmguP1fKSF3+Lf8WmwLmYa414Z6tbkq7PtB ZlM8l2daOyMhZ2URU1jsuj+cAFosdKMB1hJVwUE82ZFfIxAwOpVLrf4SjgSs9s9MuxmX RLSMFtHho+fNSTZvCXpwoVQPSjCeD+aUdFQhTpIpZuUCgFohEgx7bCH/+TM6bTLv0uvp 1FIYjib+PWttI6jjFrcNZVQV5hky9BB97FMuLw/RLCV3Wb1dJv9c8ovyUbgUvR0hSBfX 5MtA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to :content-transfer-encoding; bh=iE/Y9WscNLHu6w2KEIZCPXwIElVoVHM0BzES9hjgoSw=; b=4Y8cYfRBm59eJPxQZU/tfjiJh2iYvkFpS2w6Cw3TAkdNcCX3u2LyUo8GMWZH//+OjG cLXm69gDW4ukUPmtV0q2ivaUkRyYS9m9doGmGeiz1AVNqZX+/Q26DJZZ2EgmmYsCiz0k oR7b3xh0wRbVgeziLHhj14Sn38b2G5N9NX+GD80/KAwh6dTMesB13cAeHTsZLVs80sBD gnhpf3ZMB8uVT7fXozx1w/CFPj/L8078T8gSwsf2pT3B/aekQuFjYMaC+sv08R6at7ll UdhbQ1uBFcUfMWwE8PXXpZhWEHOhtMAXB8s/o2Mweyu84JALipPluLf0R99rlDUSedmX d7dg== X-Gm-Message-State: AOAM531EBN1gGfLY4w/4kxFsWIlTw99VuukBrtP/UG1HjS/FqVmClVRL l6gcNM6UWywcb7Chrcj3pdQ= X-Received: by 2002:a17:90b:1e0e:b0:1c7:5b03:1d8b with SMTP id pg14-20020a17090b1e0e00b001c75b031d8bmr4558131pjb.121.1649170687106; Tue, 05 Apr 2022 07:58:07 -0700 (PDT) Received: from [192.168.0.115] ([113.173.105.8]) by smtp.gmail.com with ESMTPSA id p10-20020a056a000b4a00b004fd9a6a2a39sm16660710pfo.184.2022.04.05.07.58.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 05 Apr 2022 07:58:06 -0700 (PDT) Message-ID: Date: Tue, 5 Apr 2022 21:58:01 +0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [PATCH v2] cgroup: Kill the parent controller when its last child is killed Content-Language: en-US To: =?UTF-8?Q?Michal_Koutn=c3=bd?= , Tejun Heo Cc: cgroups@vger.kernel.org, kernel test robot , Zefan Li , Johannes Weiner , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org References: <20220404142535.145975-1-minhquangbui99@gmail.com> <20220405091158.GA13806@blackbody.suse.cz> From: Bui Quang Minh In-Reply-To: <20220405091158.GA13806@blackbody.suse.cz> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A, RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/5/22 16:11, Michal Koutný wrote: > On Mon, Apr 04, 2022 at 07:37:24AM -1000, Tejun Heo wrote: >> And the suggested behavior doesn't make much sense to me. It doesn't >> actually solve the underlying problem but instead always make css >> destructions recursive which can lead to surprises for normal use cases. > > I also don't like the nested special-case use percpu_ref_kill(). After thinking more carefully, I agree with your points. The recursive css destruction only does not fixup the previous parents' metadata correctly and it is not a desirable behavior too. > I looked at this and my supposed solution turned out to be a revert of > commit 3c606d35fe97 ("cgroup: prevent mount hang due to memory > controller lifetime"). So at the unmount time it's necessary to distinguish > children that are in the process of removal from children than are online or > pinned indefinitely. > > What about: > > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -2205,11 +2205,14 @@ static void cgroup_kill_sb(struct super_block *sb) > struct cgroup_root *root = cgroup_root_from_kf(kf_root); > > /* > - * If @root doesn't have any children, start killing it. > + * If @root doesn't have any children held by residual state (e.g. > + * memory controller), start killing it, flush workqueue to filter out > + * transiently offlined children. > * This prevents new mounts by disabling percpu_ref_tryget_live(). > * > * And don't kill the default root. > */ > + flush_workqueue(cgroup_destroy_wq); > if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root && > !percpu_ref_is_dying(&root->cgrp.self.refcnt)) { > cgroup_bpf_offline(&root->cgrp); > > (I suspect there's technically still possible a race between concurrent unmount > and the last rmdir but the flush on kill_sb path should be affordable and it > prevents unnecessarily conserved cgroup roots.) Your proposed solution looks good to me. As with my example the flush will guarantee the rmdir and its deferred work has been executed before cleaning up in umount path. But what do you think about diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index f01ff231a484..5578ee76e789 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -2215,6 +2215,7 @@ static void cgroup_kill_sb(struct super_block *sb) cgroup_bpf_offline(&root->cgrp); percpu_ref_kill(&root->cgrp.self.refcnt); } + root->cgrp.flags |= CGRP_UMOUNT; cgroup_put(&root->cgrp); kernfs_kill_sb(sb); } @@ -5152,12 +5153,28 @@ static void css_release_work_fn(struct work_struct *work) container_of(work, struct cgroup_subsys_state, destroy_work); struct cgroup_subsys *ss = css->ss; struct cgroup *cgrp = css->cgroup; + struct cgroup *parent = cgroup_parent(cgrp); mutex_lock(&cgroup_mutex); css->flags |= CSS_RELEASED; list_del_rcu(&css->sibling); + /* + * If parent doesn't have any children, start killing it. + * And don't kill the default root. + */ + if (parent && list_empty(&parent->self.children) && + parent->flags & CGRP_UMOUNT && + parent != &cgrp_dfl_root.cgrp && + !percpu_ref_is_dying(&parent->self.refcnt)) { +#ifdef CONFIG_CGROUP_BPF + if (!percpu_ref_is_dying(&cgrp->bpf.refcnt)) + cgroup_bpf_offline(parent); +#endif + percpu_ref_kill(&parent->self.refcnt); + } + if (ss) { /* css release path */ if (!list_empty(&css->rstat_css_node)) { The idea is to set a flag in the umount path, in the rmdir it will destroy the css in case its direct parent is umounted, no recursive here. This is just an incomplete example, we may need to reset that flag when remounting. Thanks, Quang Minh.