Received: by 2002:a05:6602:18e:0:0:0:0 with SMTP id m14csp5928932ioo; Wed, 1 Jun 2022 16:15:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzso3a37YpCxB9p7AUAoGhyJcWfILKxVxDj1Zr9GeHDbN9Wo8fcalEugr+fsC++A//qAyIk X-Received: by 2002:a17:90b:1098:b0:1e2:f4bc:d0 with SMTP id gj24-20020a17090b109800b001e2f4bc00d0mr19812898pjb.125.1654125325898; Wed, 01 Jun 2022 16:15:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1654125325; cv=none; d=google.com; s=arc-20160816; b=N4AHVx6k4wwgJ9dgs5lh8I95oxrx1Upci4uAK1MhGAE732c3/zi3uHy0syHR0CwSWi fuHUKX+6nAA9uhLbUMpC+ptf+zpSDzYgWEVAXWOXciGseGNQmqE9g+flM2pqvg8yt22X lDZNfUIr8fowZExwHzLZJ1+44o6FYP9TC64bBobvrRr9/iydaAoXiKTd9ak0yrIVgiiU gfJAzU8/yFa9exnlx2fVJIeQpmnKstDpiGlfUc7eap4PLaa4OWL2cjHjk3RnylBtLOLX 3WuUruwbVl8UI84ABY89WcKgfXyl2UpbpL3u87umSmIIwti2j5tLFHsh9QW5YHleouHo 2H5w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :from:references:cc:to:content-language:user-agent:mime-version:date :message-id:dkim-signature; bh=avJEEUv9TwoQxp5JDkd8KBVEKkaG/hcczSaXXKCQTHE=; b=s078CAYCykfSWdx4U7r5ZwL0aCB446DDC5YTqgVlzJExbkTQid9HgqshLXXjRMZ2dI EQGkrEcefVk1c3lpCl3wq+9LfumTG1KUlpZAzMLOU5RFz+/KDYCgfETvBHS1j2Sk2AKW Avl+o48Eto612RXbss1aO8J2REQ0qE3gjkvenpwCUkcda/IPU+w+r5XbWdq2WS7SK/mU Zb0HaeAdvQaXlt9y9aODLSCrdIW1GJPNpcHq3WaU83SPrzM4iVkeYLAvJ7y7ivtlf9bF cRms2Edj9e/2DRaQmOTzQpohkZpy1WyDyriTOAP2BNOzMvgoPmGnyiNsOrtSFTk8u9/4 SbXQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=h6rYPumu; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id cp26-20020a056a00349a00b0051bbe189e23si1543497pfb.337.2022.06.01.16.15.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Jun 2022 16:15:25 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=h6rYPumu; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 36FBE11447; Wed, 1 Jun 2022 16:13:44 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232505AbiFAXNh (ORCPT + 99 others); Wed, 1 Jun 2022 19:13:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230099AbiFAXNg (ORCPT ); Wed, 1 Jun 2022 19:13:36 -0400 Received: from mail-pj1-x1035.google.com (mail-pj1-x1035.google.com [IPv6:2607:f8b0:4864:20::1035]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D205EE35 for ; Wed, 1 Jun 2022 16:13:35 -0700 (PDT) Received: by mail-pj1-x1035.google.com with SMTP id n10so3347746pjh.5 for ; Wed, 01 Jun 2022 16:13:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=message-id:date:mime-version:user-agent:content-language:to:cc :references:from:subject:in-reply-to:content-transfer-encoding; bh=avJEEUv9TwoQxp5JDkd8KBVEKkaG/hcczSaXXKCQTHE=; b=h6rYPumu3aj7CwpyNNPdUBXXKfiKD8gGz3B045bl8MP0WngMSGl8ix8vfn1QpNRDcP Ac1tUfpAbNsIYSozT3egxW8Gs72wfXxjoR414paoffvEDq2yomjCnROY+FnfRXriMFaV 3zqE0L/mXRymN13gHNbzn+4Qlhh5f+QVP9Ab2rKwsxos1HFwstgJUYtfWX1v/n1IMSP4 9utIf6WHEWywo8eukE6uJOVedlTD0LRCllXWYs5bDUzrmQp3o0xF7rL2ZXa2P24qXL6Y OML1E941w/IBG/gK3vrnI/Cz/RhLiGAHokjs3WJfpVO92zUZ1NMNxQEzlRL8X1+pvuww Z/Mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent :content-language:to:cc:references:from:subject:in-reply-to :content-transfer-encoding; bh=avJEEUv9TwoQxp5JDkd8KBVEKkaG/hcczSaXXKCQTHE=; b=IkVF/i2KQ+uCOeMi/UlwupJfQ8Js2hFy3gVN5FO1M+nmtPpIyAawCjMiv1c6KmOUMR GW/ZfQG2YKmDe//0xtDJ1sNSGdqlIDfcVuEJOzMOJn9nXb6K/nCTpP/5Aanw6gwIkaB+ qqQ4oxLnwpIK3nngNOtr38DzqVGNXAKO6KuVbJUWaGTELqFh3UIb0TsskayQD6QjVK+1 s8JOnlylKIpDyTpUvuc22NGjvGy9kxx/fAk7YpXlZ1xvf6OkiNPs9DwEDtMkO94p+twY P1W4WkabS3ujy432cEkZgwEYc1d2ano7jgEqSHQS5Z6yN2XTgsKghHbvraxKeKQL9cj6 zHiA== X-Gm-Message-State: AOAM530uYJDTr/IKsDj8jTH+GLzwqQRiv5dMH/LWiAX/5HY4I5p7A1aI 6wbMBsVzNyzWKQghTdui7f2QQg== X-Received: by 2002:a17:902:bd42:b0:164:bea:65bf with SMTP id b2-20020a170902bd4200b001640bea65bfmr1745502plx.111.1654125214527; Wed, 01 Jun 2022 16:13:34 -0700 (PDT) Received: from [192.168.254.36] ([50.39.160.154]) by smtp.gmail.com with ESMTPSA id n19-20020aa78a53000000b0051bb0be7109sm1990292pfa.78.2022.06.01.16.13.33 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 01 Jun 2022 16:13:33 -0700 (PDT) Message-ID: <0babd7df-bdef-9edc-3682-1144bc0c2d2b@linaro.org> Date: Wed, 1 Jun 2022 16:13:32 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 Content-Language: en-US To: Tejun Heo , =?UTF-8?Q?Michal_Koutn=c3=bd?= Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Zefan Li , Johannes Weiner , Bui Quang Minh References: <20220525151517.8430-1-mkoutny@suse.com> <20220525151517.8430-3-mkoutny@suse.com> <20220525161455.GA16134@blackbody.suse.cz> From: Tadeusz Struk Subject: Re: [PATCH 2/2] cgroup: Use separate work structs on css release path In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,RDNS_NONE,SPF_HELO_NONE, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/26/22 11:15, Tejun Heo wrote: > Hello, Michal. > > On Thu, May 26, 2022 at 11:56:34AM +0200, Michal Koutný wrote: >> // ref=A: initial state >> kill_css() >> css_get // ref+=F == A+F: fuse >> percpu_ref_kill_and_confirm >> __percpu_ref_switch_to_atomic >> percpu_ref_get >> // ref += 1 == A+F+1: atomic mode, self-protection >> percpu_ref_put >> // ref -= 1 == A+F: kill the base reference >> [via rcu] >> percpu_ref_switch_to_atomic_rcu >> percpu_ref_call_confirm_rcu >> css_killed_ref_fn == refcnt.confirm_switch >> queue_work(css->destroy_work) (1) >> [via css->destroy_work] >> css_killed_work_fn == wq.func >> offline_css() // needs fuse >> css_put // ref -= F == A: de-fuse >> percpu_ref_put >> // ref -= 1 == A-1: remove self-protection >> css_release // A <= 1 -> 2nd queue_work explodes! > > I'm not sure I'm following it but it's perfectly fine to re-use the work > item at this point. The work item actually can be re-cycled from the very > beginning of the work function. The only thing we need to make sure is that > we don't css_put() prematurely to avoid it being freed while we're using it. Yes, it is ok to reuse a work struct, but it's not ok to have the same work struct enqueued twice on the same WQ when list debug is enabled. That's why we are getting this "BUG: corrupted list.." > For the sharing to be a problem, we should be queueing the release work item > while the destroy instance is still pending, and if that is the case, it > doesn't really matter whether we use two separate work items or not. We're > already broken and would just be shifting the problem to explode elsewhere. > > The only possibility that I can think of is that somehow we're ending up > with an extra css_put() somewhere thus triggering the release path > prematurely. If that's the case, we'll prolly need to trace get/puts to find > out who's causing the ref imbalance. That's right. Michal was on the right track for the kill_css() part. What I think is going on is that once css_create() fails then cgroup_subtree_control_write() ends up calling first kill_css() and then css_put() on the same css, I think it's &cgrp->self of the kernfs_node. The each_live_descendant_post() also iterates on the root. Here is the call flow (sorry for long lines): cgroup_subtree_control_write(of)->cgroup_apply_control(cgrp)->cgroup_apply_control_enable(cgrp)->css_create() <- fails here and returns error | |-> cgroup_finalize_control(cgrp)->cgroup_apply_control_disable(cgrp)->each_live_descendant_post(cgrp)->kill_css()->percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn) <- this triggers css_killed_ref_fn() to be called | | css_killed_ref_fn() <- first css->destroy_work enqueue | | | |-> INIT_WORK(&css->destroy_work, css_killed_work_fn); queue_work(cgroup_destroy_wq, &css->destroy_work); | | |-> goto out_unlock; | | | |-> cgroup_kn_unlock(kernfs_node)->cgroup_put(cgrp)->css_put(&cgrp->self)->percpu_ref_put(&css->refcnt) <- this triggers css_release() to be called | | css_release(percpu_ref) <- second css->destroy_work enqueue | |-> INIT_WORK(&css->destroy_work, css_release_work_fn); queue_work(cgroup_destroy_wq, &css->destroy_work) <- and it fails here with BUG: corrupted list in insert_work; list_add corruption. What seems to work for me as the simplest fix is to prevent enqueuing a dying css in css_release() as below. Please let me know if that makes sense to you. diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 1779ccddb734..5618211487cc 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -5210,8 +5210,10 @@ static void css_release(struct percpu_ref *ref) struct cgroup_subsys_state *css = container_of(ref, struct cgroup_subsys_state, refcnt); - INIT_WORK(&css->destroy_work, css_release_work_fn); - queue_work(cgroup_destroy_wq, &css->destroy_work); + if (!(css->flags & CSS_DYING)) { + INIT_WORK(&css->destroy_work, css_release_work_fn); + queue_work(cgroup_destroy_wq, &css->destroy_work); + } } static void init_and_link_css(struct cgroup_subsys_state *css, -- Thanks, Tadeusz