Received: by 2002:a05:6602:18e:0:0:0:0 with SMTP id m14csp5944872ioo; Wed, 1 Jun 2022 16:41:27 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxD0331R6uLkw0fNM0+PIimxUPwqlACNqYdlffnYB6NN4KeNduCbcXu6p8jkyu8u3wrcu/J X-Received: by 2002:a63:69c5:0:b0:3fb:a606:819a with SMTP id e188-20020a6369c5000000b003fba606819amr1588288pgc.81.1654126887510; Wed, 01 Jun 2022 16:41:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1654126887; cv=none; d=google.com; s=arc-20160816; b=DYDyMNtyeIzLYcauSHwNwGv79TvFzxuBBYSrh+VsMrsidnwSb5uxrHdfHu3ccF7jAF M+iPlb+Sa9Xj9iuiWDVF2F0R+akD82j0P2Wr6VyFnCg07KhJVh0+w1RaS4f06VhqcN2F zV3nvI5zf84HEBSAsNFnLO6hE+SmEcKzdRfUCoJQ+EZW0pGJPRrWC9SQiA3YfraHe8m2 RSfrxqqu97629xzq05jVAt457yBNDarg0OwVtStgUom7kBr62e/6fCUiH1TOYIqmfbJo p3apyBSpozgt9WprYYrwtD/VI/vtgpGDGqDAJ1a3KofD3QfMz2tAQKljH9IdKdJHTvNV q6kQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :from:references:cc:to:content-language:user-agent:mime-version:date :message-id:dkim-signature; bh=TEAUJ/ucXI4eKAlLgmvtjNFHQlDUXBOnD+LHuiraz3w=; b=NgbeL1YQDZ3IlWDexF3YEySLfkGj1tIzN1ZQLKXc9OZkPEcC3j2oz1/L4QHcJ4eHOW VYw1CvHhfZ7WW8YI4KS5ekVNQSD8SQmAmX+BdTZK7RTSX1ULxKXi7c++BeDhoguKht/M 9k6MYzrDU1ZCMQluwwmapV1V1F3O7z/WME4M3dt1L0WLzOdvBgT+biQXqfPoBIl0CaG7 9RCfPoTLSk35O871WmVJF3Pmn7CRG9YzA0S6neC3uMEFwB6v6ptWLYHxAXa6Klz0kYnh AiMHD++5uiEVnKg7UCVuXUrnjLyAPuDnuvGTQsoBnhmPrmFv4BDGaz999Jwaeb5qHcnw LF9A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="g+g2a/h5"; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id lx12-20020a17090b4b0c00b001e67c8309a5si1311967pjb.9.2022.06.01.16.41.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Jun 2022 16:41:27 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="g+g2a/h5"; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 273361339D2; Wed, 1 Jun 2022 16:37:32 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232541AbiFAXhW (ORCPT + 99 others); Wed, 1 Jun 2022 19:37:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35458 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229996AbiFAXhV (ORCPT ); Wed, 1 Jun 2022 19:37:21 -0400 Received: from mail-pj1-x102f.google.com (mail-pj1-x102f.google.com [IPv6:2607:f8b0:4864:20::102f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A21251090 for ; Wed, 1 Jun 2022 16:37:19 -0700 (PDT) Received: by mail-pj1-x102f.google.com with SMTP id l20-20020a17090a409400b001dd2a9d555bso3436170pjg.0 for ; Wed, 01 Jun 2022 16:37:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=message-id:date:mime-version:user-agent:content-language:to:cc :references:from:subject:in-reply-to:content-transfer-encoding; bh=TEAUJ/ucXI4eKAlLgmvtjNFHQlDUXBOnD+LHuiraz3w=; b=g+g2a/h5fj4ekdhiBZVoem39IesHnm9DOBXVdr0u+GBT1FfE/6soJZwVcANNucrA2v 9gLc+e6E0h/HUqewW/gcDTlpIKOE9B/nvi5VkwCdzUboCkGQ8HoHp+k1nY6S6no+hpdd R85LIIF3A+LnmFfmdZrp2gXNHbADZQMBIqyAxaLPbFeBYC7LfXf05cPAIJ/HB+YxrG8+ 0AuR6f8wWikFigxTB9vlidalsm5Ip4SDKp+5Jve4ENy8SNmi3hitf5WMdN1FvdBjp6dE WbwdwFOlfOUbT+9ZoCMY9tktSqMUi4+SKtRo1qnmshxELME/PknA+zT+oXorZKxdTZtk ymBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent :content-language:to:cc:references:from:subject:in-reply-to :content-transfer-encoding; bh=TEAUJ/ucXI4eKAlLgmvtjNFHQlDUXBOnD+LHuiraz3w=; b=xkuWuJvkNKt02b0yIP8iGQS2P4nZVFxFfYE8Hzz69Qtdrc4pMs5iInJgoPTzJuUuG/ AjBJYdJI1xmn7PELp2171jedWfKeLaqP5Hk1uU9cSSQhbIV0dUCPtKyuYvwBhlQiCK6H LiozbTyVD3eRf1Xlp2H2qIe+TeYgZZ3huFucJHX4MXKAMfhmgoMxpJmIHr5/xhdbTsWN H7WORHQEdmCXg/x4rsl8Quj9bLR94kDGq2SC9TSGVYiZ1Tt5CmaJf1lFMcu/vy49YN+r +0uvQbvzZLmwpIMc4wsBTBGePtLKkwkelx3uBzbwS9sZJ0WwDgjwNSZhaL8SJqgNGcrH 9m7g== X-Gm-Message-State: AOAM5307BS28m5tdz9Vq8u3YuRhNCqU96kAXcgN7qJz3kYbZPWIlKy// Yim+IZlbqqKrQH6W1SXgTkufgQ== X-Received: by 2002:a17:902:7594:b0:15e:bbac:8d49 with SMTP id j20-20020a170902759400b0015ebbac8d49mr1856801pll.124.1654126639097; Wed, 01 Jun 2022 16:37:19 -0700 (PDT) Received: from [192.168.254.36] ([50.39.160.154]) by smtp.gmail.com with ESMTPSA id m11-20020a62f20b000000b0050dc76281fasm1976744pfh.212.2022.06.01.16.37.18 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 01 Jun 2022 16:37:18 -0700 (PDT) Message-ID: <1fb4d8d7-ccc0-b020-715e-38c2dfd94c23@linaro.org> Date: Wed, 1 Jun 2022 16:37:17 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 Content-Language: en-US To: Tejun Heo Cc: =?UTF-8?Q?Michal_Koutn=c3=bd?= , cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Zefan Li , Johannes Weiner , Bui Quang Minh References: <20220525151517.8430-1-mkoutny@suse.com> <20220525151517.8430-3-mkoutny@suse.com> <20220525161455.GA16134@blackbody.suse.cz> <0babd7df-bdef-9edc-3682-1144bc0c2d2b@linaro.org> From: Tadeusz Struk Subject: Re: [PATCH 2/2] cgroup: Use separate work structs on css release path In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,RDNS_NONE,SPF_HELO_NONE, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 6/1/22 16:20, Tejun Heo wrote: > On Wed, Jun 01, 2022 at 04:13:32PM -0700, Tadeusz Struk wrote: >>> On Thu, May 26, 2022 at 11:56:34AM +0200, Michal Koutný wrote: >>>> // ref=A: initial state >>>> kill_css() >>>> css_get // ref+=F == A+F: fuse >>>> percpu_ref_kill_and_confirm >>>> __percpu_ref_switch_to_atomic >>>> percpu_ref_get >>>> // ref += 1 == A+F+1: atomic mode, self-protection >>>> percpu_ref_put >>>> // ref -= 1 == A+F: kill the base reference >>>> [via rcu] >>>> percpu_ref_switch_to_atomic_rcu >>>> percpu_ref_call_confirm_rcu >>>> css_killed_ref_fn == refcnt.confirm_switch >>>> queue_work(css->destroy_work) (1) >>>> [via css->destroy_work] >>>> css_killed_work_fn == wq.func >>>> offline_css() // needs fuse >>>> css_put // ref -= F == A: de-fuse >>>> percpu_ref_put >>>> // ref -= 1 == A-1: remove self-protection >>>> css_release // A <= 1 -> 2nd queue_work explodes! >>> >>> I'm not sure I'm following it but it's perfectly fine to re-use the work >>> item at this point. The work item actually can be re-cycled from the very >>> beginning of the work function. The only thing we need to make sure is that >>> we don't css_put() prematurely to avoid it being freed while we're using it. >> >> Yes, it is ok to reuse a work struct, but it's not ok to have the same >> work struct enqueued twice on the same WQ when list debug is enabled. >> That's why we are getting this "BUG: corrupted list.." > > The above scenario isn't that tho. Once the work item starts executing, wq > doesn't care about what happens to it and as killed_work_fn is holding a > reference, the release scheduling shouldn't happen before it starts > executing unless somebody is screwing up the refcnting. > >> That's right. Michal was on the right track for the kill_css() part. >> What I think is going on is that once css_create() fails then >> cgroup_subtree_control_write() ends up calling first kill_css() and >> then css_put() on the same css, I think it's &cgrp->self of the kernfs_node. >> The each_live_descendant_post() also iterates on the root. >> Here is the call flow (sorry for long lines): >> >> cgroup_subtree_control_write(of)->cgroup_apply_control(cgrp)->cgroup_apply_control_enable(cgrp)->css_create() <- fails here and returns error >> | >> |-> cgroup_finalize_control(cgrp)->cgroup_apply_control_disable(cgrp)->each_live_descendant_post(cgrp)->kill_css()->percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn) <- this triggers css_killed_ref_fn() to be called >> | >> | css_killed_ref_fn() <- first css->destroy_work enqueue >> | | >> | |-> INIT_WORK(&css->destroy_work, css_killed_work_fn); queue_work(cgroup_destroy_wq, &css->destroy_work); >> | >> | >> |-> goto out_unlock; >> | | >> | |-> cgroup_kn_unlock(kernfs_node)->cgroup_put(cgrp)->css_put(&cgrp->self)->percpu_ref_put(&css->refcnt) <- this triggers css_release() to be called >> | >> | >> css_release(percpu_ref) <- second css->destroy_work enqueue >> | >> |-> INIT_WORK(&css->destroy_work, css_release_work_fn); queue_work(cgroup_destroy_wq, &css->destroy_work) <- and it fails here with BUG: corrupted list in insert_work; list_add corruption. >> >> >> What seems to work for me as the simplest fix is to prevent enqueuing a dying >> css in css_release() as below. Please let me know if that makes sense to you. >> >> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c >> index 1779ccddb734..5618211487cc 100644 >> --- a/kernel/cgroup/cgroup.c >> +++ b/kernel/cgroup/cgroup.c >> @@ -5210,8 +5210,10 @@ static void css_release(struct percpu_ref *ref) >> struct cgroup_subsys_state *css = >> container_of(ref, struct cgroup_subsys_state, refcnt); >> - INIT_WORK(&css->destroy_work, css_release_work_fn); >> - queue_work(cgroup_destroy_wq, &css->destroy_work); >> + if (!(css->flags & CSS_DYING)) { >> + INIT_WORK(&css->destroy_work, css_release_work_fn); >> + queue_work(cgroup_destroy_wq, &css->destroy_work); >> + } > > When the problem is ref imbalance, how can above be the solution? Of course > release path won't cause an issue if they don't run, but we still need to > free the thing, right? Yes, but as far as I can see the percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn) doesn't change the value of the refcnt, it just causes the css_killed_ref_fn() to be called on it. Only css_get() & css_put() modify the refcnt value. And for the "free the thing" the css_killed_work_fn() does that. It calls offline_css(css) and css_put(css) for the whole css hierarchy. -- Thanks, Tadeusz