Subject: Re: [PATCH v2 1/2] blk-iocost: add refcounting for iocg
To: Tejun Heo, Yu Kuai
Cc: hch@infradead.org, josef@toxicpanda.com, axboe@kernel.dk,
 cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, yi.zhang@huawei.com, "yukuai (C)"
References: <20221227125502.541931-1-yukuai1@huaweicloud.com>
 <20221227125502.541931-2-yukuai1@huaweicloud.com>
 <7dcdaef3-65c1-8175-fea7-53076f39697f@huaweicloud.com>
From: Yu Kuai
Message-ID: <875eb43e-202d-5b81-0bff-ef0434358d99@huaweicloud.com>
Date: Mon, 9 Jan 2023 09:32:46 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 2023/01/07 4:18, Tejun Heo wrote:
> On Fri, Jan 06, 2023 at 09:08:45AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/01/06 2:32, Tejun Heo wrote:
>>> On Thu, Jan 05, 2023 at 09:14:07AM +0800, Yu Kuai wrote:
>>>> 1) is related to blkg, while 2) is not, hence refcnting from blkg can't
>>>> fix the problem. refcnting from blkcg_policy_data should be ok, but I
>>>> see that bfq already has the similar refcnting, while other policy
>>>> doesn't require such refcnting.
>>>
>>> Hmm... taking a step back, wouldn't this be solved by moving the first part
>>> of ioc_pd_free() to pd_offline_fn()? The ordering is strictly defined there,
>>> right?
>>>
>>
>> Moving first part to pd_offline_fn() has some requirements, like what I
>> did in the other thread:
>>
>> iocg can be activated again after pd_offline_fn(), which is possible
>> because bio can be dispatched when cgroup is removed. I tried to avoid
>> that by:
>>
>> 1) dispatch all throttled bio in ioc_pd_offline()
>> 2) don't throttle bio after ioc_pd_offline()
>>
>> However, you already disagreed with that.
>
> Okay, I was completely wrong while I was replying to your original patch.
> Should have looked at the code closer, my apologies.
>
> What I missed is that pd_offline doesn't happen when the cgroup goes
> offline. Please take a look at the following two commits:
>
> 59b57717fff8 ("blkcg: delay blkg destruction until after writeback has finished")
> d866dbf61787 ("blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it")
>

These two commits were applied three years ago. I haven't checked the
details yet, but they don't seem to guarantee that no IO will be handled
by rq_qos_throttle() after pd_offline_fn(), because I just reproduced
this while working on another problem:

f02be9002c48 ("block, bfq: fix null pointer dereference in bfq_bio_bfqg()")

A user thread can issue async IO, and that IO can be throttled by
blk-throttle (not writeback). The user thread can then exit and the
cgroup can be removed before the IO is dispatched to rq_qos_throttle().

> After the above two commits, ->pd_offline_fn() is called only after all
> possible writebacks are complete, so it shouldn't allow mass escapes to
> root. With writebacks out of the picture, it might be that there can be no
> further IOs once ->pd_offline_fn() is called too as there can be no tasks
> left in it and no dirty pages, but best to confirm that.
>
> So, yeah, the original approach you took should work although I'm not sure
> the patches that you added to make offline blkg to bypass are necessary
> (that also contributed to my assumption that there will be more IOs on those
> blkg's). Have you seen more IOs coming down the pipeline after offline? If
> so, can you dump some backtraces and see where they're coming from?

So far I'm sure such IOs can come from blk-throttle. I'm not sure yet,
but I suspect io_uring can do this as well.

Thanks,
Kuai