Date: Mon, 9 Jan 2023 08:23:34 -1000
From: Tejun Heo
To: Yu Kuai
Cc: hch@infradead.org, josef@toxicpanda.com, axboe@kernel.dk,
    cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, yi.zhang@huawei.com, "yukuai (C)"
Subject: Re: [PATCH v2 1/2] blk-iocost: add refcounting for iocg
In-Reply-To: <875eb43e-202d-5b81-0bff-ef0434358d99@huaweicloud.com>

Hello,

On Mon, Jan 09, 2023 at 09:32:46AM +0800, Yu Kuai wrote:
> > 59b57717fff8 ("blkcg: delay blkg destruction until after writeback has finished")
> > d866dbf61787 ("blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it")
>
> These two commits are applied for three years, I don't check the details
> yet but they seem can't guarantee that no io will be handled by
> rq_qos_throttle() after pd_offline_fn(), because I just reproduced this
> in another problem:
>
> f02be9002c48 ("block, bfq: fix null pointer dereference in bfq_bio_bfqg()")
>
> User thread can issue async io, and io can be throttled by
> blk-throttle(not writeback), then user thread can exit and cgroup can be
> removed before such io is dispatched to rq_qos_throttle.

I see.

> > After the above two commits, ->pd_offline_fn() is called only after all
> > possible writebacks are complete, so it shouldn't allow mass escapes to
> > root. With writebacks out of the picture, it might be that there can be no
> > further IOs once ->pd_offline_fn() is called too as there can be no tasks
> > left in it and no dirty pages, but best to confirm that.
> >
> > So, yeah, the original approach you took should work although I'm not sure
> > the patches that you added to make offline blkg to bypass are necessary
> > (that also contributed to my assumption that there will be more IOs on those
> > blkg's). Have you seen more IOs coming down the pipeline after offline? If
> > so, can you dump some backtraces and see where they're coming from?
>
> Currently I'm sure such IOs can come from blk-throttle, and I'm not sure
> yet but I also suspect io_uring can do this.

Yeah, that's unfortunate. There are several options here:

1. Do what you originally suggested - bypass to root after offline. I feel
   uneasy about this. Both iolatency and throtl clear their configs on
   offline but that's punting to the parent. For iocost it'd be bypassing
   all controls, which can actually be exploited.

2. Make all possible IO issuers use blkcg_[un]pin_online() and shift the
   iocost shutdown to pd_offline_fn(). This likely is the most canonical
   solution given the current situation but it's kinda nasty to add another
   layer of refcnting all over the place.

3. Order blkg free so that parents are never freed before children. You did
   this by adding refcnts in iocost but shouldn't it be possible to simply
   shift blkg_put(blkg->parent) in __blkg_release() to blkg_free_workfn()?

#3 seems the most logical to me. What do you think?

Thanks.

--
tejun
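
To make the ordering argument in option 3 concrete, here is a minimal
user-space sketch. It is not kernel code. node_put(), node_release(),
node_free_workfn() and simulate_free_order() are hypothetical stand-ins that
only loosely mirror blkg_put(), __blkg_release() and blkg_free_workfn(), and
the FIFO array used as a work queue is an assumption made purely for
illustration. The sketch compares dropping the parent reference at release
time against dropping it from the deferred free function, and records the
order in which the nodes are freed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORK 4

/* Hypothetical stand-in for a blkg: a refcount and a pinned parent. */
struct node {
    const char *name;     /* one-letter label used in the free log */
    int refcnt;
    struct node *parent;  /* each child holds one reference on its parent */
};

static struct node *workq[MAX_WORK];  /* toy deferred-free queue (FIFO) */
static size_t workq_len;
static char free_log[MAX_WORK + 1];   /* names in the order they are freed */
static int defer_parent_put;          /* 0: put in release, 1: put in workfn */

static void node_put(struct node *n);

/* Deferred free: the point where per-node data would really go away. */
static void node_free_workfn(struct node *n)
{
    strcat(free_log, n->name);
    /* Proposed ordering: unpin the parent only after the child is freed. */
    if (defer_parent_put && n->parent)
        node_put(n->parent);
}

/* Runs when the last reference is dropped; the free itself is deferred. */
static void node_release(struct node *n)
{
    /* Existing ordering: unpin the parent before the child is freed. */
    if (!defer_parent_put && n->parent)
        node_put(n->parent);
    assert(workq_len < MAX_WORK);
    workq[workq_len++] = n;
}

static void node_put(struct node *n)
{
    if (--n->refcnt == 0)
        node_release(n);
}

/* Returns the free order for one parent "P" with one child "C". */
const char *simulate_free_order(int defer)
{
    struct node parent = { "P", 2, NULL };    /* own ref + child's ref */
    struct node child = { "C", 1, &parent };
    size_t i;

    defer_parent_put = defer;
    workq_len = 0;
    free_log[0] = '\0';

    node_put(&parent);  /* cgroup removed: parent drops its own ref */
    node_put(&child);   /* last reference on the child goes away */

    /* Drain the queue; work queued while draining is processed too. */
    for (i = 0; i < workq_len; i++)
        node_free_workfn(workq[i]);
    return free_log;
}
```

In this toy model simulate_free_order(0) yields "PC" (the parent is freed
while the child still exists) and simulate_free_order(1) yields "CP". With
the put deferred, the parent cannot even be queued for freeing until the
child's free function has run, so the result no longer depends on how the
queue happens to order its work items.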