Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp642105pxb; Tue, 5 Apr 2022 17:02:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzOv8A5x+LM1gNApOb/59VL3TYTCTI2jwDAxt7GfLA2HUE/SeKAECIqBhrhJNl5AFoxJj08 X-Received: by 2002:a17:902:ba8e:b0:151:ed65:fda4 with SMTP id k14-20020a170902ba8e00b00151ed65fda4mr5975127pls.127.1649203334096; Tue, 05 Apr 2022 17:02:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649203334; cv=none; d=google.com; s=arc-20160816; b=qA87a2BzgGhqP/jrhR5NEFOaQJZ4Irz6A3CEq/uy+SMy3dd8RgD2UcqUApsRzvioJi maeeRLQWwrNY/4vEZcAJxLqmca1XEnh8nEsFWfRPVQNskOerXqFERC4q9pJhTDUJCuRA OwpFrvSPl40XBzzui7/MOoxhCW+7GrRb2Yb8eBJnwnlN0is1XU5Fjk4KsG2W4pu98inl pkwI9eeH/Wx6FjuoUFFovDQNELFdL0r9xqrNSRleb6hxbBGYPxOv92bxxekupkmkVnkJ O/Rfhi6lW182g5TK0rjUp1YNJijDfZL+0O4fmMuGnp3t7RnVxm2UCFsp+2dfLplqxYjP uYvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=tDz/ooOA70/Dur+V7XxRJlFZ9jS2KGDURocU4NtuJEE=; b=Jj+4KY8Ez5ZcTgoi15F8Qo64VJ+Bma/ef7LW6xwKbf6QICp6TFfL06e9tHYl1n/CVG hghy0CFjMr0bpV6DcoYDEaQbsfsrpay92f6LWpDQb5gIvvmljk2BGBECq/L2hfdjqXd8 KVpBiEO2jy6j/aPXs6aQ6zqjQ2CFCZQDx0fkcnZN4dcTmYmKkYzZJyFWpXDzV86zu0/Z BZ8PVmaswW3flBcIF015QoRZm5l6zCskWLNUANg8PypvJwAkpD04qHuoDHEXuFIy+ofJ eeMriuei34aC9U628Sk9gbSe0mPMZJXFse7NuG238cElJ4CCx6bSx6nHqOGAVvrgFn2x +OHg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=xCUqsfv1; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id p10-20020a056a000b4a00b004fa3a8dff4fsi15255414pfo.6.2022.04.05.17.02.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Apr 2022 17:02:14 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=xCUqsfv1; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id EEC6C1DFDF8; Tue, 5 Apr 2022 16:47:39 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234539AbiDELZk (ORCPT + 99 others); Tue, 5 Apr 2022 07:25:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46572 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242965AbiDEIto (ORCPT ); Tue, 5 Apr 2022 04:49:44 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40EF68CCC0; Tue, 5 Apr 2022 01:38:02 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 84FDCB81C6C; Tue, 5 Apr 2022 08:37:39 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E3E59C385A1; Tue, 5 Apr 2022 08:37:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1649147858; bh=rWC6esy9jObbQvfxjLMvxZkldHaHmYvKo5ECDkq4WzA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=xCUqsfv1WrE6EBTL6I3vOm5bfacJXA/NZcV7EbXAv5nLNGSurjaOErABgAuZ5jqaV kBFjtZZ37X1lLS1gAUnjvnie4uT/ESU35DqZB+lM48kmgs3VOxApp2JfkJ82MRSHjW ulpQD3dpvRwU2/cQsVBwcBiMXy/7tzMV8hHYWU4w= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Tejun Heo , Ming Lei , Yu Kuai , Jens Axboe Subject: [PATCH 5.16 0146/1017] block: fix rq-qos breakage from skipping rq_qos_done_bio() Date: Tue, 5 Apr 2022 09:17:39 +0200 Message-Id: <20220405070358.538499361@linuxfoundation.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220405070354.155796697@linuxfoundation.org> References: <20220405070354.155796697@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Tejun Heo commit aa1b46dcdc7baaf5fec0be25782ef24b26aa209e upstream. a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked") made bio_endio() skip rq_qos_done_bio() if BIO_TRACKED is not set. While this fixed a potential oops, it also broke blk-iocost by skipping the done_bio callback for merged bios. Before, whether a bio goes through rq_qos_throttle() or rq_qos_merge(), rq_qos_done_bio() would be called on the bio on completion with BIO_TRACKED distinguishing the former from the latter. rq_qos_done_bio() is not called for bios which wenth through rq_qos_merge(). This royally confuses blk-iocost as the merged bios never finish and are considered perpetually in-flight. One reliably reproducible failure mode is an intermediate cgroup geting stuck active preventing its children from being activated due to the leaf-only rule, leading to loss of control. The following is from resctl-bench protection scenario which emulates isolating a web server like workload from a memory bomb run on an iocost configuration which should yield a reasonable level of protection. # cat /sys/block/nvme2n1/device/model Samsung SSD 970 PRO 512GB # cat /sys/fs/cgroup/io.cost.model 259:0 ctrl=user model=linear rbps=834913556 rseqiops=93622 rrandiops=102913 wbps=618985353 wseqiops=72325 wrandiops=71025 # cat /sys/fs/cgroup/io.cost.qos 259:0 enable=1 ctrl=user rpct=95.00 rlat=18776 wpct=95.00 wlat=8897 min=60.00 max=100.00 # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 ... Memory Hog Summary ================== IO Latency: R p50=242u:336u/2.5m p90=794u:1.4m/7.5m p99=2.7m:8.0m/62.5m max=8.0m:36.4m/350m W p50=221u:323u/1.5m p90=709u:1.2m/5.5m p99=1.5m:2.5m/9.5m max=6.9m:35.9m/350m Isolation and Request Latency Impact Distributions: min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev isol% 15.90 15.90 15.90 40.05 57.24 59.07 60.01 74.63 74.63 90.35 90.35 58.12 15.82 lat-imp% 0 0 0 0 0 4.55 14.68 15.54 233.5 548.1 548.1 53.88 143.6 Result: isol=58.12:15.82% lat_imp=53.88%:143.6 work_csv=100.0% missing=3.96% The isolation result of 58.12% is close to what this device would show without any IO control. Fix it by introducing a new flag BIO_QOS_MERGED to mark merged bios and calling rq_qos_done_bio() on them too. For consistency and clarity, rename BIO_TRACKED to BIO_QOS_THROTTLED. The flag checks are moved into rq_qos_done_bio() so that it's next to the code paths that set the flags. With the patch applied, the above same benchmark shows: # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 ... Memory Hog Summary ================== IO Latency: R p50=123u:84.4u/985u p90=322u:256u/2.5m p99=1.6m:1.4m/9.5m max=11.1m:36.0m/350m W p50=429u:274u/995u p90=1.7m:1.3m/4.5m p99=3.4m:2.7m/11.5m max=7.9m:5.9m/26.5m Isolation and Request Latency Impact Distributions: min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev isol% 84.91 84.91 89.51 90.73 92.31 94.49 96.36 98.04 98.71 100.0 100.0 94.42 2.81 lat-imp% 0 0 0 0 0 2.81 5.73 11.11 13.92 17.53 22.61 4.10 4.68 Result: isol=94.42:2.81% lat_imp=4.10%:4.68 work_csv=58.34% missing=0% Signed-off-by: Tejun Heo Fixes: a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked") Cc: stable@vger.kernel.org # v5.15+ Cc: Ming Lei Cc: Yu Kuai Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/Yi7rdrzQEHjJLGKB@slm.duckdns.org Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- block/bio.c | 3 +-- block/blk-iolatency.c | 2 +- block/blk-rq-qos.h | 20 +++++++++++--------- include/linux/blk_types.h | 3 ++- 4 files changed, 15 insertions(+), 13 deletions(-) --- a/block/bio.c +++ b/block/bio.c @@ -1462,8 +1462,7 @@ again: if (!bio_integrity_endio(bio)) return; - if (bio->bi_bdev && bio_flagged(bio, BIO_TRACKED)) - rq_qos_done_bio(bdev_get_queue(bio->bi_bdev), bio); + rq_qos_done_bio(bio); if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) { trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), bio); --- a/block/blk-iolatency.c +++ b/block/blk-iolatency.c @@ -598,7 +598,7 @@ static void blkcg_iolatency_done_bio(str int inflight = 0; blkg = bio->bi_blkg; - if (!blkg || !bio_flagged(bio, BIO_TRACKED)) + if (!blkg || !bio_flagged(bio, BIO_QOS_THROTTLED)) return; iolat = blkg_to_lat(bio->bi_blkg); --- a/block/blk-rq-qos.h +++ b/block/blk-rq-qos.h @@ -177,20 +177,20 @@ static inline void rq_qos_requeue(struct __rq_qos_requeue(q->rq_qos, rq); } -static inline void rq_qos_done_bio(struct request_queue *q, struct bio *bio) +static inline void rq_qos_done_bio(struct bio *bio) { - if (q->rq_qos) - __rq_qos_done_bio(q->rq_qos, bio); + if (bio->bi_bdev && (bio_flagged(bio, BIO_QOS_THROTTLED) || + bio_flagged(bio, BIO_QOS_MERGED))) { + struct request_queue *q = bdev_get_queue(bio->bi_bdev); + if (q->rq_qos) + __rq_qos_done_bio(q->rq_qos, bio); + } } static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio) { - /* - * BIO_TRACKED lets controllers know that a bio went through the - * normal rq_qos path. - */ if (q->rq_qos) { - bio_set_flag(bio, BIO_TRACKED); + bio_set_flag(bio, BIO_QOS_THROTTLED); __rq_qos_throttle(q->rq_qos, bio); } } @@ -205,8 +205,10 @@ static inline void rq_qos_track(struct r static inline void rq_qos_merge(struct request_queue *q, struct request *rq, struct bio *bio) { - if (q->rq_qos) + if (q->rq_qos) { + bio_set_flag(bio, BIO_QOS_MERGED); __rq_qos_merge(q->rq_qos, rq, bio); + } } static inline void rq_qos_queue_depth_changed(struct request_queue *q) --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -317,7 +317,8 @@ enum { BIO_TRACE_COMPLETION, /* bio_endio() should trace the final completion * of this bio. */ BIO_CGROUP_ACCT, /* has been accounted to a cgroup */ - BIO_TRACKED, /* set if bio goes through the rq_qos path */ + BIO_QOS_THROTTLED, /* bio went through rq_qos throttle path */ + BIO_QOS_MERGED, /* but went through rq_qos merge path */ BIO_REMAPPED, BIO_ZONE_WRITE_LOCKED, /* Owns a zoned device zone write lock */ BIO_PERCPU_CACHE, /* can participate in per-cpu alloc cache */