Received: by 2002:ac0:950c:0:0:0:0:0 with SMTP id f12csp2295143imc; Tue, 12 Mar 2019 10:47:47 -0700 (PDT) X-Google-Smtp-Source: APXvYqxWTPEUSxUFJbkboHEYJRqx8F0sYv73U4tzwsZhR9VsO7O8YZ+/psebIH1ajoXeDIeU3buN X-Received: by 2002:a62:3681:: with SMTP id d123mr39904882pfa.242.1552412866981; Tue, 12 Mar 2019 10:47:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1552412866; cv=none; d=google.com; s=arc-20160816; b=FYOxN9KKV6Lqbpc0iAcyK0nWNGnvvI26YALneHJmO+fmB+ky3K2+kA4LpczQdnVwEW GtoRnpbjCbS0BSZ7t4l48/tctQPliuGj/cBivFusXFirVYh/snypEjE4KvJsdo0sbxdm yDoU8LYuNQ/2nkkyoskPIxlsbHEx/QHJRpy223o24extpHqbXsiGtcOIcL1PhIxws/k0 4N3iTRfLRbBWoW9wqw2LlJpJ7v6/sqFIebOkxdbj7RtYKokdBHYjTqN2beJOqoFO0/LL 5LcOZmVvtki+euzhCl/I6AI2W5VUzCAVi9dfn/w0tv71UMyh+iKespVw5Rwdt5EMV4jc y/ZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=T6lZKiIBVTkaaFMcTLKQpLmr1GNwSyuT5D/HLwbDWM4=; b=A2YSXivGhe1nmvvepIcuMwa54+pWcn2ti2O5/zYtgQv7trP3MKXzQTtpy/WOQNDv2i vVoS5dwiN8zUL25VIvmLMBHpc8aaHsHBgBYTpT6BcjvlF9MPVB6d722H8WgfCpeGZKQ5 qZeek7NE3Sb5WvacNX701YBgw9MYtnvCMx7D+zrkN4RrcncPJljI68w6thTxr/ch9gDE p384T/o09Pc3U0EojWkS8Pc4gvB/n7KOsW6FAklvXt8VFNGeOsyttDiIO4fkKSCu/ASd LmB7pFz0The/n6fp1+P4SyX2yPPVeewxm3/JTvqaRHk22tfliJDZIi8nSsMsXaGafySp KO0w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=ddRVVyHD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l7si7920935pgp.513.2019.03.12.10.47.31; Tue, 12 Mar 2019 10:47:46 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=ddRVVyHD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729069AbfCLRpU (ORCPT + 99 others); Tue, 12 Mar 2019 13:45:20 -0400 Received: from mail.kernel.org ([198.145.29.99]:56054 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728742AbfCLRPb (ORCPT ); Tue, 12 Mar 2019 13:15:31 -0400 Received: from localhost (unknown [104.133.8.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0AB71217D8; Tue, 12 Mar 2019 17:15:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1552410930; bh=PwDcnXjQzTdZLnEkiTjrAYW9ex7OKCuOxglj770SoXY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ddRVVyHDm4di6K85Ga8l7p3GQD0p44Flt1d9OF4eN8Z7bmysdWkY7rBckoYnkdocI MbWAEoVR8N9pMQjAaf2UIVq5V/pYegP3ZYUwvMajLT94skYnCwZf8WpRfrDoTcebGb Pe+joyiQkRxmRNitrHO4mamHIC2XexjtkEQOfrQU= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Liu Bo , Jens Axboe , Sasha Levin Subject: [PATCH 4.19 121/149] blk-iolatency: fix IO hang due to negative inflight counter Date: Tue, 12 Mar 2019 10:08:59 -0700 Message-Id: <20190312170359.583172835@linuxfoundation.org> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190312170349.421581206@linuxfoundation.org> References: <20190312170349.421581206@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review X-Patchwork-Hint: ignore MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.19-stable review patch. If anyone has any objections, please let me know. ------------------ [ Upstream commit 8c772a9bfc7c07c76f4a58b58910452fbb20843b ] Our test reported the following stack, and vmcore showed that ->inflight counter is -1. [ffffc9003fcc38d0] __schedule at ffffffff8173d95d [ffffc9003fcc3958] schedule at ffffffff8173de26 [ffffc9003fcc3970] io_schedule at ffffffff810bb6b6 [ffffc9003fcc3988] blkcg_iolatency_throttle at ffffffff813911cb [ffffc9003fcc3a20] rq_qos_throttle at ffffffff813847f3 [ffffc9003fcc3a48] blk_mq_make_request at ffffffff8137468a [ffffc9003fcc3b08] generic_make_request at ffffffff81368b49 [ffffc9003fcc3b68] submit_bio at ffffffff81368d7d [ffffc9003fcc3bb8] ext4_io_submit at ffffffffa031be00 [ext4] [ffffc9003fcc3c00] ext4_writepages at ffffffffa03163de [ext4] [ffffc9003fcc3d68] do_writepages at ffffffff811c49ae [ffffc9003fcc3d78] __filemap_fdatawrite_range at ffffffff811b6188 [ffffc9003fcc3e30] filemap_write_and_wait_range at ffffffff811b6301 [ffffc9003fcc3e60] ext4_sync_file at ffffffffa030cee8 [ext4] [ffffc9003fcc3ea8] vfs_fsync_range at ffffffff8128594b [ffffc9003fcc3ee8] do_fsync at ffffffff81285abd [ffffc9003fcc3f18] sys_fsync at ffffffff81285d50 [ffffc9003fcc3f28] do_syscall_64 at ffffffff81003c04 [ffffc9003fcc3f50] entry_SYSCALL_64_after_swapgs at ffffffff81742b8e The ->inflight counter may be negative (-1) if 1) blk-iolatency was disabled when the IO was issued, 2) blk-iolatency was enabled before this IO reached its endio, 3) the ->inflight counter is decreased from 0 to -1 in endio() In fact the hang can be easily reproduced by the below script, H=/sys/fs/cgroup/unified/ P=/sys/fs/cgroup/unified/test echo "+io" > $H/cgroup.subtree_control mkdir -p $P echo $$ > $P/cgroup.procs xfs_io -f -d -c "pwrite 0 4k" /dev/sdg echo "`cat /sys/block/sdg/dev` target=1000000" > $P/io.latency xfs_io -f -d -c "pwrite 0 4k" /dev/sdg This fixes the problem by freezing the queue so that while enabling/disabling iolatency, there is no inflight rq running. Note that quiesce_queue is not needed as this only updating iolatency configuration about which dispatching request_queue doesn't care. Signed-off-by: Liu Bo Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin --- block/blk-iolatency.c | 52 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 45 insertions(+), 7 deletions(-) diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c index 19923f8a029d..b154e057ca67 100644 --- a/block/blk-iolatency.c +++ b/block/blk-iolatency.c @@ -72,6 +72,7 @@ #include #include #include +#include #include "blk-rq-qos.h" #include "blk-stat.h" @@ -568,6 +569,9 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio) return; enabled = blk_iolatency_enabled(iolat->blkiolat); + if (!enabled) + return; + while (blkg && blkg->parent) { iolat = blkg_to_lat(blkg); if (!iolat) { @@ -577,7 +581,7 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio) rqw = &iolat->rq_wait; atomic_dec(&rqw->inflight); - if (!enabled || iolat->min_lat_nsec == 0) + if (iolat->min_lat_nsec == 0) goto next; iolatency_record_time(iolat, &bio->bi_issue, now, issue_as_root); @@ -721,10 +725,13 @@ int blk_iolatency_init(struct request_queue *q) return 0; } -static void iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val) +/* + * return 1 for enabling iolatency, return -1 for disabling iolatency, otherwise + * return 0. + */ +static int iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val) { struct iolatency_grp *iolat = blkg_to_lat(blkg); - struct blk_iolatency *blkiolat = iolat->blkiolat; u64 oldval = iolat->min_lat_nsec; iolat->min_lat_nsec = val; @@ -733,9 +740,10 @@ static void iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val) BLKIOLATENCY_MAX_WIN_SIZE); if (!oldval && val) - atomic_inc(&blkiolat->enabled); + return 1; if (oldval && !val) - atomic_dec(&blkiolat->enabled); + return -1; + return 0; } static void iolatency_clear_scaling(struct blkcg_gq *blkg) @@ -768,6 +776,7 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf, u64 lat_val = 0; u64 oldval; int ret; + int enable = 0; ret = blkg_conf_prep(blkcg, &blkcg_policy_iolatency, buf, &ctx); if (ret) @@ -803,7 +812,12 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf, blkg = ctx.blkg; oldval = iolat->min_lat_nsec; - iolatency_set_min_lat_nsec(blkg, lat_val); + enable = iolatency_set_min_lat_nsec(blkg, lat_val); + if (enable) { + WARN_ON_ONCE(!blk_get_queue(blkg->q)); + blkg_get(blkg); + } + if (oldval != iolat->min_lat_nsec) { iolatency_clear_scaling(blkg); } @@ -811,6 +825,24 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf, ret = 0; out: blkg_conf_finish(&ctx); + if (ret == 0 && enable) { + struct iolatency_grp *tmp = blkg_to_lat(blkg); + struct blk_iolatency *blkiolat = tmp->blkiolat; + + blk_mq_freeze_queue(blkg->q); + + if (enable == 1) + atomic_inc(&blkiolat->enabled); + else if (enable == -1) + atomic_dec(&blkiolat->enabled); + else + WARN_ON_ONCE(1); + + blk_mq_unfreeze_queue(blkg->q); + + blkg_put(blkg); + blk_put_queue(blkg->q); + } return ret ?: nbytes; } @@ -910,8 +942,14 @@ static void iolatency_pd_offline(struct blkg_policy_data *pd) { struct iolatency_grp *iolat = pd_to_lat(pd); struct blkcg_gq *blkg = lat_to_blkg(iolat); + struct blk_iolatency *blkiolat = iolat->blkiolat; + int ret; - iolatency_set_min_lat_nsec(blkg, 0); + ret = iolatency_set_min_lat_nsec(blkg, 0); + if (ret == 1) + atomic_inc(&blkiolat->enabled); + if (ret == -1) + atomic_dec(&blkiolat->enabled); iolatency_clear_scaling(blkg); } -- 2.19.1