Subject: Re: [PATCH for-6.8/block] block: support to account io_ticks precisely
From: Yu Kuai
To: Ming Lei, Yu Kuai
Cc: hch@lst.de, bvanassche@acm.org, axboe@kernel.dk, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com, "yukuai (C)"
Date: Mon, 15 Jan 2024 19:54:35 +0800
References: <20240109071332.2216253-1-yukuai1@huaweicloud.com>

Hi,

On 2024/01/15 19:38, Ming Lei wrote:
> On Tue, Jan 09, 2024 at 03:13:32PM +0800, Yu Kuai wrote:
>> From: Yu Kuai
>>
>> Currently, io_ticks is accounted based on sampling, specifically
>> update_io_ticks() will always account io_ticks by 1 jiffies from
>> bdev_start_io_acct()/blk_account_io_start(), and the result can be
>> inaccurate, for example(HZ is 250):
>>
>> Test script:
>> fio -filename=/dev/sda -bs=4k -rw=write -direct=1 -name=test -thinktime=4ms
>>
>> Test result: util is about 90%, while the disk is really idle.
>
> Just be curious, what is result with this patch? 0%?

No, it's not 0%. The result depends on how many IOs start in one jiffy
and complete in the next one. Since that probability is related to IO
latency, the result should be relatively accurate (around 10% in my
environment). I think we can live with that unless we improve the time
precision from jiffies to ns.

>
>>
>> In order to account io_ticks precisely, update_io_ticks() must know if
>> there are IO inflight already, and this requires overhead slightly,
>> hence precise io accounting is disabled by default, and user can enable
>> it through sysfs entry.
>>
>> Note that for rq-based device, part_stat_local_inc/dec() and
>> part_in_flight() is used to track inflight instead of iterating tags,
>> which is not supposed to be used in fast path because 'tags->lock' is
>> grabbed in blk_mq_find_and_get_req().
>>
>> Signed-off-by: Yu Kuai
>> ---
>> Changes from RFC v1:
>> - remove the new parameter for update_io_ticks();
>> - simplify update_io_ticks();
>> - use switch in queue_iostats_store();
>> - add missing part_stat_local_dec() in blk_account_io_merge_request();
>> Changes from RFC v2:
>> - fix that precise is ignored for the first io in update_io_ticks();
>>
>>  Documentation/ABI/stable/sysfs-block |  8 ++++--
>>  block/blk-core.c                     | 10 +++++--
>>  block/blk-merge.c                    |  3 ++
>>  block/blk-mq-debugfs.c               |  2 ++
>>  block/blk-mq.c                       | 11 +++++++-
>>  block/blk-sysfs.c                    | 42 ++++++++++++++++++++++++++--
>>  block/blk.h                          |  1 +
>>  block/genhd.c                        |  2 +-
>>  include/linux/blk-mq.h               |  1 +
>>  include/linux/blkdev.h               |  3 ++
>>  10 files changed, 74 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
>> index 1fe9a553c37b..79027bf2661a 100644
>> --- a/Documentation/ABI/stable/sysfs-block
>> +++ b/Documentation/ABI/stable/sysfs-block
>> @@ -358,8 +358,12 @@ What: /sys/block//queue/iostats
>> Date: January 2009
>> Contact: linux-block@vger.kernel.org
>> Description:
>> - [RW] This file is used to control (on/off) the iostats
>> - accounting of the disk.
>> + [RW] This file is used to control the iostats accounting of the
>> + disk. If this value is 0, iostats accounting is disabled; If
>> + this value is 1, iostats accounting is enabled, but io_ticks is
>> + accounted by sampling and the result is not accurate; If this
>> + value is 2, iostats accounting is enabled and io_ticks is
>> + accounted precisely, but there will be slightly more overhead.
>>
>>
>> What: /sys/block//queue/logical_block_size
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 9520ccab3050..c70dc311e3b7 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -954,11 +954,15 @@ EXPORT_SYMBOL_GPL(iocb_bio_iopoll);
>>  void update_io_ticks(struct block_device *part, unsigned long now, bool end)
>>  {
>>  	unsigned long stamp;
>> +	bool precise = blk_queue_precise_io_stat(part->bd_queue);
>>  again:
>>  	stamp = READ_ONCE(part->bd_stamp);
>> -	if (unlikely(time_after(now, stamp))) {
>> -		if (likely(try_cmpxchg(&part->bd_stamp, &stamp, now)))
>> -			__part_stat_add(part, io_ticks, end ? now - stamp : 1);
>> +	if (unlikely(time_after(now, stamp)) &&
>> +	    likely(try_cmpxchg(&part->bd_stamp, &stamp, now))) {
>> +		if (end || (precise && part_in_flight(part)))
>> +			__part_stat_add(part, io_ticks, now - stamp);
>> +		else if (!precise)
>> +			__part_stat_add(part, io_ticks, 1);
>
> It should be better or readable to move 'bool precise' into the above branch,
> given we only need to read the flag once in each tick.
>
> Otherwise, this patch looks fine.

Thanks for the advice; I will change that in the next version.

Kuai

>
> Thanks,
> Ming