From: John Garry <john.garry@huawei.com>
Subject: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags
Date: Tue, 2 Nov 2021 19:27:32 +0800
Message-ID: <1635852455-39935-1-git-send-email-john.garry@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org
In [0], Kashyap reports high CPU usage in blk_mq_queue_tag_busy_iter()
and its callees for shared tags. Indeed, blk_mq_queue_tag_busy_iter()
became less efficient with the move to shared tags, but it was not
optimal before that either. So I would like this series tested, and I
would also like to know what is triggering blk_mq_queue_tag_busy_iter()
from userspace to cause such high CPU loading.

As suggested by Ming, reading /proc/diskstats in a "while true" loop
can trigger blk_mq_queue_tag_busy_iter(). I do so in a test with 2x
separate consoles (a sketch of the reproducer is at the end of this
mail), and here are the results:

v5.15
blk_mq_queue_tag_busy_iter() 6.2%
part_stat_read_all() 6.7%

pre-v5.16 (Linus' master branch @ commit bfc484fe6abb)
blk_mq_queue_tag_busy_iter() 4.5%
part_stat_read_all() 6.2%

pre-v5.16 + this series
blk_mq_queue_tag_busy_iter() not shown in top users
part_stat_read_all() 7.5%

These results are from perf top, on a system with 7x disks attached
via hisi_sas, which exposes 16x HW queues.

[0] https://lore.kernel.org/linux-block/e4e92abbe9d52bcba6b8cc6c91c442cc@mail.gmail.com/

John Garry (3):
  blk-mq: Drop busy_iter_fn blk_mq_hw_ctx argument
  blk-mq: Delete busy_iter_fn
  blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

 block/blk-mq-tag.c     | 58 +++++++++++++++++++++++++++---------------
 block/blk-mq-tag.h     |  2 +-
 block/blk-mq.c         | 17 ++++++-------
 include/linux/blk-mq.h |  2 --
 4 files changed, 47 insertions(+), 32 deletions(-)

-- 
2.17.1
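
P.S. For anyone reproducing the load, here is a minimal sketch of the
/proc/diskstats reader used in the test above. It is only an
illustrative stand-in for the "while true" shell loop (the file name
and the loop are from this mail; the rest is plain libc), run on one
console while watching perf top on another:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];

	for (;;) {
		int fd = open("/proc/diskstats", O_RDONLY);

		if (fd < 0) {
			perror("open /proc/diskstats");
			return 1;
		}
		/*
		 * Drain the file; on the kernel side each pass is what
		 * drives part_stat_read_all() and, for blk-mq devices,
		 * blk_mq_queue_tag_busy_iter() in the profiles above.
		 */
		while (read(fd, buf, sizeof(buf)) > 0)
			;
		close(fd);
	}
}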