2023-06-26 05:12:50

by Chengming Zhou

Subject: [PATCH v2 0/3] blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq

From: Chengming Zhou <[email protected]>

Hello,

This patchset is the updated version of [1]; it fixes start_time_ns
and alloc_time_ns for pre-allocated requests.

Patches 1 and 2 are preparation, so that we call ktime_get_ns() only
once when setting start_time_ns for batched requests.

Patch 3 is the fix: it sets alloc_time_ns and start_time_ns to the
current time when the pre-allocated rq is actually used.

[1] https://lore.kernel.org/all/[email protected]/

Chengming Zhou (3):
blk-mq: always use __blk_mq_alloc_requests() to alloc and init rq
blk-mq: ktime_get_ns() only once for batched requests init
blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq

block/blk-mq.c | 89 ++++++++++++++++++++++++------------------
include/linux/blk-mq.h | 6 +--
2 files changed, 55 insertions(+), 40 deletions(-)

--
2.39.2



2023-06-26 05:28:23

by Chengming Zhou

Subject: [PATCH v2 3/3] blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq

From: Chengming Zhou <[email protected]>

iocost relies on rq start_time_ns and alloc_time_ns to tell the
saturation state of the block device. Most of the time a request is
allocated after rq_qos_throttle(), so its alloc_time_ns and
start_time_ns won't be affected.

But with the plug batched allocation introduced by commit 47c122e35d7e
("block: pre-allocate requests if plug is started and is a batch"),
rq_qos_throttle() can run after the allocation of the request. This is
what blk_mq_get_cached_request() does.

In this case, the cached request's alloc_time_ns and start_time_ns lie
far in the past if we block in any qos ->throttle().

Fix it by setting alloc_time_ns and start_time_ns to the current time
when the pre-allocated rq is actually used.

Note that we don't skip setting alloc_time_ns and start_time_ns for all
pre-allocated requests, since the first returned rq still needs them set.
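
To make the ordering concrete, here is a minimal sketch of the
cached-request path; the function names match blk-mq, but the body is
condensed for illustration, not the verbatim kernel code:

```
/* Condensed sketch of blk_mq_get_cached_request(); illustrative only. */
static struct request *cached_rq_sketch(struct request_queue *q,
					struct blk_plug *plug,
					struct bio **bio)
{
	struct request *rq = rq_list_peek(&plug->cached_rq);

	/*
	 * rq->alloc_time_ns and rq->start_time_ns were stamped when the
	 * whole batch was pre-allocated, possibly a long time ago.
	 */
	plug->cached_rq = rq_list_next(rq);

	rq_qos_throttle(q, *bio);	/* may block in ->throttle() */

	/*
	 * Without a re-stamp here, iocost would account the time spent
	 * blocked above against the device and see it as saturated.
	 * The fix stamps both fields now:
	 */
	blk_mq_rq_time_init(q, rq);
	return rq;
}
```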

Signed-off-by: Chengming Zhou <[email protected]>
---
block/blk-mq.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8b981d0a868e..6a3f1b8aaad8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -337,6 +337,24 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
}
EXPORT_SYMBOL(blk_rq_init);

+/* Set rq alloc and start time when pre-allocated rq is actually used */
+static inline void blk_mq_rq_time_init(struct request_queue *q, struct request *rq)
+{
+ if (blk_mq_need_time_stamp(rq->rq_flags)) {
+ u64 now = ktime_get_ns();
+
+#ifdef CONFIG_BLK_RQ_ALLOC_TIME
+ /*
+ * alloc time is only used by iocost for now,
+ * only possible when blk_mq_need_time_stamp().
+ */
+ if (blk_queue_rq_alloc_time(q))
+ rq->alloc_time_ns = now;
+#endif
+ rq->start_time_ns = now;
+ }
+}
+
static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
struct blk_mq_tags *tags, unsigned int tag,
u64 alloc_time_ns, u64 start_time_ns)
@@ -575,6 +593,7 @@ static struct request *blk_mq_alloc_cached_request(struct request_queue *q,
return NULL;

plug->cached_rq = rq_list_next(rq);
+ blk_mq_rq_time_init(q, rq);
}

rq->cmd_flags = opf;
@@ -2896,6 +2915,7 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
plug->cached_rq = rq_list_next(rq);
rq_qos_throttle(q, *bio);

+ blk_mq_rq_time_init(q, rq);
rq->cmd_flags = (*bio)->bi_opf;
INIT_LIST_HEAD(&rq->queuelist);
return rq;
--
2.39.2


2023-06-26 05:32:56

by Chengming Zhou

Subject: [PATCH v2 1/3] blk-mq: always use __blk_mq_alloc_requests() to alloc and init rq

From: Chengming Zhou <[email protected]>

This patch is preparation for the next one, which calls ktime_get_ns()
only once when setting start_time_ns for batched pre-allocated requests.

1. data->rq_flags is an input to blk_mq_rq_ctx_init() and shouldn't be
updated by every blk_mq_rq_ctx_init() call during batched request
allocation, so move the data->rq_flags initialization into the caller.

2. Make blk_mq_alloc_request_hctx() reuse __blk_mq_alloc_requests()
instead of calling blk_mq_rq_ctx_init() directly, so it avoids
duplicating the same data->rq_flags initialization.

After these cleanups, __blk_mq_alloc_requests() is the only entry point
that allocates and initializes a request.
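
The shape of the change, as a hedged before/after sketch (the loops are
illustrative stand-ins for the tag-mask iteration in
__blk_mq_alloc_requests_batch(); see the diff below for the real code):

```
/* Before: every per-request init re-derived the queue-wide flags. */
for (i = 0; i < data->nr_tags; i++)
	rq = blk_mq_rq_ctx_init(data, tags, tag, alloc_time_ns);
	/* ... which internally did:
	 *	if (data->flags & BLK_MQ_REQ_PM)
	 *		data->rq_flags |= RQF_PM;
	 *	if (blk_queue_io_stat(q))
	 *		data->rq_flags |= RQF_IO_STAT;
	 */

/* After: the caller derives data->rq_flags once per allocation ... */
if (data->flags & BLK_MQ_REQ_PM)
	data->rq_flags |= RQF_PM;
if (blk_queue_io_stat(q))
	data->rq_flags |= RQF_IO_STAT;

/* ... and each per-request init only copies it. */
for (i = 0; i < data->nr_tags; i++)
	rq = blk_mq_rq_ctx_init(data, tags, tag, alloc_time_ns);
	/* internally: rq->rq_flags = data->rq_flags; */
```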

Signed-off-by: Chengming Zhou <[email protected]>
---
block/blk-mq.c | 46 ++++++++++++++++++----------------------------
1 file changed, 18 insertions(+), 28 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index decb6ab2d508..c50ef953759f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -349,11 +349,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
rq->mq_ctx = ctx;
rq->mq_hctx = hctx;
rq->cmd_flags = data->cmd_flags;
-
- if (data->flags & BLK_MQ_REQ_PM)
- data->rq_flags |= RQF_PM;
- if (blk_queue_io_stat(q))
- data->rq_flags |= RQF_IO_STAT;
rq->rq_flags = data->rq_flags;

if (data->rq_flags & RQF_SCHED_TAGS) {
@@ -447,6 +442,15 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
if (data->cmd_flags & REQ_NOWAIT)
data->flags |= BLK_MQ_REQ_NOWAIT;

+ if (data->flags & BLK_MQ_REQ_RESERVED)
+ data->rq_flags |= RQF_RESV;
+
+ if (data->flags & BLK_MQ_REQ_PM)
+ data->rq_flags |= RQF_PM;
+
+ if (blk_queue_io_stat(q))
+ data->rq_flags |= RQF_IO_STAT;
+
if (q->elevator) {
/*
* All requests use scheduler tags when an I/O scheduler is
@@ -471,14 +475,15 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
}

retry:
- data->ctx = blk_mq_get_ctx(q);
- data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
+ /* See blk_mq_alloc_request_hctx() for details */
+ if (!data->ctx) {
+ data->ctx = blk_mq_get_ctx(q);
+ data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
+ }
+
if (!(data->rq_flags & RQF_SCHED_TAGS))
blk_mq_tag_busy(data->hctx);

- if (data->flags & BLK_MQ_REQ_RESERVED)
- data->rq_flags |= RQF_RESV;
-
/*
* Try batched alloc if we want more than 1 tag.
*/
@@ -505,6 +510,7 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
* is going away.
*/
msleep(3);
+ data->ctx = NULL;
goto retry;
}

@@ -613,16 +619,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
.cmd_flags = opf,
.nr_tags = 1,
};
- u64 alloc_time_ns = 0;
struct request *rq;
unsigned int cpu;
- unsigned int tag;
int ret;

- /* alloc_time includes depth and tag waits */
- if (blk_queue_rq_alloc_time(q))
- alloc_time_ns = ktime_get_ns();
-
/*
* If the tag allocator sleeps we could get an allocation for a
* different hardware context. No need to complicate the low level
@@ -653,20 +653,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
goto out_queue_exit;
data.ctx = __blk_mq_get_ctx(q, cpu);

- if (q->elevator)
- data.rq_flags |= RQF_SCHED_TAGS;
- else
- blk_mq_tag_busy(data.hctx);
-
- if (flags & BLK_MQ_REQ_RESERVED)
- data.rq_flags |= RQF_RESV;
-
ret = -EWOULDBLOCK;
- tag = blk_mq_get_tag(&data);
- if (tag == BLK_MQ_NO_TAG)
+ rq = __blk_mq_alloc_requests(&data);
+ if (!rq)
goto out_queue_exit;
- rq = blk_mq_rq_ctx_init(&data, blk_mq_tags_from_data(&data), tag,
- alloc_time_ns);
rq->__data_len = 0;
rq->__sector = (sector_t) -1;
rq->bio = rq->biotail = NULL;
--
2.39.2


2023-06-26 05:32:56

by Chengming Zhou

Subject: [PATCH v2 2/3] blk-mq: ktime_get_ns() only once for batched requests init

From: Chengming Zhou <[email protected]>

Extend blk_mq_rq_ctx_init() to receive start_time_ns, so we can call
ktime_get_ns() only once when setting start_time_ns for batched
requests.

Since the data->rq_flags initialization has been moved to the caller
__blk_mq_alloc_requests(), we can use it to check whether a time stamp
is needed.
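
Condensed from the diff below, the heart of the change is one clock
read shared by the whole batch (simplified sketch; error paths elided):

```
u64 start_time_ns = 0;

/* Read the clock once for the whole batch ... */
if (blk_mq_need_time_stamp(data->rq_flags))
	start_time_ns = ktime_get_ns();

/* ... and hand the same value to every request in the batch. */
for (i = 0; tag_mask; i++) {
	if (!(tag_mask & (1UL << i)))
		continue;
	tag = tag_offset + i;
	tag_mask &= ~(1UL << i);
	rq = blk_mq_rq_ctx_init(data, tags, tag,
				alloc_time_ns, start_time_ns);
	rq_list_add(data->cached_rq, rq);
}
```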

Signed-off-by: Chengming Zhou <[email protected]>
---
block/blk-mq.c | 23 ++++++++++++++---------
include/linux/blk-mq.h | 6 +++---
2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c50ef953759f..8b981d0a868e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -338,7 +338,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
EXPORT_SYMBOL(blk_rq_init);

static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
- struct blk_mq_tags *tags, unsigned int tag, u64 alloc_time_ns)
+ struct blk_mq_tags *tags, unsigned int tag,
+ u64 alloc_time_ns, u64 start_time_ns)
{
struct blk_mq_ctx *ctx = data->ctx;
struct blk_mq_hw_ctx *hctx = data->hctx;
@@ -360,14 +361,11 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
}
rq->timeout = 0;

- if (blk_mq_need_time_stamp(rq))
- rq->start_time_ns = ktime_get_ns();
- else
- rq->start_time_ns = 0;
rq->part = NULL;
#ifdef CONFIG_BLK_RQ_ALLOC_TIME
rq->alloc_time_ns = alloc_time_ns;
#endif
+ rq->start_time_ns = start_time_ns;
rq->io_start_time_ns = 0;
rq->stats_sectors = 0;
rq->nr_phys_segments = 0;
@@ -405,11 +403,15 @@ __blk_mq_alloc_requests_batch(struct blk_mq_alloc_data *data,
struct request *rq;
unsigned long tag_mask;
int i, nr = 0;
+ u64 start_time_ns = 0;

tag_mask = blk_mq_get_tags(data, data->nr_tags, &tag_offset);
if (unlikely(!tag_mask))
return NULL;

+ if (blk_mq_need_time_stamp(data->rq_flags))
+ start_time_ns = ktime_get_ns();
+
tags = blk_mq_tags_from_data(data);
for (i = 0; tag_mask; i++) {
if (!(tag_mask & (1UL << i)))
@@ -417,7 +419,7 @@ __blk_mq_alloc_requests_batch(struct blk_mq_alloc_data *data,
tag = tag_offset + i;
prefetch(tags->static_rqs[tag]);
tag_mask &= ~(1UL << i);
- rq = blk_mq_rq_ctx_init(data, tags, tag, alloc_time_ns);
+ rq = blk_mq_rq_ctx_init(data, tags, tag, alloc_time_ns, start_time_ns);
rq_list_add(data->cached_rq, rq);
nr++;
}
@@ -431,7 +433,7 @@ __blk_mq_alloc_requests_batch(struct blk_mq_alloc_data *data,
static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
{
struct request_queue *q = data->q;
- u64 alloc_time_ns = 0;
+ u64 alloc_time_ns = 0, start_time_ns = 0;
struct request *rq;
unsigned int tag;

@@ -514,8 +516,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
goto retry;
}

+ if (blk_mq_need_time_stamp(data->rq_flags))
+ start_time_ns = ktime_get_ns();
+
return blk_mq_rq_ctx_init(data, blk_mq_tags_from_data(data), tag,
- alloc_time_ns);
+ alloc_time_ns, start_time_ns);
}

static struct request *blk_mq_rq_cache_fill(struct request_queue *q,
@@ -1004,7 +1009,7 @@ static inline void __blk_mq_end_request_acct(struct request *rq, u64 now)

inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
{
- if (blk_mq_need_time_stamp(rq))
+ if (blk_mq_need_time_stamp(rq->rq_flags))
__blk_mq_end_request_acct(rq, ktime_get_ns());

if (rq->end_io) {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f401067ac03a..e8366e9c3388 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -834,9 +834,9 @@ void blk_mq_end_request_batch(struct io_comp_batch *ib);
* Only need start/end time stamping if we have iostat or
* blk stats enabled, or using an IO scheduler.
*/
-static inline bool blk_mq_need_time_stamp(struct request *rq)
+static inline bool blk_mq_need_time_stamp(req_flags_t rq_flags)
{
- return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS | RQF_USE_SCHED));
+ return (rq_flags & (RQF_IO_STAT | RQF_STATS | RQF_USE_SCHED));
}

static inline bool blk_mq_is_reserved_rq(struct request *rq)
@@ -860,7 +860,7 @@ static inline bool blk_mq_add_to_batch(struct request *req,
iob->complete = complete;
else if (iob->complete != complete)
return false;
- iob->need_ts |= blk_mq_need_time_stamp(req);
+ iob->need_ts |= blk_mq_need_time_stamp(req->rq_flags);
rq_list_add(&iob->req_list, req);
return true;
}
--
2.39.2


2023-06-26 21:02:08

by Tejun Heo

Subject: Re: [PATCH v2 3/3] blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq

Hello,

I only glanced at the blk-mq core part, but in general this looks a lot
better than the previous one.

On Mon, Jun 26, 2023 at 01:04:05PM +0800, [email protected] wrote:
> Note that we don't skip setting alloc_time_ns and start_time_ns for all
> pre-allocated requests, since the first returned rq still needs them set.

This part is a bit curious to me, though. Why do we need to set it at
batch allocation time and then again at actual dispensing from the batch
later? Who uses the alloc time stamp in between?

Thanks.

--
tejun

2023-06-27 12:19:56

by Chengming Zhou

Subject: Re: [PATCH v2 3/3] blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq

On 2023/6/27 04:46, Tejun Heo wrote:
> Hello,
>
> I only glanced at the blk-mq core part, but in general this looks a lot
> better than the previous one.

Thanks for your review!

>
> On Mon, Jun 26, 2023 at 01:04:05PM +0800, [email protected] wrote:
>> Note that we don't skip setting alloc_time_ns and start_time_ns for all
>> pre-allocated requests, since the first returned rq still needs them set.
>
> This part is a bit curious to me, though. Why do we need to set it at
> batch allocation time and then again at actual dispensing from the batch
> later? Who uses the alloc time stamp in between?
>

Yes, this part needs more explanation; my earlier explanation wasn't clear.

Now the batched pre-allocation code looks like this:

```
if (!rq_list_empty(plug->cached_rq)) {
	/* get a pre-allocated rq from the plug cache */

	/* we set alloc and start time here */
	return rq;
} else {
	/* do batched allocation (1) */
	rq = __blk_mq_alloc_requests_batch();

	/* (2) */
	return rq;
}
```

In (1) we allocate some requests, push them onto the plug list, and pop
one request to return to the caller. So this popped request needs its
time set at batch allocation time.

Yes, we could also set this popped request's time in (2), just before
returning it to the caller.

Since all requests in a batch allocation use the same alloc and start
time, this patch just leaves them as they are and resets them when the
request is actually used.

I think both ways are OK. Do you think it's better to set the time only
for the popped request and leave the other requests' times at 0? If so,
I can update the patchset to do that; a rough sketch of that alternative
follows.
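
For concreteness, that alternative would look roughly like this
(untested sketch: argument lists are condensed, error handling is
elided, and it reuses blk_mq_rq_time_init() from patch 3):

```
/*
 * Alternative: leave every cached request's time at 0 during batch
 * allocation (1), and stamp only the request being dispensed (2).
 */
rq = __blk_mq_alloc_requests_batch(data);	/* times stay 0 */
if (rq)
	blk_mq_rq_time_init(q, rq);		/* stamp the popped rq */
return rq;
```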

Thanks.


2023-06-27 19:14:00

by Tejun Heo

Subject: Re: [PATCH v2 3/3] blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq

Hello,

On Tue, Jun 27, 2023 at 07:32:42PM +0800, Chengming Zhou wrote:
> Since all requests in a batch allocation use the same alloc and start
> time, this patch just leaves them as they are and resets them when the
> request is actually used.
>
> I think both ways are OK. Do you think it's better to set the time only
> for the popped request and leave the other requests' times at 0? If so,
> I can update the patchset to do that.

I think it'd be clearer if the rule is that the alloc time is set once when
the request is actually dispensed for use in all cases, so yeah, let's just
set it once when it actually starts getting used.

Thanks.

--
tejun

2023-06-28 01:31:30

by Chengming Zhou

Subject: Re: [PATCH v2 3/3] blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq

On 2023/6/28 02:47, Tejun Heo wrote:
> Hello,
>
> On Tue, Jun 27, 2023 at 07:32:42PM +0800, Chengming Zhou wrote:
>> Since all requests in a batch allocation use the same alloc and start
>> time, this patch just leaves them as they are and resets them when the
>> request is actually used.
>>
>> I think both ways are OK. Do you think it's better to set the time only
>> for the popped request and leave the other requests' times at 0? If so,
>> I can update the patchset to do that.
>
> I think it'd be clearer if the rule is that the alloc time is set once when
> the request is actually dispensed for use in all cases, so yeah, let's just
> set it once when it actually starts getting used.
>

Good, I will update the patchset today.

Thanks.