2024-04-11 03:32:32

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 0/2] block: fix cached time in plug

From: Yu Kuai <[email protected]>

Yu Kuai (2):
block: fix that blk_time_get_ns() doesn't update time after schedule
block: add plug while submitting IO

block/blk-core.c | 7 +++++++
1 file changed, 7 insertions(+)

--
2.39.2



2024-04-11 03:32:37

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 2/2] block: add plug while submitting IO

From: Yu Kuai <[email protected]>

So that if caller didn't use plug, for example, __blkdev_direct_IO_simple()
and __blkdev_direct_IO_async(), block layer can still benefit from caching
nsec time in the plug. And if caller already use plug, then there should be
no affect.

Signed-off-by: Yu Kuai <[email protected]>
---
block/blk-core.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index e317d7bc0696..c833bc875a61 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -647,11 +647,13 @@ static void __submit_bio(struct bio *bio)
static void __submit_bio_noacct(struct bio *bio)
{
struct bio_list bio_list_on_stack[2];
+ struct blk_plug plug;

BUG_ON(bio->bi_next);

bio_list_init(&bio_list_on_stack[0]);
current->bio_list = bio_list_on_stack;
+ blk_start_plug(&plug);

do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
@@ -685,19 +687,23 @@ static void __submit_bio_noacct(struct bio *bio)
bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
} while ((bio = bio_list_pop(&bio_list_on_stack[0])));

+ blk_finish_plug(&plug);
current->bio_list = NULL;
}

static void __submit_bio_noacct_mq(struct bio *bio)
{
struct bio_list bio_list[2] = { };
+ struct blk_plug plug;

current->bio_list = bio_list;
+ blk_start_plug(&plug);

do {
__submit_bio(bio);
} while ((bio = bio_list_pop(&bio_list[0])));

+ blk_finish_plug(&plug);
current->bio_list = NULL;
}

--
2.39.2


2024-04-11 03:32:52

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 1/2] block: fix that blk_time_get_ns() doesn't update time after schedule

From: Yu Kuai <[email protected]>

While monitoring the throttle time of IO from iocost, it's found that
such time is always zero after the io_schedule() from ioc_rqos_throttle,
for example, with the following debug patch:

+ printk("%s-%d: %s enter %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
while (true) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (wait.committed)
break;
io_schedule();
}
+ printk("%s-%d: %s exit %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());

It can be observerd that blk_time_get_ns() always return the same time:

[ 1068.096579] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.272587] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.274389] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.472690] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.474485] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.672656] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.674451] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.872655] fio-1268: ioc_rqos_throttle exit 1067901962288

And I think the root cause is that 'PF_BLOCK_TS' is always cleared
by blk_flush_plug() before scheduel(), hence blk_plug_invalidate_ts()
will never be called:

blk_time_get_ns
plug->cur_ktime = ktime_get_ns();
current->flags |= PF_BLOCK_TS;

io_schedule:
io_schedule_prepare
blk_flush_plug
__blk_flush_plug
/* the flag is cleared, while time is not */
current->flags &= ~PF_BLOCK_TS;
schedule
sched_update_worker
/* the flag is not set, hence plug->cur_ktime is not cleared */
if (tsk->flags & PF_BLOCK_TS)
blk_plug_invalidate_ts()

blk_time_get_ns
/* got the time stashed before schedule */
return plug->cur_ktime;

Fix the problem by clearing cached time in __blk_flush_plug().

Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
Signed-off-by: Yu Kuai <[email protected]>
---
block/blk-core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index a16b5abdbbf5..e317d7bc0696 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1195,6 +1195,7 @@ void __blk_flush_plug(struct blk_plug *plug, bool from_schedule)
if (unlikely(!rq_list_empty(plug->cached_rq)))
blk_mq_free_plug_rqs(plug);

+ plug->cur_ktime = 0;
current->flags &= ~PF_BLOCK_TS;
}

--
2.39.2


2024-04-11 17:58:58

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next 1/2] block: fix that blk_time_get_ns() doesn't update time after schedule

On 4/10/24 9:23 PM, Yu Kuai wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a16b5abdbbf5..e317d7bc0696 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1195,6 +1195,7 @@ void __blk_flush_plug(struct blk_plug *plug, bool from_schedule)
> if (unlikely(!rq_list_empty(plug->cached_rq)))
> blk_mq_free_plug_rqs(plug);
>
> + plug->cur_ktime = 0;
> current->flags &= ~PF_BLOCK_TS;
> }

We can just use blk_plug_invalidate_ts() here, but not really important.
I think this one should go into 6.9, and patch 2 should go into 6.10,
however.

--
Jens Axboe


2024-04-12 01:25:22

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH -next 1/2] block: fix that blk_time_get_ns() doesn't update time after schedule

Hi,

在 2024/04/12 0:44, Jens Axboe 写道:
> On 4/10/24 9:23 PM, Yu Kuai wrote:
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index a16b5abdbbf5..e317d7bc0696 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1195,6 +1195,7 @@ void __blk_flush_plug(struct blk_plug *plug, bool from_schedule)
>> if (unlikely(!rq_list_empty(plug->cached_rq)))
>> blk_mq_free_plug_rqs(plug);
>>
>> + plug->cur_ktime = 0;
>> current->flags &= ~PF_BLOCK_TS;
>> }
>
> We can just use blk_plug_invalidate_ts() here, but not really important.
> I think this one should go into 6.9, and patch 2 should go into 6.10,
> however.

This sounds great! Do you want me to update and send them separately?

Thanks,
Kuai

>

2024-04-12 14:33:49

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next 1/2] block: fix that blk_time_get_ns() doesn't update time after schedule

On 4/11/24 7:24 PM, Yu Kuai wrote:
> Hi,
>
> ? 2024/04/12 0:44, Jens Axboe ??:
>> On 4/10/24 9:23 PM, Yu Kuai wrote:
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index a16b5abdbbf5..e317d7bc0696 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -1195,6 +1195,7 @@ void __blk_flush_plug(struct blk_plug *plug, bool from_schedule)
>>> if (unlikely(!rq_list_empty(plug->cached_rq)))
>>> blk_mq_free_plug_rqs(plug);
>>> + plug->cur_ktime = 0;
>>> current->flags &= ~PF_BLOCK_TS;
>>> }
>>
>> We can just use blk_plug_invalidate_ts() here, but not really important.
>> I think this one should go into 6.9, and patch 2 should go into 6.10,
>> however.
>
> This sounds great! Do you want me to update and send them separately?

I've applied 1/2 separately, so just resend 2/2 when -rc4 has been
tagged and I'll get that one queued for 6.10.

--
Jens Axboe


2024-04-12 14:45:45

by Jens Axboe

[permalink] [raw]
Subject: Re: (subset) [PATCH -next 0/2] block: fix cached time in plug


On Thu, 11 Apr 2024 11:23:47 +0800, Yu Kuai wrote:
> Yu Kuai (2):
> block: fix that blk_time_get_ns() doesn't update time after schedule
> block: add plug while submitting IO
>
> block/blk-core.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> [...]

Applied, thanks!

[1/2] block: fix that blk_time_get_ns() doesn't update time after schedule
commit: 3ec4848913d695245716ea45ca4872d9dff097a5

Best regards,
--
Jens Axboe