2024-02-08 09:32:51

by zhaoyang.huang

Subject: [PATCH 3/3] block: introducing a bias over deadline's fifo_time

From: Zhaoyang Huang <[email protected]>

According to the current policy, RT tasks are privileged by both the CPU
scheduler and the IO scheduler, which can make preempted CFS tasks
suffer unfairly large IO latency. This commit introduces an approximate
method to compensate for the effect of that preemption.

TaskA
sched in
|
|
|
submit_bio
|
|
|
fifo_time = jiffies + expire
(insert_request)

TaskB
sched in
|
|
preempted by RT task
|\
| This period is unfair to TaskB's IO request and should be adjusted for
|/
submit_bio
|
|
|
fifo_time = jiffies + expire * CFS_PROPORTION(rq)
(insert_request)

Signed-off-by: Zhaoyang Huang <[email protected]>
---
block/mq-deadline.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..43c08c3d6f18 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -15,6 +15,7 @@
#include <linux/compiler.h>
#include <linux/rbtree.h>
#include <linux/sbitmap.h>
+#include "../kernel/sched/sched.h"

#include <trace/events/block.h>

@@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
struct dd_per_prio *per_prio;
enum dd_prio prio;
+ int fifo_expire;

lockdep_assert_held(&dd->lock);

@@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
/*
* set expire time and add to fifo list
*/
- rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
+ fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
+ CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
+ rq->fifo_time = jiffies + fifo_expire;
insert_before = &per_prio->fifo_list[data_dir];
#ifdef CONFIG_BLK_DEV_ZONED
/*
--
2.25.1
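The fifo_time change described in the commit message can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not the patch's code: `CFS_PROPORTION()` is not defined in this patch, so `sketch_cfs_proportion()` is a hypothetical stand-in that scales the expiry by the share of time the submitter actually ran, ran / (ran + preempted). `SKETCH_HZ` is an assumed tick rate, and the read/write defaults mirror mq-deadline's `read_expire` (HZ / 2) and `write_expire` (5 * HZ).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tick rate for this sketch; the kernel uses CONFIG_HZ. */
#define SKETCH_HZ 250

enum dd_data_dir { DD_READ = 0, DD_WRITE = 1 };

/* Mirrors mq-deadline's defaults: read_expire = HZ / 2, write_expire = 5 * HZ. */
static const int fifo_expire[2] = { SKETCH_HZ / 2, 5 * SKETCH_HZ };

/*
 * Hypothetical stand-in for CFS_PROPORTION(), which the patch uses but does
 * not define. Assumption: scale the expiry by ran / (ran + preempted).
 */
static int sketch_cfs_proportion(unsigned long long ran_ns,
				 unsigned long long preempted_ns, int expire)
{
	unsigned long long total = ran_ns + preempted_ns;

	if (total == 0)
		return expire; /* no history: keep the unscaled expiry */
	return (int)((unsigned long long)expire * ran_ns / total);
}

/* Baseline mq-deadline: the deadline depends only on the data direction. */
static unsigned long baseline_fifo_time(unsigned long now, enum dd_data_dir dir)
{
	assert(dir == DD_READ || dir == DD_WRITE);
	return now + fifo_expire[dir];
}

/* Proposed: RT submitters keep the full expiry, CFS submitters get a scaled one. */
static unsigned long proposed_fifo_time(unsigned long now, enum dd_data_dir dir,
					bool submitter_is_rt,
					unsigned long long ran_ns,
					unsigned long long preempted_ns)
{
	int expire;

	assert(dir == DD_READ || dir == DD_WRITE);
	expire = submitter_is_rt ? fifo_expire[dir] :
		 sketch_cfs_proportion(ran_ns, preempted_ns, fifo_expire[dir]);
	return now + expire;
}
```

With `SKETCH_HZ` at 250, a read inserted at jiffy 1000 gets a deadline of 1125 in the baseline and for an RT submitter, while a CFS submitter that spent half of its recent time preempted gets 1062. Note that the deadline now depends on the state of the submitting task, which is the coupling between the CPU scheduler and the IO scheduler that the replies object to.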



2024-02-08 17:47:05

by Bart Van Assche

Subject: Re: [PATCH 3/3] block: introducing a bias over deadline's fifo_time

On 2/8/24 01:31, zhaoyang.huang wrote:
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f958e79277b8..43c08c3d6f18 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -15,6 +15,7 @@
> #include <linux/compiler.h>
> #include <linux/rbtree.h>
> #include <linux/sbitmap.h>
> +#include "../kernel/sched/sched.h"

Is kernel/sched/sched.h perhaps a private scheduler kernel header file? Shouldn't
block layer code only include public scheduler header files?

> @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> /*
> * set expire time and add to fifo list
> */
> - rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> + fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
> + CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
> + rq->fifo_time = jiffies + fifo_expire;
> insert_before = &per_prio->fifo_list[data_dir];
> #ifdef CONFIG_BLK_DEV_ZONED
> /*

Making the mq-deadline request expiry time dependent on the task priority seems wrong
to me.

Thanks,

Bart.

2024-02-08 17:51:13

by Jens Axboe

Subject: Re: [PATCH 3/3] block: introducing a bias over deadline's fifo_time

On 2/8/24 2:31 AM, zhaoyang.huang wrote:
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f958e79277b8..43c08c3d6f18 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -15,6 +15,7 @@
> #include <linux/compiler.h>
> #include <linux/rbtree.h>
> #include <linux/sbitmap.h>
> +#include "../kernel/sched/sched.h"
>
> #include <trace/events/block.h>
>
> @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
> struct dd_per_prio *per_prio;
> enum dd_prio prio;
> + int fifo_expire;
>
> lockdep_assert_held(&dd->lock);
>
> @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> /*
> * set expire time and add to fifo list
> */
> - rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> + fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
> + CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
> + rq->fifo_time = jiffies + fifo_expire;
> insert_before = &per_prio->fifo_list[data_dir];
> #ifdef CONFIG_BLK_DEV_ZONED
> /*

Hard pass on this blatant layering violation. Just like the priority
changes, this utterly fails to understand how things are properly
designed.

--
Jens Axboe


2024-02-08 23:52:45

by Zhaoyang Huang

Subject: Re: [PATCH 3/3] block: introducing a bias over deadline's fifo_time

On Fri, Feb 9, 2024 at 1:46 AM Bart Van Assche <[email protected]> wrote:
>
> On 2/8/24 01:31, zhaoyang.huang wrote:
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index f958e79277b8..43c08c3d6f18 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -15,6 +15,7 @@
> > #include <linux/compiler.h>
> > #include <linux/rbtree.h>
> > #include <linux/sbitmap.h>
> > +#include "../kernel/sched/sched.h"
>
> Is kernel/sched/sched.h perhaps a private scheduler kernel header file? Shouldn't
> block layer code only include public scheduler header files?
>
> > @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> > /*
> > * set expire time and add to fifo list
> > */
> > - rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> > + fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
> > + CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
> > + rq->fifo_time = jiffies + fifo_expire;
> > insert_before = &per_prio->fifo_list[data_dir];
> > #ifdef CONFIG_BLK_DEV_ZONED
> > /*
>
> Making the mq-deadline request expiry time dependent on the task priority seems wrong
> to me.
But bio_set_ioprio already does this.
>
> Thanks,
>
> Bart.

2024-02-09 00:11:11

by Jens Axboe

Subject: Re: [PATCH 3/3] block: introducing a bias over deadline's fifo_time

On 2/8/24 5:02 PM, Zhaoyang Huang wrote:
> On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <[email protected]> wrote:
>>
>> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
>>
>> Hard pass on this blatant layering violation. Just like the priority
>> changes, this utterly fails to understand how things are properly
>> designed.
> IMHO, I don't think this is a layering violation. bio_set_ioprio is
> the one that introduces scheduler concepts into the block layer;
> this commit just makes a small improvement on top of that. It helps
> CFS tasks save some IO time when they are heavily preempted by RT tasks.

Listen, both this and the previous content ioprio thing show a glaring
misunderstanding of how to design these kinds of things. You have no
grasp of what the different layers do, or how they interact. I'm not
sure how to put this kindly, but it's really an awful idea to hardcode
some CFS helper into the IO scheduler. The fact that you had to fiddle
around with headers to make it work was the first warning sign, and the
fact that you didn't stop at that point to consider how it could be
properly done makes it even worse.

You need to stop sending kernel patches until you understand basic
software design. Neither of these patches are going anywhere until this
happens. There's been plenty of feedback telling you that, but you
seem to just ignore it and plow on ahead. Stop.

--
Jens Axboe


2024-02-09 00:38:17

by Zhaoyang Huang

Subject: Re: [PATCH 3/3] block: introducing a bias over deadline's fifo_time

On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <[email protected]> wrote:
>
> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
>
> Hard pass on this blatant layering violation. Just like the priority
> changes, this utterly fails to understand how things are properly
> designed.
IMHO, I don't think this is a layering violation. bio_set_ioprio is
the one that introduces scheduler concepts into the block layer;
this commit just makes a small improvement on top of that. It helps
CFS tasks save some IO time when they are heavily preempted by RT tasks.

PS: '[PATCHv9 1/1] block: introduce content activity based ioprio' has
solved the layering violation issue. Could you please have a look?
>
> --
> Jens Axboe
>
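For context on the bio_set_ioprio argument: when a bio reaches the block layer with no ioprio class set, the kernel fills one in from the submitting task, and the task's scheduling policy and nice value feed into that default. The userspace C sketch below mirrors that mapping as found in kernels of this period (`task_nice_ioclass()` and `task_nice_ioprio()` in include/linux/ioprio.h). It is a simplification: it omits the io_context lookup the real code performs first (for example, a priority set with ioprio_set()) and the blkcg step, and the `sketch_*` names and the policy enum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the ioprio encoding: class in bits 13..15, data below. */
#define SKETCH_IOPRIO_CLASS_SHIFT 13
#define SKETCH_IOPRIO_PRIO_VALUE(cls, data) (((cls) << SKETCH_IOPRIO_CLASS_SHIFT) | (data))
#define SKETCH_IOPRIO_PRIO_CLASS(prio) (((prio) >> SKETCH_IOPRIO_CLASS_SHIFT) & 0x7)
#define SKETCH_IOPRIO_PRIO_DATA(prio) ((prio) & ((1 << SKETCH_IOPRIO_CLASS_SHIFT) - 1))

/* Same numeric values as the kernel's IOPRIO_CLASS_* constants. */
enum { IOCLASS_NONE = 0, IOCLASS_RT = 1, IOCLASS_BE = 2, IOCLASS_IDLE = 3 };

/* Hypothetical stand-in for task->policy. */
enum sketch_policy { POL_NORMAL, POL_BATCH, POL_IDLE, POL_FIFO, POL_RR, POL_DEADLINE };

/* Mirrors task_is_realtime(): FIFO, RR and DEADLINE count as realtime. */
static bool sketch_is_realtime(enum sketch_policy p)
{
	return p == POL_FIFO || p == POL_RR || p == POL_DEADLINE;
}

/* Mirrors task_nice_ioclass(). */
static int sketch_nice_ioclass(enum sketch_policy p)
{
	if (p == POL_IDLE)
		return IOCLASS_IDLE;
	if (sketch_is_realtime(p))
		return IOCLASS_RT;
	return IOCLASS_BE;
}

/* Mirrors task_nice_ioprio(): nice -20..19 maps to levels 0..7. */
static int sketch_nice_ioprio(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return (nice + 20) / 5;
}

/*
 * Simplified bio_set_ioprio() idea: keep an ioprio that was already set
 * (for example by a filesystem); otherwise derive one from the submitter.
 */
static int sketch_bio_ioprio(int bi_ioprio, enum sketch_policy p, int nice)
{
	if (SKETCH_IOPRIO_PRIO_CLASS(bi_ioprio) != IOCLASS_NONE)
		return bi_ioprio;
	return SKETCH_IOPRIO_PRIO_VALUE(sketch_nice_ioclass(p), sketch_nice_ioprio(nice));
}
```

In this model a normal task at nice 0 yields best-effort level 4 and a SCHED_FIFO task yields class RT. The scheduling policy is consulted once, at submission, and reduced to an ioprio value; nothing downstream reads scheduler internals such as kernel/sched/sched.h.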

2024-02-09 00:56:05

by Zhaoyang Huang

Subject: Re: [PATCH 3/3] block: introducing a bias over deadline's fifo_time

On Fri, Feb 9, 2024 at 8:11 AM Jens Axboe <[email protected]> wrote:
>
> On 2/8/24 5:02 PM, Zhaoyang Huang wrote:
> > On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <[email protected]> wrote:
> >>
> >> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
> >>
> >> Hard pass on this blatant layering violation. Just like the priority
> >> changes, this utterly fails to understand how things are properly
> >> designed.
> > IMHO, I don't think this is a layering violation. bio_set_ioprio is
> > the one that introduces scheduler concepts into the block layer;
> > this commit just makes a small improvement on top of that. It helps
> > CFS tasks save some IO time when they are heavily preempted by RT tasks.
>
> Listen, both this and the previous content ioprio thing show a glaring
> misunderstanding of how to design these kinds of things. You have no
> grasp of what the different layers do, or how they interact. I'm not
> sure how to put this kindly, but it's really an awful idea to hardcode
> some CFS helper into the IO scheduler. The fact that you had to fiddle
> around with headers to make it work was the first warning sign, and the
> fact that you didn't stop at that point to consider how it could be
> properly done makes it even worse.
>
> You need to stop sending kernel patches until you understand basic
> software design. Neither of these patches are going anywhere until this
> happens. There's been plenty of feedback telling you that, but you
> seem to just ignore it and plow on ahead. Stop.
OK, thanks for pointing this out; I will follow your advice. But I
have to say that '[PATCHv9 1/1] block: introduce content activity
based ioprio' really does solve the layering violation issues. I would
humbly ask for your patience in taking a look, as it is really
helpful.
>
> --
> Jens Axboe
>

2024-02-09 01:59:44

by Damien Le Moal

Subject: Re: [PATCH 3/3] block: introducing a bias over deadline's fifo_time

On 2/9/24 09:28, Zhaoyang Huang wrote:
> On Fri, Feb 9, 2024 at 8:11 AM Jens Axboe <[email protected]> wrote:
>>
>> On 2/8/24 5:02 PM, Zhaoyang Huang wrote:
>>> On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <[email protected]> wrote:
>>>>
>>>> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
>>>>
>>>> Hard pass on this blatant layering violation. Just like the priority
>>>> changes, this utterly fails to understand how things are properly
>>>> designed.
>>> IMHO, I don't think this is a layering violation. bio_set_ioprio is
>>> the one that introduces scheduler concepts into the block layer;
>>> this commit just makes a small improvement on top of that. It helps
>>> CFS tasks save some IO time when they are heavily preempted by RT tasks.
>>
>> Listen, both this and the previous content ioprio thing show a glaring
>> misunderstanding of how to design these kinds of things. You have no
>> grasp of what the different layers do, or how they interact. I'm not
>> sure how to put this kindly, but it's really an awful idea to hardcode
>> some CFS helper into the IO scheduler. The fact that you had to fiddle
>> around with headers to make it work was the first warning sign, and the
>> fact that you didn't stop at that point to consider how it could be
>> properly done makes it even worse.
>>
>> You need to stop sending kernel patches until you understand basic
>> software design. Neither of these patches are going anywhere until this
>> happens. There's been plenty of feedback telling you that, but you
>> seem to just ignore it and plow on ahead. Stop.
> OK, thanks for pointing this out; I will follow your advice. But I
> have to say that '[PATCHv9 1/1] block: introduce content activity
> based ioprio' really does solve the layering violation issues. I would
> humbly ask for your patience in taking a look, as it is really
> helpful.

If properly designed, that patch would *not* be a block layer API/function and
so does not need review by block layer folks/Jens as it would simply set an IO
prio for a BIO issued by an FS. So that patch needs to be accepted by FS
people, for the FS you are interested in.


--
Damien Le Moal
Western Digital Research
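Damien's suggestion splits the work in two: the filesystem decides the priority of the I/O it issues and records it in the bio, and the IO scheduler only consumes that value. A minimal userspace C sketch of that split follows. The ioprio encoding (class stored from bit 13 up) and the class-to-queue table are assumed to mirror mq-deadline's `enum dd_prio` and `ioprio_class_to_prio[]` of this period; `struct sketch_bio`, `sketch_fs_tag_bio()` and `sketch_dd_queue_for()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>

/* Local copies of the ioprio encoding: class in bits 13..15. */
#define SKETCH_IOPRIO_CLASS_SHIFT 13
#define SKETCH_IOPRIO_PRIO_VALUE(cls, lvl) (((cls) << SKETCH_IOPRIO_CLASS_SHIFT) | (lvl))
#define SKETCH_IOPRIO_PRIO_CLASS(prio) (((prio) >> SKETCH_IOPRIO_CLASS_SHIFT) & 0x7)

/* Same numeric values as the kernel's IOPRIO_CLASS_* constants. */
enum { IOCLASS_NONE = 0, IOCLASS_RT = 1, IOCLASS_BE = 2, IOCLASS_IDLE = 3 };

/* Mirrors mq-deadline's enum dd_prio. */
enum dd_prio { DD_RT_PRIO = 0, DD_BE_PRIO = 1, DD_IDLE_PRIO = 2 };

/* Mirrors mq-deadline's ioprio_class_to_prio[] table. */
static const enum dd_prio ioprio_class_to_prio[] = {
	[IOCLASS_NONE] = DD_BE_PRIO,
	[IOCLASS_RT]   = DD_RT_PRIO,
	[IOCLASS_BE]   = DD_BE_PRIO,
	[IOCLASS_IDLE] = DD_IDLE_PRIO,
};

/* Minimal stand-in for struct bio: only the field this sketch needs. */
struct sketch_bio {
	unsigned short bi_ioprio;
};

/*
 * Hypothetical filesystem-side helper: the FS decides the priority of the
 * I/O it issues and records it in the bio. No block layer API is involved.
 */
static void sketch_fs_tag_bio(struct sketch_bio *bio, int ioclass, int level)
{
	bio->bi_ioprio = (unsigned short)SKETCH_IOPRIO_PRIO_VALUE(ioclass, level);
}

/* Scheduler side: only the ioprio class carried by the bio is consulted. */
static enum dd_prio sketch_dd_queue_for(const struct sketch_bio *bio)
{
	unsigned int cls = SKETCH_IOPRIO_PRIO_CLASS(bio->bi_ioprio);

	assert(cls <= IOCLASS_IDLE);
	return ioprio_class_to_prio[cls];
}
```

With this split, the IO scheduler never needs to know whether the submitter was an RT or a CFS task; it sees only `bi_ioprio`, which is what `dd_insert_request()` reads through `IOPRIO_PRIO_CLASS(ioprio)` in the patch context. An untagged bio (class NONE) lands in the best-effort queue.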


2024-02-09 03:10:24

by Zhaoyang Huang

Subject: Re: [PATCH 3/3] block: introducing a bias over deadline's fifo_time

On Fri, Feb 9, 2024 at 9:58 AM Damien Le Moal <[email protected]> wrote:
>
> On 2/9/24 09:28, Zhaoyang Huang wrote:
> > On Fri, Feb 9, 2024 at 8:11 AM Jens Axboe <[email protected]> wrote:
> >>
> >> On 2/8/24 5:02 PM, Zhaoyang Huang wrote:
> >>> On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <[email protected]> wrote:
> >>>>
> >>>> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
> >>>>
> >>>> Hard pass on this blatant layering violation. Just like the priority
> >>>> changes, this utterly fails to understand how things are properly
> >>>> designed.
> >>> IMHO, I don't think this is a layering violation. bio_set_ioprio is
> >>> the one that introduces scheduler concepts into the block layer;
> >>> this commit just makes a small improvement on top of that. It helps
> >>> CFS tasks save some IO time when they are heavily preempted by RT tasks.
> >>
> >> Listen, both this and the previous content ioprio thing show a glaring
> >> misunderstanding of how to design these kinds of things. You have no
> >> grasp of what the different layers do, or how they interact. I'm not
> >> sure how to put this kindly, but it's really an awful idea to hardcode
> >> some CFS helper into the IO scheduler. The fact that you had to fiddle
> >> around with headers to make it work was the first warning sign, and the
> >> fact that you didn't stop at that point to consider how it could be
> >> properly done makes it even worse.
> >>
> >> You need to stop sending kernel patches until you understand basic
> >> software design. Neither of these patches are going anywhere until this
> >> happens. There's been plenty of feedback telling you that, but you
> >> seem to just ignore it and plow on ahead. Stop.
> > OK, thanks for pointing this out; I will follow your advice. But I
> > have to say that '[PATCHv9 1/1] block: introduce content activity
> > based ioprio' really does solve the layering violation issues. I would
> > humbly ask for your patience in taking a look, as it is really
> > helpful.
>
> If properly designed, that patch would *not* be a block layer API/function and
> so does not need review by block layer folks/Jens as it would simply set an IO
> prio for a BIO issued by an FS. So that patch needs to be accepted by FS
> people, for the FS you are interested in.
Thanks for the heads-up, and sorry for my lack of awareness of what
maintaining the whole framework requires. IMHO, the newly introduced
API is a little bit like bio_set_pages_dirty, which mainly deals with a
bio and the pages inside it. Patch v9 has changed a lot to follow your
kind advice. I would be grateful if you could review it.
>
>
> --
> Damien Le Moal
> Western Digital Research
>