2018-10-03 06:29:46

by Paolo Valente

Subject: Re: [PATCH] block: BFQ default for single queue devices



> On 2 Oct 2018, at 16:31, Jens Axboe <[email protected]> wrote:
>
> On 10/2/18 6:43 AM, Linus Walleij wrote:
>> This sets BFQ as the default scheduler for single-queue
>> block devices (nr_hw_queues == 1) if it is available. This
>> notably affects MMC/SD cards, but also UBI and
>> the loopback device.
>>
>> I have been running it for a while without any negative
>> effects on my pet systems and I want some wider testing
>> so let's throw it out there and see what people say.
>> Admittedly my use cases are limited.
>>
>> I talked to Pavel a bit back and it turns out he has a
>> use case for BFQ as well, and I bet he would also like it
>> as default scheduler for that system (Pavel tell us more,
>> I don't remember what it was!)
>>
>> Intuitively I could understand that maybe we want to
>> leave the loop device (possibly others? nbd? rbd?) as
>> "none", as it is probably relying on a scheduler on the
>> device below it, so I'm open to passing in a scheduler hint
>> from the respective subsystem in, say, struct blk_mq_tag_set.
>> However, that makes for a bit of syntactic dissonance
>> with the struct member ".nr_hw_queues" (I wonder how
>> the loop device can have 1 "hardware queue"?) so
>> maybe we should in that case also rename that struct
>> member to ".nr_queues" fair and square before we start
>> making adjustments for treating queues differently whether
>> they are in hardware or actually not.
>
> I think this should just be done with udev rules, and I'd
> prefer if the distros would lead the way on this, as they
> are the ones that will most likely see the most bug reports
> on a change like this.
>

Hi Jens,
I see your point, but I doubt this is the way to go, because of the
following flaws.

As Linus Torvalds also complained [1], people feel lost among
I/O-scheduler options. The actual differences across I/O schedulers are
basically obscure to non-experts. In this respect, Linux-kernel
'users' are far more than the few top-level distros that can afford a
strong performance team and that, based on the input of such a team,
might light-heartedly venture to change a critical component like an
I/O scheduler. Plus, as Linus Walleij pointed out, some users simply
are not distros that use udev.

So, probably 99% of Linux-kernel users will just stick to the default
I/O scheduler, mq-deadline, assuming that the algorithm by which that
scheduler was chosen was not "pick the scheduler with the longest
name", but "pick the best scheduler for most cases". The problem is
that, for single-queue devices with a speed below 400/500 KIOPS, the
default scheduler is apparently incomparably worse than bfq in terms
of responsiveness and latency for time-sensitive applications [2], and
in terms of throughput reached while controlling I/O [3]. And, in all
other tests run so far, by any entity or group I'm aware of, bfq
is basically on par with or better than mq-deadline.

So, I do understand your need for conservativeness, but, after so much
evidence on single-queue devices, and so many years! :), what's the
point in keeping Linux worse for virtually everybody, by default?

Thanks,
Paolo

[1] https://lkml.org/lkml/2017/2/21/791
[2] http://algo.ing.unimo.it/people/paolo/disk_sched/results.php
[3] https://lwn.net/Articles/763603/



> --
> Jens Axboe



2018-10-03 06:55:43

by Linus Walleij

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed, Oct 3, 2018 at 8:29 AM Paolo Valente <[email protected]> wrote:

> So, I do understand your need for conservativeness, but, after so much
> evidence on single-queue devices, and so many years! :), what's the
> point in keeping Linux worse for virtually everybody, by default?

I understand if we need to ease things in as well; I don't intend this
change for the current merge window or anything, since v4.19
will notably have this patch:

commit d5038a13eca72fb216c07eb717169092e92284f1
Author: Johannes Thumshirn <[email protected]>
Date: Wed Jul 4 10:53:56 2018 +0200

scsi: core: switch to scsi-mq by default

It has been more than one year since we tried to change the default from
legacy to multi queue in SCSI with commit c279bd9e406 ("scsi: default to
scsi-mq"). But due to issues with suspend/resume and performance problems
it had been reverted again with commit cbe7dfa26eee ("Revert "scsi: default
to scsi-mq"").

In the meantime there have been a substantial amount of performance
improvements and suspend/resume got fixed as well, thus we can re-enable
scsi-mq without a significant performance penalty.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Acked-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

I guess that patch can be a bit scary by itself. But IIUC it all went
fine this time!

But hey, if that works, that means $SUBJECT patch will enable BFQ on all
libata devices and any single-queue SCSI device as well, not just
"obscure" stuff like MMC/SD and UBI, and that is
indeed a massive crowd of legacy devices. But we're talking
v4.21 here.

Johannes, you might be interested in $SUBJECT patch.
It'd be nice to hear what SUSE people have to add, since they
are pretty proactive in this area.

Yours,
Linus Walleij

2018-10-03 07:05:57

by Artem Bityutskiy

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed, 2018-10-03 at 08:29 +0200, Paolo Valente wrote:
> So, I do understand your need for conservativeness, but, after so much
> evidence on single-queue devices, and so many years! :), what's the
> point in keeping Linux worse for virtually everybody, by default?

Sounds like what we need is a mechanism for the device (ubi block in
this case) to select the I/O scheduler. I doubt enhancing the default
scheduler-selection logic in 'elevator.c' is the right answer. Just
give the driver the authority to override the defaults.
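For illustration, here is a rough sketch of that idea, tied to the
blk_mq_tag_set hint Linus floated in the original posting. The field
and the check below are hypothetical; they exist in no kernel tree and
only show the shape such an interface could take:

    struct blk_mq_tag_set {
            /* ... existing fields: ops, nr_hw_queues, queue_depth, ... */
            const char *default_elevator;   /* e.g. "none" for loop; NULL = no hint */
    };

    /* At elevator init time, a driver-supplied hint would take
     * precedence over any global default-selection policy: */
    if (q->tag_set && q->tag_set->default_elevator)
            e = elevator_get(q, q->tag_set->default_elevator, false);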

2018-10-03 07:20:46

by Linus Walleij

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed, Oct 3, 2018 at 9:05 AM Artem Bityutskiy <[email protected]> wrote:
> On Wed, 2018-10-03 at 08:29 +0200, Paolo Valente wrote:
> > So, I do understand your need for conservativeness, but, after so much
> > evidence on single-queue devices, and so many years! :), what's the
> > point in keeping Linux worse for virtually everybody, by default?
>
> Sounds like what we need is a mechanism for the device (ubi block in
> this case) to select the I/O scheduler. I doubt enhancing the default
> scheduler-selection logic in 'elevator.c' is the right answer. Just
> give the driver the authority to override the defaults.

This might be true in the wider sense (like which scheduler to
select for an NVMe device with N channels), but $SUBJECT is just
trying to select BFQ (if available) for devices with one and only one
hardware queue.

That is AFAICT the only reasonable choice for anything with just
one hardware queue as things stand right now.
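In code terms, what $SUBJECT proposes amounts to something like the
following in the elevator setup path. This is a sketch of the intent,
not the actual patch; elevator_get() is the internal lookup helper in
block/elevator.c:

    /* Sketch: default to BFQ for devices with a single hardware queue. */
    static struct elevator_type *elevator_get_default(struct request_queue *q)
    {
            struct elevator_type *e;

            if (q->nr_hw_queues != 1)
                    return NULL;    /* native multiqueue hardware: keep "none" */

            e = elevator_get(q, "bfq", false);
            if (!e)
                    e = elevator_get(q, "mq-deadline", false); /* BFQ not built in */
            return e;
    }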

I have a slight reservation about the weird outliers like loopdev, which
has "one hardware queue" (.nr_hw_queues == 1) even though this
makes no sense at all. So I would like to know what people think
about that. Maybe we should have .nr_queues and .nr_hw_queues,
where the former is the number of logical queues and the latter
the actual number of hardware queues.

Yours,
Linus Walleij

2018-10-03 11:58:26

by Oleksandr Natalenko

Subject: Re: [PATCH] block: BFQ default for single queue devices

Hi.

On 03.10.2018 08:29, Paolo Valente wrote:
> As Linus Torvalds also complained [1], people feel lost among
> I/O-scheduler options. The actual differences across I/O schedulers are
> basically obscure to non-experts. In this respect, Linux-kernel
> 'users' are far more than the few top-level distros that can afford a
> strong performance team and that, based on the input of such a team,
> might light-heartedly venture to change a critical component like an
> I/O scheduler. Plus, as Linus Walleij pointed out, some users simply
> are not distros that use udev.

I feel a contradiction in this counter-argument. On one hand, there are
lots of, let's call them, home users who use major distributions with
udev, so the distribution maintainers can reasonably decide which
scheduler to use for which type of device, based on udev rules and the
common sense provided via Documentation/ by linux-block devs. Moreover,
most likely, those rules should be similar or the same across all the
major distros and available via some (systemd?) upstream.

On the other hand, the users of embedded devices, mentioned by Linus,
should already know which scheduler to choose, because dealing with
the embedded world assumes the person can decide this on their own, or
with the help of the abovementioned udev scripts and/or Documentation/
as a reference point.

So I see no obstacles here, and the choice to rely on udev by default
sounds reasonable.

The question that remains is whether it is really important to mount
the root partition while already using some specific scheduler. Why can
it not be done with "none", for instance?
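For completeness: the scheduler a queue starts with is not locked in;
it can be changed per device at runtime through sysfs (the active
scheduler is shown in brackets), which is exactly the attribute the
udev rules discussed in this thread write to. The device name and the
list of built schedulers below are illustrative:

    # cat /sys/block/sda/queue/scheduler
    mq-deadline kyber [bfq] none
    # echo none > /sys/block/sda/queue/scheduler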

> So, probably 99% of Linux-kernel users will just stick to the default
> I/O scheduler, mq-deadline, assuming that the algorithm by which that
> scheduler was chosen was not "pick the scheduler with the longest
> name", but "pick the best scheduler for most cases". The problem is
> that, for single-queue devices with a speed below 400/500 KIOPS, the
> default scheduler is apparently incomparably worse than bfq in terms
> of responsiveness and latency for time-sensitive applications [2], and
> in terms of throughput reached while controlling I/O [3]. And, in all
> other tests run so far, by any entity or group I'm aware of, bfq
> is basically on par with or better than mq-deadline.

And that's why major distributions are likely to default to BFQ via
udev. No one argues with BFQ superiority here ☺.

> So, I do understand your need for conservativeness, but, after so much
> evidence on single-queue devices, and so many years! :), what's the
> point in keeping Linux worse for virtually everybody, by default?

From my point of view this is not a conservative approach at all. On
the contrary, offloading decisions to userspace aligns pretty well with
recent trends like pressure metrics, the userspace OOM killer, eBPF,
etc. The less unnecessary logic the kernel handles, the more
flexibility it affords.

--
Oleksandr Natalenko (post-factum)

2018-10-03 13:26:20

by Jan Kara

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed 03-10-18 08:53:37, Linus Walleij wrote:
> On Wed, Oct 3, 2018 at 8:29 AM Paolo Valente <[email protected]> wrote:
>
> > So, I do understand your need for conservativeness, but, after so much
> > evidence on single-queue devices, and so many years! :), what's the
> > point in keeping Linux worse for virtually everybody, by default?
>
> I understand if we need to ease things in as well; I don't intend this
> change for the current merge window or anything, since v4.19
> will notably have this patch:
>
> commit d5038a13eca72fb216c07eb717169092e92284f1
> Author: Johannes Thumshirn <[email protected]>
> Date: Wed Jul 4 10:53:56 2018 +0200
>
> scsi: core: switch to scsi-mq by default
>
> It has been more than one year since we tried to change the default from
> legacy to multi queue in SCSI with commit c279bd9e406 ("scsi: default to
> scsi-mq"). But due to issues with suspend/resume and performance problems
> it had been reverted again with commit cbe7dfa26eee ("Revert "scsi: default
> to scsi-mq"").
>
> In the meantime there have been a substantial amount of performance
> improvements and suspend/resume got fixed as well, thus we can re-enable
> scsi-mq without a significant performance penalty.
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> Reviewed-by: Hannes Reinecke <[email protected]>
> Reviewed-by: Ming Lei <[email protected]>
> Acked-by: John Garry <[email protected]>
> Signed-off-by: Martin K. Petersen <[email protected]>
>
> I guess that patch can be a bit scary by itself. But IIUC it all went
> fine this time!
>
> But hey, if that works, that means $SUBJECT patch will enable BFQ on all
> libata devices and any single-queue SCSI device as well, not just
> "obscure" stuff like MMC/SD and UBI, and that is
> indeed a massive crowd of legacy devices. But we're talking
> v4.21 here.
>
> Johannes, you might be interested in $SUBJECT patch.
> It'd be nice to hear what SUSE people have to add, since they
> are pretty proactive in this area.

So we do have udev rules in our distro which set the IO scheduler based
on device parameters (rotational at least; with blk-mq we might start
considering the number of queues as well, plus we have some exceptions
like virtio, loop, etc.). So the kernel default doesn't concern us too
much as a distro.
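For reference, a rule of that kind typically looks something like the
following sketch; the match keys and device name patterns here are
illustrative and differ between distros:

    # 60-io-scheduler.rules (example): BFQ for rotational disks,
    # mq-deadline for other single-queue devices.
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"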

I personally would consider bfq a safer default for single-queue devices
(loop probably needs an exception) but I don't feel too strongly about it.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-10-03 14:54:14

by Mark Brown

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed, Oct 03, 2018 at 01:49:25PM +0200, Oleksandr Natalenko wrote:

> On the other hand, the users of embedded devices, mentioned by Linus, should
> already know which scheduler to choose, because dealing with the embedded
> world assumes the person can decide this on their own, or with the help of
> the abovementioned udev scripts and/or Documentation/ as a reference point.

That's not an entirely realistic assessment of a lot of practical
embedded development. While people *can* go in and tweak things to
their heart's content, and some will have the time to do that, there
are a lot of small teams pulling together entire systems who rely
fairly heavily on defaults, focusing most of their effort on the bits
of code they directly wrote. You get things like people taking a copy
of an embedded distro at some point and then only updating components
that they specifically want to update, like a new kernel with the
drivers for the SoC in the new product.

> So I see no obstacles here, and the choice to rely on udev by default sounds
> reasonable.

There's still a good number of users for whom there's a big
discoverability problem here, I fear.

We have this regularly with the arm64 fixups for emulating old locking
constructs that were removed from the architecture (useful for running
old arm binaries on arm64 systems): that's got a Kconfig option but
also requires enabling at runtime. I've had to help several users who
were completely frustrated trying to get their old binaries working:
having upgraded to a kernel with the option and turned it on in
Kconfig, they were unaware that there was also a hoop userspace had to
jump through. This is less severe, as it's only a performance thing,
but still potentially annoying.



2018-10-03 15:55:19

by Bart Van Assche

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed, 2018-10-03 at 08:29 +0200, Paolo Valente wrote:
> [1] https://lkml.org/lkml/2017/2/21/791
> [2] http://algo.ing.unimo.it/people/paolo/disk_sched/results.php
> [3] https://lwn.net/Articles/763603/

From [2]: "BFQ loses about 18% with only random readers, because the number
of IOPS becomes so high that the execution time and parallel efficiency of
the schedulers becomes relevant." Since the number of I/O patterns for which
results are available on [2] is limited and since the number of devices for
which test results are available on [2] is limited (e.g. RAID is missing),
there might be other cases in which configuring BFQ as the default would
introduce a regression.

I agree with Jens that it's best to leave it to the Linux distributors to
select a default I/O scheduler.

Bart.

2018-10-03 15:56:21

by Paolo Valente

Subject: Re: [PATCH] block: BFQ default for single queue devices



> On 3 Oct 2018, at 13:49, Oleksandr Natalenko <[email protected]> wrote:
>
> Hi.
>
> On 03.10.2018 08:29, Paolo Valente wrote:
>> As Linus Torvalds also complained [1], people feel lost among
>> I/O-scheduler options. The actual differences across I/O schedulers are
>> basically obscure to non-experts. In this respect, Linux-kernel
>> 'users' are far more than the few top-level distros that can afford a
>> strong performance team and that, based on the input of such a team,
>> might light-heartedly venture to change a critical component like an
>> I/O scheduler. Plus, as Linus Walleij pointed out, some users simply
>> are not distros that use udev.
>
> I feel a contradiction in this counter-argument. On one hand, there are lots of, let's call them, home users who use major distributions with udev, so the distribution maintainers can reasonably decide which scheduler to use for which type of device, based on udev rules and the common sense provided via Documentation/ by linux-block devs. Moreover, most likely, those rules should be similar or the same across all the major distros and available via some (systemd?) upstream.
>

Let me basically repeat Mark's answer here, in my own words.

Unfortunately, the facts do not match your optimistic view: after so many
years and concordant test results, only a very few distributions have
switched to bfq, and no major distribution has (AFAIK). As I already
wrote, the reason is the one pointed out by Torvalds [1]. Do you want
a simple example? Take the last sentence in Jan's email in this
thread: "I *personally would* consider bfq a safer default ... but *I
don't feel too strongly* about it." And he is definitely a storage
expert.

The problem, in particular, is that bfq is a complex beast, fighting
against a jungle of I/O issues. You have to be really into bfq, even
to just know all of its features!

> On the other hand, the users of embedded devices, mentioned by Linus, should already know which scheduler to choose, because dealing with the embedded world assumes the person can decide this on their own, or with the help of the abovementioned udev scripts and/or Documentation/ as a reference point.
>

The situation is the same for embedded devices, if not worse, again
for the same reasons. In the end, it is hard even for a kernel expert
to be an in-depth expert on every possible complex component.

> So I see no obstacles here, and the choice to rely on udev by default sounds reasonable.
>
> The question that remains is whether it is really important to mount the root partition while already using some specific scheduler. Why can it not be done with "none", for instance?
>
>> So, probably 99% of Linux-kernel users will just stick to the default
>> I/O scheduler, mq-deadline, assuming that the algorithm by which that
>> scheduler was chosen was not "pick the scheduler with the longest
>> name", but "pick the best scheduler for most cases". The problem is
>> that, for single-queue devices with a speed below 400/500 KIOPS, the
>> default scheduler is apparently incomparably worse than bfq in terms
>> of responsiveness and latency for time-sensitive applications [2], and
>> in terms of throughput reached while controlling I/O [3]. And, in all
>> other tests run so far, by any entity or group I'm aware of, bfq
>> is basically on par with or better than mq-deadline.
>
> And that's why major distributions are likely to default to BFQ via udev. No one argues with BFQ superiority here ☺.
>
>> So, I do understand your need for conservativeness, but, after so much
>> evidence on single-queue devices, and so many years! :), what's the
>> point in keeping Linux worse for virtually everybody, by default?
>
> From my point of view this is not a conservative approach at all. On the contrary, offloading decisions to userspace aligns pretty well with recent trends like pressure metrics, the userspace OOM killer, eBPF, etc. The less unnecessary logic the kernel handles, the more flexibility it affords.
>

Not to answer too seriously here, let me reply with a quote whose
attribution is still unclear: "Everything should be made as simple
as possible, but not simpler." :)

Thanks,
Paolo

> --
> Oleksandr Natalenko (post-factum)


2018-10-03 16:01:11

by Bart Van Assche

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed, 2018-10-03 at 17:55 +0200, Paolo Valente wrote:
> The problem, in particular, is that bfq is a complex beast, fighting
> against a jungle of I/O issues. You have to be really into bfq, even
> to just know all of its features!

This is a problem by itself. I don't know anyone who wants to have to deal
with I/O scheduler tunables.

Bart.


2018-10-03 16:03:36

by Paolo Valente

Subject: Re: [PATCH] block: BFQ default for single queue devices



> On 3 Oct 2018, at 17:54, Bart Van Assche <[email protected]> wrote:
>
> On Wed, 2018-10-03 at 08:29 +0200, Paolo Valente wrote:
>> [1] https://lkml.org/lkml/2017/2/21/791
>> [2] http://algo.ing.unimo.it/people/paolo/disk_sched/results.php
>> [3] https://lwn.net/Articles/763603/
>
> From [2]: "BFQ loses about 18% with only random readers, because the number
> of IOPS becomes so high that the execution time and parallel efficiency of
> the schedulers becomes relevant." Since the number of I/O patterns for which
> results are available on [2] is limited and since the number of devices for
> which test results are available on [2] is limited (e.g. RAID is missing),
> there might be other cases in which configuring BFQ as the default would
> introduce a regression.
>

From [3]: none with throttling loses 80% of the throughput when used
to control I/O. On any drive. And this is really only one example among a ton.

In addition, the test you mention, designed by me, was meant exactly
to find and show the worst breaking point of BFQ. If your main
workload of interest is really made only of tens of parallel threads
doing only sync random I/O, and you care only about throughput,
without any concern for your system becoming so unresponsive as to be
unusable during the test, then, yes, mq-deadline is a better option
for you.

So, are you really sure the balance is in favor of mq-deadline?

Thanks,
Paolo

> I agree with Jens that it's best to leave it to the Linux distributors to
> select a default I/O scheduler.
>
> Bart.


2018-10-03 16:06:14

by Paolo Valente

Subject: Re: [PATCH] block: BFQ default for single queue devices



> On 3 Oct 2018, at 18:00, Bart Van Assche <[email protected]> wrote:
>
> On Wed, 2018-10-03 at 17:55 +0200, Paolo Valente wrote:
>> The problem, in particular, is that bfq is a complex beast, fighting
>> against a jungle of I/O issues. You have to be really into bfq, even
>> to just know all of its features!
>
> This is a problem by itself. I don't know anyone who wants to have to deal
> with I/O scheduler tunables.
>

In fact, I designed and am constantly improving bfq, exactly so that
you don't have to touch any tunable.

Thanks,
Paolo

> Bart.
>


2018-10-03 17:22:48

by Paolo Valente

Subject: Re: [PATCH] block: BFQ default for single queue devices



> On 3 Oct 2018, at 18:02, Paolo Valente <[email protected]> wrote:
>
>
>
>> On 3 Oct 2018, at 17:54, Bart Van Assche <[email protected]> wrote:
>>
>> On Wed, 2018-10-03 at 08:29 +0200, Paolo Valente wrote:
>>> [1] https://lkml.org/lkml/2017/2/21/791
>>> [2] http://algo.ing.unimo.it/people/paolo/disk_sched/results.php
>>> [3] https://lwn.net/Articles/763603/
>>
>> From [2]: "BFQ loses about 18% with only random readers, because the number
>> of IOPS becomes so high that the execution time and parallel efficiency of
>> the schedulers becomes relevant." Since the number of I/O patterns for which
>> results are available on [2] is limited and since the number of devices for
>> which test results are available on [2] is limited (e.g. RAID is missing),
>> there might be other cases in which configuring BFQ as the default would
>> introduce a regression.
>>
>
> From [3]: none with throttling loses 80% of the throughput when used
> to control I/O. On any drive. And this is really only one example among a ton.
>

I forgot to add that the same 80% loss happens with mq-deadline plus
throttling, sorry. In addition, mq-deadline suffers much more
than an 18% loss of throughput w.r.t. bfq, in exactly the figure
you cited, if there are random writes too.

> In addition, the test you mention, designed by me, was meant exactly
> to find and show the worst breaking point of BFQ. If your main
> workload of interest is really made only of tens of parallel threads
> doing only sync random I/O, and you care only about throughput,
> without any concern for your system becoming so unresponsive as to be
> unusable during the test, then, yes, mq-deadline is a better option
> for you.
>

Some more detail on this. The fact that bfq reaches a lower
throughput than none in this test still puzzles me,
because the rate at which bfq processes I/O is one order of magnitude
higher than the IOPS of this device. So, I still don't understand
why, with bfq, the queue of the device does not get as full as with
none, and thus why the throughput with bfq is not the same as with
none.

To further test this issue, I replaced sync I/O with async I/O (with a
very high depth). And, nonsensically (for me), throughput dropped
with both bfq and none! I had already been meaning to report this issue,
after investigating it further. Anyway, this is a different story w.r.t.
this thread.

Thanks,
Paolo


> So, are you really sure the balance is in favor of mq-deadline?
>
> Thanks,
> Paolo
>
>> I agree with Jens that it's best to leave it to the Linux distributors to
>> select a default I/O scheduler.
>>
>> Bart.
>


2018-10-04 07:40:20

by Jan Kara

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed 03-10-18 17:55:41, Paolo Valente wrote:
> > On 03.10.2018 08:29, Paolo Valente wrote:
> >> As Linus Torvalds also complained [1], people feel lost among
> >> I/O-scheduler options. The actual differences across I/O schedulers are
> >> basically obscure to non-experts. In this respect, Linux-kernel
> >> 'users' are far more than the few top-level distros that can afford a
> >> strong performance team and that, based on the input of such a team,
> >> might light-heartedly venture to change a critical component like an
> >> I/O scheduler. Plus, as Linus Walleij pointed out, some users simply
> >> are not distros that use udev.
> >
> > I feel a contradiction in this counter-argument. On one hand, there are lots of, let's call them, home users who use major distributions with udev, so the distribution maintainers can reasonably decide which scheduler to use for which type of device, based on udev rules and the common sense provided via Documentation/ by linux-block devs. Moreover, most likely, those rules should be similar or the same across all the major distros and available via some (systemd?) upstream.
> >
>
> Let me basically repeat Mark's answer here, in my own words.
>
> Unfortunately, the facts do not match your optimistic view: after so many
> years and concordant test results, only a very few distributions have
> switched to bfq, and no major distribution has (AFAIK). As I already
> wrote, the reason is the one pointed out by Torvalds [1]. Do you want
> a simple example? Take the last sentence in Jan's email in this
> thread: "I *personally would* consider bfq a safer default ... but *I
> don't feel too strongly* about it." And he is definitely a storage
> expert.

Yeah, but let me add that currently all our released kernels still use the
legacy block stack for SCSI by default, and thus CFQ/deadline. And once we
feel scsi-mq + BFQ is comparable enough for rotating disks (which may be
the case after your latest changes; Andreas will be running some larger
evaluation), we are going to switch to that instead of scsi + CFQ. So for
us it is not a question of mq-deadline versus BFQ; it is rather scsi +
CFQ versus scsi-mq + BFQ.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-10-04 07:46:33

by Johannes Thumshirn

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed, Oct 03, 2018 at 03:25:54PM +0200, Jan Kara wrote:
> On Wed 03-10-18 08:53:37, Linus Walleij wrote:
> > On Wed, Oct 3, 2018 at 8:29 AM Paolo Valente <[email protected]> wrote:
> >
> > > So, I do understand your need for conservativeness, but, after so much
> > > evidence on single-queue devices, and so many years! :), what's the
> > > point in keeping Linux worse for virtually everybody, by default?
> >
> > I understand if we need to ease things in as well; I don't intend this
> > change for the current merge window or anything, since v4.19
> > will notably have this patch:
> >
> > commit d5038a13eca72fb216c07eb717169092e92284f1
> > Author: Johannes Thumshirn <[email protected]>
> > Date: Wed Jul 4 10:53:56 2018 +0200
> >
> > scsi: core: switch to scsi-mq by default
> >
> > It has been more than one year since we tried to change the default from
> > legacy to multi queue in SCSI with commit c279bd9e406 ("scsi: default to
> > scsi-mq"). But due to issues with suspend/resume and performance problems
> > it had been reverted again with commit cbe7dfa26eee ("Revert "scsi: default
> > to scsi-mq"").
> >
> > In the meantime there have been a substantial amount of performance
> > improvements and suspend/resume got fixed as well, thus we can re-enable
> > scsi-mq without a significant performance penalty.
> >
> > Signed-off-by: Johannes Thumshirn <[email protected]>
> > Reviewed-by: Hannes Reinecke <[email protected]>
> > Reviewed-by: Ming Lei <[email protected]>
> > Acked-by: John Garry <[email protected]>
> > Signed-off-by: Martin K. Petersen <[email protected]>
> >
> > I guess that patch can be a bit scary by itself. But IIUC it all went
> > fine this time!
> >
> > But hey, if that works, that means $SUBJECT patch will enable BFQ on all
> > libata devices and any single-queue SCSI device as well, not just
> > "obscure" stuff like MMC/SD and UBI, and that is
> > indeed a massive crowd of legacy devices. But we're talking
> > v4.21 here.
> >
> > Johannes, you might be interested in $SUBJECT patch.
> > It'd be nice to hear what SUSE people have to add, since they
> > are pretty proactive in this area.
>
> So we do have udev rules in our distro which set the IO scheduler based
> on device parameters (rotational at least; with blk-mq we might start
> considering the number of queues as well, plus we have some exceptions
> like virtio, loop, etc.). So the kernel default doesn't concern us too
> much as a distro.
>
> I personally would consider bfq a safer default for single-queue devices
> (loop probably needs an exception) but I don't feel too strongly about it.

[Full quote for context]

What about resurrecting CONFIG_DEFAULT_IOSCHED for MQ as well,
leaving it defaulting to mq-deadline but offering bfq, kyber and none
as choices as well?

The question is: shall we do it only for single-queue devices, or for
native MQ devices as well, if we go down that road?

I understand the embedded folks will want a different interface than
udev, but from the non-embedded point of view I'm with Jens and Jan
here: let udev do the job.

Johannes
--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2018-10-04 08:26:07

by Andreas Herrmann

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Thu, Oct 04, 2018 at 09:45:35AM +0200, Johannes Thumshirn wrote:
> On Wed, Oct 03, 2018 at 03:25:54PM +0200, Jan Kara wrote:
> > On Wed 03-10-18 08:53:37, Linus Walleij wrote:
> > > On Wed, Oct 3, 2018 at 8:29 AM Paolo Valente <[email protected]> wrote:
> > >
> > > > So, I do understand your need for conservativeness, but, after so much
> > > > evidence on single-queue devices, and so many years! :), what's the
> > > > point in keeping Linux worse for virtually everybody, by default?
> > >
> > > I understand if we need to ease things in as well; I don't intend this
> > > change for the current merge window or anything, since v4.19
> > > will notably have this patch:
> > >
> > > commit d5038a13eca72fb216c07eb717169092e92284f1
> > > Author: Johannes Thumshirn <[email protected]>
> > > Date: Wed Jul 4 10:53:56 2018 +0200
> > >
> > > scsi: core: switch to scsi-mq by default
> > >
> > > It has been more than one year since we tried to change the default from
> > > legacy to multi queue in SCSI with commit c279bd9e406 ("scsi: default to
> > > scsi-mq"). But due to issues with suspend/resume and performance problems
> > > it had been reverted again with commit cbe7dfa26eee ("Revert "scsi: default
> > > to scsi-mq"").
> > >
> > > In the meantime there have been a substantial amount of performance
> > > improvements and suspend/resume got fixed as well, thus we can re-enable
> > > scsi-mq without a significant performance penalty.
> > >
> > > Signed-off-by: Johannes Thumshirn <[email protected]>
> > > Reviewed-by: Hannes Reinecke <[email protected]>
> > > Reviewed-by: Ming Lei <[email protected]>
> > > Acked-by: John Garry <[email protected]>
> > > Signed-off-by: Martin K. Petersen <[email protected]>
> > >
> > > I guess that patch can be a bit scary by itself. But IIUC it all went
> > > fine this time!
> > >
> > > But hey, if that works, that means $SUBJECT patch will enable BFQ on all
> > > libata devices and any single-queue SCSI device as well, not just
> > > "obscure" stuff like MMC/SD and UBI, and that is
> > > indeed a massive crowd of legacy devices. But we're talking
> > > v4.21 here.
> > >
> > > Johannes, you might be interested in $SUBJECT patch.
> > > It'd be nice to hear what SUSE people have to add, since they
> > > are pretty proactive in this area.
> >
> > So we do have udev rules in our distro which set the IO scheduler based
> > on device parameters (rotational at least; with blk-mq we might start
> > considering the number of queues as well, plus we have some exceptions
> > like virtio, loop, etc.). So the kernel default doesn't concern us too
> > much as a distro.
> >
> > I personally would consider bfq a safer default for single-queue devices
> > (loop probably needs an exception) but I don't feel too strongly about it.
>
> [Full quote for context]
>
> What about resurrecting CONFIG_DEFAULT_IOSCHED for MQ as well,
> leaving it defaulting to mq-deadline but offering bfq, kyber and none
> as choices as well?

I second this: the introduction of a CONFIG_DEFAULT_MQ_IOSCHED.
Having a default I/O scheduler kernel config option for MQ allows
building a kernel suitable for a specific use without userspace
dependencies.
(But it still allows reconfiguring things via userspace.)
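Modeled on the legacy CONFIG_DEFAULT_IOSCHED choice in
block/Kconfig.iosched, such an option could hypothetically look like
the sketch below; none of the DEFAULT_MQ_* symbols exist today:

    choice
            prompt "Default blk-mq I/O scheduler"
            default DEFAULT_MQ_DEADLINE
            help
              Select the default I/O scheduler for multiqueue block devices.

            config DEFAULT_MQ_DEADLINE
                    bool "mq-deadline" if MQ_IOSCHED_DEADLINE=y

            config DEFAULT_MQ_BFQ
                    bool "BFQ" if IOSCHED_BFQ=y

            config DEFAULT_MQ_NONE
                    bool "none"
    endchoice

    config DEFAULT_MQ_IOSCHED
            string
            default "mq-deadline" if DEFAULT_MQ_DEADLINE
            default "bfq" if DEFAULT_MQ_BFQ
            default "none" if DEFAULT_MQ_NONE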

> The question is: shall we do it only for single-queue devices, or for
> native MQ devices as well, if we go down that road?

Good question. I am not yet sure about this.
I'd start by using the default for single-queue devices.

Andreas

> I understand the embedded folks will want a different interface than
> udev, but from the non-embedded point of view I'm with Jens and Jan
> here: let udev do the job.
>
> Johannes
> --
> Johannes Thumshirn Storage
> [email protected] +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2018-10-04 08:27:51

by Linus Walleij

Subject: Re: [PATCH] block: BFQ default for single queue devices

On Wed, Oct 3, 2018 at 1:49 PM Oleksandr Natalenko
<[email protected]> wrote:

> On the other hand, the users of embedded devices, mentioned by Linus,
> should already know which scheduler to choose, because dealing with
> the embedded world assumes the person can decide this on their own, or
> with the help of the abovementioned udev scripts and/or Documentation/
> as a reference point.
>
> So I see no obstacles here, and the choice to rely on udev by default
> sounds reasonable.

I am sorry but I do not agree with this.

There are several historical precedents where we have
concluded that just "have the kernel do the right thing
by default" is the way to go.

Example 1: pluggable CPU schedulers.
The reasoning was that users or distros have no clue what
scheduler they want, only scheduler developers do. We
drove it to the point where we have one and only one
scheduler, not different flavors. (Special
usecases have special scheduling classes inside the
one scheduler instead.)

Example 2: Automatic process group scheduling
The reasoning was that daemons such as systemd would
be better at placing processes/tasks into the right
control groups to manage their resources, so this would
be a userspace policy handled by the udev/systemd
complex. We did not do that. Instead the kernel does
autogrouping per-session; indeed it is a Kconfig option,
but even e.g. Fedora has it enabled by default.
(commit 5091faa449ee)

As pointed out elsewhere: these defaults make it
easy for custom builds not using udev+systemd to
get a system up and running with sensible defaults.

Simple embedded systems use Busybox's mdev (I wouldn't
trust it to make any complex decisions). OpenWrt
has ubox+ubus+uci, also extremely lightweight,
Android has its own init system that I don't
manage to keep track of anymore. Instead of running
all over the map and fixing these userspaces to
do the right thing, it makes sense to make the
right thing the default.

And these are millions and millions of deployed
systems not using udev+systemd we are talking about;
they are not fringe hobby projects. It's not that I
personally dislike udev or anything, I kind of like
it, but these tailored distros simply don't use it,
and they are huge in numbers. They need help to do
the right thing. Fixing a udev rule doesn't solve
even half the world's problems, I'm afraid.

Yours,
Linus Walleij