2014-06-02 14:29:32

by Jens Axboe

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On 2014-05-30 23:16, Tejun Heo wrote:
>> for turning patch #2 into a series of changes for CFQ instead. We need to
>> end up with something where we can potentially bisect our way down to
>> whatever caused any given regression. The worst possible situation is "CFQ
>> works fine for this workload, but BFQ does not" or vice versa. Or hangs, or
>> whatever it might be.
>
> It's likely that there will be some workloads out there which may be
> affected adversely, which is true for any change really but with both
> the core scheduling and heuristics properly characterized at least
> finding a reasonable trade-off should be much less of a crapshoot and
> the expected benefits seem to easily outweigh the risks as long as we
> can properly sequence the changes.

Exactly, I think we are pretty much on the same page here. As I said in
the previous email, the biggest thing I care about is not adding a new
IO scheduler wholesale. If Paolo can turn the "add BFQ" patch into a
series of patches against CFQ, then I would have no issue merging it for
testing (and inclusion, when it's stable enough).

One thing I've neglected to bring up but have been thinking about -
we're quickly getting to the point where the old request_fn IO path will
become a legacy thing, mostly in maintenance mode. That isn't a problem
for morphing bfq and cfq, but it does mean that development efforts in
this area would be a lot better spent writing an IO scheduler that fits
into the blk-mq framework instead.

I realize this is a tall order right now, as I haven't included any sort
of framework for that in blk-mq yet. So what I envision happening is
that I will write a basic deadline (ish) scheduler for blk-mq, and
hopefully others can then pitch in and we can get the ball rolling on
that side as well.
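
To give a rough idea of the kind of thing I mean, here is a minimal sketch
of deadline-style dispatch, with made-up types and names rather than any
real blk-mq hooks (which don't exist yet): requests are kept both in
submission order with an expiry time and in sector order, dispatch normally
follows sector order for throughput, but an expired request at the head of
the FIFO is served first.

#include <stddef.h>

struct sketch_req {
    unsigned long long sector;
    unsigned long deadline;             /* jiffies-like expiry time */
};

struct sketch_queue {
    struct sketch_req *fifo_head;       /* oldest pending request */
    struct sketch_req *sorted[128];     /* kept sorted by sector */
    int nr;                             /* number of pending requests */
    unsigned long long last_sector;     /* where the previous dispatch ended */
};

static int expired(unsigned long now, unsigned long deadline)
{
    return (long)(now - deadline) > 0;
}

/* Pick the next request to hand to the driver; removal/batching omitted. */
static struct sketch_req *sketch_dispatch(struct sketch_queue *q,
                                          unsigned long now)
{
    int i;

    if (!q->nr)
        return NULL;

    /* Latency: an expired request at the FIFO head wins. */
    if (q->fifo_head && expired(now, q->fifo_head->deadline))
        return q->fifo_head;

    /* Throughput: otherwise continue in ascending sector order. */
    for (i = 0; i < q->nr; i++)
        if (q->sorted[i]->sector >= q->last_sector)
            return q->sorted[i];

    return q->sorted[0];                /* wrap back to the start */
}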

--
Jens Axboe


2014-06-02 17:24:59

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

Hello, Jens.

On Mon, Jun 02, 2014 at 08:29:27AM -0600, Jens Axboe wrote:
> One thing I've neglected to bring up but have been thinking about - we're
> quickly getting to the point where the old request_fn IO path will become a
> legacy thing, mostly in maintenance mode. That isn't a problem for morphing
> bfq and cfq, but it does mean that development efforts in this area would be
> a lot better spent writing an IO scheduler that fits into the blk-mq
> framework instead.

What I'm planning right now is improving blkcg so that it can do both
proportional and hard limits with high cpu scalability, most likely
using percpu charge caches. It probably would be best to roll all
those into one piece of logic. I don't think, well at least hope,
that we'd need multiple modular scheduler / blkcg implementations for
blk-mq and both can be served by built-in scheduling logic.
Regardless of device speed, we'd need some form of fairness
enforcement after all.
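
Just to sketch what I mean by percpu charge caches (invented names, nothing
like whatever the eventual interface would be): each CPU draws a batch of
budget from the shared limit under the shared lock, and the hot charge path
then only touches the local cache.

#define CHARGE_BATCH 64

struct global_budget {
    long remaining;             /* shared pool, protected by a lock (omitted) */
};

struct percpu_cache {
    long cached;                /* this CPU's locally cached budget */
};

/* Refill the per-CPU cache from the shared pool; returns 0 on success. */
static int refill(struct global_budget *g, struct percpu_cache *pc)
{
    /* the shared lock would be taken here in real code */
    if (g->remaining < CHARGE_BATCH)
        return -1;
    g->remaining -= CHARGE_BATCH;
    pc->cached += CHARGE_BATCH;
    return 0;
}

/* Charge one IO; only hits the shared pool when the local cache is empty. */
static int charge_io(struct global_budget *g, struct percpu_cache *pc)
{
    if (pc->cached == 0 && refill(g, pc) < 0)
        return -1;              /* over the hard limit */
    pc->cached--;
    return 0;
}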

Thanks.

--
tejun

2014-06-02 17:31:53

by Jens Axboe

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On 06/02/2014 11:24 AM, Tejun Heo wrote:
> Hello, Jens.
>
> On Mon, Jun 02, 2014 at 08:29:27AM -0600, Jens Axboe wrote:
>> One thing I've neglected to bring up but have been thinking about - we're
>> quickly getting to the point where the old request_fn IO path will become a
>> legacy thing, mostly in maintenance mode. That isn't a problem for morphing
>> bfq and cfq, but it does mean that development efforts in this area would be
>> a lot better spent writing an IO scheduler that fits into the blk-mq
>> framework instead.
>
> What I'm planning right now is improving blkcg so that it can do both
> proportional and hard limits with high cpu scalability, most likely
> using percpu charge caches. It probably would be best to roll all
> those into one piece of logic. I don't think, well at least hope,
> that we'd need multiple modular scheduler / blkcg implementations for
> blk-mq and both can be served by built-in scheduling logic.
> Regardless of device speed, we'd need some form of fairness
> enforcement after all.

For things like blkcg, I agree, it should be able to be common code and
reusable. But there's a need for scheduling beyond that, for people that
don't use control groups (ie most...). And it'd be hard to retrofit cfq
into blk-mq, without rewriting it. I don't believe we need anything this
fancy for blk-mq, hopefully. At least having simple deadline scheduling
would be Good Enough for the foreseeable future.

--
Jens Axboe

2014-06-02 17:42:56

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

Hello, Jens.

On Mon, Jun 02, 2014 at 11:32:05AM -0600, Jens Axboe wrote:
> For things like blkcg, I agree, it should be able to be common code and
> reusable. But there's a need for scheduling beyond that, for people that
> don't use control groups (ie most...). And it'd be hard to retrofit cfq
> into blk-mq, without rewriting it. I don't believe we need anything this
> fancy for blk-mq, hopefully. At least having simple deadline scheduling
> would be Good Enough for the foreseeable future.

Heh, looks like we're miscommunicating. I don't think anything with
the level of complexity of cfq is realistic for high-iops devices. It
has already become a liability for SATA ssds after all. My suggestion
is that, as hierarchical scheduling tends to be a logical extension of
flat scheduling, it probably would make sense to implement both
scheduling logics in the same framework as in the cpu scheduler or (to
a lesser extent) cfq. So, a new blk-mq scheduler which can work in
hierarchical mode if blkcg is in active use.
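
As a toy illustration of why the two modes can live in one framework
(invented types, not a proposal for the actual data structures): if
scheduling entities form a tree where leaves are queues and inner nodes are
groups, picking the next queue is the same "pick best child" step applied
recursively, and the flat case is simply a tree of depth one.

#include <stddef.h>

struct entity_sketch {
    unsigned long long vtime;           /* selection key, e.g. virtual time */
    struct entity_sketch *children;     /* NULL for a leaf queue */
    int nr_children;
};

/* Pick the child with the smallest virtual time within one group. */
static struct entity_sketch *pick_best_child(struct entity_sketch *grp)
{
    struct entity_sketch *best = NULL;
    int i;

    for (i = 0; i < grp->nr_children; i++) {
        struct entity_sketch *e = &grp->children[i];

        if (!best || e->vtime < best->vtime)
            best = e;
    }
    return best;
}

/* Descend from the root until a leaf queue is reached. */
static struct entity_sketch *pick_next_queue(struct entity_sketch *root)
{
    struct entity_sketch *e = root;

    while (e && e->children)
        e = pick_best_child(e);
    return e;
}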

One part I was wondering about is whether we'd need to continue the
modular multiple implementation mechanism. For rotating disks, for
various reasons including some historical ones, we ended up with
multiple ioscheds and somewhat uglily layered blkcg implementations.
Given that the expected characteristics of blk-mq devices are more
consistent, it could be reasonable to stick with single iops and/or
bandwidth scheme.

Thanks.

--
tejun

2014-06-02 17:46:25

by Jens Axboe

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On 06/02/2014 11:42 AM, Tejun Heo wrote:
> Hello, Jens.
>
> On Mon, Jun 02, 2014 at 11:32:05AM -0600, Jens Axboe wrote:
>> For things like blkcg, I agree, it should be able to be common code and
>> reusable. But there's a need for scheduling beyond that, for people that
>> don't use control groups (ie most...). And it'd be hard to retrofit cfq
>> into blk-mq, without rewriting it. I don't believe we need anything this
>> fancy for blk-mq, hopefully. At least having simple deadline scheduling
>> would be Good Enough for the foreseeable future.
>
> Heh, looks like we're miscommunicating. I don't think anything with
> the level of complexity of cfq is realistic for high-iops devices. It
> has already become a liability for SATA ssds after all. My suggestion
> is that, as hierarchical scheduling tends to be a logical extension of
> flat scheduling, it probably would make sense to implement both
> scheduling logics in the same framework as in the cpu scheduler or (to
> a lesser extent) cfq. So, a new blk-mq scheduler which can work in
> hierarchical mode if blkcg is in active use.

But blk-mq will potentially drive anything, so it might not be out of
the question with a more expensive scheduling variant, if it makes any
sense to do of course. At least until there's no more rotating stuff out
there :-). But it's not a priority at all to me yet. As long as we have
coexisting IO paths, it'd be trivial to select the needed one based on
the device characteristics.

> One part I was wondering about is whether we'd need to continue the
> modular multiple implementation mechanism. For rotating disks, for
> various reasons including some historical ones, we ended up with
> multiple ioscheds and somewhat uglily layered blkcg implementations.
> Given that the expected characteristics of blk-mq devices are more
> consistent, it could be reasonable to stick with single iops and/or
> bandwidth scheme.

I hope not to do that. I just want something sane and simple (like a
deadline scheduler), nothing more.

--
Jens Axboe

2014-06-02 18:51:43

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

Hello,

On Mon, Jun 02, 2014 at 11:46:36AM -0600, Jens Axboe wrote:
> But blk-mq will potentially drive anything, so it might not be out of
> the question with a more expensive scheduling variant, if it makes any
> sense to do of course. At least until there's no more rotating stuff out
> there :-). But it's not a priority at all to me yet. As long as we have
> coexisting IO paths, it'd be trivial to select the needed one based on
> the device characteristics.

Hmmm... yeah, moving rotating devices over to blk-mq doesn't really
seem beneficial to me. I think the behavioral differences between
rotating rust and newer solid state devices are too fundamental for them
to share a single code path for things like scheduling, and selecting the
appropriate path depending on the actual device sounds like a much
better plan even in the long term.

Thanks.

--
tejun

2014-06-02 20:57:18

by Jens Axboe

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On Mon, Jun 02 2014, Tejun Heo wrote:
> Hello,
>
> On Mon, Jun 02, 2014 at 11:46:36AM -0600, Jens Axboe wrote:
> > But blk-mq will potentially drive anything, so it might not be out of
> > the question with a more expensive scheduling variant, if it makes any
> > sense to do of course. At least until there's no more rotating stuff out
> > there :-). But it's not a priority at all to me yet. As long as we have
> > coexisting IO paths, it'd be trivial to select the needed one based on
> > the device characteristics.
>
> Hmmm... yeah, moving rotating devices over to blk-mq doesn't really
> seem beneficial to me. I think the behavioral differences between
> rotating rust and newer solid state devices are too fundamental for them
> to share a single code path for things like scheduling, and selecting the
> appropriate path depending on the actual device sounds like a much
> better plan even in the long term.

It's not so much about it being more beneficial to run in blk-mq, as it
is about not having two code paths. But yes, we're likely going to
maintain that code for a long time, so it's not going anywhere anytime
soon.

And for scsi-mq, it's already opt-in, though on a per-host basis. Doing
finer granularity than that is going to be difficult, unless we let
legacy-block and blk-mq share a tag map (though that would not be too
hard).
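
Roughly, sharing a tag map would just mean one pool of tags that both paths
allocate from atomically. A back-of-the-envelope sketch, with invented names
and none of the existing blk-mq tag code:

#define NR_TAGS   256
#define WORD_BITS (8 * sizeof(unsigned long))

struct shared_tags {
    unsigned long map[NR_TAGS / WORD_BITS];     /* one bit per tag */
};

/* Atomically claim a free tag; returns the tag or -1 if none are free. */
static int tag_alloc(struct shared_tags *t)
{
    unsigned int i;

    for (i = 0; i < NR_TAGS; i++) {
        unsigned long bit = 1UL << (i % WORD_BITS);
        unsigned long *word = &t->map[i / WORD_BITS];

        /* gcc/clang builtin here; the kernel would use test_and_set_bit() */
        if (!(__sync_fetch_and_or(word, bit) & bit))
            return (int)i;
    }
    return -1;
}

static void tag_free(struct shared_tags *t, int tag)
{
    __sync_fetch_and_and(&t->map[tag / WORD_BITS],
                         ~(1UL << (tag % WORD_BITS)));
}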

--
Jens Axboe

2014-06-04 14:31:47

by Christoph Hellwig

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On Mon, Jun 02, 2014 at 02:57:30PM -0600, Jens Axboe wrote:
> It's not so much about it being more beneficial to run in blk-mq, as it
> is about not having two code paths. But yes, we're likely going to
> maintain that code for a long time, so it's not going anywhere anytime
> soon.
>
> And for scsi-mq, it's already opt-in, though on a per-host basis. Doing
> finer granularity than that is going to be difficult, unless we let
> legacy-block and blk-mq share a tag map (though that would not be too
> hard).

I don't really think there's anything inherently counterproductive
about spinning rust (at least for somewhat modern spinning rust and
infrastructure) in blk-mq. I'd really like to get rid of the old
request layer in a reasonable amount of time, and for SCSI I'm very
reluctant to add more integration between the old and new code. I'm
really planning on not maintaining the old request-based SCSI code
for long once we get positive reports in from users of various
kinds of older hardware.

2014-06-04 14:50:59

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

Hey, Christoph.

On Wed, Jun 04, 2014 at 07:31:36AM -0700, Christoph Hellwig wrote:
> I don't really think there's anything inherently counterproductive
> about spinning rust (at least for somewhat modern spinning rust and
> infrastructure) in blk-mq. I'd really like to get rid of the old
> request layer in a reasonable amount of time, and for SCSI I'm very
> reluctant to add more integration between the old and new code. I'm
> really planning on not maintaining the old request-based SCSI code
> for long once we get positive reports in from users of various
> kinds of older hardware.

Hmmm... the biggest thing is ioscheds. They heavily rely on being
strongly synchronized and are pretty important for rotating rusts.
Maybe they can be made to work with blk-mq by forcing a single queue or
something, but do we want that?

Thanks.

--
tejun

2014-06-04 14:53:43

by Christoph Hellwig

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On Wed, Jun 04, 2014 at 10:50:53AM -0400, Tejun Heo wrote:
> Hmmm... the biggest thing is ioscheds. They heavily rely on being
> strongly synchronized and are pretty important for rotating rusts.
> Maybe they can be made to work with blk-mq by forcing a single queue or
> something, but do we want that?

Jens is planning to add an (optional) I/O scheduler to blk-mq, and
that is indeed required for proper disk support. I don't think there
even is a need to limit it to a single queue technically, although
devices that support multiple queues are unlikely to need I/O
scheduling.

2014-06-04 14:58:37

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On Wed, Jun 04, 2014 at 07:53:30AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 04, 2014 at 10:50:53AM -0400, Tejun Heo wrote:
> > Hmmm... the biggest thing is ioscheds. They heavily rely on being
> > strongly synchronized and are pretty important for rotating rusts.
> > Maybe they can be made to work with blk-mq by forcing a single queue or
> > something, but do we want that?
>
> Jens is planning to add an (optional) I/O scheduler to blk-mq, and
> that is indeed required for proper disk support. I don't think there
> even is a need to limit it to a single queue technically, although
> devices that support multiple queues are unlikely to need I/O
> scheduling.

I think what Jens is planning is something really minimal. Things
like [cb]fq heavily depend on the old block infrastructure. I don't
know. Maybe they can be merged in time but I'm not quite sure we'd
have enough pressure to actually do that. Host-granular switching
should be good enough, I guess.

Thanks.

--
tejun

2014-06-04 17:51:42

by Christoph Hellwig

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On Wed, Jun 04, 2014 at 10:58:29AM -0400, Tejun Heo wrote:
> I think what Jens is planning is something really minimal. Things
> like [cb]fq heavily depend on the old block infrastructure. I don't
> know. Maybe they can be merged in time but I'm not quite sure we'd
> have enough pressure to actually do that. Host-granular switching
> should be good enough, I guess.

Jens told me he wanted to do a deadline scheduler, which actually is
the most sensible for disks unless you want all the cgroup magic.

Given that people in this thread are interested in more complex
schedulers I'd suggest they implement BFQ for blk-mq.

2014-06-17 15:57:28

by Paolo Valente

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler


On 2 Jun 2014, at 16:29, Jens Axboe <[email protected]> wrote:

> On 2014-05-30 23:16, Tejun Heo wrote:
>>> for turning patch #2 into a series of changes for CFQ instead. We need to
>>> end up with something where we can potentially bisect our way down to
>>> whatever caused any given regression. The worst possible situation is "CFQ
>>> works fine for this workload, but BFQ does not" or vice versa. Or hangs, or
>>> whatever it might be.
>>
>> It's likely that there will be some workloads out there which may be
>> affected adversely, which is true for any change really but with both
>> the core scheduling and heuristics properly characterized at least
>> finding a reasonable trade-off should be much less of a crapshoot and
>> the expected benefits seem to easily outweigh the risks as long as we
>> can properly sequence the changes.
>
> Exactly, I think we are pretty much on the same page here. As I said in the previous email, the biggest thing I care about is not adding a new IO scheduler wholesale. If Paolo can turn the "add BFQ" patch into a series of patches against CFQ, then I would have no issue merging it for testing (and inclusion, when it's stable enough).

We have finished analyzing possible ways to turn cfq into bfq, and unfortunately
I think I need some help in this respect. We have found several apparently
non-trivial issues. To describe them, I will start from some concrete examples and
then try to discuss the overall problem in general terms, instead of providing a list
of all the issues we have found. I am sorry for the amount of detail, but I hope it will
help us get in sync and make further tedious emails unnecessary.

First, suppose we start the transformation by adding the low-latency heuristics of bfq
to cfq's engine. One of the main issues is that cfq chooses the next queue to serve,
within a group and class, in a different way than bfq does: cfq first chooses the
sub-group of queues to serve according to a workload-based priority scheme, and then
performs round-robin scheduling among the queues in that sub-group. This priority scheme
not only has nothing to do with the logic of bfq's low-latency heuristics (or with bfq
altogether), but also conflicts with the freedom in choosing the next queue that these
heuristics need in order to guarantee low latency.
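
To make the difference concrete, here is a toy illustration (made-up
structures, not actual cfq or bfq code) of the two selection styles: a
cfq-like pass that first picks a workload class and then round-robins
within it, versus a bfq-like pass that simply picks the busy queue with the
smallest virtual finish time, regardless of any class.

#include <stddef.h>

enum wl_class { WL_SYNC, WL_SYNC_NOIDLE, WL_ASYNC, WL_NR };

struct toy_queue {
    int busy;                       /* has pending requests */
    unsigned long long vfinish;     /* virtual finish time (bfq-like key) */
    enum wl_class wl;               /* workload class (cfq-like key) */
};

/* cfq-like: pick the highest-priority class first, then round-robin in it. */
static struct toy_queue *pick_cfq_like(struct toy_queue *q, int n, int *rr_pos)
{
    enum wl_class wl;
    int i;

    for (wl = WL_SYNC; wl < WL_NR; wl++) {
        for (i = 0; i < n; i++) {
            struct toy_queue *t = &q[(*rr_pos + i) % n];

            if (t->busy && t->wl == wl) {
                *rr_pos = (*rr_pos + i + 1) % n;
                return t;
            }
        }
    }
    return NULL;
}

/* bfq-like: ignore classes here and pick the smallest virtual finish time. */
static struct toy_queue *pick_bfq_like(struct toy_queue *q, int n)
{
    struct toy_queue *best = NULL;
    int i;

    for (i = 0; i < n; i++)
        if (q[i].busy && (!best || q[i].vfinish < best->vfinish))
            best = &q[i];
    return best;
}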

If, because of the above issue, we instead proceed the other way round, i.e., start by
replacing the engine of cfq with that of bfq, then similar, if not worse, issues arise:
- the internal scheduler of bfq is hierarchical, whereas the internal, round-robin-based scheduler
of cfq is not
- the hierarchy-flattening scheme adopted in cfq has no counterpart in the hierarchical scheduling
algorithm of bfq
- preemption is not trivial to implement in bfq in such a way that service guarantees are preserved,
yet it would be needed in the first place to keep throughput high with interleaved I/O
- cfq uses the workload-based queue selection scheme I mentioned above, and this has no match
with any mechanism in bfq
...

Instead of bothering you with the full list of issues, I want to try to describe the problem, in general
terms, through the following rough simplification (in which I am neglecting trivial common code
between cfq and bfq, such as handling of I/O contexts). On one side, bfq consists of roughly 80%
hierarchical fair-queueing scheduler, plus a remaining 20% of heuristics that improve some
performance indexes. On the other side, cfq consists, roughly, of 40% simple round-robin flat
scheduler, and 60% everything else: an extension to support hierarchical scheduling, workload-based
improvements, preemption, virtual-time extensions, further low-latency mechanisms, and so on. That
remaining 60% of cfq has very little in common with the 20% of heuristics in bfq (although many of
their goals are the same); the commonalities probably amount to at most 10%. The problem is then the
remaining, almost completely incompatible, 90% of non-common mechanisms.

To make a long story short, to implement a smooth transition from cfq to bfq, this 90% should of
course be progressively transformed along the way. This would apparently imply that:
- the intermediate versions would not be partial versions of either cfq or bfq;
- the performance of these intermediate versions would most certainly be worse than that of both
cfq and bfq, as the mechanisms of the latter schedulers have been fine-tuned over the years,
whereas the hybrid mechanisms in the intermediate versions would just be an attempt to avoid
abrupt changes;
- these hybrid mechanisms would likely be more complex than the original ones;
- in the final steps of the transformation, these hybrid mechanisms would all have to be further
changed to become those of bfq, or just thrown away.

In the end, a smooth transition seems messy and confusing. At the other extreme, I thought about
a cleaner but sharper solution, which probably better matches the one proposed by Tejun:
1) removing the 60% of extra cfq code from around its round-robin engine, 2) turning the
remaining core into a flat version of bfq-v0, 3) turning this flat scheduler into the actual, hierarchical
bfq-v0, and 4) applying the remaining bfq patches.

In general, with both a smooth but messy and a sharp but clean transformation, there seem to be
the following common problems:
1) The main benefits highlighted by Jens, i.e., being able to move back and forth and easily
understand what works and what does not, seem to be lost, because, with both solutions,
intermediate versions would likely have a worse performance than the current version of cfq.
2) bfq, on one side, does not export some of the sysfs parameters of cfq, such as slice_sync, and,
on the other side, uses other common parameters in a different way. For example, bfq turns I/O priorities
into throughput shares in a different way than cfq does. As a consequence, existing configurations may
break or behave in unexpected ways.
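
As a purely hypothetical numerical example of why this matters (neither
formula below is claimed to be the real cfq or bfq mapping): if one
scheduler turned best-effort priority p into a time slice of 100ms - 10ms*p
while another turned it into a weight of 8 - p, two processes at priorities
0 and 4 would get noticeably different relative throughput under the two
schemes, so a configuration tuned for one mapping can misbehave under the
other.

#include <stdio.h>

int main(void)
{
    int p0 = 0, p4 = 4;

    /* scheme A: share proportional to a priority-dependent time slice */
    double slice0 = 100 - 10 * p0, slice4 = 100 - 10 * p4;

    /* scheme B: share proportional to a priority-derived weight */
    double w0 = 8 - p0, w4 = 8 - p4;

    printf("scheme A: prio0/prio4 throughput ratio ~ %.2f\n", slice0 / slice4);
    printf("scheme B: prio0/prio4 throughput ratio ~ %.2f\n", w0 / w4);
    return 0;   /* prints ~1.67 for scheme A and 2.00 for scheme B */
}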

I'm sorry for the long list of (only) problems, but, because of the extent to which cfq and bfq have diverged
over the years, we are having a really hard time finding a sensible way to turn the former into the latter.
Of course, we are willing to do our best once we find a viable solution.

Thanks,
Paolo

2014-06-19 01:46:07

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

Hello,

On Tue, Jun 17, 2014 at 05:55:57PM +0200, Paolo Valente wrote:
> In general, with both a smooth but messy and a sharp but clean
> transformation, there seem to be the following common problems:
>
> 1) The main benefits highlighted by Jens, i.e., being able to move
> back and forth and easily understand what works and what does not,
> seem to be lost, because, with both solutions, intermediate versions
> would likely have a worse performance than the current version of
> cfq.

So, the perfectly smooth and performant transformation is possible,
it'd be great, but I don't really think that'd be the case. My
opinion is that if the infrastructure pieces can be mostly maintained
while making logical gradual steps it should be fine. ie. pick
whatever strategy which seems executable, chop down the pieces which
get in the way (ie. tear down all the cfq heuristics if you have to),
transform the base and then build things on top again. Ensuring that
each step is logical and keeps working should give us enough safety
net, IMO.

Jens, what do you think?

> 2) bfq, on one side, does not export some of the sysfs parameters of
> cfq, such as slice_sync, and, on the other side, uses other common
> parameters in a different way. For example, bfq turns I/O priorities
> into throughput shares in a different way than cfq does. As a
> consequence, existing configurations may break or behave in
> unexpected ways.

This is why I hate exposing internal knobs without layering proper
semantic interpretation on top. It ends up creating unnecessary
lock-in effect too often just to serve some esoteric cases which
aren't all that useful. For knobs which don't make any sense for the
new scheduler, the appropriate thing to do would be just making them
noop and generate a warning message when it's written to.
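
In plain C (generic names here, not the actual elevator sysfs plumbing,
which would use an attribute ->store() method and pr_warn_once()), the
noop-plus-warning behavior is basically this:

#include <stdio.h>

/*
 * Accept and ignore a write to an obsolete tunable, warning once, so that
 * existing scripts keep working while popular uses get flushed out.
 */
static long obsolete_knob_store(const char *buf, unsigned long count)
{
    static int warned;

    (void)buf;                  /* the written value is deliberately ignored */
    if (!warned) {
        warned = 1;
        fprintf(stderr, "this tunable is ignored by the new scheduler\n");
    }
    return (long)count;         /* report success to the writer */
}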

As for behavior change for existing users, any change to scheduler
does that. I don't think it's practical to avoid any changes for that
reason. I think there already is a pretty solid platform to base
things on and the way forward is making the changes and iterating as
testing goes on and issues get reported.

Thanks.

--
tejun

2014-06-19 01:49:26

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On Wed, Jun 18, 2014 at 09:46:00PM -0400, Tejun Heo wrote:
...
> So, the perfectly smooth and performant transformation is possible,
^
if
> it'd be great, but I don't really think that'd be the case. My

--
tejun

2014-06-19 02:29:59

by Jens Axboe

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On 2014-06-18 18:46, Tejun Heo wrote:
> Hello,
>
> On Tue, Jun 17, 2014 at 05:55:57PM +0200, Paolo Valente wrote:
>> In general, with both a smooth but messy and a sharp but clean
>> transformation, there seem to be the following common problems:
>>
>> 1) The main benefits highlighted by Jens, i.e., being able to move
>> back and forth and easily understand what works and what does not,
>> seem to be lost, because, with both solutions, intermediate versions
>> would likely have a worse performance than the current version of
>> cfq.
>
> So, the perfectly smooth and performant transformation is possible,
> it'd be great, but I don't really think that'd be the case. My
> opinion is that if the infrastructure pieces can be mostly maintained
> while making logical gradual steps it should be fine. ie. pick
> whatever strategy which seems executable, chop down the pieces which
> get in the way (ie. tear down all the cfq heuristics if you have to),
> transform the base and then build things on top again. Ensuring that
> each step is logical and keeps working should give us enough safety
> net, IMO.
>
> Jens, what do you think?

I was thinking the same - strip CFQ back down, getting rid of the
heuristics, then go forward to BFQ. That should be feasible. You need to
find the common core first.

>> 2) bfq, on one side, does not export some of the sysfs parameters of
>> cfq, such as slice_sync, and, on the other side, uses other common
>> parameters in a different way. For example, bfq turns I/O priorities
>> into throughput shares in a different way than cfq does. As a
>> consequence, existing configurations may break or behave in
>> unexpected ways.
>
> This is why I hate exposing internal knobs without layering proper
> semantic interpretation on top. It ends up creating unnecessary
> lock-in effect too often just to serve some esoteric cases which
> aren't all that useful. For knobs which don't make any sense for the
> new scheduler, the appropriate thing to do would be just making them
> noop and generate a warning message when it's written to.
>
> As for behavior change for existing users, any change to scheduler
> does that. I don't think it's practical to avoid any changes for that
> reason. I think there already is a pretty solid platform to base
> things on and the way forward is making the changes and iterating as
> testing goes on and issues get reported.

Completely agree, don't worry about that. It's not like we advertise
hard guarantees on the priorities right now, for instance, so as long as
the end result isn't orders of magnitude different for the
classes/levels, then it'll likely be good enough.

Ditto on the sysfs files, as some of those are likely fairly widely
used. But if we warn and do nothing, then that'll allow us to sort out
popular uses of it before we (later on) remove the files.

--
Jens Axboe

2014-06-23 13:53:37

by Paolo Valente

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler


On 19 Jun 2014, at 04:29, Jens Axboe <[email protected]> wrote:

> On 2014-06-18 18:46, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Jun 17, 2014 at 05:55:57PM +0200, Paolo Valente wrote:
>>> In general, with both a smooth but messy and a sharp but clean
>>> transformation, there seem to be the following common problems:
>>>
>>> 1) The main benefits highlighted by Jens, i.e., being able to move
>>> back and forth and easily understand what works and what does not,
>>> seem to be lost, because, with both solutions, intermediate versions
>>> would likely have a worse performance than the current version of
>>> cfq.
>>
>> So, the perfectly smooth and performant transformation is possible,
>> it'd be great, but I don't really think that'd be the case. My
>> opinion is that if the infrastructure pieces can be mostly maintained
>> while making logical gradual steps it should be fine. ie. pick
>> whatever strategy which seems executable, chop down the pieces which
>> get in the way (ie. tear down all the cfq heuristics if you have to),
>> transform the base and then build things on top again. Ensuring that
>> each step is logical and keeps working should give us enough safety
>> net, IMO.
>>
>> Jens, what do you think?
>
> I was thinking the same - strip CFQ back down, getting rid of the heuristics, then go forward to BFQ. That should be feasible. You need to find the common core first.

OK, I will try exactly this approach (hoping not to have misunderstood anything).
Here is, very briefly, the strategy I am thinking about:
1) In a first, purely destructive phase, bring CFQ back, more or less, to its state
at the time when BFQ was initially forked from it, justifying the removal of every heuristic
and improvement. Depending on how many patches come out of this phase,
possibly pack them into a first, separate patch series.
2) In a second, purely constructive phase: (a) turn the stripped-down version of CFQ into
a flat BFQ-v0, (b) turn the latter into the actual, hierarchical BFQ-v0, and, finally, (c)
progressively turn BFQ-v0 into the latest version of BFQ, through the previously-submitted
patches, of course after fixing and improving all the involved patches according to Tejun's
suggestions and corrections.

I will wait a short while for feedback on this proposal, and then, if nothing still needs to be
changed or refined, quietly start the process.

>
>>> 2) bfq, on one side, does not export some of the sysfs parameters of
>>> cfq, such as slice_sync, and, on the other side, uses other common
>>> parameters in a different way. For example, bfq turns I/O priorities
>>> into throughput shares in a different way than cfq does. As a
>>> consequence, existing configurations may break or behave in
>>> unexpected ways.
>>
>> This is why I hate exposing internal knobs without layering proper
>> semantic interpretation on top. It ends up creating unnecessary
>> lock-in effect too often just to serve some esoteric cases which
>> aren't all that useful. For knobs which don't make any sense for the
>> new scheduler, the appropriate thing to do would be just making them
>> noop and generate a warning message when it's written to.
>>
>> As for behavior change for existing users, any change to scheduler
>> does that. I don't think it's practical to avoid any changes for that
>> reason. I think there already is a pretty solid platform to base
>> things on and the way forward is making the changes and iterating as
>> testing goes on and issues get reported.
>
> Completely agree, don't worry about that. It's not like we advertise hard guarantees on the priorities right now, for instance, so as long as the end result isn't orders of magnitude different for the classes/levels, then it'll likely be good enough.
>
> Ditto on the sysfs files, as some of those are likely fairly widely used. But if we warn and do nothing, then that'll allow us to sort out popular uses of it before we (later on) remove the files.

Great, thanks. BTW, most of the "internal" parameters inappropriately exposed by BFQ,
as noted by Tejun, were exposed just because we forgot to remove them while turning
the testing version of BFQ into the submitted one. Sorry about that.

Thanks,
Paolo

>
> --
> Jens Axboe


--
Paolo Valente
Algogroup
Dipartimento di Fisica, Informatica e Matematica
Via Campi, 213/B
41125 Modena - Italy
homepage: http://algogroup.unimore.it/people/paolo/

2014-06-23 19:20:37

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

On Mon, Jun 23, 2014 at 03:53:09PM +0200, Paolo Valente wrote:
> I will wait a short while for feedback on this proposal, and then,
> if nothing still needs to be changed or refined, quietly start
> the process.

We'll prolly end up doing a few iterations but overall it sounds good
to me.

Thanks.

--
tejun

2014-07-09 20:54:57

by Paolo Valente

Subject: Re: [PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

Hoping that it may help people get a better idea of the features of bfq (while we work on the patches), I just uploaded a new, shorter demo (7 minutes) of BFQ with an SSD:
http://youtu.be/KhZl9LjCKuU

Paolo

On 23 Jun 2014, at 21:20, Tejun Heo <[email protected]> wrote:

> On Mon, Jun 23, 2014 at 03:53:09PM +0200, Paolo Valente wrote:
>> I will wait a short while for feedback on this proposal, and then,
>> if nothing still needs to be changed or refined, quietly start
>> the process.
>
> We'll prolly end up doing a few iterations but overall it sounds good
> to me.
>
> Thanks.
>
> --
> tejun


--
Paolo Valente
Algogroup
Dipartimento di Fisica, Informatica e Matematica
Via Campi, 213/B
41125 Modena - Italy
homepage: http://algogroup.unimore.it/people/paolo/