2014-06-02 09:46:52

by Paolo Valente

Subject: Re: [PATCH RFC - TAKE TWO - 10/12] block, bfq: add Early Queue Merge (EQM)


On 1 June 2014, at 02:03, Tejun Heo <[email protected]> wrote:

> Hello,
>
> On Thu, May 29, 2014 at 11:05:41AM +0200, Paolo Valente wrote:
>> Unfortunately, in the following frequent case the mechanism
>> implemented in CFQ for detecting cooperating processes and merging
>> their queues is not responsive enough to also handle the fluctuating
>> I/O pattern of the second type of processes. Suppose that one process
>> of the second type issues a request close to the next request to serve
>> of another process of the same type. At that time the two processes
>> can be considered as cooperating. But, if the request issued by the
>> first process is to be merged with some other already-queued request,
>> then, from the moment at which this request arrives, to the moment
>> when CFQ controls whether the two processes are cooperating, the two
>> processes are likely to be already doing I/O in distant zones of the
>> disk surface or device memory.
>
> I don't really follow the last part. So, the difference is that
> cooperating queue setup also takes place during bio merge too, right?

Not only that: in bfq, an actual queue merge is performed in the bio-merge hook.

> Are you trying to say that it's beneficial to detect cooperating
> queues before the issuing queue gets unplugged because the two queues
> might deviate while plugged?

Yes: to keep throughput high, both detection and queue merging must be performed immediately.

> If so, the above paragraph is, while
> quite wordy, a rather lousy description.

Sorry for the long and badly written paragraph. Assuming I have correctly understood the issue you raised, I will try to state it more concisely in the next submission.

>
>> However, CFQ uses preemption to get a sequential read pattern out of
>> the read requests performed by the second type of processes too. As a
>> consequence, CFQ uses two different mechanisms to achieve the same
>> goal: boosting the throughput with interleaved I/O.
>
> You mean the cfq_rq_close() preemption, right? Hmmm... interesting.
> I'm a bit bothered that we might do this multiple times for the same
> bio/request. e.g. if bio is back merged to an existing request which
> would be the most common bio merge scenario anyway, is it really
> meaningful to retry it for each merge

Unfortunately, the only way to make sure that we never miss a queue-merge opportunity is to check on every merge attempt.

> and then again on submission?

Even if a queue-merge attempt fails in an invocation of the allow_merge_fn hook (i.e., the hook returns false), the request generated from the same bio may still lead to a successful queue merge in the subsequent add_rq_fn.
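
To give a concrete idea, the hook currently looks more or less as
follows (a simplified sketch of the code in this patch, with the
sync/async check omitted):

static int bfq_allow_merge(struct request_queue *q, struct request *rq,
			   struct bio *bio)
{
	struct bfq_data *bfqd = q->elevator->elevator_data;
	struct bfq_io_cq *bic;
	struct bfq_queue *bfqq, *new_bfqq;

	/* Look up the queue the bio would be queued with. */
	bic = bfq_bic_lookup(bfqd, current->io_context);
	if (bic == NULL)
		return 0;

	bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
	if (bfqq != NULL) {
		/* Early queue merge: look for a cooperator right away. */
		new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false);
		if (new_bfqq != NULL) {
			bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
			/* The bio will now be queued in the shared queue. */
			bfqq = new_bfqq;
		}
	}

	/* Allow the bio merge only if bio and rq belong to the same queue. */
	return bfqq == RQ_BFQQ(rq);
}

If this invocation does not result in a bio merge, the request later
generated from the same bio goes through add_rq_fn, where
bfq_setup_cooperator() is invoked again.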

> cfq does it once when allocating the request. That seems a lot more
> reasonable to me. It's doing that once for one start sector. I mean,
> plugging is usually extremely short compared to actual IO service
> time. It's there to mask the latencies between bio issues that the
> same CPU is doing. I can't see how this earliness can be actually
> useful. Do you have results to back this one up? Or is this just
> born out of thin air?
>

Arianna added the early-queue-merge part in the allow_merge_fn hook about one year ago, as a consequence of a throughput loss of about 30% with KVM/QEMU workloads. In particular, we ran most of the tests on a WDC WD6000HLHX-0 VelociRaptor. That HDD might not be available for testing any more, but we can reproduce our results for you on other HDDs, with and without early queue merge. And, maybe through traces, we can show you that the reason for the throughput loss is exactly the one described (in a wordy way) in this patch. Of course, unless we have missed something.

>> +/*
>> + * Must be called with the queue_lock held.
>
> Use lockdep_assert_held() instead?
>

We agree on this and on all your other suggestions/recommendations/corrections, especially on the idea of using per-parent rq position trees. We will apply these changes in the next submission of this patch.

Thanks,
Paolo

>> + */
>> +static int bfqq_process_refs(struct bfq_queue *bfqq)
>> +{
>> + int process_refs, io_refs;
>> +
>> + io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
>> + process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
>> + return process_refs;
>> +}
> ...
>> +static inline sector_t bfq_io_struct_pos(void *io_struct, bool request)
>> +{
>> + if (request)
>> + return blk_rq_pos(io_struct);
>> + else
>> + return ((struct bio *)io_struct)->bi_iter.bi_sector;
>> +}
>> +
>> +static inline sector_t bfq_dist_from(sector_t pos1,
>> + sector_t pos2)
>> +{
>> + if (pos1 >= pos2)
>> + return pos1 - pos2;
>> + else
>> + return pos2 - pos1;
>> +}
>> +
>> +static inline int bfq_rq_close_to_sector(void *io_struct, bool request,
>> + sector_t sector)
>> +{
>> + return bfq_dist_from(bfq_io_struct_pos(io_struct, request), sector) <=
>> + BFQQ_SEEK_THR;
>> +}
>
> You can simply write the following.
>
> abs64(sector0 - sector1) < BFQQ_SEEK_THR;
>
> Note that abs64() works whether the source type is signed or unsigned.

> Also, please don't do "void *" + type switch. If it absolutely has
> to take two different types of pointers, just take two different
> pointers and BUG_ON() if both are set, but here this doesn't seem to
> be the case. Just pass around the actual sectors. This applies to
> all usages of void *io_struct.
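
Indeed, once the actual sectors are passed around, the helper collapses
to something like the following (just a sketch of what we have in mind,
keeping the <= of the current code):

static inline int bfq_rq_close_to_sector(sector_t pos, sector_t sector)
{
	return abs64(pos - sector) <= BFQQ_SEEK_THR;
}

and both bfq_io_struct_pos() and bfq_dist_from() can then be dropped.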

>> +static struct bfq_queue *bfqq_close(struct bfq_data *bfqd, sector_t sector)
>
> bfqq_find_close() or find_close_bfqq()?

>
>> +/*
>> + * bfqd - obvious
>> + * cur_bfqq - passed in so that we don't decide that the current queue
>> + * is closely cooperating with itself
>> + * sector - used as a reference point to search for a close queue
>> + */
>
> If you're gonna do the above, please use proper function comment.
> Please take a look at Documentation/kernel-doc-nano-HOWTO.txt.
>

>> +static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
>> + struct bfq_queue *cur_bfqq,
>> + sector_t sector)
>> +{
>> + struct bfq_queue *bfqq;
>> +
>> + if (bfq_class_idle(cur_bfqq))
>> + return NULL;
>> + if (!bfq_bfqq_sync(cur_bfqq))
>> + return NULL;
>> + if (BFQQ_SEEKY(cur_bfqq))
>> + return NULL;
>
> Why are some of these conditions tested twice? Once here and once in
> the caller? Collect them into one place?
>

> ...
>> + if (bfqq->entity.parent != cur_bfqq->entity.parent)
>> + return NULL;
>
> This is the only place this rq position tree is being used, right?
> Any reason not to use per-parent tree instead?

The only reason could have been to save some memory, but your proposal seems much more efficient, thanks.
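
Concretely, we would move the position tree from struct bfq_data into
the group structure, along the lines of the following sketch (against
the hierarchical-support part of this series):

struct bfq_group {
	...
	/* rq position tree for the queues of this group only */
	struct rb_root rq_pos_tree;
};

With one tree per parent, bfq_close_cooperator() would only ever visit
queues with the same parent, and the explicit check on entity.parent
above could be dropped.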

>
>> + if (bfq_class_rt(bfqq) != bfq_class_rt(cur_bfqq))
>> + return NULL;
>
> Test ioprio_class for equality?

>
>> +/*
>> + * Attempt to schedule a merge of bfqq with the currently in-service queue
>> + * or with a close queue among the scheduled queues.
>> + * Return NULL if no merge was scheduled, a pointer to the shared bfq_queue
>> + * structure otherwise.
>> + */
>> +static struct bfq_queue *
>> +bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>> + void *io_struct, bool request)
>> +{
>> + struct bfq_queue *in_service_bfqq, *new_bfqq;
>> +
>> + if (bfqq->new_bfqq)
>> + return bfqq->new_bfqq;
>> +
>> + if (!io_struct)
>> + return NULL;
>> +
>> + in_service_bfqq = bfqd->in_service_queue;
>> +
>> + if (in_service_bfqq == NULL || in_service_bfqq == bfqq ||
>> + !bfqd->in_service_bic)
>> + goto check_scheduled;
>> +
>> + if (bfq_class_idle(in_service_bfqq) || bfq_class_idle(bfqq))
>> + goto check_scheduled;
>> +
>> + if (bfq_class_rt(in_service_bfqq) != bfq_class_rt(bfqq))
>> + goto check_scheduled;
>> +
>> + if (in_service_bfqq->entity.parent != bfqq->entity.parent)
>> + goto check_scheduled;
>> +
>> + if (bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
>> + bfq_bfqq_sync(in_service_bfqq) && bfq_bfqq_sync(bfqq)) {
>> + new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
>> + if (new_bfqq != NULL)
>> + return new_bfqq; /* Merge with in-service queue */
>> + }
>> +
>> + /*
>> + * Check whether there is a cooperator among currently scheduled
>> + * queues. The only thing we need is that the bio/request is not
>> + * NULL, as we need it to establish whether a cooperator exists.
>> + */
>> +check_scheduled:
>> + new_bfqq = bfq_close_cooperator(bfqd, bfqq,
>> + bfq_io_struct_pos(io_struct, request));
>> + if (new_bfqq)
>> + return bfq_setup_merge(bfqq, new_bfqq);
>
> Why don't in_service_queue and scheduled search share the cooperation
> conditions? They should be the same, right? Shouldn't the function
> be structured like the following instead?
>
> if (bfq_try_close_cooperator(current_one, in_service_one))
> return in_service_one;
>
> found = bfq_find_close_cooperator();
> if (bfq_try_close_cooperator(current_one, found))
> return found;
> return NULL;
>
> Thanks.
>
> --
> tejun


--
Paolo Valente
Algogroup
Dipartimento di Fisica, Informatica e Matematica
Via Campi, 213/B
41125 Modena - Italy
homepage: http://algogroup.unimore.it/people/paolo/


2014-06-03 16:28:50

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 10/12] block, bfq: add Early Queue Merge (EQM)

Hello,

On Mon, Jun 02, 2014 at 11:46:45AM +0200, Paolo Valente wrote:
> > I don't really follow the last part. So, the difference is that
> > cooperating queue setup also takes place during bio merge too, right?
>
> Not only that: in bfq, an actual queue merge is performed in the bio-merge hook.

I think I'm a bit confused because it's named "early" queue merge
while it actually moves queue merging later than cfq - set_request()
happens before bio/rq merging. So, what it tries to do is
compensating for the lack of cfq_rq_close() preemption at request
issue time, right?

> > cfq does it once when allocating the request. That seems a lot more
> > reasonable to me. It's doing that once for one start sector. I mean,
> > plugging is usually extremely short compared to actual IO service
> > time. It's there to mask the latencies between bio issues that the
> > same CPU is doing. I can't see how this earliness can be actually
> > useful. Do you have results to back this one up? Or is this just
> > born out of thin air?
>
> Arianna added the early-queue-merge part in the allow_merge_fn hook
> about one year ago, as a consequence of a throughput loss of about
> 30% with KVM/QEMU workloads. In particular, we ran most of the tests
> on a WDC WD6000HLHX-0 VelociRaptor. That HDD might not be available
> for testing any more, but we can reproduce our results for you on
> other HDDs, with and without early queue merge. And, maybe through
> traces, we can show you that the reason for the throughput loss is
> exactly the one described (in a wordy way) in this patch. Of course,
> unless we have missed something.

Oh, as long as it makes a measurable difference, I have no objection;
however, I do think more explanation and comments would be nice. I
still can't quite understand why retrying on each merge attempt would
make so much difference. Maybe I just failed to understand what you
wrote in the commit message. Is it because the cooperating tasks
issue IOs which grow large and close enough after merges but not on
the first bio issuance? If so, why isn't doing it on rq merge time
enough? Is the timing sensitive enough for certain workloads that
waiting till unplug time misses the opportunity? But plugging should
be relatively short compared to the time actual IOs take, so why would
it be that sensitive? What am I missing here?

Thanks.

--
tejun

2014-06-04 11:48:21

by Paolo Valente

Subject: Re: [PATCH RFC - TAKE TWO - 10/12] block, bfq: add Early Queue Merge (EQM)


On 3 June 2014, at 18:28, Tejun Heo <[email protected]> wrote:

> Hello,
>
> On Mon, Jun 02, 2014 at 11:46:45AM +0200, Paolo Valente wrote:
>>> I don't really follow the last part. So, the difference is that
>>> cooperating queue setup also takes place during bio merge too, right?
>>
>> Not only that: in bfq, an actual queue merge is performed in the bio-merge hook.
>
> I think I'm a bit confused because it's named "early" queue merge
> while it actually moves queue merging later than cfq - set_request()
> happens before bio/rq merging.


There is probably something I am missing here, because, as can be seen in blk-core.c,
around line 1495, elv_set_request() is invoked in the context of the get_request() function,
which in turn is called from blk_queue_bio() *after* attempting both a plug merge
and a merge with one of the requests in the block layer's cache. The first
attempt is lockless and doesn't involve the I/O scheduler, but the
second attempt includes invoking the allow_merge_fn hook of the scheduler
(elv_merge() -> elv_rq_merge_ok() -> elv_iosched_allow_merge()).
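
In other words, the order of events is roughly the following (a
heavily simplified sketch of blk_queue_bio() in the current code):

blk_queue_bio(q, bio)
  blk_attempt_plug_merge(q, bio, ...)  /* lockless, no I/O scheduler */
  elv_merge(q, &req, bio)              /* may invoke allow_merge_fn  */
    -> if the bio is merged into req, we are done: no get_request(),
       and hence no elv_set_request(), for this bio
  get_request(q, rw_flags, bio, ...)   /* only if no merge succeeded */
    elv_set_request(q, rq, bio, ...)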

Furthermore, as far as I know, CFQ does actually merge queues in the
set_request hook, but a cooperator for a queue is searched for (and, if one is
found, the two queues are scheduled to merge) only when the queue expires after
being served (see cfq_select_queue() and the two functions cfq_close_cooperator()
and cfq_setup_merge() that it invokes). If a cooperator is found, it is forcibly
served; however, the actual merge of the two queues happens only at the next
set_request (cfq_merge_cfqqs()).
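
(Schematically, from cfq_select_queue() in the current code:

	new_cfqq = cfq_close_cooperator(cfqd, cfqq);
	if (new_cfqq) {
		if (!cfqq->new_cfqq)
			cfq_setup_merge(cfqq, new_cfqq);
		goto expire;
	}

where the merge itself is then carried out by cfq_merge_cfqqs() at the
next cfq_set_request().)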

In contrast, BFQ both searches for a cooperator and merges the queue with a
newly-found cooperator in the allow_merge hook, which is "earlier" than in CFQ,
as it doesn't need to wait for a queue to be served and expire, and for its
associated process to issue new I/O. Hence the name Early Queue Merge.


> So, what it tries to do is
> compensating for the lack of cfq_rq_close() preemption at request
> issue time, right?
>

Yes: thanks to early merging, there is no need to recover a lost sequential
pattern through preemptions.

>>> cfq does it once when allocating the request. That seems a lot more
>>> reasonable to me. It's doing that once for one start sector. I mean,
>>> plugging is usually extremely short compared to actual IO service
>>> time. It's there to mask the latencies between bio issues that the
>>> same CPU is doing. I can't see how this earliness can be actually
>>> useful. Do you have results to back this one up? Or is this just
>>> born out of thin air?
>>
>> Arianna added the early-queue-merge part in the allow_merge_fn hook
>> about one year ago, as a consequence of a throughput loss of about
>> 30% with KVM/QEMU workloads. In particular, we ran most of the tests
>> on a WDC WD6000HLHX-0 VelociRaptor. That HDD might not be available
>> for testing any more, but we can reproduce our results for you on
>> other HDDs, with and without early queue merge. And, maybe through
>> traces, we can show you that the reason for the throughput loss is
>> exactly the one described (in a wordy way) in this patch. Of course,
>> unless we have missed something.
>
> Oh, as long as it makes a measurable difference, I have no objection;
> however, I do think more explanation and comments would be nice. I
> still can't quite understand why retrying on each merge attempt would
> make so much difference. Maybe I just failed to understand what you
> wrote in the commit message.

If we remember correctly, one of the problems was exactly that a different request
may become the head request of the in-service queue between two rq merge
attempts. If we do not retry on every attempt, we lose the chance
to merge the queue at hand with the in-service queue. The two queues may
then diverge, and hence have no other opportunity to be merged.

> Is it because the cooperating tasks
> issue IOs which grow large and close enough after merges but not on
> the first bio issuance? If so, why isn't doing it on rq merge time
> enough? Is the timing sensitive enough for certain workloads that
> waiting till unplug time misses the opportunity? But plugging should
> be relatively short compared to the time actual IOs take, so why would
> it be that sensitive? What am I missing here?

The problem is not the duration of the plugging, but the fact that, if a request merge
succeeds for a bio, then there will be no set_request invocation for that bio.
Therefore, without early merging, there will be no queue merge at all.

If my replies are correct and convincing, I will then use them to integrate and
hopefully improve the documentation for this patch.

Paolo

>
> Thanks.
>
> --
> tejun

2014-06-04 13:04:53

by Tejun Heo

Subject: Re: [PATCH RFC - TAKE TWO - 10/12] block, bfq: add Early Queue Merge (EQM)

Hello,

On Wed, Jun 04, 2014 at 01:47:36PM +0200, Paolo Valente wrote:
> > I think I'm a bit confused because it's named "early" queue merge
> > while it actually moves queue merging later than cfq - set_request()
> > happens before bio/rq merging.
>
>
> There is probably something I am missing here, because, as can be seen in blk-core.c,
> around line 1495, elv_set_request() is invoked in the context of the get_request() function,
> which in turn is called from blk_queue_bio() *after* attempting both a plug merge
> and a merge with one of the requests in the block layer's cache. The first
> attempt is lockless and doesn't involve the I/O scheduler, but the
> second attempt includes invoking the allow_merge_fn hook of the scheduler
> (elv_merge() -> elv_rq_merge_ok() -> elv_iosched_allow_merge()).

Ah, you're right, set_request doesn't happen if a bio is merged into
an existing request.

> > Oh, as long as it makes a measurable difference, I have no objection;
> > however, I do think more explanation and comments would be nice. I
> > still can't quite understand why retrying on each merge attempt would
> > make so much difference. Maybe I just failed to understand what you
> > wrote in the commit message.
>
> If we remember correctly, one of the problems was exactly that a different request
> may become the head request of the in-service queue between two rq merge
> attempts. If we do not retry on every attempt, we lose the chance
> to merge the queue at hand with the in-service queue. The two queues may
> then diverge, and hence have no other opportunity to be merged.
>
> > Is it because the cooperating tasks
> > issue IOs which grow large and close enough after merges but not on
> > the first bio issuance? If so, why isn't doing it on rq merge time
> > enough? Is the timing sensitive enough for certain workloads that
> > waiting till unplug time misses the opportunity? But plugging should
> > be relatively short compared to the time actual IOs take, so why would
> > it be that sensitive? What am I missing here?
>
> The problem is not the duration of the plugging, but the fact that, if a request merge
> succeeds for a bio, then there will be no set_request invocation for that bio.
> Therefore, without early merging, there will be no queue merge at all.
>
> If my replies are correct and convincing, I will then use them to integrate and
> hopefully improve the documentation for this patch.

Ah, okay, so it's about missing the chance to look for cooperating
queues when merge succeeds. Yeah, that makes a lot more sense to me.
If that's the case, wouldn't it be better to try finding cooperating
queues after each merge success rather than each allow_merge()
invocation? And let's please document what we're catching with the
extra attempts.

Thanks.

--
tejun

2014-06-16 11:23:45

by Paolo Valente

Subject: Re: [PATCH RFC - TAKE TWO - 10/12] block, bfq: add Early Queue Merge (EQM)

Hi,

On 4 June 2014, at 15:04, Tejun Heo <[email protected]> wrote:

> Hello,
>
>> […]

>> The problem is not the duration of the plugging, but the fact that, if a request merge
>> succeeds for a bio, then there will be no set_request invocation for that bio.
>> Therefore, without early merging, there will be no queue merge at all.
>>
>> If my replies are correct and convincing, I will then use them to integrate and
>> hopefully improve the documentation for this patch.
>
> Ah, okay, so it's about missing the chance to look for cooperating
> queues when merge succeeds. Yeah, that makes a lot more sense to me.
> If that's the case, wouldn't it be better to try finding cooperating
> queues after each merge success rather than each allow_merge()
> invocation?

If I have correctly understood your question, then the answer is unfortunately no.

First, without any queue-merge control in allow_merge, a bio-merge attempt would
fail if the destination queue of the bio differs from the queue containing the target rq,
even if the two queues should be merged. With early queue merge, not only are the
two queues merged, but the request-merge attempt also succeeds.

Second, if the queue-merge check were executed only after a successful request merge,
then, in the case of a request-merge failure, the next chance to merge the destination queue
of the bio with some other queue would be in the next add_request. But, according to
what we have already discussed in previous emails, waiting until the add_request is
usually too late with, e.g., the I/O generated by QEMU I/O threads.

In the end, moving queue merging to after successful request merges would most certainly
cause the throughput to drop.


> And let's please document what we're catching with the
> extra attempts.
>

Definitely, thanks,

Paolo

> Thanks.
>
> --
> tejun


--
Paolo Valente
Algogroup
Dipartimento di Fisica, Informatica e Matematica
Via Campi, 213/B
41125 Modena - Italy
homepage: http://algogroup.unimore.it/people/paolo/