Hello Jens,
In SCST (http://scst.sf.net), IO can in some cases be submitted
asynchronously. This is possible for the pass-through backend (i.e. using
scsi_execute_async()) and for BLOCKIO (i.e. using the direct bio interface, see
blockio_exec_rw() in
http://scst.svn.sourceforge.net/viewvc/scst/trunk/scst/src/dev_handlers/scst_vdisk.c?revision=614&view=markup).
For these there is no need to have a per-device pool of threads; one or
more global threads can perfectly well do all the work. But for performance
it is very desirable that all the IO is submitted in a dedicated IO
context for each initiator (i.e. the client that originated it): commands
from initiator 1 are submitted in IO context IOC1, commands from
initiator 2 in IOC2, and so on. Most likely, the same approach would be
very useful for an NFS server as well.
To achieve that, it must be possible to switch the IO context of a
thread on the fly. I tried to implement that (see the attached patch),
but hit BUG_ON(!cic->dead_key) in cic_free_func() when the session for
the initiator owning the corresponding IO context was being destroyed by
scst_free_tgt_dev(). At that point it was guaranteed that there was no
outstanding IO in this IO context.
So I had to fall back to a more defensive approach, which is what is
currently implemented: each pool of threads, including the threads for
asynchronous IO, has its own dedicated IO context.
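To make the intended on-the-fly switching more concrete, here is a rough
sketch of the idea (not the actual attached patch). It is written against
the 2.6.28-era block layer internals; the structure and helper names are
mine, the nr_tasks/refcount bookkeeping a real implementation needs is
omitted, and the signatures of alloc_io_context() and friends may differ
in other kernel versions:

/*
 * Minimal sketch of per-initiator IO context switching, illustration
 * only.  alloc_io_context() and the current->io_context field exist in
 * this kernel generation, but reference counting and nr_tasks
 * bookkeeping are deliberately left out.
 */
#include <linux/blkdev.h>
#include <linux/iocontext.h>
#include <linux/sched.h>

struct scst_ioc_tgt_dev {		/* hypothetical: one per initiator */
	struct io_context *ioc;
};

static int scst_ioc_init(struct scst_ioc_tgt_dev *td)
{
	td->ioc = alloc_io_context(GFP_KERNEL, -1);
	return td->ioc != NULL ? 0 : -ENOMEM;
}

/*
 * Run the submission callback with the initiator's IO context
 * temporarily installed in the current thread, so that CFQ accounts
 * the resulting bios to that initiator.
 */
static void scst_submit_in_ioc(struct scst_ioc_tgt_dev *td,
			       void (*submit)(void *), void *arg)
{
	struct io_context *saved;

	task_lock(current);
	saved = current->io_context;
	current->io_context = td->ioc;
	task_unlock(current);

	submit(arg);

	task_lock(current);
	current->io_context = saved;
	task_unlock(current);
}

blockio_exec_rw() and scsi_execute_async() submissions would then be
wrapped in scst_submit_in_ioc() with the IO context of the command's
initiator.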
Could you please advise what was going wrong? What should I do to
achieve the desired behavior?
Thanks,
Vlad
Fabio Checconi, on 12/16/2008 11:22 AM wrote:
>> In SCST (http://scst.sf.net), IO can in some cases be submitted
>> asynchronously. This is possible for the pass-through backend (i.e. using
>> scsi_execute_async()) and for BLOCKIO (i.e. using the direct bio interface, see
>> blockio_exec_rw() in
>> http://scst.svn.sourceforge.net/viewvc/scst/trunk/scst/src/dev_handlers/scst_vdisk.c?revision=614&view=markup).
>> For these there is no need to have a per-device pool of threads; one or
>> more global threads can perfectly well do all the work. But for performance
>> it is very desirable that all the IO is submitted in a dedicated IO
>> context for each initiator (i.e. the client that originated it): commands
>> from initiator 1 are submitted in IO context IOC1, commands from
>> initiator 2 in IOC2, and so on. Most likely, the same approach would be
>> very useful for an NFS server as well.
>>
>> To achieve that, it must be possible to switch the IO context of a
>> thread on the fly. I tried to implement that (see the attached patch),
>> but hit BUG_ON(!cic->dead_key) in cic_free_func() when the session for
>> the initiator owning the corresponding IO context was being destroyed by
>> scst_free_tgt_dev(). At that point it was guaranteed that there was no
>> outstanding IO in this IO context.
>>
>> So I had to fall back to a more defensive approach, which is what is
>> currently implemented: each pool of threads, including the threads for
>> asynchronous IO, has its own dedicated IO context.
>>
>> Could you please advise what was going wrong? What should I do to
>> achieve the desired behavior?
>
> I think the problem may be that cfq expects cfq_exit_io_context()
> to be called before the last reference to an io context is put.
> Since cfq_exit_io_context() is called during process exit, and AFAICT
> you are not calling exit_io_context() on the given ioc, cfq finds it
> in an incorrect state.
With your hint I figured out that put_io_context() alone isn't
sufficient: I should call exit_io_context() instead of the last
put_io_context(). Thanks!
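For the record, the teardown ordering that avoids the BUG_ON would then
look roughly like this (illustrative only; scst_drop_initiator_ioc() is
a made-up name, and exit_io_context() takes different arguments in
later kernels):

/*
 * Illustrative only, not the real SCST code.  In the ~2.6.28 kernel
 * exit_io_context() takes no argument: it detaches current->io_context
 * and, once the last attached task is gone, runs CFQ's exit hook for
 * each cfq_io_context (which is what sets cic->dead_key) before
 * dropping the reference.  Dropping the last reference with a bare
 * put_io_context() skips that hook, so cic_free_func() later hits
 * BUG_ON(!cic->dead_key).
 */
#include <linux/blkdev.h>
#include <linux/sched.h>

static void scst_drop_initiator_ioc(struct io_context *ioc)
{
	/* Re-attach the context so exit_io_context() operates on it. */
	task_lock(current);
	current->io_context = ioc;
	task_unlock(current);

	/* Runs the CFQ exit hook and drops the reference (assuming no
	 * other task still has this context attached). */
	exit_io_context();
}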
> I haven't seen the rest of the code, so I may be wrong, but I suppose
> that a better approach would be to use CLONE_IO to share io contexts,
> if possible.
Unfortunately, it would be very non-optimal. As it is known, to achieve
the best performance with async. IO, it should be submitted by a limited
number of threads <= CPU count. So, the only way to submit IO from each
of, e.g. 100, clients in a dedicated per-client IO context is to
dynamically switch io_context of the current threads to io_context of
the client before IO submission.
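To illustrate the numbers, such a pool could be dimensioned like this
(everything except kthread_run(), kthread_should_stop(),
num_online_cpus() and msleep_interruptible() is a hypothetical name;
scst_submit_in_ioc() is the helper sketched in my first mail):

/*
 * Purely illustrative: a submission pool sized to the CPU count, where
 * every queued command is submitted in its client's IO context.
 */
#include <linux/kthread.h>
#include <linux/cpumask.h>
#include <linux/delay.h>

struct scst_ioc_tgt_dev;				/* hypothetical */
struct scst_queued_cmd;					/* hypothetical */
struct scst_queued_cmd *scst_next_queued_cmd(void);	/* hypothetical */
struct scst_ioc_tgt_dev *scst_cmd_tgt_dev(struct scst_queued_cmd *cmd);
void scst_blockio_submit(void *cmd);			/* hypothetical */
void scst_submit_in_ioc(struct scst_ioc_tgt_dev *td,
			void (*submit)(void *), void *arg);

static int scst_async_submit_thread(void *unused)
{
	while (!kthread_should_stop()) {
		struct scst_queued_cmd *cmd = scst_next_queued_cmd();

		if (!cmd) {
			msleep_interruptible(10);
			continue;
		}
		scst_submit_in_ioc(scst_cmd_tgt_dev(cmd),
				   scst_blockio_submit, cmd);
	}
	return 0;
}

static void scst_start_async_submitters(void)
{
	int i;

	/* More threads than CPUs buys nothing for async submission. */
	for (i = 0; i < num_online_cpus(); i++)
		kthread_run(scst_async_submit_thread, NULL, "scst_async/%d", i);
}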
Vlad
On Wed, Dec 17 2008, Vladislav Bolkhovitin wrote:
> >I haven't seen the rest of the code, so I may be wrong, but I suppose
> >that a better approach would be to use CLONE_IO to share io contexts,
> >if possible.
>
> Unfortunately, that would be far from optimal. To achieve the best
> performance with asynchronous IO, it should be submitted by a limited
> number of threads, no more than the CPU count. So the only way to submit
> IO from each of, say, 100 clients in a dedicated per-client IO context is
> to dynamically switch the io_context of the submitting threads to the
> io_context of the client before submission.
There's also likely to be another use of exactly the same type of thing
- the acall patches from Zach. At least my vision of the punt-to-thread
approach would be very similar: grab an available thread and attach it
to a given IO context.
So while I did mention exactly what Fabio outlines in my initial mail on
this, a generic way to attach/detach IO contexts from processes/threads
would be useful outside of this project. nfsd comes to mind as well.
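Purely as a strawman, such a facility could have a shape like this
(neither helper exists today; the names and semantics are invented just
for discussion):

/*
 * Strawman only: a generic way to hand an io_context to a worker
 * thread (an acall punt thread, an nfsd thread, an SCST submitter) for
 * the duration of one submission.
 */
struct io_context;
struct task_struct;

/* Attach @ioc to @task, taking a reference and saving the old context. */
int  ioc_attach_task(struct io_context *ioc, struct task_struct *task);

/* Restore @task's previous context and drop the reference taken above. */
void ioc_detach_task(struct task_struct *task);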
--
Jens Axboe
Jens Axboe, on 12/18/2008 04:40 PM wrote:
> On Wed, Dec 17 2008, Vladislav Bolkhovitin wrote:
>>> I haven't seen the rest of the code, so I may be wrong, but I suppose
>>> that a better approach would be to use CLONE_IO to share io contexts,
>>> if possible.
>> Unfortunately, that would be far from optimal. To achieve the best
>> performance with asynchronous IO, it should be submitted by a limited
>> number of threads, no more than the CPU count. So the only way to submit
>> IO from each of, say, 100 clients in a dedicated per-client IO context is
>> to dynamically switch the io_context of the submitting threads to the
>> io_context of the client before submission.
>
> There's also likely to be another use of exactly the same type of thing
> - the acall patches from Zach. At least my vision of the punt-to-thread
> approach would be very similar: grab an available thread and attach it
> to a given IO context.
>
> So while I did mention exactly what Fabio outlines in my initial mail on
> this, a generic way to attach/detach IO contexts from processes/threads
> would be useful outside of this project. nfsd comes to mind as well.
I fully agree. Looking forward to using that facility in 2.6.29! (I
guess from your reference to Zach's patches that the work is already
being done; if I'm wrong and you need my participation in implementing
it, just let me know.)
Thanks,
Vlad