2008-01-14 22:49:50

by Chuck Lever III

Subject: Comments on some of the work in NFS_ALL for 2.6.25

Hi Trond-

A couple of initial comments on the RPC client API cleanups in the
latest NFS_ALL.

First, I'm a little uncomfortable with the new rpc_call_start()
function. Seems like that's a detail for the RPC client to manage,
not the ULPs (nor the developers of ULPs who want to use the RPC
client). I think it would be cleaner (and less error prone) if the
RPC client set up tk_action entirely automatically rather than
requiring ULPs to know when to invoke rpc_call_start().

Second, I see that you have created a new rpc_run_task() function
that replaces rpc_init_task + rpc_execute. It manages the signal
mask appropriately. However, in the NFS direct I/O path, we still
have a pair of rpc_clnt_sigmask() calls in nfs_direct_read() and
nfs_direct_write() that wrap all the RPC dispatching done in the
lower layers.
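
For reference, my understanding of the new calling convention is
roughly the following (the rpc_task_setup field names, the
nfs_read_direct_ops/data/ctx names and so on are just illustrative
from my reading of the patches, so the details may be off):

	struct rpc_message msg = {
		.rpc_argp	= &data->args,
		.rpc_resp	= &data->res,
		.rpc_cred	= ctx->cred,
	};
	struct rpc_task_setup task_setup_data = {
		.rpc_client	= NFS_CLIENT(inode),
		.rpc_message	= &msg,
		.callback_ops	= &nfs_read_direct_ops,
		.callback_data	= data,
		.flags		= RPC_TASK_ASYNC,
	};
	struct rpc_task *task;

	/* single entry point: replaces the old rpc_init_task() +
	 * rpc_execute() pair, and manages the signal mask internally */
	task = rpc_run_task(&task_setup_data);
	if (IS_ERR(task))
		return PTR_ERR(task);
	rpc_put_task(task);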

I had carefully optimized the direct path so there was only a single
pair of sigmask calls for each application I/O request, rather than
one pair for every dispatched RPC. With the rpc_run_task interface,
we're now back to doing one pair for each RPC. That seems like a
step backwards to me...?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2008-01-14 23:11:00

by Trond Myklebust

Subject: Re: Comments on some of the work in NFS_ALL for 2.6.25


On Mon, 2008-01-14 at 17:47 -0500, Chuck Lever wrote:
> Hi Trond-
>
> A couple of initial comments on the RPC client API cleanups in the
> latest NFS_ALL.
>
> First, I'm a little uncomfortable with the new rpc_call_start()
> function. Seems like that's a detail for the RPC client to manage,
> not the ULPs (nor the developers of ULPs who want to use the RPC
> client). I think it would be cleaner (and less error prone) if the
> RPC client set up tk_action entirely automatically rather than
> requiring ULPs to know when to invoke rpc_call_start().

You cannot avoid that. There are plenty of NFSv4 prepare_task()
callbacks that require you to loop until you have a sequence lock.
Previously, if you had a prepare_task() callback, then you had to call
rpc_call_setup(), which essentially did the same thing as
rpc_call_start() plus setting up the rpc_message. rpc_call_start()
simplifies things in that it allows the caller to set up the
rpc_message before calling rpc_execute().
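
Roughly, a prepare callback now ends up looking like this (heavily
simplified from the nfs4 open path, with the message setup elided):

	static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
	{
		struct nfs4_opendata *data = calldata;

		/* keep looping (i.e. returning) until we own the
		 * open_owner sequence lock */
		if (nfs_wait_on_sequence(data->o_arg.seqid, task) != 0)
			return;

		/* ... adjust the open arguments / rpc_message here ... */

		/* hand the task over to the normal call_start path */
		rpc_call_start(task);
	}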

> Second, I see that you have created a new rpc_run_task() function
> that replaces rpc_init_task + rpc_execute. It manages the signal
> mask appropriately. However, in the NFS direct I/O path, we still
> have a pair of rpc_clnt_sigmask() calls in nfs_direct_read() and
> nfs_direct_write() that wrap all the RPC dispatching done in the
> lower layers.

Yup. We can get rid of those. The whole idea here is that we should have
_one_ entry point to the RPC code instead of the 5 or 6 different ways
of setting up an rpc_task that we had previously.

IOW: I don't want to have to code up 5 or 6 different ways of blocking
RPC calls when we get the migration/replication code up and running.

> I had carefully optimized the direct path so there was only a single
> pair of sigmask calls for each application I/O request, rather than
> one pair for every dispatched RPC. With the rpc_run_task interface,
> we're now back to doing one pair for each RPC. That seems like a
> step backwards to me...?

AFAICS, nfs_execute_read() dispatches only _one_ RPC call. Ditto for
nfs_execute_write(). Those are the critical code paths...

Trond


2008-01-14 23:26:06

by Trond Myklebust

Subject: Re: Comments on some of the work in NFS_ALL for 2.6.25


On Mon, 2008-01-14 at 18:10 -0500, Trond Myklebust wrote:

> > I had carefully optimized the direct path so there was only a single
> > pair of sigmask calls for each application I/O request, rather than
> > one pair for every dispatched RPC. With the rpc_run_task interface,
> > we're now back to doing one pair for each RPC. That seems like a
> > step backwards to me...?
>
> AFAICS, nfs_execute_read() dispatches only _one_ RPC call. Ditto for
> nfs_execute_write(). Those are the critical code paths...

In fact, if you are interested in optimising away sigmask calls, then
the first target should AFAICS be asynchronous RPC calls. Why set the
sigmask at all when doing reads and writes?

Cheers,
Trond


2008-01-15 17:44:55

by Chuck Lever III

Subject: Re: Comments on some of the work in NFS_ALL for 2.6.25

Howdy Trond-

On Jan 14, 2008, at 6:10 PM, Trond Myklebust wrote:
> On Mon, 2008-01-14 at 17:47 -0500, Chuck Lever wrote:
>> Hi Trond-
>>
>> A couple of initial comments on the RPC client API cleanups in the
>> latest NFS_ALL.
>>
>> First, I'm a little uncomfortable with the new rpc_call_start()
>> function. Seems like that's a detail for the RPC client to manage,
>> not the ULPs (nor the developers of ULPs who want to use the RPC
>> client). I think it would be cleaner (and less error prone) if the
>> RPC client set up tk_action entirely automatically rather than
>> requiring ULPs to know when to invoke rpc_call_start().
>
> You cannot avoid that. There are plenty of NFSv4 prepare_task()
> callbacks that require you to loop until you have a sequence lock.
> Previously, if you had a prepare_task() callback, then you had to call
> rpc_call_setup(), which essentially did the same thing as
> rpc_call_start() plus setting up the rpc_message. rpc_call_start()
> simplifies things in that it allows the caller to set up the
> rpc_message before calling rpc_execute().

Of course, I need to review all of this more carefully; my comments
are just a first impression. From what I can tell, the setting of
tk_action gates whether the RPC is actually dispatched, or just
binned, when you call rpc_run_task(). If that's so, it would be
easier to understand if that logic were coded explicitly in the ULP
rather than via a side effect (the tk_action setting).

As an aside, I see you have also open-coded "task->tk_action = NULL;"
in the NFSv4 procs. If you must make "rpc_call_start" an exported
function, it would be cleaner API design to introduce a matching
function to clear task->tk_action as well. Something like
rpc_arm_task() to set tk_action to call_start and rpc_disarm_task()
to set tk_action to NULL.

But honestly I think hiding all of this entirely from ULPs provides a
better and simpler RPC client interface. Basically ULPs are
puttering around in RPC client internal data structures with the
current design.

>> Second, I see that you have created a new rpc_run_task() function
>> that replaces rpc_init_task + rpc_execute. It manages the signal
>> mask appropriately. However, in the NFS direct I/O path, we still
>> have a pair of rpc_clnt_sigmask() calls in nfs_direct_read() and
>> nfs_direct_write() that wrap all the RPC dispatching done in the
>> lower layers.
>
> Yup. We can get rid of those.

As I understand it, we want to prevent a signal from interrupting the
dispatcher until all the RPCs have been sent. Otherwise, we are
inviting fractured reads and writes. This is an important feature of
the direct I/O engine.

> The whole idea here is that we should have
> _one_ entry point to the RPC code instead of the 5 or 6 different ways
> of setting up an rpc_task that we had previously.
>
> IOW: I don't want to have to code up 5 or 6 different ways of blocking
> RPC calls when we get the migration/replication code up and running.

Ah. That's a reasonable goal, but unfortunately we don't have the
context provided by nice cover letters when posting these jumbo
change sets via git. It would be nice if that objective were stated
in at least one of the patch descriptions.

>> I had carefully optimized the direct path so there was only a single
>> pair of sigmask calls for each application I/O request, rather than
>> one pair for every dispatched RPC. With the rpc_run_task interface,
>> we're now back to doing one pair for each RPC. That seems like a
>> step backwards to me...?
>
> AFAICS, nfs_execute_read() dispatches only _one_ RPC call. Ditto for
> nfs_execute_write(). Those are the critical code paths...


Those are used only in the cached path.

> In fact, if you are interested in optimising away sigmask calls, then
> the first target should AFAICS be asynchronous RPC calls. Why set the
> sigmask at all when doing reads and writes?


Yes, but not at the expense of potentially fracturing writes.

I'll try to stop by CITI tomorrow morning, if you'll be around, to
discuss it.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-01-15 18:47:20

by Trond Myklebust

Subject: Re: Comments on some of the work in NFS_ALL for 2.6.25


On Tue, 2008-01-15 at 12:42 -0500, Chuck Lever wrote:
> Howdy Trond-

> As an aside, I see you have also open-coded "task->tk_action = NULL;"
> in the NFSv4 procs. If you must make "rpc_call_start" an exported
> function, it would be cleaner API design to introduce a matching
> function to clear task->tk_action as well. Something like
> rpc_arm_task() to set tk_action to call_start and rpc_disarm_task()
> to set tk_action to NULL.

I can add an rpc_call_abort() to wrap the tk_action clear, but frankly
there is a big difference here: rpc_call_start() is there in order to
hide the private function call_start(). It is not there in order to
hide task->tk_action...
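
IOW something like this (the rpc_call_abort() half is just the
proposal above, it is not in the tree):

	void rpc_call_start(struct rpc_task *task)
	{
		task->tk_action = call_start;
	}
	EXPORT_SYMBOL_GPL(rpc_call_start);

	/* proposed: give up before anything has been transmitted */
	void rpc_call_abort(struct rpc_task *task)
	{
		task->tk_action = NULL;
	}
	EXPORT_SYMBOL_GPL(rpc_call_abort);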

> But honestly I think hiding all of this entirely from ULPs provides a
> better and simpler RPC client interface. Basically ULPs are
> puttering around in RPC client internal data structures with the
> current design.

To which internal data structures are you referring?

> >> Second, I see that you have created a new rpc_run_task() function
> >> that replaces rpc_init_task + rpc_execute. It manages the signal
> >> mask appropriately. However, in the NFS direct I/O path, we still
> >> have a pair of rpc_clnt_sigmask() calls in nfs_direct_read() and
> >> nfs_direct_write() that wrap all the RPC dispatching done in the
> >> lower layers.
> >
> > Yup. We can get rid of those.
>
> As I understand it, we want to prevent a signal from interrupting the
> dispatcher until all the RPCs have been sent. Otherwise, we are
> inviting fractured reads and writes. This is an important feature of
> the direct I/O engine.

We only need to wrap the interruptible waits. I've got a patch that
moves the rpc_sigmask call into nfs_direct_wait(). If you add that to
the patch that gets rid of sigmask for the asynchronous rpc task case,
then that gets rid of them altogether for the aio/dio path.
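
Roughly (just a sketch of the idea, not the actual patch; field names
as in the current fs/nfs/direct.c, give or take):

	static ssize_t nfs_direct_wait(struct nfs_direct_req *dreq)
	{
		ssize_t result = -EIOCBQUEUED;
		sigset_t oldset;

		/* async iocbs never block here */
		if (dreq->iocb)
			goto out;

		/* block signals around the interruptible wait only,
		 * not around the loop that dispatches the rpc calls */
		rpc_clnt_sigmask(NFS_CLIENT(dreq->inode), &oldset);
		result = wait_for_completion_interruptible(&dreq->completion);
		rpc_clnt_sigunmask(NFS_CLIENT(dreq->inode), &oldset);

		if (!result)
			result = dreq->error;
		if (!result)
			result = dreq->count;
	out:
		return result;
	}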

> > In fact, if you are interested in optimising away sigmask calls, then
> > the first target should AFAICS be asynchronous RPC calls. Why set the
> > sigmask at all when doing reads and writes?
>
>
> Yes, but not at the expense of potentially fracturing writes.

I fail to see how this could occur. There are no interruptible waits or
any other checks for signals in the loop that schedules the write rpcs.

Trond