2008-07-03 20:46:46

by J. Bruce Fields

Subject: Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27

On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
> Hi Trond-
>
> Seven patches that implement kernel RPC service registration via rpcbind v4.
> This allows the kernel to advertise IPv4-only services on hosts with IPv6
> addresses, for example.

This is Trond's bailiwick, but I read through all 7 quickly and they
looked good to me....

--b.

>
> ---
>
> Chuck Lever (7):
> SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
> SUNRPC: Quickly detect missing portmapper during RPC service registration
> SUNRPC: introduce new rpc_task flag that fails requests on xprt disconnect
> SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
> SUNRPC: None of rpcb_create's callers wants a privileged source port
> SUNRPC: Introduce a specific rpcb_create for contacting localhost
> SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET
>
>
> include/linux/sunrpc/clnt.h | 3
> include/linux/sunrpc/sched.h | 2
> net/sunrpc/clnt.c | 2
> net/sunrpc/rpcb_clnt.c | 290 +++++++++++++++++++++++++++++++++++++-----
> 4 files changed, 263 insertions(+), 34 deletions(-)
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com


2008-07-07 18:20:29

by Myklebust, Trond

Subject: Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27

On Thu, 2008-07-03 at 16:45 -0400, J. Bruce Fields wrote:
> On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
> > Hi Trond-
> >
> > Seven patches that implement kernel RPC service registration via rpcbind v4.
> > This allows the kernel to advertise IPv4-only services on hosts with IPv6
> > addresses, for example.
>
> This is Trond's bailiwick, but I read through all 7 quickly and they
> looked good to me....

They look more or less OK to me too, however I'm a bit unhappy about the
RPC_TASK_ONESHOT name: it isn't at all descriptive.

I also have questions about the change to a TCP socket here. Why not
just implement connected UDP sockets?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2008-07-07 18:46:07

by Chuck Lever

Subject: Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27

On Jul 7, 2008, at 2:20 PM, Trond Myklebust wrote:
> On Thu, 2008-07-03 at 16:45 -0400, J. Bruce Fields wrote:
>> On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
>>> Hi Trond-
>>>
>>> Seven patches that implement kernel RPC service registration via
>>> rpcbind v4.
>>> This allows the kernel to advertise IPv4-only services on hosts
>>> with IPv6
>>> addresses, for example.
>>
>> This is Trond's bailiwick, but I read through all 7 quickly and they
>> looked good to me....
>
> They look more or less OK to me too, however I'm a bit unhappy about
> the
> RPC_TASK_ONESHOT name: it isn't at all descriptive.

Open to suggestions. I thought RPC_TASK_FAIL_WITHOUT_CONNECTION was a
bit wordy ;-)

> I also have questions about the change to a TCP socket here. Why not
> just implement connected UDP sockets?

Changing rpcb_register() to use a TCP socket is less work overall, and
we get a positive handshake between the kernel and user space when
the TCP connection is opened.

Other services might also want to use TCP+ONESHOT for several short
requests over a real network with actual packet loss, but they might
find CUDP+ONESHOT less practical/reliable (or even forbidden in the
case of NFSv4). So we would end up with something of a one-off
implementation for rpcb_register.

The downside of using TCP in this case is that it's more overhead: eight
packets instead of two for registration in the common case, and it
leaves a single privileged port in TIME_WAIT for each registered
service. I don't think this matters much as registration happens
quite infrequently.
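
For reference, the registration path I have in mind looks roughly like
the sketch below.  It is only a sketch: the helper names, the literal
port, and the use of RPC_TASK_ONESHOT as the flag are placeholders, not
the exact code in the series.

/*
 * Sketch only -- not the patch itself.  Create an RPC client over TCP
 * to the local rpcbind, then issue the SET/UNSET call as a "fail if
 * the transport disconnects" request so a missing portmapper is
 * detected quickly instead of being retried forever.
 */
#include <linux/in.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/sched.h>
#include <linux/sunrpc/xprtsock.h>

static struct rpc_clnt *rpcb_create_local_sketch(void)
{
	struct sockaddr_in sin = {
		.sin_family       = AF_INET,
		.sin_addr.s_addr  = htonl(INADDR_LOOPBACK),
		.sin_port         = htons(111),      /* rpcbind */
	};
	struct rpc_create_args args = {
		.protocol    = XPRT_TRANSPORT_TCP,
		.address     = (struct sockaddr *)&sin,
		.addrsize    = sizeof(sin),
		.servername  = "localhost",
		.program     = &rpcb_program,    /* lives in rpcb_clnt.c */
		.version     = 4,                /* rpcbind version 4 */
		.authflavor  = RPC_AUTH_UNIX,
		.flags       = RPC_CLNT_CREATE_NOPING,
	};

	return rpc_create(&args);
}

/* The SET/UNSET call itself then runs as a one-shot request: */
static int rpcb_register_call_sketch(struct rpc_clnt *clnt,
				     struct rpc_message *msg)
{
	return rpc_call_sync(clnt, msg, RPC_TASK_ONESHOT);
}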

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-07-07 18:51:35

by Myklebust, Trond

Subject: Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27

On Mon, 2008-07-07 at 14:43 -0400, Chuck Lever wrote:
> On Jul 7, 2008, at 2:20 PM, Trond Myklebust wrote:
> > On Thu, 2008-07-03 at 16:45 -0400, J. Bruce Fields wrote:
> >> On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
> >>> Hi Trond-
> >>>
> >>> Seven patches that implement kernel RPC service registration via
> >>> rpcbind v4.
> >>> This allows the kernel to advertise IPv4-only services on hosts
> >>> with IPv6
> >>> addresses, for example.
> >>
> >> This is Trond's bailiwick, but I read through all 7 quickly and they
> >> looked good to me....
> >
> > They look more or less OK to me too, however I'm a bit unhappy about
> > the
> > RPC_TASK_ONESHOT name: it isn't at all descriptive.
>
> Open to suggestions. I thought RPC_TASK_FAIL_WITHOUT_CONNECTION was a
> bit wordy ;-)

RPC_TASK_CONNECT_ONCE ?

> > I also have questions about the change to a TCP socket here. Why not
> > just implement connected UDP sockets?
>
> Changing rpcb_register() to use a TCP socket is less work overall, and
> we get a positive hand shake between the kernel and user space when
> the TCP connection is opened.
>
> Other services might also want to use TCP+ONESHOT for several short
> requests over a real network with actual packet loss, but they might
> find CUDP+ONESHOT less practical/reliable (or even forbidden in the
> case of NFSv4). So we would end up with something of a one-off
> implementation for rpcb_register.

I don't see what that has to do with anything: the connection failed
codepath in call_connect_status() should be the same in both the TCP and
the UDP case.

> The downside of using TCP in this case is that it's more overhead: 8
> packets instead of two for registration in the common case, and it
> leaves a single privileged port in TIME_WAIT for each registered
> service. I don't think this matters much as registration happens
> quite infrequently.

The problem is that registration usually happens at boot time, which is
also when most of the NFS 'mount' requests will be eating privileged
ports.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2008-07-07 19:44:23

by Chuck Lever

Subject: Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27

On Mon, Jul 7, 2008 at 2:51 PM, Trond Myklebust
<[email protected]> wrote:
> On Mon, 2008-07-07 at 14:43 -0400, Chuck Lever wrote:
>> On Jul 7, 2008, at 2:20 PM, Trond Myklebust wrote:
>> > On Thu, 2008-07-03 at 16:45 -0400, J. Bruce Fields wrote:
>> >> On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
>> >>> Hi Trond-
>> >>>
>> >>> Seven patches that implement kernel RPC service registration via
>> >>> rpcbind v4.
>> >>> This allows the kernel to advertise IPv4-only services on hosts
>> >>> with IPv6
>> >>> addresses, for example.
>> >>
>> >> This is Trond's bailiwick, but I read through all 7 quickly and they
>> >> looked good to me....
>> >
>> > They look more or less OK to me too, however I'm a bit unhappy about
>> > the
>> > RPC_TASK_ONESHOT name: it isn't at all descriptive.
>>
>> Open to suggestions. I thought RPC_TASK_FAIL_WITHOUT_CONNECTION was a
>> bit wordy ;-)
>
> RPC_TASK_CONNECT_ONCE ?

That's not the semantics I was really going for. FAIL_ON_CONNRESET is
probably closer.
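
In other words, the behaviour I'm after is roughly the following
(sketch only; the helper and the flag name are placeholders for
whatever we settle on):

/*
 * Sketch of the intended semantics, not the actual patch.  When the
 * transport cannot connect, or the connection is reset underneath us,
 * a flagged task reports the error to its caller immediately instead
 * of letting the usual retry logic run.  Something like this would be
 * called from the connect/receive error paths in net/sunrpc/clnt.c.
 */
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/sunrpc/sched.h>

static bool rpc_task_fails_on_disconnect(struct rpc_task *task, int status)
{
	switch (status) {
	case -ECONNREFUSED:
	case -ECONNRESET:
	case -ENOTCONN:
	case -EPIPE:
		if (task->tk_flags & RPC_TASK_ONESHOT) {
			rpc_exit(task, status);
			return true;    /* task is finished */
		}
	}
	return false;                   /* normal retry behaviour */
}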

>> > I also have questions about the change to a TCP socket here. Why not
>> > just implement connected UDP sockets?
>>
>> Changing rpcb_register() to use a TCP socket is less work overall, and
>> we get a positive handshake between the kernel and user space when
>> the TCP connection is opened.
>>
>> Other services might also want to use TCP+ONESHOT for several short
>> requests over a real network with actual packet loss, but they might
>> find CUDP+ONESHOT less practical/reliable (or even forbidden in the
>> case of NFSv4). So we would end up with something of a one-off
>> implementation for rpcb_register.
>
> I don't see what that has to do with anything: the connection failed
> codepath in call_connect_status() should be the same in both the TCP and
> the UDP case.

If you would like connected UDP, I won't object to you implementing
it. However, I never tested whether a connected UDP socket will give
the desired semantics without extra code in the UDP transport (for
example, an ->sk_error callback). I don't think it's worth the hassle
if we have to add code to UDP that only this tiny use case would need.
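
To be concrete, by "extra code" I mean something along these lines
(purely illustrative and untested; none of this is in the posted
series): hook the error callback of the connected UDP socket and push
ICMP-generated errors up to the waiting RPC task.

/*
 * Illustrative sketch only.  With a connected UDP socket an ICMP
 * port-unreachable is delivered as a socket error; something like
 * this in net/sunrpc/xprtsock.c would be needed to turn it into an
 * RPC-level failure (the real thing would also have to take
 * sk->sk_callback_lock).
 */
#include <net/sock.h>
#include <linux/sunrpc/xprt.h>

static void xs_udp_error_report_sketch(struct sock *sk)
{
	struct rpc_xprt *xprt = sk->sk_user_data;  /* set by the transport */
	int err = sock_error(sk);                  /* e.g. -ECONNREFUSED */

	if (xprt && err)
		xprt_wake_pending_tasks(xprt, err);
}

/* ...and during socket setup:
 *	sk->sk_error_report = xs_udp_error_report_sketch;
 */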

>> The downside of using TCP in this case is that it's more overhead: eight
>> packets instead of two for registration in the common case, and it
>> leaves a single privileged port in TIME_WAIT for each registered
>> service. I don't think this matters much as registration happens
>> quite infrequently.
>
> The problem is that registration usually happens at boot time, which is
> also when most of the NFS 'mount' requests will be eating privileged
> ports.

You're talking about the difference between supporting, say, 1358 mounts
at boot time versus 1357.

In most cases, a client with hundreds of mounts will use up exactly
one extra privileged TCP port to register NLM during the first
lockd_up() call. If these are all NFSv4 mounts, it will use exactly
zero extra ports, since the NFSv4 callback service is not even
registered.

Considering that _each_ mount operation can take between two and five
privileged ports, while registering both NFSD and NLM would take
exactly two ports at boot time, I think registration is the wrong
place to optimize.

--
Chuck Lever