2007-05-16 19:18:27

by Greg Banks

Subject: [RFC,PATCH 0/14] A transport switch for knfsd

G'day,

These 14 patches are an experimental transport switch for knfsd.
They're based on Tom Tucker's 01-svc-xprt-switch.patch from the
nfsrdma project November release, but redesigned to provide as simple
and clean an abstraction as possible to new transport-specific code.
Various messy details of flags, reference counts and other behaviour
which are currently redundantly handled in both TCP and UDP code,
will be handled in generic code now. This makes the task of writing
new transport code easier and less prone to breakage.

These patches have received light testing in a tree similar to 2.6.19
on ia64, and been forward ported without further testing to 2.6.21.
They're intended for testing in Tom's development tree, but are cc'ed
to the Linux mailing list for initial review.

[RFC,PATCH 1/14] knfsd: add transport ops
[RFC,PATCH 2/14] knfsd: delete per transport
[RFC,PATCH 3/14] knfsd: prepare reply per transport
[RFC,PATCH 4/14] knfsd: has_wspace per transport
[RFC,PATCH 5/14] knfsd: max_payload per transport
[RFC,PATCH 6/14] knfsd: add svc_sock_is_connection
[RFC,PATCH 7/14] knfsd: export svc_sock_enqueue, svc_sock_received
[RFC,PATCH 8/14] knfsd: centralise SK_CLOSE handling
[RFC,PATCH 9/14] knfsd: centralise SK_CONN handling
[RFC,PATCH 10/14] knfsd: add SK_LISTENER
[RFC,PATCH 11/14] knfsd: centralise SK_DATA handling
[RFC,PATCH 12/14] knfsd: add svc_sock_get
[RFC,PATCH 13/14] knfsd: add svc_sock_init
[RFC,PATCH 14/14] knfsd: centralise SK_ bits some more
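
[Editor's illustration, not code from the patches.] The idea behind the
list above can be sketched in plain userspace C: each transport supplies a
small table of operations, and generic code does the flag and lifetime
handling once. Every name below (demo_xprt, demo_xprt_ops, demo_can_reply
and so on) is hypothetical and only mirrors the patch titles; the payload
limits are arbitrary values chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none are taken from the actual patches. */
struct demo_xprt;

struct demo_xprt_ops {
    const char *name;
    size_t max_payload;                           /* largest payload per request */
    int  (*has_wspace)(const struct demo_xprt *); /* room to send a full reply? */
    void (*prepare_reply)(struct demo_xprt *);    /* per-transport reply setup */
    void (*do_delete)(struct demo_xprt *);        /* per-transport teardown */
};

struct demo_xprt {
    const struct demo_xprt_ops *ops;
    size_t wspace;     /* bytes of send space currently free */
    size_t reply_hdr;  /* bytes reserved at the front of a reply */
    int deleted;       /* set once the transport has been torn down */
};

static int demo_has_wspace(const struct demo_xprt *x)
{
    return x->wspace >= x->ops->max_payload;
}

/* Stream transports reserve 4 bytes for the RPC record marker. */
static void demo_stream_prepare_reply(struct demo_xprt *x) { x->reply_hdr = 4; }

/* Datagram transports need no record marker. */
static void demo_dgram_prepare_reply(struct demo_xprt *x) { x->reply_hdr = 0; }

static void demo_delete(struct demo_xprt *x) { x->deleted = 1; }

/* Illustrative payload limits only. */
const struct demo_xprt_ops demo_stream_ops = {
    "stream", 1024 * 1024, demo_has_wspace, demo_stream_prepare_reply, demo_delete
};
const struct demo_xprt_ops demo_dgram_ops = {
    "dgram", 32 * 1024, demo_has_wspace, demo_dgram_prepare_reply, demo_delete
};

/* Generic code: the checks live here once, not in every transport. */
int demo_can_reply(struct demo_xprt *x)
{
    assert(x != NULL && x->ops != NULL);
    if (x->deleted || !x->ops->has_wspace(x))
        return 0;
    x->ops->prepare_reply(x);
    return 1;
}

/* Generic close: guards against tearing the same transport down twice. */
void demo_close(struct demo_xprt *x)
{
    assert(x != NULL && x->ops != NULL);
    if (!x->deleted)
        x->ops->do_delete(x);
}
```

In this sketch a new transport only has to fill in one ops table; the
"already deleted" and "no write space" checks stay in demo_can_reply() and
demo_close() and are never duplicated per transport.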

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2007-05-16 20:53:18

by J. Bruce Fields

Subject: Re: [RFC,PATCH 0/14] A transport switch for knfsd

On Thu, May 17, 2007 at 05:18:21AM +1000, Greg Banks wrote:
> These 14 patches are an experimental transport switch for knfsd.
> They're based on Tom Tucker's 01-svc-xprt-switch.patch from the
> nfsrdma project November release, but redesigned to provide as simple
> and clean an abstraction as possible to new transport-specific code.
> Various messy details of flags, reference counts and other behaviour
> which are currently redundantly handled in both TCP and UDP code,
> will be handled in generic code now. This makes the task of writing
> new transport code easier and less prone to breakage.

Are there other conjectured future users besides rdma?

What's happened to server-side ipv6, by the way?

--b.

2007-05-17 01:37:07

by Talpey, Thomas

Subject: Re: [RFC,PATCH 0/14] A transport switch for knfsd

At 04:53 PM 5/16/2007, J. Bruce Fields wrote:
>On Thu, May 17, 2007 at 05:18:21AM +1000, Greg Banks wrote:
>> will be handled in generic code now. This makes the task of writing
>> new transport code easier and less prone to breakage.
>
>Are there other conjectured future users besides rdma?

SCTP is certainly something to consider.

Tom.



2007-05-17 07:00:12

by Greg Banks

Subject: Re: [RFC,PATCH 0/14] A transport switch for knfsd

On Wed, May 16, 2007 at 04:53:17PM -0400, J. Bruce Fields wrote:
> On Thu, May 17, 2007 at 05:18:21AM +1000, Greg Banks wrote:
> > These 14 patches are an experimental transport switch for knfsd.
> > They're based on Tom Tucker's 01-svc-xprt-switch.patch from the
> > nfsrdma project November release, but redesigned to provide as simple
> > and clean an abstraction as possible to new transport-specific code.
> > Various messy details of flags, reference counts and other behaviour
> > which are currently redundantly handled in both TCP and UDP code,
> > will be handled in generic code now. This makes the task of writing
> > new transport code easier and less prone to breakage.
>
> Are there other conjectured future users besides rdma?

I don't know of any being planned, but you could imagine support for
DCCP or SCTP (although to be frank those would probably be simple
extensions of existing UDP and TCP code respectively).

You could also imagine transport code that made NFS work fast on
various cluster interconnects that aren't IB or deliberately designed
to pretend to be IB. One example is xpmem, which uses the block
copy offload in Altix hardware to communicate between partitions.
It's a very fast transport but the way IP is encoded on it limits
NFS transfer rates to a small fraction of what the hardware can do.

But basically RDMA is the one that's driving the need for a transport
switch because it's really very different, e.g. it doesn't use sockets.

> What's happened to server-side ipv6, by the way?

Unsure. There's certainly a lot of code support for it; I kept
tripping over it when forward porting these patches. It looks like
you'd need to have rpc.nfsd create its own socket in userspace and
pass it down via /proc/fs/nfsd/ports.

It's interesting to note that ipv6 support was added without a
server-side transport switch; on Irix the addition of ipv6 was what
justified a transport switch.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-05-17 12:12:01

by Talpey, Thomas

Subject: Re: [RFC,PATCH 0/14] A transport switch for knfsd

At 03:00 AM 5/17/2007, Greg Banks wrote:
>It's interesting to note that ipv6 support was added without a
>server-side transport switch; on Irix the addition of ipv6 was what
>justified a transport switch.

At the moment, the client-side support doesn't use it either, though it
did in earlier experiments. Right now it seems that simply passing in an
AF_INET6 address is sufficient to trigger its use.
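
[Editor's illustration, not code from the NFS client.] The point that an
address family alone can select the IPv6 path, with no separate transport,
can be sketched with the standard sockets types. The helper name
demo_sockaddr_len is hypothetical; only struct sockaddr, sockaddr_in,
sockaddr_in6 and the AF_* constants are real.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: pick the address length from the family stored in
 * the address itself. The caller passes a generic struct sockaddr, so the
 * same socket-based transport code serves both IPv4 and IPv6.
 */
socklen_t demo_sockaddr_len(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return (socklen_t)sizeof(struct sockaddr_in);
    case AF_INET6:
        return (socklen_t)sizeof(struct sockaddr_in6);
    default:
        return 0; /* unsupported family in this sketch */
    }
}
```

Because both families go through the same sockets API, the dispatch is on
the address rather than on the transport, which is consistent with ipv6
not needing a transport switch of its own.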

I think the key thing about a "transport" in the NFS context is that it's
really a "transport API", i.e. it uses sockets, or RDMA, or has some special
property that requires special interface handling.

For example, while SCTP in its basic single-stream mode uses sockets and
might be a relatively close match to TCP, there are a lot of other aspects
of the protocol which are nothing like TCP and won't be visible if we pretend
it's just TCP (e.g. associations, integrity, etc). Having a bottom-edge
transport abstraction helps a lot in taking advantage of them.

Tom.

