2008-01-03 12:16:53

by Jeff Garzik

Subject: A new NFSv4 server...

In case some developers are interested... I'm poking at a from-scratch
userland NFSv4 server, as a side project.

In my personal opinion, version 4 of NFS is a quantum-leap improvement
over previous versions. While I used NFS v3 extensively, I always felt
it was a crappy protocol, and unworthy of serious development effort.
That changed with v4.

I chose to use NFSv4 as the basis for experiments (hopefully yielding
production software) that I've long wanted to do in reliable
filesystems, distributed filesystems, and other fun areas.

In the first step down this long path, I've created an NFSv4 userland
server from scratch. Currently it merely serves data straight from RAM,
but the long term goal is to permit modular storage backends. Thus you
could implement a simple RAM backend, an sqlite-based backend or a
complex distributed storage backend.
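
A minimal sketch of what such pluggable backends could look like, written in Python for brevity. Nothing here is taken from the nfs4-ram sources: the `StorageBackend`, `RamBackend` and `SqliteBackend` names, and the idea of reading and writing byte ranges keyed by filehandle, are assumptions made purely for illustration.

```python
import sqlite3
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface: store and fetch file data keyed by filehandle."""

    @abstractmethod
    def write(self, fh: bytes, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def read(self, fh: bytes, offset: int, count: int) -> bytes: ...


def _splice(buf: bytearray, offset: int, data: bytes) -> None:
    """Write data into buf at offset, zero-filling any hole before it."""
    if len(buf) < offset:
        buf.extend(bytes(offset - len(buf)))
    buf[offset:offset + len(data)] = data


class RamBackend(StorageBackend):
    """Keeps everything in a dict; all data is lost when the process exits."""

    def __init__(self):
        self._files = {}

    def write(self, fh, offset, data):
        _splice(self._files.setdefault(fh, bytearray()), offset, data)

    def read(self, fh, offset, count):
        return bytes(self._files.get(fh, b"")[offset:offset + count])


class SqliteBackend(StorageBackend):
    """Stores each file as one BLOB row; simple rather than efficient."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS files (fh BLOB PRIMARY KEY, data BLOB)")

    def _load(self, fh):
        row = self._db.execute(
            "SELECT data FROM files WHERE fh = ?", (fh,)).fetchone()
        return bytearray(row[0]) if row else bytearray()

    def write(self, fh, offset, data):
        buf = self._load(fh)
        _splice(buf, offset, data)
        self._db.execute(
            "INSERT OR REPLACE INTO files (fh, data) VALUES (?, ?)",
            (fh, bytes(buf)))
        self._db.commit()

    def read(self, fh, offset, count):
        return bytes(self._load(fh)[offset:offset + count])
```

The server core would then talk only to `StorageBackend`, so swapping RAM for sqlite (or something distributed) would not touch the protocol code.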

As this is a first-mention developer-only announcement, I didn't bother
to create source tarballs. Here is the git repo:
git://git.kernel.org/pub/scm/daemon/nfs/nfs4-ram.git

This is the home page, but it's mainly a stub pointing to the git repo:
http://linux.yyz.us/projects/nfsv4.html

The server will
* serve data from RAM, with NFSv4 persistent filehandles and FILE_SYNC4
* destroy all data, when the process exits
* pass 97% of the useful pynfs tests (cvs latest)
* pass fsx-linux stress testing, with Linux NFSv4 client (2.6.recent)
* pass kernel build stress testing, with Linux NFSv4 client (2.6.recent)

It will not, at the present time,
* store any data or metadata in stable storage
* do RPCSEC_GSS (thus, not yet RFC-compliant)
* do delegations (thus, with reduced caching and increased
revalidations, can be slower than disk-based storage)

At this point, I'm quite interested to hear feedback on how the server
works with other NFSv4 clients. I'm interested in making sure the
server is portable to FreeBSD and other OS's, even though it was
developed and tested only on Linux. I also intend to use some
Linux-specific syscalls, most notably sync_file_range(2) and
sendfile/tee/splice/vmsplice, so that will have to be glossed over by
portability code.
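
One possible shape for that kind of portability shim, sketched in Python rather than in the server's own code: probe for the fast call, and fall back to a plain read/write loop where it is missing or refuses the descriptor type. The `copy_range` helper is hypothetical. Python does not expose sync_file_range(2), so only the sendfile case is shown; the same probe-and-fall-back pattern would presumably apply there.

```python
import os


def copy_range(out_fd: int, in_fd: int, offset: int, count: int) -> int:
    """Copy up to `count` bytes from in_fd (starting at `offset`) to out_fd.

    Uses os.sendfile() where the platform provides it and falls back to a
    pread()/write() loop otherwise. Returns the number of bytes copied.
    """
    copied = 0
    use_sendfile = hasattr(os, "sendfile")
    while copied < count:
        if use_sendfile:
            try:
                n = os.sendfile(out_fd, in_fd, offset + copied, count - copied)
            except OSError:
                use_sendfile = False      # not supported for these fds: fall back
                continue
        else:
            chunk = os.pread(in_fd, min(count - copied, 65536), offset + copied)
            n = os.write(out_fd, chunk) if chunk else 0
        if n == 0:                        # end of the input file
            break
        copied += n
    return copied
```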

Finally, this is a spare time project, something I've mostly been poking
at while having idle time on a not-Internet-connected laptop.
Technically it's sponsored by Red Hat, since RH pays my salary for all my
open source work, but this is largely a personal project done for
personal reasons. I just hope others find it interesting or useful, as
it progresses.

Jeff



2008-01-04 09:07:45

by Benny Halevy

Subject: Re: A new NFSv4 server...

On Jan. 04, 2008, 9:04 +0200, Jeff Garzik <[email protected]> wrote:
>>> I really wish the entire wire protocol were scrapped and replaced with
>>> something more sane, and easier to parse.
>> You had me worried there for a moment, I thought you might be the first
>> person to admit to liking the NFS4 protocol design.
>>
>>> It's tempting to see what would arise from a clean-slate wire protocol
>>> effort, something that is otherwise compatible with NFS 4.x operations,
>>> objects, and data model.
>
> It's more like v4 is a vast relative improvement over prior NFS. Given
> the huge number of NFS users and sites, IMO v4 is a huge improvement for
> Unix file sharing overall.
>
> But if you are dreaming of a truly clean slate protocol... I've got a
> long wish list too :)
>
>
>> Much like the old phone system, the primary value of protocols like NFS
>> is the widespread presence of reliable conformant implementations. Most
>> of the rest of NFS is problematic. I would argue that some aspects of
>> the NFS operations, objects, and data model are rather more busted than
>> the XDR encoding.
>>
>> The classic persistent file handles, for example, could be considered a
>> major design flaw. Firstly it makes the inode# -> dentry lookup a
>> performance path for the underlying filesystem, which it isn't in any
>> local load.
>
> Oh, certainly. I was mainly thinking a replacement of the wire protocol
> would be an easier step for people to swallow than a new protocol.
>
> But if you are implying there is enough momentum to simply rewrite NFS
> from scratch, I'll cheer and help out with coding :) Or maybe Zach
> Brown will do it for us with CRFS:
> http://linux.conf.au/programme/detail?TalkID=247
>
> A big feature of NFS today is its high Just Works(tm) value (ease of
> configuration and some minimum level of fault tolerance), so any
> replacement would need to have similar attributes.
>
>
>> Another major flaw is putting the client in control of when unstable data is
>> written to disk, but not providing any way for the client to find out how to
>> do that optimally.
>>
>> Then there's the NFS4 approach to extended attributes.
>
> Ugh. Don't get me started. That's not in my server yet, but I can
> already see the mess ahead.
>
> Jeff
>

Jeff, taking into account the amount of effort people and different
organizations have already put into NFSv4 and NFSv4.1 I wish you could
tunnel your inventive energy into making NFSv4.1 better rather than
trying to reinvent NFS/RPC/XDR.

Although it's rather late in the process since the NFSv4 working group
is close to putting the NFSv4.1 Internet-draft up for last
call, we would certainly appreciate more implementation feedback.

Benny

2008-01-04 15:49:16

by Jeff Garzik

Subject: Re: A new NFSv4 server...

Benny Halevy wrote:
> Jeff, taking into account the amount of effort people and different
> organizations have already put into NFSv4 and NFSv4.1 I wish you could
> tunnel your inventive energy into making NFSv4.1 better rather than
> trying to reinvent NFS/RPC/XDR.
>
> Although it's rather late in the process since the NFSv4 working group
> is close to putting the NFSv4.1 Internet-draft up for last
> call, we would certainly appreciate more implementation feedback.


I am more than happy to give feedback, though (as you say) it is
probably too late for substantial feedback to have any large effect.

My general engineering opinions of pNFS:

* Fills an obvious need: eliminating the need to copy data through the
metadata interface to backend storage. Many clear, tangible benefits here.


* pNFS major issue #1: client storage protocol

Storing and retrieving blobs over the network, with strong
authentication/integrity/security, is a solved problem.

Pick ONE client storage protocol (HTTP? iSCSI OSD2?), and stick to it.
Or maybe HTTP|SCSI but nothing more. Heck, even BitTorrent w/ auth
extensions would be better than yet another protocol for similar
purposes (not that I'm advocating BT, just saying...).

Maximize reuse of existing software and mindshare.


* pNFS major issue #2: abandons NFS's "one true generic" path

I believe pNFS violates the "spirit of NFS" by deviating from a de facto
assumption found in earlier versions: data transfer is simple,
arbitrary blobs, addressed in the same manner, and sent via the same
protocol.

Pick ONE layout type, and stick to it. Banish all other layout types to
other software layers.

Protocol conversion servers, firmware, and other software can easily
convert from a generic layout to something more exotic like OSD or
[insert site specific protocol here].

NFS itself should not be delving into low-level storage details like
this. Clients should not need to know low-level details (like stripe
sizes). In Linux, we call this a layering violation.
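
To give a sense of the kind of low-level detail in question, this is roughly the arithmetic a client ends up doing once it is told a stripe size. It is a sketch of generic round-robin striping, not of the actual pNFS files layout; `locate`, `stripe_unit` and `n_devices` are hypothetical names.

```python
def locate(offset, stripe_unit, n_devices):
    """Map a file byte offset to (device index, byte offset on that device)
    for simple round-robin striping."""
    stripe_no = offset // stripe_unit        # which stripe unit, counted over the file
    device = stripe_no % n_devices           # units are dealt out round-robin
    units_before = stripe_no // n_devices    # full units already on that device
    return device, units_before * stripe_unit + offset % stripe_unit
```

With a generic layout, this mapping would live behind the server or a conversion layer instead of in every client.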

Working on kernel storage drivers as I do, I can see the attraction of
wanting to do things this way... but we invented layering and
abstraction in computer science for good reasons :)


* pNFS major issue #3: no longer a "closed loop" protocol

By permitting multiple layout types, and in particular undefined
(site-specific) layout types, it is by definition _impossible_ for
anyone to claim full protocol interoperability with other implementations.

The number of possible combinations approaches infinity, with obvious
consequences on testing, and production software quality.

And when a marketing department advertises "fully NFSv4.1 compliant!" on
their company's appliance, it is trivial for any engineer to construct
another "fully NFSv4.1 compliant" setup -- with equivalent
authentication, metadata and data sets -- that is not interoperable
except via the fallback case (copy through the metadata server).

Such interoperability breakdowns are IMO not in the spirit of NFS.

Jeff

2008-01-04 16:14:07

by Jeff Garzik

Subject: Re: A new NFSv4 server...

Peter Åstrand wrote:
> Do you know about the n4 project
> (http://cvs.samba.org/cgi-bin/cvsweb/n4/) ? It was abandoned many years
> ago, but might have some usefulness.

Nope, I'll definitely take a look. Already imported it into git using
git-cvsimport. :)


>> term goal is to permit modular storage backends. Thus you could implement a
>> simple RAM backend, an sqlite-based backend or a complex distributed storage
>> backend.
>
> unfs3 has a basic modular backend system. I wonder if it would be
> possible to merge your server with unfs3, and still have something
> that's readable. If you are interested, take a look at
> http://cvs.lysator.liu.se/viewcvs/viewcvs.cgi/unfs3/?root=unfs3.

I'm always interested in [legally] stealing useful ideas and code, so I
will definitely take a look.

I was sorta thinking about implementing a couple backends, and seeing
what API organically appears. That's sorta how Linux kernel API
"design" happens, and it tends to produce something useful and compact,
if not a bit unique :)

I can imagine that some backends may wish that the server handle some
details of state and locking, while other backends may wish to record
all that information into a database in stable storage. So it's
difficult to forecast how all that will fall out in the end. unfs3
probably has many lessons to teach me...

So far my best resource for NFS technical "folklore" is generally
google, which turns up a wealth of useful mailing list discussions
involving neilb, meisler, and others.

Jeff


P.S. cvsps, the util git-cvsimport uses, doesn't seem to like the unfs3
CVS repository. Any ideas?

Running cvsps...
connect error: Network is unreachable
cvs rlog: Logging unfs3
cvs rlog: Logging unfs3/Config
cvs rlog: Logging unfs3/Extras
cvs rlog: Logging unfs3/contrib
cvs rlog: Logging unfs3/contrib/nfsotpclient
cvs rlog: Logging unfs3/contrib/nfsotpclient/mountclient
cvs rlog: Logging unfs3/contrib/rpcproxy
cvs rlog: Logging unfs3/doc
Fetching LICENSE v 1.1
New LICENSE: 1416 bytes
Fetching Makefile.in v 1.1
Unknown: error

The same command works just fine with 99% of cvs repositories out there,
pserver, ssh, or whatever.

And a regular CVS checkout works just fine, I am able to check out and
browse files and look for gems.

2008-01-04 17:47:56

by J. Bruce Fields

Subject: Re: A new NFSv4 server...

On Fri, Jan 04, 2008 at 11:07:45AM +0200, Benny Halevy wrote:
> Jeff, taking into account the amount of effort people and different
> organizations have already put into NFSv4 and NFSv4.1 I wish you could
> tunnel your inventive energy into making NFSv4.1 better rather than
> trying to reinvent NFS/RPC/XDR.

It's also important to have fun. Imagining what you could do from a
clean slate is, if nothing else, a fun exercise. And it may end up with
ideas that turn out to be implementable without starting from scratch.

--b.

2008-01-04 19:52:01

by Benny Halevy

Subject: Re: A new NFSv4 server...

Jeff Garzik wrote:
> Benny Halevy wrote:
>> Jeff, taking into account the amount of effort people and different
>> organizations have already put into NFSv4 and NFSv4.1 I wish you could
>> tunnel your inventive energy into making NFSv4.1 better rather than
>> trying to reinvent NFS/RPC/XDR.
>>
>> Although it's rather late in the process since the NFSv4 working group
>> is close to putting the NFSv4.1 Internet-draft up for last
>> call, we would certainly appreciate more implementation feedback.
>
>
> I am more than happy to give feedback, though (as you say) it is
> probably too late for substantial feedback to have any large effect.
>
> My general engineering opinions of pNFS:
>
> * Fills an obvious need: eliminating the need to copy data through
> the metadata interface to backend storage. Many clear, tangible
> benefits here.
>
>
> * pNFS major issue #1: client storage protocol
>
> Storing and retrieving blobs over the network, with strong
> authentication/integrity/security, is a solved problem.
>
> Pick ONE client storage protocol (HTTP? iSCSI OSD2?), and stick to it.
> Or maybe HTTP|SCSI but nothing more. Heck, even BitTorrent w/ auth
> extensions would be better than yet another protocol for similar
> purposes (not that I'm advocating BT, just saying...).

Well, two things about that. First, maybe if pNFS had started from
scratch this is the approach that could have been taken, but one of the
motivating factors for pNFS (and actually one that I believe will help
make it successful) was the desire to replace existing proprietary file
system protocols from several vendors such as EMC, IBM, or Panasas that
used different storage protocols. Second, providing support for several
kinds of storage is better for customers who have existing storage
they want to harness together with pNFS.

>
> Maximize reuse of existing software and mindshare.

That's always good.
>
>
> * pNFS major issue #2: abandons NFS's "one true generic" path
>
> I believe pNFS violates the "spirit of NFS" by deviating from a
> de facto assumption found in earlier versions: data transfer is
> simple, arbitrary blobs, addressed in the same manner, and sent via
> the same protocol.
>
> Pick ONE layout type, and stick to it. Banish all other layout types
> to other software layers.
I agree wholeheartedly that we should have had one layout data structure
("layout type" is a loaded term in the spec...). This was my position
from day one (and even before), but unfortunately it wasn't accepted,
and each "layout type" got to define its own layout data structure. We
could instead have defined one generic data structure for mapping files
onto all different kinds of storage devices, while keeping only the
device addressing information private to the "layout type" (== storage
protocol class), plus some other data that's internal to the layout
type, e.g. OSD capabilities.

>
> Protocol conversion servers, firmware, and other software can easily
> convert from a generic layout to something more exotic like OSD or
> [insert site specific protocol here].
>
> NFS itself should not be delving into low-level storage details like
> this. Clients should not need to know low-level details (like stripe
> sizes). In Linux, we call this a layering violation.
This is the essence of the layout type concept, implemented as a layout
driver in the Linux NFSv4.1 client implementation. The files layout type
definition is part of the NFSv4.1 protocol merely because the
files-based layout type uses NFSv4.1 as its storage protocol. It could
very well be defined in a separate RFC, exactly like the blocks and
objects layout type specifications, and that could keep its internal
data structures opaque to the generic NFSv4.1 protocol (which behaves as
a transport protocol for the layout-type specific data).

>
> Working on kernel storage drivers as I do, I can see the attraction of
> wanting to do things this way... but we invented layering and
> abstraction in computer science for good reasons :)
Yup. I think that the layout driver is the software layer you're looking
for.
>
>
> * pNFS major issue #3: no longer a "closed loop" protocol
>
> By permitting multiple layout types, and in particular undefined
> (site-specific) layout types, it is by definition _impossible_ for
> anyone to claim full protocol interoperability with other
> implementations.
>
> The number of possible combinations approaches infinity, with obvious
> consequences on testing, and production software quality.
The non-standard layout types are defined as experimental. To claim
interoperability one would need to publish a suitable specification of
the new layout type (see "Defining new layout types", section 22.4 of
http://www.nfsv4-editor.org/draft-18/draft-ietf-nfsv4-minorversion1-18.html#pnfsiana).
Though I agree that allowing a single file to be accessed with multiple
layout types (in theory) complicates testing, typically there will be at
most one layout type per file system.
>
> And when a marketing department advertises "fully NFSv4.1 compliant!"
> on their company's appliance, it is trivial for any engineer to
> construct another "fully NFSv4.1 compliant" setup -- with equivalent
> authentication, metadata and data sets -- that is not interoperable
> except via the fallback case (copy through the metadata server).
Compliance with NFSv4.1 is indeed tied to the legacy I/O path and, for
pNFS, to the files layout type, as it is a part of NFSv4.1. Other
implementations of NFSv4.1 with pNFS over non-files layout types will
have to claim compliance with their respective standards. For example,
Panasas's implementation will need to comply with the OSD standard,
iSCSI (or FC), the Object-based pNFS RFC, and finally NFSv4.1.
> Such interoperability breakdowns are IMO not in the spirit of NFS.
That's a part of making progress IMO...

Benny
>
> Jeff
>
>


2008-01-04 19:55:53

by Benny Halevy

Subject: Re: A new NFSv4 server...

J. Bruce Fields wrote:
> On Fri, Jan 04, 2008 at 11:07:45AM +0200, Benny Halevy wrote:
>
>> Jeff, taking into account the amount of effort people and different
>> organizations have already put into NFSv4 and NFSv4.1 I wish you could
>> tunnel your inventive energy into making NFSv4.1 better rather than
>> trying to reinvent NFS/RPC/XDR.
>>
>
> It's also important to have fun. Imagining what you could do from a
> clean slate is, if nothing else, a fun exercise. And it may end up with
> ideas that turn out to be implementable without starting from scratch.
>
> --b.
>
Absolutely :)
And I certainly hope it will benefit all of us.

Benny

2008-01-04 19:58:13

by Peter Åstrand

Subject: Re: A new NFSv4 server...

On Fri, 4 Jan 2008, Jeff Garzik wrote:

> P.S. cvsps, the util git-cvsimport uses, doesn't seem to like the unfs3 CVS
> repository. Any ideas?

Seems to work for me. I tested with:

$ cvs -d :pserver:[email protected]:/cvsroot/unfs3 co unfs3
$ cd unfs3
$ cvsps

Using cvsps version 2.1.


Rgds,
---
Peter Åstrand ThinLinc Chief Developer
Cendio AB http://www.cendio.se
Wallenbergs gata 4
583 30 Linköping Phone: +46-13-21 46 00

2008-01-04 20:31:03

by Muntz, Daniel

Subject: RE: A new NFSv4 server...

I like it. NFS 3.5. Drop pNFS into the v3 protocol (a greatly simplified
pNFS compared to what we ended up with in 4.1), and you'd have yourself a
sweet little distributed fs. The important part is that any such effort be
called "NFS x.y". Naming is important, or we'd be running AFS/DFS instead
of NFS 4.0 (please, try to resist flaming me on the hyperbole).

-Dan

-----Original Message-----
From: Peter Åstrand [mailto:[email protected]]
Sent: Friday, January 04, 2008 1:15 AM
To: Jeff Garzik
Cc: NFS list; [email protected]
Subject: Re: A new NFSv4 server...


[About v4]


On Fri, 4 Jan 2008, Jeff Garzik wrote:

> > > I really wish the entire wire protocol were scrapped and replaced
> > > with something more sane, and easier to parse.
> > You had me worried there for a moment, I thought you might be the
> > first person to admit to liking the NFS4 protocol design.

Couldn't agree more.


> In my personal opinion, version 4 of NFS is a quantum-leap improvement
> over previous versions. While I used NFS v3 extensively, I always
> felt it was a crappy protocol, and unworthy of serious development
> effort. That changed with v4.
...
> > > It's tempting to see what would arise from a clean-slate wire
> > > protocol effort, something that is otherwise compatible with NFS
> > > 4.x operations, objects, and data model.
>
> It's more like v4 is a vast relative improvement over prior NFS.
> Given the huge number of NFS users and sites, IMO v4 is a huge
> improvement for Unix file sharing overall.

Many years ago, before NFSv4 was finished, I felt the same. I was waiting
for v4 and thought that everything would be so much better. I wanted to
help and started the "pynfs" project. Today, I have a different opinion. I
think v3 is a fairly good protocol, if you use it correctly. For example,
many people don't realize that you don't need the portmapper, that you can
use a single well-known TCP port, that you can use RPCSEC_GSS and so
forth, even with v3.

I think v4 has a few valuable improvements, but it comes with a very high
price. v3 has a minimalistic beauty which v4 lacks. For example, take a
look at the OPEN operation with 7 arguments, of which many are complex
data structures:

(cfh), seqid, share_access, share_deny, owner, openhow, claim ->
(cfh), stateid, cinfo, rflags, open_confirm, attrset delegation

Not pretty...


> Oh, certainly. I was mainly thinking a replacement of the wire
> protocol would be an easier step for people to swallow than a new protocol.

I've been thinking of trying to put together something like NFS v3.5. Some
parts of v4 are nice, but the complexity is too high.


Regards,
---
Peter Åstrand ThinLinc Chief Developer
Cendio AB http://www.cendio.se
Wallenbergs gata 4
583 30 Linköping Phone: +46-13-21 46 00

2008-01-05 01:46:42

by Greg Banks

Subject: Re: A new NFSv4 server...

Jeff Garzik wrote:
> Benny Halevy wrote:
>>
>> Although it's rather late in the process since the NFSv4 working group
>> is close to putting the NFSv4.1 Internet-draft up for last
>> call, we would certainly appreciate more implementation feedback.
>
>
> I am more than happy to give feedback, though (as you say) it is
> probably too late for substantial feedback to have any large effect.
About five years too late. Witness the uncomfortable hacks required to
retrofit the extra Sessions fields into the protocol without changing
the basic COMPOUND arguments and results structures, which a minor version
doesn't allow you to do.


>
> My general engineering opinions of pNFS:
>
> * Fills an obvious need: eliminating the need to copy data through
> the metadata interface to backend storage. Many clear, tangible
> benefits here.
>
>
> *[...]
> Pick ONE client storage protocol [...] and stick to it.
>
> * [...]
> Pick ONE layout type, and stick to it.
> * pNFS major issue #3: no longer a "closed loop" protocol
>
>
> And when a marketing department advertises "fully NFSv4.1 compliant!"
> on their company's appliance, it is trivial for any engineer to
> construct another "fully NFSv4.1 compliant" setup -- with equivalent
> authentication, metadata and data sets -- that is not interoperable
> except via the fallback case (copy through the metadata server).
>
> Such interoperability breakdowns are IMO not in the spirit of NFS.

Strongly agreed with all the above. It's difficult to avoid the
conclusion that the current pNFS spec is designed to allow the
existing parallel filesystem vendors to sell "standards compliant"
solutions that only work with their client software and where all
the MDS and DS machines need to be bought from the same vendor.
If the spec defined a single layout type, and all three protocols
involved were variants of NFSv4.1 and were defined in the spec,
we would have a true open standard.

--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.

2008-01-05 07:56:03

by Benny Halevy

Subject: Re: A new NFSv4 server...

Greg Banks wrote:
> Strongly agreed with all the above. It's difficult to avoid the
> conclusion that the current pNFS spec is designed to allow the
> existing parallel filesystem vendors to sell "standards compliant"
> solutions that only work with their client software and where all
> the MDS and DS machines need to be bought from the same vendor.
> If the spec defined a single layout type, and all three protocols
> involved were variants of NFSv4.1 and were defined in the spec,
> we would have a true open standard.
>
>
Greg, I'm afraid your conclusion is just wrong. What exactly is it
based on? I'd appreciate it if you could look again at the current
Internet Drafts comprising NFSv4.1 and layout types and please raise any
issues you see in the specs that would jeopardize interoperability
between different client software vendors and different server / storage
vendors.

http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1
http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj
http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-block

There is indeed a third protocol in the overall architecture, used by
the MDS to manage the storage devices, and this protocol is outside the
scope of NFSv4.1. This may lead to non-interoperable implementations of
server/storage systems, but that definitely was not the intent of the
design decision to leave the storage management protocol unspecified in
NFSv4.1.

The object-based layout type, for example, is based on using the
standard OSD protocol between the MDS and the OSDs for control as well
as between the clients and the OSDs for data transfer. How does that
preclude interoperability between different client, MDS, and DS vendors
if the MDS, OSDs, and clients comply with T-10 OSD and the MDS and
client comply with NFSv4.1 and the pnfs-obj RFC (when it becomes one)?

Benny

2008-01-03 16:32:00

by J. Bruce Fields

Subject: Re: A new NFSv4 server...

On Thu, Jan 03, 2008 at 07:16:49AM -0500, Jeff Garzik wrote:
> At this point, I'm quite interested to hear feedback on how the server
> works with other NFSv4 clients.

Any possibility of making Connectathon in May?:

http://www.connectathon.org/

The major NFSv4 implementors normally have their clients there, so it's
usually the quickest way to find and solve any interoperability
problems. Almost all the work will probably be on sessions and pNFS,
but people should be willing to do basic 4.0 testing too.

Glad to hear you're still working on this--it sounds interesting.

--b.

2008-01-04 05:32:53

by Jeff Garzik

Subject: Re: A new NFSv4 server...

J. Bruce Fields wrote:
> On Thu, Jan 03, 2008 at 07:16:49AM -0500, Jeff Garzik wrote:
>> At this point, I'm quite interested to hear feedback on how the server
>> works with other NFSv4 clients.
>
> Any possibility of making Connectathon in May?:
>
> http://www.connectathon.org/
>
> The major NFSv4 implementors normally have their clients there, so it's
> usually the quickest way to find and solve any interoperability
> problems. Almost all the work will probably be on sessions and pNFS,
> but people should be willing to do basic 4.0 testing too.

As this isn't an official RH project, I would probably have to pay my
own way, which makes it doubtful :)

Plus, surely in this day and age, we can figure out something better
than waiting for face-to-face events to test something. Maybe somebody
could arrange a donation of some slice of a grid (Amazon EC2?), make
various OS images available, and give engineers some way to request a
selection of tests, with a selection of OS images?


> Glad to hear you're still working on this--it sounds interesting.

Certainly pNFS parallels some of the work I want to do... NFSv4.1 is so
darned complex though. I am torn as to whether or not I want to take my
server down that path.

I really wish the entire wire protocol were scrapped and replaced with
something more sane, and easier to parse. The variable-length
structures passed to PCI hardware these days [as seen in the kernel
drivers I hack on, IOW] are just as compact, if not more so, but are
designed to be parsed quickly in large chunks, rather than the "next XDR
may be your last!" approach :)
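
To illustrate the parsing complaint: XDR (RFC 4506) encodes a variable-length opaque as a 4-byte big-endian length, then the bytes, then zero padding up to a 4-byte boundary. The position of each item is therefore only known after decoding everything in front of it. A minimal sketch follows; the two helper names are made up for this example and are not from any NFS implementation.

```python
import struct


def xdr_pack_opaque(data: bytes) -> bytes:
    """Encode variable-length opaque data: length word, bytes, pad to 4."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def xdr_unpack_opaque(buf: bytes, pos: int):
    """Decode one opaque at `pos`; return (data, position of the next item)."""
    if pos + 4 > len(buf):
        raise ValueError("truncated length word")
    (length,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    end = start + length
    if end > len(buf):
        raise ValueError("truncated opaque body")
    return buf[start:end], start + (length + 3) // 4 * 4


# Three items back to back: the third cannot be located without first
# decoding the lengths of the two before it.
wire = b"".join(xdr_pack_opaque(x) for x in (b"a", b"hello", b"nfs4"))
pos, items = 0, []
while pos < len(wire):
    item, pos = xdr_unpack_opaque(wire, pos)
    items.append(item)
```

Every step needs its own bounds check, because any length word may point past the end of the buffer.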

Sessions are IMO a tad overdone, too... largely due to necessities
forced upon NFSv4.1 by the legacy RPC protocol assumptions. If you
simply /assume/ basic properties of TCP or SCTP, it's a lot easier to do
multi-channel or multi-homed messaging. Multi-channel _isn't_ really
that hard, and we've been doing it since the earliest days of NNTP and
the Usenet Top 1000 pissing contest, if not longer.

It's tempting to see what would arise from a clean-slate wire protocol
effort, something that is otherwise compatible with NFS 4.x operations,
objects, and data model.

Jeff




2008-01-04 06:16:26

by Greg Banks

Subject: Re: A new NFSv4 server...

Jeff Garzik wrote:
> J. Bruce Fields wrote:
>>
>> http://www.connectathon.org/
>>
>>
> As this isn't an official RH project, I would probably have to pay my
> own way, which makes it doubtful :)
>

It's entirely possible someone might run your server code on a spare
machine, given functioning install packages and easy instructions.
> Plus, surely in this day and age, we can figure out something better
> than waiting for face-to-face events to test something. Maybe somebody
> could arrange a donation of some slice of a grid (Amazon EC2?), make
> various OS images available, and give engineers some way to request a
> selection of tests, with a selection of OS images?
>
Vendors turn up to cthon with proprietary and unreleased software and
hardware which they most certainly are not going to let anyone else run
for them. Also, being in the same hall with all those vendors' technical
folks tends to make bugs shallow. It's a very valuable exercise for any
organisation making a living from NFS.

> I really wish the entire wire protocol were scrapped and replaced with
> something more sane, and easier to parse.
You had me worried there for a moment, I thought you might be the first
person to admit to liking the NFS4 protocol design.

> It's tempting to see what would arise from a clean-slate wire protocol
> effort, something that is otherwise compatible with NFS 4.x operations,
> objects, and data model.
>

Much like the old phone system, the primary value of protocols like NFS
is the widespread presence of reliable conformant implementations. Most
of the rest of NFS is problematic. I would argue that some aspects of
the NFS operations, objects, and data model are rather more busted than
the XDR encoding.

The classic persistent file handles, for example, could be considered a
major design flaw. Firstly it makes the inode# -> dentry lookup a
performance path for the underlying filesystem, which it isn't in any
local load. Secondly, it's inherently insecure if you export anything
less than an entire filesystem, unless you use a slow, buggy, and
non-conformant hack like subtree_check.

Another major flaw is putting the client in control of when unstable data is
written to disk, but not providing any way for the client to find out how to
do that optimally.
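
For readers less familiar with the mechanism: an NFSv4 WRITE carries a stability level (UNSTABLE4, DATA_SYNC4 or FILE_SYNC4), and data written as unstable is only guaranteed to be on stable storage once the client sends COMMIT. The toy model below shows that division of responsibility. `ToyServer` is hypothetical and ignores offsets, write verifiers and real I/O.

```python
from enum import IntEnum


class StableHow(IntEnum):
    UNSTABLE4 = 0    # server may reply before data reaches stable storage
    DATA_SYNC4 = 1   # data must be stable before the reply
    FILE_SYNC4 = 2   # data and metadata must be stable before the reply


class ToyServer:
    def __init__(self):
        self.stable = []     # stands in for the disk
        self.pending = []    # buffered only; lost if the server crashes

    def write(self, data: bytes, how: StableHow) -> StableHow:
        if how == StableHow.UNSTABLE4:
            self.pending.append(data)
            return StableHow.UNSTABLE4
        self.commit()                    # keep ordering: flush older data first
        self.stable.append(data)
        return how

    def commit(self) -> None:
        """COMMIT: issued by the client, which picks the moment."""
        self.stable.extend(self.pending)
        self.pending.clear()
```

Nothing in this exchange tells the client how much unstable data the server can cheaply buffer, or when a flush would be cheapest, so it has to guess.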

Then there's the NFS4 approach to extended attributes.

--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


2008-01-04 07:04:47

by Jeff Garzik

Subject: Re: A new NFSv4 server...

Greg Banks wrote:
> It's entirely possible someone might run your server code on a spare
> machine, given functioning install packages and easy instructions.

Easy enough to do...


>> Plus, surely in this day and age, we can figure out something better
>> than waiting for face-to-face events to test something. Maybe somebody
>> could arrange a donation of some slice of a grid (Amazon EC2?), make
>> various OS images available, and give engineers some way to request a
>> selection of tests, with a selection of OS images?
>>
> Vendors turn up to cthon with proprietary and unreleased software and
> hardware which they most certainly are not going to let anyone else run
> for them. Also, being in the same hall with all those vendors' technical
> folks tends to make bugs shallow. It's a very valuable exercise for any
> organisation making a living from NFS.

Certainly, but I could see a grid of released, non-proprietary software
as quite a valuable resource in addition to f2f events. Quality can
only increase, if the [Linux | *BSD | OpenSolaris |...] NFS clients
could run regression tests against several different NFS servers, each
time an NFS client receives a set of changes.

Even if it's only the open source operating systems that wish to
participate, having a mix of OS's and platforms would be useful. A
permanent, virtual cthon.


>> I really wish the entire wire protocol were scrapped and replaced with
>> something more sane, and easier to parse.
> You had me worried there for a moment, I thought you might be the first
> person to admit to liking the NFS4 protocol design.
>
>> It's tempting to see what would arise from a clean-slate wire protocol
>> effort, something that is otherwise compatible with NFS 4.x operations,
>> objects, and data model.

It's more like v4 is a vast relative improvement over prior NFS. Given
the huge number of NFS users and sites, IMO v4 is a huge improvement for
Unix file sharing overall.

But if you are dreaming of a truly clean slate protocol... I've got a
long wish list too :)


> Much like the old phone system, the primary value of protocols like NFS
> is the widespread presence of reliable conformant implementations. Most
> of the rest of NFS is problematic. I would argue that some aspects of
> the NFS operations, objects, and data model are rather more busted than
> the XDR encoding.
>
> The classic persistent file handles, for example, could be considered a
> major design flaw. Firstly, they make the inode# -> dentry lookup a
> performance path for the underlying filesystem, which it isn't under any
> local load.

Oh, certainly. I was mainly thinking a replacement of the wire protocol
would be an easier step for people to swallow than a new protocol.

But if you are implying there is enough momentum to simply rewrite NFS
from scratch, I'll cheer and help out with coding :) Or maybe Zach
Brown will do it for us with CRFS:
http://linux.conf.au/programme/detail?TalkID=247

A big feature of NFS today is its high Just Works(tm) value (ease of
configuration and some minimum level of fault tolerance), so any
replacement would need to have similar attributes.


> Another major flaw is putting the client in control of when unstable data is
> written to disk, but not providing any way for the client to find out how to
> do that optimally.
>
> Then there's the NFS4 approach to extended attributes.

Ugh. Don't get me started. That's not in my server yet, but I can
already see the mess ahead.

Jeff





2008-01-04 09:34:11

by Peter Åstrand

[permalink] [raw]
Subject: Re: A new NFSv4 server...


[About v4]


On Fri, 4 Jan 2008, Jeff Garzik wrote:

> > > I really wish the entire wire protocol were scrapped and replaced with
> > > something more sane, and easier to parse.
> > You had me worried there for a moment, I thought you might be the first
> > person to admit to liking the NFS4 protocol design.

Couldn't agree more.



> In my personal opinion, version 4 of NFS is a quantum-leap improvement over
> previous versions. While I used NFS v3 extensively, I always felt it was a
> crappy protocol, and unworthy of serious development effort. That changed with
> v4.
...
> > > It's tempting to see what would arise from a clean-slate wire protocol
> > > effort, something that is otherwise compatible with NFS 4.x operations,
> > > objects, and data model.
>
> It's more like v4 is a vast relative improvement over prior NFS. Given the
> huge number of NFS users and sites, IMO v4 is a huge improvement for Unix file
> sharing overall.

Many years ago, before NFSv4 was finished, I felt the same. I was waiting
for v4 and thought that everything would be so much better. I wanted to
help and started the "pynfs" project. Today, I have a different opinion. I
think v3 is a fairly good protocol, if you use it correctly. For example,
many people don't realize that you don't need the portmapper, that you can
use a single well-known TCP port, that you can use RPCSEC_GSS and so
forth, even with v3.


I think v4 has a few valuable improvements, but it comes with a very high
price. v3 has a minimalistic beauty which v4 lacks. For example, take a
look at the OPEN operation with 7 arguments, of which many are complex
data structures:

(cfh), seqid, share_access, share_deny, owner, openhow, claim ->
(cfh), stateid, cinfo, rflags, open_confirm, attrset, delegation

Not pretty...
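
To give an idea of the nesting, here is a simplified sketch of just the
argument side. The field names follow the synopsis above; the class names
and the reduced unions (OpenHow, OpenClaim) are my own shorthand, and the
constant values are as I recall them from RFC 3530, so check them before
relying on them. The current filehandle (cfh) is implicit and not part of
the structure.

```python
from dataclasses import dataclass
from typing import Optional

# Values as recalled from RFC 3530 (assumption, verify before use).
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_DENY_NONE = 0x0
OPEN4_NOCREATE = 0
CLAIM_NULL = 0


@dataclass
class OpenOwner:
    clientid: int
    owner: bytes          # opaque, chosen by the client


@dataclass
class OpenHow:
    # Really a union: the create attributes only exist for a create.
    opentype: int = OPEN4_NOCREATE
    createattrs: Optional[dict] = None


@dataclass
class OpenClaim:
    # Really a union with several claim types; only CLAIM_NULL shown.
    claim: int = CLAIM_NULL
    file: str = ""        # component name to open


@dataclass
class OpenArgs:
    seqid: int
    share_access: int
    share_deny: int
    owner: OpenOwner
    openhow: OpenHow
    claim: OpenClaim
```

Even in this cut-down form, three of the six explicit arguments are
structures or unions of their own.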



> Oh, certainly. I was mainly thinking a replacement of the wire protocol would
> be an easier step for people to swallow than a new protocol.

I've been thinking of trying to put together something like NFS v3.5. Some
parts of v4 are nice, but the complexity is too high.


Regards,
---
Peter Åstrand ThinLinc Chief Developer
Cendio AB http://www.cendio.se
Wallenbergs gata 4
583 30 Linköping Phone: +46-13-21 46 00

2008-01-04 09:34:11

by Peter Åstrand

[permalink] [raw]
Subject: Re: A new NFSv4 server...


[About your implementation]

> In case some developers are interested... I'm poking at a from-scratch
> userland NFSv4 server, as a side project.

Cool! Being one of the unfs3 developers, I believe that userland NFS
servers are very useful.


> In the first step down this long path, I've created an NFSv4 userland server
> from scratch. Currently it merely serves data straight from RAM, but the long

Do you know about the n4 project
(http://cvs.samba.org/cgi-bin/cvsweb/n4/)? It was abandoned many years
ago, but might still be of some use.


> term goal is to permit modular storage backends. Thus you could implement a
> simple RAM backend, an sqlite-based backend or a complex distributed storage
> backend.

unfs3 has a basic modular backend system. I wonder if it would be
possible to merge your server with unfs3, and still have something
that's readable. If you are interested, take a look at
http://cvs.lysator.liu.se/viewcvs/viewcvs.cgi/unfs3/?root=unfs3.
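
As a rough idea of what a shared backend interface could look like, here
is a minimal sketch. It is neither the unfs3 backend API nor anything from
your server; StorageBackend, RamBackend and the read/write signatures are
hypothetical, and a real interface would need many more calls (lookup,
getattr, readdir and so on).

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface a protocol front end would call into."""

    @abstractmethod
    def read(self, fileid: int, offset: int, count: int) -> bytes:
        ...

    @abstractmethod
    def write(self, fileid: int, offset: int, data: bytes) -> int:
        ...


class RamBackend(StorageBackend):
    """Keeps everything in memory; all data is lost on exit."""

    def __init__(self):
        self._files = {}   # fileid -> bytearray

    def read(self, fileid, offset, count):
        buf = self._files.get(fileid, b"")
        return bytes(buf[offset:offset + count])

    def write(self, fileid, offset, data):
        buf = self._files.setdefault(fileid, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            # Zero-fill any hole between old EOF and the new data.
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data
        return len(data)
```

An sqlite-based or distributed backend would implement the same two
methods, and the protocol code would not need to know which one it has.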


Regards,
---
Peter Åstrand ThinLinc Chief Developer
Cendio AB http://www.cendio.se
Wallenbergs gata 4
583 30 Linköping Phone: +46-13-21 46 00