2007-12-03 20:32:19

by J. Bruce Fields

Subject: Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)

On Fri, Nov 30, 2007 at 11:27:16AM -0500, Wendy Cheng wrote:
> Trond Myklebust wrote:
>> Actually, the real problem would be dealing with something like
>> unlink('foo') followed by open('foo', O_CREAT|O_EXCL). How do you ensure
>> that a replay of those actions following a reboot is fully consistent in
>> the face of some other client attempting an open('foo', O_CREAT) at the
>> same time?
>>
>> The problem is that a number of directory operations involve exclusive
>> semantics, and so cannot be replayed. The solution to this sort of
>> problem is going to have to involve exclusive (i.e. write) directory
>> delegations to ensure that whatever transactions one client performs
>> cannot interfere with the transactions performed by another.
>>
>>
>
> Well, a dumb question from me (borrowing Bruce's line :) ) ... even with
> "sync" in place, when the server reboots, the RPC reply cache is gone. How
> does the Linux server handle retransmitted non-idempotent requests?

Badly!

Somebody should figure out whether it would be possible for us to
implement persistent sessions in v4.1:

http://www.nfsv4-editor.org/draft-17/draft-ietf-nfsv4-minorversion1-17.html#Persistence

It looks hard!

--b.
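
For concreteness, here is a minimal user-space sketch of the client-side
sequence Trond describes above; the path name is hypothetical and the
program is only an illustration of why a blind replay of these calls after
a server reboot is unsafe, not anything taken from the thread.

/* The non-idempotent sequence from the discussion above: if the server
 * executes both operations but crashes before the replies reach the
 * client, replaying the retransmitted calls can collide with another
 * client's open("foo", O_CREAT) and produce a different outcome.
 * Hypothetical path; illustration only. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        if (unlink("foo") == -1)
                perror("unlink");

        /* O_EXCL is what makes the create non-idempotent: replaying it
         * after the original call already succeeded returns EEXIST. */
        int fd = open("foo", O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (fd == -1)
                perror("open");
        else
                close(fd);
        return 0;
}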




2007-12-03 21:14:07

by Wendy Cheng

Subject: Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)

J. Bruce Fields wrote:
> On Fri, Nov 30, 2007 at 11:27:16AM -0500, Wendy Cheng wrote:
>
>> Well, a dumb question from me (borrowing Bruce's line :) ) ... even with
>> "sync" in place, when the server reboots, the RPC reply cache is gone. How
>> does the Linux server handle retransmitted non-idempotent requests?
>>
>
> Badly!
>
> Somebody should figure out whether it would be possible for us to
> implement persistent sessions in v4.1:
>
> http://www.nfsv4-editor.org/draft-17/draft-ietf-nfsv4-minorversion1-17.html#Persistence
>
> It looks hard!
>
Or use a cluster (a backup server is quite affordable nowadays)? I was
about to kick off a new discussion about this ...

I did a prototype about 4 years ago on the 2.4 kernel where the RPC reply
cache (slightly modified to include raw NFS request packets) was mirrored
by the backup server (in memory). The reply was held back from the client
until the mirrored reply cache entry was acknowledged by the backup server.
Upon a crash, the backup server piggybacked its recovery logic on ext3's
journal recovery code. For reply cache entries not replayed or not
recognized by jbd, nfsd resent the raw NFS requests down to the filesystem
just like any newly arrived request. The prototype code was able to achieve
at least 70% of async-mode performance without losing data.

One of the other issues with our current Linux-based NFS cluster failover
is also in this arena - that is, upon failover, non-idempotent requests
can introduce stale filehandle errors that have been causing headaches for
some applications. So mirroring the RPC reply cache (to another machine)
seems attractive.

Any comments? Mind if I write this up and send it out for discussion?

-- Wendy
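
A rough user-space model of the ordering described in the message above:
the reply to the client is withheld until the backup server has
acknowledged the mirrored reply-cache entry. All structures and function
names here are hypothetical illustrations of the idea, not the actual 2.4
prototype code, which hooked into nfsd and jbd rather than a stand-alone
program.

/* Model of "mirror the reply cache, then release the reply". */
#include <stdio.h>
#include <string.h>

struct reply_cache_entry {
        unsigned int xid;        /* RPC transaction id                    */
        char raw_request[64];    /* raw NFS request, kept so the backup   */
                                 /* can replay it after a primary crash   */
        char reply[64];          /* reply, withheld until mirrored        */
        int backup_acked;        /* set once the backup confirms the copy */
};

/* Stand-in for shipping the entry to the backup and waiting for its ack;
 * in the real design this would block on the wire. */
static void mirror_and_wait(struct reply_cache_entry *e)
{
        printf("mirroring xid %u to backup ...\n", e->xid);
        e->backup_acked = 1;
}

static void send_reply(struct reply_cache_entry *e)
{
        mirror_and_wait(e);
        if (e->backup_acked)
                printf("xid %u: reply released to client: %s\n",
                       e->xid, e->reply);
}

int main(void)
{
        struct reply_cache_entry e = { .xid = 42, .backup_acked = 0 };

        strcpy(e.raw_request, "REMOVE dirfh \"foo\"");
        strcpy(e.reply, "NFS_OK");
        send_reply(&e);   /* the client never sees a reply the backup lacks */
        return 0;
}

The point of the ordering is that the backup never lacks an entry for a
reply the client has already seen, so after a failover it can either replay
the cached reply or resend the raw request down to the filesystem, as
described above.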



2007-12-03 21:30:41

by J. Bruce Fields

Subject: Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)

On Mon, Dec 03, 2007 at 04:13:02PM -0500, Wendy Cheng wrote:
> Or use a cluster (a backup server is quite affordable nowadays)? I was
> about to kick off a new discussion about this ...
>
> I did a prototype about 4 years ago on the 2.4 kernel where the RPC reply
> cache (slightly modified to include raw NFS request packets) was mirrored
> by the backup server (in memory). The reply was held back from the client
> until the mirrored reply cache entry was acknowledged by the backup server.
> Upon a crash, the backup server piggybacked its recovery logic on ext3's
> journal recovery code. For reply cache entries not replayed or not
> recognized by jbd, nfsd resent the raw NFS requests down to the filesystem
> just like any newly arrived request. The prototype code was able to achieve
> at least 70% of async-mode performance without losing data.
>
> One of the other issues with our current Linux-based NFS cluster failover
> is also in this arena - that is, upon failover, non-idempotent requests
> can introduce stale filehandle errors that have been causing headaches for
> some applications.

How exactly do the stale filehandles happen?

> So mirroring the RPC reply cache (to another machine)
> seems attractive.
>
> Any comments? Mind if I write this up and send it out for discussion?

I'd be interested?

--b.



2007-12-03 21:38:46

by J. Bruce Fields

Subject: Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)

On Mon, Dec 03, 2007 at 04:30:04PM -0500, J. Bruce Fields wrote:
> On Mon, Dec 03, 2007 at 04:13:02PM -0500, Wendy Cheng wrote:
> > Or use a cluster (a backup server is quite affordable nowadays)? I was
> > about to kick off a new discussion about this ...
> >
> > I did a prototype about 4 years ago on the 2.4 kernel where the RPC reply
> > cache (slightly modified to include raw NFS request packets) was mirrored
> > by the backup server (in memory). The reply was held back from the client
> > until the mirrored reply cache entry was acknowledged by the backup
> > server. Upon a crash, the backup server piggybacked its recovery logic on
> > ext3's journal recovery code. For reply cache entries not replayed or not
> > recognized by jbd, nfsd resent the raw NFS requests down to the filesystem
> > just like any newly arrived request. The prototype code was able to
> > achieve at least 70% of async-mode performance without losing data.
> >
> > One of the other issues with our current Linux-based NFS cluster failover
> > is also in this arena - that is, upon failover, non-idempotent requests
> > can introduce stale filehandle errors that have been causing headaches
> > for some applications.
>
> How exactly do the stale filehandles happen?
>
> > So mirroring the RPC reply cache (to another machine)
> > seems attractive.
> >
> > Any comments? Mind if I write this up and send it out for discussion?
>
> I'd be interested?

Um. That was meant to be an !, not a ?.

--b.



2007-12-03 21:50:27

by Wendy Cheng

Subject: Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)

J. Bruce Fields wrote:
> On Mon, Dec 03, 2007 at 04:13:02PM -0500, Wendy Cheng wrote:
>
>> Or use a cluster (a backup server is quite affordable nowadays)? I was
>> about to kick off a new discussion about this ...
>>
>> I did a prototype about 4 years ago on the 2.4 kernel where the RPC reply
>> cache (slightly modified to include raw NFS request packets) was mirrored
>> by the backup server (in memory). The reply was held back from the client
>> until the mirrored reply cache entry was acknowledged by the backup server.
>> Upon a crash, the backup server piggybacked its recovery logic on ext3's
>> journal recovery code. For reply cache entries not replayed or not
>> recognized by jbd, nfsd resent the raw NFS requests down to the filesystem
>> just like any newly arrived request. The prototype code was able to achieve
>> at least 70% of async-mode performance without losing data.
>>
>> One of the other issues with our current Linux-based NFS cluster failover
>> is also in this arena - that is, upon failover, non-idempotent requests
>> can introduce stale filehandle errors that have been causing headaches for
>> some applications.
>>
>
> How exactly do the stale filehandles happen?
>

Unless someone has fixed it (last time we looked), one of the causes
was like this:

A "delete" was successfully executed on one server, but failover occurred
before the reply reached the client. The retransmitted request was sent to
the take-over server, which subsequently couldn't find the file (since the
file was gone). A stale filehandle error (or maybe EACCES or EPERM, I
forget the details) was returned.

-- Wendy




2007-12-03 22:08:13

by J. Bruce Fields

Subject: Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)

On Mon, Dec 03, 2007 at 04:49:42PM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> On Mon, Dec 03, 2007 at 04:13:02PM -0500, Wendy Cheng wrote:
>>
>>> Or use a cluster (a backup server is quite affordable nowadays)? I was
>>> about to kick off a new discussion about this ...
>>>
>>> I did a prototype about 4 years ago on the 2.4 kernel where the RPC reply
>>> cache (slightly modified to include raw NFS request packets) was mirrored
>>> by the backup server (in memory). The reply was held back from the client
>>> until the mirrored reply cache entry was acknowledged by the backup
>>> server. Upon a crash, the backup server piggybacked its recovery logic on
>>> ext3's journal recovery code. For reply cache entries not replayed or not
>>> recognized by jbd, nfsd resent the raw NFS requests down to the
>>> filesystem just like any newly arrived request. The prototype code was
>>> able to achieve at least 70% of async-mode performance without losing
>>> data.
>>>
>>> One of the other issues with our current Linux-based NFS cluster failover
>>> is also in this arena - that is, upon failover, non-idempotent requests
>>> can introduce stale filehandle errors that have been causing headaches
>>> for some applications.
>>>
>>
>> How exactly do the stale filehandles happen?
>>
>
> Unless someone has fixed it (last time we looked), one of the causes was
> like this:
>
> A "delete" was successfully executed on one server, but failover occurred
> before the reply reached the client. The retransmitted request was sent to
> the take-over server, which subsequently couldn't find the file (since the
> file was gone). A stale filehandle error (or maybe EACCES or EPERM, I
> forget the details) was returned.

OK, makes sense. But the REMOVE operation takes a filehandle (for a
parent) and a name (the name of the thing to remove), so if it's already
been removed you'd expect something like ENOENT.

--b.
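
From the application's point of view, the failover scenario discussed above
looks roughly like the sketch below (the mount path is hypothetical): the
first REMOVE actually succeeded on the original server, but the
retransmission reaches the take-over server, which no longer finds the name
and reports an error instead of success.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* The file really was removed on the original server, but without
         * a shared reply cache the take-over server cannot recognize the
         * retransmission, so the application sees an error such as ENOENT
         * (or ESTALE/EACCES, depending on the operation) rather than
         * success. */
        if (unlink("/mnt/nfs/data/foo") == -1) {
                perror("unlink");
                return 1;
        }
        printf("unlink succeeded\n");
        return 0;
}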
