2008-03-18 01:31:05

by Ray Ferguson

Subject: NFSERR_NOSPC nfs-client bug

I've discovered a bug in the Linux NFS client. Specifically, it ignores
NFSERR_NOSPC messages (code 28) from an NFS server and happily continues
pounding it with data.

This causes some rather unfortunate consequences on Linux NFS servers by
exhausting resources. In 2.4, all CPUs peg at 100% usage in the system
category. In 2.6, at least one core gets pegged at 100% iowait, but this
still triggers cascading load issues.

So far I've tested:

openSUSE 10.3 = Linux 2.6.22 (client bug confirmed)
RHAS4 = Linux 2.6.9 (client bug confirmed)
RHAS3 = Linux 2.4.21 (no bug: pre-NFSv4 client)
Solaris 9 (no bug)

This can be reproduced by creating a small filesystem and exporting it via
NFS, then mounting it with a buggy client and running
"cat /dev/zero > /nfs-share/foo".

The expected behavior is for the client to error out the write with a message
informing you that the filesystem is out of space. Instead, the client keeps
sending data and the server's kernel takes a beating.
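
For reference, here's roughly the test program I'd expect to fail cleanly
(a sketch only, assuming the /nfs-share mount from the reproduction above):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[65536];
    int fd = open("/nfs-share/foo", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    memset(buf, 0, sizeof(buf));
    for (;;) {
        /* expected: fails with "No space left on device" once the
         * server's filesystem fills; the buggy clients never return
         * the error from write() */
        if (write(fd, buf, sizeof(buf)) < 0) {
            perror("write");
            break;
        }
    }
    if (close(fd) < 0)
        perror("close");    /* buggy clients only report it here, if at all */
    return 0;
}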

I've checked the wire and confirmed that the server is sending the NOSPC
status back to the client. Most of my testing has been with NFSv3, though I
did some brief testing with NFSv2 (bug still present). I have kernel sysrq
debug data and packet captures if anyone is interested.

If this is not the correct place to report this, I would be grateful if anyone
could redirect me.

Thank you for your help.

-
Ray Ferguson


2008-03-18 01:40:14

by Greg Banks

Subject: Re: NFSERR_NOSPC nfs-client bug

Ray Ferguson wrote:
> I've discovered a bug in the Linux NFS client. Specifically, it ignores
> NFSERR_NOSPC messages (code 28) from an NFS server and happily continues
> pounding it with data.
>

It doesn't ignore ENOSPC, it reports it on close(). Of course this is
often several gigabytes of lost data too late.

Sensible clients (e.g. Irix) store that error on the inode and report it
on the next call to write().
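
The pattern is simple enough; a sketch in C (illustrative only, not
actual Irix or Linux source):

/* Latch the first async writeback failure on the inode and hand it
 * to the next write() on that file, exactly once. */
struct nfs_inode_state {
    int saved_error;    /* 0, or a negative errno such as -ENOSPC */
};

/* called from the async writeback completion path */
static void writeback_done(struct nfs_inode_state *st, int status)
{
    if (status < 0 && st->saved_error == 0)
        st->saved_error = status;
}

/* called at the top of the write() path */
static int check_saved_error(struct nfs_inode_state *st)
{
    int err = st->saved_error;

    st->saved_error = 0;    /* report each failure only once */
    return err;             /* caller returns this to userspace */
}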

--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


2008-03-18 01:50:23

by Ray Ferguson

Subject: Re: NFSERR_NOSPC nfs-client bug

On Monday 17 March 2008 20:43, Greg Banks wrote:
> It doesn't ignore ENOSPC, it reports it on close(). Of course this is
> often several gigabytes of lost data too late.

In our case, the thrashing it gave our Linux NFS cluster was severe enough to
take it out of commission. The lost data from the file transfer that
triggered the event was the least of our worries.

-
Ray Ferguson

2008-03-18 02:04:36

by Greg Banks

Subject: Re: NFSERR_NOSPC nfs-client bug

Ray Ferguson wrote:
> On Monday 17 March 2008 20:43, Greg Banks wrote:
>
>> It doesn't ignore ENOSPC, it reports it on close(). Of course this is
>> often several gigabytes of lost data too late.
>>
>
> In our case, the thrashing it gave our Linux NFS cluster was severe enough to
> take it out of commission. The lost data from the file transfer that
> triggered the event was the least of our worries.
>
>
Agreed, it sucks.

--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


2008-03-18 03:09:41

by Trond Myklebust

Subject: Re: NFSERR_NOSPC nfs-client bug


On Tue, 2008-03-18 at 13:08 +1100, Greg Banks wrote:
> Ray Ferguson wrote:
> > On Monday 17 March 2008 20:43, Greg Banks wrote:
> >
> >> It doesn't ignore ENOSPC, it reports it on close(). Of course this is
> >> often several gigabytes of lost data too late.
> >>
> >
> > In our case, the thrashing it gave our Linux NFS cluster was severe enough to
> > take it out of commission. The lost data from the file transfer that
> > triggered the event was the least of our worries.
> >
> >
> Agreed, it sucks.

Try a more recent kernel: 2.6.24 and later will report these errors more
promptly.

Trond


2008-03-18 03:59:31

by Greg Banks

Subject: Re: NFSERR_NOSPC nfs-client bug

Trond Myklebust wrote:
> On Tue, 2008-03-18 at 13:08 +1100, Greg Banks wrote:
>
>> Ray Ferguson wrote:
>>
>>> On Monday 17 March 2008 20:43, Greg Banks wrote:
>>>
>>>
>>>> It doesn't ignore ENOSPC, it reports it on close(). Of course this is
>>>> often several gigabytes of lost data too late.
>>>>
>>>>
>>> In our case, the thrashing it gave our Linux NFS cluster was severe enough to
>>> take it out of commission. The lost data from the file transfer that
>>> triggered the event was the least of our worries.
>>>
>>>
>>>
>> Agreed, it sucks.
>>
>
> Try a more recent kernel: 2.6.24 and later will report these errors more
> promptly.
>
So that would be
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7b159fc18d417980f57aef64cab3417ee6af70f8

?
Is my reading right that a solitary transient ENOSPC is still reported
at close(), but a sequence of two or more ENOSPCs is reported in the next
write() call?

--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


2008-03-18 06:30:16

by NeilBrown

Subject: Re: NFSERR_NOSPC nfs-client bug

On Tue, March 18, 2008 3:03 pm, Greg Banks wrote:
> So that would be
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7b159fc18d417980f57aef64cab3417ee6af70f8
>
> ?
> Is my reading right, that a solitary transient ENOSPC is still reported
> at close(), but a sequence of two or more ENOSPC is reported in the next
> write() call?

I don't think so.
After an error report is received by the client, the next write/fsync/close
will report the error (even if that write didn't actually have an error).

Also, further writes will be attempted synchronously (so the correct error
is reported) until a write succeeds, at which point we go back to
async writes.

So a solitary transient ENOSPC will be reported against a subsequent
write.

When a write fails, the page remains DIRTY, so a flush will be attempted
on every subsequent 'sync', including those triggered by a write while
the error flag is set.
So if you keep writing, you will keep getting an error until all dirty
pages have been safely written to the server.

If you give up and close the file, you won't be able to tell just by
looking at the error codes which pages were successfully written and which
weren't. But it would seem unwise to expect to be able to do that in
any case.
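
The practical upshot for applications: check write(), fsync() *and*
close(), and treat the whole file as suspect if any of them fails.
A sketch (the output path is just an example):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0)
            return -1;  /* may be a deferred writeback error */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

int main(void)
{
    char buf[65536] = {0};
    int ok = 1;
    int fd = open("/nfs-share/out", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write_all(fd, buf, sizeof(buf)) < 0) { perror("write"); ok = 0; }
    if (fsync(fd) < 0) { perror("fsync"); ok = 0; }  /* flush dirty pages */
    if (close(fd) < 0) { perror("close"); ok = 0; }  /* last chance report */

    /* on any failure we can't know which pages reached the server,
     * so the only safe recovery is to rewrite the whole file */
    return ok ? 0 : 1;
}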

NeilBrown