2021-11-07 13:11:33

by Rick Macklem

Subject: NFS client pNFS handling of NFS4ERR_NOSPC

Hi,

I ran a simple test using a Linux 5.12 client NFSv4.2 mount
against a FreeBSD pNFS server, where the DS is out of space
(intentionally, by creating a large file on it).

I tried to write a file on the Linux NFS client mount and the
mount point gets "stuck" (will not <ctrl>C nor "umount -f").
--> The client is attempting writes against the DS repeatedly,
with the DS replying NFS4ERR_NOSPC. (Same byte offsets,
over and over and over again.)
--> The client is repeatedly sending RPCs with LayoutError in
them to the MDS, reporting the NFS4ERR_NOSPC.

I'll leave it up to others, but failing the program trying to
write the file with ENOSPC would seem preferable to the
"stuck" mount?
--> Removing the large file from the DS so that the Writes
can succeed does cause the client to recover.

rick


2021-11-07 13:16:29

by Trond Myklebust

Subject: Re: NFS client pNFS handling of NFS4ERR_NOSPC

On Sun, 2021-11-07 at 00:03 +0000, Rick Macklem wrote:
> Hi,
>
> I ran a simple test using a Linux 5.12 client NFSv4.2 mount
> against a FreeBSD pNFS server, where the DS is out of space
> (intentionally, by creating a large file on it).
>
> I tried to write a file on the Linux NFS client mount and the
> mount point gets "stuck" (will not <ctrl>C nor "umount -f").
> --> The client is attempting writes against the DS repeatedly,
>        with the DS replying NFS4ERR_NOSPC. (Same byte offsets,
>        over and over and over again.)
> --> The client is repeatedly sending RPCs with LayoutError in
>        them to the MDS, reporting the NFS4ERR_NOSPC.
>
> I'll leave it up to others, but failing the program trying to
> write the file with ENOSPC would seem preferable to the
> "stuck" mount?
> --> Removing the large file from the DS so that the Writes
>       can succeed does cause the client to recover.
>

The client expectation is that the MDS will either remedy the
situation, or it will return an appropriate application-level error to
the LAYOUTGET.

What we do not expect is for the client to have to handle DS level
errors.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace


2021-11-08 07:28:34

by Rick Macklem

Subject: Re: NFS client pNFS handling of NFS4ERR_NOSPC

Trond Myklebust wrote:
> On Sun, 2021-11-07 at 00:03 +0000, Rick Macklem wrote:
> > Hi,
> >
> > I ran a simple test using a Linux 5.12 client NFSv4.2 mount
> > against a FreeBSD pNFS server, where the DS is out of space
> > (intentionally, by creating a large file on it).
> >
> > I tried to write a file on the Linux NFS client mount and the
> > mount point gets "stuck" (will not <ctrl>C nor "umount -f").
> > --> The client is attempting writes against the DS repeatedly,
> > with the DS replying NFS4ERR_NOSPC. (Same byte offsets,
> > over and over and over again.)
> > --> The client is repeatedly sending RPCs with LayoutError in
> > them to the MDS, reporting the NFS4ERR_NOSPC.
> >
> > I'll leave it up to others, but failing the program trying to
> > write the file with ENOSPC would seem preferable to the
> > "stuck" mount?
> > --> Removing the large file from the DS so that the Writes
> > can succeed does cause the client to recover.
> >
>
> The client expectation is that the MDS will either remedy the
> situation, or it will return an appropriate application-level error to
> the LAYOUTGET.
Thanks Trond, that worked fine for NFSv4.2. I tweaked the pNFS server
to reply NFS4ERR_NOSPC to LayoutGet and that worked fine.
(This is triggered by the LayoutError.)
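
A rough sketch of the idea, not the actual FreeBSD server code: the MDS remembers that a client reported NFS4ERR_NOSPC for a DS and fails later LayoutGets for that DS with the same error. The names ds_table, mds_note_ds_error(), mds_note_ds_space_freed() and mds_layoutget_status() are made up for illustration, as is the fixed-size table. The status values are the ones assigned in RFC 5661.

```c
#include <stdbool.h>
#include <stddef.h>

/* NFSv4 status codes, values as assigned in RFC 5661. */
enum { NFS4_OK = 0, NFS4ERR_INVAL = 22, NFS4ERR_NOSPC = 28 };

#define MAX_DS 8	/* hypothetical limit on the number of DSs */

/* Hypothetical per-DS state kept by the MDS. */
struct ds_state {
	bool out_of_space;
};

static struct ds_state ds_table[MAX_DS];

/*
 * Record a DS error reported by a client, whether it arrived in a
 * LayoutError (NFSv4.2) or in the error list of a LayoutReturn.
 */
void
mds_note_ds_error(size_t ds, int nfs_status)
{
	if (ds < MAX_DS && nfs_status == NFS4ERR_NOSPC)
		ds_table[ds].out_of_space = true;
}

/* Clear the flag once the MDS learns that the DS has space again. */
void
mds_note_ds_space_freed(size_t ds)
{
	if (ds < MAX_DS)
		ds_table[ds].out_of_space = false;
}

/* Status the MDS would reply with to a LayoutGet that maps to this DS. */
int
mds_layoutget_status(size_t ds)
{
	if (ds >= MAX_DS)
		return (NFS4ERR_INVAL);
	return (ds_table[ds].out_of_space ? NFS4ERR_NOSPC : NFS4_OK);
}
```

How the flag gets cleared is left open here. This sketch assumes something else tells the MDS that space was freed, so that it does not have to probe the DSs.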

For NFSv4.1, things don't work as well, since there is no LayoutError
operation. The LayoutReturn has the NFS4ERR_NOSPC error in it,
but that doesn't happen until it finishes (which doesn't happen until
I free up space on the DS).
But I can live with only 4.2 working well. I can't be bothered endlessly
probing the DSs to see if they are out of space.

rick


> What we do not expect is for the client to have to handle DS level
> errors.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace


2021-11-08 07:29:26

by Trond Myklebust

Subject: Re: NFS client pNFS handling of NFS4ERR_NOSPC

On Mon, 2021-11-08 at 02:27 +0000, Rick Macklem wrote:
> Trond Myklebust wrote:
> > On Sun, 2021-11-07 at 00:03 +0000, Rick Macklem wrote:
> > > Hi,
> > >
> > > I ran a simple test using a Linux 5.12 client NFSv4.2 mount
> > > against a FreeBSD pNFS server, where the DS is out of space
> > > (intentionally, by creating a large file on it).
> > >
> > > I tried to write a file on the Linux NFS client mount and the
> > > mount point gets "stuck" (will not <ctrl>C nor "umount -f").
> > > --> The client is attempting writes against the DS repeatedly,
> > >        with the DS replying NFS4ERR_NOSPC. (Same byte offsets,
> > >        over and over and over again.)
> > > --> The client is repeatedly sending RPCs with LayoutError in
> > >        them to the MDS, reporting the NFS4ERR_NOSPC.
> > >
> > > I'll leave it up to others, but failing the program trying to
> > > write the file with ENOSPC would seem preferable to the
> > > "stuck" mount?
> > > --> Removing the large file from the DS so that the Writes
> > >       can succeed does cause the client to recover.
> > >
> >
> > The client expectation is that the MDS will either remedy the
> > situation, or it will return an appropriate application-level error
> > to
> > the LAYOUTGET.
> Thanks Trond, that worked fine for NFSv4.2. I tweaked the pNFS server
> to reply NFS4ERR_NOSPC to LayoutGet and that worked fine.
> (This is triggered by the LayoutError.)
>
> For NFSv4.1, things don't work as well, since there is no LayoutError
> operation. The LayoutReturn has the NFS4ERR_NOSPC error in it,
> but that doesn't happen until it finishes (which doesn't happen until
> I free up space on the DS).

Hmm... The ENOSPC error from the DS should in principle be marking the
layout for return. You're saying that the return isn't happening?

Does a newer client fix the issue?

> But I can live with only 4.2 working well. I can't be bothered
> endlessly
> probing the DSs to see if they are out of space.

Agreed. Your server should be able to rely on the layout error reports
from the client (either in LAYOUTERROR or in the LAYOUTRETURN) in order
to figure out when the DS might be out of space.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace


2021-11-08 20:32:15

by Rick Macklem

Subject: Re: NFS client pNFS handling of NFS4ERR_NOSPC

Trond Myklebust wrote:
> On Mon, 2021-11-08 at 02:27 +0000, Rick Macklem wrote:
> > Trond Myklebust wrote:
> > > On Sun, 2021-11-07 at 00:03 +0000, Rick Macklem wrote:
> > > > Hi,
> > > >
> > > > I ran a simple test using a Linux 5.12 client NFSv4.2 mount
> > > > against a FreeBSD pNFS server, where the DS is out of space
> > > > (intentionally, by creating a large file on it).
> > > >
> > > > I tried to write a file on the Linux NFS client mount and the
> > > > mount point gets "stuck" (will not <ctrl>C nor "umount -f").
> > > > --> The client is attempting writes against the DS repeatedly,
> > > > with the DS replying NFS4ERR_NOSPC. (Same byte offsets,
> > > > over and over and over again.)
> > > > --> The client is repeatedly sending RPCs with LayoutError in
> > > > them to the MDS, reporting the NFS4ERR_NOSPC.
> > > >
> > > > I'll leave it up to others, but failing the program trying to
> > > > write the file with ENOSPC would seem preferable to the
> > > > "stuck" mount?
> > > > --> Removing the large file from the DS so that the Writes
> > > > can succeed does cause the client to recover.
> > > >
> > >
> > > The client expectation is that the MDS will either remedy the
> > > situation, or it will return an appropriate application-level error
> > > to
> > > the LAYOUTGET.
> > Thanks Trond, that worked fine for NFSv4.2. I tweaked the pNFS server
> > to reply NFS4ERR_NOSPC to LayoutGet and that worked fine.
> > (This is triggered by the LayoutError.)
> >
> > For NFSv4.1, things don't work as well, since there is no LayoutError
> > operation. The LayoutReturn has the NFS4ERR_NOSPC error in it,
> > but that doesn't happen until it finishes (which doesn't happen until
> > I free up space on the DS).
>
> Hmm... The ENOSPC error from the DS should in principle be marking the
> layout for return. You're saying that the return isn't happening?
Not until the end, after I have deleted the large file, so there is space on the
DS for the writes. It is in the same compound as Close.
The packet capture is here, in case you are interested:
https://people.freebsd.org/~rmacklem/linux-ds-out-of-space.pcap
(Taken at the MDS, so it doesn't show the DS RPCs, but they're just
a lot of writes that fail with NFS4ERR_NOSPC until near the end.)

If you look, you'll see it gets a layout for the entire file first,
then it repeatedly does LayoutGets that are a little weird.
- For 4K only, but always for an offset that is an exact multiple
of 1 Mbyte.
--> Then, once I free up space on the DS, it does the compound
that includes both Close and LayoutReturn (which has the
NFS4ERR_NOSPC error report in it).
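
For anyone going through the capture, a small sketch of a check for that LayoutGet pattern. It assumes "4K" means 4096 bytes and "1Mbyte" means 2^20 bytes. The function name matches_observed_layoutget() is made up for illustration and is not part of any NFS code.

```c
#include <stdbool.h>
#include <stdint.h>

#define ONE_MIB		(UINT64_C(1) << 20)	/* assumed meaning of "1Mbyte" */
#define FOUR_KIB	UINT64_C(4096)		/* assumed meaning of "4K" */

/*
 * True if a LayoutGet's offset/length pair has the shape described
 * above: a 4K range starting on an exact 1 Mbyte boundary.
 */
bool
matches_observed_layoutget(uint64_t offset, uint64_t length)
{
	return (length == FOUR_KIB && offset % ONE_MIB == 0);
}
```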

> Does a newer client fix the issue?
This was 5.12. I'll build/test a newer kernel in the next couple of
days and report back (it's an old single core i386, so it takes a while;-).

rick

> > But I can live with only 4.2 working well. I can't be bothered
> > endlessly
> > probing the DSs to see if they are out of space.
>
> Agreed. Your server should be able to rely on the layout error reports
> from the client (either in LAYOUTERROR or in the LAYOUTRETURN) in order
> to figure out when the DS might be out of space.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace


2021-11-10 00:48:22

by Rick Macklem

Subject: Re: NFS client pNFS handling of NFS4ERR_NOSPC

Rick Macklem wrote:
> Trond Myklebust wrote:
> > On Mon, 2021-11-08 at 02:27 +0000, Rick Macklem wrote:
> > > Trond Myklebust wrote:
> > > > On Sun, 2021-11-07 at 00:03 +0000, Rick Macklem wrote:
> > > > > Hi,
> > > > >
> > > > > I ran a simple test using a Linux 5.12 client NFSv4.2 mount
> > > > > against a FreeBSD pNFS server, where the DS is out of space
> > > > > (intentionally, by creating a large file on it).
> > > > >
> > > > > I tried to write a file on the Linux NFS client mount and the
> > > > > mount point gets "stuck" (will not <ctrl>C nor "umount -f").
> > > > > --> The client is attempting writes against the DS repeatedly,
> > > > > with the DS replying NFS4ERR_NOSPC. (Same byte offsets,
> > > > > over and over and over again.)
> > > > > --> The client is repeatedly sending RPCs with LayoutError in
> > > > > them to the MDS, reporting the NFS4ERR_NOSPC.
> > > > >
> > > > > I'll leave it up to others, but failing the program trying to
> > > > > write the file with ENOSPC would seem preferable to the
> > > > > "stuck" mount?
> > > > > --> Removing the large file from the DS so that the Writes
> > > > > can succeed does cause the client to recover.
> > > > >
> > > >
> > > > The client expectation is that the MDS will either remedy the
> > > > situation, or it will return an appropriate application-level error
> > > > to
> > > > the LAYOUTGET.
> > > Thanks Trond, that worked fine for NFSv4.2. I tweaked the pNFS server
> > > to reply NFS4ERR_NOSPC to LayoutGet and that worked fine.
> > > (This is triggered by the LayoutError.)
> > >
> > > For NFSv4.1, things don't work as well, since there is no LayoutError
> > > operation. The LayoutReturn has the NFS4ERR_NOSPC error in it,
> > > but that doesn't happen until it finishes (which doesn't happen until
> > > I free up space on the DS).
> >
> > Hmm... The ENOSPC error from the DS should in principle be marking the
> > layout for return. You're saying that the return isn't happening?
> Not until the end, after I have deleted the large file, so there is space on the
> DS for the writes. It is in the same compound as Close.
> The packet capture is here, in case you are interested:
> https://people.freebsd.org/~rmacklem/linux-ds-out-of-space.pcap
> (Taken at the MDS, so it doesn't show the DS RPCs, but they're just
> a lot of writes that fail with NFS4ERR_NOSPC until near the end.)
>
> If you look, you'll see it gets a layout for the entire file first,
> then it repeatedly does LayoutGets that are a little weird.
> - For 4K only, but always for an offset that is an exact multiple
> of 1 Mbyte.
> --> Then, once I free up space on the DS, it does the compound
> that includes both Close and LayoutReturn (which has the
> NFS4ERR_NOSPC error report in it).
>
> > Does a newer client fix the issue?
> This was 5.12. I'll build/test a newer kernel in the next couple of
> days and report back (it's an old single core i386, so it takes a while;-).
5.15.1 exhibits the same behaviour. The only difference is that LayoutReturn
was in a separate RPC from Close, but still didn't happen until the
end, after I freed up space on the DS and the writes to the DS
succeeded. (This time I had delegations enabled, which might be why
the LayoutReturn wasn't in the same compound RPC as Close?)

rick

> rick
>
> > > But I can live with only 4.2 working well. I can't be bothered
> > > endlessly
> > > probing the DSs to see if they are out of space.
>
> > Agreed. Your server should be able to rely on the layout error reports
> > from the client (either in LAYOUTERROR or in the LAYOUTRETURN) in order
> > to figure out when the DS might be out of space.

> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace