Hi Scott, Trond,
Commit ce368536dd614452407dc31e2449eb84681a06af ("nfs: nfs_file_write()
should check for writeback errors") seems to have affected NFS v3 soft
mount behavior, causing applications to fail on a low-bandwidth
connection with a properly functioning server. I verified this with the
recent Linux 5.10-rc5, and on 5.8.18, to which this commit was
backported.
Question: while the NFS v4 protocol describes soft mount timeout
behavior in RFC 7530 section 3.1.1 (see the reference and the patchset
addressing it in [1]), is it valid to assume that a similar guarantee
is expected for NFS v3 soft mounts?

The reason this is important is that the fulfilment of this guarantee
seems to have changed with this recent patch.
Details on reproduction - using the following mount options:
vers=3,rsize=1048576,wsize=1048576,soft,proto=tcp,timeo=50,retrans=16
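For completeness, the full mount invocation is along these lines (the
server name and export path below are placeholders):

    mount -t nfs -o vers=3,rsize=1048576,wsize=1048576,soft,proto=tcp,timeo=50,retrans=16 \
        server:/export /mnt/nfstest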
This is done along with rate limiting on the outgoing interface:
tc qdisc add dev eth0 root tbf rate 4000kbit latency 1ms burst 1540
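The shaping can be verified and removed afterwards with the usual tc
commands:

    tc qdisc show dev eth0       # confirm the tbf qdisc is installed
    tc qdisc del dev eth0 root   # remove the rate limit when done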
And performing the following parallel work on the mountpoint:
for i in `seq 1 100` ; do (dd if=/dev/zero of=x$i &) ; done
The result is that EIOs are returned to `dd`, whereas without this
commit the IOs simply proceeded slowly and no errors were observed by
`dd`. The traces show that the NFS layer is doing the retries.
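For anyone reproducing this, a bounded variant of the loop makes the
failures easy to count once the writers terminate (sizes are
arbitrary):

    for i in `seq 1 100` ; do
        (dd if=/dev/zero of=x$i bs=1M count=256 2> err$i) &
    done
    wait
    # each failed writer reports "Input/output error" (EIO) on stderr
    grep -l 'Input/output error' err* | wc -l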
[1] https://patchwork.kernel.org/project/linux-nfs/cover/[email protected]/
--
Dan Aloni
On Thu, 2020-11-26 at 12:47 +0200, Dan Aloni wrote:
> [...]
>
> Details on reproduction - using the following mount options:
>
> vers=3,rsize=1048576,wsize=1048576,soft,proto=tcp,timeo=50,retrans=16
Sorry, but those are completely silly timeo and retrans values for a
TCP connection. I see no reason why we should try to support them.
>
> [...]
>
> The result is that EIOs are returned to `dd`, whereas without this
> commit the IOs simply proceeded slowly and no errors were observed by
> `dd`. The traces show that the NFS layer is doing the retries.
>
> [1]
> https://patchwork.kernel.org/project/linux-nfs/cover/[email protected]/
>
Yes. If you artificially create congestion by telling the client to
keep resending all your outstanding data every 5 seconds, then it is
trivial to set up this kind of situation. That has always been the
case, and the patch you point to has nothing to do with this.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Thu, 2020-11-26 at 08:48 -0500, Trond Myklebust wrote:
> On Thu, 2020-11-26 at 12:47 +0200, Dan Aloni wrote:
> > [...]
> >
> > Details on reproduction - using the following mount options:
> >
> > vers=3,rsize=1048576,wsize=1048576,soft,proto=tcp,timeo=50,retrans=16
>
> Sorry, but those are completely silly timeo and retrans values for a
> TCP connection. I see no reason why we should try to support them.
To clarify _why_ the values make no sense:
timeo=50 means "I expect that all my RPC requests are normally
processed by the server, and a reply will be sent within 5 seconds
whether or not the server is congested".
I suggest you look at your nfsiostat output to see whether that kind of
latency expectation is really warranted (look at the maximum latency
values).
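For example, something like this should work (nfsiostat is part of
nfs-utils) and reports per-operation round-trip times every 5 seconds:

    nfsiostat 5 /path/to/mountpoint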
retrans=16 means "however, I expect my server to drop RPC requests so
often that some requests need to be retransmitted 16 times in order to
compensate".
Dropping requests is typically rare on a server. It can happen when the
server is congested, but usually that will cause the server to drop the
connection as well. I suggest you check nfsstat on the server to see
whether that is really the case.
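For example (the exact counters vary between nfs-utils versions):

    nfsstat -rc    # on the client: RPC calls vs. retransmissions
    nfsstat -s     # on the server: RPC/NFS totals, including bad calls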
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
> On Nov 26, 2020, at 8:48 AM, Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2020-11-26 at 12:47 +0200, Dan Aloni wrote:
>> [...]
>>
>> Details on reproduction - using the following mount options:
>>
>> vers=3,rsize=1048576,wsize=1048576,soft,proto=tcp,timeo=50,retrans=16
>
> Sorry, but those are completely silly timeo and retrans values for a
> TCP connection. I see no reason why we should try to support them.
Indeed. Is there a reason to allow administrators to set these values?
--
Chuck Lever
On Thu, Nov 26, 2020 at 01:48:23PM +0000, Trond Myklebust wrote:
> On Thu, 2020-11-26 at 12:47 +0200, Dan Aloni wrote:
> > [...]
> >
> > Details on reproduction - using the following mount options:
> >
> > vers=3,rsize=1048576,wsize=1048576,soft,proto=tcp,timeo=50,retrans=16
>
> Sorry, but those are completely silly timeo and retrans values for a
> TCP connection. I see no reason why we should try to support them.
The same issue is reproducible with a similar overall major timeout,
for example timeo=400,retrans=1.
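If I read nfs(5) correctly (linear backoff over TCP, i.e. each retry
interval grows by timeo), that works out to roughly:

    # timeo=400 retrans=1:
    #   first timeout:            400/10 = 40 s
    #   after one retransmission: 40 s + 80 s = 120 s until the major
    #   timeout fires and the soft mount returns EIO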
Now, looking under `/sys/kernel/debug`, what I see is an accumulation
of thousands of RPC tasks that are ready to transmit. If the outgoing
throughput is constrained such that the WRITE backlog is larger than
what can be transmitted within the time frame of the major timeout, the
tasks end with EIO. This may sound contrived, but it is achievable with
network interfaces of ordinary throughput, given enough writers.
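For reference, I count the queued tasks with something along these
lines (the debugfs layout may vary by kernel version):

    # sum the RPC tasks currently queued across all RPC clients
    cat /sys/kernel/debug/sunrpc/rpc_clnt/*/tasks | wc -l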
This was not the case prior to Linux v5.1, according to my observation.
With the older sunrpc implementation, these tasks would have waited in
the 'reserved' state, with no timeout calculation applied to them at
all; tasks moved to the transmit stage and started counting down to a
timeout only once there was write space on the socket to transmit them.
I looked around and saw that many vendors recommend lowering the
`sunrpc.tcp_max_slot_table_entries` sysctl from 65536 to 128. This
keeps the transmit queue small instead of letting it grow to tens of
thousands of tasks, holding the remaining tasks in the backlog without
failure. With the older SunRPC implementation the 65536 maximum did not
matter, because the write space restriction 'naturally' had the same
effect.
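For anyone else hitting this, the setting can be applied like so (the
module parameter makes it persist across reloads of sunrpc):

    # runtime change, with the sunrpc module already loaded
    sysctl -w sunrpc.tcp_max_slot_table_entries=128

    # persistent: applied whenever the module loads
    echo "options sunrpc tcp_max_slot_table_entries=128" \
        > /etc/modprobe.d/sunrpc.conf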
And indeed, the lower setting fixes the issue I originally described
and helps retain the old behavior, where the soft mount's goal (at
least in my case) is to return EIO for IOs that are stuck at the server
rather than at the client.
--
Dan Aloni