LinuxLists.cc - Kerberized NFS client and slow user write performance

2016-10-07 16:43:13

Subject: Kerberized NFS client and slow user write performance

We seem to be increasingly hit by this bug:

https://access.redhat.com/solutions/2040223
"On RHEL 6 NFS client using kerberos (krb5), one user experiences
slow write performance, another does not"

You need a RH subscription to see that in its entirety. But the
subject basically says it all: randomly, one or more users will be
subjected to *terrible* NFS write performance that persists until
reboot.

There is a root cause shown, but that is cryptic to non-kernel devs;
it doesn't explain from a user perspective what triggers this state.
(That's why it appears to be random to me.)

There is no solution or workaround given. This appears to be on a
per-user + per-server basis, so a crude workaround is to migrate the
user to a different server. And we do regular reboots, which somewhat
hides the problem.

Does this bug also exist in upstream (i.e., non-distro-specific Linux
NFS code)? If so, is there any more detail on it, and/or a fix?

Thanks,
Matt

2016-10-07 17:06:37

by Benjamin Coddington

Subject: Re: Kerberized NFS client and slow user write performance

On 7 Oct 2016, at 12:43, Matt Garman wrote:

> We seem to be increasingly hit by this bug:
>
> https://access.redhat.com/solutions/2040223
> "On RHEL 6 NFS client using kerberos (krb5), one user experiences
> slow write performance, another does not"
>
> You need a RH subscription to see that in its entirety. But the
> subject basically says it all: randomly, one or more users will be
> subjected to *terrible* NFS write performance that persists until
> reboot.
>
> There is a root cause shown, but that is cryptic to non-kernel devs;
> it doesn't explain from a user perspective what triggers this state.
> (That's why it appears to be random to me.)
>
> There is no solution or workaround given. This appears to be on a
> per-user + per-server basis, so a crude workaround is to migrate the
> user to a different server. And we do regular reboots, which somewhat
> hides the problem.
>
> Does this bug also exist in upstream (i.e., non-distro-specific Linux
> NFS code)? If so, is there any more detail on it, and/or a fix?

Hi Matt, a fix for this problem went in upstream:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ce52914eb76efd62aa48d738cf845b37852bf920
which landed in 4.8.

I expect this will get fixed in RHEL6 and RHEL7 shortly, though I don't
have any BZ numbers handy.. let me know directly if you need them.

Ben
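
For anyone running an upstream (mainline) kernel, the 4.8 cutoff mentioned
above can be checked with a rough sketch like the one below. The helper
names (kernel_version, has_upstream_fix) are made up for illustration. The
check only makes sense for upstream kernels: distro kernels such as RHEL's
backport fixes without changing the base version, so a 2.6.32 or 3.10 result
there says nothing about whether the fix is present.

```python
import platform
import re

# First upstream release containing the fix, per the reply above.
FIXED_IN = (4, 8)


def kernel_version(release):
    """Parse the leading 'major.minor' from a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def has_upstream_fix(release):
    """True if an upstream kernel with this release string is 4.8 or newer.

    Assumption: the kernel is an unpatched upstream build. Distro kernels
    with backports are not covered by this comparison.
    """
    return kernel_version(release) >= FIXED_IN


if __name__ == "__main__":
    release = platform.release()
    verdict = "4.8 or newer" if has_upstream_fix(release) else "older than 4.8"
    print("%s: %s" % (release, verdict))
```

On a RHEL client the reliable answer is still the package changelog or the
relevant Bugzilla entry, not the version number.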