2018-04-02 17:57:35

by Orion Poplawski

Subject: NFS troubles

I'm having a lot of trouble with NFS going out to lunch between my RHEL7
machines. Users cannot access files, and get errors like:

$ touch blah
touch: cannot touch ‘blah’: Input/output error

I'm attaching a pcap trace of the above touch during the problem. It seems
that the server is returning NFS4ERR_EXPIRED.

Rebooting or restarting NFS helps for a while, but then the problems return.

Other symptoms of trouble are messages like:

RPC: fragment too large: 613351424

on the client.
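For context on this second symptom: RPC over TCP frames every message with a 4-byte record marker (RFC 5531, section 11). The high bit marks the last fragment and the low 31 bits give the fragment length. The message likely comes from the kernel's RPC layer rejecting a record whose announced length exceeds what it will accept. Assuming the number printed is that 31-bit length, this minimal sketch decodes it. The constant and function names are illustrative, not taken from the kernel source.

```python
import struct
from typing import Tuple

# RPC over TCP prefixes each fragment with a 4-byte, big-endian record marker.
LAST_FRAGMENT = 0x80000000  # high bit: this is the last fragment of the record
LENGTH_MASK = 0x7FFFFFFF  # low 31 bits: fragment length in bytes


def parse_record_marker(marker: bytes) -> Tuple[bool, int]:
    """Split a 4-byte record marker into (is_last_fragment, fragment_length)."""
    (word,) = struct.unpack("!I", marker)  # network byte order, unsigned 32-bit
    return bool(word & LAST_FRAGMENT), word & LENGTH_MASK


# Pack the length from the log message back into marker bytes and decode it.
reported = 613351424
marker = struct.pack("!I", reported)
is_last, length = parse_record_marker(marker)
print(f"marker={marker.hex()} last={is_last} length={length} (~{length / 2**20:.0f} MiB)")
```

The announced length is roughly 585 MiB, far beyond any ordinary NFS message. One plausible reading, which is a guess rather than a diagnosis, is that the receiver was parsing bytes that were not really a record marker.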

Any help with trying to track this down would be greatly appreciated.

--
Orion Poplawski
Manager of NWRA Technical Systems 720-772-5637
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane [email protected]
Boulder, CO 80301 https://www.nwra.com/


Attachments:
nfs-bad.pcap (2.16 kB)

2018-04-02 18:30:31

by Benjamin Coddington

Subject: Re: NFS troubles

NFS4ERR_EXPIRED means the client is trying to use state that the server
believes to have expired or that has been supplanted by newer state.
Can we get kernel versions for the client and server? Have you talked
to your Red Hat support channel about this?

This capture doesn't appear to show any bugs or bad behaviors, but a
longer capture may.

Ben
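The NFS4ERR_EXPIRED semantics described above can be pictured with a toy model of a server's lease table. Everything here is hypothetical and heavily simplified: the class and method names are invented, and the 90-second lease is an assumption (it is the default Chuck mentions later in the thread). Real servers track far more state and distinguish more cases. The sketch only shows the two paths named above: a lease that was not renewed in time, and a clientid supplanted by a newer one for the same client owner.

```python
from dataclasses import dataclass, field
from typing import Dict

LEASE_SECONDS = 90.0  # assumed lease period (default cited later in this thread)


@dataclass
class ToyLeaseTable:
    """Hypothetical, heavily simplified model of how a server tracks client leases."""

    lease: float = LEASE_SECONDS
    next_clientid: int = 1
    clientid_by_owner: Dict[str, int] = field(default_factory=dict)
    last_renewal: Dict[int, float] = field(default_factory=dict)

    def establish(self, owner: str, now: float) -> int:
        """Create a new clientid for owner; any older clientid for it is supplanted."""
        old = self.clientid_by_owner.get(owner)
        if old is not None:
            self.last_renewal.pop(old, None)
        clientid = self.next_clientid
        self.next_clientid += 1
        self.clientid_by_owner[owner] = clientid
        self.last_renewal[clientid] = now
        return clientid

    def use_state(self, clientid: int, now: float) -> str:
        """Check state tied to clientid; a successful use also renews the lease."""
        last = self.last_renewal.get(clientid)
        if last is None or now - last > self.lease:
            self.last_renewal.pop(clientid, None)  # expired state is discarded
            return "NFS4ERR_EXPIRED"
        self.last_renewal[clientid] = now
        return "NFS4_OK"


table = ToyLeaseTable()
first = table.establish("client-a", now=0)
print(table.use_state(first, now=60))   # renewed within the lease period
print(table.use_state(first, now=200))  # silent for 140 s, longer than the lease
second = table.establish("client-a", now=210)
print(table.use_state(first, now=211), table.use_state(second, now=211))
```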

2018-04-03 15:44:07

by Orion Poplawski

Subject: Re: NFS troubles

Kernel is 3.10.0-693.21.1.el7.x86_64. I don't have Red Hat support for these
systems.

I discovered that I'd been forcing vers=4.0 mounts in order to work around a
mounting issue. I'm moving back to the default 4.1 mounts as it seems to work
better for this issue. If the issue returns I'll try to grab a longer trace.

Thanks.

2018-04-04 14:08:07

by Olga Kornievskaia

Subject: Re: NFS troubles

I wonder if this is the issue we ran into during the NFS bakeathon
testing last week. The problem was that a previous NFSv4.0 mount left state
behind, so umount didn't actually unmount. The next mount only did a
PUTROOTFH with no SETCLIENTID, so any operation that tried to use the
clientid got NFS4ERR_EXPIRED. We will be trying to reproduce it again and
to fix it.

2018-04-06 16:07:57

by Orion Poplawski

Subject: Re: NFS troubles

On 04/03/2018 09:44 AM, Orion Poplawski wrote:
> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
> systems.
>
> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
> mounting issue.

And I'm back to seeing the mount issue at boot. Here's the situation. We force
Kerberos on the public network but allow sec=sys on some private
networks:

/etc/exports:
/ -ro,async,fsid=0 192.168.1.0/24(sec=sys) 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
/export/home -rw,async,nohide 192.168.1.0/24(sec=sys) 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
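A rough way to see what these export lines permit for a given client is a hypothetical helper that matches the client's address or hostname against each client specification and collects the allowed flavors. This is a simplification under stated assumptions: it ignores how mountd/exportfs actually orders and resolves matches, netgroups, and reverse-DNS lookups. The example address 203.0.113.5 is a made-up documentation address.

```python
import ipaddress
from fnmatch import fnmatch
from typing import List, Optional, Tuple

# (client specification, sec= flavor) pairs taken from the export lines above.
EXPORT_CLIENTS: List[Tuple[str, str]] = [
    ("192.168.1.0/24", "sys"),
    ("192.168.2.0/24", "sys"),
    ("*.nwra.com", "krb5"),
]


def spec_matches(spec: str, addr: str, hostname: Optional[str]) -> bool:
    """True if a client spec (CIDR network or hostname wildcard) covers the client."""
    if "/" in spec:
        return ipaddress.ip_address(addr) in ipaddress.ip_network(spec)
    return hostname is not None and fnmatch(hostname.lower(), spec.lower())


def allowed_flavors(addr: str, hostname: Optional[str] = None) -> List[str]:
    """Flavors of every matching spec; a simplification of real export matching."""
    return [flavor for spec, flavor in EXPORT_CLIENTS if spec_matches(spec, addr, hostname)]


print(allowed_flavors("192.168.1.20"))
print(allowed_flavors("203.0.113.5", "qcomp1.cora.nwra.com"))
```

If a private-network client also resolves to a *.nwra.com name, both specs match in this sketch; how the real server resolves such an overlap is not modelled here.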

So for a while after boot, attempts to mount with sec=sys fail:

# mount -t nfs4 -s -o \
    sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1 \
    earthib.cora.nwra.com:/export/home/greg /mnt
mount.nfs4: Operation not permitted

But then later they work:

# mount -t nfs4 -s -o \
    sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1 \
    earthib.cora.nwra.com:/export/home/greg /mnt
# umount /mnt

This can cycle back and forth.

I've attached a packet capture of some failed mount attempts. It seems that
even when specifying sec=sys, some Kerberos activity is going on.

It appears to be related to mounting a different sec=krb5 mount over the
public network from the same server. While that mount is active, the sec=sys
mounts fail. When it is unmounted, they work. At least now I think I can
work around this...

Attachments:
mount-fail.pcap (9.40 kB)

2018-04-06 16:24:21

by Chuck Lever III

Subject: Re: NFS troubles



For NFSv4, the client is going to use krb5i to do lease management even
on sec=sys mounts. An NFSv4 server has to know for sure when it is talking
to the same client on different network interfaces or with different
security flavors. Thus the client has to use the same security flavor for
lease management on all of its mounts of that server. That's not controlled
by the sec= mount option.

I assume that "but then later" lasts only a few multiples of the server's
lease time (90 seconds by default)?

Clients that use only the private network interface should be able to use
sec=sys. But clients that use both the public and private interfaces should
need to use sec=krb5 on both.


--
Chuck Lever
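The point above is that the lease belongs to the client-server pair, not to an individual mount. A toy sketch of that idea follows, with invented class names and a made-up server identity string: every mount of the same server attaches to one lease managed with krb5i, and sec= only records the flavor used for file operations on that mount point. It does not model the Linux client's real data structures, nor the failure Orion is seeing.

```python
from dataclasses import dataclass, field
from typing import Dict

LEASE_FLAVOR = "krb5i"  # flavor the client uses for lease management, per the text above


@dataclass
class ToyLease:
    """Hypothetical: the one lease a client holds with a given NFSv4 server."""

    server_identity: str
    flavor: str = LEASE_FLAVOR
    mounts: Dict[str, str] = field(default_factory=dict)  # mount point -> sec= flavor


class ToyClient:
    """Hypothetical client that shares one lease among all mounts of a server."""

    def __init__(self) -> None:
        self.leases: Dict[str, ToyLease] = {}

    def mount(self, server_identity: str, mountpoint: str, sec: str) -> ToyLease:
        # Create the lease on the first mount of this server and reuse it afterwards.
        lease = self.leases.get(server_identity)
        if lease is None:
            lease = ToyLease(server_identity)
            self.leases[server_identity] = lease
        # sec= picks the flavor for file operations on this mount, not for the lease.
        lease.mounts[mountpoint] = sec
        return lease


client = ToyClient()
home = client.mount("server-1", "/home/greg", sec="krb5")
scratch = client.mount("server-1", "/mnt", sec="sys")
print(home is scratch, home.flavor, home.mounts)
```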

2018-04-06 18:16:42

by J. Bruce Fields

Subject: Re: NFS troubles

On Fri, Apr 06, 2018 at 12:24:21PM -0400, Chuck Lever wrote:
> For NFSv4, the client is going to use krb5i to do lease management even
> on sec=sys mounts. An NFSv4 server has to know for sure when it is talking
> to the same client on different network interfaces or with different
> security flavors. Thus the client has to use the same security flavor for
> lease management on all of its mounts of that server. That's not controlled
> by the sec= mount option.
>
> I assume that "but then later" lasts only a few multiples of the server's
> lease time (90 seconds by default)?
>
> Clients that use only the private network interface should be able to use
> sec=sys. But clients that use both the public and private interfaces should
> need to use sec=krb5 on both.

Are you saying that the behavior he's seeing is expected?

I'd expect sec=sys and sec=krb5 mounts to the same server to coexist and
both use krb5i to manage the (shared) lease state.

--b.

2018-04-06 18:18:46

by Chuck Lever III

Subject: Re: NFS troubles



> On Apr 6, 2018, at 2:16 PM, [email protected] wrote:
> Are you saying that the behavior he's seeing is expected?

I spoke without looking at the PCAP; perhaps I was hasty.


> I'd expect sec=sys and sec=krb5 mounts to the same server to coexist and
> both use krb5i to manage the (shared) lease state.

Me too, if the NFS client's trunking detection is working.


--
Chuck Lever

2018-04-06 22:05:48

by Orion Poplawski

Subject: Re: NFS troubles

On 04/06/2018 10:24 AM, Chuck Lever wrote:
> For NFSv4, the client is going to use krb5i to do lease management even
> on sec=sys mounts. An NFSv4 server has to know for sure when it is talking
> to the same client on different network interfaces or with different
> security flavors. Thus the client has to use the same security flavor for
> lease management on all of its mounts of that server. That's not controlled
> by the sec= mount option.
>
> I assume that "but then later" lasts only a few multiples of the server's
> lease time (90 seconds by default)?
>
> Clients that use only the private network interface should be able to use
> sec=sys. But clients that use both the public and private interfaces should
> need to use sec=krb5 on both.

Testing again with RHEL 7.5 beta:
3.10.0-830.el7.x86_64
nfs-utils-1.3.0-0.52.el7.x86_64

As near as I can tell, with NFS version 4.1 mounts, as long as there is an active
sec=krb5 mount to a server, sec=sys mounts to that server fail (with the above
/etc/exports). As soon as I unmount the sec=krb5 mount, the sec=sys mount
works. And vice versa: an active sec=sys mount prevents sec=krb5 mounts.
Waiting does not appear to help. The sporadic behavior noted before was due
to the use of the automounter: as mounts came and went, different ones would
break.

If I use NFS version 4.0, I can have both. But then I get the locking bug in
my original post.

2018-04-07 00:15:40

by Chuck Lever III

Subject: Re: NFS troubles


Bruce-

I examined the attached network capture. There are two attempts to do an
EXCHANGE_ID operation. Both times:

- a fresh GSS context is established successfully
- a fresh TCP connection is established by the client
- EXCHANGE_ID is sent using krb5i and the previously established GSS context
-- client owner verifier is 0x5ac794e81d0a1d81
-- client owner is "Linux NFSv4.1 qcomp1.cora.nwra.com"
-- state protection is SP4_MACH_CRED
- the server responds NFS4_OK; the CONFIRMED_R, PNFS_MDS, and MOVED_REFER flags are set
- the client destroys the GSS context
- the client closes the TCP connection


--
Chuck Lever
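To read the eir_flags word of an EXCHANGE_ID reply like the ones listed above, a small decoder helps. The bit values below are transcribed from the EXCHGID4_FLAG_* constants in RFC 5661 and should be checked against the RFC before relying on them. The EXCHGID4_FLAG_ prefix is dropped, and the multi-bit EXCHGID4_FLAG_MASK_PNFS is deliberately left out so that only single-bit flags are listed. CONFIRMED_R, PNFS_MDS, and MOVED_REFER in the capture correspond to CONFIRMED_R, USE_PNFS_MDS, and SUPP_MOVED_REFER here.

```python
from enum import IntFlag
from typing import List


class ExchgId4Flag(IntFlag):
    """eir_flags bits from RFC 5661 (EXCHGID4_FLAG_ prefix dropped)."""

    SUPP_MOVED_REFER = 0x00000001
    SUPP_MOVED_MIGR = 0x00000002
    BIND_PRINC_STATEID = 0x00000100
    USE_NON_PNFS = 0x00010000
    USE_PNFS_MDS = 0x00020000
    USE_PNFS_DS = 0x00040000
    UPD_CONFIRMED_REC_A = 0x40000000
    CONFIRMED_R = 0x80000000


def flag_names(eir_flags: int) -> List[str]:
    """Names of the known single-bit flags set in an eir_flags word."""
    return [flag.name for flag in ExchgId4Flag if flag & eir_flags]


# The three flags reported from the capture, combined into one word.
observed = ExchgId4Flag.CONFIRMED_R | ExchgId4Flag.USE_PNFS_MDS | ExchgId4Flag.SUPP_MOVED_REFER
print(hex(observed), flag_names(observed))
```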




2018-04-07 02:46:55

by J. Bruce Fields

Subject: Re: NFS troubles

On Fri, Apr 06, 2018 at 08:15:35PM -0400, Chuck Lever wrote:
> Bruce-
>
> I examined the attached network capture. There are two attempts to do an
> EXCHANGE_ID operation. Both times:
>
> - a fresh GSS context is established successfully
> - a fresh TCP connection is established by the client
> - EXCHANGE_ID is sent using krb5i and the previously established GSS context
> -- client owner verifier is 0x5ac794e81d0a1d81
> -- client owner is "Linux NFSv4.1 qcomp1.cora.nwra.com"
> -- state protection is SP4_MACH_CRED
> - the server responds NFS4_OK; the CONFIRMED_R, PNFS_MDS, and MOVED_REFER flags are set
> - the client destroys the GSS context
> - the client closes the TCP connection

Huh. If this is a second mount to the same server, it shouldn't need to
do another EXCHANGE_ID at all, should it? I suppose the trunking
detection code's being overzealous. Anyway, doesn't sound like the
trace tells us much. Sounds easy to reproduce, so maybe we just need to
try it and see where exactly the client code is failing.

--b.
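The question above turns on how a client decides that a new mount targets a server it already holds a lease with. Here is a rough sketch, loosely following the server_owner/server_scope comparison that RFC 5661 (section 2.10.5) describes for NFSv4.1 trunking. A mount to an address the client already knows needs no new EXCHANGE_ID. A new address gets one EXCHANGE_ID, and its reply is compared against known server identities. The class names, field names, and address labels are hypothetical, and the real Linux trunking-detection code is considerably more involved.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical subset of an EXCHANGE_ID reply used for trunking checks."""

    clientid: int
    server_owner_major_id: bytes
    server_scope: bytes


class ToyTrunkingTable:
    """Hypothetical client-side table: one clientid per server identity."""

    def __init__(self) -> None:
        self.by_address: Dict[str, Tuple[bytes, bytes]] = {}
        self.by_identity: Dict[Tuple[bytes, bytes], int] = {}

    def known_clientid(self, address: str) -> Optional[int]:
        """A mount to an already-known address needs no new EXCHANGE_ID."""
        identity = self.by_address.get(address)
        return None if identity is None else self.by_identity[identity]

    def record_exchange_id(self, address: str, reply: ExchangeIdResult) -> int:
        """For a new address, reuse the existing clientid if the server identity matches."""
        identity = (reply.server_owner_major_id, reply.server_scope)
        self.by_address[address] = identity
        return self.by_identity.setdefault(identity, reply.clientid)


table = ToyTrunkingTable()
first = table.record_exchange_id("public-addr", ExchangeIdResult(1, b"server-owner", b"scope"))
second = table.record_exchange_id("private-addr", ExchangeIdResult(1, b"server-owner", b"scope"))
print(first == second, table.known_clientid("public-addr"))
```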

2018-04-07 21:23:56

by Chuck Lever III

Subject: Re: NFS troubles



> On Apr 6, 2018, at 10:46 PM, Bruce Fields <[email protected]> wrote:
> Huh. If this is a second mount to the same server, it shouldn't need to
> do another EXCHANGE_ID at all, should it?

The EXCHANGE_ID attempts are five seconds apart. It could be that there
were two separate mount attempts.


> I suppose the trunking
> detection code's being overzealous. Anyway, doesn't sound like the
> trace tells us much. Sounds easy to reproduce, so maybe we just need to
> try it and see where exactly the client code is failing.


--
Chuck Lever