2013-07-17 18:55:37

by Andre Heider

Subject: Error writing to nfs4 with 3.11-rc1

Hi,

I'm having problems using 3.11-rc1 as nfs4 client (with a FreeBSD 9.1
server) using sec=sys.

With the same server+client setup, just booting different kernels:
3.9.10 works without issues
3.10.1 works too, but introduced "RPC: AUTH_GSS upcall timed out." in
dmesg (iirc I don't need gss with sec=sys)
3.11-rc1 reading from the server still works, writing fails

Even a simple touch on the share fails with:
touch: cannot touch ‘/mnt/andre/test’: Input/output error

mount output of the share in question:
192.168.0.1:/home/andre on /mnt/andre type nfs4
(rw,nosuid,relatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.40,local_lock=none,addr=192.168.0.1)

The 3.11 .config was cp'ed from 3.10.1 and updated via `make
oldconfig`. Toggling the 3.11 nfs4.2 kernel options doesn't seem to
make a difference.

Any ideas?

Thanks,
Andre


2013-07-17 19:35:56

by Andre Heider

Subject: Re: Error writing to nfs4 with 3.11-rc1

On Wed, Jul 17, 2013 at 9:10 PM, Chuck Lever <[email protected]> wrote:
>
> On Jul 17, 2013, at 2:55 PM, Andre Heider <[email protected]> wrote:
>
>> Hi,
>>
>> I'm having problems using 3.11-rc1 as nfs4 client (with a FreeBSD 9.1
>> server) using sec=sys.
>>
>> With the same server+client setup, just booting different kernels:
>> 3.9.10 works without issues
>> 3.10.1 works too, but introduced "RPC: AUTH_GSS upcall timed out." in
>> dmesg (iirc I don't need gss with sec=sys)
>
> Not a requirement, but running gssd should make that message go away. The client is attempting to use krb5i to manage its lease on the server, and falling back to AUTH_UNIX when it sees gssd is not running.
>
>> 3.11-rc1 reading from the server still works, writing fails
>>
>> Even a simple touch on the share fails with:
>> touch: cannot touch ‘/mnt/andre/test’: Input/output error
>
> A network capture is a reasonable place to start.
>
> # tcpdump -s0 -w /tmp/raw
>
> Then try your touch test again. Stop the tcpdump. You can post a compressed version of the raw dump here if it's short.

Attached two dumps, one from 3.10 (works) and one from 3.11 (doesn't work).

Thanks,
Andre


Attachments:
dump.tar.xz (1.00 kB)

2013-07-19 17:55:37

by J. Bruce Fields

Subject: Re: Error writing to nfs4 with 3.11-rc1

On Fri, Jul 19, 2013 at 12:33:11PM -0400, Chuck Lever wrote:
>
> On Jul 19, 2013, at 12:13 PM, "J. Bruce Fields" <[email protected]> wrote:
>
> > On Wed, Jul 17, 2013 at 04:54:59PM -0400, Chuck Lever wrote:
> >> By the way, the NFSv4 OPEN request parsing in Wireshark 1.10.0 is totally screwed up. Has anyone reported this to the Wireshark community? Wireshark 1.8.8 appears to parse OPEN requests correctly, and is able to handle the 3-word bitmask correctly.
> >
> > What exactly is screwed up?
>
> For example, if you undisclose all of the parsed elements of an OPEN reply, the parsing of the XDR is wrong and the following replies in the compound become unparsable "data".
>
> I've reproduced this for every installation of 1.10.0, Linux and Mac OS, that I've done.
>
> I'm looking at a trace where the client sends an unchecked create OPEN via this compound:
>
> PUTFH, OPEN, GETFH, ACCESS, GETATTR
>
> And here is the reply, as parsed by Wireshark. The replies after the OPEN are rendered as 216 bytes of unparsed "data". The Delegation Type is 2950154659. fattr4_mode looks wrong as well, the requested mode was 644.
>
> Close examination of the actual bytes in this reply shows that the reply sent by the server was correct XDR and a plausible result, and the client continues normal and correct operation.

Hm, that might be:

https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=8920

?

--b.


2013-07-24 14:26:12

by J. Bruce Fields

Subject: Re: Error writing to nfs4 with 3.11-rc1

On Fri, Jul 19, 2013 at 01:55:35PM -0400, J. Bruce Fields wrote:
> On Fri, Jul 19, 2013 at 12:33:11PM -0400, Chuck Lever wrote:
> >
> > On Jul 19, 2013, at 12:13 PM, "J. Bruce Fields" <[email protected]> wrote:
> >
> > > On Wed, Jul 17, 2013 at 04:54:59PM -0400, Chuck Lever wrote:
> > >> By the way, the NFSv4 OPEN request parsing in Wireshark 1.10.0 is totally screwed up. Has anyone reported this to the Wireshark community? Wireshark 1.8.8 appears to parse OPEN requests correctly, and is able to handle the 3-word bitmask correctly.
> > >
> > > What exactly is screwed up?
> >
> > For example, if you undisclose all of the parsed elements of an OPEN reply, the parsing of the XDR is wrong and the following replies in the compound become unparsable "data".
> >
> > I've reproduced this for every installation of 1.10.0, Linux and Mac OS, that I've done.
> >
> > I'm looking at a trace where the client sends an unchecked create OPEN via this compound:
> >
> > PUTFH, OPEN, GETFH, ACCESS, GETATTR
> >
> > And here is the reply, as parsed by Wireshark. The replies after the OPEN are rendered as 216 bytes of unparsed "data". The Delegation Type is 2950154659. fattr4_mode looks wrong as well, the requested mode was 644.
> >
> > Close examination of the actual bytes in this reply shows that the reply sent by the server was correct XDR and a plausible result, and the client continues normal and correct operation.
>
> Hm, that might be:
>
> https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=8920
>
> ?

Whoops, should have looked more closely: wireshark was right, the
attribute length was wrong--so probably newer wireshark needs to be
fixed to at least warn.

--b.
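
To illustrate why a wrong attribute length matters, here is a minimal Python sketch; it is not Wireshark's dissector. It assumes the fattr4 layout from RFC 3530 (a bitmap4 followed by a length-prefixed opaque block of attribute values) and compares the declared length with the length the bitmap implies. The helper names, the two-entry size table, and the bad length of 16 are made up for illustration; the thread does not say what the actual wrong value was.

```python
import struct

# Encoded sizes in bytes of the two attributes used here (hypothetical table
# covering only this example): size (4) is a uint64, mode (33) is a uint32.
ATTR_SIZES = {4: 8, 33: 4}

def decode_fattr4(buf):
    """Decode bitmap4 + attrlist4 header; return (attrs, declared, implied)."""
    (nwords,) = struct.unpack_from(">I", buf, 0)
    words = struct.unpack_from(f">{nwords}I", buf, 4)
    offset = 4 + 4 * nwords
    # Length prefix of the opaque attribute-value block.
    (declared,) = struct.unpack_from(">I", buf, offset)
    attrs = [w * 32 + b
             for w, word in enumerate(words)
             for b in range(32) if (word >> b) & 1]
    implied = sum(ATTR_SIZES[a] for a in attrs)
    return attrs, declared, implied

def build_fattr4(size, mode, declared_len):
    """Build a 2-word fattr4 for SIZE and MODE with a caller-chosen length field."""
    bitmap = struct.pack(">3I", 2, 0x10, 0x02)
    values = struct.pack(">QI", size, mode)
    return bitmap + struct.pack(">I", declared_len) + values

good = decode_fattr4(build_fattr4(0, 0o644, 12))
bad = decode_fattr4(build_fattr4(0, 0o644, 16))  # 16 is an invented wrong length

for attrs, declared, implied in (good, bad):
    status = "ok" if declared == implied else "length mismatch"
    print(attrs, declared, implied, status)
```

A decoder that trusts the declared length and one that trusts the bitmap will disagree about where the next operation starts, which is why a warning on mismatch is useful.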


2013-07-17 19:10:48

by Chuck Lever

Subject: Re: Error writing to nfs4 with 3.11-rc1


On Jul 17, 2013, at 2:55 PM, Andre Heider <[email protected]> wrote:

> Hi,
>
> I'm having problems using 3.11-rc1 as nfs4 client (with a FreeBSD 9.1
> server) using sec=sys.
>
> With the same server+client setup, just booting different kernels:
> 3.9.10 works without issues
> 3.10.1 works too, but introduced "RPC: AUTH_GSS upcall timed out." in
> dmesg (iirc I don't need gss with sec=sys)

Not a requirement, but running gssd should make that message go away. The client is attempting to use krb5i to manage its lease on the server, and falling back to AUTH_UNIX when it sees gssd is not running.

> 3.11-rc1 reading from the server still works, writing fails
>
> Even a simple touch on the share fails with:
> touch: cannot touch ‘/mnt/andre/test’: Input/output error

A network capture is a reasonable place to start.

# tcpdump -s0 -w /tmp/raw

Then try your touch test again. Stop the tcpdump. You can post a compressed version of the raw dump here if it's short.

> mount output of the share in question:
> 192.168.0.1:/home/andre on /mnt/andre type nfs4
> (rw,nosuid,relatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.40,local_lock=none,addr=192.168.0.1)
>
> The 3.11 .config was cp'ed from 3.10.1 and updated via `make
> oldconfig`. Toggling the 3.11 nfs4.2 kernel options doesn't seem to
> make a difference.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2013-07-17 21:59:47

by Myklebust, Trond

Subject: Re: Error writing to nfs4 with 3.11-rc1

On Wed, 2013-07-17 at 21:35 +0200, Andre Heider wrote:
> On Wed, Jul 17, 2013 at 9:10 PM, Chuck Lever <[email protected]> wrote:
> >
> > On Jul 17, 2013, at 2:55 PM, Andre Heider <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I'm having problems using 3.11-rc1 as nfs4 client (with a FreeBSD 9.1
> >> server) using sec=sys.
> >>
> >> With the same server+client setup, just booting different kernels:
> >> 3.9.10 works without issues
> >> 3.10.1 works too, but introduced "RPC: AUTH_GSS upcall timed out." in
> >> dmesg (iirc I don't need gss with sec=sys)
> >
> > Not a requirement, but running gssd should make that message go away. The client is attempting to use krb5i to manage its lease on the server, and falling back to AUTH_UNIX when it sees gssd is not running.
> >
> >> 3.11-rc1 reading from the server still works, writing fails
> >>
> >> Even a simple touch on the share fails with:
> >> touch: cannot touch ‘/mnt/andre/test’: Input/output error
> >
> > A network capture is a reasonable place to start.
> >
> > # tcpdump -s0 -w /tmp/raw
> >
> > Then try your touch test again. Stop the tcpdump. You can post a compressed version of the raw dump here if it's short.
>
> Attached two dumps, one from 3.10 (works) and one from 3.11 (doesn't work).

The FreeBSD server is returning NFS4ERR_ATTRNOTSUPP to the OPEN, despite
the fact that we're requesting the same attributes in both cases. I'll
bet it is the fact that we send a bitmap of 3 words instead of 2.

This is a problem: the Linux client clearly has the spec on its side,
and the FreeBSD server is wrong to reject a 3 word bitmap, as long as
we're not requesting any actual attributes that it doesn't support.
According to the spec, we could send a bitmap of any length we like.

On the other hand, we have a situation where something used to work, and
now doesn't. Please check out the 2 patches I just sent out, and see if
they help.


--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2013-07-19 18:03:59

by Chuck Lever

Subject: Re: Error writing to nfs4 with 3.11-rc1


On Jul 19, 2013, at 1:55 PM, "J. Bruce Fields" <[email protected]> wrote:

> On Fri, Jul 19, 2013 at 12:33:11PM -0400, Chuck Lever wrote:
>>
>> On Jul 19, 2013, at 12:13 PM, "J. Bruce Fields" <[email protected]> wrote:
>>
>>> On Wed, Jul 17, 2013 at 04:54:59PM -0400, Chuck Lever wrote:
>>>> By the way, the NFSv4 OPEN request parsing in Wireshark 1.10.0 is totally screwed up. Has anyone reported this to the Wireshark community? Wireshark 1.8.8 appears to parse OPEN requests correctly, and is able to handle the 3-word bitmask correctly.
>>>
>>> What exactly is screwed up?
>>
>> For example, if you undisclose all of the parsed elements of an OPEN reply, the parsing of the XDR is wrong and the following replies in the compound become unparsable "data".
>>
>> I've reproduced this for every installation of 1.10.0, Linux and Mac OS, that I've done.
>>
>> I'm looking at a trace where the client sends an unchecked create OPEN via this compound:
>>
>> PUTFH, OPEN, GETFH, ACCESS, GETATTR
>>
>> And here is the reply, as parsed by Wireshark. The replies after the OPEN are rendered as 216 bytes of unparsed "data". The Delegation Type is 2950154659. fattr4_mode looks wrong as well, the requested mode was 644.
>>
>> Close examination of the actual bytes in this reply shows that the reply sent by the server was correct XDR and a plausible result, and the client continues normal and correct operation.
>
> Hm, that might be:
>
> https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=8920
>
> ?

Wow, that's darn recent. Could be.


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2013-07-17 20:55:07

by Chuck Lever

Subject: Re: Error writing to nfs4 with 3.11-rc1


On Jul 17, 2013, at 3:35 PM, Andre Heider <[email protected]> wrote:

> On Wed, Jul 17, 2013 at 9:10 PM, Chuck Lever <[email protected]> wrote:
>>
>> On Jul 17, 2013, at 2:55 PM, Andre Heider <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm having problems using 3.11-rc1 as nfs4 client (with a FreeBSD 9.1
>>> server) using sec=sys.
>>>
>>> With the same server+client setup, just booting different kernels:
>>> 3.9.10 works without issues
>>> 3.10.1 works too, but introduced "RPC: AUTH_GSS upcall timed out." in
>>> dmesg (iirc I don't need gss with sec=sys)
>>
>> Not a requirement, but running gssd should make that message go away. The client is attempting to use krb5i to manage its lease on the server, and falling back to AUTH_UNIX when it sees gssd is not running.
>>
>>> 3.11-rc1 reading from the server still works, writing fails
>>>
>>> Even a simple touch on the share fails with:
>>> touch: cannot touch ‘/mnt/andre/test’: Input/output error
>>
>> A network capture is a reasonable place to start.
>>
>> # tcpdump -s0 -w /tmp/raw
>>
>> Then try your touch test again. Stop the tcpdump. You can post a compressed version of the raw dump here if it's short.
>
> Attached two dumps, one from 3.10 (works) and one from 3.11 (doesn't work).

Commit a09df2ca "NFSv4: Extend fattr bitmaps to support all 3 words", introduced in 3.11-rc1, changes the Linux OPEN operation to send a 3-word bitmask for the fattr4 data type. This was done to support NFSv4 minor version 2, which adds bits in the 3rd word.

Apparently FreeBSD servers support only 2-word fattr4 bitmasks...?

RFC 3530 defines the bitmap4 type as a variable-length array of uint32_t. The minor versioning rules in 3530bis Chapter 11 say:

    Minor versions may append attributes to the bitmap4 that
    represents sets of attributes and to the fattr4 that
    represents sets of attribute values.

I don't see any other limit placed on the size of the fattr4's bitmask array in RFC 3530, other than the number of bits that are defined for fattr4. But I didn't look carefully.
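
For illustration, here is a minimal Python sketch of how a bitmap4 is laid out in XDR: a uint32 word count followed by that many big-endian uint32 words. The attribute numbers SIZE (4) and MODE (33) come from the NFSv4 attribute table; the function names are hypothetical and this is not the kernel's encoder. It shows that a 2-word and a 3-word bitmap can describe the same attribute set.

```python
import struct

# Attribute numbers from the NFSv4 attribute table.
FATTR4_SIZE = 4
FATTR4_MODE = 33

def attrs_to_words(attrs, nwords):
    """Return nwords 32-bit words with one bit set per attribute number."""
    words = [0] * nwords
    for attr in attrs:
        word, bit = divmod(attr, 32)
        if word >= nwords:
            raise ValueError(f"attribute {attr} does not fit in {nwords} words")
        words[word] |= 1 << bit
    return words

def xdr_encode_bitmap4(words):
    """XDR variable-length array: uint32 count, then big-endian uint32 words."""
    return struct.pack(f">I{len(words)}I", len(words), *words)

def xdr_decode_bitmap4(buf):
    """Return the set of attribute numbers encoded in buf."""
    (count,) = struct.unpack_from(">I", buf, 0)
    words = struct.unpack_from(f">{count}I", buf, 4)
    return {w * 32 + b
            for w, word in enumerate(words)
            for b in range(32) if (word >> b) & 1}

wanted = [FATTR4_SIZE, FATTR4_MODE]
two = xdr_encode_bitmap4(attrs_to_words(wanted, 2))
three = xdr_encode_bitmap4(attrs_to_words(wanted, 3))

print(two.hex())    # 000000020000001000000002
print(three.hex())  # 00000003000000100000000200000000
print(xdr_decode_bitmap4(two) == xdr_decode_bitmap4(three))  # True
```

The only difference on the wire is the count word and one trailing all-zero word, so a receiver that decodes by count sees identical attribute sets.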

By the way, the NFSv4 OPEN request parsing in Wireshark 1.10.0 is totally screwed up. Has anyone reported this to the Wireshark community? Wireshark 1.8.8 appears to parse OPEN requests correctly, and is able to handle the 3-word bitmask correctly.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2013-07-19 16:33:25

by Chuck Lever

Subject: Re: Error writing to nfs4 with 3.11-rc1


On Jul 19, 2013, at 12:13 PM, "J. Bruce Fields" <[email protected]> wrote:

> On Wed, Jul 17, 2013 at 04:54:59PM -0400, Chuck Lever wrote:
>> By the way, the NFSv4 OPEN request parsing in Wireshark 1.10.0 is totally screwed up. Has anyone reported this to the Wireshark community? Wireshark 1.8.8 appears to parse OPEN requests correctly, and is able to handle the 3-word bitmask correctly.
>
> What exactly is screwed up?

For example, if you expand all of the parsed elements of an OPEN reply, the parsing of the XDR is wrong and the following replies in the compound become unparsable "data".

I've reproduced this for every installation of 1.10.0, Linux and Mac OS, that I've done.

I'm looking at a trace where the client sends an unchecked create OPEN via this compound:

PUTFH, OPEN, GETFH, ACCESS, GETATTR

And here is the reply, as parsed by Wireshark. The replies after the OPEN are rendered as 216 bytes of unparsed "data". The Delegation Type is 2950154659. fattr4_mode looks wrong as well; the requested mode was 644.

Close examination of the actual bytes in this reply shows that the reply sent by the server was correct XDR and a plausible result, and the client continues normal and correct operation.

Network File System, Ops(5): PUTFH OPEN
[Program Version: 4]
[V4 Procedure: COMPOUND (1)]
Status: NFS4_OK (0)
Tag: <EMPTY>
length: 0
contents: <EMPTY>
Operations (count: 5)
Opcode: PUTFH (22)
Status: NFS4_OK (0)
Opcode: OPEN (18)
Status: NFS4_OK (0)
stateid
[StateID Hash: 0xaf36]
seqid: 0x00000001
Data: 01db0fb0ac64000000000000
change_info
Atomic: Yes
changeid (before): 5901652787232948527
changeid (after): 5901653255478300828
results_flags: Unknown (0x00000006)
.... .... .... .... .... .... .... ...0 = mlock: Unknown (0)
.... .... .... .... .... .... .... ..1. = confirm: OPEN4_RESULT_MLOCK (1)
Attr mask[0]: 0x00000010 (SIZE)
reqd_attr: SIZE (4)
size: 42949672960
Attr mask[1]: 0x00000002 (MODE)
reco_attr: MODE (33)
fattr4_mode: 044
.... .... .... .... 000. .... .... .... = Name: Unknown (0)
.... .... .... .... .... 0... .... .... = Set user id on exec: No
.... .... .... .... .... .0.. .... .... = Set group id on exec: No
.... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
.... .... .... .... .... ...0 .... .... = Read permission for owner: No
.... .... .... .... .... .... 0... .... = Write permission for owner: No
.... .... .... .... .... .... .0.. .... = Execute permission for owner: No
.... .... .... .... .... .... ..1. .... = Read permission for group: Yes
.... .... .... .... .... .... ...0 .... = Write permission for group: No
.... .... .... .... .... .... .... 0... = Execute permission for group: No
.... .... .... .... .... .... .... .1.. = Read permission for others: Yes
.... .... .... .... .... .... .... ..0. = Write permission for others: No
.... .... .... .... .... .... .... ...0 = Execute permission for others: No
Delegation Type: Unknown (2950154659)
[Main Opcode: OPEN (18)]
Data (216 bytes)
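
As a quick check on the mode bits, this Python sketch (helper names are hypothetical) lists which permission bits differ between the requested mode 644 and the 044 shown in the dump. It only does bit arithmetic on the two octal values quoted here and says nothing about where the discrepancy comes from.

```python
# Standard POSIX permission bits, highest first.
PERMISSION_BITS = [
    (0o400, "owner read"), (0o200, "owner write"), (0o100, "owner execute"),
    (0o040, "group read"), (0o020, "group write"), (0o010, "group execute"),
    (0o004, "others read"), (0o002, "others write"), (0o001, "others execute"),
]

def describe_mode(mode):
    """Return the names of the permission bits set in mode."""
    return [name for bit, name in PERMISSION_BITS if mode & bit]

requested = 0o644  # mode requested in the OPEN
displayed = 0o044  # fattr4_mode as rendered in the dump

shown = describe_mode(displayed)
missing = describe_mode(requested & ~displayed)

print(shown)    # ['group read', 'others read']
print(missing)  # ['owner read', 'owner write']
```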

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2013-07-19 16:13:15

by J. Bruce Fields

Subject: Re: Error writing to nfs4 with 3.11-rc1

On Wed, Jul 17, 2013 at 04:54:59PM -0400, Chuck Lever wrote:
> By the way, the NFSv4 OPEN request parsing in Wireshark 1.10.0 is totally screwed up. Has anyone reported this to the Wireshark community? Wireshark 1.8.8 appears to parse OPEN requests correctly, and is able to handle the 3-word bitmask correctly.

What exactly is screwed up?

For what it's worth, I took a quick look at the 3.11 trace with the
latest head from http://code.wireshark.org/git/wireshark and nothing
jumped out at me.

--b.

2013-07-18 14:57:44

by Andre Heider

Subject: Re: Error writing to nfs4 with 3.11-rc1

On Wed, Jul 17, 2013 at 11:59 PM, Myklebust, Trond
<[email protected]> wrote:
> On Wed, 2013-07-17 at 21:35 +0200, Andre Heider wrote:
>> On Wed, Jul 17, 2013 at 9:10 PM, Chuck Lever <[email protected]> wrote:
>> >
>> > On Jul 17, 2013, at 2:55 PM, Andre Heider <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm having problems using 3.11-rc1 as nfs4 client (with a FreeBSD 9.1
>> >> server) using sec=sys.
>> >>
>> >> With the same server+client setup, just booting different kernels:
>> >> 3.9.10 works without issues
>> >> 3.10.1 works too, but introduced "RPC: AUTH_GSS upcall timed out." in
>> >> dmesg (iirc I don't need gss with sec=sys)
>> >
>> > Not a requirement, but running gssd should make that message go away. The client is attempting to use krb5i to manage its lease on the server, and falling back to AUTH_UNIX when it sees gssd is not running.
>> >
>> >> 3.11-rc1 reading from the server still works, writing fails
>> >>
>> >> Even a simple touch on the share fails with:
>> >> touch: cannot touch ‘/mnt/andre/test’: Input/output error
>> >
>> > A network capture is a reasonable place to start.
>> >
>> > # tcpdump -s0 -w /tmp/raw
>> >
>> > Then try your touch test again. Stop the tcpdump. You can post a compressed version of the raw dump here if it's short.
>>
>> Attached two dumps, one from 3.10 (works) and one from 3.11 (doesn't work).
>
> The FreeBSD server is returning NFS4ERR_ATTRNOTSUPP to the OPEN, despite
> the fact that we're requesting the same attributes in both cases. I'll
> bet it is the fact that we send a bitmap of 3 words instead of 2.
>
> This is a problem: the Linux client clearly has the spec on its side,
> and the FreeBSD server is wrong to reject a 3 word bitmap, as long as
> we're not requesting any actual attributes that it doesn't support.
> According to the spec, we could send a bitmap of any length we like.
>
> On the other hand, we have a situation where something used to work, and
> now doesn't. Please check out the 2 patches I just sent out, and see if
> they help.

With these 2 patches, nfsv4 connectivity to a FreeBSD server is working again.

Thanks for the quick patches,
Andre
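
The two positions discussed in this thread can be sketched in Python as follows. This assumes, as suspected above but not confirmed here, that the server rejects on bitmap length alone; `check_strict`, `check_lenient` and `trim_trailing_zero_words` are hypothetical names, and the trimming helper is only one conceivable client-side workaround, not a description of what the 2 patches actually do.

```python
NFS4_OK = 0
NFS4ERR_ATTRNOTSUPP = 10032  # value from the NFSv4 error table

# Hypothetical server that implements only attributes in the first two words.
SUPPORTED_WORDS = 2

def check_strict(words):
    """Reject any bitmap longer than the server's own, even if extra words are zero."""
    if len(words) > SUPPORTED_WORDS:
        return NFS4ERR_ATTRNOTSUPP
    return NFS4_OK

def check_lenient(words):
    """Reject only when a bit is actually set beyond the supported words."""
    if any(words[SUPPORTED_WORDS:]):
        return NFS4ERR_ATTRNOTSUPP
    return NFS4_OK

def trim_trailing_zero_words(words):
    """Client-side sketch: drop empty trailing words before sending."""
    words = list(words)
    while words and words[-1] == 0:
        words.pop()
    return words

# SIZE and MODE requested, padded to three words with an empty third word.
request = [0x00000010, 0x00000002, 0x00000000]

print(check_strict(request))                            # 10032
print(check_lenient(request))                           # 0
print(check_strict(trim_trailing_zero_words(request)))  # 0
```

Under the lenient policy the padded request succeeds unchanged; under the strict policy it only succeeds once the sender stops emitting the empty third word.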