2016-09-15 22:32:19

by Ben Greear

Subject: nfs broken on Fedora-24, 32-bit?

I have a Fedora-24 machine mounting an NFS server running Fedora-13 (kernel 2.6.34.9-69.fc13.x86_64).

F24 machine has this in /etc/fstab:

192.168.100.3:/mnt/d2 /mnt/d2 nfs nfsvers=3 0 0

When I copy a file from f24-32 to the F-13 machine, the file size is the same,
but the file is corrupted on the file server. I see a different md5sum each time.

Various other systems (F21, F19, etc) can all copy to the F13 machine fine.

And, F24-64 machine can copy to the F13 machine fine.

Anyone seen something similar?

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com



2016-09-15 23:00:53

by Trond Myklebust

Subject: Re: nfs broken on Fedora-24, 32-bit?

Hi Ben,

> On Sep 15, 2016, at 18:32, Ben Greear <[email protected]> wrote:
>
> I have a Fedora-24 machine mounting an NFS server running Fedora-13 (kernel 2.6.34.9-69.fc13.x86_64).
>
> F24 machine has this in /etc/fstab:
>
> 192.168.100.3:/mnt/d2 /mnt/d2 nfs nfsvers=3 0 0
>
> When I copy a file from f24-32 to the F-13 machine, the file size is the same,
> but the file is corrupted on the file server. I see a different md5sum each time.
>
> Various other systems (F21, F19, etc) can all copy to the F13 machine fine.
>
> And, F24-64 machine can copy to the F13 machine fine.
>
> Anyone seen something similar?

Do you know if the corruption is happening on the read()s or on the write()s? Do you, for instance get the same corruption if you copy from a local file on the F-24 client to the server? ..or if you copy from a file on the server to a local directory on the F-24 client?

Cheers
Trond


2016-09-15 23:06:54

by Ben Greear

Subject: Re: nfs broken on Fedora-24, 32-bit?

On 09/15/2016 04:00 PM, Trond Myklebust wrote:
> Hi Ben,
>
>> On Sep 15, 2016, at 18:32, Ben Greear <[email protected]> wrote:
>>
>> I have a Fedora-24 machine mounting an NFS server running Fedora-13 (kernel 2.6.34.9-69.fc13.x86_64).
>>
>> F24 machine has this in /etc/fstab:
>>
>> 192.168.100.3:/mnt/d2 /mnt/d2 nfs nfsvers=3 0 0
>>
>> When I copy a file from f24-32 to the F-13 machine, the file size is the same,
>> but the file is corrupted on the file server. I see a different md5sum each time.
>>
>> Various other systems (F21, F19, etc) can all copy to the F13 machine fine.
>>
>> And, F24-64 machine can copy to the F13 machine fine.
>>
>> Anyone seen something similar?
>
> Do you know if the corruption is happening on the read()s or on the write()s? Do you, for instance get the same corruption if you copy from a local file on the F-24 client to the server? ..or if you copy from a file on the server to a local directory on the F-24 client?
>
> Cheers
> Trond
>

Seems to be a write issue:

# This is the nfs server:

[greearb@fs3 candela_cdrom.5.3.5]$ md5sum gua-f21-32
ad4073fa8b806bb82b85a645e21f5e67 gua-f21-32
[greearb@fs3 candela_cdrom.5.3.5]$ md5sum ../greearb/tmp/gua-f21-32
582bfea0cc8cc52aa38dc0f5048d0156 ../greearb/tmp/gua-f21-32
[greearb@fs3 candela_cdrom.5.3.5]$


# This is the v-f24-32 client:

[greearb@v-f24-32 ~]$ cp /mnt/d2/pub/candela_cdrom.5.3.5/gua-f21-32 ./
[greearb@v-f24-32 ~]$ md5sum gua-f21-32
ad4073fa8b806bb82b85a645e21f5e67 gua-f21-32
[greearb@v-f24-32 ~]$ cp gua-f21-32 /mnt/d2/pub/greearb/tmp/
[greearb@v-f24-32 ~]$ md5sum /mnt/d2/pub/greearb/tmp/gua-f21-32
ad4073fa8b806bb82b85a645e21f5e67 /mnt/d2/pub/greearb/tmp/gua-f21-32


Interesting that the client reads back the file it copied over as if it were correct, but
it shows up wrong on the nfs server. Maybe it is just reading a local cache?
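
To see where the server-side copy diverges without involving the client's cache, one option is to compare the good and bad files block by block directly on the server. The following is only a minimal sketch: the names `block_digests`, `differing_blocks` and `BLOCK_SIZE` are made up for illustration, and the 4096-byte block size is an assumption rather than anything derived from the mount's actual wsize.

```python
import hashlib
import sys

BLOCK_SIZE = 4096  # assumed comparison granularity, not the NFS wsize


def block_digests(path, block_size=BLOCK_SIZE):
    """Return one MD5 hex digest per block_size chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).hexdigest())
    return digests


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return indices of blocks that differ or exist in only one file."""
    a = block_digests(path_a, block_size)
    b = block_digests(path_b, block_size)
    longest = max(len(a), len(b))
    return [i for i in range(longest)
            if i >= len(a) or i >= len(b) or a[i] != b[i]]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python blockdiff.py <known-good file> <suspect file>
    for i in differing_blocks(sys.argv[1], sys.argv[2]):
        print("block %d (offset %d) differs" % (i, i * BLOCK_SIZE))
```

Run on the server against `gua-f21-32` and `../greearb/tmp/gua-f21-32`, this would list the offsets of the damaged blocks, which may hint at whether the damage lines up with write or packet boundaries.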

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2016-09-16 23:31:55

by Ben Greear

Subject: Re: nfs broken on Fedora-24, 32-bit?

On 09/15/2016 04:06 PM, Ben Greear wrote:
> On 09/15/2016 04:00 PM, Trond Myklebust wrote:
>> Hi Ben,
>>
>>> On Sep 15, 2016, at 18:32, Ben Greear <[email protected]> wrote:
>>>
>>> I have a Fedora-24 machine mounting an NFS server running Fedora-13 (kernel 2.6.34.9-69.fc13.x86_64).
>>>
>>> F24 machine has this in /etc/fstab:
>>>
>>> 192.168.100.3:/mnt/d2 /mnt/d2 nfs nfsvers=3 0 0
>>>
>>> When I copy a file from f24-32 to the F-13 machine, the file size is the same,
>>> but the file is corrupted on the file server. I see a different md5sum each time.
>>>
>>> Various other systems (F21, F19, etc) can all copy to the F13 machine fine.
>>>
>>> And, F24-64 machine can copy to the F13 machine fine.
>>>
>>> Anyone seen something similar?
>>
>> Do you know if the corruption is happening on the read()s or on the write()s? Do you, for instance get the same corruption if you copy from a local file on
>> the F-24 client to the server? ..or if you copy from a file on the server to a local directory on the F-24 client?
>>
>> Cheers
>> Trond
>>
>
> Seems to be a write issue:
>
> # This is the nfs server:
>
> [greearb@fs3 candela_cdrom.5.3.5]$ md5sum gua-f21-32
> ad4073fa8b806bb82b85a645e21f5e67 gua-f21-32
> [greearb@fs3 candela_cdrom.5.3.5]$ md5sum ../greearb/tmp/gua-f21-32
> 582bfea0cc8cc52aa38dc0f5048d0156 ../greearb/tmp/gua-f21-32
> [greearb@fs3 candela_cdrom.5.3.5]$
>
>
> # This is the v-f24-32 client:
>
> [greearb@v-f24-32 ~]$ cp /mnt/d2/pub/candela_cdrom.5.3.5/gua-f21-32 ./
> [greearb@v-f24-32 ~]$ md5sum gua-f21-32
> ad4073fa8b806bb82b85a645e21f5e67 gua-f21-32
> [greearb@v-f24-32 ~]$ cp gua-f21-32 /mnt/d2/pub/greearb/tmp/
> [greearb@v-f24-32 ~]$ md5sum /mnt/d2/pub/greearb/tmp/gua-f21-32
> ad4073fa8b806bb82b85a645e21f5e67 /mnt/d2/pub/greearb/tmp/gua-f21-32
>
>
> Interesting that the client reads back the file it copied over as if it were correct, but
> it shows up wrong on the nfs server. Maybe it is just reading a local cache?
>
> Thanks,
> Ben
>

Here is some more info on this:

We can only reproduce this on virtual machines using the KVM infrastructure, and only
when we use the rtl8139 virtual hardware (in bridge mode). With the e1000 virtual hardware
we cannot reproduce the problem.

Also, multiple different nfs servers (including much newer kernels) all show the same
behaviour with this broken nfs client.
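
For anyone trying to reproduce this, it may help to confirm which kernel driver the guest actually bound to each interface. A rough sketch, assuming a Linux guest with sysfs mounted at the usual place; `nic_drivers` is a hypothetical helper name, and the driver name printed is whatever the guest kernel reports (for the emulated rtl8139 that is likely 8139cp or 8139too, but that is an assumption, not something verified here).

```python
import os


def nic_drivers(sysfs_root="/sys/class/net"):
    """Map each network interface to its bound driver name, or None."""
    drivers = {}
    if not os.path.isdir(sysfs_root):
        return drivers  # no sysfs here (not Linux, or not mounted)
    for iface in sorted(os.listdir(sysfs_root)):
        # On Linux, <iface>/device/driver is a symlink to the driver directory.
        link = os.path.join(sysfs_root, iface, "device", "driver")
        if os.path.islink(link):
            drivers[iface] = os.path.basename(os.path.realpath(link))
        else:
            drivers[iface] = None  # virtual interfaces such as lo
    return drivers


if __name__ == "__main__":
    for iface, driver in nic_drivers().items():
        print("%s: %s" % (iface, driver))
```

Comparing the output between the rtl8139 and e1000 guest configurations would at least confirm that the only difference on the client side is the NIC driver.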

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com