2003-06-03 11:00:05

by Michael Frank

Subject: NFS io errors on transfer from system running 2.4 to system running 2.5

Speaking of weird errors:

For the last few months I have been running into this:

When doing rsync or cp _from_ a system running 2.4 _to_ a system running 2.5,
I get Input/output errors on random files.

- 2.5 > 2.4 is OK!
- After rebooting and swapping kernels, the error occurs on the system now
running 2.4
- Fast machine > slow machine or slow machine > fast machine
is no different
- Both systems run the same distribution
- Encountered since 2.4.20 with about 2.5.64 (my first 2.5 kernel)

Example:

/temp contains a couple of crap files

System mhfl2 is running 2.5.6x to 2.5.70-mm3 and is mounted on
/mnt/mhfl2.

On system running 2.4.20 or 2.4.21-x:
while ((1)); do cp -f /temp/* /mnt/mhfl2/temp; done

cp: cannot create regular file `/mnt/mhfl2/temp/blah': Input/output error
cp: writing `/mnt/mhfl2/temp/blah': Input/output error

Errors are random, so the affected files change every run; sometimes there are
no errors, sometimes there are 3 errors.
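
A minimal sketch of the loop above that also counts failures per run, which may make the random errors easier to compare between kernels. The helper name count_cp_errors and the pass count are assumptions for illustration; the paths are the ones from the example.

```shell
#!/bin/sh
# Copy every regular file from SRC to DST for a fixed number of passes
# and print how many copies failed in total.
# count_cp_errors is a hypothetical helper name, not an existing tool.
count_cp_errors() {
    src=$1; dst=$2; passes=$3
    errors=0
    pass=0
    while [ "$pass" -lt "$passes" ]; do
        for f in "$src"/*; do
            [ -f "$f" ] || continue
            # cp exits non-zero on any failure, including Input/output error
            if ! cp -f "$f" "$dst"/ 2>/dev/null; then
                errors=$((errors + 1))
                echo "pass $pass: failed $f" >&2
            fi
        done
        pass=$((pass + 1))
    done
    echo "$errors"
}
```

For example, `count_cp_errors /temp /mnt/mhfl2/temp 100` would run 100 passes and print the total number of failed copies, with the failing file names on stderr.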

Q? Any (in)compatibility reason or should I investigate further?

Regards
Michael Frank


2003-06-03 12:30:32

by Michael Frank

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

On Tuesday 03 June 2003 20:24, Jakob Oestergaard wrote:
> > When doing rsync or cp _from_ system running 2.4 _to_ system running 2.5
> > get Input/output error errors with random files.
>
> Do you use soft mounts?

Yes

>
> If so, try hard instead. soft will fail, sooner or later.

I don't like hard mounts because they do not time out.

Also, this does not explain why 2.5 > 2.4 (and 2.4 > 2.4) is OK - _never_ had any problem

Regards
Michael

2003-06-03 12:39:28

by Jakob Oestergaard

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

On Tue, Jun 03, 2003 at 08:43:28PM +0800, Michael Frank wrote:
> On Tuesday 03 June 2003 20:24, Jakob Oestergaard wrote:
> > > When doing rsync or cp _from_ system running 2.4 _to_ system running 2.5
> > > get Input/output error errors with random files.
> >
> > Do you use soft mounts?
>
> Yes

Then this is why you get the error

>
> >
> > If so, try hard instead. soft will fail, sooner or later.
>
> I don't like hard mounts because these do not timeout.

You get what you ask for, then: timeouts

>
> Also, this does not explain why 2.5 > 2.4 (and 2.4 > 2.4) is OK - _never_ had any problem

Leave it running for a million years, and I'm sure a sporadic error will
show up in those two situations as well.

You just now found a case where sporadic errors show up more often.

soft-mount = fail upon (sporadic) error
hard-mount = retry (forever or until interrupted if used with 'intr') upon error

I always use hard,intr so that I can manually interrupt hanging jobs,
but also know that they do not randomly fail just because a few packets
get dropped on my network. This seems to be the common setup, as far as
I know.
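
As a sketch of the two modes, here is a small helper that only prints the corresponding mount command so the options can be compared side by side. The helper name nfs_mount_cmd and the export path mhfl2:/export are placeholders, not taken from any real setup; hard, soft and intr are the NFS mount options named above.

```shell
#!/bin/sh
# Print (do not run) a mount command for the given mode.
# nfs_mount_cmd is a hypothetical helper; the export path is a placeholder.
nfs_mount_cmd() {
    mode=$1; export_spec=$2; mnt=$3
    case $mode in
        hard) opts="hard,intr" ;;  # retry until the server answers; signals may interrupt
        soft) opts="soft" ;;       # report an I/O error after a major timeout
        *)    echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "mount -t nfs -o $opts $export_spec $mnt"
}

nfs_mount_cmd hard mhfl2:/export /mnt/mhfl2
```

The printed line can be reviewed and then run as root once the export path matches the actual server.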

Cheers,

--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:

2003-06-03 12:48:24

by Michael Frank

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

On Tuesday 03 June 2003 20:52, Jakob Oestergaard wrote:
>
> I always use hard,intr so that I can manually interrupt hanging jobs,
> but also know that they do not randomly fail just because a few packets
> get dropped on my network. This seems to be the common setup, as far as
> I know.
>

Thank you,

I will try hard, intr

Regards
Michael

2003-06-03 13:12:46

by Jakob Oestergaard

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

On Tue, Jun 03, 2003 at 09:01:27PM +0800, Michael Frank wrote:
> On Tuesday 03 June 2003 20:52, Jakob Oestergaard wrote:
> >
> > I always use hard,intr so that I can manually interrupt hanging jobs,
> > but also know that they do not randomly fail just because a few packets
> > get dropped on my network. This seems to be the common setup, as far as
> > I know.
> >
>
> Thank you,
>
> I will try hard, intr

no prob.

Please let the list know if it solves your problem or not - I'm sure
there are people who want to know if it doesn't, and if it does then the
solution will be in the archives for the next to find.

After all, I could be mistaken... naaahh... ;)

Cheers,

--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:

2003-06-03 14:29:43

by Andy

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

On Tue, Jun 03, 2003 at 07:12:51PM +0800, Michael Frank wrote:
> Speaking of weird errors:
>
> For the last few months I encounter this:
>
> When doing rsync or cp _from_ system running 2.4 _to_ system running 2.5
> get Input/output error errors with random files.
>
> - Encountered since 2.4.20 with about 2.5.64 (my first 2.5 kernel)
>
I am having a similar problem writing to an NFS-mounted non-Linux system on
kernels past 2.4.20-pre3. I get an input/output error while writing. I
sent email to Trond Myklebust (who made the changes between pre3 and
pre4), and he said to switch to using the TCP protocol for mounts. That
worked, but I should not have to do that because

1. It worked up to 2.4.20-pre3 without a problem
2. Other OSes such as FreeBSD do not have issues writing to other OSes using
UDP soft mounts.

To me, there is something wrong with the changes that went in in 2.4.20-pre4;
it should work as it does in pre3 and/or other Unix OSes such as FreeBSD.
We should not have to work around the problem with hard mounts or using TCP
instead of UDP.
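
To see which mounts either workaround would touch, one could list the NFS mounts that currently use soft or UDP. This is only a sketch: flag_nfs_mounts is a hypothetical helper, and it assumes the kernel lists NFS options in /proc/mounts, where the exact spelling of the transport option may vary by kernel version.

```shell
#!/bin/sh
# List NFS mount points whose options include "soft" or UDP transport.
# flag_nfs_mounts is a hypothetical helper; it reads a file in
# /proc/mounts format: device mountpoint fstype options dump pass.
flag_nfs_mounts() {
    awk '$3 == "nfs" {
        n = split($4, o, ",")
        soft = 0; udp = 0
        for (i = 1; i <= n; i++) {
            if (o[i] == "soft") soft = 1
            # Assumption: UDP shows up as either "udp" or "proto=udp"
            if (o[i] == "udp" || o[i] == "proto=udp") udp = 1
        }
        if (soft || udp)
            print $2 (soft ? " soft" : "") (udp ? " udp" : "")
    }' "${1:-/proc/mounts}"
}
```

Calling `flag_nfs_mounts` with no argument reads /proc/mounts; a file name can be passed to inspect a saved copy.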

Andy

2003-06-03 14:56:28

by Michael Frank

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

On Tuesday 03 June 2003 22:42, Andrew Ryan wrote:
> On Tue, Jun 03, 2003 at 07:12:51PM +0800, Michael Frank wrote:
> > Speaking of weird errors:
> >
> > For the last few months I encounter this:
> >
> > When doing rsync or cp _from_ system running 2.4 _to_ system running 2.5
> > get Input/output error errors with random files.
> >
> > - Encountered since 2.4.20 with about 2.5.64 (my first 2.5 kernel)
>
> I am having a similar problem writing to NFS mounted non-linux system on
> kernels past 2.4.20-pre3. I get an input/output error while writing. I
> have sent email to Trond Myklebust (who made the changes between pre3 and
> pre4). And he said to switch to using the TCP protocol for mounts. That
> worked, but I should not have to do that because
>
> 1. It worked to 2.4.20pre3 without a problem
> 2. Other OSes such as FreeBSD do not have issues writing to other OSes
> using UDP soft mounts.
>
> To me, there is something wrong with the changes that went in in
> 2.4.20pre4, it should work as it does in pre3 and/or other unix OSes such
> as FreeBSD. We should not have to work around the problem with hard mounts
> or using TCP instead of UDP.
>

Error frequency does not change between copying from a [email protected] > [email protected]
and [email protected] > [email protected]. IIRC, I had errors from a [email protected] to a [email protected]
too.

Even if I run them both against each other, the error only happens on 2.4 and
the frequency does not increase. This is not a simple timeout problem.

I'll think it through and build some scripts so everyone can reproduce it
and test it out.


Regards
Michael


2003-06-03 15:56:48

by Trond Myklebust

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

>>>>> " " == Andrew Ryan <[email protected]> writes:

> To me, there is something wrong with the changes that went in
> in 2.4.20pre4, it should work as it does in pre3 and/or other
> unix OSes such as FreeBSD. We should not have to work around
> the problem with hard mounts or using TCP instead of UDP.

Tough. 'soft' is not a priority of mine. It is a broken feature...

Cheers,
Trond

2003-06-03 16:22:11

by Michael Frank

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

On Wednesday 04 June 2003 00:10, Trond Myklebust wrote:
>
> Tough. 'soft' is not a priority of mine. It is a broken feature...

Well, a "hard" fact of life is that it can't be "soft"; at least we know where we stand...

Regards
Michael

2003-06-04 20:44:49

by Andy

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

On Tue, Jun 03, 2003 at 06:10:01PM +0200, Trond Myklebust wrote:
>
> Tough. 'soft' is not a priority of mine. It is a broken feature...
>
No, it is a broken feature in *LINUX* post 2.4.20pre3, it's not broken in
FreeBSD or Tru64. Regardless of what Trond says about soft mounts they
should work in Linux just as well as they do in other OSes, such as FreeBSD.

I've tried to debug and I have seen no timeouts. I believe something is up
with the congestion routines that were added.
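
One way to check for timeouts from the client side is to watch the RPC retransmission counter before and after a failing copy. A rough sketch follows, assuming the client statistics are exposed in /proc/net/rpc/nfs with an "rpc <calls> <retrans> <authrefrsh>" line (the same counters nfsstat reports); rpc_retrans is a hypothetical helper name.

```shell
#!/bin/sh
# Print the client-side RPC call and retransmission counters.
# rpc_retrans is a hypothetical helper; it assumes a line of the form
#   rpc <calls> <retrans> <authrefrsh>
# in the statistics file (default: /proc/net/rpc/nfs).
rpc_retrans() {
    awk '$1 == "rpc" { print "calls=" $2 " retrans=" $3 }' "${1:-/proc/net/rpc/nfs}"
}
```

If retrans stays flat across a run that still produces Input/output errors, that would be consistent with the errors not coming from ordinary retransmission timeouts.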

Yes, hard mounts work. But so should soft ones. Linux should not have a
broken NFS.

Andy


2003-06-16 00:50:40

by Michael Frank

Subject: Re: NFS io errors on transfer from system running 2.4 to system running 2.5

On Tuesday 03 June 2003 21:26, Jakob Oestergaard wrote:
> On Tue, Jun 03, 2003 at 09:01:27PM +0800, Michael Frank wrote:
> > On Tuesday 03 June 2003 20:52, Jakob Oestergaard wrote:
> > > I always use hard,intr so that I can manually interrupt hanging jobs,
> > > but also know that they do not randomly fail just because a few packets
> > > get dropped on my network. This seems to be the common setup, as far
> > > as I know.
> >
> > Thank you,
> >
> > I will try hard, intr
>
> no prob.
>
> Please let the list know if it solves your problem or not - I'm sure
> there are people who want to know if it doesn't, and if it does then the
> solution will be in the archives for the next to find.
>
> After all, I could be mistaken... naaahh... ;)
>

I have tested mounting NFS partitions in hard,intr mode and transferred
kernel BitKeeper repos between systems running combinations of recent
2.4 and 2.5 kernels, and also did bk resync and bk resolve via the network.

It is working dependably and I won't touch soft mounting mode again ...

Regards
Michael

--
Powered by linux-2.5.70-mm3, compiled with gcc-2.95-3 because it's rock solid

My current linux related activities in rough order of priority:
- Testing of Swsusp for 2.4
- Learning 2.5 kernel debugging with kgdb - it's in the -mm tree
- Studying 2.5 serial and ide drivers, ACPI, S3

The 2.5 kernel could use your usage. More info on setting up 2.5 kernel at
http://www.codemonkey.org.uk/post-halloween-2.5.txt