2008-06-06 01:01:12

by Ricardo Labiaga

Subject: Re: [NFS] I/O Errors with hard mounts

You have a significant number of dropped connections, as indicated by
the high EAGAIN count.
I wouldn't be surprised if the 2.6.16 kernel isn't handling the
reconnection correctly and is propagating EIO to the application.
There's been a fair amount of client-side work in the RPC reconnection
code recently. Can you try with a recent kernel?
A network trace and rpcdebug output would be invaluable when you're
able to reproduce this.
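
For the filer-side check discussed in this thread, here is a minimal
sketch that scans saved "nfsstat -d" output for the drop counters. It
assumes the output was captured to a text file and that the relevant
lines contain "EAGAIN" or "vol offline"; the exact wording may differ
between filer releases, and the function name find_drop_lines is
hypothetical.

```python
# Substrings taken from this thread; the exact wording in real
# "nfsstat -d" output is an assumption and may vary by release.
PATTERNS = ("eagain", "vol offline")


def find_drop_lines(text):
    """Return the lines of saved nfsstat -d output that mention a drop."""
    hits = []
    for line in text.splitlines():
        lowered = line.lower()
        # Case-insensitive substring match against each pattern.
        if any(pattern in lowered for pattern in PATTERNS):
            hits.append(line.strip())
    return hits
```

Calling find_drop_lines(open(path).read()) on captures taken before and
after a reproduction would show whether those counters are moving.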
- ricardo
On Wed, Jun 4, 2008 at 3:45 PM, Ricardo Labiaga <[email protected]> wrote:
>> Does /var/log/messages show any errors around the same time?
>> In addition to the network trace and rpcdebug on the client, take a
>> look at "nfsstat -d" on the filer.
>> Is the filer dropping the connection? Look for "dropped with EAGAIN"
>> or "dropped from vol offline" in the output. This will help narrow
>> down the problem.
> So, sometimes when somebody deletes a lot of data (like the problem we
> just observed), the deleting host, and often other hosts, do report
> 'filer not responding' in the logs.
> However, operations that aren't happening in the delete dir tend to
> work just fine (for example, iozone could be running and doing pretty
> well). Further, the most recent time this happened, the host didn't
> report filer not responding.
> This is the only EAGAIN reference I see:
> assist queue (queued, split mbufs, drop for EAGAIN) = (0, 64478612,
> Dave


