2006-12-11 16:09:45

by Martin Knoblauch

[permalink] [raw]
Subject: [2.6.19] NFS: server error: fileid changed

Hi, [please CC me, as I am not subscribed]

after updating a RHEL4 box (EM64T based) to a plain 2.6.19 kernel, we
are seeing repeated occurences of the following messages (about every
45-50 minutes).

It is always the same server (a NetApp filer, mounted via the
user-space automounter "amd") and the expected/got numbers seem to
repeat.

Is there a way to find out which files are involved? Nothing seems to
be obviously breaking, but I do not like to get my logfiles filled up.

[ 9337.747546] NFS: server nvgm022 error: fileid changed
[ 9337.747549] fsid 0:25: expected fileid 0x7a6f3d, got 0x65be80
[ 9338.020427] NFS: server nvgm022 error: fileid changed
[ 9338.020430] fsid 0:25: expected fileid 0x15f5d7c, got 0x9f9900
[ 9338.070147] NFS: server nvgm022 error: fileid changed
[ 9338.070150] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
[ 9338.338896] NFS: server nvgm022 error: fileid changed
[ 9338.338899] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
[ 9338.370207] NFS: server nvgm022 error: fileid changed
[ 9338.370210] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
[ 9338.634437] NFS: server nvgm022 error: fileid changed
[ 9338.634439] fsid 0:25: expected fileid 0x7a6f3d, got 0x22070e
[ 9338.698383] NFS: server nvgm022 error: fileid changed
[ 9338.698385] fsid 0:25: expected fileid 0x7a6f3d, got 0x352777
[ 9338.949952] NFS: server nvgm022 error: fileid changed
[ 9338.949954] fsid 0:25: expected fileid 0x15f5d7c, got 0x5988c4
[ 9339.042473] NFS: server nvgm022 error: fileid changed
[ 9339.042476] fsid 0:25: expected fileid 0x7a6f3d, got 0x9f9900
[ 9339.267338] NFS: server nvgm022 error: fileid changed
[ 9339.267341] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
[ 9339.309921] NFS: server nvgm022 error: fileid changed
[ 9339.309923] fsid 0:25: expected fileid 0x15f5d7c, got 0x65be80
[ 9339.405146] NFS: server nvgm022 error: fileid changed
[ 9339.405149] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
[ 9339.433816] NFS: server nvgm022 error: fileid changed
[ 9339.433819] fsid 0:25: expected fileid 0x15f5d7c, got 0x65be80
[ 9340.149325] NFS: server nvgm022 error: fileid changed
[ 9340.149328] fsid 0:25: expected fileid 0x7a6f3d, got 0x19bc55
[ 9340.173278] NFS: server nvgm022 error: fileid changed
[ 9340.173281] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
[ 9340.324517] NFS: server nvgm022 error: fileid changed
[ 9340.324520] fsid 0:25: expected fileid 0x15f5d7c, got 0x11c9001

Thanks
Martin


------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de


2006-12-11 17:23:37

by Trond Myklebust

[permalink] [raw]
Subject: Re: [2.6.19] NFS: server error: fileid changed

On Mon, 2006-12-11 at 08:09 -0800, Martin Knoblauch wrote:
> Hi, [please CC me, as I am not subscribed]
>
> after updating a RHEL4 box (EM64T based) to a plain 2.6.19 kernel, we
> are seeing repeated occurences of the following messages (about every
> 45-50 minutes).
>
> It is always the same server (a NetApp filer, mounted via the
> user-space automounter "amd") and the expected/got numbers seem to
> repeat.

Are you seeing it _without_ amd? The usual reason for the errors you see
are bogus replay cache replies. For that reason, the kernel is usually
very careful when initialising its value for the XID: we set part of it
using the clock value, and part of it using a random number generator.
I'm not so sure that other services are as careful.

> Is there a way to find out which files are involved? Nothing seems to
> be obviously breaking, but I do not like to get my logfiles filled up.

The fileid is the same as the inode number. Just convert those
hexadecimal values into ordinary numbers, then search for them using 'ls
-i'.

Trond

> [ 9337.747546] NFS: server nvgm022 error: fileid changed
> [ 9337.747549] fsid 0:25: expected fileid 0x7a6f3d, got 0x65be80
> [ 9338.020427] NFS: server nvgm022 error: fileid changed
> [ 9338.020430] fsid 0:25: expected fileid 0x15f5d7c, got 0x9f9900
> [ 9338.070147] NFS: server nvgm022 error: fileid changed
> [ 9338.070150] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
> [ 9338.338896] NFS: server nvgm022 error: fileid changed
> [ 9338.338899] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
> [ 9338.370207] NFS: server nvgm022 error: fileid changed
> [ 9338.370210] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
> [ 9338.634437] NFS: server nvgm022 error: fileid changed
> [ 9338.634439] fsid 0:25: expected fileid 0x7a6f3d, got 0x22070e
> [ 9338.698383] NFS: server nvgm022 error: fileid changed
> [ 9338.698385] fsid 0:25: expected fileid 0x7a6f3d, got 0x352777
> [ 9338.949952] NFS: server nvgm022 error: fileid changed
> [ 9338.949954] fsid 0:25: expected fileid 0x15f5d7c, got 0x5988c4
> [ 9339.042473] NFS: server nvgm022 error: fileid changed
> [ 9339.042476] fsid 0:25: expected fileid 0x7a6f3d, got 0x9f9900
> [ 9339.267338] NFS: server nvgm022 error: fileid changed
> [ 9339.267341] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
> [ 9339.309921] NFS: server nvgm022 error: fileid changed
> [ 9339.309923] fsid 0:25: expected fileid 0x15f5d7c, got 0x65be80
> [ 9339.405146] NFS: server nvgm022 error: fileid changed
> [ 9339.405149] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
> [ 9339.433816] NFS: server nvgm022 error: fileid changed
> [ 9339.433819] fsid 0:25: expected fileid 0x15f5d7c, got 0x65be80
> [ 9340.149325] NFS: server nvgm022 error: fileid changed
> [ 9340.149328] fsid 0:25: expected fileid 0x7a6f3d, got 0x19bc55
> [ 9340.173278] NFS: server nvgm022 error: fileid changed
> [ 9340.173281] fsid 0:25: expected fileid 0x15f5d7c, got 0x22070e
> [ 9340.324517] NFS: server nvgm022 error: fileid changed
> [ 9340.324520] fsid 0:25: expected fileid 0x15f5d7c, got 0x11c9001
>
> Thanks
> Martin
>
>
> ------------------------------------------------------
> Martin Knoblauch
> email: k n o b i AT knobisoft DOT de
> www: http://www.knobisoft.de
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2006-12-11 23:44:36

by Martin Knoblauch

[permalink] [raw]
Subject: Re: [2.6.19] NFS: server error: fileid changed


--- Trond Myklebust <[email protected]> wrote:

> On Mon, 2006-12-11 at 08:09 -0800, Martin Knoblauch wrote:
> > Hi, [please CC me, as I am not subscribed]
> >
> > after updating a RHEL4 box (EM64T based) to a plain 2.6.19 kernel,
> we
> > are seeing repeated occurences of the following messages (about
> every
> > 45-50 minutes).
> >
> > It is always the same server (a NetApp filer, mounted via the
> > user-space automounter "amd") and the expected/got numbers seem to
> > repeat.
>
> Are you seeing it _without_ amd? The usual reason for the errors you
> see are bogus replay cache replies. For that reason, the kernel is
> usually very careful when initialising its value for the
> XID: we set part of it using the clock value, and part of it
> using a random number generator.
> I'm not so sure that other services are as careful.
>

So far, we are only seeing it on amd-mounted filesystems, not on
static NFS mounts. Unfortunatelly, it is difficult to avoid "amd" in
our environment.

> > Is there a way to find out which files are involved? Nothing
> seems to
> > be obviously breaking, but I do not like to get my logfiles filled
> up.
>
> The fileid is the same as the inode number. Just convert those
> hexadecimal values into ordinary numbers, then search for them using
> 'ls
> -i'.
>

thanks. will check that out.

Cheers
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

2006-12-12 00:26:50

by Trond Myklebust

[permalink] [raw]
Subject: Re: [2.6.19] NFS: server error: fileid changed

On Mon, 2006-12-11 at 15:44 -0800, Martin Knoblauch wrote:
> So far, we are only seeing it on amd-mounted filesystems, not on
> static NFS mounts. Unfortunatelly, it is difficult to avoid "amd" in
> our environment.

Any chance you could try substituting a recent version of autofs? This
sort of problem is more likely to happen on partitions that are
unmounted and then remounted often. I'd just like to figure out if this
is something that we need to fix in the kernel, or if it is purely an
amd problem.

Cheers
Trond

2006-12-12 03:17:01

by Martin Knoblauch

[permalink] [raw]
Subject: Re: [2.6.19] NFS: server error: fileid changed


--- Trond Myklebust <[email protected]> wrote:

> On Mon, 2006-12-11 at 15:44 -0800, Martin Knoblauch wrote:
> > So far, we are only seeing it on amd-mounted filesystems, not on
> > static NFS mounts. Unfortunatelly, it is difficult to avoid "amd"
> in
> > our environment.
>
> Any chance you could try substituting a recent version of autofs?
> This
> sort of problem is more likely to happen on partitions that are
> unmounted and then remounted often. I'd just like to figure out if
> this
> is something that we need to fix in the kernel, or if it is purely an
> amd problem.
>
> Cheers
> Trond
>
Hi Trond,

unfortunatelly I have no controll over the mounting maps, as they are
maintained from different people. So the answer is no. Unfortunatelly
the customer has decided on using am-utils. This has been hurting us
(and them) for years ...

Your are likely correct when you hint towards partitions which are
frequently remounted.

In any case, your help is appreciated.

Cheers
Martin


------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

2006-12-15 17:22:50

by Martin Knoblauch

[permalink] [raw]
Subject: Re: [2.6.19] NFS: server error: fileid changed


--- Trond Myklebust <[email protected]> wrote:

>
> > Is there a way to find out which files are involved? Nothing
> seems to
> > be obviously breaking, but I do not like to get my logfiles filled
> up.
>
> The fileid is the same as the inode number. Just convert those
> hexadecimal values into ordinary numbers, then search for them using
> 'ls
> -i'.
>
> Trond
>
> > [ 9337.747546] NFS: server nvgm022 error: fileid changed
> > [ 9337.747549] fsid 0:25: expected fileid 0x7a6f3d, got 0x65be80
Hi Trond,

just curious: how is the fsid related to mounted filesystems? What
does "0:25" stand for?

Cheers
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

2006-12-15 19:30:27

by Trond Myklebust

[permalink] [raw]
Subject: Re: [2.6.19] NFS: server error: fileid changed

On Fri, 2006-12-15 at 09:16 -0800, Martin Knoblauch wrote:
> --- Trond Myklebust <[email protected]> wrote:
>
> >
> > > Is there a way to find out which files are involved? Nothing
> > seems to
> > > be obviously breaking, but I do not like to get my logfiles filled
> > up.
> >
> > The fileid is the same as the inode number. Just convert those
> > hexadecimal values into ordinary numbers, then search for them using
> > 'ls
> > -i'.
> >
> > Trond
> >
> > > [ 9337.747546] NFS: server nvgm022 error: fileid changed
> > > [ 9337.747549] fsid 0:25: expected fileid 0x7a6f3d, got 0x65be80
> Hi Trond,
>
> just curious: how is the fsid related to mounted filesystems? What
> does "0:25" stand for?
>

In this case the fsid is just the major:minor device numbers assigned to
that filesystem.
Look for it using 'stat -printf "%D\n" /mountpoint'

Cheers
Trond