We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as
the root filesystem and Linux 2.6 mounted filesystems. A simple test
runs to copy files from one mount point to another (both are different
directories on the same NFS server mounted at different points).
After 30 copies of a hundred files are made, the system is rebooted and
the test repeats.
After 2 reboots, an NFS file is created, and we get the following error
from the kernel:
nfs_refresh_inode: inode number mismatch
expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
We're trying to work out how to diagnose the problem. Is there a good
place to put printks or breakpoints?
Thanks for any assistance you can provide.
Chris Carlson
CONFIDENTIALITY NOTICE:
This email, together with any attachments, is intended only for use by Aristos Logic Corporation and the
individual(s) to which it is addressed and may contain information that is privileged, confidential or
exempt from disclosure. If you are not the intended recipient, you are hereby notified that any dissemination,
distribution, or copying of this email, or any attachment, is strictly prohibited. If you have received this
email in error, please notify Aristos Logic Corporation by sending an email to [email protected]
and delete this email, along with any attachments, from your computer.
Thank you.
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
A few weeks ago, I asked for assistance in finding the cause for an
issue with NFS we were experiencing. The original message is below.
We followed up on a response we received that pointed to an old version
of the OnTap system on our NetApps servers. Apparently, it is a known
caching problem with NetApps NFS servers.
Suddenly, we discovered the same problem with our Snap Appliance
servers. Now we can't blame it on NetApps.
A theory we came up with was that the real-time clock on our boards is
not operational. Is it possible that during our frequent reboots, the
sequence numbers of our NFS RPC calls coincide with those of previous
runs, and the server responds with cached replies that carry the same
sequence numbers as in the previous run?
I have noticed that in Linux 2.4, the random seed appears to be
generated from the lower 16 bits of the MAC address. This implies to me
that it is quite likely the sequence numbers would be identical from one
run to the next.
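
[Illustration] To make the concern above concrete, here is a minimal Python sketch. It assumes a hypothetical seed derived from the lower 16 bits of a MAC address and a generic linear congruential generator; neither the function names nor the generator are taken from the Linux 2.4 source. It only shows that a seed which never changes between boots yields the same sequence of 32-bit values every time.

```python
def seed_from_mac(mac):
    """Hypothetical seed: the lower 16 bits (last two octets) of a MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    return (octets[-2] << 8) | octets[-1]


def xid_sequence(seed, count):
    """Generate `count` 32-bit values from a generic LCG (illustrative only,
    not the generator the kernel actually uses)."""
    state = seed & 0xFFFFFFFF
    xids = []
    for _ in range(count):
        state = (state * 1664525 + 1013904223) & 0xFFFFFFFF
        xids.append(state)
    return xids


# The MAC address below is made up for the example.
seed = seed_from_mac("00:11:22:33:44:55")
boot_1 = xid_sequence(seed, 5)
boot_2 = xid_sequence(seed, 5)
print(boot_1 == boot_2)
```

If a time value from a working real-time clock were mixed into the seed, the two sequences would normally differ, which is why a dead clock makes the theory worth checking.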
Does our theory that the server is sending cached responses sound plausible?
Thanks for your time,
Chris
> On Tue, 11 Sep 2007 09:41:51 -0700
> "Chris Carlson" <[email protected]> wrote:
>
>
>> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as
>> the root filesystem and Linux 2.6 mounted filesystems. A simple test
>> runs to copy files from one mount point to another (both are different
>> directories on the same NFS server mounted at different points).
>>
>> After 30 copies of a hundred files are made, the system is rebooted and
>> the test repeats.
>>
>> After 2 reboots, an NFS file is created, and we get the following error
>> from the kernel:
>>
>> nfs_refresh_inode: inode number mismatch
>> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
>>
>>
>
>
>
On Tue, 2007-09-11 at 09:41 -0700, Chris Carlson wrote:
> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as
> the root filesystem and Linux 2.6 mounted filesystems. A simple test
> runs to copy files from one mount point to another (both are different
> directories on the same NFS server mounted at different points).
>
> After 30 copies of a hundred files are made, the system is rebooted and
> the test repeats.
>
> After 2 reboots, an NFS file is created, and we get the following error
> from the kernel:
>
> nfs_refresh_inode: inode number mismatch
> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
>
> We're just trying to figure out what to do to figure out what the
> problem is. Is there a good place to place printks or breakpoints?
It is a known problem with some versions of OnTap: they sometimes return
corrupted attribute information when an operation is denied due to a
'read-only' export option.
You should be able to fix the problem by upgrading to a more recent
version of OnTap.
Cheers
Trond
On Tue, 11 Sep 2007 09:41:51 -0700
"Chris Carlson" <[email protected]> wrote:
>
> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as
> the root filesystem and Linux 2.6 mounted filesystems. A simple test
> runs to copy files from one mount point to another (both are different
> directories on the same NFS server mounted at different points).
>
> After 30 copies of a hundred files are made, the system is rebooted and
> the test repeats.
>
> After 2 reboots, an NFS file is created, and we get the following error
> from the kernel:
>
> nfs_refresh_inode: inode number mismatch
> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
>
This may be a bug in the NetApp. I saw some similar messages when
working on an issue and it turned out to be a filer bug. I ended
up tracking it down by doing network captures, and then searching them
for the 'expected' and 'got' sequence of bytes in wireshark. It showed
that in some cases the netapp was sending back a new fileid in the
WCC attributes for the dir when a create call would fail.
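
[Illustration] The capture search described above can be approximated outside wireshark. The following is a rough Python sketch, not the exact procedure used: it assumes the file IDs from the kernel message appear on the wire in XDR (big-endian) form, as a 64-bit value for NFSv3 or a 32-bit value for NFSv2, and it scans raw capture bytes for those patterns. The function names are invented for this example.

```python
import struct


def fileid_patterns(fileid):
    """Return candidate XDR (big-endian) encodings of a file ID."""
    patterns = {"uint64": struct.pack(">Q", fileid)}  # NFSv3 fileid width
    if fileid <= 0xFFFFFFFF:
        patterns["uint32"] = struct.pack(">I", fileid)  # NFSv2 fileid width
    return patterns


def find_all(data, needle):
    """Return every offset at which `needle` occurs in `data`."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def scan_capture(data, fileids):
    """Map (fileid, width) to the offsets where that encoding occurs."""
    results = {}
    for fid in fileids:
        for width, pattern in fileid_patterns(fid).items():
            results[(fid, width)] = find_all(data, pattern)
    return results


# Synthetic stand-in for capture bytes; a real run would read the capture
# file in binary mode instead.
sample = b"\x00" * 4 + struct.pack(">Q", 0xB8D5E3) + b"\xff"
for (fid, width), offsets in scan_capture(sample, [0xDACEA3, 0xB8D5E3]).items():
    print(hex(fid), width, offsets)
```

The offsets are positions in the raw file, so they only indicate where to look; the 32-bit pattern in particular can match unrelated data, and decoding the surrounding packet in wireshark is still needed to see which reply carried the unexpected file ID.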
> We're just trying to figure out what to do to figure out what the
> problem is. Is there a good place to place printks or breakpoints?
>
> Thanks for any assistance you can provide.
>
> Chris Carlson
>
>
--
Jeff Layton <[email protected]>
On Tue, 11 Sep 2007 13:02:26 -0400
Jeff Layton <[email protected]> wrote:
> On Tue, 11 Sep 2007 09:41:51 -0700
> "Chris Carlson" <[email protected]> wrote:
>
> >
> > We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as
> > the root filesystem and Linux 2.6 mounted filesystems. A simple test
> > runs to copy files from one mount point to another (both are different
> > directories on the same NFS server mounted at different points).
> >
> > After 30 copies of a hundred files are made, the system is rebooted and
> > the test repeats.
> >
> > After 2 reboots, an NFS file is created, and we get the following error
> > from the kernel:
> >
> > nfs_refresh_inode: inode number mismatch
> > expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
> >
>
> This may be a bug in the NetApp. I saw some similar messages when
> working on an issue and it turned out to be a filer bug. I ended
> up tracking it down by doing network captures, and then searching them
> for the 'expected' and 'got' sequence of bytes in wireshark. It showed
> that in some cases the netapp was sending back a new fileid in the
> WCC attributes for the dir when a create call would fail.
>
For the record, the NetApp engineers I worked with on this issue referenced
NetApp BURT:
244015: We should not have pre/post attributes in case of an error coming
from exports code
You might want to check that your filer has that fix.
--
Jeff Layton <[email protected]>
Thanks a lot, Jeff. It looks like this is the problem. You just saved
me from days of research and testing.
Chris
Jeff Layton wrote:
> On Tue, 11 Sep 2007 13:02:26 -0400
> Jeff Layton <[email protected]> wrote:
>
>
>> On Tue, 11 Sep 2007 09:41:51 -0700
>> "Chris Carlson" <[email protected]> wrote:
>>
>>
>>> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as
>>> the root filesystem and Linux 2.6 mounted filesystems. A simple test
>>> runs to copy files from one mount point to another (both are different
>>> directories on the same NFS server mounted at different points).
>>>
>>> After 30 copies of a hundred files are made, the system is rebooted and
>>> the test repeats.
>>>
>>> After 2 reboots, an NFS file is created, and we get the following error
>>> from the kernel:
>>>
>>> nfs_refresh_inode: inode number mismatch
>>> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
>>>
>>>
>> This may be a bug in the NetApp. I saw some similar messages when
>> working on an issue and it turned out to be a filer bug. I ended
>> up tracking it down by doing network captures, and then searching them
>> for the 'expected' and 'got' sequence of bytes in wireshark. It showed
>> that in some cases the netapp was sending back a new fileid in the
>> WCC attributes for the dir when a create call would fail.
>>
>>
>
> For the record, the NetApp engineers I worked with on this issue referenced
> NetApp BURT:
>
> 244015: We should not have pre/post attributes in case of an error coming
> from exports code
>
> You might want to check that your filer has that fix.
>
>
Thanks, Chuck. Your and Jeff's directions helped a lot.
We are running Linux 2.4.20 on our clients and NetApps ONTAP 6.3.3 on
the servers that are causing the problem. Based on all of this info, it
appears the Linux client is blameless.
Thanks again,
Chris
Chuck Lever wrote:
> Hi Chris-
>
> Chris Carlson wrote:
>> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers
>> as the root filesystem and Linux 2.6 mounted filesystems. A simple
>> test runs to copy files from one mount point to another (both are
>> different directories on the same NFS server mounted at different
>> points).
>>
>> After 30 copies of a hundred files are made, the system is rebooted
>> and the test repeats.
>>
>> After 2 reboots, an NFS file is created, and we get the following
>> error from the kernel:
>>
>> nfs_refresh_inode: inode number mismatch
>> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
>>
>> We're just trying to figure out what to do to figure out what the
>> problem is. Is there a good place to place printks or breakpoints?
>
> This may be due to an RPC XID collision. Which 2.4 kernel are you
> using? The Linux NFS client may be sending the same XID sequence on
> the same port number after each reboot, in which case the server will
> respond with a cached reply rather than doing real work. The cached
> reply may contain old file ID information, which triggers the "inode
> number mismatch" message you see in your log.
>
> One way to detect if this is happening is to use "pktt" on the filer.
> You can capture a packet trace across client reboots to determine if
>
> A) the transport socket's port number is the same across reboots, and
>
> B) the RPC XID sequence is the same
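
[Illustration] Once a packet trace has been captured across reboots, checks A and B above can be automated. The Python sketch below assumes the (source port, XID) pairs have already been extracted from the trace for each boot; the extraction step, the helper names, and the sample values are all hypothetical and are not produced by pktt itself.

```python
def same_ports(boot_a, boot_b):
    """Check A: do both boots use the same set of client source ports?"""
    return {port for port, _ in boot_a} == {port for port, _ in boot_b}


def xid_overlap(boot_a, boot_b):
    """Check B: (port, xid) pairs seen in both boots, in first-boot order."""
    seen = set(boot_b)
    return [pair for pair in boot_a if pair in seen]


# Made-up (source_port, xid) pairs standing in for two traced boots.
boot_1 = [(800, 0x1A2B3C4D), (800, 0x1A2B3C4E), (800, 0x1A2B3C4F)]
boot_2 = [(800, 0x1A2B3C4D), (800, 0x1A2B3C4E), (800, 0x2B000000)]

print(same_ports(boot_1, boot_2))
print([(port, hex(xid)) for port, xid in xid_overlap(boot_1, boot_2)])
```

A non-empty overlap on the same port is the situation in which a server's reply cache could answer a new request with an old reply, so it is the case worth examining more closely in the trace.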