2006-07-20 04:29:52

by NeilBrown

Subject: Re: Bug starting 2.6.16 - rpc: bad TCP reclen

On Wednesday July 19, [email protected] wrote:
> Chuck Lever wrote:
>
> > I haven't seen other reports like this recently. It could be a
> > hardware problem on your server or in your network (like a bad server
> > NIC). A network trace would be the way to start tracking this down.
> >
> > sudo tcpdump -s0 -w /tmp/dump
> >
> > on your server. Stop the dump when the server starts reporting the
> > TCP stream problems. Then take a look at the end of the dump with
> > tethereal.
> >
>
> A hardware problem is almost impossible since we tested it on a lot of
> different computers. I'm really interested if someone can explain what
> that message means; it doesn't make much sense to me because I don't
> know anything about RPC.

The message means that data on the TCP connection is corrupted.
With TCP, every RPC message is prefixed by a 4-byte header.
The msb of this number is set to one to show it is the last of a
sequence of fragments (we don't support multiple rpc-fragments). The
rest of the number is the number of bytes in the RPC message.
This should be less than about 32000 as we don't support any messages
bigger than this. You are seeing numbers with the msb clear, and
numbers bigger than 32000.
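
As a rough illustration of the check described above, here is a minimal
Python sketch, not the kernel code. The function name
check_record_marker and the MAX_LEN constant are made up for this
example; the limit is simply the "about 32000" figure from the text,
and the header is assumed to be in network byte order.

```python
import struct

LAST_FRAGMENT = 0x80000000   # msb: set on the final fragment of a record
LENGTH_MASK = 0x7FFFFFFF     # remaining 31 bits: length in bytes
MAX_LEN = 32000              # assumed limit, from "about 32000" above


def check_record_marker(header: bytes, max_len: int = MAX_LEN) -> str:
    """Classify a 4-byte RPC-over-TCP record marker (hypothetical helper)."""
    if len(header) != 4:
        raise ValueError("record marker must be exactly 4 bytes")
    (word,) = struct.unpack("!I", header)   # big-endian unsigned 32-bit
    if not (word & LAST_FRAGMENT):
        return "non-terminal"               # msb clear: more fragments claimed
    if (word & LENGTH_MASK) > max_len:
        return "large"                      # length exceeds what is supported
    return "ok"
```

A header of 80 00 00 64 (msb set, length 100) would pass this check;
anything with the top bit clear, or with a length above the limit, is
what gets reported as a bad reclen.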

It is very probable that this is not the first corruption in the TCP
stream, just the first that is being reported. I have no idea where
the corruption could be coming from.

NeilBrown

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-07-20 08:35:52

by Bernd Schubert

Subject: Re: Bug starting 2.6.16 - rpc: bad TCP reclen

> The message means that data on the TCP connection is corrupted.
> With TCP, every RPC message is prefixed by a 4-byte header.
> The msb of this number is set to one to show it is the last of a
> sequence of fragments (we don't support multiple rpc-fragments). The
> rest of the number is the number of bytes in the RPC message.
> This should be less than about 32000 as we don't support any messages
> bigger than this. You are seeing numbers with the msb clear, and
> numbers bigger than 32000.
>
> It is very probable that this is not the first corruption in the TCP
> stream, just the first that is being reported. I have no idea where
> the corruption could be coming from.
>

Wouldn't TCP errors get corrected by the TCP checksum (retransmit)?
At least that's what my textbook says.

We are occasionally seeing those messages, too. I don't know how to reproduce
it, though.

on our fileserver fileserver: RPC: bad TCP reclen 0x040d0a0d (non-terminal)

on our compute cluster server: RPC: bad TCP reclen 0x2dacc6c9 (large)
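
For reference, the two logged values can be split into their raw
bytes, top bit and 31-bit length with a few lines of Python. This is
only a sketch: describe_reclen is a made-up helper, and the thread
does not say whether the kernel prints the value before or after
masking the top bit, so the flag shown for the second value should be
read with caution.

```python
def describe_reclen(value: int) -> dict:
    """Split a logged 32-bit reclen value into its parts (hypothetical helper)."""
    raw = value.to_bytes(4, "big")               # bytes as they appear on the wire
    return {
        "bytes": raw.hex(" "),                   # needs Python 3.8+
        "last_fragment": bool(value & 0x80000000),
        "length": value & 0x7FFFFFFF,
    }


# The two values quoted in the log lines above.
for logged in (0x040D0A0D, 0x2DACC6C9):
    print(f"0x{logged:08x}", describe_reclen(logged))
```

The first value breaks down into the bytes 04 0d 0a 0d; 0d and 0a are
the ASCII codes for CR and LF.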

Here it's not related to 2.6.16 only. The compute cluster server shows those
messages - both server and clients are using 2.6.15. Until recently it ran
2.6.11 and we never had those messages at that time.
Our main fileserver is running 2.6.13, most clients still 2.6.11 and some
clients 2.6.16. As far as I remember, those messages began when we updated
the clients from 2.6.11 to 2.6.16.

Thanks,
Bernd

-- 
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg

