2008-03-14 19:06:48

by Bruce Fields

Subject: Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)

On Fri, Mar 14, 2008 at 09:57:00PM +0300, Michael Tokarev wrote:
> Tom Tucker wrote:
>> Michael:
>>
>>>>>>> On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> After upgrading to 2.6.24 (from .23), we're seeing a lot
>>>>>>>> of messages like the one in $subj in dmesg:
>>>>>>>>
>>>>>>>> Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
>>>>>>>> Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
>>>>>>>> Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
>>>>>>>> Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
>>>>>>>> Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
>>>>>>>> Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
>>>>>>>> ...
>>>>>>>>
>>
>> Are you seeing this with the latest bits? I just want to make sure that
>> this particular close path issue is fixed.
>
> Err. I completely forgot about that issue, due to the many other
> issues that have popped up over the last few weeks...
>
> Ok.
>
> I tried to reproduce it here. It happened only once here, when I changed
> the kernel on the NFS server from 2.6.23-i686 to 2.6.24-x86-64, without
> rebooting/remounting clients. The messages shown above were on the server.
> After remounting the filesystem on clients, the messages disappeared.
>
> After that, I tried the same thing with other machines (that one was
> our main production server so no experiments there) -- same clients but
> another server. I did many reboots with different kernels while the
> clients had filesystems mounted, but I wasn't able to reproduce the
> same messages again.
>
> So I don't really know what happened, or even whether it was caused
> by a single client or not; I didn't think about tcpdump at the time
> I was remounting the clients. Maybe it was a random glitch, maybe it
> IS a bug; I really don't know at this point.
>
> There was another issue before: after upgrading the server, clients
> needed to remount or else "ESTALE" was always returned. I think it
> was around 2.6.21=>2.6.22. Again, I can't reproduce it anymore (with
> current kernels).
>
> So I think the case can be closed now, especially since no one (it
> seems) has reported similar issues.

OK. But we do expect clients to continue working normally even when the
server's kernel is upgraded, so continue reporting such problems when
you run across them; hints on how to reproduce such problems are
particularly helpful.

--b.


2008-03-14 19:25:29

by Tom Tucker

Subject: Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)


Michael:

Thanks for the update. BTW, the perfect "positive fix indication" would be
seeing a single "...bad TCP reclen..." message in the log for the
reconnecting/confused client.

Thanks,
Tom



2008-03-15 00:10:17

by Bruce Fields

Subject: Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)

On Fri, Mar 14, 2008 at 02:25:21PM -0500, Tom Tucker wrote:
>
> Michael:
>
> Thanks for the update. BTW, the perfect "positive fix indication" would be
> seeing a single "...bad TCP reclen..." message in the log for the
> reconnecting/confused client.

Hm. This may be entirely unrelated (and not at all your responsibility)
but

./testserver.py server:/ --rundeps -v WRT5

with newpynfs:

http://www.citi.umich.edu/projects/nfsv4/pynfs/

gets me a "RPC: bad TCP reclen 0x00000800 (non-terminal)" and a test
failure. Hm. It may be that newpynfs is doing something intolerably
weird and the server's just dropping the request instead of returning an
error. OK, probably totally unrelated. I haven't looked hard enough
yet.

--b.
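For readers puzzling over the two reclen values in this thread: RPC over TCP frames each record with a 4-byte big-endian record mark (RFC 5531, section 11). The top bit flags the last fragment of a record and the low 31 bits give the fragment length. Read that way, 0x00020090 is a 131216-byte record and 0x00000800 is a 2048-byte fragment that lacks the last-fragment bit. The sketch below decodes a record mark and reproduces the two message variants, assuming the server checks the last-fragment bit before the size limit (which is what the "(non-terminal)" and "(large)" wording suggests). The `max_msg` parameter and the 65536 value used in the tests are hypothetical stand-ins, not the server's actual limit.

```python
import struct
from typing import Tuple

LAST_FRAGMENT = 0x80000000  # top bit of the record mark: last fragment of the record
LENGTH_MASK = 0x7FFFFFFF    # low 31 bits: length of this fragment in bytes


def decode_record_mark(header: bytes) -> Tuple[bool, int]:
    """Split a 4-byte big-endian record mark into (is_last, length)."""
    if len(header) != 4:
        raise ValueError("record mark must be exactly 4 bytes")
    (word,) = struct.unpack(">I", header)
    return bool(word & LAST_FRAGMENT), word & LENGTH_MASK


def classify(header: bytes, max_msg: int) -> str:
    """Mimic the two rejection messages seen in this thread (hypothetical limit).

    Assumption: a fragment without the last-fragment bit is rejected first,
    then a terminal fragment longer than max_msg is rejected as too large.
    """
    is_last, length = decode_record_mark(header)
    if not is_last:
        return f"bad TCP reclen 0x{length:08x} (non-terminal)"
    if length > max_msg:
        return f"bad TCP reclen 0x{length:08x} (large)"
    return "ok"
```

Note that under this reading the wire bytes behind the "(large)" message would have had the last-fragment bit set (0x80020090), since the printed value is the masked length.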