2008-02-18 12:59:58

by Andrew Morton

Subject: Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)

(suitable cc added)

(regression)

On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <[email protected]> wrote:

> Hello!
>
> After upgrading to 2.6.24 (from .23), we're seeing A LOT
> of messages like the one in the subject in dmesg:
>
> Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
> Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
> Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
> ...
>
> on a Linux NFS server. The clients are all Linux too, mostly 2.6.23
> and some 2.6.22.
>
> I found the "offending" piece of code in net/sunrpc/svcsock.c,
> in routine svc_tcp_recvfrom() with condition being:
>
> if (svsk->sk_reclen > serv->sv_max_mesg) ...
>
> This happens after a server reboot. At this point, client(s) are trying
> to perform some NFS transaction and fail, and server starts generating
> the above messages - till I do a umount followed by mount on all clients.
> Previously, such a situation (NFS server reboot) was handled transparently,
> i.e., there was nothing to do; the mount continued working just fine once
> the server came back online.
>
> Now, I'm not sure if it's really a 2.6.24-specific problem or a userspace
> problem. Some time ago we also upgraded nfs-kernel-server (Debian)
> package, and the remount-after-nfs-server-reboot problem started to
> occur at THAT time (and it is something to worry about as well, I just
> had no time to deal with it); but the dmesg spamming only appeared
> with 2.6.24.
>
> How can I debug the issue further from this point?
>

2008-02-18 13:05:12

by Michael Tokarev

Subject: Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)

Andrew Morton wrote:
> (suitable cc added)

Thanks. I meant to send it to linux-nfs originally, but
it looks like I mistyped the address.

> (regression)

Now, after we did some more experiments with it, I don't think it's
a regression. I'll post more details in a few hours when the
ongoing testing finishes. Thanks!

/mjt

2008-02-18 21:16:03

by Tom Tucker

Subject: Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)


On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote:
> (suitable cc added)
>
> (regression)
>
> On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <[email protected]> wrote:
>
> > Hello!
> >
> > After upgrading to 2.6.24 (from .23), we're seeing A LOT
> > of messages like the one in the subject in dmesg:
> >
> > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
> > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
> > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
> > ...
> >
> > on a Linux NFS server. The clients are all Linux too, mostly 2.6.23
> > and some 2.6.22.
> >
> > I found the "offending" piece of code in net/sunrpc/svcsock.c,
> > in routine svc_tcp_recvfrom() with condition being:
> >
> > if (svsk->sk_reclen > serv->sv_max_mesg) ...

The problem might be that the client is setting a bit in the RPC message
length field that is meant to be interpreted and masked off by the
server -- and we're not doing it yet. My bet is that 0x20000 is the bit
we're looking for. I'll poke around...
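
For reference, here is a minimal sketch of how the 4-byte record marker is split under the ONC RPC record marking standard (RFC 1831): the top bit (0x80000000) flags the last fragment and the low 31 bits carry the fragment length. The names `rm_decode`, `rm_exceeds`, `RM_LAST_FRAGMENT` and `RM_LENGTH_MASK` are made up for illustration and are not the kernel's identifiers. The sketch also assumes the marker has already been converted to host byte order. Read this way, 0x20000 falls inside the length field, so 0x00020090 would be a plain length of 131216 bytes (131072 + 144). That is an observation about the encoding, not a diagnosis of this report.

```c
#include <stdint.h>

/* Illustrative constants for RPC record marking (RFC 1831).
 * These names are hypothetical, not taken from the kernel source. */
#define RM_LAST_FRAGMENT 0x80000000u  /* top bit: last fragment of the record */
#define RM_LENGTH_MASK   0x7fffffffu  /* low 31 bits: fragment length in bytes */

struct rm_header {
    int      last_fragment;  /* 1 if the last-fragment bit is set, else 0 */
    uint32_t length;         /* fragment length with the flag bit masked off */
};

/* Split a record marker (already in host byte order) into flag and length. */
struct rm_header rm_decode(uint32_t marker)
{
    struct rm_header h;

    h.last_fragment = (marker & RM_LAST_FRAGMENT) != 0;
    h.length = marker & RM_LENGTH_MASK;
    return h;
}

/* Mirror of the quoted size check: is the fragment longer than the limit?
 * max_mesg stands in for serv->sv_max_mesg; its real value is not given
 * anywhere in this thread. */
int rm_exceeds(uint32_t length, uint32_t max_mesg)
{
    return length > max_mesg;
}
```

For example, `rm_decode(0x80020090u)` yields `last_fragment == 1` and `length == 0x20090`. With a hypothetical limit of 131072 bytes, `rm_exceeds(0x20090, 131072)` is true, which is the shape of the condition that produces the "(large)" message.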
