2003-11-18 13:44:16

by Andi Kleen

Subject: Re: possible bug in tcp_input.c

Tomas Szepe <[email protected]> writes:

> /* tcp_input.c, line 1138 */
> static inline int tcp_head_timedout(struct sock *sk, struct tcp_opt *tp)
> {
> 	return tp->packets_out && tcp_skb_timedout(tp, skb_peek(&sk->write_queue));
> }

tp->packets_out > 0 implies that there is at least one packet in the write
queue (it counts the number of unacked packets in flight, which are kept
in the write queue). When that's not the case something else is wrong.
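To make the invariant concrete, here is a minimal userspace sketch, not
kernel code. The types and helpers (struct skb, struct write_queue,
struct tcp_state, queue_peek, invariant_holds, head_timedout_checked) are
hypothetical simplifications invented for illustration; only the shape of
the check follows the quoted function. It assumes that peeking an empty
queue yields NULL, so a nonzero packets_out with an empty queue means the
state was corrupted somewhere else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct skb {
    struct skb *next;
    unsigned long when;          /* time the segment was sent */
};

struct write_queue {
    struct skb *head;            /* NULL when the queue is empty */
};

struct tcp_state {
    unsigned int packets_out;    /* unacked packets in flight */
    unsigned long rto;           /* retransmission timeout */
};

/* Assumed to behave like a peek: first buffer, or NULL if the queue is empty. */
static struct skb *queue_peek(const struct write_queue *q)
{
    return q->head;
}

/* The invariant: packets in flight imply a non-empty write queue. */
static int invariant_holds(const struct tcp_state *tp,
                           const struct write_queue *q)
{
    return tp->packets_out == 0 || queue_peek(q) != NULL;
}

/*
 * Same shape as the quoted check, plus an explicit guard.
 * Returns 1 if the head segment timed out, 0 if not,
 * and -1 if the invariant is broken (the bug is elsewhere).
 */
static int head_timedout_checked(const struct tcp_state *tp,
                                 const struct write_queue *q,
                                 unsigned long now)
{
    struct skb *head;

    if (tp->packets_out == 0)
        return 0;                /* nothing in flight, nothing to time out */

    head = queue_peek(q);
    if (head == NULL)
        return -1;               /* packets_out > 0 but queue is empty */

    return (now - head->when) > tp->rto;
}

/* Small self-check exercising the broken and the healthy case. */
static void self_check(void)
{
    struct write_queue q = { NULL };
    struct tcp_state tp = { 1, 10 };
    struct skb s = { NULL, 50 };

    assert(!invariant_holds(&tp, &q));
    assert(head_timedout_checked(&tp, &q, 100) == -1);

    q.head = &s;
    assert(invariant_holds(&tp, &q));
    assert(head_timedout_checked(&tp, &q, 100) == 1);
}
```

The guard only reports the inconsistency; it does not repair it. If the
-1 case is ever hit, the counter and the queue were desynchronized by
some other code path, and that is where to look.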

-Andi


2003-11-18 14:02:18

by Andi Kleen

Subject: Re: possible bug in tcp_input.c

On Tue, 18 Nov 2003 14:58:05 +0100
Tomas Szepe <[email protected]> wrote:

> On Oct-24 2003, Fri, 19:57 +0200
> Andi Kleen <[email protected]> wrote:
>
> > > /* tcp_input.c, line 1138 */
> > > static inline int tcp_head_timedout(struct sock *sk, struct tcp_opt *tp)
> > > {
> > > 	return tp->packets_out && tcp_skb_timedout(tp, skb_peek(&sk->write_queue));
> > > }
> >
> > tp->packets_out > 0 implies that there is at least one packet in the write
> > queue (it counts the number of unacked packets in flight, which are kept
> > in the write queue). When that's not the case something else is wrong.
>
> Yes, that's exactly what davem said. The corruption is happening somewhere
> in netsched/imq code that's not even part of the official kernel tree (and
> I'm told there's nobody to maintain the patch at present).

Ignore that mail. Some machine was flushing out an old mail queue
(including some very old mails from me that never made it out).

I actually wrote it before DaveM if you check the dates ;-)

-Andi

2003-11-18 13:59:41

by Tomas Szepe

Subject: Re: possible bug in tcp_input.c

On Oct-24 2003, Fri, 19:57 +0200
Andi Kleen <[email protected]> wrote:

> > /* tcp_input.c, line 1138 */
> > static inline int tcp_head_timedout(struct sock *sk, struct tcp_opt *tp)
> > {
> > 	return tp->packets_out && tcp_skb_timedout(tp, skb_peek(&sk->write_queue));
> > }
>
> tp->packets_out > 0 implies that there is at least one packet in the write
> queue (it counts the number of unacked packets in flight, which are kept
> in the write queue). When that's not the case something else is wrong.

Yes, that's exactly what davem said. The corruption is happening somewhere
in netsched/imq code that's not even part of the official kernel tree (and
I'm told there's nobody to maintain the patch at present).

Thanks,
--
Tomas Szepe <[email protected]>

P.S. I can post the patchset we've been using on the crashing machines
in case someone's interested; it's reasonably short:

9101 Jul 6 11:48 bridge-nf-0.0.7-against-2.4.22pre3.diff.gz
4123 Jul 6 11:14 imq-2.4.22pre3-1.diff.gz
1883 Jul 6 12:01 imq-nf-20030625-2.4.22pre3.diff.gz