2010-07-08 08:22:15

by Tejun Heo

[permalink] [raw]
Subject: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

Hello,

We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15.
Please see the attached photoshoot. This is happening on a HPC
cluster and very interestingly caused by one particular job. How long
it takes isn't clear yet (at least more than a day) but when it
happens it happens on a lot of machines in relatively short time.

With a bit of disassemblying, I've found that the oops is happening
during tcp_for_write_queue_from() because the skb->next points to
NULL.

void tcp_xmit_retransmit_queue(struct sock *sk)
{
...
if (tp->retransmit_skb_hint) {
skb = tp->retransmit_skb_hint;
last_lost = TCP_SKB_CB(skb)->end_seq;
if (after(last_lost, tp->retransmit_high))
last_lost = tp->retransmit_high;
} else {
skb = tcp_write_queue_head(sk);
last_lost = tp->snd_una;
}

=> tcp_for_write_queue_from(skb, sk) {
__u8 sacked = TCP_SKB_CB(skb)->sacked;

if (skb == tcp_send_head(sk))
break;
/* we could do better than to assign each time */
if (hole == NULL)

This can happen for one of the following reasons,

1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL
too. ie. tcp_xmit_retransmit_queue() is called on an empty write
queue for some reason.

2. tp->retransmit_skb_hint is pointing to a skb which is not on the
write_queue. ie. somebody forgot to update hint while removing the
skb from the write queue.

3. The hint is pointing to a skb on the list but the list itself is
corrupt.

I added some debug code and the crash is happening when
tp->retransmit_skb_hint is not NULL but tp->retransmit_skb_hint->next
is NULL. So, #1 is out; unfortunately, I didn't have debug code in
place to discern between #2 and #3.

Does anything ring a bell? This is a production system and debugging
affects quite a number of people. I can put debug code in to discern
between #2 and #3 but I'm basically shooting in the dark and it would
be great if someone has a better idea.

Thanks.

--
tejun


Attachments:
oops.jpg (90.41 kB)

2010-07-11 02:36:21

by David Miller

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

From: Tejun Heo <[email protected]>
Date: Thu, 08 Jul 2010 10:22:02 +0200

> We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15.
...
> Does anything ring a bell?

A long time ago we had a packet scheduler bug that could corrupt
the TCP socket queues, but that was fixed in 2.6.27 so would
definitely be fixed in your kernel.

--------------------
commit 69747650c814a8a79fef412c7416adf823293a3e
Author: David S. Miller <[email protected]>
Date: Sun Aug 17 23:55:36 2008 -0700

pkt_sched: Fix return value corruption in HTB and TBF.

Based upon a bug report by Josip Rodin.

Packet schedulers should only return NET_XMIT_DROP iff
the packet really was dropped. If the packet does reach
the device after we return NET_XMIT_DROP then TCP can
crash because it depends upon the enqueue path return
values being accurate.

Signed-off-by: David S. Miller <[email protected]>
--------------------

Nothing else jumps to mind, sorry.

2010-07-11 16:09:45

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

On Thu, 8 Jul 2010, Tejun Heo wrote:

> We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15.
> Please see the attached photoshoot. This is happening on a HPC
> cluster and very interestingly caused by one particular job. How long
> it takes isn't clear yet (at least more than a day) but when it
> happens it happens on a lot of machines in relatively short time.
>
> With a bit of disassemblying, I've found that the oops is happening
> during tcp_for_write_queue_from() because the skb->next points to
> NULL.
>
> void tcp_xmit_retransmit_queue(struct sock *sk)
> {
> ...
> if (tp->retransmit_skb_hint) {
> skb = tp->retransmit_skb_hint;
> last_lost = TCP_SKB_CB(skb)->end_seq;
> if (after(last_lost, tp->retransmit_high))
> last_lost = tp->retransmit_high;
> } else {
> skb = tcp_write_queue_head(sk);
> last_lost = tp->snd_una;
> }
>
> => tcp_for_write_queue_from(skb, sk) {
> __u8 sacked = TCP_SKB_CB(skb)->sacked;
>
> if (skb == tcp_send_head(sk))
> break;
> /* we could do better than to assign each time */
> if (hole == NULL)
>
> This can happen for one of the following reasons,
>
> 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL
> too. ie. tcp_xmit_retransmit_queue() is called on an empty write
> queue for some reason.
>
> 2. tp->retransmit_skb_hint is pointing to a skb which is not on the
> write_queue. ie. somebody forgot to update hint while removing the
> skb from the write queue.

Once again I've read the unlinkers through, and only thing that could
cause this is tcp_send_synack (others do deal with the hints) but I think
Eric already proposed a patch to that but we never got anywhere due to
some counterargument why it wouldn't take place (too far away for me to
remember, see archives about the discussions). ...But if you want be dead
sure some WARN_ON there might not hurt. Also the purging of the whole
queue was a similar suspect I then came across (but that would only
materialize with sk reuse happening e.g., with nfs which the other guys
weren't using).

> 3. The hint is pointing to a skb on the list but the list itself is
> corrupt.
>
> I added some debug code and the crash is happening when
> tp->retransmit_skb_hint is not NULL but tp->retransmit_skb_hint->next
> is NULL. So, #1 is out; unfortunately, I didn't have debug code in
> place to discern between #2 and #3.
>
> Does anything ring a bell? This is a production system and debugging
> affects quite a number of people. I can put debug code in to discern
> between #2 and #3 but I'm basically shooting in the dark and it would
> be great if someone has a better idea.

Thanks for taking this up. I've been kind of waiting somebody to show up
who actually has some way of reproducing it. Once I had one guy in the
hook but his ability to reproduce was for some reason lost when he tried
with a debug patch [1].

I now realize that the debug patch should probably also print the write
queue too when the problem is caught in order to discern the cases you
mention.

Something along these lines:

tcp_for_write_queue(skb, sk) {
printk("skb %p (%u-%u) next %p prev %p sacked %u\n", ...);
}

Anyway, my debugging patch should be such that in a lucky case it avoids
crashing the system too, though price to pay might then be a stuck
connection. In case #3 I'd expect the box to die elsewhere in TCP code
pretty soon anyway so it depends whether avoiding oops is really so
useful, but if you're lucky other mechanism in TCP will recover
the lost one for you (basically RTO driven retransmission).

--
i.

[1] http://marc.info/?l=linux-kernel&m=126624014117610&w=2

2010-07-11 17:06:23

by Eric Dumazet

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit :
> On Thu, 8 Jul 2010, Tejun Heo wrote:
>
> > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15.
> > Please see the attached photoshoot. This is happening on a HPC
> > cluster and very interestingly caused by one particular job. How long
> > it takes isn't clear yet (at least more than a day) but when it
> > happens it happens on a lot of machines in relatively short time.
> >
> > With a bit of disassemblying, I've found that the oops is happening
> > during tcp_for_write_queue_from() because the skb->next points to
> > NULL.
> >
> > void tcp_xmit_retransmit_queue(struct sock *sk)
> > {
> > ...
> > if (tp->retransmit_skb_hint) {
> > skb = tp->retransmit_skb_hint;
> > last_lost = TCP_SKB_CB(skb)->end_seq;
> > if (after(last_lost, tp->retransmit_high))
> > last_lost = tp->retransmit_high;
> > } else {
> > skb = tcp_write_queue_head(sk);
> > last_lost = tp->snd_una;
> > }
> >
> > => tcp_for_write_queue_from(skb, sk) {
> > __u8 sacked = TCP_SKB_CB(skb)->sacked;
> >
> > if (skb == tcp_send_head(sk))
> > break;
> > /* we could do better than to assign each time */
> > if (hole == NULL)
> >
> > This can happen for one of the following reasons,
> >
> > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL
> > too. ie. tcp_xmit_retransmit_queue() is called on an empty write
> > queue for some reason.
> >
> > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the
> > write_queue. ie. somebody forgot to update hint while removing the
> > skb from the write queue.
>
> Once again I've read the unlinkers through, and only thing that could
> cause this is tcp_send_synack (others do deal with the hints) but I think
> Eric already proposed a patch to that but we never got anywhere due to
> some counterargument why it wouldn't take place (too far away for me to
> remember, see archives about the discussions). ...But if you want be dead
> sure some WARN_ON there might not hurt. Also the purging of the whole
> queue was a similar suspect I then came across (but that would only
> materialize with sk reuse happening e.g., with nfs which the other guys
> weren't using).
>

Hmm.

This sounds familiar to me, but I cannot remember the discussion you
mention or the patch.

Or maybe it was the TCP transaction thing ? (including data in SYN or
SYN-ACK packet)

> > 3. The hint is pointing to a skb on the list but the list itself is
> > corrupt.
> >
> > I added some debug code and the crash is happening when
> > tp->retransmit_skb_hint is not NULL but tp->retransmit_skb_hint->next
> > is NULL. So, #1 is out; unfortunately, I didn't have debug code in
> > place to discern between #2 and #3.
> >
> > Does anything ring a bell? This is a production system and debugging
> > affects quite a number of people. I can put debug code in to discern
> > between #2 and #3 but I'm basically shooting in the dark and it would
> > be great if someone has a better idea.
>
> Thanks for taking this up. I've been kind of waiting somebody to show up
> who actually has some way of reproducing it. Once I had one guy in the
> hook but his ability to reproduce was for some reason lost when he tried
> with a debug patch [1].
>
> I now realize that the debug patch should probably also print the write
> queue too when the problem is caught in order to discern the cases you
> mention.
>
> Something along these lines:
>
> tcp_for_write_queue(skb, sk) {
> printk("skb %p (%u-%u) next %p prev %p sacked %u\n", ...);
> }
>
> Anyway, my debugging patch should be such that in a lucky case it avoids
> crashing the system too, though price to pay might then be a stuck
> connection. In case #3 I'd expect the box to die elsewhere in TCP code
> pretty soon anyway so it depends whether avoiding oops is really so
> useful, but if you're lucky other mechanism in TCP will recover
> the lost one for you (basically RTO driven retransmission).
>

2010-07-11 17:46:28

by Eric Dumazet

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit :
> Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit :
> > On Thu, 8 Jul 2010, Tejun Heo wrote:
> >
> > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15.
> > > Please see the attached photoshoot. This is happening on a HPC
> > > cluster and very interestingly caused by one particular job. How long
> > > it takes isn't clear yet (at least more than a day) but when it
> > > happens it happens on a lot of machines in relatively short time.
> > >
> > > With a bit of disassemblying, I've found that the oops is happening
> > > during tcp_for_write_queue_from() because the skb->next points to
> > > NULL.
> > >
> > > void tcp_xmit_retransmit_queue(struct sock *sk)
> > > {
> > > ...
> > > if (tp->retransmit_skb_hint) {
> > > skb = tp->retransmit_skb_hint;
> > > last_lost = TCP_SKB_CB(skb)->end_seq;
> > > if (after(last_lost, tp->retransmit_high))
> > > last_lost = tp->retransmit_high;
> > > } else {
> > > skb = tcp_write_queue_head(sk);
> > > last_lost = tp->snd_una;
> > > }
> > >
> > > => tcp_for_write_queue_from(skb, sk) {
> > > __u8 sacked = TCP_SKB_CB(skb)->sacked;
> > >
> > > if (skb == tcp_send_head(sk))
> > > break;
> > > /* we could do better than to assign each time */
> > > if (hole == NULL)
> > >
> > > This can happen for one of the following reasons,
> > >
> > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL
> > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write
> > > queue for some reason.
> > >
> > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the
> > > write_queue. ie. somebody forgot to update hint while removing the
> > > skb from the write queue.
> >
> > Once again I've read the unlinkers through, and only thing that could
> > cause this is tcp_send_synack (others do deal with the hints) but I think
> > Eric already proposed a patch to that but we never got anywhere due to
> > some counterargument why it wouldn't take place (too far away for me to
> > remember, see archives about the discussions). ...But if you want be dead
> > sure some WARN_ON there might not hurt. Also the purging of the whole
> > queue was a similar suspect I then came across (but that would only
> > materialize with sk reuse happening e.g., with nfs which the other guys
> > weren't using).
> >
>
> Hmm.
>
> This sounds familiar to me, but I cannot remember the discussion you
> mention or the patch.
>
> Or maybe it was the TCP transaction thing ? (including data in SYN or
> SYN-ACK packet)

Hmm, I cannot find where we reset restransmit_skb_hint in
tcp_mtu_probe(), if we call tcp_unlink_write_queue().

if (skb->len <= copy) {
/* We've eaten all the data from this skb.
* Throw it away. */
TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags;
<<>> tcp_unlink_write_queue(skb, sk);
sk_wmem_free_skb(sk, skb);
} else {


Sorry if this was already discussed. We might add a comment here in anycase ;)

2010-07-11 18:29:56

by Eric Dumazet

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

Le dimanche 11 juillet 2010 à 19:46 +0200, Eric Dumazet a écrit :
> Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit :
> > Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit :
> > > On Thu, 8 Jul 2010, Tejun Heo wrote:
> > >
> > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15.
> > > > Please see the attached photoshoot. This is happening on a HPC
> > > > cluster and very interestingly caused by one particular job. How long
> > > > it takes isn't clear yet (at least more than a day) but when it
> > > > happens it happens on a lot of machines in relatively short time.
> > > >
> > > > With a bit of disassemblying, I've found that the oops is happening
> > > > during tcp_for_write_queue_from() because the skb->next points to
> > > > NULL.
> > > >
> > > > void tcp_xmit_retransmit_queue(struct sock *sk)
> > > > {
> > > > ...
> > > > if (tp->retransmit_skb_hint) {
> > > > skb = tp->retransmit_skb_hint;
> > > > last_lost = TCP_SKB_CB(skb)->end_seq;
> > > > if (after(last_lost, tp->retransmit_high))
> > > > last_lost = tp->retransmit_high;
> > > > } else {
> > > > skb = tcp_write_queue_head(sk);
> > > > last_lost = tp->snd_una;
> > > > }
> > > >
> > > > => tcp_for_write_queue_from(skb, sk) {
> > > > __u8 sacked = TCP_SKB_CB(skb)->sacked;
> > > >
> > > > if (skb == tcp_send_head(sk))
> > > > break;
> > > > /* we could do better than to assign each time */
> > > > if (hole == NULL)
> > > >
> > > > This can happen for one of the following reasons,
> > > >
> > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL
> > > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write
> > > > queue for some reason.
> > > >
> > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the
> > > > write_queue. ie. somebody forgot to update hint while removing the
> > > > skb from the write queue.
> > >
> > > Once again I've read the unlinkers through, and only thing that could
> > > cause this is tcp_send_synack (others do deal with the hints) but I think
> > > Eric already proposed a patch to that but we never got anywhere due to
> > > some counterargument why it wouldn't take place (too far away for me to
> > > remember, see archives about the discussions). ...But if you want be dead
> > > sure some WARN_ON there might not hurt. Also the purging of the whole
> > > queue was a similar suspect I then came across (but that would only
> > > materialize with sk reuse happening e.g., with nfs which the other guys
> > > weren't using).
> > >
> >
> > Hmm.
> >
> > This sounds familiar to me, but I cannot remember the discussion you
> > mention or the patch.
> >
> > Or maybe it was the TCP transaction thing ? (including data in SYN or
> > SYN-ACK packet)
>
> Hmm, I cannot find where we reset restransmit_skb_hint in
> tcp_mtu_probe(), if we call tcp_unlink_write_queue().
>
> if (skb->len <= copy) {
> /* We've eaten all the data from this skb.
> * Throw it away. */
> TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags;
> <<>> tcp_unlink_write_queue(skb, sk);
> sk_wmem_free_skb(sk, skb);
> } else {
>
>
> Sorry if this was already discussed. We might add a comment here in anycase ;)
>

Just in case, here is a patch for this issue, if Tejun wants to try it.

Thanks

[PATCH] tcp: tcp_mtu_probe() and retransmit hints

When removing an skb from tcp write queue, we must take care of various
hints that could be kept on this skb. tcp_mtu_probe() misses this
cleanup.

lkml reference : http://lkml.org/lkml/2010/7/8/63

Reported-by: Tejun Heo <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
---
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b4ed957..187453f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1666,6 +1666,9 @@ static int tcp_mtu_probe(struct sock *sk)
* Throw it away. */
TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags;
tcp_unlink_write_queue(skb, sk);
+ tcp_clear_retrans_hints_partial(tp);
+ if (skb == tp->retransmit_skb_hint)
+ tp->retransmit_skb_hint = nskb;
sk_wmem_free_skb(sk, skb);
} else {
TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags &

2010-07-11 19:22:19

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

On Sun, 11 Jul 2010, Eric Dumazet wrote:

> Le dimanche 11 juillet 2010 à 19:46 +0200, Eric Dumazet a écrit :
> > Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit :
> > > Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit :
> > > > On Thu, 8 Jul 2010, Tejun Heo wrote:
> > > >
> > > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15.
> > > > > Please see the attached photoshoot. This is happening on a HPC
> > > > > cluster and very interestingly caused by one particular job. How long
> > > > > it takes isn't clear yet (at least more than a day) but when it
> > > > > happens it happens on a lot of machines in relatively short time.
> > > > >
> > > > > With a bit of disassemblying, I've found that the oops is happening
> > > > > during tcp_for_write_queue_from() because the skb->next points to
> > > > > NULL.
> > > > >
> > > > > void tcp_xmit_retransmit_queue(struct sock *sk)
> > > > > {
> > > > > ...
> > > > > if (tp->retransmit_skb_hint) {
> > > > > skb = tp->retransmit_skb_hint;
> > > > > last_lost = TCP_SKB_CB(skb)->end_seq;
> > > > > if (after(last_lost, tp->retransmit_high))
> > > > > last_lost = tp->retransmit_high;
> > > > > } else {
> > > > > skb = tcp_write_queue_head(sk);
> > > > > last_lost = tp->snd_una;
> > > > > }
> > > > >
> > > > > => tcp_for_write_queue_from(skb, sk) {
> > > > > __u8 sacked = TCP_SKB_CB(skb)->sacked;
> > > > >
> > > > > if (skb == tcp_send_head(sk))
> > > > > break;
> > > > > /* we could do better than to assign each time */
> > > > > if (hole == NULL)
> > > > >
> > > > > This can happen for one of the following reasons,
> > > > >
> > > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL
> > > > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write
> > > > > queue for some reason.
> > > > >
> > > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the
> > > > > write_queue. ie. somebody forgot to update hint while removing the
> > > > > skb from the write queue.
> > > >
> > > > Once again I've read the unlinkers through, and only thing that could
> > > > cause this is tcp_send_synack (others do deal with the hints) but I think
> > > > Eric already proposed a patch to that but we never got anywhere due to
> > > > some counterargument why it wouldn't take place (too far away for me to
> > > > remember, see archives about the discussions). ...But if you want be dead
> > > > sure some WARN_ON there might not hurt. Also the purging of the whole
> > > > queue was a similar suspect I then came across (but that would only
> > > > materialize with sk reuse happening e.g., with nfs which the other guys
> > > > weren't using).
> > > >
> > >
> > > Hmm.
> > >
> > > This sounds familiar to me, but I cannot remember the discussion you
> > > mention or the patch.
> > >
> > > Or maybe it was the TCP transaction thing ? (including data in SYN or
> > > SYN-ACK packet)

No. That's another thing. ...I've already found it with google today but
cannot seem to find it again. I thought I used tcp_make_synack eric but
for some reason I only get these transaction fix hits. I'll keep looking.

> > Hmm, I cannot find where we reset restransmit_skb_hint in
> > tcp_mtu_probe(), if we call tcp_unlink_write_queue().
> >
> > if (skb->len <= copy) {
> > /* We've eaten all the data from this skb.
> > * Throw it away. */
> > TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags;
> > <<>> tcp_unlink_write_queue(skb, sk);
> > sk_wmem_free_skb(sk, skb);
> > } else {
> >
> >
> > Sorry if this was already discussed. We might add a comment here in anycase ;)
> >
>
> Just in case, here is a patch for this issue, if Tejun wants to try it.
>
> Thanks
>
> [PATCH] tcp: tcp_mtu_probe() and retransmit hints
>
> When removing an skb from tcp write queue, we must take care of various
> hints that could be kept on this skb. tcp_mtu_probe() misses this
> cleanup.
>
> lkml reference : http://lkml.org/lkml/2010/7/8/63
>
> Reported-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> ---
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b4ed957..187453f 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1666,6 +1666,9 @@ static int tcp_mtu_probe(struct sock *sk)
> * Throw it away. */
> TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags;
> tcp_unlink_write_queue(skb, sk);
> + tcp_clear_retrans_hints_partial(tp);
> + if (skb == tp->retransmit_skb_hint)
> + tp->retransmit_skb_hint = nskb;

...I think we've not sent that skb just yet, so this'll never be true (and
the rexmit loop terminates at send_head without setting it so we should
be safe, I'll still need to check that no other code can accidently move
it to the send_head but I doubt it happens).

> sk_wmem_free_skb(sk, skb);
> } else {
> TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags &
>
>
>

--
i.

2010-07-11 19:25:09

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

On Sun, 11 Jul 2010, Ilpo Järvinen wrote:

> On Sun, 11 Jul 2010, Eric Dumazet wrote:
>
> > Le dimanche 11 juillet 2010 à 19:46 +0200, Eric Dumazet a écrit :
> > > Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit :
> > > > Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit :
> > > > > On Thu, 8 Jul 2010, Tejun Heo wrote:
> > > > >
> > > > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15.
> > > > > > Please see the attached photoshoot. This is happening on a HPC
> > > > > > cluster and very interestingly caused by one particular job. How long
> > > > > > it takes isn't clear yet (at least more than a day) but when it
> > > > > > happens it happens on a lot of machines in relatively short time.
> > > > > >
> > > > > > With a bit of disassemblying, I've found that the oops is happening
> > > > > > during tcp_for_write_queue_from() because the skb->next points to
> > > > > > NULL.
> > > > > >
> > > > > > void tcp_xmit_retransmit_queue(struct sock *sk)
> > > > > > {
> > > > > > ...
> > > > > > if (tp->retransmit_skb_hint) {
> > > > > > skb = tp->retransmit_skb_hint;
> > > > > > last_lost = TCP_SKB_CB(skb)->end_seq;
> > > > > > if (after(last_lost, tp->retransmit_high))
> > > > > > last_lost = tp->retransmit_high;
> > > > > > } else {
> > > > > > skb = tcp_write_queue_head(sk);
> > > > > > last_lost = tp->snd_una;
> > > > > > }
> > > > > >
> > > > > > => tcp_for_write_queue_from(skb, sk) {
> > > > > > __u8 sacked = TCP_SKB_CB(skb)->sacked;
> > > > > >
> > > > > > if (skb == tcp_send_head(sk))
> > > > > > break;
> > > > > > /* we could do better than to assign each time */
> > > > > > if (hole == NULL)
> > > > > >
> > > > > > This can happen for one of the following reasons,
> > > > > >
> > > > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL
> > > > > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write
> > > > > > queue for some reason.
> > > > > >
> > > > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the
> > > > > > write_queue. ie. somebody forgot to update hint while removing the
> > > > > > skb from the write queue.
> > > > >
> > > > > Once again I've read the unlinkers through, and only thing that could
> > > > > cause this is tcp_send_synack (others do deal with the hints) but I think
> > > > > Eric already proposed a patch to that but we never got anywhere due to
> > > > > some counterargument why it wouldn't take place (too far away for me to
> > > > > remember, see archives about the discussions). ...But if you want be dead
> > > > > sure some WARN_ON there might not hurt. Also the purging of the whole
> > > > > queue was a similar suspect I then came across (but that would only
> > > > > materialize with sk reuse happening e.g., with nfs which the other guys
> > > > > weren't using).
> > > > >
> > > >
> > > > Hmm.
> > > >
> > > > This sounds familiar to me, but I cannot remember the discussion you
> > > > mention or the patch.
> > > >
> > > > Or maybe it was the TCP transaction thing ? (including data in SYN or
> > > > SYN-ACK packet)
>
> No. That's another thing. ...I've already found it with google today but
> cannot seem to find it again. I thought I used tcp_make_synack eric but
> for some reason I only get these transaction fix hits. I'll keep looking.

Right, this one:

http://kerneltrap.org/mailarchive/linux-netdev/2009/10/29/6259073

--
i.

2010-07-11 19:44:40

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

On Sun, 11 Jul 2010, Ilpo Järvinen wrote:

> On Sun, 11 Jul 2010, Ilpo Järvinen wrote:
>
> > On Sun, 11 Jul 2010, Eric Dumazet wrote:
> >
> > > Le dimanche 11 juillet 2010 à 19:46 +0200, Eric Dumazet a écrit :
> > > > Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit :
> > > > > Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit :
> > > > > > On Thu, 8 Jul 2010, Tejun Heo wrote:
> > > > > >
> > > > > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15.
> > > > > > > Please see the attached photoshoot. This is happening on a HPC
> > > > > > > cluster and very interestingly caused by one particular job. How long
> > > > > > > it takes isn't clear yet (at least more than a day) but when it
> > > > > > > happens it happens on a lot of machines in relatively short time.
> > > > > > >
> > > > > > > With a bit of disassemblying, I've found that the oops is happening
> > > > > > > during tcp_for_write_queue_from() because the skb->next points to
> > > > > > > NULL.
> > > > > > >
> > > > > > > void tcp_xmit_retransmit_queue(struct sock *sk)
> > > > > > > {
> > > > > > > ...
> > > > > > > if (tp->retransmit_skb_hint) {
> > > > > > > skb = tp->retransmit_skb_hint;
> > > > > > > last_lost = TCP_SKB_CB(skb)->end_seq;
> > > > > > > if (after(last_lost, tp->retransmit_high))
> > > > > > > last_lost = tp->retransmit_high;
> > > > > > > } else {
> > > > > > > skb = tcp_write_queue_head(sk);
> > > > > > > last_lost = tp->snd_una;
> > > > > > > }
> > > > > > >
> > > > > > > => tcp_for_write_queue_from(skb, sk) {
> > > > > > > __u8 sacked = TCP_SKB_CB(skb)->sacked;
> > > > > > >
> > > > > > > if (skb == tcp_send_head(sk))
> > > > > > > break;
> > > > > > > /* we could do better than to assign each time */
> > > > > > > if (hole == NULL)
> > > > > > >
> > > > > > > This can happen for one of the following reasons,
> > > > > > >
> > > > > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL
> > > > > > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write
> > > > > > > queue for some reason.
> > > > > > >
> > > > > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the
> > > > > > > write_queue. ie. somebody forgot to update hint while removing the
> > > > > > > skb from the write queue.
> > > > > >
> > > > > > Once again I've read the unlinkers through, and only thing that could
> > > > > > cause this is tcp_send_synack (others do deal with the hints) but I think
> > > > > > Eric already proposed a patch to that but we never got anywhere due to
> > > > > > some counterargument why it wouldn't take place (too far away for me to
> > > > > > remember, see archives about the discussions). ...But if you want be dead
> > > > > > sure some WARN_ON there might not hurt. Also the purging of the whole
> > > > > > queue was a similar suspect I then came across (but that would only
> > > > > > materialize with sk reuse happening e.g., with nfs which the other guys
> > > > > > weren't using).
> > > > > >
> > > > >
> > > > > Hmm.
> > > > >
> > > > > This sounds familiar to me, but I cannot remember the discussion you
> > > > > mention or the patch.
> > > > >
> > > > > Or maybe it was the TCP transaction thing ? (including data in SYN or
> > > > > SYN-ACK packet)
> >
> > No. That's another thing. ...I've already found it with google today but
> > cannot seem to find it again. I thought I used tcp_make_synack eric but
> > for some reason I only get these transaction fix hits. I'll keep looking.
>
> Right, this one:
>
> http://kerneltrap.org/mailarchive/linux-netdev/2009/10/29/6259073

Hmm, another idea... It might be useful to try to disable
tcp_retrans_try_collapse in tcp_retransmit_skb as a test. I think that it
might be possible that tcp_xmit_retransmit_queue might end up holding
a stale reference either in hole or skb. Kind of shot into the dark still,
no actual theory on how that could happen but that
tcp_xmit_retransmit_queue logic is rather tricky because of returning to
the first hole if such exists so that I couldn't immediately rule out the
possibility.

--
i.

2010-07-15 12:05:35

by Eric Dumazet

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

Le jeudi 15 juillet 2010 à 13:58 +0200, Lennart Schulte a écrit :
> I'm testing new reordering algorithms in a virtual testbed, that is the
> nodes are emulated with xen and all the network parameters can be tuned
> with queues.
> With one of the algorithms I also got tracebacks which include
> tcp_xmit_retransmit_queue. It only happens with ACK loss. The kernel
> version however is 2.6.31.
> When I read this thread I tried the debug patch and got the following:
>
> [ 2754.413150] NULL head, pkts 0
> [ 2754.413156] Errors caught so far 1
>
> Hope that is of any help.

Not sure I understand.

Are you saying you reproduce same tcp_xmit_retransmit_queue() bug, with
a special tc qdisc/class droppping some ACK frames ?

Could it be some sched problem and incorrect return codes in case of
congestion ?

2010-07-15 12:08:17

by Lennart Schulte

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

I'm testing new reordering algorithms in a virtual testbed, that is the
nodes are emulated with xen and all the network parameters can be tuned
with queues.
With one of the algorithms I also got tracebacks which include
tcp_xmit_retransmit_queue. It only happens with ACK loss. The kernel
version however is 2.6.31.
When I read this thread I tried the debug patch and got the following:

[ 2754.413150] NULL head, pkts 0
[ 2754.413156] Errors caught so far 1

Hope that is of any help.

2010-07-15 12:55:30

by Lennart Schulte

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

Since tcp_xmit_retransmit_queue also gets skb == NULL I'm pretty sure it
is the same bug.
Up to now I only experienced the problem with ACK loss (without ACK loss
the test ran about 30min without problems, with ACK loss it had paniced
within 10min).
The data sender only has a HTB queue for traffic shaping (set to 20
Mbit/s). The ACK loss is done by another router.
The setup looks like this. This way it seems to be the most realistic.

o sender with HTB
|
|
o netem queue for forward path delay
|
o netem queue for a queue limit
|
o netem queue for backward path delay
|
o netem queue for ACK loss
|
|
o receiver with HTB

Perhaps now it is a little big clearer.


On 15.07.2010 14:05, Eric Dumazet wrote:
> Le jeudi 15 juillet 2010 à 13:58 +0200, Lennart Schulte a écrit :
>
>> I'm testing new reordering algorithms in a virtual testbed, that is the
>> nodes are emulated with xen and all the network parameters can be tuned
>> with queues.
>> With one of the algorithms I also got tracebacks which include
>> tcp_xmit_retransmit_queue. It only happens with ACK loss. The kernel
>> version however is 2.6.31.
>> When I read this thread I tried the debug patch and got the following:
>>
>> [ 2754.413150] NULL head, pkts 0
>> [ 2754.413156] Errors caught so far 1
>>
>> Hope that is of any help.
>>
> Not sure I understand.
>
> Are you saying you reproduce same tcp_xmit_retransmit_queue() bug, with
> a special tc qdisc/class droppping some ACK frames ?
>
> Could it be some sched problem and incorrect return codes in case of
> congestion ?
>

2010-07-16 12:02:51

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

On Thu, 15 Jul 2010, Lennart Schulte wrote:

> Since tcp_xmit_retransmit_queue also gets skb == NULL I'm pretty sure it is
> the same bug.
> Up to now I only experienced the problem with ACK loss (without ACK loss the
> test ran about 30min without problems, with ACK loss it had paniced within
> 10min).
> The data sender only has a HTB queue for traffic shaping (set to 20 Mbit/s).
> The ACK loss is done by another router.
> The setup looks like this. This way it seems to be the most realistic.
>
> o sender with HTB
> |
> |
> o netem queue for forward path delay
> |
> o netem queue for a queue limit
> |
> o netem queue for backward path delay
> |
> o netem queue for ACK loss
> |
> |
> o receiver with HTB
>
> Perhaps now it is a little big clearer.

> > > [ 2754.413150] NULL head, pkts 0
> > > [ 2754.413156] Errors caught so far 1

Thanks for reporting the results.

Could you post the oops too or double check do the timestamps really match
(and there wasn't more "Errors caught" prints in between)? Since this
condition doesn't seem to crash the kernel as also send_head should be
NULL, which saves the day here exiting the loop (unless send head would
too be corrupt). ...However, I don't like too much anyway that we can end
up into tcp_xmit_retransmit_queue loop with packets_out being zero and
only send_head check side-effect causes proper action.

Besides, Tejun has also found that it's hint->next ptr which is NULL in
his case so this won't solve his case anyway. Tejun, can you confirm
whether it was retransmit_skb_hint->next being NULL on _entry time_ to
tcp_xmit_retransmit_queue() or later on in the loop after the updates done
by the loop itself to the hint (or that your testing didn't conclude
either)?

--
i.

2010-07-16 12:25:46

by Lennart Schulte

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

On 16.07.2010 14:02, Ilpo J?rvinen wrote:
>
>>>> [ 2754.413150] NULL head, pkts 0
>>>> [ 2754.413156] Errors caught so far 1
>>>>
> Thanks for reporting the results.
>
> Could you post the oops too or double check do the timestamps really match
> (and there wasn't more "Errors caught" prints in between)? Since this
> condition doesn't seem to crash the kernel as also send_head should be
> NULL, which saves the day here exiting the loop (unless send head would
> too be corrupt).
>
I can try to do some more testing, perhaps then I will get other
results. But until now I've always gotten something like above.

With the debug patch the kernel doesn't crash, but I have an oops from a
run before the patch:

[ 3214.498061] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 3214.498085] IP: [<c12657dc>] tcp_xmit_retransmit_queue+0x4c/0x2b0
[ 3214.498121] *pdpt = 00000002cf6fa001
[ 3214.498130] Thread overran stack, or stack corrupted
[ 3214.498138] Oops: 0000 [#1] SMP
[ 3214.498154] last sysfs file: /sys/kernel/uevent_seqnum
[ 3214.498161] Modules linked in: tcp_ancr tcp_ncr
[ 3214.498174]
[ 3214.498180] Pid: 0, comm: swapper Not tainted (2.6.31.9-pae-um-wolff
#79)
[ 3214.498188] EIP: 0061:[<c12657dc>] EFLAGS: 00010246 CPU: 0
[ 3214.498196] EIP is at tcp_xmit_retransmit_queue+0x4c/0x2b0
[ 3214.498203] EAX: c6da2900 EBX: c6da2880 ECX: 00000000 EDX: e50c512e
[ 3214.498211] ESI: 00000000 EDI: 0000051b EBP: c6da2900 ESP: c13d5cf0
[ 3214.498219] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[ 3214.498227] Process swapper (pid: 0, ti=c13d4000 task=c13e7a20
task.ti=c13d4000)
[ 3214.498236] Stack:
[ 3214.498240] c1005a0b 00000001 00000000 e50c512e c7804300 00000013
c6da2880 0000051b
[ 3214.498264] <0> e50c512e c1260709 c6cbf840 c6d42000 c1031826 c1288bbd
c6da2900 c6e09320
[ 3214.498290] <0> c6e09300 00000000 00000000 00000001 e50c512d e521a346
e50c512e 00000000
[ 3214.498318] Call Trace:
[ 3214.498329] [<c1005a0b>] ? xen_restore_fl_direct_end+0x0/0x1
[ 3214.498339] [<c1260709>] ? tcp_ack+0x7f9/0x10d0
[ 3214.498350] [<c1031826>] ? local_bh_enable+0x56/0x80
[ 3214.498359] [<c1288bbd>] ? ipt_do_table+0x2dd/0x590
[ 3214.498369] [<c1261eef>] ? tcp_rcv_state_process+0x41f/0x970
[ 3214.498378] [<c1267e7f>] ? tcp_v4_do_rcv+0x8f/0x1e0
[ 3214.498387] [<c1269ccd>] ? tcp_v4_rcv+0x68d/0x7d0
[ 3214.498397] [<c124d5f0>] ? ip_local_deliver_finish+0x0/0x1e0
[ 3214.498406] [<c124d687>] ? ip_local_deliver_finish+0x97/0x1e0
[ 3214.498416] [<c124d5f0>] ? ip_local_deliver_finish+0x0/0x1e0
[ 3214.498425] [<c124d3eb>] ? ip_rcv_finish+0x13b/0x340
[ 3214.498434] [<c124d2b0>] ? ip_rcv_finish+0x0/0x340
[ 3214.498442] [<c124d8c0>] ? ip_rcv+0x0/0x2e0
[ 3214.498452] [<c1211037>] ? netif_receive_skb+0x2f7/0x4c0
[ 3214.498468] [<c1213fe0>] ? process_backlog+0x70/0xb0
[ 3214.498476] [<c1213d98>] ? net_rx_action+0xe8/0x1a0
[ 3214.498486] [<c103116d>] ? __do_softirq+0x8d/0x120
[ 3214.498494] [<c100360d>] ? xen_mc_flush+0xed/0x1a0
[ 3214.498504] [<c1057891>] ? move_native_irq+0x11/0x50
[ 3214.498513] [<c1031238>] ? do_softirq+0x38/0x40
[ 3214.498523] [<c11930e2>] ? xen_evtchn_do_upcall+0x142/0x160
[ 3214.498534] [<c10087d7>] ? xen_do_upcall+0x7/0xc
[ 3214.498543] [<c10013a7>] ? hypercall_page+0x3a7/0x1010
[ 3214.498552] [<c100520f>] ? xen_safe_halt+0xf/0x20
[ 3214.498560] [<c100349c>] ? xen_idle+0x1c/0x30
[ 3214.498569] [<c1006bba>] ? cpu_idle+0x3a/0x60
[ 3214.498578] [<c141692a>] ? start_kernel+0x26a/0x300
[ 3214.498616] [<c1416280>] ? unknown_bootoption+0x0/0x1c0
[ 3214.498630] [<c141938e>] ? xen_start_kernel+0x3be/0x3e0
[ 3214.498637] Code: 00 00 8b b3 a0 03 00 00 85 f6 0f 84 53 02 00 00 8b
46 3c 8d ab 80 00 00 00 8b 93 04 04 00 00 39 c2 89 54 24 0c 0f 89 1c 02
00 00 <8b> 06 0f 18 00 90 39 ee 0f 84 30 01 00 00 39 b3 28 01 00 00 8d
[ 3214.498820] EIP: [<c12657dc>] tcp_xmit_retransmit_queue+0x4c/0x2b0
SS:ESP 0069:c13d5cf0
[ 3214.498836] CR2: 0000000000000000
[ 3214.498846] ---[ end trace 709a97adf87834a7 ]---
[ 3214.498852] Kernel panic - not syncing: Fatal exception in interrupt
[ 3214.498862] Pid: 0, comm: swapper Tainted: G D
2.6.31.9-pae-um-wolff #79
[ 3214.498870] Call Trace:
[ 3214.498878] [<c102c896>] ? panic+0x46/0x100
[ 3214.498904] [<c100b428>] ? oops_end+0x98/0xa0
[ 3214.498922] [<c1019eff>] ? no_context+0x11f/0x1b0
[ 3214.498930] [<c101a516>] ? do_page_fault+0x66/0x240
[ 3214.498939] [<c101a4b0>] ? do_page_fault+0x0/0x240
[ 3214.498947] [<c101a23f>] ? bad_area_nosemaphore+0xf/0x20
[ 3214.498955] [<c1308b7e>] ? error_code+0x66/0x6c
[ 3214.498963] [<c101a4b0>] ? do_page_fault+0x0/0x240
[ 3214.498972] [<c12657dc>] ? tcp_xmit_retransmit_queue+0x4c/0x2b0
[ 3214.498982] [<c1005a0b>] ? xen_restore_fl_direct_end+0x0/0x1
[ 3214.498991] [<c1260709>] ? tcp_ack+0x7f9/0x10d0
[ 3214.498999] [<c1031826>] ? local_bh_enable+0x56/0x80
[ 3214.499009] [<c1288bbd>] ? ipt_do_table+0x2dd/0x590
[ 3214.499017] [<c1261eef>] ? tcp_rcv_state_process+0x41f/0x970
[ 3214.499025] [<c1267e7f>] ? tcp_v4_do_rcv+0x8f/0x1e0
[ 3214.499034] [<c1269ccd>] ? tcp_v4_rcv+0x68d/0x7d0
[ 3214.499044] [<c124d5f0>] ? ip_local_deliver_finish+0x0/0x1e0
[ 3214.499053] [<c124d687>] ? ip_local_deliver_finish+0x97/0x1e0
[ 3214.499063] [<c124d5f0>] ? ip_local_deliver_finish+0x0/0x1e0
[ 3214.499072] [<c124d3eb>] ? ip_rcv_finish+0x13b/0x340
[ 3214.499079] [<c124d2b0>] ? ip_rcv_finish+0x0/0x340
[ 3214.499087] [<c124d8c0>] ? ip_rcv+0x0/0x2e0
[ 3214.499101] [<c1211037>] ? netif_receive_skb+0x2f7/0x4c0
[ 3214.499115] [<c1213fe0>] ? process_backlog+0x70/0xb0
[ 3214.499123] [<c1213d98>] ? net_rx_action+0xe8/0x1a0
[ 3214.499131] [<c103116d>] ? __do_softirq+0x8d/0x120
[ 3214.499143] [<c100360d>] ? xen_mc_flush+0xed/0x1a0
[ 3214.499152] [<c1057891>] ? move_native_irq+0x11/0x50
[ 3214.499160] [<c1031238>] ? do_softirq+0x38/0x40
[ 3214.499174] [<c11930e2>] ? xen_evtchn_do_upcall+0x142/0x160
[ 3214.499188] [<c10087d7>] ? xen_do_upcall+0x7/0xc
[ 3214.499195] [<c10013a7>] ? hypercall_page+0x3a7/0x1010
[ 3214.499203] [<c100520f>] ? xen_safe_halt+0xf/0x20
[ 3214.499214] [<c100349c>] ? xen_idle+0x1c/0x30
[ 3214.499223] [<c1006bba>] ? cpu_idle+0x3a/0x60
[ 3214.499231] [<c141692a>] ? start_kernel+0x26a/0x300
[ 3214.499239] [<c1416280>] ? unknown_bootoption+0x0/0x1c0
[ 3214.499247] [<c141938e>] ? xen_start_kernel+0x3be/0x3e0

2010-07-16 13:19:51

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

On Fri, 16 Jul 2010, Lennart Schulte wrote:

> On 16.07.2010 14:02, Ilpo J?rvinen wrote:
> >
> > > > > [ 2754.413150] NULL head, pkts 0
> > > > > [ 2754.413156] Errors caught so far 1
> > > > >
> > Thanks for reporting the results.
> >
> > Could you post the oops too or double check do the timestamps really match
> > (and there wasn't more "Errors caught" prints in between)? Since this
> > condition doesn't seem to crash the kernel as also send_head should be
> > NULL, which saves the day here exiting the loop (unless send head would
> > too be corrupt).

Doh, I think we'll deref skb already to get the sacked (wouldn't be
absolutely necessary but better to not trust side-effects) so it
certainly is bad even with the send_head exit.

> I can try to do some more testing, perhaps then I will get other results. But
> until now I've always gotten something like above.

It might then be useful to remove if (!caught_it) which was to prevent
infinite printout if the problem is such that it would have persisted
forever (now w/o the crash), but since there's no evidence of that.

> With the debug patch the kernel doesn't crash, but I have an oops from a run
> before the patch:

Right, no crash of course, stupid me :-).

Lets start with this (I'm not sure if this helps Tejun's case but
much doubt it does):

--
[PATCH] tcp: fix crash in tcp_xmit_retransmit_queue

It can happen that there are no packets in queue while calling
tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
NULL and that gets deref'ed to get sacked into a local var.

There is no work to do if no packets are outstanding so we just
exit early.

There may still be another bug affecting this same function.

Signed-off-by: Ilpo J?rvinen <[email protected]>
Reported-by: Lennart Schulte <[email protected]>
---
net/ipv4/tcp_output.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b4ed957..7ed9dc1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
int mib_idx;
int fwd_rexmitting = 0;

+ if (!tp->packets_out)
+ return;
+
if (!tp->lost_out)
tp->retransmit_high = tp->snd_una;

--
1.5.6.5

2010-07-19 08:06:13

by Lennart Schulte

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

I ran tests for about 2 hours with this patch and I got no output from
the debug patch. This seems to have solved at least my problem :)

Thanks!
> [PATCH] tcp: fix crash in tcp_xmit_retransmit_queue
>
> It can happen that there are no packets in queue while calling
> tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
> NULL and that gets deref'ed to get sacked into a local var.
>
> There is no work to do if no packets are outstanding so we just
> exit early.
>
> There may still be another bug affecting this same function.
>
> Signed-off-by: Ilpo J?rvinen<[email protected]>
> Reported-by: Lennart Schulte<[email protected]>
> ---
> net/ipv4/tcp_output.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b4ed957..7ed9dc1 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
> int mib_idx;
> int fwd_rexmitting = 0;
>
> + if (!tp->packets_out)
> + return;
> +
> if (!tp->lost_out)
> tp->retransmit_high = tp->snd_una;
>
>

2010-07-19 11:16:24

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue

On Mon, 19 Jul 2010, Lennart Schulte wrote:

> I ran tests for about 2 hours with this patch and I got no output from the
> debug patch. This seems to have solved at least my problem :)
>
> Thanks!
> > [PATCH] tcp: fix crash in tcp_xmit_retransmit_queue
> >
> > It can happen that there are no packets in queue while calling
> > tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
> > NULL and that gets deref'ed to get sacked into a local var.
> >
> > There is no work to do if no packets are outstanding so we just
> > exit early.
> >
> > There may still be another bug affecting this same function.

Thanks for testing.

DaveM, I think this oops was introduced for 2.6.28 (in
08ebd1721ab8fd362e90ae17b461c07b23fa2824 it seems, to be exact) so to
stables it should go too please. I've only tweaked the message (so no need
for Lennart to retest v2 :-)).

--
[PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue

It can happen that there are no packets in queue while calling
tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
NULL and that gets deref'ed to get sacked into a local var.

There is no work to do if no packets are outstanding so we just
exit early.

This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
guard to make joining diff nicer).

Signed-off-by: Ilpo J?rvinen <[email protected]>
Reported-by: Lennart Schulte <[email protected]>
Tested-by: Lennart Schulte <[email protected]>
---
net/ipv4/tcp_output.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b4ed957..7ed9dc1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
int mib_idx;
int fwd_rexmitting = 0;

+ if (!tp->packets_out)
+ return;
+
if (!tp->lost_out)
tp->retransmit_high = tp->snd_una;

--
1.5.6.5

2010-07-19 14:09:22

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue

Le lundi 19 juillet 2010 à 14:16 +0300, Ilpo Järvinen a écrit :

> Thanks for testing.
>
> DaveM, I think this oops was introduced for 2.6.28 (in
> 08ebd1721ab8fd362e90ae17b461c07b23fa2824 it seems, to be exact) so to
> stables it should go too please. I've only tweaked the message (so no need
> for Lennart to retest v2 :-)).
>
> --
> [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue
>
> It can happen that there are no packets in queue while calling
> tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
> NULL and that gets deref'ed to get sacked into a local var.
>
> There is no work to do if no packets are outstanding so we just
> exit early.
>
> This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
> guard to make joining diff nicer).
>

But prior to commit 08ebd1721ab8fd3, we were not testing
tp->packets_out, but tp->lost_out

if it was 0, we were not doing the tcp_for_write_queue_from() loop.

Not sure it makes a difference ?

> Signed-off-by: Ilpo Järvinen <[email protected]>
> Reported-by: Lennart Schulte <[email protected]>
> Tested-by: Lennart Schulte <[email protected]>
> ---
> net/ipv4/tcp_output.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b4ed957..7ed9dc1 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
> int mib_idx;
> int fwd_rexmitting = 0;
>
> + if (!tp->packets_out)
> + return;
> +
> if (!tp->lost_out)
> tp->retransmit_high = tp->snd_una;
>

2010-07-19 14:58:00

by Tejun Heo

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

Hello,

On 07/16/2010 02:02 PM, Ilpo J?rvinen wrote:
> Besides, Tejun has also found that it's hint->next ptr which is NULL in
> his case so this won't solve his case anyway. Tejun, can you confirm
> whether it was retransmit_skb_hint->next being NULL on _entry time_ to
> tcp_xmit_retransmit_queue() or later on in the loop after the updates done
> by the loop itself to the hint (or that your testing didn't conclude
> either)?

Sorry about the delay. I was traveling last week. Unfortunately, I
don't know whether ->next was NULL on entry or not. I hacked up the
following ugly patch for the next test run. It should have everything
which has come up till now + list and hint sanity checking before
starting processing them. I'm planning on deploying it w/ crashdump
enabled in several days. If I've missed something, please let me
know.

Thanks.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b4ed957..1c8b1e0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2190,6 +2190,53 @@ static int tcp_can_forward_retransmit(struct sock *sk)
return 1;
}

+static void print_queue(struct sock *sk, struct sk_buff *old, struct sk_buff *hole)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ struct sk_buff *skb, *prev;
+ bool do_panic = false;
+
+ skb = tcp_write_queue_head(sk);
+ prev = (struct sk_buff *)(&sk->sk_write_queue);
+
+ if (skb == NULL) {
+ printk("XXX NULL head, pkts %u\n", tp->packets_out);
+ do_panic = true;
+ }
+
+ printk("XXX head %p tail %p sendhead %p oldhint %p now %p hole %p high %u\n",
+ tcp_write_queue_head(sk), tcp_write_queue_tail(sk),
+ tcp_send_head(sk), old, tp->retransmit_skb_hint, hole,
+ tp->retransmit_high);
+
+ while (skb) {
+ printk("XXX skb %p (%u-%u) next %p prev %p sacked %u\n",
+ skb, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
+ skb->next, skb->prev, TCP_SKB_CB(skb)->sacked);
+ if (prev != skb->prev) {
+ printk("XXX Inconsistent prev\n");
+ do_panic = true;
+ }
+
+ if (skb == tcp_write_queue_tail(sk)) {
+ if (skb->next != (struct sk_buff *)(&sk->sk_write_queue)) {
+ printk("XXX Improper next at tail\n");
+ do_panic = true;
+ }
+ break;
+ }
+
+ prev = skb;
+ skb = skb->next;
+ }
+ if (!skb) {
+ printk("XXX Encountered unexpected NULL\n");
+ do_panic = true;
+ }
+ if (do_panic)
+ panic("XXX panicking");
+}
+
/* This gets called after a retransmit timeout, and the initially
* retransmitted data is acknowledged. It tries to continue
* resending the rest of the retransmit queue, until either
@@ -2198,19 +2245,53 @@ static int tcp_can_forward_retransmit(struct sock *sk)
* based retransmit packet might feed us FACK information again.
* If so, we use it to avoid unnecessarily retransmissions.
*/
+static unsigned int caught_it;
+
void tcp_xmit_retransmit_queue(struct sock *sk)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
- struct sk_buff *skb;
+ struct sk_buff *skb, *prev;
struct sk_buff *hole = NULL;
+ struct sk_buff *old = tp->retransmit_skb_hint;
u32 last_lost;
int mib_idx;
int fwd_rexmitting = 0;
+ bool saw_hint = false;
+
+ if (!tp->packets_out) {
+ if (net_ratelimit())
+ printk("XXX !tp->packets_out, retransmit_skb_hint=%p, write_queue_head=%p\n",
+ tp->retransmit_skb_hint, tcp_write_queue_head(sk));
+ return;
+ }

if (!tp->lost_out)
tp->retransmit_high = tp->snd_una;

+ for (skb = tcp_write_queue_head(sk),
+ prev = (struct sk_buff *)&sk->sk_write_queue;
+ skb != (struct sk_buff *)&sk->sk_write_queue;
+ prev = skb, skb = skb->next) {
+ if (prev != skb->prev) {
+ printk("XXX sanity check: prev corrupt\n");
+ print_queue(sk, old, hole);
+ }
+ if (skb == tp->retransmit_skb_hint)
+ saw_hint = true;
+ if (skb == tcp_write_queue_tail(sk) &&
+ skb->next != (struct sk_buff *)(&sk->sk_write_queue)) {
+ printk("XXX sanity check: end corrupt\n");
+ print_queue(sk, old, hole);
+ }
+ }
+ if (tp->retransmit_skb_hint && !saw_hint) {
+ printk("XXX sanity check: retransmit_skb_hint=%p is not on list, claring hint\n",
+ tp->retransmit_skb_hint);
+ print_queue(sk, old, hole);
+ tp->retransmit_skb_hint = NULL;
+ }
+
if (tp->retransmit_skb_hint) {
skb = tp->retransmit_skb_hint;
last_lost = TCP_SKB_CB(skb)->end_seq;
@@ -2218,7 +2299,17 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
last_lost = tp->retransmit_high;
} else {
skb = tcp_write_queue_head(sk);
- last_lost = tp->snd_una;
+ if (skb)
+ last_lost = tp->snd_una;
+ }
+
+checknull:
+ if (skb == NULL) {
+ print_queue(sk, old, hole);
+ caught_it++;
+ if (net_ratelimit())
+ printk("XXX Errors caught so far %u\n", caught_it);
+ return;
}

tcp_for_write_queue_from(skb, sk) {
@@ -2261,7 +2352,7 @@ begin_fwd:
} else if (!(sacked & TCPCB_LOST)) {
if (hole == NULL && !(sacked & (TCPCB_SACKED_RETRANS|TCPCB_SACKED_ACKED)))
hole = skb;
- continue;
+ goto cont;

} else {
last_lost = TCP_SKB_CB(skb)->end_seq;
@@ -2272,7 +2363,7 @@ begin_fwd:
}

if (sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))
- continue;
+ goto cont;

if (tcp_retransmit_skb(sk, skb))
return;
@@ -2282,6 +2373,9 @@ begin_fwd:
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
inet_csk(sk)->icsk_rto,
TCP_RTO_MAX);
+cont:
+ skb = skb->next;
+ goto checknull;
}
}

--
tejun

2010-07-19 17:25:10

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue

On Mon, 19 Jul 2010, Eric Dumazet wrote:

> Le lundi 19 juillet 2010 à 14:16 +0300, Ilpo Järvinen a écrit :
>
> > Thanks for testing.
> >
> > DaveM, I think this oops was introduced for 2.6.28 (in
> > 08ebd1721ab8fd362e90ae17b461c07b23fa2824 it seems, to be exact) so to
> > stables it should go too please. I've only tweaked the message (so no need
> > for Lennart to retest v2 :-)).
> >
> > --
> > [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue
> >
> > It can happen that there are no packets in queue while calling
> > tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
> > NULL and that gets deref'ed to get sacked into a local var.
> >
> > There is no work to do if no packets are outstanding so we just
> > exit early.
> >
> > This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
> > guard to make joining diff nicer).
> >
>
> But prior to commit 08ebd1721ab8fd3, we were not testing
> tp->packets_out, but tp->lost_out

That's right, but back then we were not testing it for the same purpose.

> if it was 0, we were not doing the tcp_for_write_queue_from() loop.

This invariant _should_ be true all the time:
lost_out <= packets_out

...and if it's not we would get Leak printouts every now and then. Thus is
packets_out is zero no NULL defer with the if lost_out either. The other
loop too (in pre 08eb kernels) will work because of earlier mentioned
send_head check side-effects.

> Not sure it makes a difference ?

This difference is well thought and intentional, I didn't use different
one by accident. We want to make sure we won't use NULL from
tcp_write_queue_head() while the pre 08ebd1721ab8fd3 kernels was
interested mainly whether the first loop should run or not (and of course
ends up avoid the null deref too but it's more optimization like
thing in there, ie., if there's no lost packets no work to-do). The deref
could have been fixed by moving TCP_SKB_CB(skb)->sacked a bit later but
that would again make us depend on the side-effect of the send_head check
(in the case of packets_out being zero and wq empty) which is something I
don't like too much.

--
i.

2010-07-19 17:39:15

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue

Le lundi 19 juillet 2010 à 20:25 +0300, Ilpo Järvinen a écrit :

> This difference is well thought and intentional, I didn't use different
> one by accident. We want to make sure we won't use NULL from
> tcp_write_queue_head() while the pre 08ebd1721ab8fd3 kernels was
> interested mainly whether the first loop should run or not (and of course
> ends up avoid the null deref too but it's more optimization like
> thing in there, ie., if there's no lost packets no work to-do). The deref
> could have been fixed by moving TCP_SKB_CB(skb)->sacked a bit later but
> that would again make us depend on the side-effect of the send_head check
> (in the case of packets_out being zero and wq empty) which is something I
> don't like too much.
>

Thanks Ilpo.

Do you know in what exact circumstance the bug triggers ?

It's hard to believe thousand of machines on the Internet never hit
it :(

Maybe another problem in congestion control ?

2010-07-19 19:54:47

by David Miller

[permalink] [raw]
Subject: Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue

From: Eric Dumazet <[email protected]>
Date: Mon, 19 Jul 2010 19:39:08 +0200

> Do you know in what exact circumstance the bug triggers ?
>
> It's hard to believe thousand of machines on the Internet never hit
> it :(
>
> Maybe another problem in congestion control ?

This is something to investigate, but the conditions under which
tcp_fastretrans_alert() (the main invoker of tcp_xmit_retransmit_queue())
does it's thing are complicated enough that I'm going to add this fix
for the time being and push it out to stable too.

Thanks everyone.

2010-07-20 08:33:42

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue

On Mon, 19 Jul 2010, David Miller wrote:

> From: Eric Dumazet <[email protected]>
> Date: Mon, 19 Jul 2010 19:39:08 +0200
>
> > Do you know in what exact circumstance the bug triggers ?
> >
> > It's hard to believe thousand of machines on the Internet never hit
> > it :(
> >
> > Maybe another problem in congestion control ?
>
> This is something to investigate, but the conditions under which
> tcp_fastretrans_alert() (the main invoker of tcp_xmit_retransmit_queue())
> does it's thing are complicated enough that I'm going to add this fix
> for the time being and push it out to stable too.

This is so true. ...So far I've managed to twice rule out of the
possibility of this being really triggerable (ie., it would mean
Lennart's out of tree changes broke it), and once in the middle came
into opposite conclusion. Thus by majority voting we can deduce that it
won't happen - how reassuring :-/. It seems that tcp_try_undo_recovery
causes return if TCP remained in CA_Loss/CA_Recovery and that
tcp_time_to_recover won't really let past return either under normal
circumstances (more details below), and tcp_simple_retransmit
requires lost_out to change; seems safe in mainline to me.

Hmm... It seems that I've just solved another report too. ...Somebody a
while back found out that setting reordering sysctl to zero (ie. to a
value which does not make too much sense) crashed the kernel. It seems
that at least then tcp_time_to_recover() would return true and trigger
this bug (though I'm not sure if that's the only breakage to happen).

Also worth to keep in mind is the bugzilla entry ("New freez in
TCP" or something like that) so I'm not really sure I could say for sure
nobody never hit it. The bugzilla one goes away by disable SACK (at least
for some) but it might mix two different issues. It seems that there
really are two different issues, the other may have something to do with
SACK though there are other variables then involved, e.g., the changes in
retransmission logic/timing, so it's impossible to say if the SACK disable
really "fixed" the bugzilla one or not. Also Tejun's ->next == NULL
finding points out to a different bug than this Lennart's one.


--
i.

2010-07-20 08:41:49

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

On Mon, 19 Jul 2010, Tejun Heo wrote:

> Hello,
>
> On 07/16/2010 02:02 PM, Ilpo J?rvinen wrote:
> > Besides, Tejun has also found that it's hint->next ptr which is NULL in
> > his case so this won't solve his case anyway. Tejun, can you confirm
> > whether it was retransmit_skb_hint->next being NULL on _entry time_ to
> > tcp_xmit_retransmit_queue() or later on in the loop after the updates done
> > by the loop itself to the hint (or that your testing didn't conclude
> > either)?
>
> Sorry about the delay. I was traveling last week. Unfortunately, I
> don't know whether ->next was NULL on entry or not. I hacked up the
> following ugly patch for the next test run. It should have everything
> which has come up till now + list and hint sanity checking before
> starting processing them. I'm planning on deploying it w/ crashdump
> enabled in several days. If I've missed something, please let me
> know.

One thing that complicates things further is the fact that the write queue
can change beneath us (ie., in tcp_retrans_try_collapse and tcp_fragment).
I've read them multiple times through and always found them innocent so
this might be just for-the-record type of a note (at least I hope so).

--
i.

> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b4ed957..1c8b1e0 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2190,6 +2190,53 @@ static int tcp_can_forward_retransmit(struct sock *sk)
> return 1;
> }
>
> +static void print_queue(struct sock *sk, struct sk_buff *old, struct sk_buff *hole)
> +{
> + struct tcp_sock *tp = tcp_sk(sk);
> + struct sk_buff *skb, *prev;
> + bool do_panic = false;
> +
> + skb = tcp_write_queue_head(sk);
> + prev = (struct sk_buff *)(&sk->sk_write_queue);
> +
> + if (skb == NULL) {
> + printk("XXX NULL head, pkts %u\n", tp->packets_out);
> + do_panic = true;
> + }
> +
> + printk("XXX head %p tail %p sendhead %p oldhint %p now %p hole %p high %u\n",
> + tcp_write_queue_head(sk), tcp_write_queue_tail(sk),
> + tcp_send_head(sk), old, tp->retransmit_skb_hint, hole,
> + tp->retransmit_high);
> +
> + while (skb) {
> + printk("XXX skb %p (%u-%u) next %p prev %p sacked %u\n",
> + skb, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
> + skb->next, skb->prev, TCP_SKB_CB(skb)->sacked);
> + if (prev != skb->prev) {
> + printk("XXX Inconsistent prev\n");
> + do_panic = true;
> + }
> +
> + if (skb == tcp_write_queue_tail(sk)) {
> + if (skb->next != (struct sk_buff *)(&sk->sk_write_queue)) {
> + printk("XXX Improper next at tail\n");
> + do_panic = true;
> + }
> + break;
> + }
> +
> + prev = skb;
> + skb = skb->next;
> + }
> + if (!skb) {
> + printk("XXX Encountered unexpected NULL\n");
> + do_panic = true;
> + }
> + if (do_panic)
> + panic("XXX panicking");
> +}
> +
> /* This gets called after a retransmit timeout, and the initially
> * retransmitted data is acknowledged. It tries to continue
> * resending the rest of the retransmit queue, until either
> @@ -2198,19 +2245,53 @@ static int tcp_can_forward_retransmit(struct sock *sk)
> * based retransmit packet might feed us FACK information again.
> * If so, we use it to avoid unnecessarily retransmissions.
> */
> +static unsigned int caught_it;
> +
> void tcp_xmit_retransmit_queue(struct sock *sk)
> {
> const struct inet_connection_sock *icsk = inet_csk(sk);
> struct tcp_sock *tp = tcp_sk(sk);
> - struct sk_buff *skb;
> + struct sk_buff *skb, *prev;
> struct sk_buff *hole = NULL;
> + struct sk_buff *old = tp->retransmit_skb_hint;
> u32 last_lost;
> int mib_idx;
> int fwd_rexmitting = 0;
> + bool saw_hint = false;
> +
> + if (!tp->packets_out) {
> + if (net_ratelimit())
> + printk("XXX !tp->packets_out, retransmit_skb_hint=%p, write_queue_head=%p\n",
> + tp->retransmit_skb_hint, tcp_write_queue_head(sk));
> + return;
> + }
>
> if (!tp->lost_out)
> tp->retransmit_high = tp->snd_una;
>
> + for (skb = tcp_write_queue_head(sk),
> + prev = (struct sk_buff *)&sk->sk_write_queue;
> + skb != (struct sk_buff *)&sk->sk_write_queue;
> + prev = skb, skb = skb->next) {
> + if (prev != skb->prev) {
> + printk("XXX sanity check: prev corrupt\n");
> + print_queue(sk, old, hole);
> + }
> + if (skb == tp->retransmit_skb_hint)
> + saw_hint = true;
> + if (skb == tcp_write_queue_tail(sk) &&
> + skb->next != (struct sk_buff *)(&sk->sk_write_queue)) {
> + printk("XXX sanity check: end corrupt\n");
> + print_queue(sk, old, hole);
> + }
> + }
> + if (tp->retransmit_skb_hint && !saw_hint) {
> + printk("XXX sanity check: retransmit_skb_hint=%p is not on list, claring hint\n",
> + tp->retransmit_skb_hint);
> + print_queue(sk, old, hole);
> + tp->retransmit_skb_hint = NULL;
> + }
> +
> if (tp->retransmit_skb_hint) {
> skb = tp->retransmit_skb_hint;
> last_lost = TCP_SKB_CB(skb)->end_seq;
> @@ -2218,7 +2299,17 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
> last_lost = tp->retransmit_high;
> } else {
> skb = tcp_write_queue_head(sk);
> - last_lost = tp->snd_una;
> + if (skb)
> + last_lost = tp->snd_una;
> + }
> +
> +checknull:
> + if (skb == NULL) {
> + print_queue(sk, old, hole);
> + caught_it++;
> + if (net_ratelimit())
> + printk("XXX Errors caught so far %u\n", caught_it);
> + return;
> }
>
> tcp_for_write_queue_from(skb, sk) {
> @@ -2261,7 +2352,7 @@ begin_fwd:
> } else if (!(sacked & TCPCB_LOST)) {
> if (hole == NULL && !(sacked & (TCPCB_SACKED_RETRANS|TCPCB_SACKED_ACKED)))
> hole = skb;
> - continue;
> + goto cont;
>
> } else {
> last_lost = TCP_SKB_CB(skb)->end_seq;
> @@ -2272,7 +2363,7 @@ begin_fwd:
> }
>
> if (sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))
> - continue;
> + goto cont;
>
> if (tcp_retransmit_skb(sk, skb))
> return;
> @@ -2282,6 +2373,9 @@ begin_fwd:
> inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
> inet_csk(sk)->icsk_rto,
> TCP_RTO_MAX);
> +cont:
> + skb = skb->next;
> + goto checknull;
> }
> }
>
>