Date: Thu, 7 Mar 2013 23:09:26 -0800 (PST)
From: dormando
To: Eric Dumazet
cc: Cong Wang, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: BUG: IPv4: Attempt to release TCP socket in state 1
In-Reply-To: <1362663990.15793.208.camel@edumazet-glaptop>
References: <51356AC1.4090302@gmail.com> <1362460046.15793.111.camel@edumazet-glaptop> <1362494795.15793.113.camel@edumazet-glaptop> <1362663990.15793.208.camel@edumazet-glaptop>

> On Wed, 2013-03-06 at 16:41 -0800, dormando wrote:
>
> > Ok... bridge module is loaded but nothing seems to be using it. No
> > bond/tunnels/anything enabled. I couldn't quickly figure out what was
> > causing it to load.
> >
> > We removed the need for macvlan, started machines with a fresh boot, and
> > they still crashed without it, after a few hours.
> >
> > Unfortunately I just saw a machine crash in the same way on 3.6.6 and
> > 3.6.9. I'm working on getting a completely pristine 3.6.6 and 3.6.9
> > tested. Our patches are minor but there were a few, so I'm backing it all
> > out just to be sure.
> >
> > Is there anything in particular which is most interesting? I can post lots
> > and lots and lots of information. Sadly bridge/macvlan weren't part of the
> > problem. .config, sysctls are easiest I guess? When this "hang" happens
> > the machine is still up somewhat, but we lose access to it. Syslog is
> > still writing entries to disk occasionally, so it's possible we could set
> > something up to dump more information.
> >
> > It takes a day or two to cycle this, so it might take a while to get
> > information and test crashes.
>
> Thanks !
>
> Please add a stack trace, it might help :
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 68f6a94..1d4d97e 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -141,8 +141,9 @@ void inet_sock_destruct(struct sock *sk)
>  	sk_mem_reclaim(sk);
>
>  	if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
> -		pr_err("Attempt to release TCP socket in state %d %p\n",
> -		       sk->sk_state, sk);
> +		pr_err("Attempt to release TCP socket family %d in state %d %p\n",
> +		       sk->sk_family, sk->sk_state, sk);
> +		WARN_ON_ONCE(1);
>  		return;
>  	}
>  	if (!sock_flag(sk, SOCK_DEAD)) {

Ok. I have a pristine 3.6.6 up and testing now... It definitely looks like
we've been having this crash for quite a while, but much more rarely.
Recent changes in traffic have made it worse.

I'll try your patch soon. It'll take a few days to reproduce. I'll be back
(ho ho ho). Please ping with any ideas you folks might have in the
meantime :(