Date: Wed, 7 Nov 2012 10:54:34 -0500
From: Dave Jones
To: Julius Werner
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, Patrick McHardy, Hideaki YOSHIFUJI, James Morris, Alexey Kuznetsov, "David S. Miller", Sameer Nanda, Mandeep Singh Baines, Eric Dumazet
Subject: Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash
Message-ID: <20121107155434.GA17677@redhat.com>
References: <1352247335-10396-1-git-send-email-jwerner@chromium.org> <20121107013907.GA31185@redhat.com>

On Tue, Nov 06, 2012 at 05:51:19PM -0800, Julius Werner wrote:
 > > We've had reports of this WARN against the Fedora kernel for a while.
 > > Had this been immediately followed by a BUG(), we'd have never seen those traces at all,
 > > and just got "my machine just locked up" reports instead.
 > >
 > > The proper fix here is to find out why we're getting into this state.
 >
 > Are you sure you don't mean the WARN below that ("recvmsg bug 2")
 > instead? I don't think this one can happen without eventually running
 > into the syslog overflow issue I described.
bug2 is more common (And usually is accompanied by mangled traces), but we have
reports of the first WARN too..

https://bugzilla.redhat.com/show_bug.cgi?id=841769
https://bugzilla.redhat.com/show_bug.cgi?id=845853
https://bugzilla.redhat.com/show_bug.cgi?id=846991
https://bugzilla.redhat.com/show_bug.cgi?id=860039

(I note that none of these reports mention "also, my hard disk is now full")

 > I agree that the underlying cause must be fixed too, but as we will
 > always have bugs in the kernel I think proper handling when it does
 > happen is also important (and filling the hard disk with junk is
 > obviously not the best approach). If you think a full panic is too
 > extreme, I have an alternative version of this patch that logs the
 > WARN once, closes the socket, and returns EBADFD from the syscall...
 > would you think that is more appropriate?

It sounds more appropriate to me, instead of silently wedging the box.
At least with that approach we have a chance of finding out what happened.

	Dave