Date: Thu, 17 Feb 2011 17:49:46 -0500
From: Chris Metcalf
To: David Miller
Subject: Re: IGMP and rwlock: Dead ocurred again on TILEPro
Message-ID: <4D5DA60A.8080201@tilera.com>
References: <20110217044917.GA2653@cr0.nay.redhat.com> <20110217054237.GB2653@cr0.nay.redhat.com> <20110216.214625.189707123.davem@davemloft.net>
In-Reply-To: <20110216.214625.189707123.davem@davemloft.net>

On 2/17/2011 12:46 AM, David Miller wrote:
> From: Américo Wang
> Date: Thu, 17 Feb 2011 13:42:37 +0800
>
>> On Thu, Feb 17, 2011 at 01:04:14PM +0800, Cypher Wu wrote:
>>>> Have you turned CONFIG_LOCKDEP on?
>>>>
>>>> I think Eric already converted that rwlock into RCU lock, thus
>>>> this problem should disappear. Could you try a new kernel?
>>>>
>>>> Thanks.
>>>>
>>> I haven't turned CONFIG_LOCKDEP on for test since I didn't get too
>>> much information when we tried to figure out the former deadlock.
>>>
>>> IGMP used read_lock() instead of read_lock_bh() since usually
>>> read_lock() can be called recursively, and today I've read the
>>> implementation of MIPS, it should also work fine in that situation.
>>> The implementation of TILEPro causes a problem: after it uses TNS to set
>>> the lock-val to 1 while holding the original value, and before it re-sets
>>> lock-val to a new value, there is a race condition window.
>>>
>> I see no reason why you can't call read_lock_bh() recursively,
>> read_lock_bh() is roughly equivalent to local_bh_disable() + read_lock(),
>> both can be recursive.
>>
>> But I may miss something here. :-/
> IGMP is doing this so that taking the read lock does not stop packet
> processing.
>
> TILEPro's rwlock implementation is simply buggy and needs to be fixed.

Cypher, thanks for tracking this down with a good bug report.

The fix is to disable interrupts for the arch_read_lock family of methods.
In my fix I'm using the "hard" disable that locks out NMIs as well, so that
in the event the NMI handler needs to share an rwlock with regular code it
would be possible (plus, it's more efficient).

I believe it's not necessary to worry about similar protection for the
arch_write_lock methods, since they aren't guaranteed to be re-entrant
anyway (you'd have to use write_lock_irqsave or equivalent).

I'll send the patch to LKML after letting it bake internally for a little
while. Thanks again!

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com
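
For readers following the thread, the race window and the effect of disabling
interrupts can be illustrated with a small single-threaded user-space sketch
in C. This is not the TILEPro code and not the patch mentioned above. The
lock-word layout, the names (sketch_rwlock_t, tns(), sketch_read_lock(),
demo_handler_deadlocks()), the simulated interrupt flags, and the bounded spin
are all assumptions made for illustration. TNS is modeled with C11
atomic_exchange(), and writers are left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 1 means "a tns holder is mid-update",
 * any other value is 2 * (number of readers). Not TILEPro's real layout. */
#define TNS_BUSY 1u

typedef struct { atomic_uint word; } sketch_rwlock_t;

static bool irqs_enabled = true;        /* simulated interrupt-enable flag */
static bool irq_pending = false;        /* simulated interrupt waiting to run */
static bool irq_handler_stuck = false;  /* set when the handler could never get the lock */

/* Model of "tns": atomically store 1 and return the previous value. */
static unsigned tns(atomic_uint *p) { return atomic_exchange(p, TNS_BUSY); }

/* Simulated interrupt handler taking the read lock recursively. A real
 * handler would spin forever; the bound only lets the demo terminate. */
static bool irq_handler_read_lock(sketch_rwlock_t *l)
{
    for (int i = 0; i < 1000; i++) {
        unsigned old = tns(&l->word);
        if (old != TNS_BUSY) {
            atomic_store(&l->word, old + 2);  /* publish one more reader */
            return true;
        }
    }
    return false;  /* the interrupted code holds the busy marker and cannot run */
}

/* Run the pending interrupt, if any, unless interrupts are disabled. */
static void deliver_pending_irq(sketch_rwlock_t *l)
{
    if (!irqs_enabled || !irq_pending)
        return;
    irq_pending = false;
    if (irq_handler_read_lock(l))
        atomic_fetch_sub(&l->word, 2);  /* handler drops its read lock */
    else
        irq_handler_stuck = true;
}

/* Read lock; with disable_irqs set, the tns window is protected. */
static void sketch_read_lock(sketch_rwlock_t *l, bool disable_irqs)
{
    bool saved = irqs_enabled;
    unsigned old;

    if (disable_irqs)
        irqs_enabled = false;
    while ((old = tns(&l->word)) == TNS_BUSY)
        ;                                /* another CPU is mid-update */
    deliver_pending_irq(l);              /* race window: word is 1 here */
    atomic_store(&l->word, old + 2);     /* publish the new reader count */
    irqs_enabled = saved;
    deliver_pending_irq(l);              /* a held-off interrupt runs now */
}

/* Returns true if the simulated handler would have deadlocked. */
static bool demo_handler_deadlocks(bool disable_irqs)
{
    sketch_rwlock_t l;

    atomic_init(&l.word, 0);
    irqs_enabled = true;
    irq_pending = true;
    irq_handler_stuck = false;
    sketch_read_lock(&l, disable_irqs);
    assert(atomic_load(&l.word) == 2);   /* the outer reader still holds the lock */
    return irq_handler_stuck;
}
```

Calling demo_handler_deadlocks(false) models the reported problem: the
interrupt arrives while the word holds the busy marker, so the handler's
recursive read lock can never succeed. Calling demo_handler_deadlocks(true)
models the approach described in the message: the interrupt is held off until
the reader count has been published, and the recursive read lock then
succeeds.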