Message-ID: <47C32043.6090809@pobox.com>
Date: Mon, 25 Feb 2008 15:08:35 -0500
From: Jeff Garzik
To: Björn Steinbrink
CC: Thomas Gleixner, LKML
Subject: Re: 2.6.24-git: kmap_atomic() WARN_ON()
References: <20080225195924.GA23176@atjola.homenet>
In-Reply-To: <20080225195924.GA23176@atjola.homenet>

Björn Steinbrink wrote:
> On 2008.02.07 00:58:42 +0100, Thomas Gleixner wrote:
>> current mainline triggers:
>>
>> WARNING: at /home/tglx/work/kernel/x86/linux-2.6/arch/x86/mm/highmem_32.c:52 kmap_atomic_prot+0xe5/0x19b()
>> Modules linked in: ahci(+) sata_sil libata sd_mod scsi_mod raid1 ext3 jbd ehci_hcd ohci_hcd uhci_hcd
>> Pid: 0, comm: swapper Not tainted 2.6.24 #173
>>  [] warn_on_slowpath+0x41/0x51
>>  [] ? __enqueue_entity+0x9c/0xa4
>>  [] ? enqueue_entity+0x124/0x13b
>>  [] ? enqueue_task_fair+0x41/0x4c
>>  [] ? _spin_lock_irqsave+0x14/0x2e
>>  [] ? lock_timer_base+0x1f/0x3e
>>  [] kmap_atomic_prot+0xe5/0x19b
>>  [] kmap_atomic+0x14/0x16
>>  [] ata_scsi_rbuf_get+0x1e/0x2c [libata]
>>  [] atapi_qc_complete+0x23f/0x289 [libata]
>>  [] __ata_qc_complete+0x8e/0x93 [libata]
>>  [] ata_qc_complete+0x115/0x128 [libata]
>>  [] ata_qc_complete_multiple+0x86/0xa0 [libata]
>>  [] ahci_interrupt+0x370/0x40d [ahci]
>>  [] handle_IRQ_event+0x21/0x48
>>  [] handle_edge_irq+0xc9/0x10a
>>  [] ? handle_edge_irq+0x0/0x10a
>>  [] do_IRQ+0x8b/0xb7
>>  [] common_interrupt+0x23/0x28
>>  [] ? init_chipset_cmd64x+0xb/0x93
>>  [] ? mwait_idle_with_hints+0x39/0x3d
>>  [] ? mwait_idle+0x0/0xf
>>  [] mwait_idle+0xd/0xf
>>  [] cpu_idle+0xb0/0xe4
>>  [] rest_init+0x5d/0x5f
>>
>> This is not a new problem. It was pointed out some time ago already,
>> but now the WARN_ON() finally made it into mainline :)
>>
>> The fix is not obvious, as this code seems to be called from various
>> call sites.
>
> Hm, do you have lockdep enabled? If not, does lockdep make this go away?
> Because lockdep will set IRQF_DISABLED for all interrupt handlers, and
> unless that flag is set, handle_IRQ_event will reenable interrupts while
> the handler is running. And ahci_interrupt only uses a plain spin_lock,
> so interrupts keep being enabled. The patch below should help with that.
>
> Hmhm, maybe that also solves the deadlock you saw? Dunno...
>
> I can't come up with an useful commit message right now, but I'll resend
> in suitable form for submission if that thing actually works.
>
> Björn
>
>
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 1db93b6..ae3dbc8 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1739,6 +1739,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
>  	unsigned int i, handled = 0;
>  	void __iomem *mmio;
>  	u32 irq_stat, irq_ack = 0;
> +	unsigned long flags;
>
>  	VPRINTK("ENTER\n");
>
> @@ -1751,7 +1752,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
>  	if (!irq_stat)
>  		return IRQ_NONE;
>
> -	spin_lock(&host->lock);
> +	spin_lock_irqsave(&host->lock, flags);
>
>  	for (i = 0; i < host->n_ports; i++) {
>  		struct ata_port *ap;
> @@ -1778,7 +1779,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
>  		handled = 1;
>  	}
>
> -	spin_unlock(&host->lock);
> +	spin_unlock_irqrestore(&host->lock, flags);

If this truly fixes the problem, then lockdep is definitely the problem
source.

There are plenty of drivers that do the same thing that ahci does, in
terms of interrupt handler locking... and I will definitely push back on
efforts to convert otherwise-100%-safe spin_lock() into
spin_lock_irqsave() just to quiet lockdep.

Very interesting email, thanks...

	Jeff
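
For readers following the thread, the interaction Björn describes can be sketched as a small userspace model. This is not kernel code: every toy_* name, TOY_IRQF_DISABLED, and run_case() are hypothetical stand-ins invented for illustration. It also assumes, as the thread suggests but does not spell out, that the warning fires when kmap_atomic() is reached from the handler while interrupts are enabled.

```c
#include <stdbool.h>

/* Toy model of one CPU's "interrupts enabled" state. Not kernel code. */
static bool irqs_enabled = true;
static int warnings;

/* Hypothetical stand-in for the kernel's IRQF_DISABLED flag. */
#define TOY_IRQF_DISABLED 0x1u

/* Assumed model of the check behind the WARNING in the trace:
 * complain if we get here with interrupts enabled. */
void toy_kmap_atomic(void)
{
	if (irqs_enabled)
		warnings++;
}

/* Plain spin_lock()/spin_unlock(): do not touch the IRQ state. */
void toy_spin_lock(void) { }
void toy_spin_unlock(void) { }

/* spin_lock_irqsave(): remember the IRQ state, then disable IRQs. */
unsigned long toy_spin_lock_irqsave(void)
{
	unsigned long flags = irqs_enabled ? 1UL : 0UL;

	irqs_enabled = false;
	return flags;
}

/* spin_unlock_irqrestore(): put the saved IRQ state back. */
void toy_spin_unlock_irqrestore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Handler shaped like the unpatched ahci_interrupt(): plain lock. */
void handler_plain(void)
{
	toy_spin_lock();
	toy_kmap_atomic();
	toy_spin_unlock();
}

/* Handler shaped like the patched version: irqsave lock. */
void handler_irqsave(void)
{
	unsigned long flags = toy_spin_lock_irqsave();

	toy_kmap_atomic();
	toy_spin_unlock_irqrestore(flags);
}

/* Models handle_IRQ_event() as described in the mail: unless the
 * handler was registered with the "disabled" flag, interrupts are
 * re-enabled while the handler runs. */
void toy_handle_irq_event(unsigned int irq_flags, void (*handler)(void))
{
	irqs_enabled = false;		/* entered with IRQs off */
	if (!(irq_flags & TOY_IRQF_DISABLED))
		irqs_enabled = true;
	handler();
	irqs_enabled = false;		/* IRQs off again on the way out */
}

/* Runs one scenario and returns how many warnings it produced. */
int run_case(unsigned int irq_flags, void (*handler)(void))
{
	warnings = 0;
	toy_handle_irq_event(irq_flags, handler);
	return warnings;
}
```

In this model, run_case(0, handler_plain) returns 1, while run_case(TOY_IRQF_DISABLED, handler_plain) and run_case(0, handler_irqsave) both return 0. That matches both observations in the thread: forcing the flag on (what Björn says lockdep does) hides the warning, and so does the irqsave conversion in the patch.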