Date: Tue, 2 Feb 2010 20:56:53 +0100
From: Torsten Kaiser
To: Suresh Siddha
Cc: "Eric W. Biederman", Tejun Heo, "linux-kernel@vger.kernel.org", Robert Hancock, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Yinghai Lu
Subject: Re: do_IRQ: 0.165 No irq handler for vector (irq -1)
Message-ID: <64bb37e1002021156s6e8e3ba7p6192e15bc431eb87@mail.gmail.com>
In-Reply-To: <1265136022.2793.33.camel@sbs-t61.sc.intel.com>
References: <64bb37e1001310502p3d74bdf5ve56f63d3e8d2fd39@mail.gmail.com> <4B679042.2010008@kernel.org> <1265136022.2793.33.camel@sbs-t61.sc.intel.com>

On Tue, Feb 2, 2010 at 7:40 PM, Suresh Siddha wrote:
> On Mon, 2010-02-01 at 20:53 -0800, Eric W. Biederman wrote:
>> > It might be that the silicon implements MSI incorrectly and ends up
>> > sending out invalid MSI packets under certain circumstances.
>> > The silicon hasn't changed for quite some time now and back when it came
>> > out MSI wasn't too popular and I don't think SIMG's proprietary
>> > drivers use it, so it's quite possible that the feature simply is
>> > broken.  Is there any specific reason why you want to enable MSI
>> > support?  It's not like MSI brings any actual benefit when the
>> > compatibility hardware is already there.

19: 34618 3 2 4862 IO-APIC-fasteoi sata_sil24, bttv0, Bt87x audio

[ 6.038918] IRQ 19/bttv0: IRQF_DISABLED is not guaranteed on shared IRQs

The interrupt that sata_sil24 is currently using is shared, so I thought
that switching it to MSI might be a good idea. And I wanted to test a
new feature. ;-)

>> It also seems possible that some of the recent irq handling changes
>> missed something.
>
> No Eric. This particular report is with 2.6.33-rc kernels and also only
> when MSI support for sata_sil24 is enabled. Recent irq handling changes
> are all in -tip tree and getting tested. So this sounds like a different
> problem specific to this HW's MSI capabilities.

Just to repeat this so that this information doesn't get lost: MSI seems
to work on this system. The drivers radeon (X300), HDA intel (onboard
sound from the MCP55 chipset) and tg3 (two BCM5754) all work without
any problems.

>> Usually the message "No irq handler for vector (irq -1)" means that the irq
>> was delivered to a cpu that was not ready for it.  I see that vector 165
>> is being delivered on all of the different cpus with vector 165,
>> and that you are getting interrupts delivered most of the time.
>
> Also I see this in the original report:
>
> On Sun, 2010-01-31 at 05:02 -0800, Torsten Kaiser wrote:
>> What is really strange: The vector 165 is stable. It never changed
>> even if I deactivate all other drivers in the kernel config (that
>> changes the MSI IRQ for sata_sil24 from 29 to 28!) or if I switch off
>> CONFIG_SPARSE_IRQ.
>> In the kernel with the reduced number of drivers
>> the maximum vector that gets used in __assign_irq_vector is only 137.
>
> It looks like the HW under certain conditions is generating interrupts
> with wrong vector (165), especially when the __assign_irq_vector() never
> allocated the vector 165 (and hence we never setup the vector to irq
> mapping for this vector on any cpu). Torsten, can you please apply the
> appended patch and boot with "apic_phys" boot parameter and see if it
> makes any difference?

I tried the patch and the message from do_IRQ is gone, but reading the
file still fails with the same errors from libata. (Earlier tests that
wrote a large file to this disk also failed with timeouts, but never
triggered the do_IRQ error.)

I added a diff between the dmesg from the test run with your patch and
the one from the previous run at the end of this mail.

>> This smells like the initialization problems I was seeing in another
>> thread.  Suresh?
>
> No. Initialization problems in another thread happens in a small window
> during cpu online (in logical flat mode, we are setting up vector to irq
> mappings for the AP a little late after we have enabled interrupts).
> Here the problem is not actually triggered during cpu on-lining.

FWIW:
# CONFIG_HOTPLUG_CPU is not set

I don't use suspend/resume on that system, so I never enabled CPU
hotplug in the .config.

Thanks for looking at this.
Torsten


The changes in dmesg from your patch:

1,2c1,2
< x Linux version 2.6.33-rc6 (root@treogen) (gcc version 4.4.2 (Gentoo 4.4.2 p1.0) ) #1 SMP Sat Jan 30 10:38:39 CET 2010
< x Command line: root=/dev/sdc1 console=ttyS0,115200 console=tty1 sata_sil24.msi=1 radeon.modeset=1 raid=noautodetect apic=debug
---
> x Linux version 2.6.33-rc6 (root@treogen) (gcc version 4.4.2 (Gentoo 4.4.2 p1.0) ) #2 SMP Tue Feb 2 20:22:21 CET 2010
> x Command line: root=/dev/sdc1 console=ttyS0,115200 console=tty1 sata_sil24.msi=1 radeon.modeset=1 raid=noautodetect apic=debug apic_phys
61a62
> x Setting APIC routing to physical flat.
130a132
> x Setting APIC routing to physical flat.
159c161
< x Kernel command line: root=/dev/sdc1 console=ttyS0,115200 console=tty1 sata_sil24.msi=1 radeon.modeset=1 raid=noautodetect apic=debug
---
> x Kernel command line: root=/dev/sdc1 console=ttyS0,115200 console=tty1 sata_sil24.msi=1 radeon.modeset=1 raid=noautodetect apic=debug apic_phys
163,164c165,166
< x Node 0: aperture @ a7f2000000 size 32 MB
< x Aperture beyond 4GB. Ignoring.
---
> x Node 0: aperture @ 20000000 size 32 MB
> x Aperture pointing to e820 RAM. Ignoring.
202c204
< x Setting APIC routing to flat
---
> x Setting APIC routing to physical flat
234,235c236,237
< x ... lapic delta = 1249998
< x ... PM-Timer delta = 357954
---
> x ... lapic delta = 1249989
> x ... PM-Timer delta = 357951
237,241c239,243
< x ..... delta 1249998
< x ..... mult: 53687005
< x ..... calibration result: 1999996
< x ..... CPU clock speed is 2599.9959 MHz.
< x ..... host bus clock speed is 199.9996 MHz.
---
> x ..... delta 1249989
> x ..... mult: 53686618
> x ..... calibration result: 1999982
> x ..... CPU clock speed is 2599.9751 MHz.
> x ..... host bus clock speed is 199.9982 MHz.
248c250
< x Total of 4 processors activated (20800.14 BogoMIPS).
---
> x Total of 4 processors activated (20799.96 BogoMIPS).
430,431c432,433
< x ... APIC ICR: 000008fd
< x ... APIC ICR2: 08000000
---
> x ... APIC ICR: 000000fd
> x ... APIC ICR2: 03000000
437,438c439,440
< x ... APIC TMICT: 0001e847
< x ... APIC TMCCT: 000174b3
---
> x ... APIC TMICT: 0001e846
> x ... APIC TMCCT: 000185ee
462,476c464,478
< x 01 00F 0 0 0 0 0 1 1 31
< x 02 00F 0 0 0 0 0 1 1 30
< x 03 00F 0 0 0 0 0 1 1 33
< x 04 00F 0 0 0 0 0 1 1 34
< x 05 00F 1 0 0 0 0 1 1 35
< x 06 00F 0 0 0 0 0 1 1 36
< x 07 00F 0 0 0 0 0 1 1 37
< x 08 00F 0 0 0 0 0 1 1 38
< x 09 00F 0 1 0 0 0 1 1 39
< x 0a 00F 1 0 0 0 0 1 1 3A
< x 0b 00F 1 0 0 0 0 1 1 3B
< x 0c 00F 0 0 0 0 0 1 1 3C
< x 0d 00F 0 0 0 0 0 1 1 3D
< x 0e 00F 0 0 0 0 0 1 1 3E
< x 0f 00F 0 0 0 0 0 1 1 3F
---
> x 01 000 0 0 0 0 0 0 0 31
> x 02 000 0 0 0 0 0 0 0 30
> x 03 000 0 0 0 0 0 0 0 33
> x 04 000 0 0 0 0 0 0 0 34
> x 05 000 1 0 0 0 0 0 0 35
> x 06 000 0 0 0 0 0 0 0 36
> x 07 000 0 0 0 0 0 0 0 37
> x 08 000 0 0 0 0 0 0 0 38
> x 09 000 0 1 0 0 0 0 0 39
> x 0a 000 1 0 0 0 0 0 0 3A
> x 0b 000 1 0 0 0 0 0 0 3B
> x 0c 000 0 0 0 0 0 0 0 3C
> x 0d 000 0 0 0 0 0 0 0 3D
> x 0e 000 0 0 0 0 0 0 0 3E
> x 0f 000 0 0 0 0 0 0 0 3F
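The last hunk of the dmesg diff is the IO-APIC redirection table dump. Below is a minimal Python sketch for reading those rows. It rests on two assumptions:

* The rows use the column order NR, Dst, Mask, Trig, IRR, Pol, Stat, Dmod, Deli, Vect, with NR, Dst and Vect in hex. The header line is not part of the excerpt.
* The bit layout of a redirection entry is as documented for the Intel 82093AA IOAPIC.

`RedirEntry` and `parse_dump_row` are hypothetical helpers, not kernel code. Under those assumptions, the hunk shows that with apic_phys the legacy pins move from a logical destination mask of 0x0F with lowest-priority delivery to physical APIC ID 0 with fixed delivery. The vectors (0x30 to 0x3F) stay the same. In this hex notation, vector 165 from the subject line would appear as A5.

```python
from dataclasses import dataclass

# Delivery-mode encodings for bits 8-10 of a redirection entry.
DELIVERY_MODES = {0: "fixed", 1: "lowest-priority", 2: "SMI", 4: "NMI",
                  5: "INIT", 7: "ExtINT"}


@dataclass(frozen=True)
class RedirEntry:
    """Hypothetical model of a subset of a 64-bit IO-APIC redirection entry.

    IRR, polarity and delivery status are omitted for brevity.
    """
    dest: int       # bits 56-63: APIC ID (physical) or logical destination mask
    mask: int       # bit 16: 1 = pin masked
    trigger: int    # bit 15: 0 = edge, 1 = level
    dest_mode: int  # bit 11: 0 = physical, 1 = logical
    delivery: int   # bits 8-10: delivery mode (see DELIVERY_MODES)
    vector: int     # bits 0-7: IDT vector raised on the target CPU

    def encode(self) -> int:
        """Pack the modelled fields into a raw 64-bit value."""
        return ((self.dest & 0xFF) << 56 | (self.mask & 1) << 16
                | (self.trigger & 1) << 15 | (self.dest_mode & 1) << 11
                | (self.delivery & 0x7) << 8 | (self.vector & 0xFF))

    @classmethod
    def decode(cls, raw: int) -> "RedirEntry":
        """Unpack a raw 64-bit value into the modelled fields."""
        return cls(dest=(raw >> 56) & 0xFF, mask=(raw >> 16) & 1,
                   trigger=(raw >> 15) & 1, dest_mode=(raw >> 11) & 1,
                   delivery=(raw >> 8) & 0x7, vector=raw & 0xFF)

    def describe(self) -> str:
        """Human-readable summary of where and how the pin is delivered."""
        if self.dest_mode:
            bits = [b for b in range(8) if (self.dest >> b) & 1]
            target = "logical destination bits %s" % bits
        else:
            target = "physical APIC ID %d" % self.dest
        return "vector 0x%02X -> %s, %s delivery, %s-triggered%s" % (
            self.vector, target,
            DELIVERY_MODES.get(self.delivery, "reserved"),
            "level" if self.trigger else "edge",
            " (masked)" if self.mask else "")


def parse_dump_row(line):
    """Parse one dump row (assumed columns:
    NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect) into (pin, RedirEntry)."""
    nr, dst, mask, trig, _irr, _pol, _stat, dmod, deli, vect = line.split()
    entry = RedirEntry(dest=int(dst, 16), mask=int(mask), trigger=int(trig),
                       dest_mode=int(dmod), delivery=int(deli),
                       vector=int(vect, 16))
    return int(nr, 16), entry


if __name__ == "__main__":
    # Pin 01 before (logical flat) and after (apic_phys), copied from the diff.
    for row in ("01 00F 0 0 0 0 0 1 1 31", "01 000 0 0 0 0 0 0 0 31"):
        pin, entry = parse_dump_row(row)
        print("pin %02x: %s (raw 0x%016X)" % (pin, entry.describe(), entry.encode()))
```

Only the fields that differ between the two runs are modelled. This keeps the comparison focused on destination, destination mode and delivery mode.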