From: Mark Hounschell
Reply-To: markh@compro.net
Date: Wed, 23 Dec 2009 08:02:36 -0500
To: "Pallipadi, Venkatesh"
CC: dmarkh@cfl.rr.com, Linus Torvalds, Alain Knaff, Linux Kernel Mailing List, "fdutils@fdutils.linux.lu", "Li, Shaohua", Ingo Molnar
Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
Message-ID: <4B3214EC.6020308@compro.net>
In-Reply-To: <4B3162BC.9000508@cfl.rr.com>

On 12/22/2009 07:22 PM, Mark Hounschell wrote:
> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>
>>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>>>> details, but Mark is basically chasing down a situation where the floppy
>>>> driver seems to have trouble formatting floppies, and it happened
>>>> between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>>>> memory block transfers the wrong value for the first byte of the block.
>>>>
>>>> Which should be impossible, but whatever. Some part of the system has a
>>>> cached buffer that isn't flushed.
>>>>
>>>> What gets _you_ guys involved is that Mark cannot reproduce the bug if
>>>> HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>>>> pure luck while bisecting, because some time during his bisect, his
>>>> machine wouldn't even boot with HPET.
>>>>
>>>> So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>>>> 2.6.28 (and current -git) does not. Any ideas? ]
>>>>
>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>
>>>>> Ok, I may have something that might help.
>>>>>
>>>>> # git bisect bad
>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>> Author: venkatesh.pallipadi@intel.com
>>>>> Date: Fri Sep 5 18:02:18 2008 -0700
>>>>>
>>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>
>>>>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>>
>>>>>     If there are more CPUs than number of available timers, CPUs that do not
>>>>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>>
>>>>>     Signed-off-by: Venkatesh Pallipadi
>>>>>     Signed-off-by: Shaohua Li
>>>>>     Signed-off-by: Ingo Molnar
>>>>>
>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from
>>>>>
>>>>> commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>>
>>>>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>>  {
>>>>>
>>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>>                 return -1;
>>>>>
>>>>>         disable_irq(dev->irq);
>>>>>
>>>>> AND add the quiet cmdline option.
>>>>
>>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
>>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>>
>>>
>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>>> working and also when I could no longer boot with hpet enabled.
>>
>>
>> I am missing something here.
>> Commit 26afe5f2 is where system does not
>> boot with HPET or is it where the floppy stops working when you boot
>> with HPET enabled.
>>
>
> As it happens, both happen there. Commit 5ceb1a04 is where it starts
> booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
> applied it to (26afe5f2f) to be able to boot with hpet enabled. I had to
> use the quiet option to get to a login prompt, but there is where the
> floppy format first fails, just as it does in 2.6.28 and up.
>
>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>> output in each case. With that option, we should be using local APIC
>> timer and PIT, HPET or HPET with MSI should not really matter. Does it
>> still fail with .28 with that option?
>>

2.6.28 still fails with that option.

2.6.27.41 /proc/interrupts with idle=halt

            CPU0       CPU1       CPU2       CPU3
   0:        126          0          0          1  IO-APIC-edge     timer
   1:          0          0          1        157  IO-APIC-edge     i8042
   3:          0          0          0          6  IO-APIC-edge
   4:          0          0          0          6  IO-APIC-edge
   6:          0          0          0          4  IO-APIC-edge     floppy
   8:          0          0          0          1  IO-APIC-edge     rtc0
   9:          0          0          0          0  IO-APIC-fasteoi  acpi
  12:          0          0          1        128  IO-APIC-edge     i8042
  14:          0          0         34       4457  IO-APIC-edge     pata_atiixp
  15:          0          0          4        480  IO-APIC-edge     pata_atiixp
  16:          0          0          0        397  IO-APIC-fasteoi  aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
  17:          0          0          0          2  IO-APIC-fasteoi  ehci_hcd:usb1
  18:          0          0          0          0  IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
  19:          0          0          0        142  IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
  22:          0          0          4       1154  IO-APIC-fasteoi  ahci
 219:          0          0          3         63  PCI-MSI-edge     eth0
 NMI:          0          0          0          0  Non-maskable interrupts
 LOC:      91539      91964      92525      91181  Local timer interrupts
 RES:       2888       3873       2434       2721  Rescheduling interrupts
 CAL:        240        245        247         84  function call interrupts
 TLB:        768        628        526        512  TLB shootdowns
 SPU:          0          0          0          0  Spurious interrupts
 ERR:          0
 MIS:          0

2.6.28 /proc/interrupts with idle=halt

            CPU0       CPU1       CPU2       CPU3
   0:        126          0          2          0  IO-APIC-edge     timer
   1:          0          0        192          0  IO-APIC-edge     i8042
   3:          0          0          6          0  IO-APIC-edge
   4:          0          0          6          0  IO-APIC-edge
   6:          0          0          4          0  IO-APIC-edge     floppy
   8:          0          0          1          0  IO-APIC-edge     rtc0
   9:          0          0          0          0  IO-APIC-fasteoi  acpi
  12:          0          0        128          1  IO-APIC-edge     i8042
  14:          0          1     147114        396  IO-APIC-edge     pata_atiixp
  15:          0          0        646          2  IO-APIC-edge     pata_atiixp
  16:          0          0        396          0  IO-APIC-fasteoi  aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
  17:          0          0          0          0  IO-APIC-fasteoi  ehci_hcd:usb1
  18:          0          0          0          0  IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
  19:          0          0        362          1  IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
  22:          0          0        874          1  IO-APIC-fasteoi  ahci
1274:          0          0        193          4  PCI-MSI-edge     eth0
1279:     513207          0          0          0  HPET_MSI-edge    hpet2
 NMI:          0          0          0          0  Non-maskable interrupts
 LOC:        268     513395     513138     522088  Local timer interrupts
 RES:       3262       3679       2573       3746  Rescheduling interrupts
 CAL:        131        166         57        147  Function call interrupts
 TLB:        680        438        450        639  TLB shootdowns
 SPU:          0          0          0          0  Spurious interrupts
 ERR:          0
 MIS:          0

Mark