Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756092AbZLWPGB (ORCPT ); Wed, 23 Dec 2009 10:06:01 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1753152AbZLWPGA (ORCPT ); Wed, 23 Dec 2009 10:06:00 -0500
Received: from mga11.intel.com ([192.55.52.93]:18304 "EHLO mga11.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752959AbZLWPF7 convert rfc822-to-8bit (ORCPT );
        Wed, 23 Dec 2009 10:05:59 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.47,442,1257148800";
        d="scan'208";a="758711350"
From: "Pallipadi, Venkatesh"
To: "markh@compro.net"
CC: "dmarkh@cfl.rr.com", Linus Torvalds, Alain Knaff,
        Linux Kernel Mailing List, "fdutils@fdutils.linux.lu",
        "Li, Shaohua", Ingo Molnar
Date: Wed, 23 Dec 2009 07:10:49 -0800
Subject: RE: [Fdutils] DMA cache consistency bug introduced in 2.6.28
        (Was: Re: Cannot format floppies under kernel 2.6.*?)
Thread-Topic: [Fdutils] DMA cache consistency bug introduced in 2.6.28
        (Was: Re: Cannot format floppies under kernel 2.6.*?)
Thread-Index: AcqD0DWWiUEAFq19SBu75w4It1HczAAETmAw
Message-ID: <6598A4E21F1DB24D80BF72956484F59802EFD1C6@orsmsx001.amr.corp.intel.com>
References: <4AFB3962.2020106@ntlworld.com> <4B2A4EC9.2030902@compro.net>
        <4B2A4FA5.5000701@knaff.lu> <4B2A5192.6090602@compro.net>
        <4B2A530D.3080606@knaff.lu> <4B2A6394.3080705@knaff.lu>
        <4B2A98BB.5080406@knaff.lu> <4B2AAC87.5000703@knaff.lu>
        <4B2ABDC8.6090104@knaff.lu> <4B2B4485.6000305@cfl.rr.com>
        <4B2B5F86.1090403@cfl.rr.com> <4B2B9F9F.7040802@compro.net>
        <4B2BE05B.9050006@compro.net> <4B30E1B4.7000702@compro.net>
        <4B310879.9050701@compro.net>
        <1261525076.16916.4.camel@localhost.localdomain>
        <4B3162BC.9000508@cfl.rr.com> <4B3214EC.6020308@compro.net>
In-Reply-To: <4B3214EC.6020308@compro.net>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 9379
Lines: 244

>-----Original Message-----
>From: Mark Hounschell [mailto:markh@compro.net]
>Sent: Wednesday, December 23, 2009 5:03 AM
>To: Pallipadi, Venkatesh
>Cc: dmarkh@cfl.rr.com; Linus Torvalds; Alain Knaff; Linux Kernel Mailing List;
>    fdutils@fdutils.linux.lu; Li, Shaohua; Ingo Molnar
>Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
>    (Was: Re: Cannot format floppies under kernel 2.6.*?)
>
>On 12/22/2009 07:22 PM, Mark Hounschell wrote:
>> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>>
>>>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>>>>> details, but Mark is basically chasing down a situation where the floppy
>>>>> driver seems to have trouble formatting floppies, and it happened
>>>>> between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>>>>> memory block transfers the wrong value for the first byte of the block.
>>>>>
>>>>> Which should be impossible, but whatever. Some part of the system has a
>>>>> cached buffer that isn't flushed.
>>>>>
>>>>> What gets _you_ guys involved is that Mark cannot reproduce the bug if
>>>>> HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>>>>> pure luck while bisecting, because some time during his bisect, his
>>>>> machine wouldn't even boot with HPET.
>>>>>
>>>>> So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>>>>> 2.6.28 (and current -git) does not. Any ideas? ]
>>>>>
>>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>>
>>>>>> Ok, I may have something that might help.
>>>>>>
>>>>>> # git bisect bad
>>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>>> Author: venkatesh.pallipadi@intel.com
>>>>>> Date: Fri Sep 5 18:02:18 2008 -0700
>>>>>>
>>>>>> x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>>
>>>>>> Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>>> timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>>> setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>>> timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>>> is non-functional LAPIC timer across CPU deep C-states.
>>>>>>
>>>>>> If there are more CPUs than number of available timers, CPUs that do not
>>>>>> find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>>>
>>>>>> Signed-off-by: Venkatesh Pallipadi
>>>>>> Signed-off-by: Shaohua Li
>>>>>> Signed-off-by: Ingo Molnar
>>>>>>
>>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>>> enabled. To get this one to boot (single user mode only) I had to add
>>>>>> the quiet cmdline option and following patch from to arch/x86/kernel/hpet.c
>>>>>>
>>>>>> commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>>>
>>>>>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>>>  {
>>>>>>
>>>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>>>                 return -1;
>>>>>>
>>>>>>         disable_irq(dev->irq);
>>>>>>
>>>>>> AND add the quiet cmdline option.
>>>>>
>>>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
>>>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>>>
>>>>
>>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>>>> working and also when I could no longer boot with hpet enabled.
>>>
>>>
>>> I am missing something here. Commit 26afe5f2 is where system does not
>>> boot with HPET or is it where the floppy stops working when you boot
>>> with HPET enabled.
>>>
>>
>> As it happens, both happen there. Commit 5ceb1a04 is where it starts
>> booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
>> applied it to (26afe5f2f) to be able to boot with hpet enabled. I had to
>> use the quiet option to get to a login prompt, but there is where the
>> floppy format first fails, just as it does in 2.6.28 and up.
>>
>>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>>> output in each case. With that option, we should be using local APIC
>>> timer and PIT, HPET or HPET with MSI should not really matter. Does it
>>> still fail with .28 with that option?
>>>
>
>2.6.28 still fails with that option.
>
>2.6.27.41 /proc/interrupts with idle=halt
>
>            CPU0       CPU1       CPU2       CPU3
>   0:        126          0          0          1   IO-APIC-edge     timer
>   1:          0          0          1        157   IO-APIC-edge     i8042
>   3:          0          0          0          6   IO-APIC-edge
>   4:          0          0          0          6   IO-APIC-edge
>   6:          0          0          0          4   IO-APIC-edge     floppy
>   8:          0          0          0          1   IO-APIC-edge     rtc0
>   9:          0          0          0          0   IO-APIC-fasteoi  acpi
>  12:          0          0          1        128   IO-APIC-edge     i8042
>  14:          0          0         34       4457   IO-APIC-edge     pata_atiixp
>  15:          0          0          4        480   IO-APIC-edge     pata_atiixp
>  16:          0          0          0        397   IO-APIC-fasteoi  aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
>  17:          0          0          0          2   IO-APIC-fasteoi  ehci_hcd:usb1
>  18:          0          0          0          0   IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>  19:          0          0          0        142   IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
>  22:          0          0          4       1154   IO-APIC-fasteoi  ahci
> 219:          0          0          3         63   PCI-MSI-edge     eth0
> NMI:          0          0          0          0   Non-maskable interrupts
> LOC:      91539      91964      92525      91181   Local timer interrupts
> RES:       2888       3873       2434       2721   Rescheduling interrupts
> CAL:        240        245        247         84   function call interrupts
> TLB:        768        628        526        512   TLB shootdowns
> SPU:          0          0          0          0   Spurious interrupts
> ERR:          0
> MIS:          0
>
>2.6.28 /proc/interrupts with idle=halt
>
>            CPU0       CPU1       CPU2       CPU3
>   0:        126          0          2          0   IO-APIC-edge     timer
>   1:          0          0        192          0   IO-APIC-edge     i8042
>   3:          0          0          6          0   IO-APIC-edge
>   4:          0          0          6          0   IO-APIC-edge
>   6:          0          0          4          0   IO-APIC-edge     floppy
>   8:          0          0          1          0   IO-APIC-edge     rtc0
>   9:          0          0          0          0   IO-APIC-fasteoi  acpi
>  12:          0          0        128          1   IO-APIC-edge     i8042
>  14:          0          1     147114        396   IO-APIC-edge     pata_atiixp
>  15:          0          0        646          2   IO-APIC-edge     pata_atiixp
>  16:          0          0        396          0   IO-APIC-fasteoi  aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
>  17:          0          0          0          0   IO-APIC-fasteoi  ehci_hcd:usb1
>  18:          0          0          0          0   IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>  19:          0          0        362          1   IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>  22:          0          0        874          1   IO-APIC-fasteoi  ahci
>1274:          0          0        193          4   PCI-MSI-edge     eth0
>1279:     513207          0          0          0   HPET_MSI-edge    hpet2
> NMI:          0          0          0          0   Non-maskable interrupts
> LOC:        268     513395     513138     522088   Local timer interrupts
> RES:       3262       3679       2573       3746   Rescheduling interrupts
> CAL:        131        166         57        147   Function call interrupts
> TLB:        680        438        450        639   TLB shootdowns
> SPU:          0          0          0          0   Spurious interrupts
> ERR:          0
> MIS:          0
>

Hmm. Looks like hpet2 is still getting used instead of local APIC timer
in .28 case. I was expecting some low number in hpet2 and local timer on
all CPU to be around the same value. Above shows CPU 0 is depending on
hpet2 for some reason even with idle=halt.

Can you send the output of below two in case of .28

/proc/timer_list
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

Thanks,
Venki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
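
The check described in the reply above (a large hpet2 count on one CPU alongside a much lower local timer count on that same CPU) can be sketched as a small Python script. This is only an illustration and does not come from the thread: the function names and the 10% threshold are assumptions, and the parser expects /proc/interrupts-style text in which each IRQ row sits on a single, unwrapped line. The sample rows are the LOC and hpet2 lines from the 2.6.28 output.

```python
# Illustrative sketch only: helper names and the 0.1 ratio are assumptions,
# not anything taken from the kernel or from the thread.

SAMPLE_2_6_28 = """\
           CPU0       CPU1       CPU2       CPU3
1279:     513207          0          0          0   HPET_MSI-edge    hpet2
 LOC:        268     513395     513138     522088   Local timer interrupts
"""


def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (counts, description)}."""
    # Drop mail quote markers and blank lines before parsing.
    lines = [ln.lstrip(">").rstrip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # rows such as ERR/MIS carry fewer counts
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (counts, description)
    return table


def cpus_not_on_local_timer(table, ratio=0.1):
    """Return CPU indices whose LOC count is below ratio * the highest LOC count."""
    loc, _ = table["LOC"]
    top = max(loc)
    return [cpu for cpu, n in enumerate(loc) if n < ratio * top]


def hpet_msi_counts(table):
    """Return {timer name: per-CPU counts} for rows of type HPET_MSI-*."""
    return {
        desc.split()[-1]: counts
        for counts, desc in table.values()
        if desc.startswith("HPET_MSI")
    }


if __name__ == "__main__":
    parsed = parse_interrupts(SAMPLE_2_6_28)
    print("CPUs with low LOC counts:", cpus_not_on_local_timer(parsed))
    print("HPET MSI timers:", hpet_msi_counts(parsed))
```

To run it against a live system, pass the contents of /proc/interrupts to parse_interrupts() instead of the sample string.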