Message-ID: <4B673233.8000300@s5r6.in-berlin.de>
Date: Mon, 01 Feb 2010 20:57:39 +0100
From: Stefan Richter
To: "Justin P. Mattock"
CC: Dan Carpenter, linux1394-devel@lists.sourceforge.net, "Rafael J. Wysocki", Linux Kernel Mailing List, Kernel Testers List
Subject: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)

Justin P. Mattock wrote:
> On 02/01/10 04:54, Dan Carpenter wrote:
>> On Sun, Jan 31, 2010 at 05:39:22PM -0800, Justin P. Mattock wrote:
>>> On 01/31/10 16:43, Rafael J. Wysocki wrote:
>>>> This message has been generated automatically as a part of a report
>>>> of regressions introduced between 2.6.31 and 2.6.32.
>>>>
>>>> The following bug entry is on the current list of known regressions
>>>> introduced between 2.6.31 and 2.6.32.  Please verify if it still should
>>>> be listed and let me know (either way).
>>>>
>>>>
>>>> Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14487
>>>> Subject     : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
>>>> Submitter   : Justin P. Mattock
>>>> Date        : 2009-10-23 16:45 (101 days old)
>>>> References  : http://lkml.org/lkml/2009/10/23/252
[...]
>>> yeah still hitting this.
[...]
>> I've added the linux1394-devel people to the CC list.

Thanks.  Alas, the original author is MIA, and the bug seems to be tied to
the early platform setup code (rather than to OHCI 1394 device-specific
code), about which I for one am clueless.  The MAINTAINERS contacts listed
for init_ohci1394_dma.c are linux1394-devel and me, but a good deal of this
driver is very x86-platform-specific.  (There was some interest in making it
useful for other architectures, but this would merely mean that the
respective architecture people would need to keep an eye on their parts of
this driver.)

>> Justin has found an issue that when he boots with: ohci1394_dma=early
>> his computer
>> crashes.
>>
>> He can get it to boot by modifying drivers/ieee1394/init_ohci1394_dma.c:
[...]

This modification, and some others in the LKML thread from October, simply
cause init_ohci1394_controller() to be skipped for all devices.
init_ohci1394_controller() is simple enough:

	static inline void __init init_ohci1394_controller(int num, int slot, int func)
	{
		unsigned long ohci_base;
		struct ti_ohci ohci;

		printk(KERN_INFO "init_ohci1394_dma: initializing OHCI-1394"
				 " at %02x:%02x.%x\n", num, slot, func);

		ohci_base = read_pci_config(num, slot, func, PCI_BASE_ADDRESS_0+(0<<2))
							   & PCI_BASE_ADDRESS_MEM_MASK;

		set_fixmap_nocache(FIX_OHCI1394_BASE, ohci_base);

		ohci.registers = (void *)fix_to_virt(FIX_OHCI1394_BASE);

		init_ohci1394_reset_and_init_dma(&ohci);
	}

Justin, you already established that read_pci_config() is not the point
where it crashes, right?  set_fixmap_nocache() and fix_to_virt() frighten
me because I don't know what they do. :-)

The rest, init_ohci1394_reset_and_init_dma(), is something I can easily
follow.  It is just a bunch of register reads and writes with occasional
mdelays.
This /could/ be a cause of the crash too if the controller is inspired to do
something dangerous in there, i.e. if the OHCI 1394 controller starts to
write something into memory via DMA.  However, we do not switch on any DMA
context except for the so-called physical DMA unit, which only springs into
action if a remote FireWire-attached console instructs it to do so.

I notice one point where init_ohci1394_dma.c violates the OHCI 1394
specification: OHCI1394_HCControl_linkEnable is switched on while the
OHCI1394_ConfigROMmap register is still invalid.  This register needs to
contain the physical address of a 1 kB sized, 1 kB aligned memory region
that is accessible for DMA_TO_DEVICE.  Since this is a read-only DMA (the
controller only reads from that region), I am tempted to say that this
potential issue should not be the cause of a kernel crash.  (Side note: the
OHCI 1394 spec is freely available, see
http://ieee1394.wiki.kernel.org/index.php/Specifications#OHCI_Release_1.1.2C_January_6.2C_2000 )

Justin Mattock wrote on 2009-10-27 in http://lkml.org/lkml/2009/10/27/335:
> o.k. you should be able to view
> this:(let me know if you can't and I can
> manually write out, and in time find a public
> photo sharing suite to make things easier).
>
> http://www.flickr.com/photos/44066293@N08/4050317695
>
> When this happens I see lots of messages from the print
> during boot, then this happens.

(Now that a bugzilla.kernel.org ticket exists for this, you can also use
bugzilla.kernel.org to publish screenshots if you have an account there.)

From this screenshot it looks like ___alloc_bootmem_node is the issue here,
or am I mistaken about what the order of functions in the backtrace means?
-- 
Stefan Richter
-=====-==-=- --=- ----=
http://arcgraph.de/sr/