Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756894Ab1BJVmf (ORCPT ); Thu, 10 Feb 2011 16:42:35 -0500 Received: from rcsinet10.oracle.com ([148.87.113.121]:34503 "EHLO rcsinet10.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756766Ab1BJVme (ORCPT ); Thu, 10 Feb 2011 16:42:34 -0500 Message-ID: <4D545B93.9010109@oracle.com> Date: Thu, 10 Feb 2011 13:41:39 -0800 From: Randy Dunlap Organization: Oracle Linux Engineering User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091209 Fedora/3.0-3.fc11 Thunderbird/3.0 MIME-Version: 1.0 To: Linus Torvalds CC: lkml , Corey Minyard , Bjorn Helgaas Subject: Re: Linux 2.6.38-rc4 (other bugs: ipmi Oops) References: <20110209093656.2b23b80b.randy.dunlap@oracle.com> <20110210113456.93e8bc56.randy.dunlap@oracle.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Source-IP: acsmt353.oracle.com [141.146.40.153] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090205.4D545BA2.00EE:SCFMA4539814,ss=1,fgs=0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2630 Lines: 63 On 02/10/11 12:03, Linus Torvalds wrote: > On Thu, Feb 10, 2011 at 11:34 AM, Randy Dunlap wrote: >> >> Loading ipmi_si module a second time causes an Oops: >> >> [ 68.120143] RIP: 0010:[] [] put_driver+0x10/0x22 > > The disassembly is > > 55 push %rbp > 48 89 e5 mov %rsp,%rbp > 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > 48 ff 05 c7 af 80 01 incq 0x180afc7(%rip) # 0x180aff2 > * 48 8b 7f 60 mov 0x60(%rdi),%rdi <-- trapping instruction > e8 38 27 ec ff callq 0xffffffffffec276c > 48 ff 05 bf af 80 01 incq 0x180afbf(%rip) # 0x180affa > c9 leaveq > c3 retq > > which is the access of "drv->p" in that function: > > kobject_put(&drv->p->kobj); > > so "drv" that was passed in was just bogus. (it's > "0xffffffffa06a8430", looks like it's the DEBUG_PAGEALLOC that has > caused the page to be free'd). > >> [ 68.340115] Call Trace: >> [ 68.340115] [] driver_register+0xc0/0x1b2 >> [ 68.340115] [] pnp_register_driver+0x28/0x31 >> [ 68.340115] [] init_ipmi_si+0x1a4/0x4cd [ipmi_si] >> [ 68.340115] [] do_one_initcall+0x6c/0x1e3 >> [ 68.340115] [] sys_init_module+0x12b/0x307 > > And I think that - as usual - the problem is that the damn driver > cleanup is very ugly, and has this duplicate set of code to unregister > all the random crap. Except one of the duplicates is missing one case. > I think the bug was introduced by Gjorn Helgaas in commit 9e368fa011d4 > ("ipmi: add PNP discovery (ACPI namespace via PNPACPI)") which added > the acpi pnp case, but only unregistered it on the regular module exit > path, not on the "module loaded with no pnp devices" path. > > Does this patch fix it? And Corey - this is a good example of why the > code shouldn't duplicate the "unregister stuff" in the module load > error case vs the module exit path, and there should be a shared > "cleanup()" function that is called by both. Can this be cleaned up, > please? > > PATCH IS UNTESTED! That works. Acked-and-tested-by: Randy Dunlap thanks, -- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/