Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261314AbVABUNX (ORCPT ); Sun, 2 Jan 2005 15:13:23 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261317AbVABUNX (ORCPT ); Sun, 2 Jan 2005 15:13:23 -0500 Received: from fw.osdl.org ([65.172.181.6]:59840 "EHLO mail.osdl.org") by vger.kernel.org with ESMTP id S261314AbVABUMI (ORCPT ); Sun, 2 Jan 2005 15:12:08 -0500 Date: Sun, 2 Jan 2005 12:11:47 -0800 (PST) From: Linus Torvalds To: "Georg C. F. Greve" , Hans Ulrich Niedermann cc: Kernel Mailing List , ACPI Developers , Andrew Morton , Dave Jones , Alan Cox , Herbert Poetzl , Nick Warne , Len Brown Subject: Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3326 Lines: 79 I think I chased this one down. Very nasty indeed, and the only reason I was able to find it was pure luck: I just happened to be able to reproduce the problem on a laptop (but not on my main machine, or indeed, any other machine I had access to). Even so, it took thousands of reboots to pinpoint it (I'm not kidding, I had the thing auto-reboot until it would crash or until I decided some configuration was crash-proof, and thank God for fast reboot times). If your problem is the same thing, then it's due to ACPI video enumeration overwriting an adjacent slab entry at bootup, and depending on just what happened to follow it it might be harmless or it might totally destroy the slab cache data structures. I'll hereby ask the ACPI people to verify whether this was the only such allocation problem in ACPI - it's a stupid lack of allocation for the final terminating entry, and maybe other instances of this exist (and I just found one of them). As far as I can tell, this bug would only trigger if ACPI Video reported exactly four (possibly eight, but that is getting very unlikely) video heads, since anything else would have enough padding in the allocation that it wouldn't matter that it allocated too little. Again, just pure luck that I happened to have something that fit that criteria. DaveJ: this may explain the "Mobile Radeon" reports. Not because of anything Radeon-specific, but simply because they'd have four ports (CRT, LVDS TFT panel, tv-out, and tv-in, I think - ACPI doesn't tell me enough to be sure). Of course, there might be another independent bug, so there's nothing to say that this will fix what others are seeing... Patch appended, and also pushed out to the BK trees. Linus ---- # This is a BitKeeper generated diff -Nru style patch. # # ChangeSet # 2005/01/02 11:45:13-08:00 torvalds@evo.osdl.org # acpi video device enumeration: fix incorrect device list allocation # # It didn't allocate space for the final terminating entry, # which caused it to overwrite the next slab entry, which in turn # sometimes ended up being a slab array cache pointer. End result: # total slab cache corruption at a random time afterwards. Very # nasty. # # drivers/acpi/video.c # 2005/01/02 11:45:03-08:00 torvalds@evo.osdl.org +1 -1 # acpi video device enumeration: fix incorrect device list allocation # # It didn't allocate space for the final terminating entry, # which caused it to overwrite the next slab entry, which in turn # sometimes ended up being a slab array cache pointer. End result: # total slab cache corruption at a random time afterwards. Very # nasty. # diff -Nru a/drivers/acpi/video.c b/drivers/acpi/video.c --- a/drivers/acpi/video.c 2005-01-02 12:10:23 -08:00 +++ b/drivers/acpi/video.c 2005-01-02 12:10:23 -08:00 @@ -1524,7 +1524,7 @@ dod->package.count)); active_device_list= kmalloc( - dod->package.count*sizeof(struct acpi_video_enumerated_device), + (1+dod->package.count)*sizeof(struct acpi_video_enumerated_device), GFP_KERNEL); if (!active_device_list) { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/