Date: Tue, 19 Aug 2008 03:14:45 +0200
From: Ingo Molnar
To: David Witbrodt
Cc: Yinghai Lu, linux-kernel@vger.kernel.org, "Paul E. McKenney",
    Peter Zijlstra, Thomas Gleixner, "H. Peter Anvin", netdev
Subject: Re: HPET regression in 2.6.26 versus 2.6.25 -- connection between HPET and lockups found
Message-ID: <20080819011445.GB14821@elte.hu>
References: <887650.67133.qm@web82104.mail.mud.yahoo.com>
In-Reply-To: <887650.67133.qm@web82104.mail.mud.yahoo.com>

* David Witbrodt wrote:

> The output I get when the kernel locks up looks perfectly OK, except
> maybe for the address of hpet_res (which I am not knowledgeable enough
> to judge):
>
> Data from arch/x86/kernel/acpi/boot.c:
>   hpet_res = ffff88000100f000    broken_bios: 0
>   sequence = 0    insert_resource() returned: 0
>
>
> I see some recent (Aug. 2008) discussion of alloc_bootmem() being
> broken, so maybe that is related to my problem.
>
> Does this connection between HPET and insert_resource() look
> meaningful, or is this a coincidence?

it is definitely the angle i'd suspect the most.

perhaps we stomp over some piece of memory that is "available RAM"
according to your BIOS, but in reality is used by something. With
previous kernels we got lucky and have put a data structure there which
kept your hpet still working. (a bit far-fetched i think, but the best
theory i could come up with)

the address you printed out (0xffff88000100f000), does look _somewhat_
suspicious. It corresponds to the physical address of 0x100f000. That is
_just_ above the 16MB boundary. It should not be relevant normally - but
it's still somewhat suspicious.

To test this theory, could you tweak this:

    alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

to be:

    alloc_bootmem_low(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

this will allocate the hpet resource descriptor in lower RAM.

Another idea: could you increase HPET_RESOURCE_NAME_SIZE from 9 to
something larger (via the patch below)? Maybe the bug is that this
overflows:

    snprintf((char *)hpet_res->name, HPET_RESOURCE_NAME_SIZE,
             "HPET %u", hpet_tbl->sequence);

and corrupts the memory next to the hpet resource descriptor. Depending
on random details of the kernel, this might or might not turn into some
real problem.

The way of allocating the resource and its name string together in a
bootmem allocation is a bit quirky - but should be Ok otherwise.

Hm, i see you have printed out hpet_tbl->sequence, and that gives 0,
which should be borderline OK in terms of overflow. Cannot hurt to add
this patch to your queue of test-patches :-/

Also, you could try to increase the bootmem allocation drastically, by
say 16*1024 bytes, via:

    hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE + 16*1024);
    hpet_res = (void *)hpet_res + 15*1024;

this will pad the memory at ~16MB and not use it for any resource.
Arguably a really weird hack, but i'm running out of ideas ...

	Ingo

------------------>
From 6319ee82bc363e2fd356782dacc9e01e5b33694e Mon Sep 17 00:00:00 2001
From: Ingo Molnar
Date: Tue, 19 Aug 2008 03:10:51 +0200
Subject: [PATCH] hpet: increase HPET_RESOURCE_NAME_SIZE

only had enough space for a 4 digit sprintf. If the index is wider
for any reason, we'll corrupt memory ...

Signed-off-by: Ingo Molnar
---
 arch/x86/kernel/acpi/boot.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9d3528c..f6350aa 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -700,7 +700,7 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
 	 * Allocate and initialize the HPET firmware resource for adding into
 	 * the resource tree during the lateinit timeframe.
 	 */
-#define HPET_RESOURCE_NAME_SIZE 9
+#define HPET_RESOURCE_NAME_SIZE 14
 	hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);
 
 	hpet_res->name = (void *)&hpet_res[1];
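
The arithmetic behind the two theories in this message can be checked in
userspace. What follows is only a rough sketch, not kernel code. It
assumes a direct-mapping base of 0xffff880000000000 (implied by the
virtual/physical pair quoted in the message, not taken from any header)
and standard C99 snprintf() semantics, which the kernel's snprintf()
also follows for the size bound. The helper names direct_map_to_phys(),
hpet_name_bytes() and name_write_stays_in_bounds() are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: direct-mapping base implied by the addresses in this thread. */
#define DIRECT_MAP_BASE 0xffff880000000000ULL
#define SIXTEEN_MB      (16ULL * 1024 * 1024)

/* Hypothetical helper: physical address behind a direct-mapped pointer. */
static uint64_t direct_map_to_phys(uint64_t vaddr)
{
	assert(vaddr >= DIRECT_MAP_BASE);
	return vaddr - DIRECT_MAP_BASE;
}

/* Hypothetical helper: bytes "HPET %u" needs, including the NUL. */
static int hpet_name_bytes(unsigned int seq)
{
	/* A NULL buffer with size 0 makes snprintf only report the length. */
	return snprintf(NULL, 0, "HPET %u", seq) + 1;
}

/*
 * Hypothetical helper: format into a 9-byte name followed by guard bytes
 * and report whether the guard bytes survived (1) or were overwritten (0).
 */
static int name_write_stays_in_bounds(unsigned int seq)
{
	struct {
		char name[9];	/* the old HPET_RESOURCE_NAME_SIZE */
		char guard[4];
	} buf;

	memset(&buf, 0x5a, sizeof(buf));
	snprintf(buf.name, sizeof(buf.name), "HPET %u", seq);
	return memcmp(buf.guard, "\x5a\x5a\x5a\x5a", sizeof(buf.guard)) == 0;
}
```

With these definitions, direct_map_to_phys(0xffff88000100f000ULL) gives
0x100f000, which is 0xf000 bytes (60 KiB) above the 16MB boundary.
hpet_name_bytes() gives 7 for sequence 0 and 10 for 9999, so a 9-byte
buffer holds "HPET " plus at most three digits and the NUL. Because
snprintf() is size-bounded, a longer value is truncated rather than
written past the buffer, which is what name_write_stays_in_bounds()
checks.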