2008-08-19 00:34:51

by David Witbrodt

[permalink] [raw]
Subject: Re: HPET regression in 2.6.26 versus 2.6.25 -- connection between HPET and lockups found


As part of my experiments to determine the root cause of my lockups,
I was searching through the kernel sources trying to discover any
connection between the changes in the commits introducing the lockups
(3def3d6d... and 1e934dda...) and the fact that "hpet=disable"
alleviates the lockups.

I finally discovered something that looks promising!


Both of those commits introduce changes involving insert_resource(),
and I found the function hpet_insert_resource() in
arch/x86/kernel/acpi/boot.c that also uses insert_resource():

static __init int hpet_insert_resource(void)
{
if (!hpet_res)
return 1;

return insert_resource(&iomem_resource, hpet_res);
}


The effect of "hpet=disable" is to prevent the hpet_res pointer,

static struct __initdata resource *hpet_res;

from being attached to memory, keeping it NULL and causing the
return value to indicate that the HPET resource was not assigned.

When not using "hpet=disable", the memory location of hpet_res
is added to the iomem_resource tree. The code that obtains the
memory for hpet_res is in the same file, in the lines immediately
preceding:

static int __init acpi_parse_hpet(struct acpi_table_header *table)
{
struct acpi_table_hpet *hpet_tbl;

...
#define HPET_RESOURCE_NAME_SIZE 9
hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

...
return 0;
}


Trying to discover if something was going haywire in this part of the code,
I tried to capture some data which I could save until just before the kernel
locks so that I could printk() it and still see it without having it scroll
off the top:

===== BEGIN DIFF ==========
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9d3528c..c4670a6 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -644,6 +644,11 @@ static int __init acpi_parse_sbf(struct acpi_table_header *table)

static struct __initdata resource *hpet_res;

+extern void *dw_hpet_res;
+extern int dw_broken_bios;
+extern unsigned dw_seq;
+extern unsigned dw_req_size;
+
static int __init acpi_parse_hpet(struct acpi_table_header *table)
{
struct acpi_table_hpet *hpet_tbl;
@@ -672,6 +677,9 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
hpet_tbl->id, hpet_address);
return 0;
}
+
+ dw_broken_bios = 0;
+
#ifdef CONFIG_X86_64
/*
* Some even more broken BIOSes advertise HPET at
@@ -679,6 +687,8 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
* some noise:
*/
if (hpet_address == 0xfed0000000000000UL) {
+ dw_broken_bios = 1;
+
if (!hpet_force_user) {
printk(KERN_WARNING PREFIX "HPET id: %#x "
"base: 0xfed0000000000000 is bogus\n "
@@ -702,12 +712,15 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
*/
#define HPET_RESOURCE_NAME_SIZE 9
hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);
+ dw_hpet_res = hpet_res;
+ dw_req_size = sizeof (*hpet_res) + HPET_RESOURCE_NAME_SIZE;

hpet_res->name = (void *)&hpet_res[1];
hpet_res->flags = IORESOURCE_MEM;
snprintf((char *)hpet_res->name, HPET_RESOURCE_NAME_SIZE, "HPET %u",
hpet_tbl->sequence);

+ dw_seq = hpet_tbl->sequence;
hpet_res->start = hpet_address;
hpet_res->end = hpet_address + (1 * 1024) - 1;

@@ -718,12 +731,19 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
* hpet_insert_resource inserts the HPET resources used into the resource
* tree.
*/
+extern int dw_ir_retval;
+
static __init int hpet_insert_resource(void)
{
+ int retval;
+
if (!hpet_res)
return 1;

- return insert_resource(&iomem_resource, hpet_res);
+ retval = insert_resource(&iomem_resource, hpet_res);
+ dw_ir_retval = retval;
+
+ return retval;
}

late_initcall(hpet_insert_resource);
diff --git a/net/core/dev.c b/net/core/dev.c
index 600bb23..fe27b94 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4304,10 +4304,21 @@ void free_netdev(struct net_device *dev)
put_device(&dev->dev);
}

+void *dw_hpet_res;
+int dw_broken_bios;
+unsigned dw_seq;
+int dw_ir_retval;
+unsigned dw_req_size;
+
/* Synchronize with packet receive processing. */
void synchronize_net(void)
{
might_sleep();
+
+ printk ("Data from arch/x86/kernel/acpi/boot.c:\n");
+ printk (" hpet_res = %p requested size: %u\n", dw_hpet_res, dw_req_size);
+ printk (" sequence = %u insert_resource() returned: %d\n", dw_seq, dw_ir_retval);
+ printk (" broken_bios: %d\n", dw_broken_bios);
synchronize_rcu();
}
===== END DIFF ==========


The output I get when the kernel locks up looks perfectly OK, except
maybe for the address of hpet_res (which I am not knowledgeable enough
to judge):

Data from arch/x86/kernel/acpi/boot.c:
hpet_res = ffff88000100f000 broken_bios: 0
sequence = 0 insert_resource() returned: 0


I see some recent (Aug. 2008) discussion of alloc_bootmem() being
broken, so maybe that is related to my problem.

Does this connection between HPET and insert_resource() look meaningful,
or is this a coincidence?


Thanks,
Dave W.


2008-08-19 01:15:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: HPET regression in 2.6.26 versus 2.6.25 -- connection between HPET and lockups found


* David Witbrodt <[email protected]> wrote:

> The output I get when the kernel locks up looks perfectly OK, except
> maybe for the address of hpet_res (which I am not knowledgeable enough
> to judge):
>
> Data from arch/x86/kernel/acpi/boot.c:
> hpet_res = ffff88000100f000 broken_bios: 0
> sequence = 0 insert_resource() returned: 0
>
>
> I see some recent (Aug. 2008) discussion of alloc_bootmem() being
> broken, so maybe that is related to my problem.
>
> Does this connection between HPET and insert_resource() look
> meaningful, or is this a coincidence?

it is definitely the angle i'd suspect the most.

perhaps we stomp over some piece of memory that is "available RAM"
according to your BIOS, but in reality is used by something. With
previous kernels we got lucky and have put a data structure there which
kept your hpet still working. (a bit far-fetched i think, but the best
theory i could come up with)

the address you printed out (0xffff88000100f000), does look _somewhat_
suspicious. It corresponds to the physical address of 0x100f000. That is
_just_ above the 16MB boundary. It should not be relevant normally - but
it's still somewhat suspicious.

To test this theory, could you tweak this:

alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

to be:

alloc_bootmem_low(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

this will allocate the hpet resource descriptor in lower RAM.

Another idea: could you increase HPET_RESOURCE_NAME_SIZE from 9 to
something larger (via the patch below)? Maybe the bug is that this
overflows:

snprintf((char *)hpet_res->name, HPET_RESOURCE_NAME_SIZE, "HPET %u",
hpet_tbl->sequence);

and corrupts the memory next to the hpet resource descriptor. Depending
on random details of the kernel, this might or might not turn into some
real problem. The way of allocating the resource and its name string
together in a bootmem allocation is a bit quirky - but should be Ok
otherwise.

Hm, i see you have printed out hpet_tbl->sequence, and that gives 0,
which should be borderline OK in terms of overflow. Cannot hurt to add
this patch to your queue of test-patches :-/

Also, you could try to increase the bootmem allocation drastically, by
say 16*1024 bytes, via:

hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE + 16*1024);
hpet_res = (void *)hpet_res + 15*1024;

this will pad the memory at ~16MB and not use it for any resource.
Arguably a really weird hack, but i'm running out of ideas ...

Ingo

------------------>
>From 6319ee82bc363e2fd356782dacc9e01e5b33694e Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Tue, 19 Aug 2008 03:10:51 +0200
Subject: [PATCH] hpet: increase HPET_RESOURCE_NAME_SIZE

only had enough space for a 4 digit sprintf. If the index is wider
for any reason, we'll corrupt memory ...

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/acpi/boot.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9d3528c..f6350aa 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -700,7 +700,7 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
* Allocate and initialize the HPET firmware resource for adding into
* the resource tree during the lateinit timeframe.
*/
-#define HPET_RESOURCE_NAME_SIZE 9
+#define HPET_RESOURCE_NAME_SIZE 14
hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

hpet_res->name = (void *)&hpet_res[1];