2006-12-20 10:56:27

by Ingo Molnar

[permalink] [raw]
Subject: [patch] x86_64: fix boot time hang in detect_calgary()

Subject: [patch] x86_64: fix boot time hang in detect_calgary()
From: Ingo Molnar <[email protected]>

if CONFIG_CALGARY_IOMMU is built into the kernel via
CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT, or is enabled via the
iommu=calgary boot option, then the detect_calgary() function runs to
detect the presence of a Calgary IOMMU.

detect_calgary() first searches the BIOS EBDA area for a "rio_table_hdr"
BIOS table. It has this parsing algorithm for the EBDA:

while (offset) {
...
/* The next offset is stored in the 1st word. 0 means no more */
offset = *((unsigned short *)(ptr + offset));
}

got that? Lets repeat it slowly: we've got a BIOS-supplied data
structure, plus Linux kernel code that will only break out of an
infinite parsing loop once the BIOS gives a zero offset. Ok?

Translation: what an excellent opportunity for BIOS writers to lock up
the Linux boot process in an utterly hard to debug place! Indeed the
BIOS jumped on that opportunity on my box, which has the following EBDA
chaining layout:

384, 65282, 65535, 65535, 65535, 65535, 65535, 65535 ...

see the pattern? So my, definitely non-Calgary system happily locks up
in detect_calgary()!

the patch below fixes the boot hang by trusting the BIOS-supplied data
structure a bit less: the parser always has to make forward progress,
and if it doesnt, we break out of the loop and i get the expected kernel
message:

Calgary: Unable to locate Rio Grande Table in EBDA - bailing!

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86_64/kernel/pci-calgary.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

Index: linux/arch/x86_64/kernel/pci-calgary.c
===================================================================
--- linux.orig/arch/x86_64/kernel/pci-calgary.c
+++ linux/arch/x86_64/kernel/pci-calgary.c
@@ -1052,7 +1052,7 @@ void __init detect_calgary(void)
void *tbl;
int calgary_found = 0;
unsigned long ptr;
- int offset;
+ unsigned int offset, prev_offset;
int ret;

/*
@@ -1071,15 +1071,20 @@ void __init detect_calgary(void)
ptr = (unsigned long)phys_to_virt(get_bios_ebda());

rio_table_hdr = NULL;
+ prev_offset = 0;
offset = 0x180;
- while (offset) {
+ /*
+ * The next offset is stored in the 1st word.
+ * Only parse up until the offset increases:
+ */
+ while (offset > prev_offset) {
/* The block id is stored in the 2nd word */
if (*((unsigned short *)(ptr + offset + 2)) == 0x4752){
/* set the pointer past the offset & block id */
rio_table_hdr = (struct rio_table_hdr *)(ptr + offset + 4);
break;
}
- /* The next offset is stored in the 1st word. 0 means no more */
+ prev_offset = offset;
offset = *((unsigned short *)(ptr + offset));
}
if (!rio_table_hdr) {


2006-12-20 11:34:24

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [patch] x86_64: fix boot time hang in detect_calgary()

On Wed, Dec 20, 2006 at 11:53:32AM +0100, Ingo Molnar wrote:

> Index: linux/arch/x86_64/kernel/pci-calgary.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/pci-calgary.c
> +++ linux/arch/x86_64/kernel/pci-calgary.c
> @@ -1052,7 +1052,7 @@ void __init detect_calgary(void)
> void *tbl;
> int calgary_found = 0;
> unsigned long ptr;
> - int offset;
> + unsigned int offset, prev_offset;
> int ret;
>
> /*
> @@ -1071,15 +1071,20 @@ void __init detect_calgary(void)
> ptr = (unsigned long)phys_to_virt(get_bios_ebda());
>
> rio_table_hdr = NULL;
> + prev_offset = 0;
> offset = 0x180;
> - while (offset) {
> + /*
> + * The next offset is stored in the 1st word.
> + * Only parse up until the offset increases:
> + */
> + while (offset > prev_offset) {
> /* The block id is stored in the 2nd word */
> if (*((unsigned short *)(ptr + offset + 2)) == 0x4752){
> /* set the pointer past the offset & block id */
> rio_table_hdr = (struct rio_table_hdr *)(ptr + offset + 4);
> break;
> }
> - /* The next offset is stored in the 1st word. 0 means no more */
> + prev_offset = offset;
> offset = *((unsigned short *)(ptr + offset));
> }
> if (!rio_table_hdr) {
> -

Patch looks good to me, thanks. I'll give it a spin to verify.

Please CC me on Calgary patches.

Cheers,
Muli

2006-12-20 11:53:59

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [patch] x86_64: fix boot time hang in detect_calgary()

On Wed, Dec 20, 2006 at 01:34:15PM +0200, Muli Ben-Yehuda wrote:
> On Wed, Dec 20, 2006 at 11:53:32AM +0100, Ingo Molnar wrote:

[snipped patch]

> Patch looks good to me, thanks. I'll give it a spin to verify.

Signed-off-by: Muli Ben-Yehuda <[email protected]>

Andi, I assume you'll push it to mainline?

This should go into -stable too.

Cheers,
Muli

2006-12-20 15:28:38

by Dave Jones

[permalink] [raw]
Subject: Re: [patch] x86_64: fix boot time hang in detect_calgary()

On Wed, Dec 20, 2006 at 11:53:32AM +0100, Ingo Molnar wrote:

> the patch below fixes the boot hang by trusting the BIOS-supplied data
> structure a bit less: the parser always has to make forward progress,
> and if it doesnt, we break out of the loop and i get the expected kernel
> message:
>
> Calgary: Unable to locate Rio Grande Table in EBDA - bailing!

Good job tracking this down. I saw someone get bit by probably this
same bug a few days ago. Whilst on the subject though, can we do
something about the printk ?
It always bothered me that some drivers print something when
a) built-in, and b) they don't find something.
For kitchen sink kernels, this makes for a really noisy bootup.

So you didn't find hardware I know I don't have. Big deal, move on.
dmesg spam these days is getting really out of hand.

hmm, maybe a mod_printk, to only printk something when built as
a module ?

Dave

--
http://www.codemonkey.org.uk

2006-12-20 15:48:15

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [patch] x86_64: fix boot time hang in detect_calgary()

On Wed, Dec 20, 2006 at 10:28:28AM -0500, Dave Jones wrote:

> Good job tracking this down. I saw someone get bit by probably this
> same bug a few days ago. Whilst on the subject though, can we do
> something about the printk ?

Calgary can't be built as a module. I agree that being quiet when we
don't find the HW is the appropriate thing to do. I'll push a patch to
make this KERN_DEBUG in the next batch of updates.

Cheers,
Muli