2008-03-07 03:42:17

by Bob Tracy

[permalink] [raw]
Subject: [regression]2.6.25-rc4: boot panic on alpha

The regression is relative to 2.6.25-rc1 + timer fixes (added when
I reported the problems with 2.6.25-rc1 relative to 2.6.24). I haven't
tried -rc2 or -rc3.

Might be related to similar reports I've seen, but wanted to add the
Alpha platform to the list. The -rc4 kernel boots, but at some point
following initialization of the SCSI layer, there's a panic with enough
diagnostic output that a 50-line screen can't hold it all. Unfortunately,
the system logging isn't active, so nothing gets captured.

The panic is completely reproducible. The "Code" line at the bottom
of the panic output is

44220001 4821f621 4821b681 4821f62b f420004f 00000081 <0000009b> 0066c012

The first line of the "Trace" section is

[<fffffc000031a41f>] iommu_arena_alloc+0x64/0xe0

Platform is an Alpha PWS 433au. gcc is version 4.1.2 (Debian 4.1.1-21).

--
------------------------------------------------------------------------
Bob Tracy | "I was a beta tester for dirt. They never did
[email protected] | get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------


2008-03-07 07:23:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha


[Cc:-ed FUJITA Tomonori who did the Alpha IOMMU changes - full report
from Bob quoted below.]

Bob, does latest -git boot if you revert these 4 commits:

commit d5a4630a0daad241c761064295958554472ed491
Author: FUJITA Tomonori <[email protected]>
Date: Tue Mar 4 14:28:58 2008 -0800

alpha: remove unused DEBUG_FORCEDAC define in IOMMU

commit cf5401454863df8e6dc3ebe8faad09141cbec187
Author: FUJITA Tomonori <[email protected]>
Date: Tue Mar 4 14:28:57 2008 -0800

alpha: make IOMMU respect the segment boundary limits

commit 23d7e0390ab57cf15a5cfe8d6806192f0997e5a8
Author: FUJITA Tomonori <[email protected]>
Date: Tue Mar 4 14:28:57 2008 -0800

alpha: IOMMU had better access to the free space bitmap at only one place

commit 3c5f1def7dd50b792f56dcf7378c2684c06947f3
Author: FUJITA Tomonori <[email protected]>
Date: Tue Mar 4 14:28:54 2008 -0800

alpha: convert IOMMU to use ALIGN()

Ingo

* Bob Tracy <[email protected]> wrote:

> The regression is relative to 2.6.25-rc1 + timer fixes (added when
> I reported the problems with 2.6.25-rc1 relative to 2.6.24). I haven't
> tried -rc2 or -rc3.
>
> Might be related to similar reports I've seen, but wanted to add the
> Alpha platform to the list. The -rc4 kernel boots, but at some point
> following initialization of the SCSI layer, there's a panic with enough
> diagnostic output that a 50-line screen can't hold it all. Unfortunately,
> the system logging isn't active, so nothing gets captured.
>
> The panic is completely reproducible. The "Code" line at the bottom
> of the panic output is
>
> 44220001 4821f621 4821b681 4821f62b f420004f 00000081 <0000009b> 0066c012
>
> The first line of the "Trace" section is
>
> [<fffffc000031a41f>] iommu_arena_alloc+0x64/0xe0
>
> Platform is an Alpha PWS 433au. gcc is version 4.1.2 (Debian 4.1.1-21).
>
> --
> ------------------------------------------------------------------------
> Bob Tracy | "I was a beta tester for dirt. They never did
> [email protected] | get all the bugs out." - Steve McGrew on /.
> ------------------------------------------------------------------------
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2008-03-07 12:36:56

by Bob Tracy

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

Ingo Molnar wrote:
> Bob, does latest -git boot if you revert these 4 commits:

(FUJITA Tomonori's Alpha IOMMU changes)

I'll get things set up to try that later today... In the meantime, it
appears the regression is relative to -rc3: I'm up and running on that
kernel as I type this.

> * Bob Tracy <[email protected]> wrote:
>
> > (panic with 2.6.24-rc4 on Alpha)
> > The panic is completely reproducible. The "Code" line at the bottom
> > of the panic output is
> >
> > 44220001 4821f621 4821b681 4821f62b f420004f 00000081 <0000009b> 0066c012
> >
> > The first line of the "Trace" section is
> >
> > [<fffffc000031a41f>] iommu_arena_alloc+0x64/0xe0
> >

--
------------------------------------------------------------------------
Bob Tracy | "I was a beta tester for dirt. They never did
[email protected] | get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------

2008-03-07 14:15:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha


* Bob Tracy <[email protected]> wrote:

> Ingo Molnar wrote:
> > Bob, does latest -git boot if you revert these 4 commits:
>
> (FUJITA Tomonori's Alpha IOMMU changes)
>
> I'll get things set up to try that later today... In the meantime, it
> appears the regression is relative to -rc3: I'm up and running on that
> kernel as I type this.

yes, these iommu changes were added post-rc3, they came from -mm and
were i think declared to be untested on Alpha. So i'd strongly suspect
them.

Ingo

2008-03-07 14:57:20

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

On Fri, 7 Mar 2008 15:14:40 +0100
Ingo Molnar <[email protected]> wrote:

>
> * Bob Tracy <[email protected]> wrote:
>
> > Ingo Molnar wrote:
> > > Bob, does latest -git boot if you revert these 4 commits:
> >
> > (FUJITA Tomonori's Alpha IOMMU changes)
> >
> > I'll get things set up to try that later today... In the meantime, it
> > appears the regression is relative to -rc3: I'm up and running on that
> > kernel as I type this.
>
> yes, these iommu changes were added post-rc3, they came from -mm and
> were i think declared to be untested on Alpha. So i'd strongly suspect
> them.

Very sorry, I probably broke the IOMMU. I don't have the hardware so
I've not tested the patches.

Bob, if reverting the patches works, can you please try the following
patch? If it works, please let me know about the kernel message.

Thanks,

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index be6fa10..a6bef1d 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -129,8 +129,9 @@ static inline int is_span_boundary(unsigned int index, unsigned int nr,
unsigned long shift,
unsigned long boundary_size)
{
- shift = (shift + index) & (boundary_size - 1);
- return shift + nr > boundary_size;
+ return 0;
+/* shift = (shift + index) & (boundary_size - 1); */
+/* return shift + nr > boundary_size; */
}

/* Must be called with the arena lock held */
@@ -144,7 +145,9 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
unsigned long base;
unsigned long boundary_size;

- BUG_ON(arena->dma_base & ~PAGE_MASK);
+ if (arena->dma_base & ~PAGE_MASK)
+ printk("%s %d: %lx\n", __FUNCTION__, __LINE__, arena->dma_base);
+
base = arena->dma_base >> PAGE_SHIFT;
if (dev)
boundary_size = ALIGN(dma_get_max_seg_size(dev) + 1, PAGE_SIZE)
@@ -152,7 +155,8 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
else
boundary_size = ALIGN(1UL << 32, PAGE_SIZE) >> PAGE_SHIFT;

- BUG_ON(!is_power_of_2(boundary_size));
+ if (!is_power_of_2(boundary_size))
+ printk("%s %d: %lx\n", __FUNCTION__, __LINE__, boundary_size);

/* Search forward for the first mask-aligned sequence of N free ptes */
ptes = arena->ptes;

2008-03-07 16:21:25

by Ivan Kokshaysky

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

On Fri, Mar 07, 2008 at 11:56:40PM +0900, FUJITA Tomonori wrote:
> Very sorry, I probably broke the IOMMU. I don't have the hardware so
> I've not tested the patches.

Unfortunately, I was not able to test these patches either, as my
alpha with iommmu and scsi is disassembled at the moment, thanks to
dead CPU fan...

> - BUG_ON(arena->dma_base & ~PAGE_MASK);
> + if (arena->dma_base & ~PAGE_MASK)
> + printk("%s %d: %lx\n", __FUNCTION__, __LINE__, arena->dma_base);

arena->dma_base is always properly aligned - it's pretty fundamental thing.

> base = arena->dma_base >> PAGE_SHIFT;
> if (dev)
> boundary_size = ALIGN(dma_get_max_seg_size(dev) + 1, PAGE_SIZE)
^^^^^^^^^^^^^^^^^^^^^^^^^
Must be dma_get_seg_boundary(dev). Hopefully it's the only typo ;-)

> @@ -152,7 +155,8 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
> else
> boundary_size = ALIGN(1UL << 32, PAGE_SIZE) >> PAGE_SHIFT;

A bit of abuse of ALIGN() macro, to my taste ;-)

> - BUG_ON(!is_power_of_2(boundary_size));
> + if (!is_power_of_2(boundary_size))
> + printk("%s %d: %lx\n", __FUNCTION__, __LINE__, boundary_size);

Perhaps it was that, if the scsi driver set max_seg_size to something
other than default 65536 (and not a power of two)...

Bob, can you give it a try?

Ivan.

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index be6fa10..e07a23f 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -144,15 +144,14 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
unsigned long base;
unsigned long boundary_size;

- BUG_ON(arena->dma_base & ~PAGE_MASK);
base = arena->dma_base >> PAGE_SHIFT;
- if (dev)
- boundary_size = ALIGN(dma_get_max_seg_size(dev) + 1, PAGE_SIZE)
- >> PAGE_SHIFT;
- else
- boundary_size = ALIGN(1UL << 32, PAGE_SIZE) >> PAGE_SHIFT;
-
- BUG_ON(!is_power_of_2(boundary_size));
+ if (dev) {
+ boundary_size = dma_get_seg_boundary(dev) + 1;
+ BUG_ON(!is_power_of_2(boundary_size));
+ boundary_size >>= PAGE_SHIFT;
+ } else {
+ boundary_size = 1UL << (32 - PAGE_SHIFT);
+ }

/* Search forward for the first mask-aligned sequence of N free ptes */
ptes = arena->ptes;

2008-03-07 16:46:56

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

On Fri, 7 Mar 2008 19:21:16 +0300
Ivan Kokshaysky <[email protected]> wrote:

> On Fri, Mar 07, 2008 at 11:56:40PM +0900, FUJITA Tomonori wrote:
> > Very sorry, I probably broke the IOMMU. I don't have the hardware so
> > I've not tested the patches.
>
> Unfortunately, I was not able to test these patches either, as my
> alpha with iommmu and scsi is disassembled at the moment, thanks to
> dead CPU fan...
>
> > - BUG_ON(arena->dma_base & ~PAGE_MASK);
> > + if (arena->dma_base & ~PAGE_MASK)
> > + printk("%s %d: %lx\n", __FUNCTION__, __LINE__, arena->dma_base);
>
> arena->dma_base is always properly aligned - it's pretty fundamental thing.

Thanks, I see.


> > base = arena->dma_base >> PAGE_SHIFT;
> > if (dev)
> > boundary_size = ALIGN(dma_get_max_seg_size(dev) + 1, PAGE_SIZE)
> ^^^^^^^^^^^^^^^^^^^^^^^^^
> Must be dma_get_seg_boundary(dev). Hopefully it's the only typo ;-)

Ah, a stupid typo...


> > @@ -152,7 +155,8 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
> > else
> > boundary_size = ALIGN(1UL << 32, PAGE_SIZE) >> PAGE_SHIFT;
>
> A bit of abuse of ALIGN() macro, to my taste ;-)
>
> > - BUG_ON(!is_power_of_2(boundary_size));
> > + if (!is_power_of_2(boundary_size))
> > + printk("%s %d: %lx\n", __FUNCTION__, __LINE__, boundary_size);
>
> Perhaps it was that, if the scsi driver set max_seg_size to something
> other than default 65536 (and not a power of two)...

Yeah, probably (and hopefully) it was.


> Bob, can you give it a try?
>
> Ivan.
>
> diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> index be6fa10..e07a23f 100644
> --- a/arch/alpha/kernel/pci_iommu.c
> +++ b/arch/alpha/kernel/pci_iommu.c
> @@ -144,15 +144,14 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
> unsigned long base;
> unsigned long boundary_size;
>
> - BUG_ON(arena->dma_base & ~PAGE_MASK);
> base = arena->dma_base >> PAGE_SHIFT;
> - if (dev)
> - boundary_size = ALIGN(dma_get_max_seg_size(dev) + 1, PAGE_SIZE)
> - >> PAGE_SHIFT;
> - else
> - boundary_size = ALIGN(1UL << 32, PAGE_SIZE) >> PAGE_SHIFT;
> -
> - BUG_ON(!is_power_of_2(boundary_size));
> + if (dev) {
> + boundary_size = dma_get_seg_boundary(dev) + 1;
> + BUG_ON(!is_power_of_2(boundary_size));
> + boundary_size >>= PAGE_SHIFT;
> + } else {
> + boundary_size = 1UL << (32 - PAGE_SHIFT);
> + }
>
> /* Search forward for the first mask-aligned sequence of N free ptes */
> ptes = arena->ptes;

Thanks!

2008-03-07 19:06:28

by Bob Tracy

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

Ivan Kokshaysky wrote:
> On Fri, Mar 07, 2008 at 11:56:40PM +0900, FUJITA Tomonori wrote:
> > Very sorry, I probably broke the IOMMU. I don't have the hardware so
> > I've not tested the patches.
>
> Unfortunately, I was not able to test these patches either, as my
> alpha with iommmu and scsi is disassembled at the moment, thanks to
> dead CPU fan...
>
> (...)
>
> Perhaps it was that, if the scsi driver set max_seg_size to something
> other than default 65536 (and not a power of two)...
>
> Bob, can you give it a try?
>
> diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> index be6fa10..e07a23f 100644
> --- a/arch/alpha/kernel/pci_iommu.c
> +++ b/arch/alpha/kernel/pci_iommu.c
> (...)

In the interest of efficiency, I'll try Ivan's patch against -rc4 before
I do anything else (although I *do* have my git tree updated and ready
for abuse if it comes to that :-)). New kernel is building: I'll be able
to test it in about five hours (when I can get back to where the machine
is).

As usual, thanks to everyone for the quick responses.

--
------------------------------------------------------------------------
Bob Tracy | "I was a beta tester for dirt. They never did
[email protected] | get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------

2008-03-08 01:24:51

by Bob Tracy

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

Ivan Kokshaysky wrote:
> Bob, can you give it a try?
>
> Ivan.
>
> diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> index be6fa10..e07a23f 100644
> --- a/arch/alpha/kernel/pci_iommu.c
> +++ b/arch/alpha/kernel/pci_iommu.c

This fixes things for me. I'm running 2.6.25-rc4 + the above patch as
I type this.

Thanks!

--
------------------------------------------------------------------------
Bob Tracy | "I was a beta tester for dirt. They never did
[email protected] | get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------

2008-03-09 10:21:39

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

On Fri, 7 Mar 2008 19:24:40 -0600 (CST)
[email protected] (Bob Tracy) wrote:

> Ivan Kokshaysky wrote:
> > Bob, can you give it a try?
> >
> > Ivan.
> >
> > diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> > index be6fa10..e07a23f 100644
> > --- a/arch/alpha/kernel/pci_iommu.c
> > +++ b/arch/alpha/kernel/pci_iommu.c
>
> This fixes things for me. I'm running 2.6.25-rc4 + the above patch as
> I type this.

Great, thanks for testing!

Ivan, can you resend the patch with a description and signed-off-by?

Thanks,

2008-03-09 13:23:15

by Ivan Kokshaysky

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

On Sun, Mar 09, 2008 at 07:20:20PM +0900, FUJITA Tomonori wrote:
> Ivan, can you resend the patch with a description and signed-off-by?

I was going to do that, but since Andrew has picked up that patch
I thought it wasn't necessary...

Anyway, resending shouldn't hurt. I assume that I can add your Acked-by.

Ivan.

2008-03-09 13:30:39

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

On Sun, 9 Mar 2008 16:22:56 +0300
Ivan Kokshaysky <[email protected]> wrote:

> On Sun, Mar 09, 2008 at 07:20:20PM +0900, FUJITA Tomonori wrote:
> > Ivan, can you resend the patch with a description and signed-off-by?
>
> I was going to do that, but since Andrew has picked up that patch
> I thought it wasn't necessary...

Maybe, but the -mm commit mail was not sent to linux-kernel or
Linus. I just wanted to make sure that the patch will be applied to
mainline shortly.


> Anyway, resending shouldn't hurt. I assume that I can add your Acked-by.

Thanks! Of course, you can add my Acked-by.

2008-03-09 15:39:17

by Ivan Kokshaysky

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

On Sun, Mar 09, 2008 at 10:29:54PM +0900, FUJITA Tomonori wrote:
> Thanks! Of course, you can add my Acked-by.

BTW, I feel a bit sorry for not ACKing your patches in time - I wasn't able
to test them then and I was perhaps too overcautious because of that...
But they are really good.

Ivan.

2008-03-09 18:48:47

by Andrew Morton

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

On Sun, 9 Mar 2008 22:29:54 +0900 FUJITA Tomonori <[email protected]> wrote:

> On Sun, 9 Mar 2008 16:22:56 +0300
> Ivan Kokshaysky <[email protected]> wrote:
>
> > On Sun, Mar 09, 2008 at 07:20:20PM +0900, FUJITA Tomonori wrote:
> > > Ivan, can you resend the patch with a description and signed-off-by?
> >
> > I was going to do that, but since Andrew has picked up that patch
> > I thought it wasn't necessary...
>
> Maybe, but the -mm commit mail was not sent to linux-kernel or
> Linus. I just wanted to make sure that the patch will be applied to
> mainline shortly.

Yup, I have it in my next-batch-for-Linus. Maybe tomorrow..

2008-03-10 14:35:45

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [regression]2.6.25-rc4: boot panic on alpha

On Sun, 9 Mar 2008 18:38:59 +0300
Ivan Kokshaysky <[email protected]> wrote:

> On Sun, Mar 09, 2008 at 10:29:54PM +0900, FUJITA Tomonori wrote:
> > Thanks! Of course, you can add my Acked-by.
>
> BTW, I feel a bit sorry for not ACKing your patches in time - I wasn't able
> to test them then and I was perhaps too overcautious because of that...
> But they are really good.

No problem, thanks for finding my stupid typo quickly :)

The patchset was merged instantly so it was difficult to review it in
time.