Hi All,
I've been observing kernel panics for the past week on
kernel versions 2.6.26 and 2.6.27, but not on 2.6.24 or 2.6.25.
The panic message says:
arch/ia64/hp/common/sba_iommu.c: I/O MMU is out of mapping resources
Using git-bisect, I've zeroed in on the commit that introduced this.
Please see the attached file for the commit.
The workload consists of 2 tests:
1. Single fio process writing a 1 TB file.
2. 15 fio processes writing 15GB files each.
The panic happens on both workloads. There is no stack trace after
the above message.
Other info:
System is HP RX6600 (16GB RAM, 16 processors w/ dual cores and HT)
20 SATA disks under software RAID0 with 6 TB capacity.
Silicon Image 3124 controller.
File system is XFS.
I'd much appreciate some help in fixing this because this panic has
basically stalled my own work. I'd be willing to run more tests on my
setup to test any patches that possibly fix this issue.
Regards
Shehjar
Added Cc: linux-ia64 ... more likely to attract attention of HP
ia64 experts there.
> arch/ia64/hp/common/sba_iommu.c: I/O MMU is out of mapping resources
Odd ... the code (back to the dawn of git time in 2.6.12-rc1) looks like
panic(__FILE__ ": I/O MMU @ %p is out of mapping resources\n",
      ioc->ioc_hpa);
I wonder why you don't see the "@ HEXADDRESS"?
> Using git-bisect, I've zeroed in on the commit that introduced this.
> Please see the attached file for the commit.
Did you confirm that reverting this commit on a recent kernel
fixes the problem? (Once in a while git bisect can point to
the wrong commit ... it seems very likely that it got the
right one here, but it is always good to check.) When I
tried to use "patch -R" to revert this, it got confused on
the Kconfig file because the lines that were added were
subsequently changed ... so you may need to revert that part
by hand ... sba_iommu.c apparently reverted OK.
> Other info:
> System is HP RX6600(16Gb RAM, 16 processors w/ dual cores and HT)
> 20 SATA disks under software RAID0 with 6 TB capacity.
> Silicon Image 3124 controller.
> File system is XFS.
My HP test system is way too small to attempt to recreate
this (just 2 cpus & 1 disk). How long does each of your
tests take to hit the problems ... a few minutes? Or hours?
> I'd much appreciate some help in fixing this because this panic has
> basically stalled my own work. I'd be willing to run more tests on my
> setup to test any patches that possibly fix this issue.
Adding some printk() before the panic might give a clue as to what
is going wrong. Either a bogus call is trying to allocate far
too much space, or the bitmap is leaking, or we have a totally
messed up "ioc" structure.
Printing "pages_needed", the address of "ioc", and some interesting
fields from ioc (at least ioc->res_size) would help. I assume
the return value from sba_search_bitmap() is ~0x0 ... but
you should print "pide" just to be sure.
-Tony
Sorry for the delay.
CC'ed linux-parisc since the same problem could happen to parisc.
On Tue, 04 Nov 2008 10:23:58 +1100
Shehjar Tikoo <[email protected]> wrote:
> I've been observing kernel panics for the past week on
> kernel versions 2.6.26, 2.6.27 but not on 2.6.24 and 2.6.25.
>
> The panic message says:
>
> arch/ia64/hp/common/sba_iommu.c: I/O MMU is out of mapping resources
>
> Using git-bisect, I've zeroed in on the commit that introduced this.
> Please see the attached file for the commit.
>
> The workload consists of 2 tests:
> 1. Single fio process writing a 1 TB file.
> 2. 15 fio processes writing 15GB files each.
>
> The panic happens on both workloads. There is no stack trace after
> the above message.
>
> Other info:
> System is HP RX6600(16Gb RAM, 16 processors w/ dual cores and HT)
> 20 SATA disks under software RAID0 with 6 TB capacity.
> Silicon Image 3124 controller.
> File system is XFS.
>
> I'd much appreciate some help in fixing this because this panic has
> basically stalled my own work. I'd be willing to run more tests on my
> setup to test any patches that possibly fix this issue.
The commit you bisected to modified the sba IOMMU driver to support
LLDs' segment boundary limits properly.
ATA hardware has a poor segment boundary limit: 64KB. In addition, the
sba IOMMU driver uses a size-aligned allocation algorithm, which makes
it difficult for the IOMMU driver to find an appropriate I/O address
range. I think you hit the allocation failure because of this
(of course, it's possible that my change breaks the IOMMU driver, but
I haven't found a problem so far).
To make matters worse, the sba IOMMU driver panics when the allocation
fails. IIRC, only the IA64 and parisc IOMMU drivers panic by default
when an allocation fails. I think we need to change them to handle the
failure properly.
Can you try this? I've not fixed the map_single failure path yet, but
I think you hit the allocation failure in the map_sg path.
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index d98f0f4..8f44dc8 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -676,12 +676,19 @@ sba_alloc_range(struct ioc *ioc, struct device *dev, size_t size)
spin_unlock_irqrestore(&ioc->saved_lock, flags);
pide = sba_search_bitmap(ioc, dev, pages_needed, 0);
- if (unlikely(pide >= (ioc->res_size << 3)))
- panic(__FILE__ ": I/O MMU @ %p is out of mapping resources\n",
- ioc->ioc_hpa);
+ if (unlikely(pide >= (ioc->res_size << 3))) {
+ printk(KERN_WARNING "%s: I/O MMU @ %p is "
+ "out of mapping resources, %u %u %lx\n",
+ __func__, ioc->ioc_hpa, ioc->res_size,
+ pages_needed, dma_get_seg_boundary(dev));
+ return -1;
+ }
#else
- panic(__FILE__ ": I/O MMU @ %p is out of mapping resources\n",
- ioc->ioc_hpa);
+ printk(KERN_WARNING "%s: I/O MMU @ %p is "
+ "out of mapping resources, %u %u %lx\n",
+ __func__, ioc->ioc_hpa, ioc->res_size,
+ pages_needed, dma_get_seg_boundary(dev));
+ return -1;
#endif
}
}
@@ -962,6 +969,7 @@ sba_map_single_attrs(struct device *dev, void *addr, size_t size, int dir,
#endif
pide = sba_alloc_range(ioc, dev, size);
+ BUG_ON(pide < 0);
iovp = (dma_addr_t) pide << iovp_shift;
@@ -1304,6 +1312,7 @@ sba_coalesce_chunks(struct ioc *ioc, struct device *dev,
unsigned long dma_offset, dma_len; /* start/len of DMA stream */
int n_mappings = 0;
unsigned int max_seg_size = dma_get_max_seg_size(dev);
+ int idx;
while (nents > 0) {
unsigned long vaddr = (unsigned long) sba_sg_address(startsg);
@@ -1402,9 +1411,13 @@ sba_coalesce_chunks(struct ioc *ioc, struct device *dev,
vcontig_sg->dma_length = vcontig_len;
dma_len = (dma_len + dma_offset + ~iovp_mask) & iovp_mask;
ASSERT(dma_len <= DMA_CHUNK_SIZE);
- dma_sg->dma_address = (dma_addr_t) (PIDE_FLAG
- | (sba_alloc_range(ioc, dev, dma_len) << iovp_shift)
- | dma_offset);
+ idx = sba_alloc_range(ioc, dev, dma_len);
+ if (idx < 0) {
+ dma_sg->dma_length = 0;
+ return -1;
+ }
+ dma_sg->dma_address = (dma_addr_t)(PIDE_FLAG | (idx << iovp_shift)
+ | dma_offset);
n_mappings++;
}
@@ -1476,6 +1489,10 @@ int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist, int nents,
** Access to the virtual address is what forces a two pass algorithm.
*/
coalesced = sba_coalesce_chunks(ioc, dev, sglist, nents);
+ if (coalesced < 0) {
+ sba_unmap_sg_attrs(dev, sglist, nents, dir, attrs);
+ return 0;
+ }
/*
** Program the I/O Pdir
FUJITA Tomonori wrote:
> Sorry for the delay.
>
> CC'ed linux-parisc since the same problem could happen to parisc.
>
> On Tue, 04 Nov 2008 10:23:58 +1100
> Shehjar Tikoo <[email protected]> wrote:
>
>> I've been observing kernel panics for the past week on
>> kernel versions 2.6.26, 2.6.27 but not on 2.6.24 and 2.6.25.
>>
>> The panic message says:
>>
>> arch/ia64/hp/common/sba_iommu.c: I/O MMU is out of mapping resources
>>
>> Using git-bisect, I've zeroed in on the commit that introduced this.
>> Please see the attached file for the commit.
>>
>> The workload consists of 2 tests:
>> 1. Single fio process writing a 1 TB file.
>> 2. 15 fio processes writing 15GB files each.
>>
>> The panic happens on both workloads. There is no stack trace after
>> the above message.
>>
>> Other info:
>> System is HP RX6600(16Gb RAM, 16 processors w/ dual cores and HT)
>> 20 SATA disks under software RAID0 with 6 TB capacity.
>> Silicon Image 3124 controller.
>> File system is XFS.
>>
>> I'd much appreciate some help in fixing this because this panic has
>> basically stalled my own work. I'd be willing to run more tests on my
>> setup to test any patches that possibly fix this issue.
>
> This patch modified the sba IOMMU driver to support LLDs' segment
> boundary limits properly.
>
> ATA hardware has poor segment boundary limit, 64KB. In addition, sba
> IOMMU driver uses size-aligned allocation algorithm. It means that
> it's difficult for the IOMMU driver to find an appropriate I/O address
> space. I think that you hit the allocation failure due to this problem
> (of course, it's possible that my change breaks the IOMMU driver but I
> can't find a problem so far).
>
> To make matters worse, sba IOMMU driver panic when the allocation
> fails. IIRC, only IA64 and parisc IOMMU drivers panic by default in
> the case of the allocation failure. I think that we need to change
> them to handle the failure properly.
>
> Can you try this? I've not fixed map_single failure yet but I think
> that you hit the failure allocation in map_sg path.
>
On 2.6.27, this patch seems to prevent the panic from happening for
both the tests I had described earlier. Do you need more info to
validate this? I will be running more tests with this patch over
the next few days, so we'll find out anyway.
Thanks
Shehjar
>
> diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
> index d98f0f4..8f44dc8 100644
> --- a/arch/ia64/hp/common/sba_iommu.c
> +++ b/arch/ia64/hp/common/sba_iommu.c
> @@ -676,12 +676,19 @@ sba_alloc_range(struct ioc *ioc, struct device *dev, size_t size)
> spin_unlock_irqrestore(&ioc->saved_lock, flags);
>
> pide = sba_search_bitmap(ioc, dev, pages_needed, 0);
> - if (unlikely(pide >= (ioc->res_size << 3)))
> - panic(__FILE__ ": I/O MMU @ %p is out of mapping resources\n",
> - ioc->ioc_hpa);
> + if (unlikely(pide >= (ioc->res_size << 3))) {
> + printk(KERN_WARNING "%s: I/O MMU @ %p is"
> + "out of mapping resources, %u %u %lx\n",
> + __func__, ioc->ioc_hpa, ioc->res_size,
> + pages_needed, dma_get_seg_boundary(dev));
> + return -1;
> + }
> #else
> - panic(__FILE__ ": I/O MMU @ %p is out of mapping resources\n",
> - ioc->ioc_hpa);
> + printk(KERN_WARNING "%s: I/O MMU @ %p is"
> + "out of mapping resources, %u %u %lx\n",
> + __func__, ioc->ioc_hpa, ioc->res_size,
> + pages_needed, dma_get_seg_boundary(dev));
> + return -1;
> #endif
> }
> }
> @@ -962,6 +969,7 @@ sba_map_single_attrs(struct device *dev, void *addr, size_t size, int dir,
> #endif
>
> pide = sba_alloc_range(ioc, dev, size);
> + BUG_ON(pide < 0);
>
> iovp = (dma_addr_t) pide << iovp_shift;
>
> @@ -1304,6 +1312,7 @@ sba_coalesce_chunks(struct ioc *ioc, struct device *dev,
> unsigned long dma_offset, dma_len; /* start/len of DMA stream */
> int n_mappings = 0;
> unsigned int max_seg_size = dma_get_max_seg_size(dev);
> + int idx;
>
> while (nents > 0) {
> unsigned long vaddr = (unsigned long) sba_sg_address(startsg);
> @@ -1402,9 +1411,13 @@ sba_coalesce_chunks(struct ioc *ioc, struct device *dev,
> vcontig_sg->dma_length = vcontig_len;
> dma_len = (dma_len + dma_offset + ~iovp_mask) & iovp_mask;
> ASSERT(dma_len <= DMA_CHUNK_SIZE);
> - dma_sg->dma_address = (dma_addr_t) (PIDE_FLAG
> - | (sba_alloc_range(ioc, dev, dma_len) << iovp_shift)
> - | dma_offset);
> + idx = sba_alloc_range(ioc, dev, dma_len);
> + if (idx < 0) {
> + dma_sg->dma_length = 0;
> + return -1;
> + }
> + dma_sg->dma_address = (dma_addr_t)(PIDE_FLAG | (idx << iovp_shift)
> + | dma_offset);
> n_mappings++;
> }
>
> @@ -1476,6 +1489,10 @@ int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist, int nents,
> ** Access to the virtual address is what forces a two pass algorithm.
> */
> coalesced = sba_coalesce_chunks(ioc, dev, sglist, nents);
> + if (coalesced < 0) {
> + sba_unmap_sg_attrs(dev, sglist, nents, dir, attrs);
> + return 0;
> + }
>
> /*
> ** Program the I/O Pdir
Luck, Tony wrote:
> Added Cc: linux-ia64 ... more likely to attract attention of HP
> ia64 experts there.
>
>> arch/ia64/hp/common/sba_iommu.c: I/O MMU is out of mapping resources
>
> Odd ... the code (back to the dawn of git time in 2.6.12-rc1) looks like
>
> panic(__FILE__ ": I/O MMU @ %p is out of mapping resources\n"
> ioc->ioc_hpa);
>
> I wonder why you don't see the "@ HEXADDRESS"?
That was a copy-paste from memory. You're right, there is a hex address.
I've copied the full message at the end of this email.
>
>> Using git-bisect, I've zeroed in on the commit that introduced this.
>> Please see the attached file for the commit.
>
> Did you confirm that reverting this commit on a recent kernel
> fixes the problem (once in a while git bisect can point to
> the wrong commit ... it seems very likely that it got the
> right one here, but it is always good to check). When I
> tried to use "patch -R" to revert this it got confused on
> the Kconfig file because the lines that were added were
> subsequently changed ... so you may need to revert that
> by hand ... the sba_iommu.c apparently reverted ok).
Yes, reverting this commit in 2.6.27 prevents kernel panic on both
workloads.
>
>> Other info:
>> System is HP RX6600(16Gb RAM, 16 processors w/ dual cores and HT)
>> 20 SATA disks under software RAID0 with 6 TB capacity.
>> Silicon Image 3124 controller.
>> File system is XFS.
>
> My HP test system is way too small to attempt to recreate
> this (just 2 cpus & 1 disk). How long does each of your
> tests take to hit the problems ... a few minutes? Or hours?
The point at which the panic occurs varies for both tests, but
generally the panics seemed to occur nearer the end of the
750GB to 1TB writes.
>
>> I'd much appreciate some help in fixing this because this panic has
>> basically stalled my own work. I'd be willing to run more tests on my
>> setup to test any patches that possibly fix this issue.
>
> Adding some printk() before the panic might give a clue as to what
> is going wrong. Either a bogus call is trying to allocate far
> too much space, or the bitmap is leaking, or we have a totally
> messed up "ioc" structure.
>
> Printing "pages_needed" the address of "ioc" and some interesting
> fields from ioc (at least ioc->res_size) would help. I assume
> the the return value from sba_search_bitmap() is ~0x0 ... but
> you should print "pide" just to be sure.
Here's some more info from a printk:
Kernel panic - not syncing: arch/ia64/hp/common/sba_iommu.c: I/O MMU @
c0000000fed01000 is out of mapping resources: pide:
18446744073709551615, pages_needed: 5, iocres_size: 8192
>
> -Tony
On Thu, 06 Nov 2008 14:06:09 +1100
Shehjar Tikoo <[email protected]> wrote:
> FUJITA Tomonori wrote:
> > Sorry for the delay.
> >
> > CC'ed linux-parisc since the same problem could happen to parisc.
> >
> > On Tue, 04 Nov 2008 10:23:58 +1100
> > Shehjar Tikoo <[email protected]> wrote:
> >
> >> I've been observing kernel panics for the past week on
> >> kernel versions 2.6.26, 2.6.27 but not on 2.6.24 and 2.6.25.
> >>
> >> The panic message says:
> >>
> >> arch/ia64/hp/common/sba_iommu.c: I/O MMU is out of mapping resources
> >>
> >> Using git-bisect, I've zeroed in on the commit that introduced this.
> >> Please see the attached file for the commit.
> >>
> >> The workload consists of 2 tests:
> >> 1. Single fio process writing a 1 TB file.
> >> 2. 15 fio processes writing 15GB files each.
> >>
> >> The panic happens on both workloads. There is no stack trace after
> >> the above message.
> >>
> >> Other info:
> >> System is HP RX6600(16Gb RAM, 16 processors w/ dual cores and HT)
> >> 20 SATA disks under software RAID0 with 6 TB capacity.
> >> Silicon Image 3124 controller.
> >> File system is XFS.
> >>
> >> I'd much appreciate some help in fixing this because this panic has
> >> basically stalled my own work. I'd be willing to run more tests on my
> >> setup to test any patches that possibly fix this issue.
> >
> > This patch modified the sba IOMMU driver to support LLDs' segment
> > boundary limits properly.
> >
> > ATA hardware has poor segment boundary limit, 64KB. In addition, sba
> > IOMMU driver uses size-aligned allocation algorithm. It means that
> > it's difficult for the IOMMU driver to find an appropriate I/O address
> > space. I think that you hit the allocation failure due to this problem
> > (of course, it's possible that my change breaks the IOMMU driver but I
> > can't find a problem so far).
> >
> > To make matters worse, sba IOMMU driver panic when the allocation
> > fails. IIRC, only IA64 and parisc IOMMU drivers panic by default in
> > the case of the allocation failure. I think that we need to change
> > them to handle the failure properly.
> >
> > Can you try this? I've not fixed map_single failure yet but I think
> > that you hit the failure allocation in map_sg path.
> >
>
> On 2.6.27, this patch seems to prevent the panic from happening for
> both the tests I had described earlier.
Thanks!
> Do you need more info to
> validate this? I will be running more tests with this patch over
> the next few days, so we'll find out anyway.
Can you check that no data corruption happens during the tests?
Tony, is changing the sba IOMMU driver to return an error instead of
panicking in the case of an allocation failure fine with you?
> Can you check that no data corruption happens during the tests?
Very important!!!
> Tony, is changing the sba IOMMU driver to return an error instead of
> panicking in the case of an allocation failure fine with you?
This is fine ... but we do need to audit the callers to make
sure that they check for and handle this new error.
-Tony
On Fri, 7 Nov 2008 08:58:28 -0800
"Luck, Tony" <[email protected]> wrote:
> > Tony, is changing the sba IOMMU driver to return an error instead of
> > panicking in the case of an allocation failure fine with you?
>
> This is fine ... but we do need to audit the callers to make
> sure that they check for and handle this new error.
Well, this is the issue that has been discussed several times in the
past... Most SCSI drivers are fine; they can handle IOMMU mapping
failure properly, or they panic. But there are some network drivers
that don't even check for the failure. Fixing those network drivers
has been on my todo list... But as I said before, we have ignored this
problem so far; only swiotlb and SBA panic on IOMMU mapping failure.