2011-03-29 23:36:08

by Mike Travis

Subject: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges

dmar_init_reserved_ranges() reserves the card's MMIO ranges to
prevent handing out a DMA map that would overlap with the MMIO range.
The problem is that while the Nvidia GPU has 64bit BARs, it's capable of
receiving > 40bit PIOs, but it can't generate > 40bit DMAs.

So when the iommu code reserves these MMIO ranges, a > 40bit
entry ends up getting into the rbtree. On a UV test system with
the Nvidia cards, the BARs are:

0001:36:00.0 VGA compatible controller: nVidia Corporation GT200GL
Region 0: Memory at 92000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]

So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
maps get added to and deleted from the rbtree, we can end up with a cached
pointer to this 0xf8200000000 entry... this is what results in the code
handing out the invalid DMA map of 0xf81fffff000:

[ (0xf8200000000-1) >> PAGE_SHIFT << PAGE_SHIFT ]

The IOVA code needs to better honor the "limit_pfn" when allocating
these maps.
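
For reference, here is the address arithmetic as a small userspace sketch
(a sketch only, assuming 4K pages; the BAR value is the Region 1 base from
the lspci output above):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                           /* assuming 4K pages */

int main(void)
{
        uint64_t bar = 0xf8200000000ULL;        /* Region 1 base above */
        uint64_t iova = ((bar - 1) >> PAGE_SHIFT) << PAGE_SHIFT;

        /* prints 0xf81fffff000, well above a 40bit dma_mask */
        printf("handed-out DMA map: %#llx\n", (unsigned long long)iova);
        return 0;
}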

Signed-off-by: Mike Travis <[email protected]>
Reviewed-by: Mike Habeck <[email protected]>
---
drivers/pci/intel-iommu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- linux.orig/drivers/pci/intel-iommu.c
+++ linux/drivers/pci/intel-iommu.c
@@ -1323,7 +1323,8 @@ static void dmar_init_reserved_ranges(vo

for (i = 0; i < PCI_NUM_RESOURCES; i++) {
r = &pdev->resource[i];
- if (!r->flags || !(r->flags & IORESOURCE_MEM))
+ if (!r->flags || !(r->flags & IORESOURCE_MEM) ||
+ r->start > pdev->dma_mask)
continue;
iova = reserve_iova(&reserved_iova_list,
IOVA_PFN(r->start),

--


2011-03-31 22:11:52

by Mike Travis

Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges

Chris - did you have any comment on this patch?

David/Jesse - should I resubmit the entire series for consideration? We
have a number of customers that have encountered this problem.

Thanks,
Mike

Mike Travis wrote:
> dmar_init_reserved_ranges() reserves the card's MMIO ranges to
> prevent handing out a DMA map that would overlap with the MMIO range.
> The problem while the Nvidia GPU has 64bit BARs, it's capable of
> receiving > 40bit PIOs, but can't generate > 40bit DMAs.
>
> So when the iommu code reserves these MMIO ranges a > 40bit
> entry ends up getting in the rbtree. On a UV test system with
> the Nvidia cards, the BARs are:
>
> 0001:36:00.0 VGA compatible controller: nVidia Corporation GT200GL
> Region 0: Memory at 92000000 (32-bit, non-prefetchable) [size=16M]
> Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
> Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
>
> So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
> maps get added and deleted from the rbtree we can end up getting a cached
> entry to this 0xf8200000000 entry... this is what results in the code
> handing out the invalid DMA map of 0xf81fffff000:
>
> [ 0xf8200000000-1 >> PAGE_SIZE << PAGE_SIZE ]
>
> The IOVA code needs to better honor the "limit_pfn" when allocating
> these maps.
>
> Signed-off-by: Mike Travis <[email protected]>
> Reviewed-by: Mike Habeck <[email protected]>
> ---
> drivers/pci/intel-iommu.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- linux.orig/drivers/pci/intel-iommu.c
> +++ linux/drivers/pci/intel-iommu.c
> @@ -1323,7 +1323,8 @@ static void dmar_init_reserved_ranges(vo
>
> for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> r = &pdev->resource[i];
> - if (!r->flags || !(r->flags & IORESOURCE_MEM))
> + if (!r->flags || !(r->flags & IORESOURCE_MEM) ||
> + r->start > pdev->dma_mask)
> continue;
> iova = reserve_iova(&reserved_iova_list,
> IOVA_PFN(r->start),
>

2011-03-31 22:54:07

by Chris Wright

Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges

* Mike Travis ([email protected]) wrote:
> Chris - did you have any comment on this patch?

It doesn't actually look right to me. It means that particular range
is no longer reserved. But perhaps I've misunderstood something.

> Mike Travis wrote:
> > dmar_init_reserved_ranges() reserves the card's MMIO ranges to
> > prevent handing out a DMA map that would overlap with the MMIO range.
> > The problem while the Nvidia GPU has 64bit BARs, it's capable of
> > receiving > 40bit PIOs, but can't generate > 40bit DMAs.

I don't understand what you mean here.

> > So when the iommu code reserves these MMIO ranges a > 40bit
> > entry ends up getting in the rbtree. On a UV test system with
> > the Nvidia cards, the BARs are:
> >
> > 0001:36:00.0 VGA compatible controller: nVidia Corporation
> >GT200GL Region 0: Memory at 92000000 (32-bit, non-prefetchable)
> >[size=16M]
> > Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
> > Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
> >
> > So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
> > maps get added and deleted from the rbtree we can end up getting a cached
> > entry to this 0xf8200000000 entry... this is what results in the code
> > handing out the invalid DMA map of 0xf81fffff000:
> >
> > [ 0xf8200000000-1 >> PAGE_SIZE << PAGE_SIZE ]
> >
> > The IOVA code needs to better honor the "limit_pfn" when allocating
> > these maps.

This means we could get the MMIO address range (it's no longer reserved).
It seems to me the DMA transaction would then become a peer-to-peer
transaction if ACS is not enabled, which could show up as a random register
write in that GPU's 256M BAR (i.e. broken).

The iova allocation should not hand out an address bigger than the
dma_mask. What is the device's dma_mask?

thanks,
-chris

2011-03-31 23:25:11

by Mike Travis

Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges

I'll probably need help from our Hardware PCI Engineer to explain
this further, though here's a pointer to an earlier email thread:

http://marc.info/?l=linux-kernel&m=129259816925973&w=2

I'll also dig out the specs you're asking for.

Thanks,
Mike

Chris Wright wrote:
> * Mike Travis ([email protected]) wrote:
>> Chris - did you have any comment on this patch?
>
> It doesn't actually look right to me. It means that particular range
> is no longer reserved. But perhaps I've misunderstood something.
>
>> Mike Travis wrote:
>>> dmar_init_reserved_ranges() reserves the card's MMIO ranges to
>>> prevent handing out a DMA map that would overlap with the MMIO range.
>>> The problem while the Nvidia GPU has 64bit BARs, it's capable of
>>> receiving > 40bit PIOs, but can't generate > 40bit DMAs.
>
> I don't undertand what you mean here.
>
>>> So when the iommu code reserves these MMIO ranges a > 40bit
>>> entry ends up getting in the rbtree. On a UV test system with
>>> the Nvidia cards, the BARs are:
>>>
>>> 0001:36:00.0 VGA compatible controller: nVidia Corporation
>>> GT200GL Region 0: Memory at 92000000 (32-bit, non-prefetchable)
>>> [size=16M]
>>> Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
>>> Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
>>>
>>> So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
>>> maps get added and deleted from the rbtree we can end up getting a cached
>>> entry to this 0xf8200000000 entry... this is what results in the code
>>> handing out the invalid DMA map of 0xf81fffff000:
>>>
>>> [ 0xf8200000000-1 >> PAGE_SIZE << PAGE_SIZE ]
>>>
>>> The IOVA code needs to better honor the "limit_pfn" when allocating
>>> these maps.
>
> This means we could get the MMIO address range (it's no longer reserved).
> It seems to me the DMA transaction would then become a peer to peer
> transaction if ACS is not enabled, which could show up as random register
> write in that GPUs 256M BAR (i.e. broken).
>
> The iova allocation should not hand out an address bigger than the
> dma_mask. What is the device's dma_mask?
>
> thanks,
> -chris

2011-03-31 23:39:15

by Chris Wright

Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges

* Chris Wright ([email protected]) wrote:
> > Mike Travis wrote:
> > > Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
> > > Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
> > >
> > > So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
> > > maps get added and deleted from the rbtree we can end up getting a cached
> > > entry to this 0xf8200000000 entry... this is what results in the code
> > > handing out the invalid DMA map of 0xf81fffff000:
> > >
> > > [ 0xf8200000000-1 >> PAGE_SIZE << PAGE_SIZE ]
> > >
> > > The IOVA code needs to better honor the "limit_pfn" when allocating
> > > these maps.
>
> This means we could get the MMIO address range (it's no longer reserved).
> It seems to me the DMA transaction would then become a peer to peer
> transaction if ACS is not enabled, which could show up as random register
> write in that GPUs 256M BAR (i.e. broken).
>
> The iova allocation should not hand out an address bigger than the
> dma_mask. What is the device's dma_mask?

Ah, looks like this is a bad interaction with the way the cached entry
is handled. I think the iova lookup should skip down to the limit_pfn
rather than assume that rb_last's pfn_lo/hi is ok just because it's in
the tree. Because you'll never hit the limit_pfn == 32bit_pfn case, it
just goes straight to rb_last in __get_cached_rbnode.
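
For reference, the cache lookup in drivers/pci/iova.c looks roughly like
this (paraphrased sketch of the current code, not a verbatim copy):

static struct rb_node *
__get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
{
        /* any limit_pfn other than the 32bit boundary bypasses the cache */
        if ((*limit_pfn != iovad->dma_32bit_pfn) ||
            (iovad->cached32_node == NULL))
                return rb_last(&iovad->rbroot); /* topmost node, e.g. the
                                                   reserved 44bit MMIO range */
        else {
                struct rb_node *prev_node = rb_prev(iovad->cached32_node);
                struct iova *curr_iova =
                        container_of(iovad->cached32_node, struct iova, node);
                *limit_pfn = curr_iova->pfn_lo - 1;
                return prev_node;
        }
}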

2011-03-31 23:43:00

by Mike Habeck

Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges

On 03/31/2011 06:25 PM, Mike Travis wrote:
> I'll probably need help from our Hardware PCI Engineer to help explain
> this further, though here's a pointer to an earlier email thread:
>
> http://marc.info/?l=linux-kernel&m=129259816925973&w=2
>
> I'll also dig out the specs you're asking for.
>
> Thanks,
> Mike
>
> Chris Wright wrote:
>> * Mike Travis ([email protected]) wrote:
>>> Chris - did you have any comment on this patch?
>>
>> It doesn't actually look right to me. It means that particular range
>> is no longer reserved. But perhaps I've misunderstood something.
>>
>>> Mike Travis wrote:
>>>> dmar_init_reserved_ranges() reserves the card's MMIO ranges to
>>>> prevent handing out a DMA map that would overlap with the MMIO range.
>>>> The problem while the Nvidia GPU has 64bit BARs, it's capable of
>>>> receiving > 40bit PIOs, but can't generate > 40bit DMAs.
>>
>> I don't undertand what you mean here.

What Mike is getting at is that there is no reason to reserve the MMIO
range if it's greater than the dma_mask, given the MMIO range is
outside of what the IOVA code will ever hand back to the IOMMU
code. In this case the nVidia card has a 64bit BAR and is assigned
the MMIO range [0xf8200000000 - 0xf820fffffff]. But the Nvidia
card can only generate a 40bit DMA (and thus has a 40bit dma_mask). If
the IOVA code honors the limit_pfn (i.e., dma_mask) passed in, it
will never hand a >40bit address back to the IOMMU code. Thus
there is no reason to reserve the card's MMIO range if it is greater
than the dma_mask. (And that is what the patch is doing.)
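
To put numbers on it, a quick sketch of the pfn comparison (assuming 4K
pages and the IOVA_PFN() definition used by intel-iommu):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT      12                      /* assuming 4K pages */
#define IOVA_PFN(addr)  ((addr) >> PAGE_SHIFT)

int main(void)
{
        uint64_t dma_mask = (1ULL << 40) - 1;   /* 40bit dma_mask */
        uint64_t bar = 0xf8200000000ULL;        /* the card's 64bit BAR */

        /* limit_pfn = 0xfffffff, BAR pfn = 0xf8200000: the reserved range
           sits far above anything alloc_iova() should return here */
        printf("limit_pfn = %#llx, BAR pfn = %#llx\n",
               (unsigned long long)IOVA_PFN(dma_mask),
               (unsigned long long)IOVA_PFN(bar));
        return 0;
}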

More below...

>>
>>>> So when the iommu code reserves these MMIO ranges a > 40bit
>>>> entry ends up getting in the rbtree. On a UV test system with
>>>> the Nvidia cards, the BARs are:
>>>>
>>>> 0001:36:00.0 VGA compatible controller: nVidia Corporation
>>>> GT200GL Region 0: Memory at 92000000 (32-bit, non-prefetchable)
>>>> [size=16M]
>>>> Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
>>>> Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
>>>>
>>>> So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
>>>> maps get added and deleted from the rbtree we can end up getting a cached
>>>> entry to this 0xf8200000000 entry... this is what results in the code
>>>> handing out the invalid DMA map of 0xf81fffff000:
>>>>
>>>> [ 0xf8200000000-1 >> PAGE_SIZE << PAGE_SIZE ]
>>>>
>>>> The IOVA code needs to better honor the "limit_pfn" when allocating
>>>> these maps.
>>
>> This means we could get the MMIO address range (it's no longer reserved).

Not true: the MMIO address is greater than the dma_mask (i.e., the
limit_pfn passed into alloc_iova()), thus the IOVA code will never
hand back that address range, given it's greater than the dma_mask.

>> It seems to me the DMA transaction would then become a peer to peer
>> transaction if ACS is not enabled, which could show up as random register
>> write in that GPUs 256M BAR (i.e. broken).
>>
>> The iova allocation should not hand out an address bigger than the
>> dma_mask. What is the device's dma_mask?

Agree. But there is a bug. The IOVA code doesn't validate the limit_pfn
if it uses the cached entry. One could argue that it should validate
the limit_pfn, but then again an entry outside the limit_pfn should
never have gotten into the rbtree... (it got in due to the IOMMU's
dmar_init_reserved_ranges() adding it).

-mike

>>
>> thanks,
>> -chris

2011-03-31 23:57:10

by Chris Wright

Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges

* Mike Habeck ([email protected]) wrote:
> On 03/31/2011 06:25 PM, Mike Travis wrote:
> >I'll probably need help from our Hardware PCI Engineer to help explain
> >this further, though here's a pointer to an earlier email thread:
> >
> >http://marc.info/?l=linux-kernel&m=129259816925973&w=2
> >
> >I'll also dig out the specs you're asking for.
> >
> >Thanks,
> >Mike
> >
> >Chris Wright wrote:
> >>* Mike Travis ([email protected]) wrote:
> >>>Chris - did you have any comment on this patch?
> >>
> >>It doesn't actually look right to me. It means that particular range
> >>is no longer reserved. But perhaps I've misunderstood something.
> >>
> >>>Mike Travis wrote:
> >>>>dmar_init_reserved_ranges() reserves the card's MMIO ranges to
> >>>>prevent handing out a DMA map that would overlap with the MMIO range.
> >>>>The problem while the Nvidia GPU has 64bit BARs, it's capable of
> >>>>receiving > 40bit PIOs, but can't generate > 40bit DMAs.
> >>
> >>I don't undertand what you mean here.
>
> What Mike is getting at is there is no reason to reserve the MMIO
> range if it's greater than the dma_mask, given the MMIO range is
> outside of what the IOVA code will ever hand back to the IOMMU
> code. In this case the nVidia card has a 64bit BAR and is assigned
> the MMIO range [0xf8200000000 - 0xf820fffffff]. But the Nvidia
> card can only generate a 40bit DMA (thus has a 40bit dma_mask). If
> the IOVA code honors the limit_pfn (i.e., dma_mask) passed in it
> will never hand back a >40bit address back to the IOMMU code. Thus
> there is no reason to reserve the cards MMIO range if it is greater
> than the dma_mask. (And that is what the patch is doing).

The reserved ranges are for all devices. Another device with a 64bit
dma_mask could get that region if it's not properly reserved. The
driver would then program that device to dma to an address that is an
alias of an MMIO region. The memory transaction travels up towards the
root... and sees the MMIO range in some bridge and would go straight down
to the GPU.

> More below,,,
>
> >>
> >>>>So when the iommu code reserves these MMIO ranges a > 40bit
> >>>>entry ends up getting in the rbtree. On a UV test system with
> >>>>the Nvidia cards, the BARs are:
> >>>>
> >>>>0001:36:00.0 VGA compatible controller: nVidia Corporation
> >>>>GT200GL Region 0: Memory at 92000000 (32-bit, non-prefetchable)
> >>>>[size=16M]
> >>>>Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
> >>>>Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
> >>>>
> >>>>So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
> >>>>maps get added and deleted from the rbtree we can end up getting a cached
> >>>>entry to this 0xf8200000000 entry... this is what results in the code
> >>>>handing out the invalid DMA map of 0xf81fffff000:
> >>>>
> >>>>[ 0xf8200000000-1 >> PAGE_SIZE << PAGE_SIZE ]
> >>>>
> >>>>The IOVA code needs to better honor the "limit_pfn" when allocating
> >>>>these maps.
> >>
> >>This means we could get the MMIO address range (it's no longer reserved).
>
> Not true, the MMIO address is greater than the dma_mask (i.e., the
> limit_pfn passed into alloc_iova()) thus the IOVA code will never
> hand back that address range given it's greater than the dma_mask).

Well, as you guys are seeing, the iova allocation code is making the
assumption that if the range is in the tree, it's valid. And it is
handing out an address that's too large.

> >>It seems to me the DMA transaction would then become a peer to peer
> >>transaction if ACS is not enabled, which could show up as random register
> >>write in that GPUs 256M BAR (i.e. broken).
> >>
> >>The iova allocation should not hand out an address bigger than the
> >>dma_mask. What is the device's dma_mask?
>
> Agree. But there is a bug. The IOVA doesn't validate the limit_pfn
> if it uses the cached entry. One could argue that it should validate
> the limit_pfn, but then again a entry outside the limit_pfn should
> have never got into the rbtree... (it got in due to the IOMMU's
> dmar_init_reserved_ranges() adding it).

Yeah, I think it needs to be in the global reserved list. But perhaps
not copied into the domain specific iova. Or simply skipped on iova
allocation (don't just assume rb_last is <= dma_mask).

thanks,
-chris

2011-04-01 01:07:26

by Mike Habeck

Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges



Chris Wright wrote:
> * Mike Habeck ([email protected]) wrote:
>> On 03/31/2011 06:25 PM, Mike Travis wrote:
>>> I'll probably need help from our Hardware PCI Engineer to help explain
>>> this further, though here's a pointer to an earlier email thread:
>>>
>>> http://marc.info/?l=linux-kernel&m=129259816925973&w=2
>>>
>>> I'll also dig out the specs you're asking for.
>>>
>>> Thanks,
>>> Mike
>>>
>>> Chris Wright wrote:
>>>> * Mike Travis ([email protected]) wrote:
>>>>> Chris - did you have any comment on this patch?
>>>> It doesn't actually look right to me. It means that particular range
>>>> is no longer reserved. But perhaps I've misunderstood something.
>>>>
>>>>> Mike Travis wrote:
>>>>>> dmar_init_reserved_ranges() reserves the card's MMIO ranges to
>>>>>> prevent handing out a DMA map that would overlap with the MMIO range.
>>>>>> The problem while the Nvidia GPU has 64bit BARs, it's capable of
>>>>>> receiving > 40bit PIOs, but can't generate > 40bit DMAs.
>>>> I don't undertand what you mean here.
>> What Mike is getting at is there is no reason to reserve the MMIO
>> range if it's greater than the dma_mask, given the MMIO range is
>> outside of what the IOVA code will ever hand back to the IOMMU
>> code. In this case the nVidia card has a 64bit BAR and is assigned
>> the MMIO range [0xf8200000000 - 0xf820fffffff]. But the Nvidia
>> card can only generate a 40bit DMA (thus has a 40bit dma_mask). If
>> the IOVA code honors the limit_pfn (i.e., dma_mask) passed in it
>> will never hand back a >40bit address back to the IOMMU code. Thus
>> there is no reason to reserve the cards MMIO range if it is greater
>> than the dma_mask. (And that is what the patch is doing).
>
> The reserved ranges are for all devices. Another device with a 64bit
> dma_mask could get that region if it's not properly reserved. The
> driver would then program that device to dma to an address to is an
> alias to a MMIO region. The memory transaction travels up towards
> root...and sees the MMIO range in some bridge and would go straight down
> to the GPU.

Chris,

OK, I understand now what you meant by the patch possibly causing
the DMA transaction to become a peer to peer transaction. Mike and
I will have to rethink this one. Thanks for your input.

-mike


>
>> More below,,,
>>
>>>>>> So when the iommu code reserves these MMIO ranges a > 40bit
>>>>>> entry ends up getting in the rbtree. On a UV test system with
>>>>>> the Nvidia cards, the BARs are:
>>>>>>
>>>>>> 0001:36:00.0 VGA compatible controller: nVidia Corporation
>>>>>> GT200GL Region 0: Memory at 92000000 (32-bit, non-prefetchable)
>>>>>> [size=16M]
>>>>>> Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
>>>>>> Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
>>>>>>
>>>>>> So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
>>>>>> maps get added and deleted from the rbtree we can end up getting a cached
>>>>>> entry to this 0xf8200000000 entry... this is what results in the code
>>>>>> handing out the invalid DMA map of 0xf81fffff000:
>>>>>>
>>>>>> [ 0xf8200000000-1 >> PAGE_SIZE << PAGE_SIZE ]
>>>>>>
>>>>>> The IOVA code needs to better honor the "limit_pfn" when allocating
>>>>>> these maps.
>>>> This means we could get the MMIO address range (it's no longer reserved).
>> Not true, the MMIO address is greater than the dma_mask (i.e., the
>> limit_pfn passed into alloc_iova()) thus the IOVA code will never
>> hand back that address range given it's greater than the dma_mask).
>
> Well, as you guys are seeing, the iova allocation code is making the
> assumption that if the range is in the tree, it's valid. And it is
> handing out an address that's too large.
>
>>>> It seems to me the DMA transaction would then become a peer to peer
>>>> transaction if ACS is not enabled, which could show up as random register
>>>> write in that GPUs 256M BAR (i.e. broken).
>>>>
>>>> The iova allocation should not hand out an address bigger than the
>>>> dma_mask. What is the device's dma_mask?
>> Agree. But there is a bug. The IOVA doesn't validate the limit_pfn
>> if it uses the cached entry. One could argue that it should validate
>> the limit_pfn, but then again a entry outside the limit_pfn should
>> have never got into the rbtree... (it got in due to the IOMMU's
>> dmar_init_reserved_ranges() adding it).
>
> Yeah, I think it needs to be in the global reserved list. But perhaps
> not copied into the domain specific iova. Or simply skipped on iova
> allocation (don't just assume rb_last is <= dma_mask).
>
> thanks,
> -chris

2011-04-02 00:33:20

by Chris Wright

Subject: [PATCH 3/4 v2] intel-iommu: don't cache iova above 32bit caching boundary

Mike Travis and Mike Habeck reported an issue where iova allocation
would return a range that was larger than a device's dma mask.

https://lkml.org/lkml/2011/3/29/423

The dmar initialization code will reserve all PCI MMIO regions and copy
those reservations into a domain specific iova tree. It is possible for
one of those regions to be above the dma mask of a device. It is typical
to allocate iovas with a 32bit mask (despite the device's dma mask possibly
being larger) and cache the result until it exhausts the lower 32bit
address space. Freeing the iova range that is >= the last iova in the
lower 32bit range when there is still an iova above the 32bit range will
corrupt the cached iova by pointing it to a region that is above 32bit.
If that region is also larger than the device's dma mask, a subsequent
allocation will return an unusable iova and cause dma failure.

Simply don't cache an iova that is above the 32bit caching boundary.

Reported-by: Mike Travis <[email protected]>
Reported-by: Mike Habeck <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]
Signed-off-by: Chris Wright <[email protected]>
---
Mike or Mike, can you try this? I was able to reproduce the failure
in a few different ways and successfully test this patch against those
failures, but w/out real hw.
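
As an illustration of the sequence described above, a minimal userspace
model of the old caching rule (hypothetical pfn values; 0x100000 is the
32bit boundary with 4K pages):

#include <stdio.h>

int main(void)
{
        unsigned long dma_32bit_pfn = 0x100000;   /* 32bit boundary, 4K pages */
        unsigned long reserved_pfn  = 0xf8200000; /* reserved 44bit MMIO range */
        unsigned long cached_pfn    = 0xffffe;    /* cached32_node, last alloc below 4G */
        unsigned long freed_pfn     = 0xfffff;    /* iova being freed */

        /* old rule: on free, blindly advance the cache to the next node in
           the tree, here the reserved node above the 32bit boundary */
        if (freed_pfn >= cached_pfn)
                cached_pfn = reserved_pfn;

        printf("cached32_node now at pfn %#lx (%s the 32bit boundary)\n",
               cached_pfn, cached_pfn < dma_32bit_pfn ? "below" : "above");
        return 0;
}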

drivers/pci/iova.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c
index 7914951..1690ca4 100644
--- a/drivers/pci/iova.c
+++ b/drivers/pci/iova.c
@@ -63,8 +63,16 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
curr = iovad->cached32_node;
cached_iova = container_of(curr, struct iova, node);

- if (free->pfn_lo >= cached_iova->pfn_lo)
- iovad->cached32_node = rb_next(&free->node);
+ if (free->pfn_lo >= cached_iova->pfn_lo) {
+ struct rb_node *node = rb_next(&free->node);
+ struct iova *iova = container_of(node, struct iova, node);
+
+ /* only cache if it's below 32bit pfn */
+ if (iova->pfn_lo < iovad->dma_32bit_pfn)
+ iovad->cached32_node = node;
+ else
+ iovad->cached32_node = NULL;
+ }
}

/* Computes the padding size required, to make the
--
1.7.4

2011-04-06 00:40:09

by Chris Wright

Subject: [PATCH 3/4 v3] intel-iommu: don't cache iova above 32bit caching boundary

Mike Travis and Mike Habeck reported an issue where iova allocation
would return a range that was larger than a device's dma mask.

https://lkml.org/lkml/2011/3/29/423

The dmar initialization code will reserve all PCI MMIO regions and copy
those reservations into a domain specific iova tree. It is possible for
one of those regions to be above the dma mask of a device. It is typical
to allocate iovas with a 32bit mask (despite the device's dma mask possibly
being larger) and cache the result until it exhausts the lower 32bit
address space. Freeing the iova range that is >= the last iova in the
lower 32bit range when there is still an iova above the 32bit range will
corrupt the cached iova by pointing it to a region that is above 32bit.
If that region is also larger than the device's dma mask, a subsequent
allocation will return an unusable iova and cause dma failure.

Simply don't cache an iova that is above the 32bit caching boundary.

Reported-by: Mike Travis <[email protected]>
Reported-by: Mike Habeck <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]
Acked-by: Mike Travis <[email protected]>
Tested-by: Mike Habeck <[email protected]>
Signed-off-by: Chris Wright <[email protected]>
---

v3: rb_next() can return NULL, found when testing on my hw

David, Mike Travis will collect and resubmit the full series when he's back.

drivers/pci/iova.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c
index 7914951..10f995a 100644
--- a/drivers/pci/iova.c
+++ b/drivers/pci/iova.c
@@ -63,8 +63,16 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
curr = iovad->cached32_node;
cached_iova = container_of(curr, struct iova, node);

- if (free->pfn_lo >= cached_iova->pfn_lo)
- iovad->cached32_node = rb_next(&free->node);
+ if (free->pfn_lo >= cached_iova->pfn_lo) {
+ struct rb_node *node = rb_next(&free->node);
+ struct iova *iova = container_of(node, struct iova, node);
+
+ /* only cache if it's below 32bit pfn */
+ if (node && iova->pfn_lo < iovad->dma_32bit_pfn)
+ iovad->cached32_node = node;
+ else
+ iovad->cached32_node = NULL;
+ }
}

/* Computes the padding size required, to make the
--
1.7.4