After 19943b0e (intel-iommu: Unify hardware and software passthrough
support) hardware pass-through mode devices make use of intel_dma_ops
rather than swiotlb_dma_ops. The problem is that intel_alloc_coherent
ignores the device coherent_dma_mask when allocating the page since it
expects to remap the page and provide the device with an iova within the
coherent mask. This breaks when we use pass-through.
The patch below crudely works around the problem, but I hope we can come
up with something better without reintroducing the dependency on
swiotlb. The device hitting this problem is an HP smart array
controller on a Proliant G6 system. It uses a default 32bit coherent
DMA mask, and stalls, presumably waiting on control data to change in
the wrong address space, when it gets a coherent buffer above 4G. This
device also doesn't exactly play nice when using VT-d in anything other
than pass-through mode, so switching it into mapped mode is not really
an option.
Signed-off-by: Alex Williamson <[email protected]>
---
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index b1e97e6..7a739ac 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2767,7 +2767,11 @@ static void *intel_alloc_coherent(struct device *hwdev, size_t size,
size = PAGE_ALIGN(size);
order = get_order(size);
- flags &= ~(GFP_DMA | GFP_DMA32);
+
+ if (iommu_no_mapping(hwdev))
+ flags |= (GFP_DMA | GFP_DMA32);
+ else
+ flags &= ~(GFP_DMA | GFP_DMA32);
vaddr = (void *)__get_free_pages(flags, order);
if (!vaddr)
On Wed, 2009-10-21 at 21:41 -0600, Alex Williamson wrote:
> After 19943b0e (intel-iommu: Unify hardware and software passthrough
> support) hardware pass-through mode devices make use of intel_dma_ops
> rather than swiotlb_dma_ops. The problem is that intel_alloc_coherent
> ignores the device coherent_dma_mask when allocating the page since it
> expects to remap the page and provide the device with an iova within the
> coherent mask. This breaks when we use pass-through.
>
> The patch below crudely works around the problem, but I hope we can come
> up with something better without reintroducing the dependency on
> swiotlb. The device hitting this problem is an HP smart array
> controller on a Proliant G6 system. It uses a default 32bit coherent
> DMA mask, and stalls, presumably waiting on control data to change in
> the wrong address space, when it gets a coherent buffer above 4G. This
> device also doesn't exactly play nice when using VT-d in anything other
> than pass-through mode, so switching it into mapped mode is not really
> an option.
I don't understand. If your device can't cope with 64-bit addresses,
then surely iommu_no_mapping() should return _false_ for it. And we'll
actually use the IOMMU even though we're generally in passthrough mode.
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
On Thu, 2009-10-22 at 15:28 +0900, David Woodhouse wrote:
> On Wed, 2009-10-21 at 21:41 -0600, Alex Williamson wrote:
> > After 19943b0e (intel-iommu: Unify hardware and software passthrough
> > support) hardware pass-through mode devices make use of intel_dma_ops
> > rather than swiotlb_dma_ops. The problem is that intel_alloc_coherent
> > ignores the device coherent_dma_mask when allocating the page since it
> > expects to remap the page and provide the device with an iova within the
> > coherent mask. This breaks when we use pass-through.
> >
> > The patch below crudely works around the problem, but I hope we can come
> > up with something better without reintroducing the dependency on
> > swiotlb. The device hitting this problem is an HP smart array
> > controller on a Proliant G6 system. It uses a default 32bit coherent
> > DMA mask, and stalls, presumably waiting on control data to change in
> > the wrong address space, when it gets a coherent buffer above 4G. This
> > device also doesn't exactly play nice when using VT-d in anything other
> > than pass-through mode, so switching it into mapped mode is not really
> > an option.
>
> I don't understand. If your device can't cope with 64-bit addresses,
> then surely iommu_no_mapping() should return _false_ for it. And we'll
> actually use the IOMMU even though we're generally in passthrough mode.
The coherent_dma_mask is independent of the dma_mask and can be set
separately by the device. The default for any device that doesn't
specify one is 32bits. iommu_should_identity_map() only checks the
dma_mask, not the coherent_dma_mask. We never see "32bit %s uses
non-identity mapping" for this device, and I suspect if we did it also
wouldn't work because of the mentioned unfriendliness in mapped mode.
Since coherent mappings are generally a slow path, I think we should be
making more effort to get a usable buffer before dumping the device out
of passthrough mode. BTW, we skip RMRR setup when doing hardware
pass-through, but I can't find where they get reloaded if we then end up
removing the device from the si_domain. Is this another issue? Thanks,
Alex
On Thu, 2009-10-22 at 06:24 -0600, Alex Williamson wrote:
> The coherent_dma_mask is independent of the dma_mask and can be set
> separately by the device. The default for any device that doesn't
> specify one is 32bits. iommu_should_identity_map() only checks the
> dma_mask, not the coherent_dma_mask.
Are you telling me that this particular device supports only a 32-bit
coherent DMA mask, but that it _does_ support a 64-bit DMA mask for
non-coherent DMA? On x86?
> BTW, we skip RMRR setup when doing hardware pass-through, but I can't
> find where they get reloaded if we then end up removing the device
> from the si_domain. Is this another issue?
Maybe, theoretically. In practice, the whole RMRR thing is just broken
by design anyway. We have to quiesce the offending devices before we
turn on the IOMMU, because BIOSes tend to leave things out of the RMRR
table... and then crash in SMM mode when their DMA goes AWOL.
--
dwmw2
On Thu, 22 Oct 2009 23:47:47 +0900
David Woodhouse <[email protected]> wrote:
> On Thu, 2009-10-22 at 06:24 -0600, Alex Williamson wrote:
> > The coherent_dma_mask is independent of the dma_mask and can be set
> > separately by the device. The default for any device that doesn't
> > specify one is 32bits. iommu_should_identity_map() only checks the
> > dma_mask, not the coherent_dma_mask.
>
> Are you telling me that this particular device supports only a 32-bit
> coherent DMA mask, but that it _does_ support a 64-bit DMA mask for
> non-coherent DMA? On x86?
This is not related to the architecture. This card supports 64-bit
addresses but doesn't support 64-bit addresses for descriptors, etc.
There is plenty of hardware with this kind of limit.
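(For illustration only: a driver for such hardware would typically set the two
masks independently at probe time, roughly like the hypothetical fragment below.
The function name is made up; pci_set_dma_mask() covers streaming DMA and
pci_set_consistent_dma_mask() covers coherent allocations such as descriptor
rings.)

#include <linux/pci.h>
#include <linux/dma-mapping.h>

/* Hypothetical probe-time fragment, not taken from any real driver:
 * the controller can stream data to any 64-bit address, but its
 * descriptor/control structures must stay below 4G, so the streaming
 * and coherent masks are set separately. */
static int example_set_dma_masks(struct pci_dev *pdev)
{
	int err;

	/* Streaming DMA (data buffers) may go anywhere in 64-bit space. */
	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
	if (err)
		return err;

	/* Coherent allocations (descriptors, control data) must stay below 4G. */
	return pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
}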
On Thu, 2009-10-22 at 23:47 +0900, David Woodhouse wrote:
> On Thu, 2009-10-22 at 06:24 -0600, Alex Williamson wrote:
> > The coherent_dma_mask is independent of the dma_mask and can be set
> > separately by the device. The default for any device that doesn't
> > specify one is 32bits. iommu_should_identity_map() only checks the
> > dma_mask, not the coherent_dma_mask.
>
> Are you telling me that this particular device supports only a 32-bit
> coherent DMA mask, but that it _does_ support a 64-bit DMA mask for
> non-coherent DMA? On x86?
Yes, yes. AFAIK, this is not that exceptional.
> > BTW, we skip RMRR setup when doing hardware pass-through, but I can't
> > find where they get reloaded if we then end up removing the device
> > from the si_domain. Is this another issue?
>
> Maybe, theoretically. In practice, the whole RMRR thing is just broken
> by design anyway. We have to quiesce the offending devices before we
> turn on the IOMMU, because BIOSes tend to leave things out of the RMRR
> table... and then crash in SMM mode when their DMA goes AWOL.
Hmm, we've had a lot of trouble getting our RMRRs to reflect shared
memory regions correctly, so I'm reluctant to just drop them like that.
Another issue, iommu_should_identity_map() only dumps devices if their
dma_mask is 32bit or less, but being greater than 32bits does not imply
64bit DMA support. If the device exports a DMA mask that's less than
the physical address width of the processor, you might be in trouble
(for example, a 40bit dma_mask on a platform that supports 44bits for
physical memory). Maybe dropping swiotlb out of the picture isn't such
a clean solution? Thanks,
Alex
On Thu, 2009-10-22 at 09:01 -0600, Alex Williamson wrote:
> On Thu, 2009-10-22 at 23:47 +0900, David Woodhouse wrote:
> > On Thu, 2009-10-22 at 06:24 -0600, Alex Williamson wrote:
> > > The coherent_dma_mask is independent of the dma_mask and can be set
> > > separately by the device. The default for any device that doesn't
> > > specify one is 32bits. iommu_should_identity_map() only checks the
> > > dma_mask, not the coherent_dma_mask.
> >
> > Are you telling me that this particular device supports only a 32-bit
> > coherent DMA mask, but that it _does_ support a 64-bit DMA mask for
> > non-coherent DMA? On x86?
>
> Yes, yes. AFAIK, this is not that exceptional.
Tomonori-san's explanation helps that make a little more sense. :)
> > > BTW, we skip RMRR setup when doing hardware pass-through, but I can't
> > > find where they get reloaded if we then end up removing the device
> > > from the si_domain. Is this another issue?
> >
> > Maybe, theoretically. In practice, the whole RMRR thing is just broken
> > by design anyway. We have to quiesce the offending devices before we
> > turn on the IOMMU, because BIOSes tend to leave things out of the RMRR
> > table... and then crash in SMM mode when their DMA goes AWOL.
>
> Hmm, we've had a lot of trouble getting our RMRRs to reflect shared
> memory regions correctly, so I'm reluctant to just drop them like that.
For devices which are still doing DMA on behalf of the BIOS when the
kernel is _running_? And for which the Linux kernel has an active driver
of its own _too_?
Words cannot describe the horror of what you seem to be telling me...
It would be possible for us to rescan the RMRR tables when we take a
device out of the si_domain, if we _really_ have to. But I'm going to
want a strand of hair from the engineer responsible for that design, for
my voodoo doll.
> Another issue, iommu_should_identity_map() only dumps devices if their
> dma_mask is 32bit or less, but being greater than 32bits does not imply
> 64bit DMA support. If the device exports a DMA mask that's less than
> the physical address width of the processor, you might be in trouble
> (for example, a 40bit dma_mask on a platform that supports 44bits for
> physical memory).
That should be comparing against the maximum physical memory address
rather than against DMA_BIT_MASK(32). I thought I'd done that, actually
-- but it seems not.
That approach isn't perfect if memory above the threshold is later
hotplugged -- but I'm prepared to declare that if you deliberately
disable the IOMMU and then insert memory at a higher address than your
devices can cope with, you get to keep both pieces.
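(A sketch of the kind of test being discussed here, not the code that was
actually committed: take the more restrictive of the device's two masks and
compare it against the mask needed to reach the highest populated physical
address, which dma_get_required_mask() reports. The helper name is invented.)

#include <linux/types.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>

/* Hypothetical helper: is this device safe to leave in the identity map? */
static bool device_dma_covers_memory(struct device *dev)
{
	u64 mask = dev->dma_mask ? *dev->dma_mask : DMA_BIT_MASK(32);

	/* Honour the coherent mask too, per the discussion above. */
	if (dev->coherent_dma_mask && dev->coherent_dma_mask < mask)
		mask = dev->coherent_dma_mask;

	/* Compare against the highest physical address, not DMA_BIT_MASK(32). */
	return mask >= dma_get_required_mask(dev);
}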
> Maybe dropping swiotlb out of the picture isn't such
> a clean solution? Thanks,
Well, there are _other_ reasons why we want to keep swiotlb around --
the case where a BIOS bug causes us to have to abort the VT-d setup and
fall back to swiotlb. Currently we fall back to nommu and die horribly.
You don't really need full swiotlb support for the case you're
describing, do you? Using dma_generic_alloc_coherent() ought to be
perfectly sufficient?
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index b1e97e6..773a662 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2765,6 +2765,10 @@ static void *intel_alloc_coherent(struct device *hwdev, size_t size,
void *vaddr;
int order;
+ if (iommu_no_mapping(hwdev))
+ return dma_generic_alloc_coherent(hwdev, size, dma_handle,
+ flags);
+
size = PAGE_ALIGN(size);
order = get_order(size);
flags &= ~(GFP_DMA | GFP_DMA32);
(That won't build on IA64)
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
On Wed, 2009-10-21 at 21:41 -0600, Alex Williamson wrote:
> This device also doesn't exactly play nice when using VT-d in anything other
> than pass-through mode, so switching it into mapped mode is not really
> an option.
Going back to this, now I'm slightly more awake and spotted it...
Er, why? Can you define "doesn't exactly play nice"?
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
On Fri, 23 Oct 2009 00:49:01 +0900
David Woodhouse <[email protected]> wrote:
> On Thu, 2009-10-22 at 09:01 -0600, Alex Williamson wrote:
> > On Thu, 2009-10-22 at 23:47 +0900, David Woodhouse wrote:
> > > On Thu, 2009-10-22 at 06:24 -0600, Alex Williamson wrote:
> > > > The coherent_dma_mask is independent of the dma_mask and can be set
> > > > separately by the device. The default for any device that doesn't
> > > > specify one is 32bits. iommu_should_identity_map() only checks the
> > > > dma_mask, not the coherent_dma_mask.
> > >
> > > Are you telling me that this particular device supports only a 32-bit
> > > coherent DMA mask, but that it _does_ support a 64-bit DMA mask for
> > > non-coherent DMA? On x86?
> >
> > Yes, yes. AFAIK, this is not that exceptional.
>
> Tomonori-san's explanation makes that make a little more sense. :)
>
> > > > BTW, we skip RMRR setup when doing hardware pass-through, but I can't
> > > > find where they get reloaded if we then end up removing the device
> > > > from the si_domain. Is this another issue?
> > >
> > > Maybe, theoretically. In practice, the whole RMRR thing is just broken
> > > by design anyway. We have to quiesce the offending devices before we
> > > turn on the IOMMU, because BIOSes tend to leave things out of the RMRR
> > > table... and then crash in SMM mode when their DMA goes AWOL.
> >
> > Hmm, we've had a lot of trouble getting our RMRRs to reflect shared
> > memory regions correctly, so I'm reluctant to just drop them like that.
>
> For devices which are still doing DMA on behalf of the BIOS when the
> kernel is _running_? And for which the Linux kernel has an active driver
> of its own _too_?
>
> Words cannot describe the horror of what you seem to be telling me...
>
> It would be possible for us to rescan the RMRR tables when we take a
> device out of the si_domain, if we _really_ have to. But I'm going to
> want a strand of hair from the engineer responsible for that design, for
> my voodoo doll.
>
> > Another issue, iommu_should_identity_map() only dumps devices if their
> > dma_mask is 32bit or less, but being greater than 32bits does not imply
> > 64bit DMA support. If the device exports a DMA mask that's less than
> > the physical address width of the processor, you might be in trouble
> > (for example, a 40bit dma_mask on a platform that supports 44bits for
> > physical memory).
>
> That should be comparing against the maximum physical memory address
> rather than against DMA_BIT_MASK(32). I thought I'd done that, actually
> -- but it seems not.
>
> That approach isn't perfect if memory above the threshold is later
> hotplugged -- but I'm prepared to declare that if you deliberately
> disable the IOMMU and then insert memory at a higher address than your
> devices can cope with, you get to keep both pieces.
>
> > Maybe dropping swiotlb out of the picture isn't such
> > a clean solution? Thanks,
>
> Well, there are _other_ reasons why we want to keep swiotlb around --
> the case where a BIOS bug causes us to have to abort the VT-d setup and
> fall back to swiotlb. Currently we fall back to nommu and die horribly.
>
> You don't really need full swiotlb support for the case you're
> describing, do you? Using dma_generic_alloc_coherent() ought to be
> perfectly sufficient?
>
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index b1e97e6..773a662 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -2765,6 +2765,10 @@ static void *intel_alloc_coherent(struct device *hwdev, size_t size,
> void *vaddr;
> int order;
>
> + if (iommu_no_mapping(hwdev))
> + return dma_generic_alloc_coherent(hwdev, size, dma_handle,
> + flags);
> +
> size = PAGE_ALIGN(size);
> order = get_order(size);
> flags &= ~(GFP_DMA | GFP_DMA32);
>
> (That won't build on IA64)
>
This is a box-killing post-2.6.31 regression, yes?
Alex appears to have disappeared on us, making it rather hard to fix
this up promptly.
On Tue, Nov 3, 2009 at 4:57 PM, Andrew Morton <[email protected]> wrote:
> On Fri, 23 Oct 2009 00:49:01 +0900
> David Woodhouse <[email protected]> wrote:
>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
>> index b1e97e6..773a662 100644
>> --- a/drivers/pci/intel-iommu.c
>> +++ b/drivers/pci/intel-iommu.c
>> @@ -2765,6 +2765,10 @@ static void *intel_alloc_coherent(struct device *hwdev, size_t size,
>> void *vaddr;
>> int order;
>>
>> + if (iommu_no_mapping(hwdev))
>> + return dma_generic_alloc_coherent(hwdev, size, dma_handle,
>> + flags);
>> +
>> size = PAGE_ALIGN(size);
>> order = get_order(size);
>> flags &= ~(GFP_DMA | GFP_DMA32);
>>
>> (That won't build on IA64)
>>
>
> This is a box-killing post-2.6.31 regression, yes?
>
> Alex appears to have disappeared on us, making it rather hard to fix
> this up promptly.
Nope, I'm still here. David and I chatted about some of the issues
and I sent out a series of patches last week to address them. See
this thread:
https://lists.linux-foundation.org/pipermail/iommu/2009-October/001851.html
The first patch, attempting to implement the above, was a non-starter
and was dropped. I believe the other patches in the series are valid, and
I'd certainly like to see at least the alloc_coherent part fixed for
.32. I'll be happy to send out a revised set if there's any confusion
about where things stand. Thanks,
Alex
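(For reference, a sketch of the direction the alloc_coherent fix could take
without the x86-only dma_generic_alloc_coherent() call: when the IOMMU is
bypassed, pick the GFP zone from the device's coherent mask instead of
stripping the zone flags. This is only an illustration of the idea discussed
above, not the patch that was eventually merged.)

	/* Inside intel_alloc_coherent(), before __get_free_pages(): */
	if (!iommu_no_mapping(hwdev)) {
		/* We will remap the page, so any physical page is fine. */
		flags &= ~(GFP_DMA | GFP_DMA32);
	} else if (hwdev->coherent_dma_mask < DMA_BIT_MASK(64)) {
		/* No remapping: the page itself must satisfy the coherent mask. */
		if (hwdev->coherent_dma_mask < DMA_BIT_MASK(32))
			flags |= GFP_DMA;
		else
			flags |= GFP_DMA32;
	}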