Hi folks,
Today I noticed two issues with dwc3 on PCI-based systems. While
debugging them, I uncovered some details which we might want to change,
but I need a little guidance here.
The first problem is that when running with intel-iommu disabled and
falling back to swiotlb, I can easily run out of space for DMA
mapping/unmapping:
[ 574.862949] DMA: Out of SW-IOMMU space for 217 bytes at device dwc3.0.auto
[ 574.870820] dwc3 dwc3.0.auto: failed to map buffer
I checked that I'm not leaking any of the mapped buffers and they're all
balanced with a matching unmap call.
The second problem is that, with intel-iommu enabled, I can't allocate
coherent memory:
[ 81.797657] DMAR: Allocating domain for dwc3.0.auto failed
[ 81.803980] dwc3 dwc3.0.auto: can't allocate event buffer
[ 81.810221] dwc3 dwc3.0.auto: failed to allocate event buffers
The reason for that is that I'm using a manually created platform_device,
which lacks the dev->archdata that the underlying/parent PCI device has.
Here I have two options:
1) continue to use my manually allocated platform_device pointer for DMA
operations and just copy necessary bits from the parent PCI device:
diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
index adc1e8a624cb..011d0055abd0 100644
--- a/drivers/usb/dwc3/dwc3-pci.c
+++ b/drivers/usb/dwc3/dwc3-pci.c
@@ -174,6 +174,14 @@ static int dwc3_pci_probe(struct pci_dev *pci,
if (ret)
goto err;
+ dwc3->dev.dma_mask = &pci->dma_mask;
+ dwc3->dev.dma_parms = &pci->dma_parms;
+
+ /* is there a better way ?? */
+ memcpy(&dwc3->dev.archdata, &dev->archdata, sizeof(dev->archdata));
+
+ dma_set_coherent_mask(&dwc3->dev, pci->dma_mask);
+
dwc3->dev.parent = dev;
ACPI_COMPANION_SET(&dwc3->dev, ACPI_COMPANION(dev));
This works fine with intel-iommu; I just tested it.
2) map/unmap using the parent PCI device. IOW:
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 143deb420481..a4e4b0417bf3 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -967,7 +967,7 @@ static void __dwc3_ep0_do_control_data(struct dwc3 *dwc,
u32 transfer_size = 0;
u32 maxpacket;
- ret = usb_gadget_map_request(&dwc->gadget, &req->request,
+ ret = usb_gadget_map_request_by_dev(dwc->dev->parent, &req->request,
dep->number);
if (ret) {
dwc3_trace(trace_dwc3_ep0, "failed to map request\n");
@@ -995,7 +995,7 @@ static void __dwc3_ep0_do_control_data(struct dwc3 *dwc,
dwc->ep0_bounce_addr, transfer_size,
DWC3_TRBCTL_CONTROL_DATA, false);
} else {
- ret = usb_gadget_map_request(&dwc->gadget, &req->request,
+ ret = usb_gadget_map_request_by_dev(dwc->dev->parent, &req->request,
dep->number);
if (ret) {
dwc3_trace(trace_dwc3_ep0, "failed to map request\n");
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index e39f29bd2fff..0732d14d2687 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -191,7 +191,7 @@ void dwc3_gadget_giveback(struct dwc3_ep *dep, struct dwc3_request *req,
if (dwc->ep0_bounced && dep->number == 0)
dwc->ep0_bounced = false;
else
- usb_gadget_unmap_request(&dwc->gadget, &req->request,
+ usb_gadget_unmap_request_by_dev(dwc->dev->parent, &req->request,
req->direction);
trace_dwc3_gadget_giveback(req);
This I haven't tested, but it should work.
Anyway, the question is: which of the two approaches is preferred ?
cheers
ps: I haven't debugged why I'm getting the swiotlb error; that's on my
list for after $subject gets solved.
--
balbi
On Tue, 2016-04-19 at 14:38 +0300, Felipe Balbi wrote:
>
> The reason for that is that I'm using a manually created platform_device and
> that misses dev->archdata which the underlying/parent PCI device has.
Typically we'd expect you to use the parent device for DMA, as in your
second option.
That said, we're exploring the option of moving the dma_ops to be a
first-class member of 'struct device' instead of hiding it in archdata,
and cleaning up the way that it gets initialised for newly-created
devices. And at that point we might end up letting it get inherited
from the parent so your original code *would* work... but I wouldn't
hold your breath for that.
Definitely *don't* mess around in archdata.
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
Hi,
David Woodhouse <[email protected]> writes:
> On Tue, 2016-04-19 at 14:38 +0300, Felipe Balbi wrote:
>>
>> The reason for that is that I'm using a manually created platform_device and
>> that misses dev->archdata which the underlying/parent PCI device has.
>
> Typically we'd expect you to use the parent device for DMA, as in your
> second option.
>
> That said, we're exploring the option of moving the dma_ops to be a
> first-class member of 'struct device' instead of hiding it in archdata,
> and cleaning up the way that it gets initialised for newly-created
> devices. And at that point we might end up letting it get inherited
> from the parent so your original code *would* work... but I wouldn't
> hold your breath for that.
>
> Definitely *don't* mess around in archdata.
alright, I'll patch it up to use parent device everywhere, at least for
now.
Thanks
--
balbi
Hi again,
Felipe Balbi <[email protected]> writes:
> David Woodhouse <[email protected]> writes:
>> On Tue, 2016-04-19 at 14:38 +0300, Felipe Balbi wrote:
>>>
>>> The reason for that is that I'm using a manually created platform_device and
>>> that misses dev->archdata which the underlying/parent PCI device has.
>>
>> Typically we'd expect you to use the parent device for DMA, as in your
>> second option.
>>
>> That said, we're exploring the option of moving the dma_ops to be a
>> first-class member of 'struct device' instead of hiding it in archdata,
>> and cleaning up the way that it gets initialised for newly-created
>> devices. And at that point we might end up letting it get inherited
>> from the parent so your original code *would* work... but I wouldn't
>> hold your breath for that.
>>
>> Definitely *don't* mess around in archdata.
>
> alright, I'll patch it up to use parent device everywhere, at least for
> now.
reviving this a little bit, it seems like inheritance of DMA bits from
parent is the way to go in the future. Let's consider a dual-role
instance of dwc3:
The peripheral IP is Synopsys' proprietary design and its driver gets
built into dwc3.ko, whose device is a child of the parent dwc3-pci.ko
device (in the case of Intel, at least).
The host side is regular XHCI, so dwc3.ko creates yet another child
device for xhci-plat.ko. The parent-child tree ends up like so:
  dwc3-pci
    |_ dwc3
         |_ xhci-plat
DMA for dwc3.ko is simple(-ish): instead of dma_alloc_coherent(dev, ...),
we just call dma_alloc_coherent(dev->parent, ...).
As for XHCI, which is a generic device also used directly by PCI
devices, the problem is more peculiar.
For dwc3's xHCI, we would have to call
dma_alloc_coherent(dev->parent->parent, ...), but that would break
regular, non-dwc3 XHCI blocks which have direct access to the pci
device.
So, for dwc3.ko, I've fixed this with [1]. However, if I remove the
copying of DMA bits from parent to child, xhci-plat.ko will regress. Any
hints for this particular situation?
The solution, IMO, is to either automatically copy the parent's DMA bits
to the child by the time it's registered, or make the DMA API search up
the device's parent tree until it finds a parent with a suitable DMA
configuration.
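To make the two alternatives a bit more concrete, here is a rough
userspace sketch (not kernel code, and not a proposed patch). The names
struct fake_device, fake_device_register() and find_dma_dev() are made up
for illustration only, and a single dma_mask pointer stands in for all of
the DMA state that a real struct device carries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct device: only what the sketch needs. */
struct fake_device {
	const char *name;
	struct fake_device *parent;
	uint64_t *dma_mask;	/* NULL means "no DMA configuration" */
};

/* Alternative 1: copy the parent's DMA bits when the child is registered. */
void fake_device_register(struct fake_device *dev)
{
	if (!dev->dma_mask && dev->parent)
		dev->dma_mask = dev->parent->dma_mask;
}

/* Alternative 2: walk up the tree until an ancestor has a DMA configuration. */
struct fake_device *find_dma_dev(struct fake_device *dev)
{
	while (dev && !dev->dma_mask)
		dev = dev->parent;

	return dev;	/* NULL if nothing in the chain can do DMA */
}

/* The dwc3-pci -> dwc3 -> xhci-plat tree, run through both alternatives. */
void example(void)
{
	uint64_t pci_mask = UINT64_MAX;
	struct fake_device pci = { "dwc3-pci", NULL, &pci_mask };
	struct fake_device dwc3 = { "dwc3", &pci, NULL };
	struct fake_device xhci = { "xhci-plat", &dwc3, NULL };

	/* Alternative 2 needs no copying: the lookup lands on dwc3-pci. */
	assert(find_dma_dev(&xhci) == &pci);

	/* Alternative 1: registering top-down propagates the mask. */
	fake_device_register(&dwc3);
	fake_device_register(&xhci);
	assert(xhci.dma_mask == &pci_mask);
}
```

Either way xhci-plat never has to know how many levels up its DMA-capable
ancestor sits, so a hard-coded dev->parent->parent is avoided and
non-dwc3 XHCI users keep working unchanged.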
[1] https://marc.info/?l=linux-usb&m=146107237232681&w=2
--
balbi