2020-11-27 12:35:05

by Hans de Goede

[permalink] [raw]
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller

Hi,

On 11/27/20 12:41 PM, Hans de Goede wrote:
> Hi,
>
> On 11/24/20 11:27 AM, Christoph Hellwig wrote:
>> On Mon, Nov 23, 2020 at 03:49:09PM +0100, Hans de Goede wrote:
>>> Hi,
>>>
>>> +Cc Christoph Hellwig <[email protected]>
>>>
>>> Christoph, this is still an issue, so I've been looking around a bit and think this
>>> might have something to do with the dma-mapping-5.10 changes.
>>>
>>> Do you have any suggestions to debug this, or is it time to do a git bisect
>>> on this before 5.10 ships with regression?
>>
>> Given that DMAR prefix this seems to be about using intel-iommu + bounce
>> buffering for external devices. I can't really think of anything specific
>> in 5.10 related to that, so maybe you'll need to bisect.
>>
>> I doub this means we are actually leaking swiotlb buffers, so while
>> I'm pretty sure we broke something in lower layers this also means
>> xhci doesn't handle swiotlb operation very gracefully in general.
>
> I've done a git bisect, and the result is somewhat surprising. The git-bisect
> points to:
>
> commit 558033c2828f ("uas: fix sdev->host->dma_dev")
>
> Use scsi_add_host_with_dma() instead of scsi_add_host().
>
> When the scsi request queue is initialized/allocated, hw_max_sectors is clamped
> to the dma max mapping size. Therefore, the correct device that should be used
> for the clamping needs to be set.
>
> The same clamping is still needed in uas as hw_max_sectors could be changed
> there. The original clamping would be invalidated in such cases.
>
> I do have an UAS drive connected to the thunderbolt-dock, so I guess that this
> change is causing the UAS driver to gobble all all available swiotlb space.

I ran some more tests, I can confirm that reverting:

5df7ef7d32fe "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
558033c2828f "uas: fix sdev->host->dma_dev"

Makes the problem go away while running a 5.10 kernel. I also tried doubling
the swiotlb size by adding: swiotlb=65536 to the kernel commandline but that
does not help.

Some more observations:

1. The usb-storage driver does not cause this issue, even though it has a
very similar change.

2. The problem does not happen until I plug an UAS decvice into the dock.

3. The problem continues to happen even after I unplug the UAS device and
rmmod the uas module

3. made me take a bit closer look to the troublesome commit, it passes:
udev->bus->sysdev, which I assume is the XHCI controller itself as device
to scsi_add_host_with_dma, which in turn seems to cause permanent changes
to the dma settings for the XHCI controller. I'm not all that familiar with
the DMA APIs but I'm getting the feeling that passing the actual XHCI-controller's
device as dma-device to scsi_add_host_with_dma is simply the wrong thing to
do; and that the intended effects (honor XHCI dma limits, but do not cause
any changes the XHCI dma settings) should be achieved differently.

Note that if this is indeed wrong, the matching usb-storage change should
likely also be dropped.

Regards,

Hans


2020-11-27 16:22:18

by Christoph Hellwig

[permalink] [raw]
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller

On Fri, Nov 27, 2020 at 01:32:16PM +0100, Hans de Goede wrote:
> I ran some more tests, I can confirm that reverting:
>
> 5df7ef7d32fe "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
> 558033c2828f "uas: fix sdev->host->dma_dev"
>
> Makes the problem go away while running a 5.10 kernel. I also tried doubling
> the swiotlb size by adding: swiotlb=65536 to the kernel commandline but that
> does not help.
>
> Some more observations:
>
> 1. The usb-storage driver does not cause this issue, even though it has a
> very similar change.
>
> 2. The problem does not happen until I plug an UAS decvice into the dock.
>
> 3. The problem continues to happen even after I unplug the UAS device and
> rmmod the uas module
>
> 3. made me take a bit closer look to the troublesome commit, it passes:
> udev->bus->sysdev, which I assume is the XHCI controller itself as device
> to scsi_add_host_with_dma, which in turn seems to cause permanent changes
> to the dma settings for the XHCI controller. I'm not all that familiar with
> the DMA APIs but I'm getting the feeling that passing the actual XHCI-controller's
> device as dma-device to scsi_add_host_with_dma is simply the wrong thing to
> do; and that the intended effects (honor XHCI dma limits, but do not cause
> any changes the XHCI dma settings) should be achieved differently.
>
> Note that if this is indeed wrong, the matching usb-storage change should
> likely also be dropped.

One problem in this area is that the clamping of the DMA size through
dma_max_mapping_size mentioned in the commit log doesn't work when
swiotlb is called from intel-iommu. I think we need to wire up those
calls there as well.

2020-11-27 18:16:54

by Hans de Goede

[permalink] [raw]
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller

Hi,

On 11/27/20 5:19 PM, Christoph Hellwig wrote:
> On Fri, Nov 27, 2020 at 01:32:16PM +0100, Hans de Goede wrote:
>> I ran some more tests, I can confirm that reverting:
>>
>> 5df7ef7d32fe "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
>> 558033c2828f "uas: fix sdev->host->dma_dev"
>>
>> Makes the problem go away while running a 5.10 kernel. I also tried doubling
>> the swiotlb size by adding: swiotlb=65536 to the kernel commandline but that
>> does not help.
>>
>> Some more observations:
>>
>> 1. The usb-storage driver does not cause this issue, even though it has a
>> very similar change.
>>
>> 2. The problem does not happen until I plug an UAS decvice into the dock.
>>
>> 3. The problem continues to happen even after I unplug the UAS device and
>> rmmod the uas module
>>
>> 3. made me take a bit closer look to the troublesome commit, it passes:
>> udev->bus->sysdev, which I assume is the XHCI controller itself as device
>> to scsi_add_host_with_dma, which in turn seems to cause permanent changes
>> to the dma settings for the XHCI controller. I'm not all that familiar with
>> the DMA APIs but I'm getting the feeling that passing the actual XHCI-controller's
>> device as dma-device to scsi_add_host_with_dma is simply the wrong thing to
>> do; and that the intended effects (honor XHCI dma limits, but do not cause
>> any changes the XHCI dma settings) should be achieved differently.
>>
>> Note that if this is indeed wrong, the matching usb-storage change should
>> likely also be dropped.
>
> One problem in this area is that the clamping of the DMA size through
> dma_max_mapping_size mentioned in the commit log doesn't work when
> swiotlb is called from intel-iommu. I think we need to wire up those
> calls there as well.

Ok, but that does not sound like a quick last minute fix for 5.10, so maybe
for 5.10 we should just revert the uas and usb-storage changes which trigger
this problem and then retry those for 5.11 ?

Regards,

Hans

2020-11-28 01:44:25

by Tom Yan

[permalink] [raw]
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller

Should we still be clamping max_sectors to dma_max_mapping_size(dev)
(for now)? with dev being us->pusb_dev->bus->sysdev and
devinfo->udev->bus->sysdev respectively (i.e. revert only
scsi_add_host_with_dma() to scsi_add_host())?

On Sat, 28 Nov 2020 at 02:12, Hans de Goede <[email protected]> wrote:
>
> Hi,
>
> On 11/27/20 5:19 PM, Christoph Hellwig wrote:
> > On Fri, Nov 27, 2020 at 01:32:16PM +0100, Hans de Goede wrote:
> >> I ran some more tests, I can confirm that reverting:
> >>
> >> 5df7ef7d32fe "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
> >> 558033c2828f "uas: fix sdev->host->dma_dev"
> >>
> >> Makes the problem go away while running a 5.10 kernel. I also tried doubling
> >> the swiotlb size by adding: swiotlb=65536 to the kernel commandline but that
> >> does not help.
> >>
> >> Some more observations:
> >>
> >> 1. The usb-storage driver does not cause this issue, even though it has a
> >> very similar change.
> >>
> >> 2. The problem does not happen until I plug an UAS decvice into the dock.
> >>
> >> 3. The problem continues to happen even after I unplug the UAS device and
> >> rmmod the uas module
> >>
> >> 3. made me take a bit closer look to the troublesome commit, it passes:
> >> udev->bus->sysdev, which I assume is the XHCI controller itself as device
> >> to scsi_add_host_with_dma, which in turn seems to cause permanent changes
> >> to the dma settings for the XHCI controller. I'm not all that familiar with
> >> the DMA APIs but I'm getting the feeling that passing the actual XHCI-controller's
> >> device as dma-device to scsi_add_host_with_dma is simply the wrong thing to
> >> do; and that the intended effects (honor XHCI dma limits, but do not cause
> >> any changes the XHCI dma settings) should be achieved differently.
> >>
> >> Note that if this is indeed wrong, the matching usb-storage change should
> >> likely also be dropped.
> >
> > One problem in this area is that the clamping of the DMA size through
> > dma_max_mapping_size mentioned in the commit log doesn't work when
> > swiotlb is called from intel-iommu. I think we need to wire up those
> > calls there as well.
>
> Ok, but that does not sound like a quick last minute fix for 5.10, so maybe
> for 5.10 we should just revert the uas and usb-storage changes which trigger
> this problem and then retry those for 5.11 ?
>
> Regards,
>
> Hans
>

2020-11-28 22:01:58

by Hans de Goede

[permalink] [raw]
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller

Hi Tom,

On 11/28/20 2:25 AM, Tom Yan wrote:
> Should we still be clamping max_sectors to dma_max_mapping_size(dev)
> (for now)? with dev being us->pusb_dev->bus->sysdev and
> devinfo->udev->bus->sysdev respectively (i.e. revert only
> scsi_add_host_with_dma() to scsi_add_host())?

I would expect that to work / avoid the regression, so yes that is
a good option.

If you can provide me with a patch doing that, then I can test it to
make sure it does indeed fix the regression.

Regards,

Hans


>
> On Sat, 28 Nov 2020 at 02:12, Hans de Goede <[email protected]> wrote:
>>
>> Hi,
>>
>> On 11/27/20 5:19 PM, Christoph Hellwig wrote:
>>> On Fri, Nov 27, 2020 at 01:32:16PM +0100, Hans de Goede wrote:
>>>> I ran some more tests, I can confirm that reverting:
>>>>
>>>> 5df7ef7d32fe "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
>>>> 558033c2828f "uas: fix sdev->host->dma_dev"
>>>>
>>>> Makes the problem go away while running a 5.10 kernel. I also tried doubling
>>>> the swiotlb size by adding: swiotlb=65536 to the kernel commandline but that
>>>> does not help.
>>>>
>>>> Some more observations:
>>>>
>>>> 1. The usb-storage driver does not cause this issue, even though it has a
>>>> very similar change.
>>>>
>>>> 2. The problem does not happen until I plug an UAS decvice into the dock.
>>>>
>>>> 3. The problem continues to happen even after I unplug the UAS device and
>>>> rmmod the uas module
>>>>
>>>> 3. made me take a bit closer look to the troublesome commit, it passes:
>>>> udev->bus->sysdev, which I assume is the XHCI controller itself as device
>>>> to scsi_add_host_with_dma, which in turn seems to cause permanent changes
>>>> to the dma settings for the XHCI controller. I'm not all that familiar with
>>>> the DMA APIs but I'm getting the feeling that passing the actual XHCI-controller's
>>>> device as dma-device to scsi_add_host_with_dma is simply the wrong thing to
>>>> do; and that the intended effects (honor XHCI dma limits, but do not cause
>>>> any changes the XHCI dma settings) should be achieved differently.
>>>>
>>>> Note that if this is indeed wrong, the matching usb-storage change should
>>>> likely also be dropped.
>>>
>>> One problem in this area is that the clamping of the DMA size through
>>> dma_max_mapping_size mentioned in the commit log doesn't work when
>>> swiotlb is called from intel-iommu. I think we need to wire up those
>>> calls there as well.
>>
>> Ok, but that does not sound like a quick last minute fix for 5.10, so maybe
>> for 5.10 we should just revert the uas and usb-storage changes which trigger
>> this problem and then retry those for 5.11 ?
>>
>> Regards,
>>
>> Hans
>>
>

2020-11-28 22:03:07

by Christoph Hellwig

[permalink] [raw]
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller

Can you give this one-liner a spin?

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c6622011d4938c..e889111b55c71d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4007,6 +4007,7 @@ static const struct dma_map_ops bounce_dma_ops = {
.alloc_pages = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
.dma_supported = dma_direct_supported,
+ .max_mapping_size = swiotlb_max_mapping_size,
};

static inline int iommu_domain_cache_init(void)

2020-11-28 22:07:00

by Tom Yan

[permalink] [raw]
Subject: [PATCH 1/2] uas: revert from scsi_add_host_with_dma() to scsi_add_host()

Apparently the former (with the chosen dma_dev) may cause problem in certain
case (e.g. where thunderbolt dock and intel iommu are involved). The error
observed was:

XHCI swiotlb buffer is full / DMAR: Device bounce map failed

For now we retain the clamp for hw_max_sectors against the dma_max_mapping_size.
Since the device/size for the clamp that is applied when the scsi request queue
is initialized/allocated is different than the one used here, we invalidate the
early clamping by making a fallback blk_queue_max_hw_sectors() call.

Signed-off-by: Tom Yan <[email protected]>
---
drivers/usb/storage/uas.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index c8a577309e8f..5db1325cea20 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -843,18 +843,21 @@ static int uas_slave_alloc(struct scsi_device *sdev)
static int uas_slave_configure(struct scsi_device *sdev)
{
struct uas_dev_info *devinfo = sdev->hostdata;
- struct device *dev = sdev->host->dma_dev;
+ struct usb_device *udev = devinfo->udev;

if (devinfo->flags & US_FL_MAX_SECTORS_64)
blk_queue_max_hw_sectors(sdev->request_queue, 64);
else if (devinfo->flags & US_FL_MAX_SECTORS_240)
blk_queue_max_hw_sectors(sdev->request_queue, 240);
- else if (devinfo->udev->speed >= USB_SPEED_SUPER)
+ else if (udev->speed >= USB_SPEED_SUPER)
blk_queue_max_hw_sectors(sdev->request_queue, 2048);
+ else
+ blk_queue_max_hw_sectors(sdev->request_queue,
+ SCSI_DEFAULT_MAX_SECTORS);

blk_queue_max_hw_sectors(sdev->request_queue,
min_t(size_t, queue_max_hw_sectors(sdev->request_queue),
- dma_max_mapping_size(dev) >> SECTOR_SHIFT));
+ dma_max_mapping_size(udev->bus->sysdev) >> SECTOR_SHIFT));

if (devinfo->flags & US_FL_NO_REPORT_OPCODES)
sdev->no_report_opcodes = 1;
@@ -1040,7 +1043,7 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id)
shost->can_queue = devinfo->qdepth - 2;

usb_set_intfdata(intf, shost);
- result = scsi_add_host_with_dma(shost, &intf->dev, udev->bus->sysdev);
+ result = scsi_add_host(shost, &intf->dev);
if (result)
goto free_streams;

--
2.29.2

2020-11-30 08:47:35

by Hans de Goede

[permalink] [raw]
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller

Hi,

On 11/28/20 6:15 PM, Christoph Hellwig wrote:
> Can you give this one-liner a spin?
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index c6622011d4938c..e889111b55c71d 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4007,6 +4007,7 @@ static const struct dma_map_ops bounce_dma_ops = {
> .alloc_pages = dma_common_alloc_pages,
> .free_pages = dma_common_free_pages,
> .dma_supported = dma_direct_supported,
> + .max_mapping_size = swiotlb_max_mapping_size,
> };
>
> static inline int iommu_domain_cache_init(void)
>

I'm afraid that this does not help.

Also I still find it somewhat wrong that the use of scsi_add_host_with_dma()
in uas.c, which then passed the XHCI controller as dma-dev is causing changes
to the DMA settings of the XHCI controller, impacting *other* USB devices
and these changes also are permanent, they stay around even after unbinding
the uas driver.

This just feels wrong on many levels. If some changes to the XHCI controllers
DMA settings are necessary for better uas performance then these changes
really should be made inside the XHCI driver, so that they always apply and
not have this weirdness going on where binding one USB driver permanently
changes the behavior of the entire USB bus (until rebooted).

Querying the DMA settings of the XHCI controller in the uas driver is fine,
but changing them seems like a big nono to me.

Regards,

Hans

2020-11-30 09:52:07

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH 1/2] uas: revert from scsi_add_host_with_dma() to scsi_add_host()

Hi,

On 11/28/20 4:48 PM, Tom Yan wrote:
> Apparently the former (with the chosen dma_dev) may cause problem in certain
> case (e.g. where thunderbolt dock and intel iommu are involved). The error
> observed was:
>
> XHCI swiotlb buffer is full / DMAR: Device bounce map failed
>
> For now we retain the clamp for hw_max_sectors against the dma_max_mapping_size.
> Since the device/size for the clamp that is applied when the scsi request queue
> is initialized/allocated is different than the one used here, we invalidate the
> early clamping by making a fallback blk_queue_max_hw_sectors() call.
>
> Signed-off-by: Tom Yan <[email protected]>

I can confirm that this fixes the network performance on a Lenovo Thunderbolt
dock generation 2, which uses an USB attach NIC.

With this patch added on top of 5.10-rc5 scp performance to another machine
on the local gbit LAN goes back from the regressed 1 MB/s to its original 100MB/s
as it should be:

Tested-by: Hans de Goede <[email protected]>

Regards,

Hans


> ---
> drivers/usb/storage/uas.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
> index c8a577309e8f..5db1325cea20 100644
> --- a/drivers/usb/storage/uas.c
> +++ b/drivers/usb/storage/uas.c
> @@ -843,18 +843,21 @@ static int uas_slave_alloc(struct scsi_device *sdev)
> static int uas_slave_configure(struct scsi_device *sdev)
> {
> struct uas_dev_info *devinfo = sdev->hostdata;
> - struct device *dev = sdev->host->dma_dev;
> + struct usb_device *udev = devinfo->udev;
>
> if (devinfo->flags & US_FL_MAX_SECTORS_64)
> blk_queue_max_hw_sectors(sdev->request_queue, 64);
> else if (devinfo->flags & US_FL_MAX_SECTORS_240)
> blk_queue_max_hw_sectors(sdev->request_queue, 240);
> - else if (devinfo->udev->speed >= USB_SPEED_SUPER)
> + else if (udev->speed >= USB_SPEED_SUPER)
> blk_queue_max_hw_sectors(sdev->request_queue, 2048);
> + else
> + blk_queue_max_hw_sectors(sdev->request_queue,
> + SCSI_DEFAULT_MAX_SECTORS);
>
> blk_queue_max_hw_sectors(sdev->request_queue,
> min_t(size_t, queue_max_hw_sectors(sdev->request_queue),
> - dma_max_mapping_size(dev) >> SECTOR_SHIFT));
> + dma_max_mapping_size(udev->bus->sysdev) >> SECTOR_SHIFT));
>
> if (devinfo->flags & US_FL_NO_REPORT_OPCODES)
> sdev->no_report_opcodes = 1;
> @@ -1040,7 +1043,7 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id)
> shost->can_queue = devinfo->qdepth - 2;
>
> usb_set_intfdata(intf, shost);
> - result = scsi_add_host_with_dma(shost, &intf->dev, udev->bus->sysdev);
> + result = scsi_add_host(shost, &intf->dev);
> if (result)
> goto free_streams;
>
>

2020-11-30 19:34:02

by Tom Yan

[permalink] [raw]
Subject: Re: [PATCH 1/2] uas: revert from scsi_add_host_with_dma() to scsi_add_host()

Hmm, I wonder if I/we wrongly assumed that the dma_dev used for the
hw_max_sectors clamping in __scsi_init_queue() is wrong.

So instead of adding a fallback else-clause here or using "sysdev" as
dma_dev like in the current upstream code, maybe we should actually do
a three-way min: the "changed" hw_max_sectors, dma_max_mapping_size of
dma_dev("dev") and dma_max_mapping_size of sysdev...?

On Mon, 30 Nov 2020 at 17:48, Hans de Goede <[email protected]> wrote:
>
> Hi,
>
> On 11/28/20 4:48 PM, Tom Yan wrote:
> > Apparently the former (with the chosen dma_dev) may cause problem in certain
> > case (e.g. where thunderbolt dock and intel iommu are involved). The error
> > observed was:
> >
> > XHCI swiotlb buffer is full / DMAR: Device bounce map failed
> >
> > For now we retain the clamp for hw_max_sectors against the dma_max_mapping_size.
> > Since the device/size for the clamp that is applied when the scsi request queue
> > is initialized/allocated is different than the one used here, we invalidate the
> > early clamping by making a fallback blk_queue_max_hw_sectors() call.
> >
> > Signed-off-by: Tom Yan <[email protected]>
>
> I can confirm that this fixes the network performance on a Lenovo Thunderbolt
> dock generation 2, which uses an USB attach NIC.
>
> With this patch added on top of 5.10-rc5 scp performance to another machine
> on the local gbit LAN goes back from the regressed 1 MB/s to its original 100MB/s
> as it should be:
>
> Tested-by: Hans de Goede <[email protected]>
>
> Regards,
>
> Hans
>
>
> > ---
> > drivers/usb/storage/uas.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
> > index c8a577309e8f..5db1325cea20 100644
> > --- a/drivers/usb/storage/uas.c
> > +++ b/drivers/usb/storage/uas.c
> > @@ -843,18 +843,21 @@ static int uas_slave_alloc(struct scsi_device *sdev)
> > static int uas_slave_configure(struct scsi_device *sdev)
> > {
> > struct uas_dev_info *devinfo = sdev->hostdata;
> > - struct device *dev = sdev->host->dma_dev;
> > + struct usb_device *udev = devinfo->udev;
> >
> > if (devinfo->flags & US_FL_MAX_SECTORS_64)
> > blk_queue_max_hw_sectors(sdev->request_queue, 64);
> > else if (devinfo->flags & US_FL_MAX_SECTORS_240)
> > blk_queue_max_hw_sectors(sdev->request_queue, 240);
> > - else if (devinfo->udev->speed >= USB_SPEED_SUPER)
> > + else if (udev->speed >= USB_SPEED_SUPER)
> > blk_queue_max_hw_sectors(sdev->request_queue, 2048);
> > + else
> > + blk_queue_max_hw_sectors(sdev->request_queue,
> > + SCSI_DEFAULT_MAX_SECTORS);
> >
> > blk_queue_max_hw_sectors(sdev->request_queue,
> > min_t(size_t, queue_max_hw_sectors(sdev->request_queue),
> > - dma_max_mapping_size(dev) >> SECTOR_SHIFT));
> > + dma_max_mapping_size(udev->bus->sysdev) >> SECTOR_SHIFT));
> >
> > if (devinfo->flags & US_FL_NO_REPORT_OPCODES)
> > sdev->no_report_opcodes = 1;
> > @@ -1040,7 +1043,7 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id)
> > shost->can_queue = devinfo->qdepth - 2;
> >
> > usb_set_intfdata(intf, shost);
> > - result = scsi_add_host_with_dma(shost, &intf->dev, udev->bus->sysdev);
> > + result = scsi_add_host(shost, &intf->dev);
> > if (result)
> > goto free_streams;
> >
> >
>

2020-12-01 11:14:35

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH 1/2] uas: revert from scsi_add_host_with_dma() to scsi_add_host()

Hi,

On 11/30/20 8:30 PM, Tom Yan wrote:
> Hmm, I wonder if I/we wrongly assumed that the dma_dev used for the
> hw_max_sectors clamping in __scsi_init_queue() is wrong.
>
> So instead of adding a fallback else-clause here or using "sysdev" as
> dma_dev like in the current upstream code, maybe we should actually do
> a three-way min: the "changed" hw_max_sectors, dma_max_mapping_size of
> dma_dev("dev") and dma_max_mapping_size of sysdev...?

So both here and in the 2/2 patch thread there are lots of open questions,
which to me suggests that for 5.10 we really should just go with the 3 reverts
which I suggested earlier.

Regards,

Hans



>
> On Mon, 30 Nov 2020 at 17:48, Hans de Goede <[email protected]> wrote:
>>
>> Hi,
>>
>> On 11/28/20 4:48 PM, Tom Yan wrote:
>>> Apparently the former (with the chosen dma_dev) may cause problem in certain
>>> case (e.g. where thunderbolt dock and intel iommu are involved). The error
>>> observed was:
>>>
>>> XHCI swiotlb buffer is full / DMAR: Device bounce map failed
>>>
>>> For now we retain the clamp for hw_max_sectors against the dma_max_mapping_size.
>>> Since the device/size for the clamp that is applied when the scsi request queue
>>> is initialized/allocated is different than the one used here, we invalidate the
>>> early clamping by making a fallback blk_queue_max_hw_sectors() call.
>>>
>>> Signed-off-by: Tom Yan <[email protected]>
>>
>> I can confirm that this fixes the network performance on a Lenovo Thunderbolt
>> dock generation 2, which uses an USB attach NIC.
>>
>> With this patch added on top of 5.10-rc5 scp performance to another machine
>> on the local gbit LAN goes back from the regressed 1 MB/s to its original 100MB/s
>> as it should be:
>>
>> Tested-by: Hans de Goede <[email protected]>
>>
>> Regards,
>>
>> Hans
>>
>>
>>> ---
>>> drivers/usb/storage/uas.c | 11 +++++++----
>>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
>>> index c8a577309e8f..5db1325cea20 100644
>>> --- a/drivers/usb/storage/uas.c
>>> +++ b/drivers/usb/storage/uas.c
>>> @@ -843,18 +843,21 @@ static int uas_slave_alloc(struct scsi_device *sdev)
>>> static int uas_slave_configure(struct scsi_device *sdev)
>>> {
>>> struct uas_dev_info *devinfo = sdev->hostdata;
>>> - struct device *dev = sdev->host->dma_dev;
>>> + struct usb_device *udev = devinfo->udev;
>>>
>>> if (devinfo->flags & US_FL_MAX_SECTORS_64)
>>> blk_queue_max_hw_sectors(sdev->request_queue, 64);
>>> else if (devinfo->flags & US_FL_MAX_SECTORS_240)
>>> blk_queue_max_hw_sectors(sdev->request_queue, 240);
>>> - else if (devinfo->udev->speed >= USB_SPEED_SUPER)
>>> + else if (udev->speed >= USB_SPEED_SUPER)
>>> blk_queue_max_hw_sectors(sdev->request_queue, 2048);
>>> + else
>>> + blk_queue_max_hw_sectors(sdev->request_queue,
>>> + SCSI_DEFAULT_MAX_SECTORS);
>>>
>>> blk_queue_max_hw_sectors(sdev->request_queue,
>>> min_t(size_t, queue_max_hw_sectors(sdev->request_queue),
>>> - dma_max_mapping_size(dev) >> SECTOR_SHIFT));
>>> + dma_max_mapping_size(udev->bus->sysdev) >> SECTOR_SHIFT));
>>>
>>> if (devinfo->flags & US_FL_NO_REPORT_OPCODES)
>>> sdev->no_report_opcodes = 1;
>>> @@ -1040,7 +1043,7 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id)
>>> shost->can_queue = devinfo->qdepth - 2;
>>>
>>> usb_set_intfdata(intf, shost);
>>> - result = scsi_add_host_with_dma(shost, &intf->dev, udev->bus->sysdev);
>>> + result = scsi_add_host(shost, &intf->dev);
>>> if (result)
>>> goto free_streams;
>>>
>>>
>>
>