This series addresses the following problems:
- Fix receive buffer corruptions
- Fix receive transfers with 0 byte transfer length
- Abort transactions after unknown receive errors
if the receive buffer is full
- Reduce "trimming xfer length" logging noise
The problems fixed with this series were observed when connecting
a DM9600 Ethernet adapter to Veyron Chromebooks such as the ASUS
Chromebook C201PA. The series was tested extensively with this and
other adapters.
The observed problems are also reported when tethering various
phones, so test coverage with such phones would be very appreciated.
----------------------------------------------------------------
Guenter Roeck (4):
usb: dwc2: Simplify and fix DMA alignment code
usb: dwc2: Do not update data length if it is 0 on inbound transfers
usb: dwc2: Abort transaction after errors with unknown reason
usb: dwc2: Make "trimming xfer length" a debug message
drivers/usb/dwc2/hcd.c | 82 ++++++++++++++++++++++++---------------------
drivers/usb/dwc2/hcd_intr.c | 14 +++++++-
2 files changed, 56 insertions(+), 40 deletions(-)
The DWC2 documentation states that transfers with zero data length should
set the number of packets to 1 and the transfer length to 0. This is not
currently the case for inbound transfers: the transfer length is set to
the maximum packet length. This can have adverse effects if the chip
actually does transfer data as it is programmed to do. Follow chip
documentation and keep the transfer length set to 0 in that situation.
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/usb/dwc2/hcd.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index f6d8cc9cee34..506fdffd82ab 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -1313,19 +1313,20 @@ static void dwc2_hc_start_transfer(struct dwc2_hsotg *hsotg,
if (num_packets > max_hc_pkt_count) {
num_packets = max_hc_pkt_count;
chan->xfer_len = num_packets * chan->max_packet;
+ } else if (chan->ep_is_in) {
+ /*
+ * Always program an integral # of max packets
+ * for IN transfers.
+ * Note: This assumes that the input buffer is
+ * aligned and sized accordingly.
+ */
+ chan->xfer_len = num_packets * chan->max_packet;
}
} else {
/* Need 1 packet for transfer length of 0 */
num_packets = 1;
}
- if (chan->ep_is_in)
- /*
- * Always program an integral # of max packets for IN
- * transfers
- */
- chan->xfer_len = num_packets * chan->max_packet;
-
if (chan->ep_type == USB_ENDPOINT_XFER_INT ||
chan->ep_type == USB_ENDPOINT_XFER_ISOC)
/*
--
2.17.1
In some situations, the following error messages are reported.
dwc2 ff540000.usb: dwc2_hc_chhltd_intr_dma: Channel 1 - ChHltd set, but reason is unknown
dwc2 ff540000.usb: hcint 0x00000002, intsts 0x04000021
This is sometimes followed by:
dwc2 ff540000.usb: dwc2_update_urb_state_abn(): trimming xfer length
and then:
WARNING: CPU: 0 PID: 0 at kernel/v4.19/drivers/usb/dwc2/hcd.c:2913
dwc2_assign_and_init_hc+0x98c/0x990
The warning suggests that an odd buffer address is to be used for DMA.
After an error is observed, the receive buffer may be full
(urb->actual_length >= urb->length). However, the urb is still left in
the queue unless three errors were observed in a row. When it is queued
again, the dwc2 hcd code translates this into a 1-block transfer.
If urb->actual_length (ie the total expected receive length) is not
DMA-aligned, the buffer pointer programmed into the chip will be
unaligned. This results in the observed warning.
To solve the problem, abort input transactions after an error with
unknown cause if the entire packet was already received. This may be
a bit drastic, but we don't really know why the transfer was aborted
even though the entire packet was received. Aborting the transfer in
this situation is less risky than accepting a potentially corrupted
packet.
With this patch in place, the 'ChHltd set' and 'trimming xfer length'
messages are still observed, but there are no more transfer attempts
with odd buffer addresses.
Cc: Boris ARZUR <[email protected]>
Cc: Douglas Anderson <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/usb/dwc2/hcd_intr.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index a052d39b4375..12819e019e13 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -1977,6 +1977,18 @@ static void dwc2_hc_chhltd_intr_dma(struct dwc2_hsotg *hsotg,
qtd->error_count++;
dwc2_update_urb_state_abn(hsotg, chan, chnum, qtd->urb,
qtd, DWC2_HC_XFER_XACT_ERR);
+ /*
+ * We can get here after a completed transaction
+ * (urb->actual_length >= urb->length) which was not reported
+ * as completed. If that is the case, and we do not abort
+ * the transfer, a transfer of size 0 will be enqueued
+ * subsequently. If urb->actual_length is not DMA-aligned,
+ * the buffer will then point to an unaligned address, and
+ * the resulting behavior is undefined. Bail out in that
+ * situation.
+ */
+ if (qtd->urb->actual_length >= qtd->urb->length)
+ qtd->error_count = 3;
dwc2_hcd_save_data_toggle(hsotg, chan, chnum, qtd);
dwc2_halt_channel(hsotg, chan, qtd, DWC2_HC_XFER_XACT_ERR);
}
--
2.17.1
The code to align buffers for DMA was first introduced with commit
3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way").
It was updated with commit 56406e017a88 ("usb: dwc2: Fix DMA alignment
to start at allocated boundary") because it did not really align buffers to
DMA boundaries but to word offsets. This was then optimized in commit
1e111e885238 ("usb: dwc2: Fix inefficient copy of unaligned buffers")
to only copy actual data rather than the whole buffer. Commit 4a4863bf2e79
("usb: dwc2: Fix DMA cache alignment issues") changed this further to add
a padding at the end of the buffer to ensure that the old data pointer is
not in the same cache line as the buffer.
This last commit states "Otherwise, the stored_xfer_buffer gets corrupted
for IN URBs on non-cache-coherent systems". However, such corruptions are
still observed. This suggests that the commit may have been hiding a
problem rather than fixing it. Further analysis shows that this is indeed
the case: The code in dwc2_hc_start_transfer() assumes that the transfer
size is a multiple of wMaxPacketSize, and rounds up the transfer size
communicated to the chip accordingly. Added debug code confirms that
the chip does under some circumstances indeed send more data than requested
in the urb receive buffer size.
On top of that, it turns out that buffers are still not guaranteed to be
aligned to dma_get_cache_alignment(), but to DWC2_USB_DMA_ALIGN (4).
Further debugging shows that packets aligned to DWC2_USB_DMA_ALIGN
but not to dma_get_cache_alignment() are indeed common and work just fine.
This suggests that commit 56406e017a88 was not really necessary because
even without it packets were already aligned to DWC2_USB_DMA_ALIGN.
To simplify the code, move the old data pointer back to the beginning of
the new buffer, restoring most of the original commit. Stop aligning the
buffer to dma_get_cache_alignment() since it isn't needed and only makes
the code more complex. Instead, ensure that the allocated buffer is a
multiple of wMaxPacketSize to ensure that the chip does not write beyond
the end of the buffer.
Cc: Douglas Anderson <[email protected]>
Cc: Boris Arzur <[email protected]>
Fixes: 56406e017a88 ("usb: dwc2: Fix DMA alignment to start at allocated boundary")
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/usb/dwc2/hcd.c | 67 ++++++++++++++++++++++--------------------
1 file changed, 35 insertions(+), 32 deletions(-)
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index b90f858af960..f6d8cc9cee34 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2471,19 +2471,21 @@ static int dwc2_alloc_split_dma_aligned_buf(struct dwc2_hsotg *hsotg,
#define DWC2_USB_DMA_ALIGN 4
+struct dma_aligned_buffer {
+ void *old_xfer_buffer;
+ u8 data[0];
+};
+
static void dwc2_free_dma_aligned_buffer(struct urb *urb)
{
- void *stored_xfer_buffer;
+ struct dma_aligned_buffer *dma;
size_t length;
if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
return;
- /* Restore urb->transfer_buffer from the end of the allocated area */
- memcpy(&stored_xfer_buffer,
- PTR_ALIGN(urb->transfer_buffer + urb->transfer_buffer_length,
- dma_get_cache_alignment()),
- sizeof(urb->transfer_buffer));
+ dma = container_of(urb->transfer_buffer,
+ struct dma_aligned_buffer, data);
if (usb_urb_dir_in(urb)) {
if (usb_pipeisoc(urb->pipe))
@@ -2491,49 +2493,50 @@ static void dwc2_free_dma_aligned_buffer(struct urb *urb)
else
length = urb->actual_length;
- memcpy(stored_xfer_buffer, urb->transfer_buffer, length);
+ memcpy(dma->old_xfer_buffer, dma->data, length);
}
- kfree(urb->transfer_buffer);
- urb->transfer_buffer = stored_xfer_buffer;
+ urb->transfer_buffer = dma->old_xfer_buffer;
+ kfree(dma);
urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
}
static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
{
- void *kmalloc_ptr;
+ struct dma_aligned_buffer *dma;
size_t kmalloc_size;
- if (urb->num_sgs || urb->sg ||
- urb->transfer_buffer_length == 0 ||
+ if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
+ (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) ||
!((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
return 0;
- /*
- * Allocate a buffer with enough padding for original transfer_buffer
- * pointer. This allocation is guaranteed to be aligned properly for
- * DMA
- */
- kmalloc_size = urb->transfer_buffer_length +
- (dma_get_cache_alignment() - 1) +
- sizeof(urb->transfer_buffer);
+ kmalloc_size = sizeof(struct dma_aligned_buffer);
+ if (usb_urb_dir_out(urb)) {
+ kmalloc_size += urb->transfer_buffer_length;
+ } else {
+ struct usb_host_endpoint *ep = urb->ep;
+ int maxp = usb_endpoint_maxp(&ep->desc);
- kmalloc_ptr = kmalloc(kmalloc_size, mem_flags);
- if (!kmalloc_ptr)
- return -ENOMEM;
+ /*
+ * Input transfer buffer size must be a multiple of the
+ * endpoint's maximum packet size to match the transfer
+ * limit programmed into the chip.
+ * See calculation of chan->xfer_len in
+ * dwc2_hc_start_transfer().
+ */
+ kmalloc_size += roundup(urb->transfer_buffer_length, maxp);
+ }
- /*
- * Position value of original urb->transfer_buffer pointer to the end
- * of allocation for later referencing
- */
- memcpy(PTR_ALIGN(kmalloc_ptr + urb->transfer_buffer_length,
- dma_get_cache_alignment()),
- &urb->transfer_buffer, sizeof(urb->transfer_buffer));
+ dma = kmalloc(kmalloc_size, mem_flags);
+ if (!dma)
+ return -ENOMEM;
+ dma->old_xfer_buffer = urb->transfer_buffer;
if (usb_urb_dir_out(urb))
- memcpy(kmalloc_ptr, urb->transfer_buffer,
+ memcpy(dma->data, urb->transfer_buffer,
urb->transfer_buffer_length);
- urb->transfer_buffer = kmalloc_ptr;
+ urb->transfer_buffer = dma->data;
urb->transfer_flags |= URB_ALIGNED_TEMP_BUFFER;
--
2.17.1
With some USB network adapters, such as DM96xx, the following message
is seen for each maximum size receive packet.
dwc2 ff540000.usb: dwc2_update_urb_state(): trimming xfer length
This happens because the packet size requested by the driver is 1522
bytes, wMaxPacketSize is 64, the dwc2 driver configures the chip to
receive 24*64 = 1536 bytes, and the chip does indeed send more than
1522 bytes of data. Since the event does not indicate an error condition,
the message is just noise. Demote it to debug level.
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/usb/dwc2/hcd_intr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index 12819e019e13..d5f4ec1b73b1 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -500,7 +500,7 @@ static int dwc2_update_urb_state(struct dwc2_hsotg *hsotg,
&short_read);
if (urb->actual_length + xfer_length > urb->length) {
- dev_warn(hsotg->dev, "%s(): trimming xfer length\n", __func__);
+ dev_dbg(hsotg->dev, "%s(): trimming xfer length\n", __func__);
xfer_length = urb->length - urb->actual_length;
}
--
2.17.1
Hi,
On Wed, Feb 26, 2020 at 1:04 PM Guenter Roeck <[email protected]> wrote:
>
> The DWC2 documentation states that transfers with zero data length should
> set the number of packets to 1 and the transfer length to 0. This is not
> currently the case for inbound transfers: the transfer length is set to
> the maximum packet length. This can have adverse effects if the chip
> actually does transfer data as it is programmed to do. Follow chip
> documentation and keep the transfer length set to 0 in that situation.
>
> Signed-off-by: Guenter Roeck <[email protected]>
> ---
> drivers/usb/dwc2/hcd.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
I don't have any other test setup that you don't have, so just giving
my review tag and not tested tag.
I will note that it feels like this should have a "Fixes" tag or a
direct Cc to stable to make it obvious that it should make its way
back to stable trees.
Reviewed-by: Douglas Anderson <[email protected]>
Hi,
On Wed, Feb 26, 2020 at 1:04 PM Guenter Roeck <[email protected]> wrote:
>
> The code to align buffers for DMA was first introduced with commit
> 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way").
> It was updated with commit 56406e017a88 ("usb: dwc2: Fix DMA alignment
> to start at allocated boundary") because it did not really align buffers to
> DMA boundaries but to word offsets. This was then optimized in commit
> 1e111e885238 ("usb: dwc2: Fix inefficient copy of unaligned buffers")
> to only copy actual data rather than the whole buffer. Commit 4a4863bf2e79
> ("usb: dwc2: Fix DMA cache alignment issues") changed this further to add
> a padding at the end of the buffer to ensure that the old data pointer is
> not in the same cache line as the buffer.
>
> This last commit states "Otherwise, the stored_xfer_buffer gets corrupted
> for IN URBs on non-cache-coherent systems". However, such corruptions are
> still observed. This suggests that the commit may have been hiding a
> problem rather than fixing it. Further analysis shows that this is indeed
> the case: The code in dwc2_hc_start_transfer() assumes that the transfer
> size is a multiple of wMaxPacketSize, and rounds up the transfer size
> communicated to the chip accordingly. Added debug code confirms that
> the chip does under some circumstances indeed send more data than requested
> in the urb receive buffer size.
>
> On top of that, it turns out that buffers are still not guaranteed to be
> aligned to dma_get_cache_alignment(), but to DWC2_USB_DMA_ALIGN (4).
> Further debugging shows that packets aligned to DWC2_USB_DMA_ALIGN
> but not to dma_get_cache_alignment() are indeed common and work just fine.
> This suggests that commit 56406e017a88 was not really necessary because
> even without it packets were already aligned to DWC2_USB_DMA_ALIGN.
>
> To simplify the code, move the old data pointer back to the beginning of
> the new buffer, restoring most of the original commit. Stop aligning the
> buffer to dma_get_cache_alignment() since it isn't needed and only makes
> the code more complex. Instead, ensure that the allocated buffer is a
> multiple of wMaxPacketSize to ensure that the chip does not write beyond
> the end of the buffer.
I do like the cleanliness of being able to easily identify the old
buffer (AKA by putting it first) and I agree that the existing code
was only really guaranteeing 4-byte alignment and if we truly needed
more alignment then we'd be allocating a lot more bounce buffers
(which is pretty expensive).
...but the argument in commit 56406e017a88 ("usb: dwc2: Fix DMA
alignment to start at allocated boundary") is still a compelling one.
Maybe at least put a comment in the code next to the "#define
DWC2_USB_DMA_ALIGN" saying that we think that this is enough alignment
for anyone using dwc2's built-in DMA logic?
> Cc: Douglas Anderson <[email protected]>
> Cc: Boris Arzur <[email protected]>
> Fixes: 56406e017a88 ("usb: dwc2: Fix DMA alignment to start at allocated boundary")
> Signed-off-by: Guenter Roeck <[email protected]>
> ---
> drivers/usb/dwc2/hcd.c | 67 ++++++++++++++++++++++--------------------
> 1 file changed, 35 insertions(+), 32 deletions(-)
Sorry for such a mess and thank you for all the work tracking down and
documenting all the problems. Clearly deep understanding of DMA is
not something I can claim. :(
A few points of order first:
* Although get_maintainer doesn't identify him, it has seemed like
Felipe Balbi lands most of the dwc2 things. Probably a good idea to
CC him.
* I have historically found Stefan Wahren interested in dwc2 fixes and
willing to test them on Raspberry Pi w/ various peripherals.
* Since one of the commits you refer to was written by Martin Schiller
it probably wouldn't hurt to CC him.
> diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
> index b90f858af960..f6d8cc9cee34 100644
> --- a/drivers/usb/dwc2/hcd.c
> +++ b/drivers/usb/dwc2/hcd.c
> @@ -2471,19 +2471,21 @@ static int dwc2_alloc_split_dma_aligned_buf(struct dwc2_hsotg *hsotg,
>
> #define DWC2_USB_DMA_ALIGN 4
>
> +struct dma_aligned_buffer {
> + void *old_xfer_buffer;
> + u8 data[0];
> +};
> +
> static void dwc2_free_dma_aligned_buffer(struct urb *urb)
> {
> - void *stored_xfer_buffer;
> + struct dma_aligned_buffer *dma;
> size_t length;
>
> if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
> return;
>
> - /* Restore urb->transfer_buffer from the end of the allocated area */
> - memcpy(&stored_xfer_buffer,
> - PTR_ALIGN(urb->transfer_buffer + urb->transfer_buffer_length,
> - dma_get_cache_alignment()),
> - sizeof(urb->transfer_buffer));
> + dma = container_of(urb->transfer_buffer,
> + struct dma_aligned_buffer, data);
>
> if (usb_urb_dir_in(urb)) {
> if (usb_pipeisoc(urb->pipe))
> @@ -2491,49 +2493,50 @@ static void dwc2_free_dma_aligned_buffer(struct urb *urb)
> else
> length = urb->actual_length;
>
> - memcpy(stored_xfer_buffer, urb->transfer_buffer, length);
> + memcpy(dma->old_xfer_buffer, dma->data, length);
> }
> - kfree(urb->transfer_buffer);
> - urb->transfer_buffer = stored_xfer_buffer;
> + urb->transfer_buffer = dma->old_xfer_buffer;
> + kfree(dma);
>
> urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
> }
>
> static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
> {
> - void *kmalloc_ptr;
> + struct dma_aligned_buffer *dma;
> size_t kmalloc_size;
>
> - if (urb->num_sgs || urb->sg ||
> - urb->transfer_buffer_length == 0 ||
> + if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
> + (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) ||
> !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
Maybe I'm misunderstanding things, but it feels like we need something
more here. Specifically I'm worried about the fact when the transfer
buffer is already aligned but the length is not a multiple of the
endpoint's maximum transfer size. You need to handle that, right?
AKA something like this (untested):
/* Simple case of not having to allocate a bounce buffer */
if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
return 0;
/* Can also avoid bounce buffer if alignment and size are good */
maxp = usb_endpoint_maxp(&ep->desc);
if (maxp == urb->transfer_buffer_length &&
!((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
return 0;
Other than that this looks good to me.
-Doug
Hi,
On Wed, Feb 26, 2020 at 1:04 PM Guenter Roeck <[email protected]> wrote:
>
> With some USB network adapters, such as DM96xx, the following message
> is seen for each maximum size receive packet.
>
> dwc2 ff540000.usb: dwc2_update_urb_state(): trimming xfer length
>
> This happens because the packet size requested by the driver is 1522
> bytes, wMaxPacketSize is 64, the dwc2 driver configures the chip to
> receive 24*64 = 1536 bytes, and the chip does indeed send more than
> 1522 bytes of data. Since the event does not indicate an error condition,
> the message is just noise. Demote it to debug level.
>
> Signed-off-by: Guenter Roeck <[email protected]>
> ---
> drivers/usb/dwc2/hcd_intr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Suggest a "Fixes" or "Cc: stable" tag. This one isn't as important as
the others, but presumably you'll start hitting it a lot more now
(whereas previously we'd just crash).
Reviewed-by: Douglas Anderson <[email protected]>
Hi,
On Wed, Feb 26, 2020 at 1:04 PM Guenter Roeck <[email protected]> wrote:
>
> In some situations, the following error messages are reported.
>
> dwc2 ff540000.usb: dwc2_hc_chhltd_intr_dma: Channel 1 - ChHltd set, but reason is unknown
> dwc2 ff540000.usb: hcint 0x00000002, intsts 0x04000021
>
> This is sometimes followed by:
>
> dwc2 ff540000.usb: dwc2_update_urb_state_abn(): trimming xfer length
>
> and then:
>
> WARNING: CPU: 0 PID: 0 at kernel/v4.19/drivers/usb/dwc2/hcd.c:2913
> dwc2_assign_and_init_hc+0x98c/0x990
>
> The warning suggests that an odd buffer address is to be used for DMA.
>
> After an error is observed, the receive buffer may be full
> (urb->actual_length >= urb->length). However, the urb is still left in
> the queue unless three errors were observed in a row. When it is queued
> again, the dwc2 hcd code translates this into a 1-block transfer.
> If urb->actual_length (ie the total expected receive length) is not
> DMA-aligned, the buffer pointer programmed into the chip will be
> unaligned. This results in the observed warning.
>
> To solve the problem, abort input transactions after an error with
> unknown cause if the entire packet was already received. This may be
> a bit drastic, but we don't really know why the transfer was aborted
> even though the entire packet was received. Aborting the transfer in
> this situation is less risky than accepting a potentially corrupted
> packet.
>
> With this patch in place, the 'ChHltd set' and 'trimming xfer length'
> messages are still observed, but there are no more transfer attempts
> with odd buffer addresses.
>
> Cc: Boris ARZUR <[email protected]>
> Cc: Douglas Anderson <[email protected]>
> Signed-off-by: Guenter Roeck <[email protected]>
> ---
> drivers/usb/dwc2/hcd_intr.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
Seems sane to me. Also suggest a "Fixes" or "Cc: stable" tag.
Reviewed-by: Douglas Anderson <[email protected]>
On 2/27/20 2:06 PM, Doug Anderson wrote:
> Hi,
>
> On Wed, Feb 26, 2020 at 1:04 PM Guenter Roeck <[email protected]> wrote:
>>
>> The code to align buffers for DMA was first introduced with commit
>> 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way").
>> It was updated with commit 56406e017a88 ("usb: dwc2: Fix DMA alignment
>> to start at allocated boundary") because it did not really align buffers to
>> DMA boundaries but to word offsets. This was then optimized in commit
>> 1e111e885238 ("usb: dwc2: Fix inefficient copy of unaligned buffers")
>> to only copy actual data rather than the whole buffer. Commit 4a4863bf2e79
>> ("usb: dwc2: Fix DMA cache alignment issues") changed this further to add
>> a padding at the end of the buffer to ensure that the old data pointer is
>> not in the same cache line as the buffer.
>>
>> This last commit states "Otherwise, the stored_xfer_buffer gets corrupted
>> for IN URBs on non-cache-coherent systems". However, such corruptions are
>> still observed. This suggests that the commit may have been hiding a
>> problem rather than fixing it. Further analysis shows that this is indeed
>> the case: The code in dwc2_hc_start_transfer() assumes that the transfer
>> size is a multiple of wMaxPacketSize, and rounds up the transfer size
>> communicated to the chip accordingly. Added debug code confirms that
>> the chip does under some circumstances indeed send more data than requested
>> in the urb receive buffer size.
>>
>> On top of that, it turns out that buffers are still not guaranteed to be
>> aligned to dma_get_cache_alignment(), but to DWC2_USB_DMA_ALIGN (4).
>> Further debugging shows that packets aligned to DWC2_USB_DMA_ALIGN
>> but not to dma_get_cache_alignment() are indeed common and work just fine.
>> This suggests that commit 56406e017a88 was not really necessary because
>> even without it packets were already aligned to DWC2_USB_DMA_ALIGN.
>>
>> To simplify the code, move the old data pointer back to the beginning of
>> the new buffer, restoring most of the original commit. Stop aligning the
>> buffer to dma_get_cache_alignment() since it isn't needed and only makes
>> the code more complex. Instead, ensure that the allocated buffer is a
>> multiple of wMaxPacketSize to ensure that the chip does not write beyond
>> the end of the buffer.
>
> I do like the cleanliness of being able to easily identify the old
> buffer (AKA by putting it first) and I agree that the existing code
> was only really guaranteeing 4-byte alignment and if we truly needed
> more alignment then we'd be allocating a lot more bounce buffers
> (which is pretty expensive).
>
> ...but the argument in commit 56406e017a88 ("usb: dwc2: Fix DMA
> alignment to start at allocated boundary") is still a compelling one.
Sure, but, as mentioned above, it wasn't followed anyway.
> Maybe at least put a comment in the code next to the "#define
> DWC2_USB_DMA_ALIGN" saying that we think that this is enough alignment
> for anyone using dwc2's built-in DMA logic?
>
Makes sense. I'll add a comment.
>
>> Cc: Douglas Anderson <[email protected]>
>> Cc: Boris Arzur <[email protected]>
>> Fixes: 56406e017a88 ("usb: dwc2: Fix DMA alignment to start at allocated boundary")
>> Signed-off-by: Guenter Roeck <[email protected]>
>> ---
>> drivers/usb/dwc2/hcd.c | 67 ++++++++++++++++++++++--------------------
>> 1 file changed, 35 insertions(+), 32 deletions(-)
>
> Sorry for such a mess and thank you for all the work tracking down and
> documenting all the problems. Clearly deep understanding of DMA is
> not something I can claim. :(
>
> A few points of order first:
> * Although get_maintainer doesn't identify him, it has seemed like
> Felipe Balbi lands most of the dwc2 things. Probably a good idea to
> CC him.
> * I have historically found Stefan Wahren interested in dwc2 fixes and
> willing to test them on Raspberry Pi w/ various peripherals.
> * Since one of the commits you refer to was written by Martin Schiller
> it probably wouldn't hurt to CC him.
>
I'll do that.
>
>> diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
>> index b90f858af960..f6d8cc9cee34 100644
>> --- a/drivers/usb/dwc2/hcd.c
>> +++ b/drivers/usb/dwc2/hcd.c
>> @@ -2471,19 +2471,21 @@ static int dwc2_alloc_split_dma_aligned_buf(struct dwc2_hsotg *hsotg,
>>
>> #define DWC2_USB_DMA_ALIGN 4
>>
>> +struct dma_aligned_buffer {
>> + void *old_xfer_buffer;
>> + u8 data[0];
>> +};
>> +
>> static void dwc2_free_dma_aligned_buffer(struct urb *urb)
>> {
>> - void *stored_xfer_buffer;
>> + struct dma_aligned_buffer *dma;
>> size_t length;
>>
>> if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
>> return;
>>
>> - /* Restore urb->transfer_buffer from the end of the allocated area */
>> - memcpy(&stored_xfer_buffer,
>> - PTR_ALIGN(urb->transfer_buffer + urb->transfer_buffer_length,
>> - dma_get_cache_alignment()),
>> - sizeof(urb->transfer_buffer));
>> + dma = container_of(urb->transfer_buffer,
>> + struct dma_aligned_buffer, data);
>>
>> if (usb_urb_dir_in(urb)) {
>> if (usb_pipeisoc(urb->pipe))
>> @@ -2491,49 +2493,50 @@ static void dwc2_free_dma_aligned_buffer(struct urb *urb)
>> else
>> length = urb->actual_length;
>>
>> - memcpy(stored_xfer_buffer, urb->transfer_buffer, length);
>> + memcpy(dma->old_xfer_buffer, dma->data, length);
>> }
>> - kfree(urb->transfer_buffer);
>> - urb->transfer_buffer = stored_xfer_buffer;
>> + urb->transfer_buffer = dma->old_xfer_buffer;
>> + kfree(dma);
>>
>> urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
>> }
>>
>> static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
>> {
>> - void *kmalloc_ptr;
>> + struct dma_aligned_buffer *dma;
>> size_t kmalloc_size;
>>
>> - if (urb->num_sgs || urb->sg ||
>> - urb->transfer_buffer_length == 0 ||
>> + if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
>> + (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) ||
>> !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
>
> Maybe I'm misunderstanding things, but it feels like we need something
> more here. Specifically I'm worried about the fact when the transfer
> buffer is already aligned but the length is not a multiple of the
> endpoint's maximum transfer size. You need to handle that, right?
> AKA something like this (untested):
>
> /* Simple case of not having to allocate a bounce buffer */
> if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
> (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
> return 0;
>
> /* Can also avoid bounce buffer if alignment and size are good */
> maxp = usb_endpoint_maxp(&ep->desc);
> if (maxp == urb->transfer_buffer_length &&
No, transfer_buffer_length would have to be a multiple of maxp. There
are many situations where roundup(transfer_buffer_length, maxp) !=
transfer_buffer_length. I agree, this would be the prudent approach
(and it was my original implementation), but then it didn't seem to
cause trouble so far, and I was hesitant to add it in because it results
in creating temporary buffers for almost every receive operation.
I'd like to get some test feedback from Boris - if the current code
causes crashes with his use case, we'll know that it is needed.
Otherwise, we'll have to decide if the current approach (with fewer
copies) is worth the risk, or if we want to play save and always
copy if roundup(transfer_buffer_length, maxp) != transfer_buffer_length.
Thanks,
Guenter
>
> !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
> return 0;
> >
> Other than that this looks good to me.
>
> -Doug
>
On Thu, Feb 27, 2020 at 02:06:44PM -0800, Doug Anderson wrote:
> Hi,
>
> On Wed, Feb 26, 2020 at 1:04 PM Guenter Roeck <[email protected]> wrote:
> >
> > The DWC2 documentation states that transfers with zero data length should
> > set the number of packets to 1 and the transfer length to 0. This is not
> > currently the case for inbound transfers: the transfer length is set to
> > the maximum packet length. This can have adverse effects if the chip
> > actually does transfer data as it is programmed to do. Follow chip
> > documentation and keep the transfer length set to 0 in that situation.
> >
> > Signed-off-by: Guenter Roeck <[email protected]>
> > ---
> > drivers/usb/dwc2/hcd.c | 15 ++++++++-------
> > 1 file changed, 8 insertions(+), 7 deletions(-)
>
> I don't have any other test setup that you don't have, so just giving
> my review tag and not tested tag.
>
> I will note that it feels like this should have a "Fixes" tag or a
> direct Cc to stable to make it obvious that it should make its way
> back to stable trees.
>
I added
Fixes: 56f5b1cff22a1 ("staging: Core files for the DWC2 driver")
> Reviewed-by: Douglas Anderson <[email protected]>
Thanks!
Guenter
On Thu, Feb 27, 2020 at 02:06:55PM -0800, Doug Anderson wrote:
> Hi,
>
> On Wed, Feb 26, 2020 at 1:04 PM Guenter Roeck <[email protected]> wrote:
> >
> > In some situations, the following error messages are reported.
> >
> > dwc2 ff540000.usb: dwc2_hc_chhltd_intr_dma: Channel 1 - ChHltd set, but reason is unknown
> > dwc2 ff540000.usb: hcint 0x00000002, intsts 0x04000021
> >
> > This is sometimes followed by:
> >
> > dwc2 ff540000.usb: dwc2_update_urb_state_abn(): trimming xfer length
> >
> > and then:
> >
> > WARNING: CPU: 0 PID: 0 at kernel/v4.19/drivers/usb/dwc2/hcd.c:2913
> > dwc2_assign_and_init_hc+0x98c/0x990
> >
> > The warning suggests that an odd buffer address is to be used for DMA.
> >
> > After an error is observed, the receive buffer may be full
> > (urb->actual_length >= urb->length). However, the urb is still left in
> > the queue unless three errors were observed in a row. When it is queued
> > again, the dwc2 hcd code translates this into a 1-block transfer.
> > If urb->actual_length (ie the total expected receive length) is not
> > DMA-aligned, the buffer pointer programmed into the chip will be
> > unaligned. This results in the observed warning.
> >
> > To solve the problem, abort input transactions after an error with
> > unknown cause if the entire packet was already received. This may be
> > a bit drastic, but we don't really know why the transfer was aborted
> > even though the entire packet was received. Aborting the transfer in
> > this situation is less risky than accepting a potentially corrupted
> > packet.
> >
> > With this patch in place, the 'ChHltd set' and 'trimming xfer length'
> > messages are still observed, but there are no more transfer attempts
> > with odd buffer addresses.
> >
> > Cc: Boris ARZUR <[email protected]>
> > Cc: Douglas Anderson <[email protected]>
> > Signed-off-by: Guenter Roeck <[email protected]>
> > ---
> > drivers/usb/dwc2/hcd_intr.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
>
> Seems sane to me. Also suggest a "Fixes" or "Cc: stable" tag.
>
Added:
Fixes: 151d0cbdbe860 ("usb: dwc2: make the scheduler handle excessive NAKs better")
> Reviewed-by: Douglas Anderson <[email protected]>
Thanks!
Guenter
On Thu, Feb 27, 2020 at 02:07:14PM -0800, Doug Anderson wrote:
> Hi,
>
> On Wed, Feb 26, 2020 at 1:04 PM Guenter Roeck <[email protected]> wrote:
> >
> > With some USB network adapters, such as DM96xx, the following message
> > is seen for each maximum size receive packet.
> >
> > dwc2 ff540000.usb: dwc2_update_urb_state(): trimming xfer length
> >
> > This happens because the packet size requested by the driver is 1522
> > bytes, wMaxPacketSize is 64, the dwc2 driver configures the chip to
> > receive 24*64 = 1536 bytes, and the chip does indeed send more than
> > 1522 bytes of data. Since the event does not indicate an error condition,
> > the message is just noise. Demote it to debug level.
> >
> > Signed-off-by: Guenter Roeck <[email protected]>
> > ---
> > drivers/usb/dwc2/hcd_intr.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Suggest a "Fixes" or "Cc: stable" tag. This one isn't as important as
> the others, but presumably you'll start hitting it a lot more now
> (whereas previously we'd just crash).
>
Good point. Added
Fixes: 7359d482eb4d3 ("staging: HCD files for the DWC2 driver")
> Reviewed-by: Douglas Anderson <[email protected]>
Thanks again!
Guenter
On 2/27/20 2:27 PM, Guenter Roeck wrote:
> On 2/27/20 2:06 PM, Doug Anderson wrote:
[ ... ]
>>> - if (urb->num_sgs || urb->sg ||
>>> - urb->transfer_buffer_length == 0 ||
>>> + if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
>>> + (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) ||
>>> !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
>>
>> Maybe I'm misunderstanding things, but it feels like we need something
>> more here. Specifically I'm worried about the fact when the transfer
>> buffer is already aligned but the length is not a multiple of the
>> endpoint's maximum transfer size. You need to handle that, right?
>> AKA something like this (untested):
>>
>> /* Simple case of not having to allocate a bounce buffer */
>> if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
>> (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
>> return 0;
>>
>> /* Can also avoid bounce buffer if alignment and size are good */
>> maxp = usb_endpoint_maxp(&ep->desc);
>> if (maxp == urb->transfer_buffer_length &&
>
> No, transfer_buffer_length would have to be a multiple of maxp. There
> are many situations where roundup(transfer_buffer_length, maxp) !=
> transfer_buffer_length. I agree, this would be the prudent approach
> (and it was my original implementation), but then it didn't seem to
> cause trouble so far, and I was hesitant to add it in because it results
> in creating temporary buffers for almost every receive operation.
> I'd like to get some test feedback from Boris - if the current code
> causes crashes with his use case, we'll know that it is needed.
> Otherwise, we'll have to decide if the current approach (with fewer
> copies) is worth the risk, or if we want to play save and always
> copy if roundup(transfer_buffer_length, maxp) != transfer_buffer_length.
>
Thinking more about this, the situation is actually much worse:
In Boris' testing, he found inbound transactions requested by usb
storage code with a requested transfer size of 13 bytes ... with
URB_NO_TRANSFER_DMA_MAP set. This means the requesting code has
provided a DMA ready buffer, transfer_buffer isn't even used,
and we can not reallocate it. In this situation we can just hope
that the chip (and the connected USB device) don't send more data
than requested.
With that in mind, I think we should stick with the current
scheme (ie only allocate a new buffer if the provided buffer is
unaligned) unless Boris comes back and tells us that it doesn't
work.
Thanks,
Guenter
Hi,
On Thu, Feb 27, 2020 at 8:28 PM Guenter Roeck <[email protected]> wrote:
>
> On 2/27/20 2:27 PM, Guenter Roeck wrote:
> > On 2/27/20 2:06 PM, Doug Anderson wrote:
> [ ... ]
> >>> - if (urb->num_sgs || urb->sg ||
> >>> - urb->transfer_buffer_length == 0 ||
> >>> + if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
> >>> + (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) ||
> >>> !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
> >>
> >> Maybe I'm misunderstanding things, but it feels like we need something
> >> more here. Specifically I'm worried about the fact when the transfer
> >> buffer is already aligned but the length is not a multiple of the
> >> endpoint's maximum transfer size. You need to handle that, right?
> >> AKA something like this (untested):
> >>
> >> /* Simple case of not having to allocate a bounce buffer */
> >> if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
> >> (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
> >> return 0;
> >>
> >> /* Can also avoid bounce buffer if alignment and size are good */
> >> maxp = usb_endpoint_maxp(&ep->desc);
> >> if (maxp == urb->transfer_buffer_length &&
> >
> > No, transfer_buffer_length would have to be a multiple of maxp. There
> > are many situations where roundup(transfer_buffer_length, maxp) !=
> > transfer_buffer_length. I agree, this would be the prudent approach
> > (and it was my original implementation), but then it didn't seem to
> > cause trouble so far, and I was hesitant to add it in because it results
> > in creating temporary buffers for almost every receive operation.
> > I'd like to get some test feedback from Boris - if the current code
> > causes crashes with his use case, we'll know that it is needed.
> > Otherwise, we'll have to decide if the current approach (with fewer
> > copies) is worth the risk, or if we want to play save and always
> > copy if roundup(transfer_buffer_length, maxp) != transfer_buffer_length.
> >
>
> Thinking more about this, the situation is actually much worse:
> In Boris' testing, he found inbound transactions requested by usb
> storage code with a requested transfer size of 13 bytes ... with
> URB_NO_TRANSFER_DMA_MAP set. This means the requesting code has
> provided a DMA ready buffer, transfer_buffer isn't even used,
> and we can not reallocate it. In this situation we can just hope
> that the chip (and the connected USB device) don't send more data
> than requested.
>
> With that in mind, I think we should stick with the current
> scheme (ie only allocate a new buffer if the provided buffer is
> unaligned) unless Boris comes back and tells us that it doesn't
> work.
I dunno. I'd rather see correctness over performance. Certainly we'd
only need to do the extra bounce buffer for input buffers at least.
Although I don't love the idea, is this something where we want to
introduce a config option (either runtime or through KConfig),
something like:
CONFIG_DWC2_FAST_AND_LOOSE - Avoid bounce buffers and thus run faster
at the risk of a bad USB device being able to clobber some of your
memory. Only do this if you really care about speed and have some
trust in the USB devices connected to your system.
-Doug
Hi Doug,
[add Nicolas the new BCM2835 maintainer]
Am 27.02.20 um 23:06 schrieb Doug Anderson:
> Hi,
>
> On Wed, Feb 26, 2020 at 1:04 PM Guenter Roeck <[email protected]> wrote:
>> The code to align buffers for DMA was first introduced with commit
>> 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way").
>> It was updated with commit 56406e017a88 ("usb: dwc2: Fix DMA alignment
>> to start at allocated boundary") because it did not really align buffers to
>> DMA boundaries but to word offsets. This was then optimized in commit
>> 1e111e885238 ("usb: dwc2: Fix inefficient copy of unaligned buffers")
>> to only copy actual data rather than the whole buffer. Commit 4a4863bf2e79
>> ("usb: dwc2: Fix DMA cache alignment issues") changed this further to add
>> a padding at the end of the buffer to ensure that the old data pointer is
>> not in the same cache line as the buffer.
>>
>> This last commit states "Otherwise, the stored_xfer_buffer gets corrupted
>> for IN URBs on non-cache-coherent systems". However, such corruptions are
>> still observed. This suggests that the commit may have been hiding a
>> problem rather than fixing it. Further analysis shows that this is indeed
>> the case: The code in dwc2_hc_start_transfer() assumes that the transfer
>> size is a multiple of wMaxPacketSize, and rounds up the transfer size
>> communicated to the chip accordingly. Added debug code confirms that
>> the chip does under some circumstances indeed send more data than requested
>> in the urb receive buffer size.
>>
>> On top of that, it turns out that buffers are still not guaranteed to be
>> aligned to dma_get_cache_alignment(), but to DWC2_USB_DMA_ALIGN (4).
>> Further debugging shows that packets aligned to DWC2_USB_DMA_ALIGN
>> but not to dma_get_cache_alignment() are indeed common and work just fine.
>> This suggests that commit 56406e017a88 was not really necessary because
>> even without it packets were already aligned to DWC2_USB_DMA_ALIGN.
>>
>> To simplify the code, move the old data pointer back to the beginning of
>> the new buffer, restoring most of the original commit. Stop aligning the
>> buffer to dma_get_cache_alignment() since it isn't needed and only makes
>> the code more complex. Instead, ensure that the allocated buffer is a
>> multiple of wMaxPacketSize to ensure that the chip does not write beyond
>> the end of the buffer.
> I do like the cleanliness of being able to easily identify the old
> buffer (AKA by putting it first) and I agree that the existing code
> was only really guaranteeing 4-byte alignment and if we truly needed
> more alignment then we'd be allocating a lot more bounce buffers
> (which is pretty expensive).
>
> ...but the argument in commit 56406e017a88 ("usb: dwc2: Fix DMA
> alignment to start at allocated boundary") is still a compelling one.
> Maybe at least put a comment in the code next to the "#define
> DWC2_USB_DMA_ALIGN" saying that we think that this is enough alignment
> for anyone using dwc2's built-in DMA logic?
>
>
>> Cc: Douglas Anderson <[email protected]>
>> Cc: Boris Arzur <[email protected]>
>> Fixes: 56406e017a88 ("usb: dwc2: Fix DMA alignment to start at allocated boundary")
>> Signed-off-by: Guenter Roeck <[email protected]>
>> ---
>> drivers/usb/dwc2/hcd.c | 67 ++++++++++++++++++++++--------------------
>> 1 file changed, 35 insertions(+), 32 deletions(-)
> Sorry for such a mess and thank you for all the work tracking down and
> documenting all the problems. Clearly deep understanding of DMA is
> not something I can claim. :(
>
> A few points of order first:
> * Although get_maintainer doesn't identify him, it has seemed like
> Felipe Balbi lands most of the dwc2 things. Probably a good idea to
> CC him.
> * I have historically found Stefan Wahren interested in dwc2 fixes and
> willing to test them on Raspberry Pi w/ various peripherals.
i'm not the BCM2835 maintainer anymore, but will give it a try.
Regards
Stefan
On Fri, Feb 28, 2020 at 08:14:35AM -0800, Doug Anderson wrote:
> Hi,
>
> On Thu, Feb 27, 2020 at 8:28 PM Guenter Roeck <[email protected]> wrote:
> >
> > On 2/27/20 2:27 PM, Guenter Roeck wrote:
> > > On 2/27/20 2:06 PM, Doug Anderson wrote:
> > [ ... ]
> > >>> - if (urb->num_sgs || urb->sg ||
> > >>> - urb->transfer_buffer_length == 0 ||
> > >>> + if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
> > >>> + (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) ||
> > >>> !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
> > >>
> > >> Maybe I'm misunderstanding things, but it feels like we need something
> > >> more here. Specifically I'm worried about the fact when the transfer
> > >> buffer is already aligned but the length is not a multiple of the
> > >> endpoint's maximum transfer size. You need to handle that, right?
> > >> AKA something like this (untested):
> > >>
> > >> /* Simple case of not having to allocate a bounce buffer */
> > >> if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
> > >> (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
> > >> return 0;
> > >>
> > >> /* Can also avoid bounce buffer if alignment and size are good */
> > >> maxp = usb_endpoint_maxp(&ep->desc);
> > >> if (maxp == urb->transfer_buffer_length &&
> > >
> > > No, transfer_buffer_length would have to be a multiple of maxp. There
> > > are many situations where roundup(transfer_buffer_length, maxp) !=
> > > transfer_buffer_length. I agree, this would be the prudent approach
> > > (and it was my original implementation), but then it didn't seem to
> > > cause trouble so far, and I was hesitant to add it in because it results
> > > in creating temporary buffers for almost every receive operation.
> > > I'd like to get some test feedback from Boris - if the current code
> > > causes crashes with his use case, we'll know that it is needed.
> > > Otherwise, we'll have to decide if the current approach (with fewer
> > > copies) is worth the risk, or if we want to play save and always
> > > copy if roundup(transfer_buffer_length, maxp) != transfer_buffer_length.
> > >
> >
> > Thinking more about this, the situation is actually much worse:
> > In Boris' testing, he found inbound transactions requested by usb
> > storage code with a requested transfer size of 13 bytes ... with
> > URB_NO_TRANSFER_DMA_MAP set. This means the requesting code has
> > provided a DMA ready buffer, transfer_buffer isn't even used,
> > and we can not reallocate it. In this situation we can just hope
> > that the chip (and the connected USB device) don't send more data
> > than requested.
> >
> > With that in mind, I think we should stick with the current
> > scheme (ie only allocate a new buffer if the provided buffer is
> > unaligned) unless Boris comes back and tells us that it doesn't
> > work.
>
> I dunno. I'd rather see correctness over performance. Certainly we'd
> only need to do the extra bounce buffer for input buffers at least.
>
> Although I don't love the idea, is this something where we want to
> introduce a config option (either runtime or through KConfig),
> something like:
>
> CONFIG_DWC2_FAST_AND_LOOSE - Avoid bounce buffers and thus run faster
> at the risk of a bad USB device being able to clobber some of your
> memory. Only do this if you really care about speed and have some
> trust in the USB devices connected to your system.
>
I understand your point. Unfortunately that would only work if the driver
doesn't set URB_NO_TRANSFER_DMA_MAP.
$ git grep "=.*URB_NO_TRANSFER_DMA_MAP" | wc
115 498 10104
isn't exactly reassuring - a quick checks suggests that almost 50%
of USB drivers set this flag.
So all we'd really accomplish is to give people a false sense of
security.
In this context, I did play around with configuring the real receive
buffer size (ie in my reproducer 1522 instead of 1536). If I do that,
reading the HCTSIZ register after the transfer reports 0x7fff2
(or -14 = 1522-1536 if I treat the value as signed) as actual transfer
size. Maybe that would be an option, if properly handled, but who knows
what the IP actually does in this case, and what it does on other
implementations (not rk3288).
Guenter
On Fri, 28 Feb 2020 at 19:59, Guenter Roeck <[email protected]> wrote:
>
> On Fri, Feb 28, 2020 at 08:14:35AM -0800, Doug Anderson wrote:
> > Hi,
> >
> > On Thu, Feb 27, 2020 at 8:28 PM Guenter Roeck <[email protected]> wrote:
> > >
> > > On 2/27/20 2:27 PM, Guenter Roeck wrote:
> > > > On 2/27/20 2:06 PM, Doug Anderson wrote:
> > > [ ... ]
> > > >>> - if (urb->num_sgs || urb->sg ||
> > > >>> - urb->transfer_buffer_length == 0 ||
> > > >>> + if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
> > > >>> + (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) ||
> > > >>> !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
> > > >>
> > > >> Maybe I'm misunderstanding things, but it feels like we need something
> > > >> more here. Specifically I'm worried about the fact when the transfer
> > > >> buffer is already aligned but the length is not a multiple of the
> > > >> endpoint's maximum transfer size. You need to handle that, right?
> > > >> AKA something like this (untested):
> > > >>
> > > >> /* Simple case of not having to allocate a bounce buffer */
> > > >> if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
> > > >> (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
> > > >> return 0;
> > > >>
> > > >> /* Can also avoid bounce buffer if alignment and size are good */
> > > >> maxp = usb_endpoint_maxp(&ep->desc);
> > > >> if (maxp == urb->transfer_buffer_length &&
> > > >
> > > > No, transfer_buffer_length would have to be a multiple of maxp. There
> > > > are many situations where roundup(transfer_buffer_length, maxp) !=
> > > > transfer_buffer_length. I agree, this would be the prudent approach
> > > > (and it was my original implementation), but then it didn't seem to
> > > > cause trouble so far, and I was hesitant to add it in because it results
> > > > in creating temporary buffers for almost every receive operation.
> > > > I'd like to get some test feedback from Boris - if the current code
> > > > causes crashes with his use case, we'll know that it is needed.
> > > > Otherwise, we'll have to decide if the current approach (with fewer
> > > > copies) is worth the risk, or if we want to play save and always
> > > > copy if roundup(transfer_buffer_length, maxp) != transfer_buffer_length.
> > > >
> > >
> > > Thinking more about this, the situation is actually much worse:
> > > In Boris' testing, he found inbound transactions requested by usb
> > > storage code with a requested transfer size of 13 bytes ... with
> > > URB_NO_TRANSFER_DMA_MAP set. This means the requesting code has
> > > provided a DMA ready buffer, transfer_buffer isn't even used,
> > > and we can not reallocate it. In this situation we can just hope
> > > that the chip (and the connected USB device) don't send more data
> > > than requested.
> > >
> > > With that in mind, I think we should stick with the current
> > > scheme (ie only allocate a new buffer if the provided buffer is
> > > unaligned) unless Boris comes back and tells us that it doesn't
> > > work.
> >
> > I dunno. I'd rather see correctness over performance. Certainly we'd
> > only need to do the extra bounce buffer for input buffers at least.
> >
> > Although I don't love the idea, is this something where we want to
> > introduce a config option (either runtime or through KConfig),
> > something like:
> >
> > CONFIG_DWC2_FAST_AND_LOOSE - Avoid bounce buffers and thus run faster
> > at the risk of a bad USB device being able to clobber some of your
> > memory. Only do this if you really care about speed and have some
> > trust in the USB devices connected to your system.
> >
>
> I understand your point. Unfortunately that would only work if the driver
> doesn't set URB_NO_TRANSFER_DMA_MAP.
>
> $ git grep "=.*URB_NO_TRANSFER_DMA_MAP" | wc
> 115 498 10104
>
> isn't exactly reassuring - a quick checks suggests that almost 50%
> of USB drivers set this flag.
>
> So all we'd really accomplish is to give people a false sense of
> security.
>
> In this context, I did play around with configuring the real receive
> buffer size (ie in my reproducer 1522 instead of 1536). If I do that,
> reading the HCTSIZ register after the transfer reports 0x7fff2
> (or -14 = 1522-1536 if I treat the value as signed) as actual transfer
> size. Maybe that would be an option, if properly handled, but who knows
> what the IP actually does in this case, and what it does on other
> implementations (not rk3288).
>
> Guenter
Hi Guenter.
I decided to give your patch-set a spin on my Lantiq device and I'm
sorry to report that I'm now experiencing a complete crash of the
kernel due to unaligned access if I try to do usb transfers:
Unhandled kernel unaligned access[#1]:
CPU: 1 PID: 3149 Comm: sh Not tainted 5.6-rc3 #0
task: 87db2580 task.stack: 868c6000
$ 0 : 00000000 00000001 00000000 81125088
$ 4 : 8064ffc4 00000001 00000001 2a624a29
$ 8 : 00000c43 00000c42 80010770 00000000
$12 : 7f801be8 77fac2a0 00000022 00000000
$16 : 87c02300 014000c0 87d1ddc0 00000000
$20 : 87db2580 00000000 00000000 00000000
$24 : 00000000 77f5fbcc
$28 : 868c6000 868c7e00 00000000 800347b0
Hi : 0000001b
Lo : 0000005b
epc : 8012d278 kmem_cache_alloc+0x128/0x17c
ra : 800347b0 copy_process.part.94+0xd8/0x1690
Status: 1100c303 KERNEL EXL IE
Cause : 00800010 (ExcCode 04)
BadVA : 2a624a29
PrId : 00019556 (MIPS 34Kc)
Modules linked in: ath9k ath9k_common option ath9k_hw ath10k_pci
ath10k_core ath usb_wwan qmi_wwan pppoe nf_conntrack_ipv6 mac80211
iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp
xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack
xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD usbserial
usbnet ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot
ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda pppox
ppp_async owl_loader nf_reject_ipv4 nf_nat_redirect
nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat
nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6
nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack ltq_deu_vr9
iptable_mangle iptable_filter ip_tables crc_ccitt compat cdc_wdm
drv_dsl_cpe_api drv_mei_cpe nf_log_ipv6 nf_log_common
ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables
nf_reject_ipv6 pppoatm ppp_generic slhc br2684 atm drv_ifxos
usb_storage dwc2 sd_mod scsi_mod gpio_button_hotplug mii
Process sh (pid: 3149, threadinfo=868c6000, task=87db2580, tls=77fadefc)
Stack : 80657098 00000100 77fa6db0 00000000 00000012 00000012 ffffffff 800347b0
80650000 8013906c 87c02c00 8012d9fc 00000000 00000000 00000020 807b0000
879f1148 00000001 868c7e98 804f6018 00000000 801391a8 868c7ef0 80153258
00000000 00000001 864f62a0 00000020 864f62b8 80044818 00000400 8015fe94
00000003 00000005 00000012 00000000 7f801be0 00430871 00000000 77fac000
...
Call Trace:
[<8012d278>] kmem_cache_alloc+0x128/0x17c
[<800347b0>] copy_process.part.94+0xd8/0x1690
[<80035f00>] _do_fork+0xe4/0x3bc
[<80036238>] sys_fork+0x24/0x30
[<8001c438>] syscall_common+0x34/0x58
Code: 00000000 8e020014 00e23821 <8ce20000> 10000009 cc400000
1040ffbd 00000000 8e060010
---[ end trace 3d8df00f1a0d123c ]---
Don't be fooled by the Call Trace, the stack unwinder is most likely
totally confused due to memory corruption.
The culprit is the first patch in the series that does not align DMA
carefully enough. Apparently these big endian MIPS devices are very
picky on how that is supposed to be handled.
Couple of years ago mips people even contemplated on adding warn_on to
kernel to yell at driver authors who do not align their DMA properly.
[1.]
That patch explanation actually served as an inspiration for commit
56406e017a88 in the first place.
[1.] https://www.linux-mips.org/archives/linux-mips/2016-11/msg00267.html
--
Antti
On 2/28/20 11:46 PM, Antti Seppälä wrote:
> On Fri, 28 Feb 2020 at 19:59, Guenter Roeck <[email protected]> wrote:
>>
>> On Fri, Feb 28, 2020 at 08:14:35AM -0800, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Thu, Feb 27, 2020 at 8:28 PM Guenter Roeck <[email protected]> wrote:
>>>>
>>>> On 2/27/20 2:27 PM, Guenter Roeck wrote:
>>>>> On 2/27/20 2:06 PM, Doug Anderson wrote:
>>>> [ ... ]
>>>>>>> - if (urb->num_sgs || urb->sg ||
>>>>>>> - urb->transfer_buffer_length == 0 ||
>>>>>>> + if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
>>>>>>> + (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) ||
>>>>>>> !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
>>>>>>
>>>>>> Maybe I'm misunderstanding things, but it feels like we need something
>>>>>> more here. Specifically I'm worried about the fact when the transfer
>>>>>> buffer is already aligned but the length is not a multiple of the
>>>>>> endpoint's maximum transfer size. You need to handle that, right?
>>>>>> AKA something like this (untested):
>>>>>>
>>>>>> /* Simple case of not having to allocate a bounce buffer */
>>>>>> if (urb->num_sgs || urb->sg || urb->transfer_buffer_length == 0 ||
>>>>>> (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
>>>>>> return 0;
>>>>>>
>>>>>> /* Can also avoid bounce buffer if alignment and size are good */
>>>>>> maxp = usb_endpoint_maxp(&ep->desc);
>>>>>> if (maxp == urb->transfer_buffer_length &&
>>>>>
>>>>> No, transfer_buffer_length would have to be a multiple of maxp. There
>>>>> are many situations where roundup(transfer_buffer_length, maxp) !=
>>>>> transfer_buffer_length. I agree, this would be the prudent approach
>>>>> (and it was my original implementation), but then it didn't seem to
>>>>> cause trouble so far, and I was hesitant to add it in because it results
>>>>> in creating temporary buffers for almost every receive operation.
>>>>> I'd like to get some test feedback from Boris - if the current code
>>>>> causes crashes with his use case, we'll know that it is needed.
>>>>> Otherwise, we'll have to decide if the current approach (with fewer
>>>>> copies) is worth the risk, or if we want to play save and always
>>>>> copy if roundup(transfer_buffer_length, maxp) != transfer_buffer_length.
>>>>>
>>>>
>>>> Thinking more about this, the situation is actually much worse:
>>>> In Boris' testing, he found inbound transactions requested by usb
>>>> storage code with a requested transfer size of 13 bytes ... with
>>>> URB_NO_TRANSFER_DMA_MAP set. This means the requesting code has
>>>> provided a DMA ready buffer, transfer_buffer isn't even used,
>>>> and we can not reallocate it. In this situation we can just hope
>>>> that the chip (and the connected USB device) don't send more data
>>>> than requested.
>>>>
>>>> With that in mind, I think we should stick with the current
>>>> scheme (ie only allocate a new buffer if the provided buffer is
>>>> unaligned) unless Boris comes back and tells us that it doesn't
>>>> work.
>>>
>>> I dunno. I'd rather see correctness over performance. Certainly we'd
>>> only need to do the extra bounce buffer for input buffers at least.
>>>
>>> Although I don't love the idea, is this something where we want to
>>> introduce a config option (either runtime or through KConfig),
>>> something like:
>>>
>>> CONFIG_DWC2_FAST_AND_LOOSE - Avoid bounce buffers and thus run faster
>>> at the risk of a bad USB device being able to clobber some of your
>>> memory. Only do this if you really care about speed and have some
>>> trust in the USB devices connected to your system.
>>>
>>
>> I understand your point. Unfortunately that would only work if the driver
>> doesn't set URB_NO_TRANSFER_DMA_MAP.
>>
>> $ git grep "=.*URB_NO_TRANSFER_DMA_MAP" | wc
>> 115 498 10104
>>
>> isn't exactly reassuring - a quick checks suggests that almost 50%
>> of USB drivers set this flag.
>>
>> So all we'd really accomplish is to give people a false sense of
>> security.
>>
>> In this context, I did play around with configuring the real receive
>> buffer size (ie in my reproducer 1522 instead of 1536). If I do that,
>> reading the HCTSIZ register after the transfer reports 0x7fff2
>> (or -14 = 1522-1536 if I treat the value as signed) as actual transfer
>> size. Maybe that would be an option, if properly handled, but who knows
>> what the IP actually does in this case, and what it does on other
>> implementations (not rk3288).
>>
>> Guenter
>
> Hi Guenter.
>
> I decided to give your patch-set a spin on my Lantiq device and I'm
> sorry to report that I'm now experiencing a complete crash of the
> kernel due to unaligned access if I try to do usb transfers:
>
> Unhandled kernel unaligned access[#1]:
> CPU: 1 PID: 3149 Comm: sh Not tainted 5.6-rc3 #0
> task: 87db2580 task.stack: 868c6000
> $ 0 : 00000000 00000001 00000000 81125088
> $ 4 : 8064ffc4 00000001 00000001 2a624a29
> $ 8 : 00000c43 00000c42 80010770 00000000
> $12 : 7f801be8 77fac2a0 00000022 00000000
> $16 : 87c02300 014000c0 87d1ddc0 00000000
> $20 : 87db2580 00000000 00000000 00000000
> $24 : 00000000 77f5fbcc
> $28 : 868c6000 868c7e00 00000000 800347b0
> Hi : 0000001b
> Lo : 0000005b
> epc : 8012d278 kmem_cache_alloc+0x128/0x17c
> ra : 800347b0 copy_process.part.94+0xd8/0x1690
> Status: 1100c303 KERNEL EXL IE
> Cause : 00800010 (ExcCode 04)
> BadVA : 2a624a29
> PrId : 00019556 (MIPS 34Kc)
> Modules linked in: ath9k ath9k_common option ath9k_hw ath10k_pci
> ath10k_core ath usb_wwan qmi_wwan pppoe nf_conntrack_ipv6 mac80211
> iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp
> xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack
> xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD usbserial
> usbnet ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot
> ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda pppox
> ppp_async owl_loader nf_reject_ipv4 nf_nat_redirect
> nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat
> nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6
> nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack ltq_deu_vr9
> iptable_mangle iptable_filter ip_tables crc_ccitt compat cdc_wdm
> drv_dsl_cpe_api drv_mei_cpe nf_log_ipv6 nf_log_common
> ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables
> nf_reject_ipv6 pppoatm ppp_generic slhc br2684 atm drv_ifxos
> usb_storage dwc2 sd_mod scsi_mod gpio_button_hotplug mii
> Process sh (pid: 3149, threadinfo=868c6000, task=87db2580, tls=77fadefc)
> Stack : 80657098 00000100 77fa6db0 00000000 00000012 00000012 ffffffff 800347b0
> 80650000 8013906c 87c02c00 8012d9fc 00000000 00000000 00000020 807b0000
> 879f1148 00000001 868c7e98 804f6018 00000000 801391a8 868c7ef0 80153258
> 00000000 00000001 864f62a0 00000020 864f62b8 80044818 00000400 8015fe94
> 00000003 00000005 00000012 00000000 7f801be0 00430871 00000000 77fac000
> ...
> Call Trace:
> [<8012d278>] kmem_cache_alloc+0x128/0x17c
> [<800347b0>] copy_process.part.94+0xd8/0x1690
> [<80035f00>] _do_fork+0xe4/0x3bc
> [<80036238>] sys_fork+0x24/0x30
> [<8001c438>] syscall_common+0x34/0x58
> Code: 00000000 8e020014 00e23821 <8ce20000> 10000009 cc400000
> 1040ffbd 00000000 8e060010
>
> ---[ end trace 3d8df00f1a0d123c ]---
>
>
> Don't be fooled by the Call Trace, the stack unwinder is most likely
> totally confused due to memory corruption.
>
> The culprit is the first patch in the series that does not align DMA
> carefully enough. Apparently these big endian MIPS devices are very
> picky on how that is supposed to be handled.
>
Sigh. It would have been too simple. Too bad I can't test myself.
I'd like to know if this is because URB_NO_TRANSFER_DMA_MAP is set on a
transfer, or because the beginning of the buffer indeed needs to be aligned
to the DMA cache line size on that system. In the latter case, the question
is why the alignment to DWC2_USB_DMA_ALIGN (=4) works. In the former case,
question would be why the realignment does any good in the first place.
Any chance you can add some test code to help figuring out what exactly
goes wrong ?
Thanks,
Guenter
> Couple of years ago mips people even contemplated on adding warn_on to
> kernel to yell at driver authors who do not align their DMA properly.
> [1.]
> That patch explanation actually served as an inspiration for commit
> 56406e017a88 in the first place.
>
> [1.] https://www.linux-mips.org/archives/linux-mips/2016-11/msg00267.html
>
On Sat, 29 Feb 2020 at 17:25, Guenter Roeck <[email protected]> wrote:
>
> Sigh. It would have been too simple. Too bad I can't test myself.
> I'd like to know if this is because URB_NO_TRANSFER_DMA_MAP is set on a
> transfer, or because the beginning of the buffer indeed needs to be aligned
> to the DMA cache line size on that system. In the latter case, the question
> is why the alignment to DWC2_USB_DMA_ALIGN (=4) works. In the former case,
> question would be why the realignment does any good in the first place.
>
> Any chance you can add some test code to help figuring out what exactly
> goes wrong ?
>
Sure, I can try to help. Just let me know what code you would like to
insert and where and I'll see what I can do.
--
Antti
On Sat, 29 Feb 2020 at 18:33, Antti Seppälä <[email protected]> wrote:
>
> On Sat, 29 Feb 2020 at 17:25, Guenter Roeck <[email protected]> wrote:
> >
> > Sigh. It would have been too simple. Too bad I can't test myself.
> > I'd like to know if this is because URB_NO_TRANSFER_DMA_MAP is set on a
> > transfer, or because the beginning of the buffer indeed needs to be aligned
> > to the DMA cache line size on that system. In the latter case, the question
> > is why the alignment to DWC2_USB_DMA_ALIGN (=4) works. In the former case,
> > question would be why the realignment does any good in the first place.
> >
> > Any chance you can add some test code to help figuring out what exactly
> > goes wrong ?
> >
>
> Sure, I can try to help. Just let me know what code you would like to
> insert and where and I'll see what I can do.
>
So I did some further research on this and it turns out that:
- URB_NO_TRANSFER_DMA_MAP is not set on the offending transfers so
the issue really is buffer alignment
- DWC2_USB_DMA_ALIGN=4 "works" because in my limited testcase (usb
4g-dongle utilized via qmi-wwan) all transfers are unaligned. That is,
every urb->transfer_buffer is misaligned by 2 bytes == unaligned
- I can fix both issues and thus make the patch work on MIPS by
modifying it like this:
-#define DWC2_USB_DMA_ALIGN 4
+#define DWC2_USB_DMA_ALIGN dma_get_cache_alignment()
struct dma_aligned_buffer {
void *old_xfer_buffer;
- u8 data[0];
+ u8 data[0] ____cacheline_aligned;
};
--
Antti
Hi Antti,
On 3/1/20 7:51 AM, Antti Seppälä wrote:
> On Sat, 29 Feb 2020 at 18:33, Antti Seppälä <[email protected]> wrote:
>>
>> On Sat, 29 Feb 2020 at 17:25, Guenter Roeck <[email protected]> wrote:
>>>
>>> Sigh. It would have been too simple. Too bad I can't test myself.
>>> I'd like to know if this is because URB_NO_TRANSFER_DMA_MAP is set on a
>>> transfer, or because the beginning of the buffer indeed needs to be aligned
>>> to the DMA cache line size on that system. In the latter case, the question
>>> is why the alignment to DWC2_USB_DMA_ALIGN (=4) works. In the former case,
>>> question would be why the realignment does any good in the first place.
>>>
>>> Any chance you can add some test code to help figuring out what exactly
>>> goes wrong ?
>>>
>>
>> Sure, I can try to help. Just let me know what code you would like to
>> insert and where and I'll see what I can do.
>>
>
> So I did some further research on this and it turns out that:
> - URB_NO_TRANSFER_DMA_MAP is not set on the offending transfers so
> the issue really is buffer alignment
> - DWC2_USB_DMA_ALIGN=4 "works" because in my limited testcase (usb
> 4g-dongle utilized via qmi-wwan) all transfers are unaligned. That is,
> every urb->transfer_buffer is misaligned by 2 bytes == unaligned
> - I can fix both issues and thus make the patch work on MIPS by
> modifying it like this:
>
> -#define DWC2_USB_DMA_ALIGN 4
> +#define DWC2_USB_DMA_ALIGN dma_get_cache_alignment()
>
> struct dma_aligned_buffer {
> void *old_xfer_buffer;
> - u8 data[0];
> + u8 data[0] ____cacheline_aligned;
> };
>
Thanks for the additional testing. That means that the existing code
is already broken, or am I missing something ?
Updating DWC2_USB_DMA_ALIGN to dma_get_cache_alignment() was part
of my initial fix, but then I abandoned it because I thought, well,
the existing alignment works, so that can't be the problem.
Anyway, using ____cacheline_aligned is an excellent idea. I'll make
the changes suggested above for the next version of my series.
Thanks,
Guenter
On Sun, 1 Mar 2020 at 18:24, Guenter Roeck <[email protected]> wrote:
>
> Hi Antti,
>
> On 3/1/20 7:51 AM, Antti Seppälä wrote:
> > On Sat, 29 Feb 2020 at 18:33, Antti Seppälä <[email protected]> wrote:
> >>
> >> On Sat, 29 Feb 2020 at 17:25, Guenter Roeck <[email protected]> wrote:
> >>>
> >>> Sigh. It would have been too simple. Too bad I can't test myself.
> >>> I'd like to know if this is because URB_NO_TRANSFER_DMA_MAP is set on a
> >>> transfer, or because the beginning of the buffer indeed needs to be aligned
> >>> to the DMA cache line size on that system. In the latter case, the question
> >>> is why the alignment to DWC2_USB_DMA_ALIGN (=4) works. In the former case,
> >>> question would be why the realignment does any good in the first place.
> >>>
> >>> Any chance you can add some test code to help figuring out what exactly
> >>> goes wrong ?
> >>>
> >>
> >> Sure, I can try to help. Just let me know what code you would like to
> >> insert and where and I'll see what I can do.
> >>
> >
> > So I did some further research on this and it turns out that:
> > - URB_NO_TRANSFER_DMA_MAP is not set on the offending transfers so
> > the issue really is buffer alignment
> > - DWC2_USB_DMA_ALIGN=4 "works" because in my limited testcase (usb
> > 4g-dongle utilized via qmi-wwan) all transfers are unaligned. That is,
> > every urb->transfer_buffer is misaligned by 2 bytes == unaligned
> > - I can fix both issues and thus make the patch work on MIPS by
> > modifying it like this:
> >
> > -#define DWC2_USB_DMA_ALIGN 4
> > +#define DWC2_USB_DMA_ALIGN dma_get_cache_alignment()
> >
> > struct dma_aligned_buffer {
> > void *old_xfer_buffer;
> > - u8 data[0];
> > + u8 data[0] ____cacheline_aligned;
> > };
> >
>
> Thanks for the additional testing. That means that the existing code
> is already broken, or am I missing something ?
>
Yes, I believe the existing code is broken if 4 byte aligned transfers
occur. I seem to have no easy way to prove that though as none of my
devices generate those.
> Updating DWC2_USB_DMA_ALIGN to dma_get_cache_alignment() was part
> of my initial fix, but then I abandoned it because I thought, well,
> the existing alignment works, so that can't be the problem.
>
> Anyway, using ____cacheline_aligned is an excellent idea. I'll make
> the changes suggested above for the next version of my series.
>
Great. In the meanwhile I'll continue testing and will report back if
I find anything new.
--
Antti
On 3/1/20 8:51 AM, Antti Seppälä wrote:
> On Sun, 1 Mar 2020 at 18:24, Guenter Roeck <[email protected]> wrote:
>>
>> Hi Antti,
>>
>> On 3/1/20 7:51 AM, Antti Seppälä wrote:
>>> On Sat, 29 Feb 2020 at 18:33, Antti Seppälä <[email protected]> wrote:
>>>>
>>>> On Sat, 29 Feb 2020 at 17:25, Guenter Roeck <[email protected]> wrote:
>>>>>
>>>>> Sigh. It would have been too simple. Too bad I can't test myself.
>>>>> I'd like to know if this is because URB_NO_TRANSFER_DMA_MAP is set on a
>>>>> transfer, or because the beginning of the buffer indeed needs to be aligned
>>>>> to the DMA cache line size on that system. In the latter case, the question
>>>>> is why the alignment to DWC2_USB_DMA_ALIGN (=4) works. In the former case,
>>>>> question would be why the realignment does any good in the first place.
>>>>>
>>>>> Any chance you can add some test code to help figuring out what exactly
>>>>> goes wrong ?
>>>>>
>>>>
>>>> Sure, I can try to help. Just let me know what code you would like to
>>>> insert and where and I'll see what I can do.
>>>>
>>>
>>> So I did some further research on this and it turns out that:
>>> - URB_NO_TRANSFER_DMA_MAP is not set on the offending transfers so
>>> the issue really is buffer alignment
>>> - DWC2_USB_DMA_ALIGN=4 "works" because in my limited testcase (usb
>>> 4g-dongle utilized via qmi-wwan) all transfers are unaligned. That is,
>>> every urb->transfer_buffer is misaligned by 2 bytes == unaligned
>>> - I can fix both issues and thus make the patch work on MIPS by
>>> modifying it like this:
>>>
>>> -#define DWC2_USB_DMA_ALIGN 4
>>> +#define DWC2_USB_DMA_ALIGN dma_get_cache_alignment()
>>>
>>> struct dma_aligned_buffer {
>>> void *old_xfer_buffer;
>>> - u8 data[0];
>>> + u8 data[0] ____cacheline_aligned;
>>> };
>>>
>>
>> Thanks for the additional testing. That means that the existing code
>> is already broken, or am I missing something ?
>>
>
> Yes, I believe the existing code is broken if 4 byte aligned transfers
> occur. I seem to have no easy way to prove that though as none of my
> devices generate those.
>
I did see this a lot when connecting USB drives. I'll re-test next week
and provide more details. I do have some concern because I also saw
transfers with URB_NO_TRANSFER_DMA_MAP set but transfer_buffer was
not DMA aligned. However, it may well be that the provided DMA buffer
_was_ aligned in that situation. I'll re-test that and let you know.
Thanks,
Guenter
Hi Guenter,
I found patches #2 and #3 fixed an issue for us at suse. Are you planning on
sending a v2? Do you mind if I give it a try?
Regards,
Nicolas
H Nicolas,
On 1/12/21 11:44 AM, Nicolas Saenz Julienne wrote:
> Hi Guenter,
> I found patches #2 and #3 fixed an issue for us at suse. Are you planning on
> sending a v2? Do you mind if I give it a try?
>
I don't plan a v2. The pandemic put a big hole into both my plans and my
ability to test the patch series. I don't plan to follow up on it,
sorry. Please feel free to do with it whatever you like.
Guenter