2015-07-18 14:49:43

by Arkadiusz Miskiewicz

[permalink] [raw]
Subject: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1


Hi.

I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file from
usb storage (sata disk behind sata-usb bridge or pendrive; hapens in both cases)
copying process hangs just early after start with:

[ 77.372137] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[ 77.388945] usb 2-1: New USB device found, idVendor=1f75, idProduct=0611
[ 77.388952] usb 2-1: New USB device strings: Mfr=4, Product=5, SerialNumber=6
[ 77.388956] usb 2-1: SerialNumber: 20130514
[ 77.402599] usb-storage 2-1:1.0: USB Mass Storage device detected
[ 77.403177] scsi host6: usb-storage 2-1:1.0
[ 77.403318] usbcore: registered new interface driver usb-storage
[ 77.407529] usbcore: registered new interface driver uas
[ 78.400954] scsi 6:0:0:0: scsi scan: INQUIRY result too short (5), using 36
[ 78.400961] scsi 6:0:0:0: Direct-Access Hitachi HDS723020BLA642 PQ: 0 ANSI: 0
[ 78.401401] sd 6:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 78.402126] sd 6:0:0:0: [sdb] Write Protect is off
[ 78.402130] sd 6:0:0:0: [sdb] Mode Sense: 3b 00 00 00
[ 78.402876] sd 6:0:0:0: [sdb] No Caching mode page found
[ 78.402882] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[ 78.444310] sd 6:0:0:0: [sdb] Attached SCSI disk
[ 85.907972] EXT4-fs (sdb): mounted filesystem with ordered data mode. Opts: (null)
[ 113.556376] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ 113.556383] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffa4000 trb-start 00000000fffa5fe0 trb-end 00000000fffa6000 seg-start 00000000fffa5000 seg-end 00000000fffa5ff0
[ 234.236311] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[ 234.890862] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ 234.890869] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fff94000 trb-start 00000000fff95fd0 trb-end 00000000fff96000 seg-start 00000000fff95000 seg-end 00000000fff95ff0
[ 355.339935] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[ 355.574012] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ 355.574018] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fff84000 trb-start 00000000fff85fb0 trb-end 00000000fff86000 seg-start 00000000fff85000 seg-end 00000000fff85ff0
[ 476.430339] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[ 476.728729] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ 476.728738] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fff74000 trb-start 00000000fff75fe0 trb-end 00000000fff76000 seg-start 00000000fff75000 seg-end 00000000fff75ff0

3.19.3 works fine
4.0.8 fails
4.2.0-rc2-00077-gf760b87 fails

There was some similar thread in march 2015 but the issue there got resolved
by reverting one usb patch, so my issue has to be different (that revert is included
in 4.0 final I think).

.config -> http://sprunge.us/bACC
lsusb -> http://sprunge.us/DOWb
dmesg -> http://sprunge.us/UbTF

machine is dell xps 15 9530

--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )


2015-07-20 20:13:35

by Arkadiusz Miskiewicz

[permalink] [raw]
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> Hi.
>
> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file
> from usb storage (sata disk behind sata-usb bridge or pendrive; hapens in
> both cases) copying process hangs just early after start with:

Looks like suspend & resume is enough. Reloading bluetooth firmware done by kernel
triggers problem:

[ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
[ 106.313280] PM: resume of devices complete after 3003.032 msecs
[ 106.314079] Restarting tasks ... done.
[ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
[ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
[ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3000 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.400396] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.400402] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3030 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.402225] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.402228] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3060 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.404401] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.404408] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3090 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.406229] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.406232] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd30c0 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.408389] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.408395] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd30f0 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.410291] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.410294] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3120 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.412427] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.412433] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3150 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.414315] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.414318] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3180 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0

[...]



http://sprunge.us/IDUh


--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

2015-07-21 14:41:58

by Mathias Nyman

[permalink] [raw]
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>> Hi.
>>
>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file
>> from usb storage (sata disk behind sata-usb bridge or pendrive; hapens in
>> both cases) copying process hangs just early after start with:
>
> Looks like suspend & resume is enough. Reloading bluetooth firmware done by kernel
> triggers problem:
>
> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
> [ 106.314079] Restarting tasks ... done.
> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
> [ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1

Looks like we get an event for a really old transfer for some reason, it should probably be handled already.

I got a patch that adds more paranoid checks for TRB cancel, which has been one major reasons
for the "Transfer event TRB DMA ptr not part of current TD" Errors. It also adds some logging
to show what's went wrong. (patch attached, applies on 4.2-rc3) Can you see if it helps?

If it doesn't, then adding xhci debugging could give us some clue:
echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

You said 3.19.3 works fine, but 4.0.8 and 4.2-rc2 fail, any chance you could
bisect it?

Thanks

-Mathias



Attachments:
0001-xhci-Don-t-touch-URB-TD-memory-if-they-are-no-longer.patch (5.30 kB)

2015-07-22 14:13:06

by Arkadiusz Miskiewicz

[permalink] [raw]
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1


[sorry, resend from different email - vger postmaster team has stupid filters
in place]

On Tuesday 21 of July 2015, Mathias Nyman wrote:
> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
> > On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> >> Hi.
> >>
> >> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
> >> file from usb storage (sata disk behind sata-usb bridge or pendrive;
> >> hapens in
> >
> >> both cases) copying process hangs just early after start with:
> > Looks like suspend & resume is enough. Reloading bluetooth firmware done
> > by kernel triggers problem:
> >
> > [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
> > [ 106.313280] PM: resume of devices complete after 3003.032 msecs
> > [ 106.314079] Restarting tasks ... done.
> > [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
> > [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
> > intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
> > 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
> > ep_index 0 comp_code 1
>
> Looks like we get an event for a really old transfer for some reason, it
> should probably be handled already.
>
> I got a patch that adds more paranoid checks for TRB cancel, which has been
> one major reasons for the "Transfer event TRB DMA ptr not part of current
> TD" Errors. It also adds some logging to show what's went wrong. (patch
> attached, applies on 4.2-rc3) Can you see if it helps?

It doesn't unfortunately. 4.2.0-rc3-00017-gd725e66 + that patch
-> dmesg http://sprunge.us/PDIE

around 91s - after resume from ram bluetooth driver reloads

around 754s - tried to copy data from external usb disk


> If it doesn't, then adding xhci debugging could give us some clue:
> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

Ok, http://sprunge.us/GiHX

mounted fs around 1347s and started copying; TRB problems were still there but
file was being copied (in bursts)

> You said 3.19.3 works fine, but 4.0.8 and 4.2-rc2 fail, any chance you
> could bisect it?

Unfortunately some kernels around 3.19 don't even boot (grub says it loads
initrd and no progress from that) - some commit fixed that but no idea which
one.

> Thanks
> -Mathias

--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

2015-07-24 11:56:07

by Mathias Nyman

[permalink] [raw]
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
>
> [sorry, resend from different email - vger postmaster team has stupid filters
> in place]
>
> On Tuesday 21 of July 2015, Mathias Nyman wrote:
>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>>>> Hi.
>>>>
>>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
>>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
>>>> hapens in
>>>
>>>> both cases) copying process hangs just early after start with:
>>> Looks like suspend & resume is enough. Reloading bluetooth firmware done
>>> by kernel triggers problem:
>>>
>>> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>>> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
>>> [ 106.314079] Restarting tasks ... done.
>>> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>>> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
>>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
>>> 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
>>> ep_index 0 comp_code 1
>>
>> Looks like we get an event for a really old transfer for some reason, it
>> should probably be handled already.
>>
>> I got a patch that adds more paranoid checks for TRB cancel, which has been
>> one major reasons for the "Transfer event TRB DMA ptr not part of current
>> TD" Errors. It also adds some logging to show what's went wrong. (patch
>> attached, applies on 4.2-rc3) Can you see if it helps?
>
> It doesn't unfortunately. 4.2.0-rc3-00017-gd725e66 + that patch
> -> dmesg http://sprunge.us/PDIE
>
> around 91s - after resume from ram bluetooth driver reloads
>
> around 754s - tried to copy data from external usb disk
>
>
>> If it doesn't, then adding xhci debugging could give us some clue:
>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
>
> Ok, http://sprunge.us/GiHX
>

Thanks for the logs, They show that the error is related to transfer descriptors that wrap around
on the endpoint ring buffer by exactly one transfer block.

I don't know yet why this happens, and I might need some help running additional debug
patches to solve this. I'll take a more in depth look at the code one more time first.

A short explanation of the error, mostly for myself:

To transfer data we have a ring buffer that holds transfer requests blocks (TRBs).
The ring buffer is made up of two segments,
the last TRB in each segment is a special link TRB that points to the next segment.

Segment A: 0x3000 - 0x3ff0 (where link TRB at 0x3ff0 points to Segment B, 0x4000)
Segment B: 0x4000 - 0x4ff0 (where link TRB at 0x4ff0 points back at Segment A, 0x3000)

A tranfer descriptor (TD) can consist of many TRBs, in this case three TRBs.
When a TD is completed we will get an event telling where the last transferred TRB of the TD was.

So in this case the error:
xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3000 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000
seg-start 00000000fffd4000 seg-end 00000000fffd4ff0

tells us the first TRB of the TD (trb-start) is at 4fd0,
The second TRB is at 4fe0,
Then we have the special link TRB be at 4ff0, pointing us back to the first segment.
The third and final TRB should be back at the first segment at 3000.

We get an event for the last TRB at 3000 and all should be fine, but driver claims the TDs TRBs start at 4fd0,
and the last TRB is at 5000, missing the link TRB wrapping us back to the first segment.


-Mathias








2015-07-24 13:35:52

by Mathias Nyman

[permalink] [raw]
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

On 24.07.2015 14:59, Mathias Nyman wrote:
> On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
>>
>> On Tuesday 21 of July 2015, Mathias Nyman wrote:
>>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>>>>> Hi.
>>>>>
>>>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
>>>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
>>>>> hapens in
>>>>
>>>>> both cases) copying process hangs just early after start with:
>>>> Looks like suspend & resume is enough. Reloading bluetooth firmware done
>>>> by kernel triggers problem:
>>>>
>>>> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>>>> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
>>>> [ 106.314079] Restarting tasks ... done.
>>>> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>>>> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
>>>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
>>>> 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
>>>> ep_index 0 comp_code 1
>>>
>
> Thanks for the logs, They show that the error is related to transfer descriptors that wrap around
> on the endpoint ring buffer by exactly one transfer block.
>
> I don't know yet why this happens, and I might need some help running additional debug
> patches to solve this. I'll take a more in depth look at the code one more time first.
>

I think I found something, The recent ring segment size increase exposed an off by one
error that has been in the driver for a long time. But you need to be unlucky and have
your memory pages allocated in a specific order to trigger it.

small fix, looks like this:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..77da8fe 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
return 0;
/* offset in TRBs */
segment_offset = trb - seg->trbs;
- if (segment_offset > TRBS_PER_SEGMENT)
+ if (segment_offset > TRBS_PER_SEGMENT - 1)
return 0;
return seg->dma + (segment_offset * sizeof(*trb));
}


Patch attached, could you try it out?

Thanks
-Mathias


Attachments:
0001-xhci-fix-off-by-one-error-in-TRB-DMA-address-boundar.patch (1.77 kB)

2015-07-24 13:53:39

by Peter Stuge

[permalink] [raw]
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

Mathias Nyman wrote:
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
> return 0;
> /* offset in TRBs */
> segment_offset = trb - seg->trbs;
> - if (segment_offset > TRBS_PER_SEGMENT)
> + if (segment_offset > TRBS_PER_SEGMENT - 1)

Maybe change the > comparison to >= rather than add the extra "- 1"?


//Peter

2015-07-24 14:34:18

by Mathias Nyman

[permalink] [raw]
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

On 24.07.2015 16:53, Peter Stuge wrote:
> Mathias Nyman wrote:
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
>> return 0;
>> /* offset in TRBs */
>> segment_offset = trb - seg->trbs;
>> - if (segment_offset > TRBS_PER_SEGMENT)
>> + if (segment_offset > TRBS_PER_SEGMENT - 1)
>
> Maybe change the > comparison to >= rather than add the extra "- 1"?

Yes, sure, I'll change it if this really was the cause.

Just happy to finally find a probable cause after staring at logs and code for some time

-Mathias


2015-07-24 15:33:26

by Arkadiusz Miskiewicz

[permalink] [raw]
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

On Friday 24 of July 2015, Mathias Nyman wrote:
> On 24.07.2015 14:59, Mathias Nyman wrote:
> > On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
> >> On Tuesday 21 of July 2015, Mathias Nyman wrote:
> >>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
> >>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> >>>>> Hi.
> >>>>>
> >>>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
> >>>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
> >>>>> hapens in
> >>>>
> >>>>> both cases) copying process hangs just early after start with:
> >>>> Looks like suspend & resume is enough. Reloading bluetooth firmware
> >>>> done by kernel triggers problem:
> >>>>
> >>>> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
> >>>> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
> >>>> [ 106.314079] Restarting tasks ... done.
> >>>> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
> >>>> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
> >>>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
> >>>> 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
> >>>> ep_index 0 comp_code 1
> >
> > Thanks for the logs, They show that the error is related to transfer
> > descriptors that wrap around on the endpoint ring buffer by exactly one
> > transfer block.
> >
> > I don't know yet why this happens, and I might need some help running
> > additional debug patches to solve this. I'll take a more in depth look
> > at the code one more time first.
>
> I think I found something, The recent ring segment size increase exposed an
> off by one error that has been in the driver for a long time. But you need
> to be unlucky and have your memory pages allocated in a specific order to
> trigger it.
>
> small fix, looks like this:
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 94416ff..77da8fe 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
> return 0;
> /* offset in TRBs */
> segment_offset = trb - seg->trbs;
> - if (segment_offset > TRBS_PER_SEGMENT)
> + if (segment_offset > TRBS_PER_SEGMENT - 1)
> return 0;
> return seg->dma + (segment_offset * sizeof(*trb));
> }
>
>
> Patch attached, could you try it out?

Works fine with this patch, so:

Tested-by: Arkadiusz Miśkiewicz <[email protected]>

Thanks!

ps. please push to stable@, too

>
> Thanks
> -Mathias

--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

2015-08-03 13:08:50

by Mathias Nyman

[permalink] [raw]
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

On 24.07.2015 18:33, Arkadiusz Miskiewicz wrote:
> On Friday 24 of July 2015, Mathias Nyman wrote:
>> On 24.07.2015 14:59, Mathias Nyman wrote:
>>> On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
>>>> On Tuesday 21 of July 2015, Mathias Nyman wrote:
>>>>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>>>>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
>>>>>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
>>>>>>> hapens in
>>>>>>
>>>>>>> both cases) copying process hangs just early after start with:
>>>>>> Looks like suspend & resume is enough. Reloading bluetooth firmware
>>>>>> done by kernel triggers problem:
>>>>>>
>>>>>> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>>>>>> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
>>>>>> [ 106.314079] Restarting tasks ... done.
>>>>>> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>>>>>> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
>>>>>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
>>>>>> 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
>>>>>> ep_index 0 comp_code 1
>>>
>>> Thanks for the logs, They show that the error is related to transfer
>>> descriptors that wrap around on the endpoint ring buffer by exactly one
>>> transfer block.
>>>
>>> I don't know yet why this happens, and I might need some help running
>>> additional debug patches to solve this. I'll take a more in depth look
>>> at the code one more time first.
>>
>> I think I found something, The recent ring segment size increase exposed an
>> off by one error that has been in the driver for a long time. But you need
>> to be unlucky and have your memory pages allocated in a specific order to
>> trigger it.
>>
>> small fix, looks like this:
>>
>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>> index 94416ff..77da8fe 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
>> return 0;
>> /* offset in TRBs */
>> segment_offset = trb - seg->trbs;
>> - if (segment_offset > TRBS_PER_SEGMENT)
>> + if (segment_offset > TRBS_PER_SEGMENT - 1)
>> return 0;
>> return seg->dma + (segment_offset * sizeof(*trb));
>> }
>>
>>
>> Patch attached, could you try it out?
>
> Works fine with this patch, so:
>
> Tested-by: Arkadiusz Miśkiewicz <[email protected]>
>
> Thanks!
>
> ps. please push to stable@, too
>

Patch sent forward, added Tested-by and stable tags

Thanks
- Mathias