Message-ID: <55B24006.7020300@linux.intel.com>
Date: Fri, 24 Jul 2015 16:39:18 +0300
From: Mathias Nyman
To: arekm@maven.pl
CC: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
In-Reply-To: <55B228A4.2020604@linux.intel.com>

On 24.07.2015 14:59, Mathias Nyman wrote:
> On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
>>
>> On Tuesday 21 of July 2015, Mathias Nyman wrote:
>>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>>>>> Hi.
>>>>>
>>>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
>>>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
>>>>> happens in both cases) copying process hangs just early after start with:
>>>>
>>>> Looks like suspend & resume is enough. Reloading bluetooth firmware done
>>>> by kernel triggers problem:
>>>>
>>>> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>>>> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
>>>> [ 106.314079] Restarting tasks ... done.
>>>> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>>>> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
>>>> [ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
>>>
>
> Thanks for the logs. They show that the error is related to transfer descriptors that wrap around
> on the endpoint ring buffer by exactly one transfer block.
>
> I don't know yet why this happens, and I might need some help running additional debug
> patches to solve this. I'll take a more in-depth look at the code one more time first.
>

I think I found something. The recent ring segment size increase exposed an off-by-one
error that has been in the driver for a long time. You need to be unlucky, though, and have
your memory pages allocated in a specific order to trigger it.

The fix is small; it looks like this:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..77da8fe 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 		return 0;
 	/* offset in TRBs */
 	segment_offset = trb - seg->trbs;
-	if (segment_offset > TRBS_PER_SEGMENT)
+	if (segment_offset > TRBS_PER_SEGMENT - 1)
 		return 0;
 	return seg->dma + (segment_offset * sizeof(*trb));
 }

Patch attached, could you try it out?
Thanks
-Mathias

[Attachment: 0001-xhci-fix-off-by-one-error-in-TRB-DMA-address-boundar.patch]

From 10e909ee20846793e41973941b1367e2303ec313 Mon Sep 17 00:00:00 2001
From: Mathias Nyman
Date: Fri, 24 Jul 2015 15:56:23 +0300
Subject: [PATCH] xhci: fix off by one error in TRB DMA address boundary check

We need to check that a TRB is part of the current segment in use before
calculating its DMA address.

Previously a ring segment didn't use a full memory page, and every new
ring segment got a new memory page, so the off-by-one error in checking
the upper bound was never seen.

Now that we use a full memory page, 256 TRBs (4096 bytes), the off-by-one
causes issues, as it doesn't catch the case where a TRB is the first
element of the next segment.

This is triggered if the virtual memory pages for a ring segment are next
to each other in increasing order where the ring buffer wraps around, and
causes errors like:

[ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0

The trb-end address is one TRB (16 bytes) past the seg-end address.
Signed-off-by: Mathias Nyman
---
 drivers/usb/host/xhci-ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..77da8fe 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 		return 0;
 	/* offset in TRBs */
 	segment_offset = trb - seg->trbs;
-	if (segment_offset > TRBS_PER_SEGMENT)
+	if (segment_offset > TRBS_PER_SEGMENT - 1)
 		return 0;
 	return seg->dma + (segment_offset * sizeof(*trb));
 }
-- 
1.8.3.2