Received: by 10.192.165.148 with SMTP id m20csp883400imm; Wed, 2 May 2018 10:16:37 -0700 (PDT) X-Google-Smtp-Source: AB8JxZrhUEfURrTJy6xHZJArxGMP2VRUB5qGz97erifcGf8nK2hUa5tduoGXer2xiYvm/bQvwXPF X-Received: by 2002:a17:902:d687:: with SMTP id v7-v6mr20800994ply.201.1525281397700; Wed, 02 May 2018 10:16:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525281397; cv=none; d=google.com; s=arc-20160816; b=Zt/bq9FTqxuBfrBX8gm7ktqa/DJRQG5Kyi3dtpzhPIrUivaAvNWGzU/UmFi7vi+hE7 8aBMCH2Ltmm68g9FCXHSimoYB/2ZPCc+Sqjl1GAv6Clh6oGh5VNuPJPCug2Pnm1jwGTm BEe8EubAiZ2VQKvNNnUA6GSoPJXKMwMqOHwlPA48SNue0BcwH6MZr2ENArzIKIAx9/J2 8swf+XdCIsyPL2dUyTB+kfC5H9kaSIshxcr0G/Vbl7L6pzBxjIEQZuhHnxstO+oSp4Ca zZVbvZNUTDt6/INFbHuCyPfNMXmIhe6aELUVUYJwTp109vMB9TBErQ+AMAGStWgVtyzy 3CBA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:arc-authentication-results; bh=MwTWoTdpB22p/u/JTPxxrUq2THG4F9G7LJtCS0x8INs=; b=PCAPiTrWr9JreE9TXBoadVQOc9ZnGYcsgxxfVUstTaK+2bvNnBhWNxUzo8SIs/dZgH RZkM5jWyDfPmW9I9+uYxTQJY6iotdwqfjsYeTAZypkZMWWLXGx5vTMJSNZlMuEvxhVMo MoyMONuRtbAAxFwhZ0RWvmXsu4sGGNOo5jqF8vJ9Beis5q2z8bU/AXNXJmvzoPgOyNKi 4lfZMiTexAf5b3FW52Zfq6YaUUE7rFHA+eGhtZpdCHmRR9gBk4tYJ2VQBKz1u8JslEx/ n/XH5VdShEloahiB9OnicLSUPeBGU7eHHB0/X44mvfZfMUkoEuTe19vFenO+rSYNytVx +y4g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z188si11860099pfz.336.2018.05.02.10.16.22; Wed, 02 May 2018 10:16:37 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751602AbeEBRPF (ORCPT + 99 others); Wed, 2 May 2018 13:15:05 -0400 Received: from regular1.263xmail.com ([211.150.99.131]:42719 "EHLO regular1.263xmail.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751020AbeEBRPB (ORCPT ); Wed, 2 May 2018 13:15:01 -0400 Received: from wulf?rock-chips.com (unknown [192.168.167.12]) by regular1.263xmail.com (Postfix) with ESMTP id AB1D45E08; Thu, 3 May 2018 01:14:45 +0800 (CST) X-263anti-spam: KSV:0; X-MAIL-GRAY: 0 X-MAIL-DELIVERY: 1 X-KSVirus-check: 0 X-ABS-CHECKED: 4 Received: from [192.168.1.3] (localhost [127.0.0.1]) by smtp.263.net (Postfix) with ESMTPA id 91AA2375; Thu, 3 May 2018 01:14:44 +0800 (CST) X-RL-SENDER: wulf@rock-chips.com X-FST-TO: stantsui@aopen.com X-SENDER-IP: 111.146.46.34 X-LOGIN-NAME: wulf@rock-chips.com X-UNIQUE-TAG: <8f323061f4d6e51c7f94436961f0d04f> X-ATTACHMENT-NUM: 0 X-SENDER: wulf@rock-chips.com X-DNS-TYPE: 0 Received: from [192.168.1.3] (unknown [111.146.46.34]) by smtp.263.net (Postfix) whith ESMTP id 22463NBLHQ2; Thu, 03 May 2018 01:14:46 +0800 (CST) Subject: Re: [PATCH v2 1/2] usb: dwc2: alloc dma aligned buffer for isoc split in To: Doug Anderson , William Wu Cc: hminas@synopsys.com, felipe.balbi@linux.intel.com, Greg Kroah-Hartman , Sergei Shtylyov , =?UTF-8?Q?Heiko_St=c3=bcbner?= , LKML , linux-usb@vger.kernel.org, "open list:ARM/Rockchip SoC..." , Frank Wang , =?UTF-8?B?6buE5rab?= , "daniel.meng" , John Youn , =?UTF-8?B?546L5b6B5aKe?= , zsq@rock-chips.com, Allen.Hsu@quantatw.com, StanTsui@aopen.com References: <1525230288-6014-1-git-send-email-william.wu@rock-chips.com> <1525230288-6014-2-git-send-email-william.wu@rock-chips.com> From: wlf Message-ID: <8ce56264-cdd1-1a12-c642-79224c65f639@rock-chips.com> Date: Thu, 3 May 2018 01:14:43 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Dear Doug, 在 2018年05月02日 12:33, Doug Anderson 写道: > Hi, > > On Tue, May 1, 2018 at 8:04 PM, William Wu wrote: >> The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in >> a more supported way") rips out a lot of code to simply the >> allocation of aligned DMA. However, it also introduces a new >> issue when use isoc split in transfer. >> >> In my test case, I connect the dwc2 controller with an usb hs >> Hub (GL852G-12), and plug an usb fs audio device (Plantronics >> headset) into the downstream port of Hub. Then use the usb mic >> to record, we can find noise when playback. >> >> It's because that the usb Hub uses an MDATA for the first >> transaction and a DATA0 for the second transaction for the isoc >> split in transaction. An typical isoc split in transaction sequence >> like this: >> >> - SSPLIT IN transaction >> - CSPLIT IN transaction >> - MDATA packet >> - CSPLIT IN transaction >> - DATA0 packet >> >> The DMA address of MDATA (urb->dma) is always DWORD-aligned, but >> the DMA address of DATA0 (urb->dma + qtd->isoc_split_offset) may >> not be DWORD-aligned, it depends on the qtd->isoc_split_offset (the >> length of MDATA). In my test case, the length of MDATA is usually >> unaligned, this casue DATA0 packet transmission error. >> >> This patch base on the old way of aligned DMA allocation in the >> dwc2 driver to get aligned DMA for isoc split in. >> >> Signed-off-by: William Wu >> --- >> Changes in v2: >> - None >> >> drivers/usb/dwc2/hcd.c | 63 +++++++++++++++++++++++++++++++++++++++++--- >> drivers/usb/dwc2/hcd.h | 10 +++++++ >> drivers/usb/dwc2/hcd_intr.c | 8 ++++++ >> drivers/usb/dwc2/hcd_queue.c | 8 +++++- >> 4 files changed, 85 insertions(+), 4 deletions(-) > A word of warning that I'm pretty rusty on dwc2 and even when I was > making lots of patches I still considered myself a bit clueless. > ...so if something seems wrong, please call me on it... Most of your suggestions are great, thanks very much! > >> diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c >> index 190f959..8c2b35f 100644 >> --- a/drivers/usb/dwc2/hcd.c >> +++ b/drivers/usb/dwc2/hcd.c >> @@ -1562,11 +1562,20 @@ static void dwc2_hc_start_transfer(struct dwc2_hsotg *hsotg, >> } >> >> if (hsotg->params.host_dma) { >> - dwc2_writel((u32)chan->xfer_dma, >> - hsotg->regs + HCDMA(chan->hc_num)); >> + dma_addr_t dma_addr; >> + >> + if (chan->align_buf) { >> + if (dbg_hc(chan)) >> + dev_vdbg(hsotg->dev, "align_buf\n"); >> + dma_addr = chan->align_buf; >> + } else { >> + dma_addr = chan->xfer_dma; >> + } >> + dwc2_writel((u32)dma_addr, hsotg->regs + HCDMA(chan->hc_num)); >> + >> if (dbg_hc(chan)) >> dev_vdbg(hsotg->dev, "Wrote %08lx to HCDMA(%d)\n", >> - (unsigned long)chan->xfer_dma, chan->hc_num); >> + (unsigned long)dma_addr, chan->hc_num); >> } >> >> /* Start the split */ >> @@ -2620,6 +2629,33 @@ static void dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg, >> } >> } >> >> +static int dwc2_alloc_split_dma_aligned_buf(struct dwc2_hsotg *hsotg, >> + struct dwc2_qh *qh, >> + struct dwc2_host_chan *chan) >> +{ >> + if (!qh->dw_align_buf) { >> + qh->dw_align_buf = kmalloc(chan->max_packet, > So you're potentially doing a bounce buffer atop a bounce buffer now, > right? That seems pretty non-optimal. You're also back to doing a > kmalloc at interrupt time which I found was pretty bad for > performance. Yes, I just allocate an additional bounce buffer here. I haven't thought consummately about this patch, it's really not a good way to use a kmalloc at interrut time. > Is there really no way you can do your memory allocation in > dwc2_alloc_dma_aligned_buffer() instead of here? For input packets, > if you could just allocate an extra 3 bytes in the original bounce > buffer you could just re-use the original bounce buffer together with > a memmove? AKA: > > transfersize = 13 + 64; > buf = alloc(16 + 64); > > // Do the first transfer, no problems. > dma_into(buf, 13); > > // 2nd transfer isn't aligned, so align. > // we allocated a little extra to account for this > dma_into(buf + 16, 64); > > // move back where it belongs. > memmove(buf + 13, buf + 16, 64); > > > To make the above work you'd need to still allocate a bounce buffer > even if the original "urb->transfer_buffer" was already aligned. > Ideally you'd be able to know when dwc2_alloc_dma_aligned_buffer() > that this is one of the special cases where you need a slightly > oversized bounce buffer. It's a good way to allocate an extra 3 bytes in the original bounce buffer for this unaligned issue, it's similar to the tailroom of sk_buff. However, just as you said, we'd better find the special cases where we need an oversized bounce buffer, otherwise,we need to allocate a bounce buffer for all of urbs. It's hard for me to know the special cases in the dwc2_alloc_dma_aligned_buffer(), because it's called from usb_submit_urb() in the device class driver, and I hardly know the split state in this process, much less if the split transaction need aligned buffer. Do you have any idea? I suppose that we can't find the special cases where we need an oversized bounce buffer in the dwc2_alloc_dma_aligned_buffer(), and if we still want to re-use the original bounce buffer with extra 3 bytes, then we need to allocate a  bounce buffer for all of urbs, and do unnecessary  data copy for these urbs  whose transfer_buffer were already aligned.  This may reduce the transmission rate of USB. Can we just pre-allocate an additional aligned buffer (the size is 200 bytes) for split transaction in dwc2_map_urb_for_dma for all of urbs. And if we find the split transaction is unaligned, we can easily use the pre-allocated aligned buffer. > > --- > > If you somehow need to do something for output, you'd do the opposite. > You'd copy backwards top already transferred data to align. > > >> + GFP_ATOMIC | GFP_DMA); >> + if (!qh->dw_align_buf) >> + return -ENOMEM; >> + >> + qh->dw_align_buf_size = chan->max_packet; >> + } >> + >> + qh->dw_align_buf_dma = dma_map_single(hsotg->dev, qh->dw_align_buf, >> + qh->dw_align_buf_size, >> + DMA_FROM_DEVICE); >> + >> + if (dma_mapping_error(hsotg->dev, qh->dw_align_buf_dma)) { >> + dev_err(hsotg->dev, "can't map align_buf\n"); >> + chan->align_buf = 0; >> + return -EINVAL; >> + } >> + >> + chan->align_buf = qh->dw_align_buf_dma; >> + return 0; >> +} >> + >> #define DWC2_USB_DMA_ALIGN 4 >> >> struct dma_aligned_buffer { >> @@ -2797,6 +2833,27 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh) >> /* Set the transfer attributes */ >> dwc2_hc_init_xfer(hsotg, chan, qtd); >> >> + /* For non-dword aligned buffers */ >> + if (hsotg->params.host_dma > 0 && qh->do_split && >> + chan->ep_is_in && (chan->xfer_dma & 0x3)) { > So what happens if were unaligned (AKA (chan->xfer_dma & 0x3)) but > we're not doing split or it's not an IN EP? Do we just fail then? > > I guess the rest of this patch only handles the "in" case and maybe > you expect that the problems will only come about for do_split, but it > still might be wise to at least print a warning in the other cases? > >From reading dwc2_hc_init_xfer() it seems like you could run into this > same problem in the "out" case? Actually, I only find non-dword aligned issue in the case of split in transaction. And I think that if we're not doing split or it's an OUT EP, we can always get aligned buffer in the current code. For non-split case, the dwc2_alloc_dma_aligned_buffer() is enough. And for split out case, if the transaction is subdivided into multiple start-splits, each with a data payload of 188 bytes or less, so the DMA address is always aligned. > > >> + dev_vdbg(hsotg->dev, "Non-aligned buffer\n"); >> + if (dwc2_alloc_split_dma_aligned_buf(hsotg, qh, chan)) { >> + dev_err(hsotg->dev, >> + "%s: Failed to allocate memory to handle non-dword aligned buffer\n", >> + __func__); > Here and elsewhere in this patch: In general I think policy is that > you shouldn't include __func__ for dev_err(). The string together > with the name of the device is supposed to uniquely determine where > you are. Ah, I got it. I will fix it in next patch. >> + /* Add channel back to free list */ >> + chan->align_buf = 0; >> + chan->multi_count = 0; >> + list_add_tail(&chan->hc_list_entry, >> + &hsotg->free_hc_list); >> + qtd->in_process = 0; >> + qh->channel = NULL; >> + return -ENOMEM; >> + } >> + } else { >> + chan->align_buf = 0; >> + } >> + >> if (chan->ep_type == USB_ENDPOINT_XFER_INT || >> chan->ep_type == USB_ENDPOINT_XFER_ISOC) >> /* >> diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h >> index 96a9da5..adf1fd0 100644 >> --- a/drivers/usb/dwc2/hcd.h >> +++ b/drivers/usb/dwc2/hcd.h >> @@ -76,6 +76,8 @@ struct dwc2_qh; >> * (micro)frame >> * @xfer_buf: Pointer to current transfer buffer position >> * @xfer_dma: DMA address of xfer_buf >> + * @align_buf: In Buffer DMA mode this will be used if xfer_buf is not >> + * DWORD aligned > Your patch uses tabs but everything else in this comment uses spaces. > Please be consistent with surrounding code. It looks like I may have > screwed this up in the past on the comment of "struct dwc2_qh", but > that's no reason to mess it up again... Ah, I got it. I will fix it in next patch. > >> * @xfer_len: Total number of bytes to transfer >> * @xfer_count: Number of bytes transferred so far >> * @start_pkt_count: Packet count at start of transfer >> @@ -133,6 +135,7 @@ struct dwc2_host_chan { >> >> u8 *xfer_buf; >> dma_addr_t xfer_dma; >> + dma_addr_t align_buf; >> u32 xfer_len; >> u32 xfer_count; >> u16 start_pkt_count; >> @@ -303,6 +306,10 @@ struct dwc2_hs_transfer_time { >> * is tightly packed. >> * @ls_duration_us: Duration on the low speed bus schedule. >> * @ntd: Actual number of transfer descriptors in a list >> + * @dw_align_buf: Used instead of original buffer if its physical address >> + * is not dword-aligned >> + * @dw_align_buf_size: Size of dw_align_buf >> + * @dw_align_buf_dma: DMA address for dw_align_buf >> * @qtd_list: List of QTDs for this QH >> * @channel: Host channel currently processing transfers for this QH >> * @qh_list_entry: Entry for QH in either the periodic or non-periodic >> @@ -350,6 +357,9 @@ struct dwc2_qh { >> struct dwc2_hs_transfer_time hs_transfers[DWC2_HS_SCHEDULE_UFRAMES]; >> u32 ls_start_schedule_slice; >> u16 ntd; >> + u8 *dw_align_buf; >> + int dw_align_buf_size; >> + dma_addr_t dw_align_buf_dma; >> struct list_head qtd_list; >> struct dwc2_host_chan *channel; >> struct list_head qh_list_entry; >> diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c >> index a5dfd9d..5e2378f 100644 >> --- a/drivers/usb/dwc2/hcd_intr.c >> +++ b/drivers/usb/dwc2/hcd_intr.c >> @@ -938,6 +938,14 @@ static int dwc2_xfercomp_isoc_split_in(struct dwc2_hsotg *hsotg, >> >> frame_desc->actual_length += len; >> >> + if (chan->align_buf) { >> + dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__); >> + dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma, >> + chan->qh->dw_align_buf_size, DMA_FROM_DEVICE); >> + memcpy(qtd->urb->buf + frame_desc->offset + >> + qtd->isoc_split_offset, chan->qh->dw_align_buf, len); > Assuming I'm understanding this patch correctly, I think it would be > better to write: > > memcpy(qtd->xfer_dma, chan->qh->dw_align_buf, len); Sorry, there's no "xfer_buf" in qtd, do you means the "chan->xfer_dma"? If it's, I think we can't do memcpy from a transfer buffer to a DMA address. Maybe chan->xfer_buf is more suitable, but it seems that the dwc2 driver doesn't update the chan->xfer_buf for isoc transfer with dma enabled in dwc2_hc_init_xfer(). > > Then if you ever end up having to align a transfer other than a split > you won't be doing the wrong math. As it is it's very non-obvious > that you're hardcoding the same formula that's in dwc2_hc_init_xfer() Actually, I'm hardcoding the same formula from the old code which has been ripped out in the commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way"). > > >> + } >> + >> qtd->isoc_split_offset += len; >> >> hctsiz = dwc2_readl(hsotg->regs + HCTSIZ(chnum)); >> diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c >> index e34ad5e..a120bb0 100644 >> --- a/drivers/usb/dwc2/hcd_queue.c >> +++ b/drivers/usb/dwc2/hcd_queue.c >> @@ -1693,8 +1693,14 @@ void dwc2_hcd_qh_free(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh) >> >> dwc2_host_put_tt_info(hsotg, qh->dwc_tt); >> >> - if (qh->desc_list) >> + if (qh->desc_list) { >> dwc2_hcd_qh_free_ddma(hsotg, qh); >> + } else { >> + /* kfree(NULL) is safe */ >> + kfree(qh->dw_align_buf); >> + qh->dw_align_buf_dma = (dma_addr_t)0; > Why assign qh->dw_align_buf_dma to 0? The next thing you're doing is > kfree(qh). If you want extra debugging, turn on slub_debug. I just copy the same code from the old code which has been ripped out in the commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way"). I really don't know why assign qh->dw_align_buf_dma to 0. > > >> + } >> + >> kfree(qh); >> } > >