Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp244138pxj; Thu, 17 Jun 2021 01:33:27 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyE0wSXjYGFH/HCASeUxLxWRx1rdX57YZjtVpzb8GvE0DubuqFlwwuzS3iMxKBG8yINstDY X-Received: by 2002:a05:6402:22d0:: with SMTP id dm16mr4952254edb.123.1623918807228; Thu, 17 Jun 2021 01:33:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623918807; cv=none; d=google.com; s=arc-20160816; b=KPRTOimwqVB5ogCMMq0En7F26naR8ZNQ3yeW4MzOHSIDN65ISzTkvP44hrK3lXjgmC Ra6Nv4XsFyBOIMSZABt2JfqfPycdy7s/+CIzqx1jZBHK262/5hlr2HDva8q6DwQQj+7A DMPBAFGzduvOu6XC/vKjqOa/nkyEDktVEovkF135jGkfZY4VDBxcw8am/FTpbPcYdMyM LY6WkSoUEwNOBLo4YbLScSbj/0wh6ay5+ebMorhiHriuh/1tMwq8077ttJKJpyPqqtiw YI8AmGLizTSxURDu8490MeKJJ7ixJmBVODonDNzPtxgY7QZAlblTSeKcn/AnzKUePKV1 hERA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dmarc-filter:sender:dkim-signature; bh=HaOc5QLBux2ks4tPU9cAdUn1nI4ZODOvu6G6gQcGFVk=; b=SQ/OxtIdoggUxsK9dx/mJkQ0ljclWz31x5fLxQQBKWnM0NxqpP7pZjBuuFlBnCQ+Hj x3zfy96trzd46tVMRgOYCL/fI074Ybs3IDKJ3n2wFKl+vQa8j0Fq5oBlcAEgsdpGAajs v4lWOVFAf03GE3lvPOwYzAUOfP0UXaYgl1o+/Z8sqBZNPT+VFvvAOt96A33XSMaswxcj IMZBAeKEHiU4vAx8bwBYtpv1K8cVNnrjsHwDq7RnSbiYyEmbrpzONU6vvNS5JoTPOh/P o0c9ovPVj/HezsoQEN793tUSOpDtUuXk0g6ECfiOWIHIF4iLDZ9NKOAOdeA3cA7dzUPS dxFA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=tTKNbGlo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id e17si4301283ejr.626.2021.06.17.01.33.04; Thu, 17 Jun 2021 01:33:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=tTKNbGlo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231148AbhFQIcj (ORCPT + 99 others); Thu, 17 Jun 2021 04:32:39 -0400 Received: from m43-7.mailgun.net ([69.72.43.7]:54243 "EHLO m43-7.mailgun.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230473AbhFQIcf (ORCPT ); Thu, 17 Jun 2021 04:32:35 -0400 DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=mg.codeaurora.org; q=dns/txt; s=smtp; t=1623918628; h=Content-Transfer-Encoding: Content-Type: In-Reply-To: MIME-Version: Date: Message-ID: From: References: Cc: To: Subject: Sender; bh=HaOc5QLBux2ks4tPU9cAdUn1nI4ZODOvu6G6gQcGFVk=; b=tTKNbGlooyMCvTC9jenNYzUJWlWH4ZW6QO7Cdpw61h25kfUw1fuSl/9t2yoxg5EFWVh4D/RC /WSUNJQi6M3lLD1YR158OLMFEdxoaGTDmO9Si8BJMnbURsr9QAtIRM244McpaTUrWn3nbPnW qSn0tIVWt6y/CKpEKp3+24R+w/A= X-Mailgun-Sending-Ip: 69.72.43.7 X-Mailgun-Sid: WyI0MWYwYSIsICJsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnIiwgImJlOWU0YSJd Received: from smtp.codeaurora.org (ec2-35-166-182-171.us-west-2.compute.amazonaws.com [35.166.182.171]) by smtp-out-n07.prod.us-east-1.postgun.com with SMTP id 60cb081de27c0cc77f4805d3 (version=TLS1.2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256); Thu, 17 Jun 2021 08:30:21 GMT Sender: wcheng=codeaurora.org@mg.codeaurora.org Received: by smtp.codeaurora.org (Postfix, from userid 1001) id 3189EC4338A; Thu, 17 Jun 2021 08:30:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-caf-mail-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.2 required=2.0 tests=ALL_TRUSTED,BAYES_00, NICE_REPLY_A,SPF_FAIL autolearn=no autolearn_force=no version=3.4.0 Received: from [10.110.86.111] (i-global254.qualcomm.com [199.106.103.254]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: wcheng) by smtp.codeaurora.org (Postfix) with ESMTPSA id 23FF9C433D3; Thu, 17 Jun 2021 08:30:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 23FF9C433D3 Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; spf=fail smtp.mailfrom=wcheng@codeaurora.org Subject: Re: [PATCH v9 0/5] Re-introduce TX FIFO resize for larger EP bursting To: Ferry Toth , Andy Shevchenko , Heikki Krogerus Cc: Felipe Balbi , Andy Shevchenko , Greg KH , Andy Gross , Bjorn Andersson , Rob Herring , USB , Linux Kernel Mailing List , linux-arm-msm@vger.kernel.org, devicetree , Jack Pham , Thinh Nguyen , John Youn References: <1621410561-32762-1-git-send-email-wcheng@codeaurora.org> <87k0n9btnb.fsf@kernel.org> <874ke62i0v.fsf@kernel.org> <8735to29tt.fsf@kernel.org> <87tum4zhc9.fsf@kernel.org> <33c6cecf-3dce-8c4f-2be2-55cc3c6c6830@gmail.com> <098b2211-c3cf-e4c3-c0bd-9b4f8253389b@gmail.com> <0e9373ec-931b-f96a-b2c9-dbd532a823a6@codeaurora.org> From: Wesley Cheng Message-ID: <76cb9339-c558-d11b-2cbd-41688c30ba70@codeaurora.org> Date: Thu, 17 Jun 2021 01:30:14 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 6/17/2021 12:47 AM, Ferry Toth wrote: > Hi > > Op 17-06-2021 om 06:25 schreef Wesley Cheng: >> On 6/15/2021 12:53 PM, Ferry Toth wrote: >>> Hi >>> >>> Op 15-06-2021 om 06:22 schreef Wesley Cheng: >>>> On 6/14/2021 12:30 PM, Ferry Toth wrote: >>>>> Op 14-06-2021 om 20:58 schreef Wesley Cheng: >>>>>> On 6/12/2021 2:27 PM, Ferry Toth wrote: >>>>>>> Hi >>>>>>> >>>>>>> Op 11-06-2021 om 15:21 schreef Andy Shevchenko: >>>>>>>> On Fri, Jun 11, 2021 at 4:14 PM Heikki Krogerus >>>>>>>> wrote: >>>>>>>>> On Fri, Jun 11, 2021 at 04:00:38PM +0300, Felipe Balbi wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Wesley Cheng writes: >>>>>>>>>>>>>>>>> to be honest, I don't think these should go in (apart from >>>>>>>>>>>>>>>>> the build >>>>>>>>>>>>>>>>> failure) because it's likely to break instantiations of the >>>>>>>>>>>>>>>>> core with >>>>>>>>>>>>>>>>> differing FIFO sizes. Some instantiations even have some >>>>>>>>>>>>>>>>> endpoints with >>>>>>>>>>>>>>>>> dedicated functionality that requires the default FIFO size >>>>>>>>>>>>>>>>> configured >>>>>>>>>>>>>>>>> during coreConsultant instantiation. I know of at OMAP5 and >>>>>>>>>>>>>>>>> some Intel >>>>>>>>>>>>>>>>> implementations which have dedicated endpoints for processor >>>>>>>>>>>>>>>>> tracing. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> With OMAP5, these endpoints are configured at the top of the >>>>>>>>>>>>>>>>> available >>>>>>>>>>>>>>>>> endpoints, which means that if a gadget driver gets loaded >>>>>>>>>>>>>>>>> and takes >>>>>>>>>>>>>>>>> over most of the FIFO space because of this resizing, >>>>>>>>>>>>>>>>> processor tracing >>>>>>>>>>>>>>>>> will have a hard time running. That being said, processor >>>>>>>>>>>>>>>>> tracing isn't >>>>>>>>>>>>>>>>> supported in upstream at this moment. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I agree that the application of this logic may differ between >>>>>>>>>>>>>>> vendors, >>>>>>>>>>>>>>> hence why I wanted to keep this controllable by the DT >>>>>>>>>>>>>>> property, so that >>>>>>>>>>>>>>> for those which do not support this use case can leave it >>>>>>>>>>>>>>> disabled.  The >>>>>>>>>>>>>>> logic is there to ensure that for a given USB configuration, >>>>>>>>>>>>>>> for each EP >>>>>>>>>>>>>>> it would have at least 1 TX FIFO.  For USB configurations >>>>>>>>>>>>>>> which >>>>>>>>>>>>>>> don't >>>>>>>>>>>>>>> utilize all available IN EPs, it would allow re-allocation of >>>>>>>>>>>>>>> internal >>>>>>>>>>>>>>> memory to EPs which will actually be in use. >>>>>>>>>>>>>> The feature ends up being all-or-nothing, then :-) It sounds >>>>>>>>>>>>>> like we can >>>>>>>>>>>>>> be a little nicer in this regard. >>>>>>>>>>>>>> >>>>>>>>>>>>> Don't get me wrong, I think once those features become available >>>>>>>>>>>>> upstream, we can improve the logic.  From what I remember when >>>>>>>>>>>>> looking >>>>>>>>>>>> sure, I support that. But I want to make sure the first cut isn't >>>>>>>>>>>> likely >>>>>>>>>>>> to break things left and right :) >>>>>>>>>>>> >>>>>>>>>>>> Hence, let's at least get more testing. >>>>>>>>>>>> >>>>>>>>>>> Sure, I'd hope that the other users of DWC3 will also see some >>>>>>>>>>> pretty >>>>>>>>>>> big improvements on the TX path with this. >>>>>>>>>> fingers crossed >>>>>>>>>> >>>>>>>>>>>>> at Andy Shevchenko's Github, the Intel tracer downstream changes >>>>>>>>>>>>> were >>>>>>>>>>>>> just to remove physical EP1 and 2 from the DWC3 endpoint list. >>>>>>>>>>>>> If that >>>>>>>>>>>> right, that's the reason why we introduced the endpoint feature >>>>>>>>>>>> flags. The end goal was that the UDC would be able to have custom >>>>>>>>>>>> feature flags paired with ->validate_endpoint() or whatever >>>>>>>>>>>> before >>>>>>>>>>>> allowing it to be enabled. Then the UDC driver could tell UDC >>>>>>>>>>>> core to >>>>>>>>>>>> skip that endpoint on that particular platform without >>>>>>>>>>>> interefering with >>>>>>>>>>>> everything else. >>>>>>>>>>>> >>>>>>>>>>>> Of course, we still need to figure out a way to abstract the >>>>>>>>>>>> different >>>>>>>>>>>> dwc3 instantiations. >>>>>>>>>>>> >>>>>>>>>>>>> was the change which ended up upstream for the Intel tracer >>>>>>>>>>>>> then we >>>>>>>>>>>>> could improve the logic to avoid re-sizing those particular EPs. >>>>>>>>>>>> The problem then, just as I mentioned in the previous paragraph, >>>>>>>>>>>> will be >>>>>>>>>>>> coming up with a solution that's elegant and works for all >>>>>>>>>>>> different >>>>>>>>>>>> instantiations of dwc3 (or musb, cdns3, etc). >>>>>>>>>>>> >>>>>>>>>>> Well, at least for the TX FIFO resizing logic, we'd only be >>>>>>>>>>> needing to >>>>>>>>>>> focus on the DWC3 implementation. >>>>>>>>>>> >>>>>>>>>>> You bring up another good topic that I'll eventually needing to be >>>>>>>>>>> taking a look at, which is a nice way we can handle vendor >>>>>>>>>>> specific >>>>>>>>>>> endpoints and how they can co-exist with other "normal" >>>>>>>>>>> endpoints.  We >>>>>>>>>>> have a few special HW eps as well, which we try to maintain >>>>>>>>>>> separately >>>>>>>>>>> in our DWC3 vendor driver, but it isn't the most convenient, or >>>>>>>>>>> most >>>>>>>>>>> pretty method :). >>>>>>>>>> Awesome, as mentioned, the endpoint feature flags were added >>>>>>>>>> exactly to >>>>>>>>>> allow for these vendor-specific features :-) >>>>>>>>>> >>>>>>>>>> I'm more than happy to help testing now that I finally got our >>>>>>>>>> SM8150 >>>>>>>>>> Surface Duo device tree accepted by Bjorn ;-) >>>>>>>>>> >>>>>>>>>>>>> However, I'm not sure how the changes would look like in the >>>>>>>>>>>>> end, >>>>>>>>>>>>> so I >>>>>>>>>>>>> would like to wait later down the line to include that :). >>>>>>>>>>>> Fair enough, I agree. Can we get some more testing of $subject, >>>>>>>>>>>> though? >>>>>>>>>>>> Did you test $subject with upstream too? Which gadget drivers >>>>>>>>>>>> did you >>>>>>>>>>>> use? How did you test >>>>>>>>>>>> >>>>>>>>>>> The results that I included in the cover page was tested with the >>>>>>>>>>> pure >>>>>>>>>>> upstream kernel on our device.  Below was using the ConfigFS >>>>>>>>>>> gadget >>>>>>>>>>> w/ a >>>>>>>>>>> mass storage only composition. >>>>>>>>>>> >>>>>>>>>>> Test Parameters: >>>>>>>>>>>     - Platform: Qualcomm SM8150 >>>>>>>>>>>     - bMaxBurst = 6 >>>>>>>>>>>     - USB req size = 256kB >>>>>>>>>>>     - Num of USB reqs = 16 >>>>>>>>>> do you mind testing with the regular request size (16KiB) and 250 >>>>>>>>>> requests? I think we can even do 15 bursts in that case. >>>>>>>>>> >>>>>>>>>>>     - USB Speed = Super-Speed >>>>>>>>>>>     - Function Driver: Mass Storage (w/ ramdisk) >>>>>>>>>>>     - Test Application: CrystalDiskMark >>>>>>>>>>> >>>>>>>>>>> Results: >>>>>>>>>>> >>>>>>>>>>> TXFIFO Depth = 3 max packets >>>>>>>>>>> >>>>>>>>>>> Test Case | Data Size | AVG tput (in MB/s) >>>>>>>>>>> ------------------------------------------- >>>>>>>>>>> Sequential|1 GB x     | >>>>>>>>>>> Read      |9 loops    | 193.60 >>>>>>>>>>>              |           | 195.86 >>>>>>>>>>>              |           | 184.77 >>>>>>>>>>>              |           | 193.60 >>>>>>>>>>> ------------------------------------------- >>>>>>>>>>> >>>>>>>>>>> TXFIFO Depth = 6 max packets >>>>>>>>>>> >>>>>>>>>>> Test Case | Data Size | AVG tput (in MB/s) >>>>>>>>>>> ------------------------------------------- >>>>>>>>>>> Sequential|1 GB x     | >>>>>>>>>>> Read      |9 loops    | 287.35 >>>>>>>>>>>            |           | 304.94 >>>>>>>>>>>              |           | 289.64 >>>>>>>>>>>              |           | 293.61 >>>>>>>>>> I remember getting close to 400MiB/sec with Intel platforms without >>>>>>>>>> resizing FIFOs and I'm sure the FIFO size was set to 2x1024, >>>>>>>>>> though my >>>>>>>>>> memory could be failing. >>>>>>>>>> >>>>>>>>>> Then again, I never ran with CrystalDiskMark, I was using my own >>>>>>>>>> tool >>>>>>>>>> (it's somewhere in github. If you care, I can look up the URL). >>>>>>>>>> >>>>>>>>>>> We also have internal numbers which have shown similar >>>>>>>>>>> improvements as >>>>>>>>>>> well.  Those are over networking/tethering interfaces, so testing >>>>>>>>>>> IPERF >>>>>>>>>>> loopback over TCP/UDP. >>>>>>>>>> loopback iperf? That would skip the wire, no? >>>>>>>>>> >>>>>>>>>>>>> size of 2 and TX threshold of 1, this would really be not >>>>>>>>>>>>> beneficial to >>>>>>>>>>>>> us, because we can only change the TX threshold to 2 at max, >>>>>>>>>>>>> and at >>>>>>>>>>>>> least in my observations, once we have to go out to system >>>>>>>>>>>>> memory to >>>>>>>>>>>>> fetch the next data packet, that latency takes enough time >>>>>>>>>>>>> for the >>>>>>>>>>>>> controller to end the current burst. >>>>>>>>>>>> What I noticed with g_mass_storage is that we can amortize the >>>>>>>>>>>> cost of >>>>>>>>>>>> fetching data from memory, with a deeper request queue. >>>>>>>>>>>> Whenever I >>>>>>>>>>>> test(ed) g_mass_storage, I was doing so with 250 requests. And >>>>>>>>>>>> that was >>>>>>>>>>>> enough to give me very good performance. Never had to poke at TX >>>>>>>>>>>> FIFO >>>>>>>>>>>> resizing. Did you try something like this too? >>>>>>>>>>>> >>>>>>>>>>>> I feel that allocating more requests is a far simpler and more >>>>>>>>>>>> generic >>>>>>>>>>>> method that changing FIFO sizes :) >>>>>>>>>>>> >>>>>>>>>>> I wish I had a USB bus trace handy to show you, which would >>>>>>>>>>> make it >>>>>>>>>>> very >>>>>>>>>>> clear how the USB bus is currently utilized with TXFIFO size 2 vs >>>>>>>>>>> 6.  So >>>>>>>>>>> by increasing the number of USB requests, that will help if there >>>>>>>>>>> was a >>>>>>>>>>> bottleneck at the SW level where the application/function driver >>>>>>>>>>> utilizing the DWC3 was submitting data much faster than the HW was >>>>>>>>>>> processing them. >>>>>>>>>>> >>>>>>>>>>> So yes, this method of increasing the # of USB reqs will >>>>>>>>>>> definitely >>>>>>>>>>> help >>>>>>>>>>> with situations such as HSUSB or in SSUSB when EP bursting isn't >>>>>>>>>>> used. >>>>>>>>>>> The TXFIFO resize comes into play for SSUSB, which utilizes >>>>>>>>>>> endpoint >>>>>>>>>>> bursting. >>>>>>>>>> Hmm, that's not what I remember. Perhaps the TRB cache size plays a >>>>>>>>>> role >>>>>>>>>> here too. I have clear memories of testing this very scenario of >>>>>>>>>> bursting (using g_mass_storage at the time) because I was curious >>>>>>>>>> about >>>>>>>>>> it. Back then, my tests showed no difference in behavior. >>>>>>>>>> >>>>>>>>>> It could be nice if Heikki could test Intel parts with and without >>>>>>>>>> your >>>>>>>>>> changes on g_mass_storage with 250 requests. >>>>>>>>> Andy, you have a system at hand that has the DWC3 block enabled, >>>>>>>>> right? Can you help out here? >>>>>>>> I'm not sure if i will have time soon, I Cc'ed to Ferry who has a few >>>>>>>> more test cases (I have only one or two) and maybe can help. But I'll >>>>>>>> keep this in mind. >>>>>>> I just tested on 5.13.0-rc4 on Intel Edison (x86_64). All 5 patches >>>>>>> apply. Switching between host/gadget works, no connections >>>>>>> dropping, no >>>>>>> errors in dmesg. >>>>>>> >>>>>>> In host mode I connect a smsc9504 eth+4p hub. In gadget mode I have >>>>>>> composite device created from configfs with gser / eem / >>>>>>> mass_storage / >>>>>>> uac2. >>>>>>> >>>>>>> Tested with iperf3 performance in host (93.6Mbits/sec) and gadget >>>>>>> (207Mbits/sec) mode. Compared to v5.10.41 without patches host >>>>>>> (93.4Mbits/sec) and gadget (198Mbits/sec). >>>>>>> >>>>>>> Gadget seems to be a little faster with the patches, but that might >>>>>>> also >>>>>>> be caused  by something else, on v5.10.41 I see the bitrate bouncing >>>>>>> between 207 and 199. >>>>>>> >>>>>>> I saw a mention to test iperf3 to self (loopback). 3.09 Gbits/sec. >>>>>>> With >>>>>>> v5.10.41 3.07Gbits/sec. Not bad for a 500MHz device. >>>>>>> >>>>>>> With gnome-disks I did a read access benchmark 35.4MB/s, with v5.10.41 >>>>>>> 34.7MB/s. This might be limited by Edison's internal eMMC speed (when >>>>>>> booting U-Boot reads the kernel with 21.4 MiB/s). >>>>>>> >>>>>> Hi Ferry, >>>>>> >>>>>> Thanks for the testing.  Just to double check, did you also enable the >>>>>> property, which enabled the TXFIFO resize feature on the platform?  For >>>>>> example, for the QCOM SM8150 platform, we're adding the following to >>>>>> our >>>>>> device tree node: >>>>>> >>>>>> tx-fifo-resize >>>>>> >>>>>> If not, then your results at least confirms that w/o the property >>>>>> present, the changes won't break anything :).  Thanks again for the >>>>>> initial testing! >>> I applied the patch now to 5.13.0-rc5 + the following: >>> >> Hi Ferry, >> >> Quick question...there was a compile error with the V9 patch series, as >> it was using the dwc3_mwidth() incorrectly. I will update this with the >> proper use of the mdwidth, but which patch version did you use? Hi Ferry, > The V9 set gets applied to 5.13.0-rc5 by Yocto, if it doesn't apply it > stops the build. I didn't notice any compile errors, they stop the whole > build process too. But warnings are ignored. I'll check the logs to be sure. Ah, ok. I think the incorrect usage of the API will result as a warning as seen in Greg's compile log: drivers/usb/dwc3/gadget.c: In function ‘dwc3_gadget_calc_tx_fifo_size’: drivers/usb/dwc3/gadget.c:653:45: warning: passing argument 1 of ‘dwc3_mdwidth’ makes pointer from integer without a cast [-Wint-conversion] 653 | mdwidth = dwc3_mdwidth(dwc->hwparams.hwparams0); This is probably why the page fault occurs, as we're not passing in the DWC3 struct. I will send out a V10 shortly after testing it on my device. Thanks Wesley Cheng >> Thanks >> Wesley Cheng >> >>> --- a/drivers/usb/dwc3/dwc3-pci.c >>> +++ b/drivers/usb/dwc3/dwc3-pci.c >>> @@ -124,6 +124,7 @@ static const struct property_entry >>> dwc3_pci_mrfld_properties[] = { >>>      PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"), >>>      PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"), >>>      PROPERTY_ENTRY_BOOL("snps,usb2-gadget-lpm-disable"), >>> +    PROPERTY_ENTRY_BOOL("tx-fifo-resize"), >>>      PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"), >>>      {} >>>  }; >>> >>>  and when switching to gadget mode unfortunately received the following >>> oops: >>> >>> BUG: unable to handle page fault for address: 00000000202043f2 >>> #PF: supervisor read access in kernel mode >>> #PF: error_code(0x0000) - not-present page >>> PGD 0 P4D 0 >>> Oops: 0000 [#1] SMP PTI >>> CPU: 0 PID: 617 Comm: conf-gadget.sh Not tainted >>> 5.13.0-rc5-edison-acpi-standard #1 >>> Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 >>> 2015.01.21:18.19.48 >>> RIP: 0010:dwc3_gadget_check_config+0x33/0x80 >>> Code: 59 04 00 00 04 74 61 48 c1 ee 10 48 89 f7 f3 48 0f b8 c7 48 89 c7 >>> 39 81 60 04 00 00 7d 4a 89 81 60 04 00 00 8b 81 08 04 00 00 <81> b8 e8 >>> 03 00 00 32 33 00 00 0f b6 b0 09 04 00 00 75 0d 8b 80 20 >>> RSP: 0018:ffffb5550038fda0 EFLAGS: 00010297 >>> RAX: 000000002020400a RBX: ffffa04502627348 RCX: ffffa04507354028 >>> RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000004 >>> RBP: ffffa04508ac0550 R08: ffffa04503a75b2c R09: 0000000000000000 >>> R10: 0000000000000216 R11: 000000000002eba0 R12: ffffa04508ac0550 >>> R13: dead000000000100 R14: ffffa04508ac0600 R15: ffffa04508ac0520 >>> FS:  00007f7471e2f740(0000) GS:ffffa0453e200000(0000) >>> knlGS:0000000000000000 >>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> CR2: 00000000202043f2 CR3: 0000000003f38000 CR4: 00000000001006f0 >>> Call Trace: >>>  configfs_composite_bind+0x2f4/0x430 [libcomposite] >>>  udc_bind_to_driver+0x64/0x180 >>>  usb_gadget_probe_driver+0x114/0x150 >>>  gadget_dev_desc_UDC_store+0xbc/0x130 [libcomposite] >>>  configfs_write_file+0xcd/0x140 >>>  vfs_write+0xbb/0x250 >>>  ksys_write+0x5a/0xd0 >>>  do_syscall_64+0x40/0x80 >>>  entry_SYSCALL_64_after_hwframe+0x44/0xae >>> RIP: 0033:0x7f7471f1ff53 >>> Code: 8b 15 21 cf 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f >>> 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 >>> f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18 >>> RSP: 002b:00007fffa3dcd328 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 >>> RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f7471f1ff53 >>> RDX: 000000000000000c RSI: 00005614d615a770 RDI: 0000000000000001 >>> RBP: 00005614d615a770 R08: 000000000000000a R09: 00007f7471fb20c0 >>> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c >>> R13: 00007f7471fee520 R14: 000000000000000c R15: 00007f7471fee720 >>> Modules linked in: usb_f_uac2 u_audio usb_f_mass_storage usb_f_eem >>> u_ether usb_f_serial u_serial libcomposite rfcomm iptable_nat bnep >>> snd_sof_nocodec spi_pxa2xx_platform dw_dmac smsc snd_sof_pci_intel_tng >>> snd_sof_pci snd_sof_acpi_intel_byt snd_sof_intel_ipc snd_sof_acpi >>> smsc95xx snd_sof pwm_lpss_pci pwm_lpss snd_sof_xtensa_dsp >>> snd_intel_dspcfg snd_soc_acpi_intel_match snd_soc_acpi dw_dmac_pci >>> intel_mrfld_pwrbtn intel_mrfld_adc dw_dmac_core spi_pxa2xx_pci brcmfmac >>> brcmutil leds_gpio hci_uart btbcm ti_ads7950 >>> industrialio_triggered_buffer kfifo_buf ledtrig_timer ledtrig_heartbeat >>> mmc_block extcon_intel_mrfld sdhci_pci cqhci sdhci led_class >>> intel_soc_pmic_mrfld mmc_core btrfs libcrc32c xor zstd_compress >>> zlib_deflate raid6_pq >>> CR2: 00000000202043f2 >>> ---[ end trace 5c11fe50dca92ad4 ]--- >>> >>>>> No I didn't. Afaik we don't have a devicetree property to set. >>>>> >>>>> But I'd be happy to test that as well. But where to set the property? >>>>> >>>>> dwc3_pci_mrfld_properties[] in dwc3-pci? >>>>> >>>> Hi Ferry, >>>> >>>> Not too sure which DWC3 driver is used for the Intel platform, but I >>>> believe that should be the one. (if that's what is normally used)  We'd >>>> just need to add an entry w/ the below: >>>> >>>> PROPERTY_ENTRY_BOOL("tx-fifo-resize") >>>> >>>> Thanks >>>> Wesley Cheng >>>> >>>>>> Thanks >>>>>> Wesley Cheng >>>>>> >>>>>>>>>>> Now with endpoint bursting, if the function notifies the host that >>>>>>>>>>> bursting is supported, when the host sends the ACK for the Data >>>>>>>>>>> Packet, >>>>>>>>>>> it should have a NumP value equal to the bMaxBurst reported in >>>>>>>>>>> the EP >>>>>>>>>> Yes and no. Looking back at the history, we used to configure NUMP >>>>>>>>>> based >>>>>>>>>> on bMaxBurst, but it was changed later in commit >>>>>>>>>> 4e99472bc10bda9906526d725ff6d5f27b4ddca1 by yours truly because >>>>>>>>>> of a >>>>>>>>>> problem reported by John Youn. >>>>>>>>>> >>>>>>>>>> And now we've come full circle. Because even if I believe more >>>>>>>>>> requests >>>>>>>>>> are enough for bursting, NUMP is limited by the RxFIFO size. This >>>>>>>>>> ends >>>>>>>>>> up supporting your claim that we need RxFIFO resizing if we want to >>>>>>>>>> squeeze more throughput out of the controller. >>>>>>>>>> >>>>>>>>>> However, note that this is about RxFIFO size, not TxFIFO size. In >>>>>>>>>> fact, >>>>>>>>>> looking at Table 8-13 of USB 3.1 r1.0, we read the following about >>>>>>>>>> NumP >>>>>>>>>> (emphasis is mine): >>>>>>>>>> >>>>>>>>>>          "Number of Packets (NumP). This field is used to >>>>>>>>>> indicate the >>>>>>>>>>          number of Data Packet buffers that the **receiver** can >>>>>>>>>>          accept. The value in this field shall be less than or >>>>>>>>>> equal to >>>>>>>>>>          the maximum burst size supported by the endpoint as >>>>>>>>>> determined >>>>>>>>>>          by the value in the bMaxBurst field in the Endpoint >>>>>>>>>> Companion >>>>>>>>>>          Descriptor (refer to Section 9.6.7)." >>>>>>>>>> >>>>>>>>>> So, NumP is for the receiver, not the transmitter. Could you >>>>>>>>>> clarify >>>>>>>>>> what you mean here? >>>>>>>>>> >>>>>>>>>> /me keeps reading >>>>>>>>>> >>>>>>>>>> Hmm, table 8-15 tries to clarify: >>>>>>>>>> >>>>>>>>>>          "Number of Packets (NumP). >>>>>>>>>> >>>>>>>>>>          For an OUT endpoint, refer to Table 8-13 for the >>>>>>>>>> description of >>>>>>>>>>          this field. >>>>>>>>>> >>>>>>>>>>          For an IN endpoint this field is set by the endpoint to >>>>>>>>>> the >>>>>>>>>>          number of packets it can transmit when the host resumes >>>>>>>>>>          transactions to it. This field shall not have a value >>>>>>>>>> greater >>>>>>>>>>          than the maximum burst size supported by the endpoint as >>>>>>>>>>          indicated by the value in the bMaxBurst field in the >>>>>>>>>> Endpoint >>>>>>>>>>          Companion Descriptor. Note that the value reported in this >>>>>>>>>> field >>>>>>>>>>          may be treated by the host as informative only." >>>>>>>>>> >>>>>>>>>> However, if I remember correctly (please verify dwc3 databook), >>>>>>>>>> NUMP in >>>>>>>>>> DCFG was only for receive buffers. Thin, John, how does dwc3 >>>>>>>>>> compute >>>>>>>>>> NumP for TX/IN endpoints? Is that computed as a function of >>>>>>>>>> DCFG.NUMP or >>>>>>>>>> TxFIFO size? >>>>>>>>>> >>>>>>>>>>> desc.  If we have a TXFIFO size of 2, then normally what I have >>>>>>>>>>> seen is >>>>>>>>>>> that after 2 data packets, the device issues a NRDY.  So then we'd >>>>>>>>>>> need >>>>>>>>>>> to send an ERDY once data is available within the FIFO, and the >>>>>>>>>>> same >>>>>>>>>>> sequence happens until the USB request is complete.  With this >>>>>>>>>>> constant >>>>>>>>>>> NRDY/ERDY handshake going on, you actually see that the bus is >>>>>>>>>>> under >>>>>>>>>>> utilized.  When we increase an EP's FIFO size, then you'll see >>>>>>>>>>> constant >>>>>>>>>>> bursts for a request, until the request is done, or if the host >>>>>>>>>>> runs out >>>>>>>>>>> of RXFIFO. (ie no interruption [on the USB protocol level] during >>>>>>>>>>> USB >>>>>>>>>>> request data transfer) >>>>>>>>>> Unfortunately I don't have access to a USB sniffer anymore :-( >>>>>>>>>> >>>>>>>>>>>>>>>> Good points. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Wesley, what kind of testing have you done on this on >>>>>>>>>>>>>>>> different devices? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As mentioned above, these changes are currently present on end >>>>>>>>>>>>>>> user >>>>>>>>>>>>>>> devices for the past few years, so its been through a lot of >>>>>>>>>>>>>>> testing :). >>>>>>>>>>>>>> all with the same gadget driver. Also, who uses USB on android >>>>>>>>>>>>>> devices >>>>>>>>>>>>>> these days? Most of the data transfer goes via WiFi or >>>>>>>>>>>>>> Bluetooth, anyway >>>>>>>>>>>>>> :-) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess only developers are using USB during development to >>>>>>>>>>>>>> flash dev >>>>>>>>>>>>>> images heh. >>>>>>>>>>>>>> >>>>>>>>>>>>> I used to be a customer facing engineer, so honestly I did see >>>>>>>>>>>>> some >>>>>>>>>>>>> really interesting and crazy designs.  Again, we do have >>>>>>>>>>>>> non-Android >>>>>>>>>>>>> products that use the same code, and it has been working in >>>>>>>>>>>>> there >>>>>>>>>>>>> for a >>>>>>>>>>>>> few years as well.  The TXFIFO sizing really has helped with >>>>>>>>>>>>> multimedia >>>>>>>>>>>>> use cases, which use isoc endpoints, since esp. in those lower >>>>>>>>>>>>> end CPU >>>>>>>>>>>>> chips where latencies across the system are much larger, and a >>>>>>>>>>>>> missed >>>>>>>>>>>>> ISOC interval leads to a pop in your ear. >>>>>>>>>>>> This is good background information. Thanks for bringing this >>>>>>>>>>>> up. Admitedly, we still have ISOC issues with dwc3. I'm >>>>>>>>>>>> interested in >>>>>>>>>>>> knowing if a deeper request queue would also help here. >>>>>>>>>>>> >>>>>>>>>>>> Remember dwc3 can accomodate 255 requests + link for each >>>>>>>>>>>> endpoint. If >>>>>>>>>>>> our gadget driver uses a low number of requests, we're never >>>>>>>>>>>> really >>>>>>>>>>>> using the TRB ring in our benefit. >>>>>>>>>>>> >>>>>>>>>>> We're actually using both a deeper USB request queue + TX fifo >>>>>>>>>>> resizing. :). >>>>>>>>>> okay, great. Let's see what John and/or Thinh respond WRT dwc3 TX >>>>>>>>>> Burst >>>>>>>>>> behavior. >>>>>>>>> --  >>>>>>>>> heikki -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project