Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp3988986pxj; Tue, 15 Jun 2021 12:55:11 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz0VZSfHybeJiI9R50/sQDn5iTS2fxLyPj5WDn61M611EUA8x0Py3Cb74HZBzvY0DeMEPSR X-Received: by 2002:a05:6638:24d0:: with SMTP id y16mr767335jat.41.1623786911578; Tue, 15 Jun 2021 12:55:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623786911; cv=none; d=google.com; s=arc-20160816; b=lnWmW94ME/cyW9WC5YsOYHB9wuc81+hl1Rhi1ItU/p9jM5fjvHHqqi1y9ssbv9RGVW RYNH1POjQWo8cPT6HgDXSFqTgzCpLGSuWQwIcfcuYxbD9Y/6KHs3x0ZuGB//EgJQp3O7 C5PJ7apRLTJub7LIPNUsGBnR9DkI4jb6GmCrxQPkUUNWZZLYAAiAs3R+yTP1P0By2U45 U2E8kUZHL9INSohuSDySJuSxwtEEHSykXrrJFNSR6ABtg8Q4l4DclNEICiRqiSPTD95S YI2kxM7bOfyzAdMSGE6l0V5HvfmRT3REntTlea4+JizpR0YQsffYwpl9dPgMCQvQJtUC fz2w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=G187zacbYCpZlvGG2MlKE9QPCKsMgMxqPghSo/2Ovwo=; b=jSh0L0iKWRDX9hufQ6NpJe/Ylk7J9gmbfMVf7J/0mTWQufjh+AxlsdiJeN9mkSX0tU jCYbKbHietLO33tKYBgwFj/H/NT1FhltkbtkpcUghuRq2YqAd26856Yx+hAdaS9ZFwln 4Fi1IzpI3uejIbLc7SaMdo2QHF94ABoMsoB5Pc1Q8d0tzbOIyVeZ8p7WyULZkO/Ltasa y1NpJ2/7ZVs5vQmdGGb/iHQpwCXUT3XZ9ZA1sPapMck7XCW7ZGaCOcC9FKbZpiRxV3yy vOxfvB9ccoBZ0yAYmqZbw8Kd3raM/NjC1Mhym5wHvJcmz0oNgt+bjgDxIqWtL8wACogW bMTg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=sFZvoZdT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id e7si11836851ilq.68.2021.06.15.12.54.58; Tue, 15 Jun 2021 12:55:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=sFZvoZdT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230295AbhFOTzK (ORCPT + 99 others); Tue, 15 Jun 2021 15:55:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56976 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229749AbhFOTzJ (ORCPT ); Tue, 15 Jun 2021 15:55:09 -0400 Received: from mail-ej1-x62c.google.com (mail-ej1-x62c.google.com [IPv6:2a00:1450:4864:20::62c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F20BC061574; Tue, 15 Jun 2021 12:53:04 -0700 (PDT) Received: by mail-ej1-x62c.google.com with SMTP id ce15so24420344ejb.4; Tue, 15 Jun 2021 12:53:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=G187zacbYCpZlvGG2MlKE9QPCKsMgMxqPghSo/2Ovwo=; b=sFZvoZdT4JkETUat7VJjRXK54Qroqh7eTuNe+5HdljhXsR2HYrZMyl22XRNwCJrqva hFLE3KJl8IzgLpqp+3MzUxOck1fA9RqTuZu9QPO5/SI0gL5DJex4czn7oE2oTdDvmEU8 FCw+fVM9wRXgTY84elCxY8/Bn4yAaDWZVt59lolAhPtCnbFg3mpFnplQfaHwmaL29xKJ QPhRk5KI4xQuQ1KYx3ygzIi/6Rfj3FGQV5HDnHDkBynoa+/5ODzlzeWf69d9jPxs588n QTITfMGobCadP/GGQpyzA4NgLK8fiXKh3P07jqQheUWTPh1c61h5Lj8klns5NV+CIWSW LmKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=G187zacbYCpZlvGG2MlKE9QPCKsMgMxqPghSo/2Ovwo=; b=K5KaY+ex5tAT7tJpnOQAjBu38K21X7ZyHyhIPimEJsdvyElwVqMcfO8/BKnXCqI4df gsz6bxPBfbJs3Hf5p0A+7UzsYRw5nsHP+nR238eCLmF7Jr6it+Zf2cbMIoG1FpR5O9YK XThWUpIX077eQE+KgKlrs2xaOZvIwvR/NLzxy2NTjEortymzvmPwM63D8CYnV4+loqki 6MtQhtszMZ4pAwgEOXEBdVjIHEULKO3RTXHoA2E9oTXR/KbyvtyiOsYKPE2Sk8Wj7ylI 4xgnidMBjrLgilrJLfQDGXAoebife2o0Ns6oaexXCuG4ou5lPs6pAr7Psqb6kLi9KAwm VAiA== X-Gm-Message-State: AOAM532U3BCpoVxfKP19383F309lzqhad42iIRlIUHHZTsH6DghdAfci BVxTAfmdD9va/pT8DECTeX8= X-Received: by 2002:a17:906:19cc:: with SMTP id h12mr1324647ejd.306.1623786782613; Tue, 15 Jun 2021 12:53:02 -0700 (PDT) Received: from ?IPv6:2001:981:6fec:1:3d6d:2993:bad5:c4a3? ([2001:981:6fec:1:3d6d:2993:bad5:c4a3]) by smtp.gmail.com with ESMTPSA id gx28sm10751215ejc.107.2021.06.15.12.53.01 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 15 Jun 2021 12:53:01 -0700 (PDT) Subject: Re: [PATCH v9 0/5] Re-introduce TX FIFO resize for larger EP bursting To: Wesley Cheng , Andy Shevchenko , Heikki Krogerus Cc: Felipe Balbi , Andy Shevchenko , Greg KH , Andy Gross , Bjorn Andersson , Rob Herring , USB , Linux Kernel Mailing List , linux-arm-msm@vger.kernel.org, devicetree , Jack Pham , Thinh Nguyen , John Youn References: <1621410561-32762-1-git-send-email-wcheng@codeaurora.org> <87k0n9btnb.fsf@kernel.org> <874ke62i0v.fsf@kernel.org> <8735to29tt.fsf@kernel.org> <87tum4zhc9.fsf@kernel.org> <33c6cecf-3dce-8c4f-2be2-55cc3c6c6830@gmail.com> <098b2211-c3cf-e4c3-c0bd-9b4f8253389b@gmail.com> <0e9373ec-931b-f96a-b2c9-dbd532a823a6@codeaurora.org> From: Ferry Toth Message-ID: Date: Tue, 15 Jun 2021 21:53:01 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 MIME-Version: 1.0 In-Reply-To: <0e9373ec-931b-f96a-b2c9-dbd532a823a6@codeaurora.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Op 15-06-2021 om 06:22 schreef Wesley Cheng: > > On 6/14/2021 12:30 PM, Ferry Toth wrote: >> Op 14-06-2021 om 20:58 schreef Wesley Cheng: >>> On 6/12/2021 2:27 PM, Ferry Toth wrote: >>>> Hi >>>> >>>> Op 11-06-2021 om 15:21 schreef Andy Shevchenko: >>>>> On Fri, Jun 11, 2021 at 4:14 PM Heikki Krogerus >>>>> wrote: >>>>>> On Fri, Jun 11, 2021 at 04:00:38PM +0300, Felipe Balbi wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Wesley Cheng writes: >>>>>>>>>>>>>> to be honest, I don't think these should go in (apart from >>>>>>>>>>>>>> the build >>>>>>>>>>>>>> failure) because it's likely to break instantiations of the >>>>>>>>>>>>>> core with >>>>>>>>>>>>>> differing FIFO sizes. Some instantiations even have some >>>>>>>>>>>>>> endpoints with >>>>>>>>>>>>>> dedicated functionality that requires the default FIFO size >>>>>>>>>>>>>> configured >>>>>>>>>>>>>> during coreConsultant instantiation. I know of at OMAP5 and >>>>>>>>>>>>>> some Intel >>>>>>>>>>>>>> implementations which have dedicated endpoints for processor >>>>>>>>>>>>>> tracing. >>>>>>>>>>>>>> >>>>>>>>>>>>>> With OMAP5, these endpoints are configured at the top of the >>>>>>>>>>>>>> available >>>>>>>>>>>>>> endpoints, which means that if a gadget driver gets loaded >>>>>>>>>>>>>> and takes >>>>>>>>>>>>>> over most of the FIFO space because of this resizing, >>>>>>>>>>>>>> processor tracing >>>>>>>>>>>>>> will have a hard time running. That being said, processor >>>>>>>>>>>>>> tracing isn't >>>>>>>>>>>>>> supported in upstream at this moment. >>>>>>>>>>>>>> >>>>>>>>>>>> I agree that the application of this logic may differ between >>>>>>>>>>>> vendors, >>>>>>>>>>>> hence why I wanted to keep this controllable by the DT >>>>>>>>>>>> property, so that >>>>>>>>>>>> for those which do not support this use case can leave it >>>>>>>>>>>> disabled.  The >>>>>>>>>>>> logic is there to ensure that for a given USB configuration, >>>>>>>>>>>> for each EP >>>>>>>>>>>> it would have at least 1 TX FIFO.  For USB configurations which >>>>>>>>>>>> don't >>>>>>>>>>>> utilize all available IN EPs, it would allow re-allocation of >>>>>>>>>>>> internal >>>>>>>>>>>> memory to EPs which will actually be in use. >>>>>>>>>>> The feature ends up being all-or-nothing, then :-) It sounds >>>>>>>>>>> like we can >>>>>>>>>>> be a little nicer in this regard. >>>>>>>>>>> >>>>>>>>>> Don't get me wrong, I think once those features become available >>>>>>>>>> upstream, we can improve the logic.  From what I remember when >>>>>>>>>> looking >>>>>>>>> sure, I support that. But I want to make sure the first cut isn't >>>>>>>>> likely >>>>>>>>> to break things left and right :) >>>>>>>>> >>>>>>>>> Hence, let's at least get more testing. >>>>>>>>> >>>>>>>> Sure, I'd hope that the other users of DWC3 will also see some >>>>>>>> pretty >>>>>>>> big improvements on the TX path with this. >>>>>>> fingers crossed >>>>>>> >>>>>>>>>> at Andy Shevchenko's Github, the Intel tracer downstream changes >>>>>>>>>> were >>>>>>>>>> just to remove physical EP1 and 2 from the DWC3 endpoint list. >>>>>>>>>> If that >>>>>>>>> right, that's the reason why we introduced the endpoint feature >>>>>>>>> flags. The end goal was that the UDC would be able to have custom >>>>>>>>> feature flags paired with ->validate_endpoint() or whatever before >>>>>>>>> allowing it to be enabled. Then the UDC driver could tell UDC >>>>>>>>> core to >>>>>>>>> skip that endpoint on that particular platform without >>>>>>>>> interefering with >>>>>>>>> everything else. >>>>>>>>> >>>>>>>>> Of course, we still need to figure out a way to abstract the >>>>>>>>> different >>>>>>>>> dwc3 instantiations. >>>>>>>>> >>>>>>>>>> was the change which ended up upstream for the Intel tracer >>>>>>>>>> then we >>>>>>>>>> could improve the logic to avoid re-sizing those particular EPs. >>>>>>>>> The problem then, just as I mentioned in the previous paragraph, >>>>>>>>> will be >>>>>>>>> coming up with a solution that's elegant and works for all >>>>>>>>> different >>>>>>>>> instantiations of dwc3 (or musb, cdns3, etc). >>>>>>>>> >>>>>>>> Well, at least for the TX FIFO resizing logic, we'd only be >>>>>>>> needing to >>>>>>>> focus on the DWC3 implementation. >>>>>>>> >>>>>>>> You bring up another good topic that I'll eventually needing to be >>>>>>>> taking a look at, which is a nice way we can handle vendor specific >>>>>>>> endpoints and how they can co-exist with other "normal" >>>>>>>> endpoints.  We >>>>>>>> have a few special HW eps as well, which we try to maintain >>>>>>>> separately >>>>>>>> in our DWC3 vendor driver, but it isn't the most convenient, or most >>>>>>>> pretty method :). >>>>>>> Awesome, as mentioned, the endpoint feature flags were added >>>>>>> exactly to >>>>>>> allow for these vendor-specific features :-) >>>>>>> >>>>>>> I'm more than happy to help testing now that I finally got our SM8150 >>>>>>> Surface Duo device tree accepted by Bjorn ;-) >>>>>>> >>>>>>>>>> However, I'm not sure how the changes would look like in the end, >>>>>>>>>> so I >>>>>>>>>> would like to wait later down the line to include that :). >>>>>>>>> Fair enough, I agree. Can we get some more testing of $subject, >>>>>>>>> though? >>>>>>>>> Did you test $subject with upstream too? Which gadget drivers >>>>>>>>> did you >>>>>>>>> use? How did you test >>>>>>>>> >>>>>>>> The results that I included in the cover page was tested with the >>>>>>>> pure >>>>>>>> upstream kernel on our device.  Below was using the ConfigFS gadget >>>>>>>> w/ a >>>>>>>> mass storage only composition. >>>>>>>> >>>>>>>> Test Parameters: >>>>>>>>    - Platform: Qualcomm SM8150 >>>>>>>>    - bMaxBurst = 6 >>>>>>>>    - USB req size = 256kB >>>>>>>>    - Num of USB reqs = 16 >>>>>>> do you mind testing with the regular request size (16KiB) and 250 >>>>>>> requests? I think we can even do 15 bursts in that case. >>>>>>> >>>>>>>>    - USB Speed = Super-Speed >>>>>>>>    - Function Driver: Mass Storage (w/ ramdisk) >>>>>>>>    - Test Application: CrystalDiskMark >>>>>>>> >>>>>>>> Results: >>>>>>>> >>>>>>>> TXFIFO Depth = 3 max packets >>>>>>>> >>>>>>>> Test Case | Data Size | AVG tput (in MB/s) >>>>>>>> ------------------------------------------- >>>>>>>> Sequential|1 GB x     | >>>>>>>> Read      |9 loops    | 193.60 >>>>>>>>             |           | 195.86 >>>>>>>>             |           | 184.77 >>>>>>>>             |           | 193.60 >>>>>>>> ------------------------------------------- >>>>>>>> >>>>>>>> TXFIFO Depth = 6 max packets >>>>>>>> >>>>>>>> Test Case | Data Size | AVG tput (in MB/s) >>>>>>>> ------------------------------------------- >>>>>>>> Sequential|1 GB x     | >>>>>>>> Read      |9 loops    | 287.35 >>>>>>>>           |           | 304.94 >>>>>>>>             |           | 289.64 >>>>>>>>             |           | 293.61 >>>>>>> I remember getting close to 400MiB/sec with Intel platforms without >>>>>>> resizing FIFOs and I'm sure the FIFO size was set to 2x1024, >>>>>>> though my >>>>>>> memory could be failing. >>>>>>> >>>>>>> Then again, I never ran with CrystalDiskMark, I was using my own tool >>>>>>> (it's somewhere in github. If you care, I can look up the URL). >>>>>>> >>>>>>>> We also have internal numbers which have shown similar >>>>>>>> improvements as >>>>>>>> well.  Those are over networking/tethering interfaces, so testing >>>>>>>> IPERF >>>>>>>> loopback over TCP/UDP. >>>>>>> loopback iperf? That would skip the wire, no? >>>>>>> >>>>>>>>>> size of 2 and TX threshold of 1, this would really be not >>>>>>>>>> beneficial to >>>>>>>>>> us, because we can only change the TX threshold to 2 at max, >>>>>>>>>> and at >>>>>>>>>> least in my observations, once we have to go out to system >>>>>>>>>> memory to >>>>>>>>>> fetch the next data packet, that latency takes enough time for the >>>>>>>>>> controller to end the current burst. >>>>>>>>> What I noticed with g_mass_storage is that we can amortize the >>>>>>>>> cost of >>>>>>>>> fetching data from memory, with a deeper request queue. Whenever I >>>>>>>>> test(ed) g_mass_storage, I was doing so with 250 requests. And >>>>>>>>> that was >>>>>>>>> enough to give me very good performance. Never had to poke at TX >>>>>>>>> FIFO >>>>>>>>> resizing. Did you try something like this too? >>>>>>>>> >>>>>>>>> I feel that allocating more requests is a far simpler and more >>>>>>>>> generic >>>>>>>>> method that changing FIFO sizes :) >>>>>>>>> >>>>>>>> I wish I had a USB bus trace handy to show you, which would make it >>>>>>>> very >>>>>>>> clear how the USB bus is currently utilized with TXFIFO size 2 vs >>>>>>>> 6.  So >>>>>>>> by increasing the number of USB requests, that will help if there >>>>>>>> was a >>>>>>>> bottleneck at the SW level where the application/function driver >>>>>>>> utilizing the DWC3 was submitting data much faster than the HW was >>>>>>>> processing them. >>>>>>>> >>>>>>>> So yes, this method of increasing the # of USB reqs will definitely >>>>>>>> help >>>>>>>> with situations such as HSUSB or in SSUSB when EP bursting isn't >>>>>>>> used. >>>>>>>> The TXFIFO resize comes into play for SSUSB, which utilizes endpoint >>>>>>>> bursting. >>>>>>> Hmm, that's not what I remember. Perhaps the TRB cache size plays a >>>>>>> role >>>>>>> here too. I have clear memories of testing this very scenario of >>>>>>> bursting (using g_mass_storage at the time) because I was curious >>>>>>> about >>>>>>> it. Back then, my tests showed no difference in behavior. >>>>>>> >>>>>>> It could be nice if Heikki could test Intel parts with and without >>>>>>> your >>>>>>> changes on g_mass_storage with 250 requests. >>>>>> Andy, you have a system at hand that has the DWC3 block enabled, >>>>>> right? Can you help out here? >>>>> I'm not sure if i will have time soon, I Cc'ed to Ferry who has a few >>>>> more test cases (I have only one or two) and maybe can help. But I'll >>>>> keep this in mind. >>>> I just tested on 5.13.0-rc4 on Intel Edison (x86_64). All 5 patches >>>> apply. Switching between host/gadget works, no connections dropping, no >>>> errors in dmesg. >>>> >>>> In host mode I connect a smsc9504 eth+4p hub. In gadget mode I have >>>> composite device created from configfs with gser / eem / mass_storage / >>>> uac2. >>>> >>>> Tested with iperf3 performance in host (93.6Mbits/sec) and gadget >>>> (207Mbits/sec) mode. Compared to v5.10.41 without patches host >>>> (93.4Mbits/sec) and gadget (198Mbits/sec). >>>> >>>> Gadget seems to be a little faster with the patches, but that might also >>>> be caused  by something else, on v5.10.41 I see the bitrate bouncing >>>> between 207 and 199. >>>> >>>> I saw a mention to test iperf3 to self (loopback). 3.09 Gbits/sec. With >>>> v5.10.41 3.07Gbits/sec. Not bad for a 500MHz device. >>>> >>>> With gnome-disks I did a read access benchmark 35.4MB/s, with v5.10.41 >>>> 34.7MB/s. This might be limited by Edison's internal eMMC speed (when >>>> booting U-Boot reads the kernel with 21.4 MiB/s). >>>> >>> Hi Ferry, >>> >>> Thanks for the testing.  Just to double check, did you also enable the >>> property, which enabled the TXFIFO resize feature on the platform?  For >>> example, for the QCOM SM8150 platform, we're adding the following to our >>> device tree node: >>> >>> tx-fifo-resize >>> >>> If not, then your results at least confirms that w/o the property >>> present, the changes won't break anything :).  Thanks again for the >>> initial testing! I applied the patch now to 5.13.0-rc5 + the following: --- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -124,6 +124,7 @@ static const struct property_entry dwc3_pci_mrfld_properties[] = {      PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),      PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),      PROPERTY_ENTRY_BOOL("snps,usb2-gadget-lpm-disable"), +    PROPERTY_ENTRY_BOOL("tx-fifo-resize"),      PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),      {}  };  and when switching to gadget mode unfortunately received the following oops: BUG: unable to handle page fault for address: 00000000202043f2 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 617 Comm: conf-gadget.sh Not tainted 5.13.0-rc5-edison-acpi-standard #1 Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 RIP: 0010:dwc3_gadget_check_config+0x33/0x80 Code: 59 04 00 00 04 74 61 48 c1 ee 10 48 89 f7 f3 48 0f b8 c7 48 89 c7 39 81 60 04 00 00 7d 4a 89 81 60 04 00 00 8b 81 08 04 00 00 <81> b8 e8 03 00 00 32 33 00 00 0f b6 b0 09 04 00 00 75 0d 8b 80 20 RSP: 0018:ffffb5550038fda0 EFLAGS: 00010297 RAX: 000000002020400a RBX: ffffa04502627348 RCX: ffffa04507354028 RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000004 RBP: ffffa04508ac0550 R08: ffffa04503a75b2c R09: 0000000000000000 R10: 0000000000000216 R11: 000000000002eba0 R12: ffffa04508ac0550 R13: dead000000000100 R14: ffffa04508ac0600 R15: ffffa04508ac0520 FS:  00007f7471e2f740(0000) GS:ffffa0453e200000(0000) knlGS:0000000000000000 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000202043f2 CR3: 0000000003f38000 CR4: 00000000001006f0 Call Trace:  configfs_composite_bind+0x2f4/0x430 [libcomposite]  udc_bind_to_driver+0x64/0x180  usb_gadget_probe_driver+0x114/0x150  gadget_dev_desc_UDC_store+0xbc/0x130 [libcomposite]  configfs_write_file+0xcd/0x140  vfs_write+0xbb/0x250  ksys_write+0x5a/0xd0  do_syscall_64+0x40/0x80  entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f7471f1ff53 Code: 8b 15 21 cf 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18 RSP: 002b:00007fffa3dcd328 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f7471f1ff53 RDX: 000000000000000c RSI: 00005614d615a770 RDI: 0000000000000001 RBP: 00005614d615a770 R08: 000000000000000a R09: 00007f7471fb20c0 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c R13: 00007f7471fee520 R14: 000000000000000c R15: 00007f7471fee720 Modules linked in: usb_f_uac2 u_audio usb_f_mass_storage usb_f_eem u_ether usb_f_serial u_serial libcomposite rfcomm iptable_nat bnep snd_sof_nocodec spi_pxa2xx_platform dw_dmac smsc snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt snd_sof_intel_ipc snd_sof_acpi smsc95xx snd_sof pwm_lpss_pci pwm_lpss snd_sof_xtensa_dsp snd_intel_dspcfg snd_soc_acpi_intel_match snd_soc_acpi dw_dmac_pci intel_mrfld_pwrbtn intel_mrfld_adc dw_dmac_core spi_pxa2xx_pci brcmfmac brcmutil leds_gpio hci_uart btbcm ti_ads7950 industrialio_triggered_buffer kfifo_buf ledtrig_timer ledtrig_heartbeat mmc_block extcon_intel_mrfld sdhci_pci cqhci sdhci led_class intel_soc_pmic_mrfld mmc_core btrfs libcrc32c xor zstd_compress zlib_deflate raid6_pq CR2: 00000000202043f2 ---[ end trace 5c11fe50dca92ad4 ]--- >> No I didn't. Afaik we don't have a devicetree property to set. >> >> But I'd be happy to test that as well. But where to set the property? >> >> dwc3_pci_mrfld_properties[] in dwc3-pci? >> > Hi Ferry, > > Not too sure which DWC3 driver is used for the Intel platform, but I > believe that should be the one. (if that's what is normally used) We'd > just need to add an entry w/ the below: > > PROPERTY_ENTRY_BOOL("tx-fifo-resize") > > Thanks > Wesley Cheng > >>> Thanks >>> Wesley Cheng >>> >>>>>>>> Now with endpoint bursting, if the function notifies the host that >>>>>>>> bursting is supported, when the host sends the ACK for the Data >>>>>>>> Packet, >>>>>>>> it should have a NumP value equal to the bMaxBurst reported in >>>>>>>> the EP >>>>>>> Yes and no. Looking back at the history, we used to configure NUMP >>>>>>> based >>>>>>> on bMaxBurst, but it was changed later in commit >>>>>>> 4e99472bc10bda9906526d725ff6d5f27b4ddca1 by yours truly because of a >>>>>>> problem reported by John Youn. >>>>>>> >>>>>>> And now we've come full circle. Because even if I believe more >>>>>>> requests >>>>>>> are enough for bursting, NUMP is limited by the RxFIFO size. This >>>>>>> ends >>>>>>> up supporting your claim that we need RxFIFO resizing if we want to >>>>>>> squeeze more throughput out of the controller. >>>>>>> >>>>>>> However, note that this is about RxFIFO size, not TxFIFO size. In >>>>>>> fact, >>>>>>> looking at Table 8-13 of USB 3.1 r1.0, we read the following about >>>>>>> NumP >>>>>>> (emphasis is mine): >>>>>>> >>>>>>>         "Number of Packets (NumP). This field is used to indicate the >>>>>>>         number of Data Packet buffers that the **receiver** can >>>>>>>         accept. The value in this field shall be less than or >>>>>>> equal to >>>>>>>         the maximum burst size supported by the endpoint as >>>>>>> determined >>>>>>>         by the value in the bMaxBurst field in the Endpoint Companion >>>>>>>         Descriptor (refer to Section 9.6.7)." >>>>>>> >>>>>>> So, NumP is for the receiver, not the transmitter. Could you clarify >>>>>>> what you mean here? >>>>>>> >>>>>>> /me keeps reading >>>>>>> >>>>>>> Hmm, table 8-15 tries to clarify: >>>>>>> >>>>>>>         "Number of Packets (NumP). >>>>>>> >>>>>>>         For an OUT endpoint, refer to Table 8-13 for the >>>>>>> description of >>>>>>>         this field. >>>>>>> >>>>>>>         For an IN endpoint this field is set by the endpoint to the >>>>>>>         number of packets it can transmit when the host resumes >>>>>>>         transactions to it. This field shall not have a value greater >>>>>>>         than the maximum burst size supported by the endpoint as >>>>>>>         indicated by the value in the bMaxBurst field in the Endpoint >>>>>>>         Companion Descriptor. Note that the value reported in this >>>>>>> field >>>>>>>         may be treated by the host as informative only." >>>>>>> >>>>>>> However, if I remember correctly (please verify dwc3 databook), >>>>>>> NUMP in >>>>>>> DCFG was only for receive buffers. Thin, John, how does dwc3 compute >>>>>>> NumP for TX/IN endpoints? Is that computed as a function of >>>>>>> DCFG.NUMP or >>>>>>> TxFIFO size? >>>>>>> >>>>>>>> desc.  If we have a TXFIFO size of 2, then normally what I have >>>>>>>> seen is >>>>>>>> that after 2 data packets, the device issues a NRDY.  So then we'd >>>>>>>> need >>>>>>>> to send an ERDY once data is available within the FIFO, and the same >>>>>>>> sequence happens until the USB request is complete.  With this >>>>>>>> constant >>>>>>>> NRDY/ERDY handshake going on, you actually see that the bus is under >>>>>>>> utilized.  When we increase an EP's FIFO size, then you'll see >>>>>>>> constant >>>>>>>> bursts for a request, until the request is done, or if the host >>>>>>>> runs out >>>>>>>> of RXFIFO. (ie no interruption [on the USB protocol level] during >>>>>>>> USB >>>>>>>> request data transfer) >>>>>>> Unfortunately I don't have access to a USB sniffer anymore :-( >>>>>>> >>>>>>>>>>>>> Good points. >>>>>>>>>>>>> >>>>>>>>>>>>> Wesley, what kind of testing have you done on this on >>>>>>>>>>>>> different devices? >>>>>>>>>>>>> >>>>>>>>>>>> As mentioned above, these changes are currently present on end >>>>>>>>>>>> user >>>>>>>>>>>> devices for the past few years, so its been through a lot of >>>>>>>>>>>> testing :). >>>>>>>>>>> all with the same gadget driver. Also, who uses USB on android >>>>>>>>>>> devices >>>>>>>>>>> these days? Most of the data transfer goes via WiFi or >>>>>>>>>>> Bluetooth, anyway >>>>>>>>>>> :-) >>>>>>>>>>> >>>>>>>>>>> I guess only developers are using USB during development to >>>>>>>>>>> flash dev >>>>>>>>>>> images heh. >>>>>>>>>>> >>>>>>>>>> I used to be a customer facing engineer, so honestly I did see >>>>>>>>>> some >>>>>>>>>> really interesting and crazy designs.  Again, we do have >>>>>>>>>> non-Android >>>>>>>>>> products that use the same code, and it has been working in there >>>>>>>>>> for a >>>>>>>>>> few years as well.  The TXFIFO sizing really has helped with >>>>>>>>>> multimedia >>>>>>>>>> use cases, which use isoc endpoints, since esp. in those lower >>>>>>>>>> end CPU >>>>>>>>>> chips where latencies across the system are much larger, and a >>>>>>>>>> missed >>>>>>>>>> ISOC interval leads to a pop in your ear. >>>>>>>>> This is good background information. Thanks for bringing this >>>>>>>>> up. Admitedly, we still have ISOC issues with dwc3. I'm >>>>>>>>> interested in >>>>>>>>> knowing if a deeper request queue would also help here. >>>>>>>>> >>>>>>>>> Remember dwc3 can accomodate 255 requests + link for each >>>>>>>>> endpoint. If >>>>>>>>> our gadget driver uses a low number of requests, we're never really >>>>>>>>> using the TRB ring in our benefit. >>>>>>>>> >>>>>>>> We're actually using both a deeper USB request queue + TX fifo >>>>>>>> resizing. :). >>>>>>> okay, great. Let's see what John and/or Thinh respond WRT dwc3 TX >>>>>>> Burst >>>>>>> behavior. >>>>>> -- >>>>>> heikki