Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1426843AbdD2OZX (ORCPT ); Sat, 29 Apr 2017 10:25:23 -0400 Received: from mail-qk0-f177.google.com ([209.85.220.177]:35123 "EHLO mail-qk0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1426823AbdD2OZL (ORCPT ); Sat, 29 Apr 2017 10:25:11 -0400 MIME-Version: 1.0 In-Reply-To: References: <20170427102033.5500-1-matwey@sai.msu.ru> <20170427153506.GA25403@uda0271908> <20170427171345.GB25403@uda0271908> <20170428115824.GC25403@uda0271908> <20170428124328.GD25403@uda0271908> <20170428133011.GE25403@uda0271908> From: "Matwey V. Kornilov" Date: Sat, 29 Apr 2017 17:24:50 +0300 X-Google-Sender-Auth: 8q4ywDo0bgg-5RebKhw6_K5egbU Message-ID: Subject: Re: [PATCH] usb: musb: musb_host: Introduce postponed URB giveback To: Bin Liu , "Matwey V. Kornilov" , gregkh@linuxfoundation.org, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 11329 Lines: 238 2017-04-29 11:16 GMT+03:00 Matwey V. Kornilov : > 2017-04-28 16:30 GMT+03:00 Bin Liu : >> On Fri, Apr 28, 2017 at 04:15:09PM +0300, Matwey V. Kornilov wrote: >>> 2017-04-28 15:43 GMT+03:00 Bin Liu : >>> > On Fri, Apr 28, 2017 at 03:13:55PM +0300, Matwey V. Kornilov wrote: >>> >> which i >>> >> >>> >> 2017-04-28 14:58 GMT+03:00 Bin Liu : >>> >> > On Fri, Apr 28, 2017 at 10:04:30AM +0300, Matwey V. Kornilov wrote: >>> >> >> 2017-04-27 20:13 GMT+03:00 Bin Liu : >>> >> >> > On Thu, Apr 27, 2017 at 07:26:31PM +0300, Matwey V. Kornilov wrote: >>> >> >> >> 2017-04-27 18:35 GMT+03:00 Bin Liu : >>> >> >> >> > Hi Matwey, >>> >> >> >> > >>> >> >> >> > On Thu, Apr 27, 2017 at 01:20:33PM +0300, Matwey V. Kornilov wrote: >>> >> >> >> >> This commit changes the order of actions undertaken in >>> >> >> >> >> musb_advance_schedule() in order to overcome issue with broken >>> >> >> >> >> isochronous transfer [1]. >>> >> >> >> >> >>> >> >> >> >> There is no harm to split musb_giveback into two pieces. The first >>> >> >> >> >> unlinks finished urb, the second givebacks it. The issue here that >>> >> >> >> >> givebacking may be quite time-consuming due to urb->complete() call. >>> >> >> >> >> As it happens in case of pwc-driven web cameras. It may take about 0.5 >>> >> >> >> >> ms to call __musb_giveback() that calls urb->callback() internally. >>> >> >> >> >> Under specific circumstances setting MUSB_RXCSR_H_REQPKT in subsequent >>> >> >> >> >> musb_start_urb() for the next urb will be too late to produce physical >>> >> >> >> >> IN packet. Since auto req is not used by this module the exchange >>> >> >> >> >> would be as the following: >>> >> >> >> >> >>> >> >> >> >> [ ] 7.220456 d= 0.000997 [182 + 3.667] [ 3] IN : 4.5 >>> >> >> >> >> [ T ] 7.220459 d= 0.000003 [182 + 7.000] [800] DATA0: [skipped] >>> >> >> >> >> [ ] 7.222456 d= 0.001997 [184 + 3.667] [ 3] IN : 4.5 >>> >> >> >> >> [ ] 7.222459 d= 0.000003 [184 + 7.000] [ 3] DATA0: 00 00 >>> >> >> >> >> >>> >> >> >> >> It is known that missed IN in isochronous mode makes some >>> >> >> >> >> perepherial broken. For instance, pwc-driven or uvc-driven >>> >> >> >> >> web cameras. >>> >> >> >> >> In order to workaround this issue we postpone calling >>> >> >> >> >> urb->callback() after setting MUSB_RXCSR_H_REQPKT for the >>> >> >> >> >> next urb if there is the next urb pending in queue. >>> >> >> >> >> >>> >> >> >> >> [1] https://www.spinics.net/lists/linux-usb/msg145747.html >>> >> >> >> >> >>> >> >> >> >> Fixes: f551e1352983 ("Revert "usb: musb: musb_host: Enable HCD_BH flag to handle urb return in bottom half"") >>> >> >> >> >> Signed-off-by: Matwey V. Kornilov >>> >> >> >> > >>> >> >> >> > Thanks for the effort of working on this long standing issue, I know you >>> >> >> >> > have spent alot of time on it, but what I am thinking is instead of >>> >> >> >> > workaround the problem we need to understand the root cause - why >>> >> >> >> > uvc-video takes longer to exec the urb callback, why only am335x >>> >> >> >> > reported this issue. This is on my backlog, just seems never got time >>> >> >> >> > for it... >>> >> >> >> >>> >> >> >> Have you tried other SoCs with Invetra MUSB? >>> >> >> > >>> >> >> > That is the plan, I got an A20 board, but haven't bring it up yet. >>> >> >> > >>> >> >> >> >>> >> >> >> > >>> >> >> >> > Ideally MUSB driver should be just using HCD_BH flag and let the core to >>> >> >> >> > handle the urb callback, regardless the usb transfer types. >>> >> >> >> >>> >> >> >> I think the only reason why everything worked before with HCD_BH is >>> >> >> >> that execution of urb->callback() was placed after musb_start(). The >>> >> >> >> order of operations matters. >>> >> >> >> However, you said that something was also wrong with HCD_BH and other >>> >> >> >> peripherals. >>> >> >> > >>> >> >> > HCD_BH flag cause some issues which are docummented in the commit log of >>> >> >> > f551e1352983. >>> >> >> > But even with HCD_BH flag, it didn't work for uvc webcams, it still misses >>> >> >> > IN tokens. It might helps pwc webcams though. >>> >> >> >>> >> >> pwc webcams work with HCD_BH fine indeed. >>> >> > >>> >> > yeah, this is what you told long time ago. >>> >> > >>> >> >> >>> >> >> > >>> >> >> >> > The MUSB drivers are already messy and complicated enough for >>> >> >> >> > maintenance, I'd like to understand the root cause of the delay first >>> >> >> >> > before decide how to solve the issue. >>> >> >> >> > >>> >> >> >> >>> >> >> >> I feel from playing with OpenVizsla that REQPKT should be set well in >>> >> >> >> advance. So, time window to set the flag is actually smaller than 1 >>> >> >> >> ms. >>> >> >> >> urb->callback() is never takes longer than 0.4 ms for pwc driver, but >>> >> >> >> INs are skipped. >>> >> >> > >>> >> >> > Setting REQPKT in advance might be the solution, but I'd like to >>> >> >> > understand why only Isoch transfer shows such issue, and why only am335x >>> >> >> > reports this issue. The later concerns me more if this would be a >>> >> >> > system issue not only in usb module. >>> >> >> >>> >> >> 0.4 ms is about 100000 CPU cycles given that CPU is running at 275Mhz >>> >> >> (which is the lowest cpufreq). Long time ago, I run pwc webcam with >>> >> >> SIS Vortex86 at 200Mhz It worked fine. I would not say that it is >>> >> >> extraordinary value. >>> >> >> Do you think that somewhere CPU cycles are wasted globally for some reason? >>> >> > >>> >> > Depends on how to interpret 'wasted', my understanding the issue is the >>> >> > core urb giveback holds a spinlock and in uvc case the giveback takes >>> >> > longer to finish (I guess the same in pwc case), so the musb driver >>> >> > doesn't get a chance to re-program the controller on time, which causes >>> >> > missing issuingIN tokens. >>> >> > >>> >> > The questions are, why uvc takes longer to run urb giveback (which holds >>> >> > a spinlock), and is it am335x specific issue... >>> >> >>> >> It is not clear how does it hold a spinlock on a BeagleBone which is >>> >> single-core system. >>> > >>> > Sorry, I have to take it back, urb giveback doesn't hold a spinlock, but >>> > disables irq. It has been a while (a year?) since the last time I looked >>> > this issue. Please see the call below flow. >>> > >>> > musb_giveback() --> >>> > usb_hcd_giveback_urb() --> >>> > __usb_hcd_giveback_urb() --> # it gets here regardless HCD_BH flag >>> > 1765 local_irq_save(flags); >>> > 1766 urb->complete(urb); >>> > 1767 local_irq_restore(flags); >>> > >>> > so musb driver only gets a chance to re-program the controller after >>> > line 1766 returns, which is the urb callback in the class driver (uvc in >>> > this case). If urb->complete() takes too long, the controller will miss >>> > the IN tokens. >>> > >>> > HCD_BH flag could help the situation only if urb->complete() doesn't >>> > take that long. >>> >>> Sure, I think that the question is why urb->complete() itself takes so >>> long only (?) at am335x. >> >> This is what I meant. I haven't checked other platforms yet, but at >> least it seems the issue only reported on am335x as far as I am aware. > > Attached here are ftraces for am335x and x86 respectively. > pwc_isoc_handler() normally spends most of the time in usb_submit_urb() at x86. > same for am335x, expect that from some point pwc_isoc_handler() starts > spending much time for something else. Believe you or not, it takes 30usec to mempcy 950 bytes in pwc_isoc_handler. Given urb has 10 packages, it takes 300usec only just to copy memory from urb to internal data structures. -0 [000] dnh1 53.428680: pwc_isoc_handler_pre_compact: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428683: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428713: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428715: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428743: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428746: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428775: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428777: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428807: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428809: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428838: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428840: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428869: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428871: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428900: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428902: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428932: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428934: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428963: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428966: pwc_isoc_handler_memcpy_begin: urb=cf816e00 len=956 -0 [000] dnh1 53.428995: pwc_isoc_handler_memcpy_end: urb=cf816e00 status=0 nop=10 -0 [000] dnh1 53.428997: pwc_isoc_handler_pre_submit_urb: urb=cf816e00 status=0 nop=10 > >> >>> >>> > >>> >> I mean if it is waiting for blocked lock it will be waiting forever, >>> >> because it is in interrupt context on single core system. >>> > >>> > Hope my message above explains it. >>> > >>> >> Here it is waiting for finite time amount. It is also quite unlikely >>> >> that spinlocks are implemented inefficiently for arm architecture. >>> >> So, I tested with CONFIG_SMP=y which is default in my distro kernel. >>> >> Do you think it is worth to try CONFIG_SMP=n or have you already tried >>> >> UP kernel configuration? >>> > >>> > CONFIG_SMP is irrelevant here. BTY, because of my laziness, I use >>> > CONFIG_SMP in most times, if not all. >>> > >>> > Regards, >>> > -Bin. >>> > >>> >>> >>> >>> -- >>> With best regards, >>> Matwey V. Kornilov. >>> Sternberg Astronomical Institute, Lomonosov Moscow State University, Russia >>> 119234, Moscow, Universitetsky pr-k 13, +7 (495) 9392382 >> > > > > -- > With best regards, > Matwey V. Kornilov. > Sternberg Astronomical Institute, Lomonosov Moscow State University, Russia > 119234, Moscow, Universitetsky pr-k 13, +7 (495) 9392382 -- With best regards, Matwey V. Kornilov. Sternberg Astronomical Institute, Lomonosov Moscow State University, Russia 119234, Moscow, Universitetsky pr-k 13, +7 (495) 9392382