Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754852AbcDKMhf (ORCPT ); Mon, 11 Apr 2016 08:37:35 -0400 Received: from bear.ext.ti.com ([192.94.94.41]:48514 "EHLO bear.ext.ti.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753517AbcDKMhc (ORCPT ); Mon, 11 Apr 2016 08:37:32 -0400 Subject: Re: [PATCH 1/2] usb: dwc3: core: Introduce dwc3_device_reinit() To: Felipe Balbi , John Youn References: <1458133551-3071-1-git-send-email-rogerq@ti.com> <1458133551-3071-2-git-send-email-rogerq@ti.com> <87r3fah8h3.fsf@intel.com> <87lh5ih6hs.fsf@intel.com> <56EC5452.8030509@synopsys.com> <56F03FD7.30202@synopsys.com> <874mbzyq1l.fsf@intel.com> <56F3376E.8050305@synopsys.com> <87d1qkxtbb.fsf@intel.com> <56FC1E92.60403@synopsys.com> <56FD2E13.2010903@ti.com> <87r3eqg1us.fsf@intel.com> <57021CB9.3090803@ti.com> <87inzx7q0l.fsf@intel.com> CC: "nsekhar@ti.com" , "linux-usb@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "Shankar, Abhishek" , "B, Ravi" From: Roger Quadros Message-ID: <570B9A84.1020607@ti.com> Date: Mon, 11 Apr 2016 15:37:24 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <87inzx7q0l.fsf@intel.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 10139 Lines: 227 On 04/04/16 11:10, Felipe Balbi wrote: > > Hi, > > Roger Quadros writes: >>>>>>>>>>>>> We will need this function for a workaround. >>>>>>>>>>>>> The function issues a softreset only to the device >>>>>>>>>>>>> controller and performs minimal re-initialization >>>>>>>>>>>>> so that the device controller can be usable. >>>>>>>>>>>>> >>>>>>>>>>>>> As some code is similar to dwc3_core_init() take out >>>>>>>>>>>>> common code into dwc3_get_gctl_quirks(). >>>>>>>>>>>>> >>>>>>>>>>>>> We add a new member (prtcap_mode) to struct dwc3 to >>>>>>>>>>>>> keep track of the current mode in the PRTCAPDIR register. >>>>>>>>>>>>> >>>>>>>>>>>>> Signed-off-by: Roger Quadros >>>>>>>>>>>> >>>>>>>>>>>> I must say, I don't like this at all :-p There's ONE known silicon which >>>>>>>>>>>> needs this because of a poor silicon integration which took an IP with a >>>>>>>>>>>> known erratum where it can't be made to work on lower speeds and STILL >>>>>>>>>>>> was integrated without a superspeed PHY. >>>>>>>>>>>> >>>>>>>>>>>> There's a reason why I never tried to push this upstream myself ;-) >>>>>>>>>>>> >>>>>>>>>>>> I'm really thinking we might be better off adding a quirk flag to skip >>>>>>>>>>>> the metastability workaround and allow this ONE silicon to set the >>>>>>>>>>>> controller to lower speed. >>>>>>>>>>>> >>>>>>>>>>>> John, can you check with your colleagues if we would ever fall into >>>>>>>>>>>> STAR#9000525659 if we set maximum speed to high speed during driver >>>>>>>>>>>> probe and never touch it again ? I would assume we don't really fall >>>>>>>>>>>> into the metastability workaround, right ? We're not doing any sort of >>>>>>>>>>>> PM for dwc3... >>>>>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Felipe, >>>>>>>>> >>>>>>>>> Do you mean to keep DCFG.speed to SS and set dwc->maximum_speed to HS? >>>>>>>>> I don't see an issue with that as long as we always ignore >>>>>>>>> dwc->maximum_speed when programming DCFG.speed for all affected >>>>>>>>> versions of the core. As long as the DCFG.speed = SS, you should not >>>>>>>>> hit the STAR. >>>>>>>> >>>>>>>> I actually mean changing DCFG.speed during driver probe and never >>>>>>>> touching it again. Would that still cause problems ? >>>>>>>> >>>>>>> >>>>>>> In that case I'm not sure. The engineer who would know is off until >>>>>>> next week so I'll get back to you as soon as I can talk to him about >>>>>>> it. >>>>>> >>>>> >>>>> So the engineers said that this issue can occur while set to HS and >>>>> the run/stop bit is changed so it seems that won't work. >>>> >>>> Thanks John. >>>> >>>> Felipe, >>>> >>>> any suggestion how we can fix this upstream? >>> >>> no idea, I don't have a lot of memory about this problem. I really don't >>> remember the details about this, is there an openly available errata >>> document which I could read ? /me goes search for it. >>> >>> I found [1] which tells me, the following: >>> >>> >>> | i819 | A Device Control Bit Meta-Stability for USB3.0 Controller in USB2.0 Mode | >>> |-------------+----------------------------------------------------------------------------| >>> | Criticality | Medium | >>> | | | >>> | Descritiion | When USB3.0 controller core is programmed to be a USB 2.0-only device | >>> | | hardware meta-stability on USB_DCTL[31]RUNSTOP bit causing the core to | >>> | | attempt high speed as well as SuperSpeed connection or completely miss | >>> | | the attach request. | >>> | | | >>> | Workaround | If the requirement is to always function in USB 2.0 mode, there is no | >>> | | workaround. | >>> | | Otherwise, you can always program the USB controller core to be SuperSpeed | >>> | | 3.0 capable (USB_DCFG[2:0]DEVSPD = 0x4) | >>> | | | >>> | Revisions | SR 1.1, 2.0 | >>> | Impacted | | >>> |-------------+----------------------------------------------------------------------------| >>> >>> So, TI's own documentation says that there is _no_ workaround. My >> >> We are working on updating that document. Apparently Synopsis has suggested this workaround. >> pasting verbatim >> >> " >> - As last step of device configuration we set the RUNSTOP bit in DCTL. >> >> - Once we set the RUNSTOP bit, we need to monitor GDBGLTSSM for 100 ms until one of the two below happens:. >> >> o We see the GDBGLTSSM.LTDB_LINK_STATE changing from 4 >> >> o We receive the USB 2.0 reset interrupt. >> >> If none of above happens then we can stop monitoring it. >> >> - If state change from 4 occurs issue a SoftReset thru DCTL.CSftRst and reconfigure Device. This time it is guaranteed that no metastability will occur so no need to do the 100ms timeout. >> " > > I don't have this text anywhere so I don't know. Is this something TI > came up with or Synopsys ? Unless I can see a document (preferrably from > Synopsys) stating this, I can't really accept $subject. OK. I'll try to find out if there is an official document about this. > > Another question is: if all it takes is an extra SoftReset, why don't we > just reset it during probe() if max_speed < SUPER and we're running on > rev < 2.20a ? BTW, which revision of the IP is on AM57x/DRA7x ? The issue might happen on any Run/Stop transition so not sure if doing it SoftReset just at probe fixes anything. On DRA7x it is rev 2.02a. > >>> question is, then: How are you sure that resetting the device actually >>> solves the issue ? Did you really hit the metastability problem and >>> noted that it works after a soft-reset ? How did you verify that >> >> I don't know if it solves the issue or not. It was suggested by >> Synopsis to TI's silicon team. > > now that's a bummer ;-) > >> I never hit the metastability problem detection condition in my >> overnight tests (i.e. LTDB_LINK_STATE != 4). > > overnight is not enough. You need to keep this running for weeks. how many weeks is acceptable for you? I can run for that long, no problem. And what if the issue doesn't happen in that time frame, would you still consider this case? > >> I have verified that things work after a soft-reset by faking that the >> error happens. > > but if you never hit the actual failure, you have verified that it works > _without_ the workaround. I mean, if you can't be sure RUN/STOP went > metastable, you can't really be sure you're working around the Erratum. > >>> Run/Stop was in a metastable state, considering that Run/Stop signal is >>> not visible outside the die ? >> >> LTDB_LINK_STATE != 4 within 100ms or RUNSTOP set is the condition to >> detect that the issue occurred. > > this doesn't prove anything. This just means that your 100ms timer > expired. Unless you can verify that Run/Stop is in metastability, you > cannot be sure this workaround works. > > Did anybody run silicon simulation to verify this case ? It's really the > only way to be sure. AFAIK this wasn't reproducible during silicon simulation either. > >>> It seems to me that resetting the IP is just as "dangerous" as setting >>> the IP to High-speed in the first place. No ? >> >> The soft-reset is just a recovery mechanism if that error is ever hit. > > but you don't know if that's a *proper* recovery mechanism because you > never even *hit* the error. > >> Putting the controller in reset state means it is in a known >> state. Why do you say it would be dangerous? > > Because you can't predict the systems' behavior. If the flip-flop didn't > have time to settle into 0 or 1 state, you don't know what the > combinatorial part of the IP will do with that bogus value. It's truly > unpredictable. You also cannot know, for sure, that a SoftReset will be > enough to bring that flip-flop out of metastability. I'm not an expert in this area and can only follow the advice the Silicon team gives. > >> The original workaround i.e. setting the High-speed instance to >> Super-speed to avoid this errata is causing another side >> effect. i.e. erratic interrupts happen and more than 2 seconds delay > > this should have been an expected side-effect when you design a > SuperSpeed controller without a SuperSpeed PHY and don't properly > terminate inputs. What you have is a floating PIPE3 interface not > properly terminated and capturing random noise (basically acting as a > very poor antenna inside your die). Of course the IP will go bonkers and > give you "erratic error" interrupts. It has no idea what the hell this > "PHY" on the PIPE3 interface is doing. We know that. The damage is already done. :) > >> to enumerations. This problem is more serious and so we have to keep >> the controller in High-speed and tackle the meta-stability condition >> if it happens. > > you have to tackle the meta-stability, sure, but we need guarantee that > $subject IS indeed tackling that problem. Unless there's proof that this > really solves the meta-stability issue, I won't take $subject. Sorry > dude, but I don't want regressions elsewhere because of a badly > validated patch. I understand. I will see if someone from TI can provide me official documentation about the workaround. > > Bottomline, it's not enough to _state_ that it solves the problem. You > need to first *trigger* the issue without the workaround, then apply > workaround and trigger it again. Then, and only then, you can be certain > you're fixing the problem. > > After that, we will look into how to make sure this has no impact to > other users. > OK, Thanks. cheers, -roger