Message-ID: <55cfaedc-bd33-610c-143e-0342a14a41b6@quicinc.com>
Date: Tue, 25 Apr 2023 22:28:41 +0530
Subject: Re: [PATCH v2 1/2] usb: dwc3: gadget: Bail out in pullup if soft reset timeout happens
From: Krishna Kurapati PSSNV
To: Thinh Nguyen
CC: Greg Kroah-Hartman, Alan Stern, Geert Uytterhoeven, Colin Ian King,
 Jiantao Zhang, "Rafael J. Wysocki", "linux-usb@vger.kernel.org",
 "linux-kernel@vger.kernel.org", "quic_ppratap@quicinc.com",
 "quic_wcheng@quicinc.com", "quic_jackp@quicinc.com",
 "quic_ugoswami@quicinc.com"
References: <20230328160756.30520-1-quic_kriskura@quicinc.com>
 <20230328160756.30520-2-quic_kriskura@quicinc.com>
 <20230328212048.rpm4ly265etahwm3@synopsys.com>
 <0aa1caae-ab09-2ef9-0ec8-3dd01a3f3af0@quicinc.com>
 <20230330001003.rwzo4n2hl6i22vh3@synopsys.com>
 <4f473d08-156a-622f-2ef2-ad11caa7ccda@quicinc.com>
 <20230403234937.sotn3pznu4winlkw@synopsys.com>
 <20230404214347.mi6rmksr7pym74rp@synopsys.com>
 <20230406004510.yxoxby2rwkalpqlh@synopsys.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/6/2023 7:44 AM, Krishna Kurapati PSSNV wrote:
>
>
> On 4/6/2023 6:15 AM, Thinh Nguyen wrote:
>> On Wed, Apr 05, 2023, Krishna Kurapati PSSNV wrote:
>>>
>>>
>>> On 4/5/2023 3:13 AM, Thinh Nguyen wrote:
>>>> On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote:
>>>>>
>>>>>
>>>>> On 4/4/2023 5:19 AM, Thinh Nguyen wrote:
>>>>>> On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 3/30/2023 5:40 AM, Thinh Nguyen wrote:
>>>>>>>> On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/29/2023 2:50 AM, Thinh Nguyen wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 28, 2023, Krishna Kurapati wrote:
>>>>>>>>>>> If the core soft reset timeout happens, avoid setting up event
>>>>>>>>>>> buffers and starting gadget as the writes to these registers
>>>>>>>>>>> may not reflect when in reset and setting the run stop bit
>>>>>>>>>>> can lead the controller to access wrong event buffer address
>>>>>>>>>>> resulting in a crash.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Krishna Kurapati
>>>>>>>>>>> ---
>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 5 ++++-
>>>>>>>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> index 3c63fa97a680..f0472801d9a5 100644
>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>           * device-initiated disconnect requires a core soft reset
>>>>>>>>>>>           * (DCTL.CSftRst) before enabling the run/stop bit.
>>>>>>>>>>>           */
>>>>>>>>>>> -        dwc3_core_soft_reset(dwc);
>>>>>>>>>>> +        ret = dwc3_core_soft_reset(dwc);
>>>>>>>>>>> +        if (ret)
>>>>>>>>>>> +            goto done;
>>>>>>>>>>>          dwc3_event_buffers_setup(dwc);
>>>>>>>>>>>          __dwc3_gadget_start(dwc);
>>>>>>>>>>>          ret = dwc3_gadget_run_stop(dwc, true, false);
>>>>>>>>>>>      }
>>>>>>>>>>> +done:
>>>>>>>>>>>      pm_runtime_put(dwc->dev);
>>>>>>>>>>>      return ret;
>>>>>>>>>>> --
>>>>>>>>>>> 2.40.0
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think there's one more place that may need this check. Can you
>>>>>>>>>> also add this check in __dwc3_set_mode()?
>>>>>>>>>
>>>>>>>>> Hi Thinh,
>>>>>>>>>
>>>>>>>>>   Sure. Will do it.
>>>>>>>>> Will the below be good enough? Or would it be good to add an
>>>>>>>>> error/warn log there?
>>>>>>>>
>>>>>>>> There's already a warning message in dwc3_core_soft_reset() if it
>>>>>>>> fails.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ git diff drivers/usb/
>>>>>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
>>>>>>>>> index 476b63618511..8d1d213d1dcd 100644
>>>>>>>>> --- a/drivers/usb/dwc3/core.c
>>>>>>>>> +++ b/drivers/usb/dwc3/core.c
>>>>>>>>> @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work)
>>>>>>>>>                 }
>>>>>>>>>                 break;
>>>>>>>>>         case DWC3_GCTL_PRTCAP_DEVICE:
>>>>>>>>> -               dwc3_core_soft_reset(dwc);
>>>>>>>>> +               ret = dwc3_core_soft_reset(dwc);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       goto out;
>>>>>>>>>
>>>>>>>>>                 dwc3_event_buffers_setup(dwc);
>>>>>>>>>
>>>>>>>>
>>>>>>>> If soft-reset failed, the controller is in a bad state. We should
>>>>>>>> not perform any further operation until the next hard reset. We
>>>>>>>> should flag the controller as dead. I don't think we have the
>>>>>>>> equivalent of the host's HCD_FLAG_DEAD. It may require some work in
>>>>>>>> the UDC core. Perhaps we can flag within dwc3 for now and prevent
>>>>>>>> any further operation for a simpler fix.
>>>>>>>>
>>>>>>> Hi Thinh,
>>>>>>>
>>>>>>>   Are you saying that if __dwc3_set_mode failed with core soft
>>>>>>> reset timing out, the caller, i.e. dwc3_set_mode, which queues the
>>>>>>> work, needs to know that the operation actually failed? So we can add
>>>>>>> a flag to indicate that the gadget is dead, and the caller of
>>>>>>> dwc3_set_mode can check the flag to see if the operation was
>>>>>>> successful or not.
>>>>>>>
>>>>>>> Or am I misunderstanding your comment?
>>>>>>>
>>>>>>
>>>>>> Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset
>>>>>> fails, then we set this flag. So that it can prevent the user calling
>>>>>> any gadget ops causing more crashes/invalid behavior. The
>>>>>> dwc->softconnect is already wrong on pullup() on failure.
>>>>>>
>>>>>> So that we can have a check in different gadget ops. For pullup():
>>>>>>
>>>>>> static int dwc3_gadget_pullup() {
>>>>>>     if (dwc->udc_is_dead) {
>>>>>>         dev_err(dev, "reset me. x_x \n");
>>>>>>         return;
>>>>>>     }
>>>>>>
>>>>>>     abc();
>>>>>> }
>>>>>>
>>>>>> Perhaps the effort is probably the same if we enhance the UDC core
>>>>>> for this? In any case, I'm fine either way.
>>>>>>
>>>>>> Thanks,
>>>>>> Thinh
>>>>>
>>>>> Hi Thinh,
>>>>>
>>>>>   So you don't want UDC to retry pullup if it fails the first time?
>>>>> As per patch-2 of this series, I was trying to propagate the ETIMEDOUT
>>>>> to UDC so that the caller (most probably configfs) can take
>>>>> appropriate action as to whether it must retry pullup or not.
>>>>>
>>>>
>>>> Now I'm confused. If the soft-reset times out, that means that the
>>>> soft-reset (self-clearing) bit isn't cleared. How can we retry if it's
>>>> stuck in this state? My impression is that soft-reset would not
>>>> complete at all. Is that not the case for you, or it's simply because
>>>> we need a longer soft-reset timeout?
>>>>
>>>> Thanks,
>>>> Thinh
>>>
>>> Hi Thinh,
>>>
>>>   Sorry for not being clear. The intention of patch-1 was to ensure we
>>> don't start the controller if reset times out and patch-2 was to ensure
>>> that UDC is in sync with controller by understanding that gadget_connect
>>> has failed and necessary cleanup has to be done in gadget_bind_driver.
>>
>> That should still be there.
>>
>>>
>>> But usually since the UDC_store is the one that is causing pullup to be
>>> called, the error value is propagated back to UDC_store. If it sees a
>>> failure, it does a retry to pullup.
>>>
>>> I didn't check whether subsequent retries by UDC to pullup are helping
>>> clear the reset bit or not. But I thought retrying pullup won't be of
>>> any harm.
>>
>> It's fine to retry. I'm thinking that the controller is in a bad state
>> at this point that we can't recover (hopefully that's not the case).
>>
>>>
>>> I now get that my patches are incomplete w.r.t. handling the timeout.
>>>
>>> IIRC one of the following is what you are suggesting we need to do:
>>>
>>> Option-1:
>>> Set a flag when reset times out and clear it upon core_exit /
>>> core_init. If the flag is set, block calls to all the gadget_ops in
>>> dwc3. Basically even if retry happens from configfs/UDC, we bail out in
>>> pullup/udc_start even without trying the requested gadget operation.
>>>
>>> Option-2:
>>> If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and
>>> implement the same flag in UDC and don't even call any gadget_ops.
>>>
>>
> Hi Thinh,
>
> Thanks for the review.
>
>> I'm thinking of option-1. For option-2, we can't control if the
>> gadget_ops will be called. We only have control how we will respond in
>> case they get called again.
>>
>> But now I'm thinking again, I think it may be ok without adding the
>> flag. The UDC core and gadget driver won't do anything else before
>> pullup(1) is successful. Calling other gadget_ops should be harmless
>> (ie. it won't crash/break the system)?
>>
> I can give this a try in long run testing (for 7-14 days) and see if
> anything else is breaking.
>
> Most probably we do a composition switch / PIPO in between which can
> call usb_gadget_unregister_driver which might invoke a pullup(0)
> followed by udc_stop() and, like you mentioned, must not be a problem.
>
>> Sorry for the noise, but I think it may be ok without marking the
>> controller dead. I wonder if we can confirm my suspicion on retry? I
>> believe this is not easy to reproduce on your setup? If not, I think we
>> can take your change as is.

Hi Thinh,

I got this patch tested on two different Gen-2 targets for around 10 days
and no issues were seen. (No SMMU fault seen on a 10 day run.)

Let me know of any other concerns that might come up with this patch.
Else I can rebase it to get merged.

Regards,
Krishna,