Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp3121136rwd; Mon, 22 May 2023 08:57:54 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6DUPmjFcwuOMMgKuiJ7/z4tXBBxgBdaRhH4og96m3hdSwd4LY3aVLsRliRBK4DZE/d16gU X-Received: by 2002:a05:6a00:124a:b0:643:96e:666b with SMTP id u10-20020a056a00124a00b00643096e666bmr15533969pfi.34.1684771074509; Mon, 22 May 2023 08:57:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684771074; cv=none; d=google.com; s=arc-20160816; b=j6z+98TGH9PA58n8WwxLerpAlwUbtnCyqfps66th51Lkc7mBuaiD8n9SJslgHmyhZd pdMk+wXICAp/LMb22eqxC9ztlxkLs/w6zjeg66Is8g5QURCsZS9Hx/2OrRFytV2ln0V6 mxqjj1ng8phKOReKi6+xsFq1ms61VO2LjIB3Z612V7DQTt4z086d2oUGeOEyCWtDoGDf 8+7x+lqb3qGAiLZmGab9WtXivZ47ZzMiYGdmlKdNk99GykjWjgF+U252tMeiXSOKUHCX Cz6W1t+T6Yq4uZHQ3FN5rrgvTgq2vXZU/jAG3KESiqmD21XgirpZotoeN3Nxc6WcjqUv Eeyw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=DbvPqYx7U9Es1GRW9hTXfEdi2feJMauh2Y4S/m1Azr0=; b=B9kgy4qIJ1WkYGNDnMkzCg4qleqUIDZ8zuZiabL4/4Ud0mp/IHiboQAiFKb3Lbz8iY 3SLCYCdSLOzuG9aIlXBKvHOMXagmxr71ZdcSiGdzUsH2tXP+jvQVGIfUkfFAvJVPtnAX Lo75SmNy+MzZYABaq65YwnuB1Sxa0kLQH6j9syP/aFQWtvUyJQmM0IIvLvLM8kxBg36Q YqviY4I/VSzmbrvNF80usuDVk2IpUGY7YxRb+fsJQ7R/NGQBt60QC4hVI8MPqSGATGux j6bLDuwd9YYQ60ZYOxIVXMyS3NQfwKT+LyYqE8wjFtl8/AEKlKp4R+r38mnqCwIWYhNc ZlsA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y1-20020a626401000000b0064729e5a04dsi4815108pfb.388.2023.05.22.08.57.41; Mon, 22 May 2023 08:57:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233802AbjEVPzx (ORCPT + 99 others); Mon, 22 May 2023 11:55:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231797AbjEVPzv (ORCPT ); Mon, 22 May 2023 11:55:51 -0400 Received: from netrider.rowland.org (netrider.rowland.org [192.131.102.5]) by lindbergh.monkeyblade.net (Postfix) with SMTP id 1BBB1118 for ; Mon, 22 May 2023 08:55:46 -0700 (PDT) Received: (qmail 140276 invoked by uid 1000); 22 May 2023 11:55:45 -0400 Date: Mon, 22 May 2023 11:55:45 -0400 From: Alan Stern To: Badhri Jagan Sridharan Cc: gregkh@linuxfoundation.org, colin.i.king@gmail.com, xuetao09@huawei.com, quic_eserrao@quicinc.com, water.zhangjiantao@huawei.com, peter.chen@freescale.com, francesco@dolcini.it, alistair@alistair23.me, stephan@gerhold.net, bagasdotme@gmail.com, luca@z3ntu.xyz, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, Francesco Dolcini Subject: Re: [PATCH v2] usb: gadget: udc: core: Offload usb_udc_vbus_handler processing Message-ID: <547ecbb2-921d-4714-82b7-066202ccf292@rowland.harvard.edu> References: <20230519043041.1593578-1-badhri@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,SPF_HELO_PASS,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 22, 2023 at 12:48:39AM -0700, Badhri Jagan Sridharan wrote: > Hi Alan, > > Thanks for taking the time out to share more details ! > +1 on your comment: " A big problem with the USB gadget > framework is that it does not clearly state which routines have to run > in process context and which have to run in interrupt/atomic context." > > > I started to work on allow_connect and other suggestions that you had made. > In one of the previous comments you had mentioned that the > connect_lock should be a spinlock and not a mutex. Yeah, I changed my mind about that. > Right now there are four conditions that seem to be deciding whether > pullup needs to be enabled or disabled through gadget->ops->pullup(). > 1. Gadget not deactivated through usb_gadget_deactivate() > 2. Gadget has to be started through usb_gadget_udc_start(). > soft_connect_store() can start/stop gadget. > 3. usb_gadget has been connected through usb_gadget_connect(). This is > assuming we are getting rid of usb_udc_vbus_handler. > 4. allow_connect is true > > I have so far identified two constraints here: > a. gadget->ops->pullup() can sleep in some implementations. > For instance: > BUG: scheduling while atomic: init/1/0x00000002 > .. > [ 26.990631][ T1] Call trace: > [ 26.993759][ T1] dump_backtrace+0x104/0x128 > [ 26.998281][ T1] show_stack+0x20/0x30 > [ 27.002279][ T1] dump_stack_lvl+0x6c/0x9c > [ 27.006627][ T1] __schedule_bug+0x84/0xb4 > [ 27.010973][ T1] __schedule+0x6f0/0xaec > [ 27.015147][ T1] schedule+0xc8/0x134 > [ 27.019059][ T1] schedule_timeout+0x98/0x134 > [ 27.023666][ T1] msleep+0x34/0x4c > [ 27.027317][ T1] dwc3_core_soft_reset+0xf0/0x354 > [ 27.032273][ T1] dwc3_gadget_pullup+0xec/0x1d8 > [ 27.037055][ T1] usb_gadget_pullup_update_locked+0xa0/0x1e0 > [ 27.042967][ T1] udc_bind_to_driver+0x1e4/0x30c > [ 27.047835][ T1] usb_gadget_probe_driver+0xd0/0x178 > [ 27.053051][ T1] gadget_dev_desc_UDC_store+0xf0/0x13c > [ 27.058442][ T1] configfs_write_iter+0x100/0x178 > [ 27.063399][ T1] vfs_write+0x278/0x3c4 > [ 27.067483][ T1] ksys_write+0x80/0xf4 What kernel was this trace made with? I don't see udc_bind_to_driver appearing anywhere in 6.4-rc3. > b. gadget->ops->udc_start can also sleep in some implementations. > For example: > [ 28.024255][ T1] BUG: scheduling while atomic: init/1/0x00000002 > .... > [ 28.324996][ T1] Call trace: > [ 28.328126][ T1] dump_backtrace+0x104/0x128 > [ 28.332647][ T1] show_stack+0x20/0x30 > [ 28.336645][ T1] dump_stack_lvl+0x6c/0x9c > [ 28.340993][ T1] __schedule_bug+0x84/0xb4 > [ 28.345340][ T1] __schedule+0x6f0/0xaec > [ 28.349513][ T1] schedule+0xc8/0x134 > [ 28.353425][ T1] schedule_timeout+0x4c/0x134 > [ 28.358033][ T1] wait_for_common+0xac/0x13c > [ 28.362554][ T1] wait_for_completion_killable+0x20/0x3c > [ 28.368118][ T1] __kthread_create_on_node+0xe4/0x1ec > [ 28.373422][ T1] kthread_create_on_node+0x54/0x80 > [ 28.378464][ T1] setup_irq_thread+0x50/0x108 > [ 28.383072][ T1] __setup_irq+0x90/0x87c > [ 28.387245][ T1] request_threaded_irq+0x144/0x180 > [ 28.392287][ T1] dwc3_gadget_start+0x50/0xac > [ 28.396866][ T1] udc_bind_to_driver+0x14c/0x31c > [ 28.401763][ T1] usb_gadget_probe_driver+0xd0/0x178 > [ 28.406980][ T1] gadget_dev_desc_UDC_store+0xf0/0x13c > [ 28.412370][ T1] configfs_write_iter+0x100/0x178 > [ 28.417325][ T1] vfs_write+0x278/0x3c4 > [ 28.421411][ T1] ksys_write+0x80/0xf4 > > static int dwc3_gadget_start(struct usb_gadget *g, > struct usb_gadget_driver *driver) > { > struct dwc3 *dwc = gadget_to_dwc(g); > ... > irq = dwc->irq_gadget; > ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt, > IRQF_SHARED, "dwc3", dwc->ev_buf); > > Given that "1016fc0c096c USB: gadget: Fix obscure lockdep violation > for udc_mutex" has been there for a while and no one has reported > issues so far, perhaps ->disconnect() callback is no longer being > invoked in atomic context and the documentation is what that needs to > be updated ? That's part of what I'm trying to figure out. However, some UDC drivers call ->disconnect() directly when they detect loss of VBUS power, instead of going through the core. So disconnect handlers will have remain capable of running in interrupt context until those UDC drivers are changed. Getting back to your first point, it looks like we need to assume any routine that needs to communicate with the UDC hardware (such as the ->pullup callback used in usb_gadget_{dis}connect()) must always be called in process context. This means that usb_udc_connect_control() always has to run in process context, since it will do either a connect or a disconnect. On the other hand, some routines -- in particular, usb_udc_vbus_handler() -- may be called by a UDC driver's interrupt handler and therefore may run in interrupt context. (This fact should be noted in that routine's kerneldoc, by the way.) So here's the problem: usb_udc_vbus_handler() running in interrupt context calls usb_udc_connect_control(), which has to run in process context. And this is not just a simple issue caused by the ->disconnect() callback or use of mutexes; it's more fundamental. I'm led to conclude that you were right to offload part of usb_udc_vbus_handler()'s job to a workqueue. It's an awkward thing to do, because you have to make sure to cancel the work item at times when it's no longer needed. But there doesn't seem to be any other choice. Here's two related problems for you to think about: 1. Once gadget_unbind_driver() has called usb_gadget_disconnect(), we don't want a VBUS change to cause usb_udc_vbus_handler()'s work routine to turn the pullup back on. How can we prevent this? 2. More generally, suppose usb_udc_vbus_handler() gets called at exactly the same time that some other pathway (either gadget_bind_driver() or soft_connect_store()) tries to do a connect or disconnect. What should happen then? Alan Stern