Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp4273526ybl; Mon, 13 Jan 2020 10:47:04 -0800 (PST) X-Google-Smtp-Source: APXvYqx/bpmSBPsDOzJsZ/wLj4DDk67hd54CfCMA2UZ0J5b1/fBK1iKRg9Lh9rORHpoUcpv+AD02 X-Received: by 2002:aca:cf58:: with SMTP id f85mr14006487oig.6.1578941224700; Mon, 13 Jan 2020 10:47:04 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1578941224; cv=none; d=google.com; s=arc-20160816; b=lRta0WwiZYsVlB3l7Qj6K4kKCZLmw+qIs9rpkGGFfH9p63mcy5Cxwy8TmvR4oNooAn YlsFgNI0WY5TP05uqMBVld+kbPgbfwFzOTj6lEGCmSSBvBaWqyYA4xI8eL/NWF0JpMGY eA15NeJv34b+JsTX27y9ynIl/K8IGqXcXLfAt08kiOwr7OZIZEe9EYHw74fBSGO+q5TX v3Ib13tQ4iexm4+V/yIYszAtqUqzWLBh1EEhxSs1Enm0L4VLwFMfQtcPsk+Br5blCF/I 3j1CSTdV72oUIQrZwMCCo77X7LCJLYxqfC7cp+muDeq9oLhZd1n7N8EArqT5Hp8S7f1H 2Q6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:in-reply-to :subject:cc:to:from:date; bh=s0OyvqWptlf+8uwuFHCJmVHsphI+ROjG1A77sIF+enQ=; b=tmpoblwQxpuMaX7k02FiVNpbiqMv7IITOB8kFGQr519kexH0mrZ6AvvRwhzef/BWvn vx5+5S7soOumz39uPPFgx6LOs5KRd+TUcFo28E7NLMhqCJVvkf+pMRxIF6vUkeSCARQx GoLcK+cvmNq9MiNVZgChYIfc/4d/jWTLqkonpvREGdTVkFp5GAuCmPvg42lnWX22lPq/ bU08JuBZhRwViMm5Ak/NlW78c4LYiRhtC7DJdsjM2o+bE8k0BIkx0Kr5AojXk9ZzLgfY bW1CiE6CULnySWX3TKklYjhXzfJcrCqj6MVBGy7nhCCrTBFhT3ngC6wcLG3WInTQ4E69 Aj2w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 46si7570351otu.45.2020.01.13.10.46.52; Mon, 13 Jan 2020 10:47:04 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728885AbgAMSpi (ORCPT + 99 others); Mon, 13 Jan 2020 13:45:38 -0500 Received: from iolanthe.rowland.org ([192.131.102.54]:56806 "HELO iolanthe.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1727726AbgAMSph (ORCPT ); Mon, 13 Jan 2020 13:45:37 -0500 Received: (qmail 4729 invoked by uid 2102); 13 Jan 2020 13:45:35 -0500 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 13 Jan 2020 13:45:35 -0500 Date: Mon, 13 Jan 2020 13:45:35 -0500 (EST) From: Alan Stern X-X-Sender: stern@iolanthe.rowland.org To: Andrey Konovalov , Felipe Balbi cc: Greg Kroah-Hartman , USB list , LKML , Jonathan Corbet , Dmitry Vyukov , Alexander Potapenko , Marco Elver Subject: Re: [PATCH v4 1/1] usb: gadget: add raw-gadget interface In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 13 Jan 2020, Andrey Konovalov wrote: > On Mon, Jan 13, 2020 at 6:34 PM Andrey Konovalov wrote: > > > > On Mon, Jan 13, 2020 at 5:50 PM Alan Stern wrote: > > > > > > On Mon, 13 Jan 2020, Andrey Konovalov wrote: > > > > > > > I've also found an issue, but I'm not sure if that is the bug in Raw > > > > Gadget, or in the gadget layer (in the former case I'll add this fix > > > > to v5 as well). What I believe I'm seeing is > > > > __fput()->usb_gadget_unregister_driver()->usb_gadget_remove_driver()->gadget_unbind() > > > > racing with dummy_timer()->gadget_setup(). In my case it results in > > > > gadget_unbind() doing set_gadget_data(gadget, NULL), and then > > > > gadget_setup() dereferencing get_gadget_data(gadget). > > > > > > > > Alan, does it look possible for those two functions to race? Should > > > > this be prevented by the gadget layer, or should I use some kind of > > > > locking in my gadget driver to prevent this? > > > > > > In your situation this race shouldn't happen, because before > > > udc->driver->unbind() is invoked we call usb_gadget_disconnect(). If > > > that routine succeeds -- which it always does under dummy-hcd -- then > > > there can't be any more setup callbacks, because find_endpoint() will > > > always return NULL (the is_active() test fails; see the various > > > set_link_state* routines). So I don't see how you could have ended up > > > with the race you describe. > > > > I've managed to reproduce the race by adding an mdelay() into the > > beginning of the setup() callback. AFAIU what happens is setup() gets > > called (and waits on the mdelay()), then unbind() comes in and does > > set_gadget_data(NULL), and then setup() proceeds, gets NULL through > > get_gadget_data() and crashes on null-ptr-deref. I've got the same > > crash a few times after many days of fuzzing, so I assume it can > > happen without the mdelay() as well. > > > > > However, a real UDC might not be able to perform a disconnect under > > > software control. In that case usb_gadget_disconnect() would not > > > change the pullup state, and there would be a real possibility of a > > > setup callback racing with an unbind callback. This seems like a > > > genuine problem and I can't think of a solution offhand. > > > > > > What we would need is a way to tell the UDC driver to stop invoking > > > gadget callbacks, _before_ the UDC driver's stop callback gets called. > > > Maybe this should be merged into the pullup callback somehow. > > Perhaps for the dummy driver we need to wait for setup() to finish if > it's being executed and then stop the dummy timer in dummy_pullup()? Yes, we need to wait for a setup callback to finish. But dummy_timer should not be stopped; otherwise URBs that have already been submitted would never be given back. The big problem is that usb_gadget_disconnect() can be called in interrupt context. In general, a UDC driver will need to call synchronize_irq() to make sure there aren't any callbacks still running, and that can be done only in process context. dummy-hcd is slightly different since it doesn't manage actual hardware; it calls usleep_range() instead of synchronize_irq() -- but that also requires process context. We could change the gadget API to require that usb_gadget_disconnect() and related routines always be called in process context. I don't know if that's such a good idea though. A gadget driver might want to disconnect from within its setup handler or a completion routine, for example. A better approach would be to add a new synchronize_callbacks() pointer in the usb_gadget_ops structure. But to work properly it would have to be mandatory for every UDC driver, and that's not so easy to accomplish. Suggestions? Alan Stern