Received: by 2002:a05:6a10:f3d0:0:0:0:0 with SMTP id a16csp3350379pxv; Mon, 28 Jun 2021 02:38:25 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyM8K7ZU5UbpH1fMBr3oKfU+uCpQ+dnh5XtzAukIORrKMunZh1H3Kd8xiVmnXIMnDrW38ZD X-Received: by 2002:a05:6402:350e:: with SMTP id b14mr31234716edd.286.1624873105393; Mon, 28 Jun 2021 02:38:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1624873105; cv=none; d=google.com; s=arc-20160816; b=KmeUfnhax5fG1DQn5XNFAcWTk7Elcm+znKVnBelT4+hw9zImtuCZCKmz8q3IXHy/S6 +wZ9agRo1cXYmFHyEdwBYLuyY3itiXgVZ9iUY4OTtSalBPJAeLBYoXY7JxQqdEl6HG/s nXcu98gS8unAdrm7VWXGQjzdoSXvIbvj2k1BdEgiiZV/tCYZ8nk8WpezM+hIyC22pUzh IYztgMECo+IysI1Ugwe1Srxn37kNWSa5u3HnXCWuoD0TroOh1dhx58AzuJle3jx8o8OO 9jNnjLMY3yssipBFoOcbz+4vDcskwv79PkNsxFrJ3TaWBvkSb6SQ+4wq/8hMMQ9/UqQS Qu3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:message-id:references:in-reply-to :subject:cc:to:from:date:content-transfer-encoding:mime-version :sender:dkim-signature; bh=Ix1sjBb6toi9CPLsHl8ZR2Ej/57+qjdBVCoGLPP6RZY=; b=tLO6cstrStod4zEwlUnIf/bh1F/7oTzxAGLOBYhE9dFmUIOXqrsLMB4PN57dbftflO kpxYGud3ZlIjAqVXR6Sv/x8Ko3g1OsXWR+wrNFOYq9/U7TTTyV1icWoZ3doxC0y5y49C WqU7ID8hOPLs5KHdF4uPDLt7LEXaeKVYBIJiZN3o2dsyA/Y+A6DPajlA2CGY50n8cX3v 5Q+lfeBbXTzn+rfNPVU/8UsLITcg11wWB799dw8zGv7HKYpbeI1/SKdOIyKOOst1hJai 9wcaKbHXZgF0zpHwvXrRaXTGNNgBo+smWR+p9kqzo7CrFeutDira+TCFUMSxUPJ6UgDp 97rA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=RRz+vDhf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g21si10375219edb.437.2021.06.28.02.37.59; Mon, 28 Jun 2021 02:38:25 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=RRz+vDhf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232483AbhF1JjO (ORCPT + 99 others); Mon, 28 Jun 2021 05:39:14 -0400 Received: from m43-7.mailgun.net ([69.72.43.7]:63600 "EHLO m43-7.mailgun.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230256AbhF1JjN (ORCPT ); Mon, 28 Jun 2021 05:39:13 -0400 DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=mg.codeaurora.org; q=dns/txt; s=smtp; t=1624873008; h=Message-ID: References: In-Reply-To: Subject: Cc: To: From: Date: Content-Transfer-Encoding: Content-Type: MIME-Version: Sender; bh=Ix1sjBb6toi9CPLsHl8ZR2Ej/57+qjdBVCoGLPP6RZY=; b=RRz+vDhfXJb7RQfmHd5OG3JXSPNZst6cDZN5gfyTkC2/PkZXNu3e+sHK6ZMy4lxZpicQmP2K fZLamiQG/kgiKE1F1FAIFcS1WqxD9ueV6g4D3yej0y3Zofy34kkbaaAbU0UkHC2yp66qXsq8 +iL2vAvwPFGBpRSsTzK/PnxbGVk= X-Mailgun-Sending-Ip: 69.72.43.7 X-Mailgun-Sid: WyI0MWYwYSIsICJsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnIiwgImJlOWU0YSJd Received: from smtp.codeaurora.org (ec2-35-166-182-171.us-west-2.compute.amazonaws.com [35.166.182.171]) by smtp-out-n05.prod.us-west-2.postgun.com with SMTP id 60d998173a8b6d0a4531dde8 (version=TLS1.2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256); Mon, 28 Jun 2021 09:36:23 GMT Sender: linyyuan=codeaurora.org@mg.codeaurora.org Received: by smtp.codeaurora.org (Postfix, from userid 1001) id B83FAC43217; Mon, 28 Jun 2021 09:36:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-caf-mail-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=ALL_TRUSTED,BAYES_00, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.codeaurora.org (localhost.localdomain [127.0.0.1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: linyyuan) by smtp.codeaurora.org (Postfix) with ESMTPSA id C05F4C433D3; Mon, 28 Jun 2021 09:36:22 +0000 (UTC) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 28 Jun 2021 17:36:22 +0800 From: linyyuan@codeaurora.org To: Alan Stern Cc: Felipe Balbi , Thinh Nguyen , Greg Kroah-Hartman , linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, Jack Pham Subject: Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation In-Reply-To: <20210627140903.GB624763@rowland.harvard.edu> References: <20210625104415.8072-1-linyyuan@codeaurora.org> <20210625163707.GC574023@rowland.harvard.edu> <20210626150304.GA601624@rowland.harvard.edu> <1d1f06763c7cdeb67264128537c6a8f4@codeaurora.org> <20210627140903.GB624763@rowland.harvard.edu> Message-ID: X-Sender: linyyuan@codeaurora.org User-Agent: Roundcube Webmail/1.3.9 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021-06-27 22:09, Alan Stern wrote: > On Sun, Jun 27, 2021 at 10:48:56AM +0800, linyyuan@codeaurora.org > wrote: >> On 2021-06-26 23:03, Alan Stern wrote: >> > On Sat, Jun 26, 2021 at 09:16:25AM +0800, linyyuan@codeaurora.org wrote: >> > > On 2021-06-26 00:37, Alan Stern wrote: > >> > > > Here and in the other places, you should test dwc->async_callbacks >> > > > _before_ dropping the spinlock. Otherwise there is a race (the flag >> > > > could be written at about the same time it is checked). >> > > thanks for your comments, >> > > >> > > if you think there is race here, how to make sure gadget_driver >> > > pointer is >> > > safe, >> > > this is closest place where we can confirm it is non-NULL by checking >> > > async_callbacks ? >> > >> > I explained this twice already: We know that gadget_driver is not >> > NULL because usb_gadget_remove_driver calls synchronize_irq before >> > doing usb_gadget_udc_stop. >> > >> > Look at this timing diagram: >> > >> > CPU0 CPU1 >> > ---- ---- >> > IRQ happens for setup packet >> > Handler sees async_callbacks >> > is enabled >> > Handler unlocks dwc->lock >> > usb_gadget_remove_driver runs >> > Disables async callbacks >> > Calls synchronize_irq >> > Handler calls dwc-> . waits for IRQ handler to >> > gadget_driver->setup . return >> > Handler locks dwc-lock . >> > ... . >> > Handler returns . >> > . synchronize_irq returns >> > Calls usb_gadget_udc_stop >> > dwc->gadget_driver is >> > set to NULL >> > >> > As you can see, dwc->gadget_driver is non-NULL when CPU0 uses it, >> > even though async_callbacks gets cleared during the time when the >> > lock is released. >> thanks for your patient explanation, >> but from this part, seem it is synchronize_irq() help to avoid NULL >> pointer >> crash. > > That's right. > >> can you also explain how async_callbacks flag help here ? > > It doesn't help in the situation shown above, but it does help in other > situations. Consider this timing diagram: > > CPU0 CPU1 > ---- ---- > usb_gadget_remove_driver runs > Disables async callbacks > Calls synchronize_irq > synchronize_irq returns > Calls udc_driver_unbind > IRQ happens for disconnect > Handler sees async_callbacks > is disabled > Handler returns > Calls usb_gadget_udc_stop > dwc->gadget_driver is > set to NULL > > With the async_callbacks check, everything works okay. But now look at > what would happen without the async_callbacks mechanism: > > CPU0 CPU1 > ---- ---- > usb_gadget_remove_driver runs > Calls synchronize_irq > synchronize_irq returns > Calls udc_driver_unbind > IRQ happens for disconnect > Handler unlocks dwc->lock > Calls dwc->gadget_driver->disconnect > Gadget driver has already been unbound > and is not prepared to handle a > callback, so it crashes > Calls usb_gadget_udc_stop > dwc->gadget_driver is > set to NULL > > Without the async_callbacks mechanism, the gadget driver can get a > callback at the wrong time (after it has been unbound), which might > cause it to crash. 1. do you think we need to back to my original patch, https://lore.kernel.org/linux-usb/20210619154309.52127-1-linyyuan@codeaurora.org/T/#t i think we can add the spin lock or mutex lock to protect this kind of race from UDC layer, it will be easy understanding. 2. if you insist this kind of change, how to change following code in dwc3 ? if (dwc->gadget_driver && dwc->gadget_driver->disconnect) { 2.1 if (dwc->async_callbacks && dwc->gadget_driver->disconnect) { or 2.2 if (dwc->async_callbacks && vdwc->gadget_driver && dwc->gadget_driver->disconnect) { > > Alan Stern