Received: by 2002:a89:2c3:0:b0:1ed:23cc:44d1 with SMTP id d3csp228806lqs; Mon, 4 Mar 2024 23:20:20 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCXP11gtdeuf+ahPC6y0+F0i8VLYtz1dUu9x6SgLwGYILX3SjSURkgGqJYmHa+Pn7+cCueOb4KWsCPSgf33ZZqcSLHNGrs992Ps3mgSLLg== X-Google-Smtp-Source: AGHT+IHMprlJwHCDOctkl6mi+jvKkEbPJ3b3zoEb3sTgCEGmBcV8mqy/ycDhw97P8VTHYnh/5v+v X-Received: by 2002:a17:906:d156:b0:a41:392d:e11c with SMTP id br22-20020a170906d15600b00a41392de11cmr9511698ejb.26.1709623220624; Mon, 04 Mar 2024 23:20:20 -0800 (PST) Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id h4-20020a1709063b4400b00a441e6636c6si4751438ejf.251.2024.03.04.23.20.20 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Mar 2024 23:20:20 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-91789-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@canonical.com header.s=20210705 header.b=gUvpE67s; arc=fail (body hash mismatch); spf=pass (google.com: domain of linux-kernel+bounces-91789-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-91789-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 35E511F21735 for ; Tue, 5 Mar 2024 07:20:20 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 00DFE7E590; Tue, 5 Mar 2024 07:20:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=canonical.com header.i=@canonical.com header.b="gUvpE67s" Received: from smtp-relay-internal-1.canonical.com (smtp-relay-internal-1.canonical.com [185.125.188.123]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1040F4C637 for ; Tue, 5 Mar 2024 07:20:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.125.188.123 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709623208; cv=none; b=P/boHtU9U76htuNUfIFbUouMIg/2zLNF+0RHx7Twp7DG3uBtwFF1psG/kObaxJXhtY6wh0+t9BmvS4chq9/5IUZVLKxDh+ugItAy87DpjufzU2OiQCtb6F8kLTAlYctVO611tkXHF1HXs7bn2Qy6O6TogHiztxS98COgW64RJ5o= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709623208; c=relaxed/simple; bh=zjRVqQ3MKLC2BuD5Q1xI+k8vnbV92CXcW95Tg8GFIOU=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=P2CkdXvuZHJiOo4LMo8AMT9BUjvzwRKu3j/yYBt+o28EbCqGgZomFOuur29H08hxfAPg8soQbdMJXsxRi2ZIsQPNN0r4sM18ie2iYxBAfuCmg22YVpNBvlVxulV+NHO0qW9zZZ4DrxbQGqfjUaL7a5uZGKVM8qKyKMcVq/ewz6c= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=canonical.com; spf=pass smtp.mailfrom=canonical.com; dkim=pass (2048-bit key) header.d=canonical.com header.i=@canonical.com header.b=gUvpE67s; arc=none smtp.client-ip=185.125.188.123 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=canonical.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=canonical.com Received: from mail-pl1-f198.google.com (mail-pl1-f198.google.com [209.85.214.198]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-1.canonical.com (Postfix) with ESMTPS id 49DDA3FB6D for ; Tue, 5 Mar 2024 07:20:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1709623203; bh=D6usFkdRzvkX3msLdSSaO/8gvWBvV/3rw0fRdwxUjj4=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=gUvpE67sS4UKZu0xR3BaLeBaGIa6IBuVb5SjI+RtIM8XNcAz7PiTSM/agMTCWTMC8 GKbIcja6ZcaPRszm+3UCIQy1yFZZPNlR1UsFv/Jovp/VtbK6QfDDcr/doZn06svQb3 J13XVlbIHKaTQsRMLm2lnDDz4in+KRnq1TlfCyL12OUZaWx77JLXjEyO7nN/Y0t6ku xXrd/y/7jcvgIFHR96oqvXrvpY2MgfkxSXJemK7GDe3zQ8AK3oRSBBFW/aEE5Rm6yC UVyP3r9nt14c3DQNnwA57jnki/Xr+vns05/LvOUURVIvNt9ypcWOXI4ez6BNOilfoL Zfwgvbm0AQmig== Received: by mail-pl1-f198.google.com with SMTP id d9443c01a7336-1dc685df4adso4356025ad.2 for ; Mon, 04 Mar 2024 23:20:03 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709623202; x=1710228002; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=D6usFkdRzvkX3msLdSSaO/8gvWBvV/3rw0fRdwxUjj4=; b=cSm+4XxkRq5Gokp8v5pSk4YRsZnDAsMW9qGnQdGy/Jxuqrs/3Mv9YYWkn+4ZKxE/8K ph7g0b7OsEddQ89p+Bu5agIWCyz0cbZQZolAya95ulxADIqSuJKDpRx2nyQyuvG736vy eRLtRrU/8eKhd3MRmSQ3uQAdpTo+tD8x7Slld6YH59e94GtnFsZwqRj570K2fglqEVJ3 g3g2QnqpeNy1lAykg5HipEzLNSRSzjl4j+lbCABZjxVuoJ2FHFrZh8ol4CLfY/KSOy6j 2lQpE4wEP31uVd8JkoB9sADVIjcJCEcbpxPQdydX07jJnLoJaOXUmXPmohN7UuBH+u77 1t3g== X-Forwarded-Encrypted: i=1; AJvYcCWcJ7odDHB9pT3ekrYYwZ/IxWQF7dltxGLVXAoZTxEEGQicqe7KRi2btUGFAKmxwsgZL5s6fb9e1eDXPRb1ijWmZh4bFwjX+dV52/VY X-Gm-Message-State: AOJu0YyvAidcm9DT1PRgFbvbamQ5/8pmfIrS4F60kZ1Wx/wVe+6a0Gpf +mHIJ81uoh/63YnFlIgHHn65BCEPdUAAfgMU/AST47GQltIxW/Z4qGxAp5lnEuRnkN1icbALnJ6 7zsj9PYbt9UNysVUkfjc6/CmbiCp44zSjhsRXIRVYdEQr91qNdbtHh5lIiUyExOZ5mZf+mR7kb4 LhgTs4WuY3KXz41iCaMEUOfsyOKeDWFmd+WGuToEHOX7KLOPhDAJ3v X-Received: by 2002:a17:903:4281:b0:1dc:db95:fd24 with SMTP id ju1-20020a170903428100b001dcdb95fd24mr829120plb.55.1709623201643; Mon, 04 Mar 2024 23:20:01 -0800 (PST) X-Received: by 2002:a17:903:4281:b0:1dc:db95:fd24 with SMTP id ju1-20020a170903428100b001dcdb95fd24mr829100plb.55.1709623201268; Mon, 04 Mar 2024 23:20:01 -0800 (PST) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20240304155138.GA482969@bhelgaas> In-Reply-To: From: Kai-Heng Feng Date: Tue, 5 Mar 2024 15:19:49 +0800 Message-ID: Subject: Re: [PATCH v3] driver core: Cancel scheduled pm_runtime_idle() on device removal To: "Rafael J. Wysocki" Cc: Bjorn Helgaas , gregkh@linuxfoundation.org, bhelgaas@google.com, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Ricky Wu , Kees Cook , Tony Luck , "Guilherme G. Piccoli" , linux-hardening@vger.kernel.org, bpf@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Mar 5, 2024 at 2:10=E2=80=AFAM Rafael J. Wysocki wrote: > > On Mon, Mar 4, 2024 at 6:00=E2=80=AFPM Rafael J. Wysocki wrote: > > > > On Mon, Mar 4, 2024 at 5:41=E2=80=AFPM Rafael J. Wysocki wrote: > > > > > > On Mon, Mar 4, 2024 at 4:51=E2=80=AFPM Bjorn Helgaas wrote: > > > > > > > > On Mon, Mar 04, 2024 at 03:38:38PM +0100, Rafael J. Wysocki wrote: > > > > > On Thu, Feb 29, 2024 at 7:23=E2=80=AFAM Kai-Heng Feng > > > > > wrote: > > > > > > > > > > > > When inserting an SD7.0 card to Realtek card reader, the card r= eader > > > > > > unplugs itself and morph into a NVMe device. The slot Link down= on hot > > > > > > unplugged can cause the following error: > > > > > > > > > > > > pcieport 0000:00:1c.0: pciehp: Slot(8): Link Down > > > > > > BUG: unable to handle page fault for address: ffffb24d403e5010 > > > > > > PGD 100000067 P4D 100000067 PUD 1001fe067 PMD 100d97067 PTE 0 > > > > > > Oops: 0000 [#1] PREEMPT SMP PTI > > > > > > CPU: 3 PID: 534 Comm: kworker/3:10 Not tainted 6.4.0 #6 > > > > > > Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H3= 70M Pro4, BIOS P3.40 10/25/2018 > > > > > > Workqueue: pm pm_runtime_work > > > > > > RIP: 0010:ioread32+0x2e/0x70 > > > > > ... > > > > > > Call Trace: > > > > > > > > > > > > ? show_regs+0x68/0x70 > > > > > > ? __die_body+0x20/0x70 > > > > > > ? __die+0x2b/0x40 > > > > > > ? page_fault_oops+0x160/0x480 > > > > > > ? search_bpf_extables+0x63/0x90 > > > > > > ? ioread32+0x2e/0x70 > > > > > > ? search_exception_tables+0x5f/0x70 > > > > > > ? kernelmode_fixup_or_oops+0xa2/0x120 > > > > > > ? __bad_area_nosemaphore+0x179/0x230 > > > > > > ? bad_area_nosemaphore+0x16/0x20 > > > > > > ? do_kern_addr_fault+0x8b/0xa0 > > > > > > ? exc_page_fault+0xe5/0x180 > > > > > > ? asm_exc_page_fault+0x27/0x30 > > > > > > ? ioread32+0x2e/0x70 > > > > > > ? rtsx_pci_write_register+0x5b/0x90 [rtsx_pci] > > > > > > rtsx_set_l1off_sub+0x1c/0x30 [rtsx_pci] > > > > > > rts5261_set_l1off_cfg_sub_d0+0x36/0x40 [rtsx_pci] > > > > > > rtsx_pci_runtime_idle+0xc7/0x160 [rtsx_pci] > > > > > > ? __pfx_pci_pm_runtime_idle+0x10/0x10 > > > > > > pci_pm_runtime_idle+0x34/0x70 > > > > > > rpm_idle+0xc4/0x2b0 > > > > > > pm_runtime_work+0x93/0xc0 > > > > > > process_one_work+0x21a/0x430 > > > > > > worker_thread+0x4a/0x3c0 > > > > > ... > > > > > > > > > > This happens because scheduled pm_runtime_idle() is not cancell= ed. > > > > > > > > > > But rpm_resume() changes dev->power.request to RPM_REQ_NONE and i= f > > > > > pm_runtime_work() sees this, it will not run rpm_idle(). > > > > > > > > > > However, rpm_resume() doesn't deactivate the autosuspend timer if= it > > > > > is running (see the comment in rpm_resume() regarding this), so i= t may > > > > > queue up a runtime PM work later. > > > > > > > > > > If this is not desirable, you need to stop the autosuspend timer > > > > > explicitly in addition to calling pm_runtime_get_sync(). > > > > > > > > I don't quite follow all this. I think the race is between > > > > rtsx_pci_remove() (not resume) and rtsx_pci_runtime_idle(). > > > > > > I think so too and the latter is not expected to run. > > > > > > > rtsx_pci_remove() > > > > { > > > > pm_runtime_get_sync() > > > > pm_runtime_forbid() > > > > ... > > > > > > > > If this is an rtsx bug, what exactly should be added to > > > > rtsx_pci_remove()? > > > > > > > > Is there ever a case where we want any runtime PM work to happen > > > > during or after a driver .remove()? If not, maybe the driver core > > > > should prevent that, which I think is basically what this patch doe= s. > > > > > > No, it is not, because it doesn't actually prevent the race from > > > occurring, it just narrows the window quite a bit. > > > > > > It would be better to call pm_runtime_dont_use_autosuspend() instead > > > of pm_runtime_barrier(). > > > > > > > If this is an rtsx driver bug, I'm concerned there may be many othe= r > > > > drivers with a similar issue. rtsx exercises this path more than m= ost > > > > because the device switches between card reader and NVMe SSD using > > > > hotplug add/remove based on whether an SD card is inserted (see [1]= ). > > > > > > This is a valid concern, so it is mostly a matter of where to disable > > > autosuspend. > > > > > > It may be the driver core in principle, but note that it calls > > > ->remove() after invoking pm_runtime_put_sync(), so why would it > > > disable autosuspend when it allows runtime PM to race with device > > > removal in general? > > > > > > Another way might be to add a pm_runtime_dont_use_autosuspend() call > > > at the beginning of pci_device_remove(). > > > > > > Or just remove the optimization in question from rpm_resume() which i= s > > > quite confusing and causes people to make assumptions that lead to > > > incorrect behavior in this particular case. > > > > Well, scratch this. > > > > If rpm_idle() is already running at the time rpm_resume() is called, > > the latter may return right away without waiting, which is incorrect. > > > > rpm_resume() needs to wait for the "idle" callback to complete, so > > this (again, modulo GMail-induced whitespace mangling) should help: > > > > --- > > drivers/base/power/runtime.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > Index: linux-pm/drivers/base/power/runtime.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > --- linux-pm.orig/drivers/base/power/runtime.c > > +++ linux-pm/drivers/base/power/runtime.c > > @@ -798,7 +798,8 @@ static int rpm_resume(struct device *dev > > } > > > > if (dev->power.runtime_status =3D=3D RPM_RESUMING || > > - dev->power.runtime_status =3D=3D RPM_SUSPENDING) { > > + dev->power.runtime_status =3D=3D RPM_SUSPENDING || > > + dev->power.idle_notification) { > > DEFINE_WAIT(wait); > > > > if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) { > > @@ -826,7 +827,8 @@ static int rpm_resume(struct device *dev > > prepare_to_wait(&dev->power.wait_queue, &wait, > > TASK_UNINTERRUPTIBLE); > > if (dev->power.runtime_status !=3D RPM_RESUMING && > > - dev->power.runtime_status !=3D RPM_SUSPENDING) > > + dev->power.runtime_status !=3D RPM_SUSPENDING && > > + !dev->power.idle_notification) > > break; > > > > spin_unlock_irq(&dev->power.lock); > > Well, not really. > > The problem is that rtsx_pci_runtime_idle() is not expected to be > running after pm_runtime_get_sync(), but the latter doesn't really > guarantee that. It only guarantees that the suspend/resume callbacks > will not be running after it returns. > > As I said above, if the ->runtime_idle() callback is already running > when pm_runtime_get_sync() runs, the latter will notice that the > status is RPM_ACTIVE and will return right away without waiting for > the former to complete. In fact, it cannot wait for it to complete, > because it may be called from a ->runtime_idle() callback itself (it > arguably does not make much sense to do that, but it is not strictly > forbidden). > > So whoever is providing a ->runtime_idle() callback, they need to > protect it from running in parallel with whatever code runs after > pm_runtime_get_sync(). Note that ->runtime_idle() will not start > after pm_runtime_get_sync(), but it may continue running then if it > has started earlier already. > > Calling pm_runtime_barrier() after pm_runtime_get_sync() (not before > it) should suffice, but once the runtime PM usage counter is dropped, > rpm_idle() may run again, so this is only effective until the usage > counter is greater than 1. This means that > __device_release_driver(() is not the right place to call it, because > the usage counter is dropped before calling device_remove() in that > case. > > The PCI bus type can prevent the race between driver-provided > ->runtime_idle() and ->remove() from occurring by adding a > pm_runtime_probe() call in the following way: Thanks for the detailed explanation. Does this mean only PCI bus needs this fix because other subsystems are not affected, or this needs to be implemented for each different bus types? Kai-Heng > > --- > drivers/pci/pci-driver.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > Index: linux-pm/drivers/pci/pci-driver.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- linux-pm.orig/drivers/pci/pci-driver.c > +++ linux-pm/drivers/pci/pci-driver.c > @@ -473,6 +473,13 @@ static void pci_device_remove(struct dev > > if (drv->remove) { > pm_runtime_get_sync(dev); > + /* > + * If the driver provides a .runtime_idle() callback and it has > + * started to run already, it may continue to run in parallel > + * with the code below, so wait until all of the runtime PM > + * activity has completed. > + */ > + pm_runtime_barrier(dev); > drv->remove(pci_dev); > pm_runtime_put_noidle(dev); > }