Received: by 2002:a05:7412:b995:b0:f9:9502:5bb8 with SMTP id it21csp7035717rdb; Wed, 3 Jan 2024 02:18:12 -0800 (PST) X-Google-Smtp-Source: AGHT+IFRaK0bmAZLHK3OsIWzXUqQb0F9DgrwCyrsikkkatEAQhR8n4uqMk664rNKVwE+ldC9k+E/ X-Received: by 2002:a05:6a00:6c9e:b0:6da:13c0:dba1 with SMTP id jc30-20020a056a006c9e00b006da13c0dba1mr6315236pfb.13.1704277092217; Wed, 03 Jan 2024 02:18:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1704277092; cv=none; d=google.com; s=arc-20160816; b=DjaIeHFqdSDQyX4qpogD/v3mj+2GDo6Su8VT7EprTYQ1FWeAKifsvgyB0HfO3gsO7K uDRx8xvV9uoPkLepXHL1FxV+4D2Ui39SdDfE0VCJUxeBmBOH+kFwZERwBm+o/sCt1YKM BZldpAzQxmGR/YinQw6Cvf/BbbjZtTjtyLdYIUccCj5kfbD2u4yZBXhM+d6MjRWDdLkr YoXE2R7OjvUKb/p6ffgpnU/yvDz3vVEtmhWcIsveS72JNpIBCgc3Wb8vuuLmViD5v3Rz nTHbel0yuJscMDdy4O79MW4ofS9nCvFk17649Sqo7HD7q/1t4+xpIQKdljZrUHYp1MHN kGaQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=Scmrhih7hxA5GuetUV6f9gA0fz7HYsSC+1TAzsNPLsk=; fh=OGmWqbEmwvlUwGHBMlrXwaRDWnZv0+GUpXo2A8Uic9M=; b=cn+sZHcEBrz/7oQedq2450SfJb7XEEUajF4rnG3tBFpvMuF++r71frf5rXovTOvGDj XQS1SqOwriJpfOkL+GZsLAGfkTOxSw8Nn7l4MMnxOUOe+MHAoLteFcQI859HrP+B6hFQ /RMutj27vg0pffIXguCAYNVTydS6BcOALTB5TOtiKMUW+83CSWDgtszAHx8+5hLEg/ms NyFbx09Kt3aAf0yuhXrmsY69o5AVtWMxX6obdrOWf08NdK9f5JmR7jruoa4WfMaBJXfZ b4XiXqjye2gx/bG34+zYsBtD1GF2Hyn7G3Vfllr4HpA3rmSDLvXTWEXOnSZnZpuvm6tR xuEA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=cEOKkcE3; spf=pass (google.com: domain of linux-kernel+bounces-15378-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-15378-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id fb27-20020a056a002d9b00b006d9b56a8629si12647181pfb.45.2024.01.03.02.18.11 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Jan 2024 02:18:12 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-15378-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=cEOKkcE3; spf=pass (google.com: domain of linux-kernel+bounces-15378-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-15378-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 3E7CBB22DEC for ; Wed, 3 Jan 2024 10:18:09 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EAB6218AEB; Wed, 3 Jan 2024 10:17:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="cEOKkcE3" X-Original-To: linux-kernel@vger.kernel.org Received: from mail-yb1-f177.google.com (mail-yb1-f177.google.com [209.85.219.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8AC4918656 for ; Wed, 3 Jan 2024 10:17:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-yb1-f177.google.com with SMTP id 3f1490d57ef6-dbdb2433800so7027756276.3 for ; Wed, 03 Jan 2024 02:17:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1704277075; x=1704881875; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Scmrhih7hxA5GuetUV6f9gA0fz7HYsSC+1TAzsNPLsk=; b=cEOKkcE3JR/pRGpKa7F606jO9SVa371a340Sp/pDSrFKCRyiWvzr5POLJoafRC9Cie WNPh/PruI6GWYzFs5k8jRwbBIQqgDCHH58BbZS3G4Yqn0LBMUW4hkpZFeOpNwgSA9NUY 9EaYoQJQEFF34dBu9bHsnDDQMyL06VPEHSoekNfCrzmmHaN+TiqdWDaJstnMgk+cjGHw ZI3/6DWRsILfgrJ6krLmJM7/rUHpv/AY3E4QwhZqQfS1LpOTZwDGQRrKckYaO7MhjalZ vvdgqutDe8rm9INtXDfOk7neky7Hy0NMWjiEll2QOiXmPavesJI5+ENMq5jSHSxAX2hL r4FA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704277075; x=1704881875; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Scmrhih7hxA5GuetUV6f9gA0fz7HYsSC+1TAzsNPLsk=; b=df33lUdEkrKaLcjyzTkNiWiH5Wn8lpxQYFF7Y0uf8IjiJb+wUEj+cWUJFde4XoSaiL 8nIYyzmNMJnSvKJLlSC7q0rdzYhYJuf4ltPd4gyRoEzTIFhLvyWVw4AbJWb+e5J2TIap lj7ltWgRHJTYwjyQmMYd+0bNesEY5GVPXOMsncS0dXZbEuUHRM1YYegORMzy/JGihUCH AB0sKdXTuhAuFRtmGH3e5QVp7Anasqgl7vn731JQX/vdtesyJyobWU4bxtqPuoAXVDiA OPA0fIvmygru4EP+u3VDK2Ou+VkQeuVLI3jKEqpcIvDT9eSueX/nqO4ZvLaLLFbqpxMi LWCA== X-Gm-Message-State: AOJu0YwoCcyH/qFsfXONZMPR+3QdatbsNoW0V8B2ZKTD/arkPGKHxOEE v9bEd3OvFnywYqnnepMzhPBdlVLDnjJAdwVOeYooruaeZvDkHg== X-Received: by 2002:a25:c7d1:0:b0:db5:4e80:54c9 with SMTP id w200-20020a25c7d1000000b00db54e8054c9mr9529450ybe.42.1704277075476; Wed, 03 Jan 2024 02:17:55 -0800 (PST) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <5754861.DvuYhMxLoT@kreacher> <6019796.lOV4Wx5bFT@kreacher> <13435856.uLZWGnKmhe@kreacher> In-Reply-To: From: Ulf Hansson Date: Wed, 3 Jan 2024 11:17:18 +0100 Message-ID: Subject: Re: [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code To: "Rafael J. Wysocki" Cc: "Rafael J. Wysocki" , Greg KH , linux-pm@vger.kernel.org, Youngmin Nam , linux-kernel@vger.kernel.org, d7271.choe@samsung.com, janghyuck.kim@samsung.com, hyesoo.yu@samsung.com, Alan Stern Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, 2 Jan 2024 at 14:54, Rafael J. Wysocki wrote: > > On Tue, Jan 2, 2024 at 2:35=E2=80=AFPM Ulf Hansson wrote: > > > > On Wed, 27 Dec 2023 at 21:41, Rafael J. Wysocki wro= te: > > > > > > From: Rafael J. Wysocki > > > > > > It is reported that in low-memory situations the system-wide resume c= ore > > > code deadlocks, because async_schedule_dev() executes its argument > > > function synchronously if it cannot allocate memory (an not only then= ) > > > and that function attempts to acquire a mutex that is already held. > > > > > > Address this by changing the code in question to use > > > async_schedule_dev_nocall() for scheduling the asynchronous > > > execution of device suspend and resume functions and to directly > > > run them synchronously if async_schedule_dev_nocall() returns false. > > > > > > Fixes: 09beebd8f93b ("PM: sleep: core: Switch back to async_schedule_= dev()") > > > Link: https://lore.kernel.org/linux-pm/ZYvjiqX6EsL15moe@perf/ > > > Reported-by: Youngmin Nam > > > Signed-off-by: Rafael J. Wysocki > > > --- > > > > > > The commit pointed to by the Fixes: tag is the last one that modified > > > the code in question, even though the bug had been there already befo= re. > > > > > > Still, the fix will not apply to the code before that commit. > > > > An option could be to just do "Cc: stable@vger.kernel.org # v5.7+" > > instead of pointing to a commit with a Fixes tag. > > Right, but one can argue that every commit with a "Cc: stable" tag is > a fix, so it should carry a Fixes: tag too anyway. Yes, certainly. But in this case it's more questionable as it's not really fixing the commit it points out. Note that, I have no strong opinion here, but maybe Greg has a preferred wa= y? > > > > > > > --- > > > drivers/base/power/main.c | 148 +++++++++++++++++++++--------------= ----------- > > > 1 file changed, 68 insertions(+), 80 deletions(-) > > > > > > Index: linux-pm/drivers/base/power/main.c > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > > --- linux-pm.orig/drivers/base/power/main.c > > > +++ linux-pm/drivers/base/power/main.c > > > @@ -579,7 +579,7 @@ bool dev_pm_skip_resume(struct device *d > > > } > > > > > > /** > > > - * device_resume_noirq - Execute a "noirq resume" callback for given= device. > > > + * __device_resume_noirq - Execute a "noirq resume" callback for giv= en device. > > > * @dev: Device to handle. > > > * @state: PM transition of the system being carried out. > > > * @async: If true, the device is being resumed asynchronously. > > > @@ -587,7 +587,7 @@ bool dev_pm_skip_resume(struct device *d > > > * The driver of @dev will not receive interrupts while this functio= n is being > > > * executed. > > > */ > > > -static int device_resume_noirq(struct device *dev, pm_message_t stat= e, bool async) > > > +static void __device_resume_noirq(struct device *dev, pm_message_t s= tate, bool async) > > > { > > > pm_callback_t callback =3D NULL; > > > const char *info =3D NULL; > > > @@ -655,7 +655,13 @@ Skip: > > > Out: > > > complete_all(&dev->power.completion); > > > TRACE_RESUME(error); > > > - return error; > > > + > > > + if (error) { > > > + suspend_stats.failed_resume_noirq++; > > > + dpm_save_failed_step(SUSPEND_RESUME_NOIRQ); > > > + dpm_save_failed_dev(dev_name(dev)); > > > + pm_dev_err(dev, state, async ? " async noirq" : " noi= rq", error); > > > + } > > > } > > > > > > static bool is_async(struct device *dev) > > > @@ -668,11 +674,15 @@ static bool dpm_async_fn(struct device * > > > { > > > reinit_completion(&dev->power.completion); > > > > > > - if (is_async(dev)) { > > > - get_device(dev); > > > - async_schedule_dev(func, dev); > > > + if (!is_async(dev)) > > > + return false; > > > + > > > + get_device(dev); > > > + > > > + if (async_schedule_dev_nocall(func, dev)) > > > return true; > > > - } > > > + > > > + put_device(dev); > > > > > > return false; > > > } > > > @@ -680,15 +690,19 @@ static bool dpm_async_fn(struct device * > > > static void async_resume_noirq(void *data, async_cookie_t cookie) > > > { > > > struct device *dev =3D data; > > > - int error; > > > - > > > - error =3D device_resume_noirq(dev, pm_transition, true); > > > - if (error) > > > - pm_dev_err(dev, pm_transition, " async", error); > > > > > > + __device_resume_noirq(dev, pm_transition, true); > > > put_device(dev); > > > } > > > > > > +static void device_resume_noirq(struct device *dev) > > > +{ > > > + if (dpm_async_fn(dev, async_resume_noirq)) > > > + return; > > > + > > > + __device_resume_noirq(dev, pm_transition, false); > > > +} > > > + > > > static void dpm_noirq_resume_devices(pm_message_t state) > > > { > > > struct device *dev; > > > @@ -698,14 +712,6 @@ static void dpm_noirq_resume_devices(pm_ > > > mutex_lock(&dpm_list_mtx); > > > pm_transition =3D state; > > > > > > - /* > > > - * Advanced the async threads upfront, > > > - * in case the starting of async threads is > > > - * delayed by non-async resuming devices. > > > - */ > > > - list_for_each_entry(dev, &dpm_noirq_list, power.entry) > > > - dpm_async_fn(dev, async_resume_noirq); > > > - > > > > If I understand correctly, this means that we are no longer going to > > run the async devices upfront, right? > > Right. > > > Depending on how devices get ordered in the dpm_noirq_list, it sounds > > like the above could have a negative impact on the total resume time!? > > It could, but it is unclear at this time whether or not it will. > > > Of course, if all devices would be async capable this wouldn't be a > > problem... > > Sure. > > So the existing behavior can be restored with the help of an > additional device flag, but I didn't decide to add such a flag just > yet. > > I'll probably do it in 6.9, unless the performance impact is serious > enough, in which case it can be added earlier. > > I still would prefer to get to a point at which the suspend and resume > paths are analogous (from the async POV) and that's what happens after > this patch, so I'd say that IMO it is better to address any > performance regressions on top of it. Fair enough! [...] Feel free to add my Reviewed-by for the series! Kind regards Uffe