Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp134585iob; Tue, 17 May 2022 21:24:55 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzhNAXSt85Lif+jTiJnor1ELWCYdK8n49nDDHKtuSJ1yppkywHy17uGkcnIleE0LngMObHP X-Received: by 2002:a17:902:b58f:b0:15e:b2f4:b75 with SMTP id a15-20020a170902b58f00b0015eb2f40b75mr25977139pls.25.1652847894960; Tue, 17 May 2022 21:24:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652847894; cv=none; d=google.com; s=arc-20160816; b=W1evcWcgAtP8nDLuzAeV2xz4ZyxfoveKkPU3ky6WDZy6dweHL71WAWnxqm1BKdkz+3 6oGB/Tk2CgcPcHUUp1Kl6H/wE5+GQTBLA01HwFA1+Sgg4+cgZEHBu0bvGTjIVmYz+wGu s5H8REQt69Jk9a95VwcIiAQByWeuLExL73pXQ6He3yOk8G3QXjxFmqwCZD5kkG2JSGKn SZ80c76HjhCGvWASLKxITFCgZMqawWI1DuhN56HC9RCO6H9hCrRq4SVht0nubNjeXasZ a9hTw47KL3UO+w3m3sO1UTtHYcOqioNZeVQ25BOVyyogCsYBKY+sBXFJ1WRbduvOhvn3 LJRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version; bh=Hy9Fd7X4zWlZDVMRHmrsAnJCqrq0FuPgwnYlhjRMc+M=; b=XDJnNvA2Gp/Nfx81pFffiFDEGRVBDdZQZKEQ1X11qqeYetCyxnR5NVgH+Zwe3eR8pi XfTAHYYs3w++m9Ku+p75O1JQ9iSTz9CqrzNLiO4pRveeH2a3Rp4oLp9qyoC//fZEkscm v3DFQWYa1y0yj6ZKHArFOUSXFyziLgyom1UZ5iPDeeh4gcVd4y5XsJZCXm7Sd9fn0EaB pEV5V92wJJznRrFwUIvCDMIb6uErl6fPEU/f3lFWrebL1ur1yvK0uxislUuPz2oZqOGp depA3f3v0rdxpWV9xAFunKu7RbpV+aJKn7Cb1hMwt0jzF3DRXVGTLnH/Oa3DyMHctm+a NhwA== ARC-Authentication-Results: i=1; mx.google.com; spf=softfail (google.com: domain of transitioning linux-wireless-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id e2-20020a656bc2000000b003db27cae6b4si1358549pgw.430.2022.05.17.21.24.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 May 2022 21:24:54 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-wireless-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning linux-wireless-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 78C884551B; Tue, 17 May 2022 20:48:50 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349533AbiEQPC3 (ORCPT + 70 others); Tue, 17 May 2022 11:02:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37504 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243507AbiEQPC1 (ORCPT ); Tue, 17 May 2022 11:02:27 -0400 Received: from mail-io1-f41.google.com (mail-io1-f41.google.com [209.85.166.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EEB886410; Tue, 17 May 2022 08:02:25 -0700 (PDT) Received: by mail-io1-f41.google.com with SMTP id q203so3706208iod.0; Tue, 17 May 2022 08:02:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Hy9Fd7X4zWlZDVMRHmrsAnJCqrq0FuPgwnYlhjRMc+M=; b=h9kzDs1ENar/YlTwbHiTk4WwcfDyyY240uuoEgCYeCr8eoS6T7zDui13pa85nrKXa9 +B0AkXQD7WRf3mZCQnHb39HAVf1/4puk6itnRgX3gGn/GTOrMMK4c4ezlda22eciz4R4 GcaI5FqFGg+uf8DA2t83RXaLpJcmsGXiwPGUZFchEmRscr494VTj+yUtSXOJlBD9+eZO B720SCmoxIV7ETyqwzOfGsjssUeHn3iZ86uReJxo1qekMToIetVn8hnBe/mPUHzLjmNq bMwG7r5ohbouu1qSnVuqxLDaebioVB/Y6wZLQo5yDRbcXM9LEDb37v3+HJ6R9L0pghvI yd5A== X-Gm-Message-State: AOAM530/9NOuEfyM4A8pG/2FQXjM98E1yUp4TvCPjT005EgIHL0El7vE PT0eoPAYIYIgd++0h2lLaP7XfVawG4bDGWE2cXjzNhg+ X-Received: by 2002:a05:6602:2dc4:b0:648:adac:bae8 with SMTP id l4-20020a0566022dc400b00648adacbae8mr10802774iow.9.1652799745301; Tue, 17 May 2022 08:02:25 -0700 (PDT) MIME-Version: 1.0 References: <20220505015814.3727692-1-rui.zhang@intel.com> <20220505015814.3727692-3-rui.zhang@intel.com> In-Reply-To: <20220505015814.3727692-3-rui.zhang@intel.com> From: "Rafael J. Wysocki" Date: Tue, 17 May 2022 17:02:14 +0200 Message-ID: Subject: Re: [PATCH 2/7] thermal: intel: pch: enhance overheat handling To: Zhang Rui Cc: "Rafael J. Wysocki" , kvalo@kernel.org, Alexandre Belloni , Linux PM , ACPI Devel Maling List , linux-rtc@vger.kernel.org, "open list:NETWORKING DRIVERS (WIRELESS)" , Daniel Lezcano , merez@codeaurora.org, mat.jonczyk@o2.pl, Sumeet Pawnikar , Len Brown Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,MAILING_LIST_MULTI, RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-wireless@vger.kernel.org On Thu, May 5, 2022 at 3:58 AM Zhang Rui wrote: > > Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH > temperature above threshold") introduces delay loop mechanism that allows > PCH temperature to go down below threshold during suspend so it won't > block S0ix. And the default overall delay timeout is 1 second. > > However, in practice, we found that the time it takes to cool the PCH down > below threshold highly depends on the initial PCH temperature when the > delay starts, as well as the ambient temperature. > And in some cases, the 1 second delay is not sufficient. As a result, the > system stays in a shallower power state like PCx instead of S0ix, and > drains the battery power, without user' notice. > > To make sure S0ix is not blocked by the PCH overheating, we > 1. expand the default overall timeout to 60 seconds. > 2. make sure the temperature is below threshold rather than equal to it. > 3. move the delay to .suspend_noirq phase instead, in order to > a) do cooling delay with a more quiescent system > b) be aware of wakeup events during the long delay, because some wakeup > events (ACPI Power button Press, USB mouse, etc) become valid only > in .suspend_noirq phase and later. > > This may introduce longer suspend time, but only in the cases when the > system overheats and Linux used to enter a shallower S2idle state, say, > PCx instead of S0ix. > > Signed-off-by: Zhang Rui > Tested-by: Sumeet Pawnikar > --- > drivers/thermal/intel/intel_pch_thermal.c | 18 +++++++++++------- > 1 file changed, 11 insertions(+), 7 deletions(-) > > diff --git a/drivers/thermal/intel/intel_pch_thermal.c b/drivers/thermal/intel/intel_pch_thermal.c > index 527c91f5960b..b7b32e2f5ae2 100644 > --- a/drivers/thermal/intel/intel_pch_thermal.c > +++ b/drivers/thermal/intel/intel_pch_thermal.c > @@ -70,8 +70,8 @@ static unsigned int delay_timeout = 100; > module_param(delay_timeout, int, 0644); > MODULE_PARM_DESC(delay_timeout, "amount of time delay for each iteration."); > > -/* Number of iterations for cooling delay, 10 counts by default for now */ > -static unsigned int delay_cnt = 10; > +/* Number of iterations for cooling delay, 600 counts by default for now */ > +static unsigned int delay_cnt = 600; > module_param(delay_cnt, int, 0644); > MODULE_PARM_DESC(delay_cnt, "total number of iterations for time delay."); > > @@ -193,10 +193,11 @@ static int pch_wpt_get_temp(struct pch_thermal_device *ptd, int *temp) > return 0; > } > > +/* Cool the PCH when it's overheat in .suspend_noirq phase */ > static int pch_wpt_suspend(struct pch_thermal_device *ptd) > { > u8 tsel; > - u8 pch_delay_cnt = 1; > + int pch_delay_cnt = 1; > u16 pch_thr_temp, pch_cur_temp; > > /* Shutdown the thermal sensor if it is not enabled by BIOS */ > @@ -233,7 +234,10 @@ static int pch_wpt_suspend(struct pch_thermal_device *ptd) > * which helps to indentify the reason why S0ix entry was rejected. > */ > while (pch_delay_cnt <= delay_cnt) { > - if (pch_cur_temp <= pch_thr_temp) > + if (pch_cur_temp < pch_thr_temp) > + break; > + > + if (pm_wakeup_pending()) > break; > > dev_warn(&ptd->pdev->dev, > @@ -245,7 +249,7 @@ static int pch_wpt_suspend(struct pch_thermal_device *ptd) > pch_delay_cnt++; > } > > - if (pch_cur_temp > pch_thr_temp) > + if (pch_cur_temp >= pch_thr_temp) > dev_warn(&ptd->pdev->dev, > "CPU-PCH is hot [%dC] even after delay, continue to suspend. S0ix might fail\n", > pch_cur_temp); > @@ -455,7 +459,7 @@ static void intel_pch_thermal_remove(struct pci_dev *pdev) > pci_disable_device(pdev); > } > > -static int intel_pch_thermal_suspend(struct device *device) > +static int intel_pch_thermal_suspend_noirq(struct device *device) > { > struct pch_thermal_device *ptd = dev_get_drvdata(device); > > @@ -495,7 +499,7 @@ static const struct pci_device_id intel_pch_thermal_id[] = { > MODULE_DEVICE_TABLE(pci, intel_pch_thermal_id); > > static const struct dev_pm_ops intel_pch_pm_ops = { > - .suspend = intel_pch_thermal_suspend, > + .suspend_noirq = intel_pch_thermal_suspend_noirq, IMO it would be better to put this change into a separate patch and reorder the other changes after this one. It is valid by itself. > .resume = intel_pch_thermal_resume, > }; > > --