From: "Rafael J. Wysocki"
Date: Tue, 17 May 2022 17:11:05 +0200
Subject: Re: [PATCH 0/7] PM: Solution for S0ix failure caused by PCH overheating
To: Zhang Rui
Cc: "Rafael J. Wysocki", kvalo@kernel.org, Alexandre Belloni, Linux PM,
    ACPI Devel Maling List, linux-rtc@vger.kernel.org,
    "open list:NETWORKING DRIVERS (WIRELESS)", Daniel Lezcano,
    merez@codeaurora.org, mat.jonczyk@o2.pl, Sumeet Pawnikar, Len Brown
In-Reply-To: <20220505015814.3727692-1-rui.zhang@intel.com>
X-Mailing-List: linux-wireless@vger.kernel.org

On Thu, May 5, 2022 at 3:58 AM Zhang Rui wrote:
>
> On some Intel client platforms like SKL/KBL/CNL/CML, there is a
> PCH thermal sensor that monitors the PCH temperature and blocks the system
> from entering S0ix in case it overheats.
>
> Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH
> temperature above threshold") introduces a delay loop to cool the
> temperature down for this purpose.
>
> However, in practice, we found that the time it takes to cool the PCH down
> below threshold highly depends on the initial PCH temperature when the
> delay starts, as well as the ambient temperature.
>
> For example, on a Dell XPS 9360 laptop, the problem can be triggered
> 1. when it is suspended with heavy workload running.
> or
> 2. when it is moved from New Hampshire to Florida.
>
> In these cases, the 1 second delay is not sufficient. As a result, the
> system stays in a shallower power state like PCx instead of S0ix, and
> drains the battery power, without the user's notice.
>
> In this patch series, we first fix the problem in patch 1/7 ~ 3/7, by
> 1. expanding the default overall cooling delay timeout to 60 seconds.
> 2. making sure the temperature is below threshold rather than equal to it.
> 3. moving the delay to .suspend_noirq phase instead, in order to
>    a) do the cooling when the system is in a more quiescent state
>    b) be aware of wakeup events during the long delay, because some wakeup
>       events (ACPI Power button Press, USB mouse, etc) become valid only
>       in .suspend_noirq phase and later.
>
> However, this potentially long delay introduces a problem to our suspend
> stress automation test, because the delay makes it hard to predict how
> much time it takes to suspend the system.
> As we want to do as many suspend iterations as possible in limited time,
> setting a 60+ seconds rtc alarm for a suspend which usually takes less
> than 1 second is far beyond overkill.
>
> Thus, in patch 4/7 ~ 7/7, an rtc driver hook is introduced, which cancels
> the armed rtc alarm in the beginning of suspend and then rearms the rtc
> alarm with a short interval (say, 2 seconds) right before the system is
> suspended.
>
> By running
> # echo 2 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
> before suspend, the system can be resumed by RTC alarm right after it is
> suspended, no matter how much time the suspend really takes.
>
> This patch series has been tested on the same Dell XPS 9360 laptop and
> S0ix is 100% achieved across 1000+ s2idle iterations.

Overall, the first three patches in the series can go in without the
rest, so let's put them into a separate series.

Patch [4/7] doesn't depend on the first three, so it can go in by
itself.

Patch [5/7] is to be dropped anyway as per the earlier discussion.

Patch [6/7] is only needed to apply patch [7/7], which is controversial.
I think that we can drop or defer patches [6-7/7] for now.
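
For reference, the cooling delay described in the quoted cover letter
(patches 1-3) can be sketched roughly as below. This is only a userspace
Python simulation under stated assumptions, not the driver code: the
function name, the callbacks, the millidegree units and the 100 ms poll
step are all hypothetical. Only the 60 second overall timeout, the
strict "below threshold" comparison and the wakeup check come from the
description above.

```python
from typing import Callable


def wait_for_cooling(read_temp_mc: Callable[[], int],
                     threshold_mc: int,
                     wakeup_pending: Callable[[], bool],
                     sleep_ms: Callable[[int], None],
                     timeout_ms: int = 60_000,  # 60 s, per the cover letter
                     step_ms: int = 100) -> str:  # poll step is an assumption
    """Poll a (simulated) sensor until it reads strictly below threshold.

    Returns "cooled", "wakeup" or "timeout". All names are hypothetical.
    """
    waited_ms = 0
    while True:
        # Strictly below: a reading equal to the threshold is not enough.
        if read_temp_mc() < threshold_mc:
            return "cooled"
        # Give up early if a wakeup event arrived during the delay.
        if wakeup_pending():
            return "wakeup"
        if waited_ms >= timeout_ms:
            return "timeout"
        sleep_ms(step_ms)
        waited_ms += step_ms


if __name__ == "__main__":
    # Simulated sensor that cools by 1 degree C per poll (made-up numbers).
    readings = iter([52_000, 51_000, 50_000, 49_000])
    result = wait_for_cooling(lambda: next(readings), 50_000,
                              lambda: False, lambda ms: None)
    print(result)
```

Passing the sensor, wakeup check and sleep function as callbacks keeps
the sketch testable without real hardware or real delays.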