Received: by 2002:ab2:6816:0:b0:1f9:5764:f03e with SMTP id t22csp1132318lqo; Fri, 17 May 2024 11:37:58 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWrJ0t/e+IugHZGHioZy4l6w4OpiOND1gs/b/x0QedfPgsCBuYqISnKpPAyhKMQ/rP4ViSjJm4ZkdbvBUOKbE20eRYlNXI23Np4MN4X1w== X-Google-Smtp-Source: AGHT+IFRdwV80HeWDqcPeKufq/Be4idp+lighSNhUUgStu5lkC4SCThhC+SfN7FhLLKISDimkf4I X-Received: by 2002:a17:90a:5318:b0:2a5:8ff:9d1 with SMTP id 98e67ed59e1d1-2b6cc4547efmr21921572a91.14.1715971078286; Fri, 17 May 2024 11:37:58 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1715971078; cv=pass; d=google.com; s=arc-20160816; b=SSORLnVB4zI7n4mbX7X/6Wg8Xj84EUms7oE+Ppl/eXDaYX/VnCKSSsK2J8SrpyPeC8 mcE5CfwgWOsDaN6O4dUUYtjixjZOXQpD//3XZz7M/cBNqnSubOzWylj55JLNUXtt9QC9 ca+BEtXCfCNsk8N+rwWZ4Gb48eEVPhYVKLav0AUs5yB/kpl8z5e2UJTJHP1PJJcP0Qo4 xJ319MUa4vemEp4brZ0D8/udJCJq4uzP/puj+IBdOTdWwOCVeh84hF0PEyScu+WSwd9L i1/D9jFA4j6kSWmyUGBi0FmvhKLP7KCeQolINwLBTrxRLtyvQkmPs5mcgnPAqByIOxUy Acug== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:list-unsubscribe:list-subscribe:list-id:precedence :user-agent:message-id:in-reply-to:date:references:subject:cc:to :from:dkim-signature; bh=id+Lc8XGHDZ93VIyb7cJZw1xCZche/nnvBR1h9KXRpE=; fh=4WbAqchmdfmQMJA/eEH4pUFZKRUIFG2hKiynall3wNY=; b=vOlXcQ9AWMl5v2gowR+lHS4RkfOZpxSxvDc7GMawjtmQEgS+/uj1d/F9StPJo271ib O8b1XABNA15bGawSs6E1jjl51RQ0CgLhHxuE7wvw72Z4orX3tdIO1YSLNCxGwrq7g4K/ p+Yb0CdtHnkjBZgx518Z43uQ4QPqrtCsDj/dQe5hw7BV94OfVlfYT6M7cZw8Nl3u+EIa /Y6zDu1Fcl1lS+LHV4P/Im1gnU+XVJft/iXY1qHozHnQUt5BqC038azp127cN46oxO2X OdfUxUs+ucLOpezTux0/ViC3TDjqW987TfjDUtt9xCuoLNEtbYs+E5PHEyzKd1Gah9p/ PICg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=OgBZylgN; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-182521-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-182521-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id 98e67ed59e1d1-2b628868771si20023669a91.12.2024.05.17.11.37.57 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 May 2024 11:37:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-182521-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=OgBZylgN; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-182521-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-182521-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id D7B14284280 for ; Fri, 17 May 2024 18:37:57 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B9EF413E046; Fri, 17 May 2024 18:37:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OgBZylgN" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C3E9D433A6; Fri, 17 May 2024 18:37:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715971069; cv=none; b=iGaX0C5I/qXzFg6RRuyO7rRm7sJinKAeyOynYKvF58DHzmPE+dJ6hXmkvTMeaOrIFl1/2RlqMNaw+qEKBbyGqRXKg12EfP3dOvdQOaMG2BO1plyl+onAPcChVI9W4Dm+jF5/G9AtWuRp4ShyQxUtNFfsrSf3N4gszIX7q06NUzs= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715971069; c=relaxed/simple; bh=cDl+GeQ2P3waQU2VJqEU9acfUxXsJOs+MhfK/QhHSyc=; h=From:To:Cc:Subject:References:Date:In-Reply-To:Message-ID: MIME-Version:Content-Type; b=f9xLbj2j81Z6UopdyWEzUKzb/B3MnyIxcgZn0IZoxSfTB6RIW9CTpHMXrb4ZOxQx/vT1G7yj4MuwJp4hEstwO+ZBHy8MfksY1kfYu0JecwpjZHWkHu6B2JMAcWe/H8MD8scLLnM7SGdD4jSXRNWEYaab5t273jbXY5xvyJ0zAzw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OgBZylgN; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id A8DDEC2BD10; Fri, 17 May 2024 18:37:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1715971069; bh=cDl+GeQ2P3waQU2VJqEU9acfUxXsJOs+MhfK/QhHSyc=; h=From:To:Cc:Subject:References:Date:In-Reply-To:From; b=OgBZylgN9zqWkDZhPWI5AGWnFINa0FhFeM1ReY7pspp2aFMbHSdYte6a4q0Uty5x7 uZ20F6/dFcD5zgGiknU07ckIywVb82jG0Y4vXFtnq11+JlMYPi3tDue5Alk/9ZTfS4 my2ejW1MdeoxqIVQYMZTfIUCBUqUL0YlVIl6v6RTgDpc5vPjxbojUXk0yUu1kRbhFr qsYV/Z4iBtcjSpwS1zpjhtBaMOsQZ9zWe00Af1TSvV4slW4ZuwGgI8D9o21PJATFD4 sB/ar8sL/Lp+/kCuMLH/1Wci3pQpyv01CC6ZYknCXCoPhOBT7bxFKtKtdcM/eECxUp 4bZBEtphTVVbA== From: Kalle Valo To: Dave Hansen Cc: Borislav Petkov , Pawan Gupta , Thomas Gleixner , Ingo Molnar , Dave Hansen , "Rafael J. Wysocki" , x86@kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, regressions@lists.linux.dev, Jeff Johnson Subject: Re: [regression] suspend stress test stalls within 30 minutes References: <87o79cjjik.fsf@kernel.org> <20240511184847.GCZj-9j2sh1Akpt9iS@fat_crate.local> <20240511184945.GDZj-9yaOEWqf1ng8u@fat_crate.local> <87h6f4jdrq.fsf@kernel.org> <878r0djxgc.fsf@kernel.org> <874jb0jzx5.fsf@kernel.org> <20240514160555.GCZkOL41oB3hBt45eO@fat_crate.local> <87msoofjg1.fsf@kernel.org> <35086bb6-ee11-4ac6-b8ba-5fab20065b54@intel.com> Date: Fri, 17 May 2024 21:37:44 +0300 In-Reply-To: <35086bb6-ee11-4ac6-b8ba-5fab20065b54@intel.com> (Dave Hansen's message of "Fri, 17 May 2024 10:22:32 -0700") Message-ID: <871q60ffnr.fsf@kernel.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain Dave Hansen writes: > On 5/17/24 10:15, Kalle Valo wrote: >> Borislav Petkov writes: >>> There might be some #GP or so in the logs in case we've managed to f*ck >>> up microcode application which emulates that IBRS MSR bit and the >>> actual toggling or so when suspending... >> So the weird part is that when the bug happens (ie. suspend stalls) I >> can access the box normally using ssh and I don't see anything special >> in dmesg. Below is a full copy of dmesg output after the suspend >> stalled. Do note that I copied this dmesg before I updated microcode so >> it will still show the old microcode version. >> >> Let me know if you need more info. > > Kalle, could you remind us what we're seeing here? Does this show 30 > working rtcwake tests followed by a failure at "rtcwake test 31" where > the system failed to suspend? Correct. So basically what I do is that I start the nuc box, ssh into it and run: sudo su for i in {1..400}; do echo "rtcwake test $i" > /dev/kmsg; rtcwake -m mem -s 10; sleep 10; done Here's the start of first loop: [ 54.945105] rtcwake test 1 [ 55.162603] PM: suspend entry (deep) [ 55.168875] Filesystems sync: 0.006 seconds [ 55.182427] Freezing user space processes [ 55.191498] Freezing user space processes completed (elapsed 0.008 seconds) [ 55.191711] OOM killer disabled. [ 55.191805] Freezing remaining freezable tasks [ 55.193507] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 55.194056] printk: Suspending console(s) (use no_console_suspend to debug) [ 55.244962] e1000e: EEE TX LPI TIMER: 00000011 Now I leave the box to run it's test. I come back later to see that the for loop has stalled and the box is not going into suspend gain. I ssh into the machine and see this in dmesg: [ 449.061525] rtcwake test 31 [ 449.176854] PM: suspend entry (deep) [ 449.179072] Filesystems sync: 0.002 seconds [ 632.961545] loop0: detected capacity change from 0 to 8 [ 637.003835] gpu-manager (6735) used greatest stack depth: 23808 bytes left [ 738.799026] kworker/dying (87) used greatest stack depth: 23488 bytes left [ 932.951032] loop0: detected capacity change from 0 to 8 [ 1232.962610] loop0: detected capacity change from 0 to 8 The system tried go into suspend but after the "Filesystems sync:" message nothing happened for 10 minutes. I assume loop0 messages are from some Ubuntu daemon, maybe snapd or similar. I have seen them always, they are not specific to this issue. And now comes the really strange part: if I run 'shutdown -h now' the suspend continues apparently normally. Afterwards I checked from /var/log/kern.log and didn't see any errors: May 17 13:34:38 nuc2 kernel: [ 449.176854] PM: suspend entry (deep) May 17 13:34:38 nuc2 kernel: [ 449.179072] Filesystems sync: 0.002 seconds May 17 13:37:42 nuc2 kernel: [ 632.961545] loop0: detected capacity change from 0 to 8 May 17 13:37:46 nuc2 kernel: [ 637.003835] gpu-manager (6735) used greatest stack depth: 23808 bytes left May 17 13:39:28 nuc2 kernel: [ 738.799026] kworker/dying (87) used greatest stack depth: 23488 bytes left May 17 13:42:42 nuc2 kernel: [ 932.951032] loop0: detected capacity change from 0 to 8 May 17 13:47:42 nuc2 kernel: [ 1232.962610] loop0: detected capacity change from 0 to 8 May 17 13:52:45 nuc2 kernel: [ 1527.307800] Freezing user space processes May 17 13:52:45 nuc2 kernel: [ 1527.334585] Freezing user space processes completed (elapsed 0.024 seconds) May 17 13:52:45 nuc2 kernel: [ 1527.336094] OOM killer disabled. May 17 13:52:45 nuc2 kernel: [ 1527.337562] Freezing remaining freezable tasks May 17 13:52:45 nuc2 kernel: [ 1527.340324] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) May 17 13:52:45 nuc2 kernel: [ 1527.342596] printk: Suspending console(s) (use no_console_suspend to debug) May 17 13:52:45 nuc2 kernel: [ 1527.380121] e1000e: EEE TX LPI TIMER: 00000011 May 17 13:52:45 nuc2 kernel: [ 1527.474981] ACPI: EC: interrupt blocked May 17 13:52:45 nuc2 kernel: [ 1527.540696] ACPI: PM: Preparing to enter system sleep state S3 May 17 13:52:45 nuc2 kernel: [ 1527.567302] ACPI: EC: event blocked May 17 13:52:45 nuc2 kernel: [ 1527.567307] ACPI: EC: EC stopped May 17 13:52:45 nuc2 kernel: [ 1527.567311] ACPI: PM: Saving platform NVS memory May 17 13:52:45 nuc2 kernel: [ 1527.567412] Disabling non-boot CPUs ... While writing this email I found another way to continue the suspend after a stall: terminate rtcwake with CTRL-C in the ssh session running the for loop. That explains why 'sudo shutdown -h now' makes the suspend go forward, it most likely kills the stalled rtcwake process.