Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp1245407pxb; Wed, 3 Nov 2021 21:45:12 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyBcoQtkbxmuujsnrglZ+MHpLCa3sAjzolT0Ffqi86uoculiBKpaPBfHg2IVjAnlG9D3sEa X-Received: by 2002:a17:906:4f8c:: with SMTP id o12mr58977087eju.115.1636001112131; Wed, 03 Nov 2021 21:45:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1636001112; cv=none; d=google.com; s=arc-20160816; b=xYYVItOmBDqgHLB/XgOtzbEKW302cfPAEy5K9T9kVcswjfeWogiuTgBqYouHHDD0Zr 0Y1WFbsCr4+qSFW+5bLMX9wF2RuZj3fmUw5VCOdA9XkvnpZjyfzvXlAJqggGL3kA22Ar vtHnkrG0Zi3hnuU31w+Np6VWhBRSkAyS6ac/B4Q0skcTFLoTHjoYGPaFB3X9/pm9rv1c kbni8vhL/fu+oXE+dIcEBLnIzTP6hNdXG21MFZPvdmDMECL54t8jbeq0r9qpyLrb+XsC PfV/LL9uA8s/V1ErAvLKsjuB3EyaVYK4n/XKsm231oXimzUGXU53xYRNWIrXbLld1P1D uDTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:date:cc:to:from:subject :message-id; bh=j2ACuHRB4fV/3oEvvpdvuhpDflqn61TXJzQw7nnaLfA=; b=lILx6P/W//yR9NZ8BB8sgaGedVD6WISYGajeitvmtE7K/5Fx54166O3GK0na59NKoo kKgeWvuBcBlz+6VW+hieoF9aX4SJp3w/pEe/RB7qk62ANpXc0b9KqxOE0WytS/BQR8cg LncNu04tXlSg5iICQtwW8hrdYwqlDKSnjgoRJ/kWo2VQpAybrgOc2NMDz4uxLXAC7YSd Qok7wZB/+pNx6bOthdlZ/uafTFIwgf+MRO6LKHImBSgux2q4daajcIll7BtXfOzDyHiH laEHe9v9ZNt03eHdC3Gv1ZhwRpHYBdGonepTvzydiZy0NuW55PPfqGzSN7h6x24dXxFa 3hUQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ht15si914287ejc.725.2021.11.03.21.44.47; Wed, 03 Nov 2021 21:45:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229871AbhKDEpm (ORCPT + 99 others); Thu, 4 Nov 2021 00:45:42 -0400 Received: from mga05.intel.com ([192.55.52.43]:51231 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229477AbhKDEpl (ORCPT ); Thu, 4 Nov 2021 00:45:41 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10157"; a="317850208" X-IronPort-AV: E=Sophos;i="5.87,207,1631602800"; d="scan'208";a="317850208" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Nov 2021 21:43:03 -0700 X-IronPort-AV: E=Sophos;i="5.87,207,1631602800"; d="scan'208";a="667765567" Received: from rabentle-desk1.amr.corp.intel.com ([10.255.230.52]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Nov 2021 21:42:59 -0700 Message-ID: Subject: Re: [STACK DUMP][5.16-rc?] from (cpufreq: intel_pstate: Process HWP Guaranteed change notification) From: Srinivas Pandruvada To: Linus Torvalds , Steven Rostedt Cc: LKML , Peter Zijlstra , pablomh@gmail.com, "Rafael J. Wysocki" Date: Wed, 03 Nov 2021 21:42:54 -0700 In-Reply-To: References: <20211103163340.435ba5b1@gandalf.local.home> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.40.0-1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2021-11-03 at 14:58 -0700, Linus Torvalds wrote: > On Wed, Nov 3, 2021 at 1:33 PM Steven Rostedt > wrote: > > > > The RIP of cpuidle_enter_state+0xd6/0x3a0 gives me in gdb: > > That's not actually the error address - it's not an oops. That's just > the register data at the interrupt entry. > > The error was that triggered this was that > >     unchecked MSR access error: >         WRMSR to 0x773 (tried to write 0x0000000000000000) >         at rIP: 0xffffffffab06e824 (native_write_msr+0x4/0x20) > MSR 773 access must be protected by a CPUID flag. Sorry I missed this check during offline path. diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 349ddbaef796..4184073be2c4 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1621,7 +1621,8 @@ static void intel_pstate_disable_hwp_interrupt(struct cpudata *cpudata) unsigned long flags; /* wrmsrl_on_cpu has to be outside spinlock as this can result in IPC */ - wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00); + if (boot_cpu_has(X86_FEATURE_HWP_NOTIFY)) + wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00); spin_lock_irqsave(&hwp_notify_lock, flags); if (cpumask_test_and_clear_cpu(cpudata->cpu, &hwp_intr_enable_mask)) I will submit a patch to fix this. Thanks, Srinivas > and then it prints the stack frame - because we shouldn't be doing > random MSR writes. > > And part of printing the stack frame is printing the register state > at > kernel entries when switching to an interrupt or exception stack. > > So that RIP you point to is not really interesting, and it looks like > it's an entirely random harmless instruction, it's just that it is > preceded by that > >         sti >         nopw > > so it's not surprising that it's an interrupt point, because it's > exactly one instruction after interrupts were enabled on that CPU > (the > one instruction being the sti shadow that itself doesn't take > interrupts). > > So cpuidle_enter_state() itself is entirely harmless, and you see > that > in the call chain presumably simply because the CPU was idle when the > interrupt happened. Again, that's a very reasonable place for an > interrupt to happen. > > End result: all of this is simply due to an unexpected MSR write, and > the real issue is that interrupt from an IPI that caused the WRMSR > write on another CPU: > >     __wrmsr_on_cpu+0x33/0x40 >     flush_smp_call_function_queue+0x122/0x1a0 >     __sysvec_call_function_single+0x29/0xc0 >     sysvec_call_function_single+0x9d/0xd0 > > and the rest of the call chain is just noise from "whatever that CPU > happened to do when it got the IPI". > > You don't get the call chain of whatever actually did the IPI, I'm > afraid. That's the actual _cause_ of all this, but that happens on > another CPU entirely. > > > I bisected it down to: > > > >  57577c996d73 ("cpufreq: intel_pstate: Process HWP Guaranteed > > change notification") > > > > Removing the commit makes the issue go away. Adding it back brings > > it back' > > That commit does various odd things. > > The code seems to depend on the 'hwp_notify_lock' spinlock, but then > it does some work outside that locked region (exactly because that > work triggers an interrupt that needs that lock): > >       wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00); > > and that's exactly what will then warn because that MSR hasn't been > enabled on that CPU. > > But it also does randomly do some READ_ONCE/WRITE_ONCE magic that all > seems entirely bogus. Those values should be protected by the lock. > > The way it cancels delayed work is also very questionable: it does > >         spin_lock_irqsave(&hwp_notify_lock, flags); >         if (cpumask_test_and_clear_cpu(cpudata->cpu, > &hwp_intr_enable_mask)) >                 cancel_delayed_work(&cpudata->hwp_notify_work); >         spin_unlock_irqrestore(&hwp_notify_lock, flags); > > but if the delayed work was already started on that cpu, then it will > continue running concurrently, even if the caller now thinks it's all > disabled. > > You need the "sync" version to make sure the work is also waited on, > but you can't do that under the spinlock that the work in question > may > need, of course. > > > So I'm guessing that this commit calls some code that does not > > expect to be > > called on powering off, at least in debug mode, and is triggering > > this > > stack dump. Probably because there was some static call or > > something that > > isn't shutdown properly. > > It actually looks more direct than that to me: the code literally > seems to do that wrmsrl_on_cpu() that triggers the IPI that actually > causes the unchecked MSR write. > > I don't know this hardware, and I don't know that code, but there are > certainly a few red flags here. > > Srinivas? > >               Linus