Received: by 2002:a05:6a10:eb17:0:0:0:0 with SMTP id hx23csp3058662pxb; Mon, 6 Sep 2021 11:18:07 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzmDH+oEem3cy4n0RnZrjaLWGatgtn77OOhU/35nOgxDpi1DKcuIOc4adVLSZz1SO8ywNc9 X-Received: by 2002:a17:906:6b0c:: with SMTP id q12mr15204146ejr.0.1630952287468; Mon, 06 Sep 2021 11:18:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1630952287; cv=none; d=google.com; s=arc-20160816; b=dmv3vkfPBZiV0xvFOtHPNqXcEesTApaKdE+OIlom6cX38mVeWAUwfrHKawhdcpyGXA grbFn7TQOJ+hj6C92FT0D0qOuSim8/LDOPLBdJnytNhpKfRgOZdzLi55Bw7QFJ45h2Sv Wka5TiWo5cSVC3eI/t8JuBrm9SevLjDvT3Z+cKvbUMdBRjW4sTNuKh3bx4JUB0jdXREh VY6tXesf3dE4OGgMSObaVHk5QGS1Ks9g8v0WzaQ5EdcQBg/N/FkmU+AenGs3MDkC7gaj pj2uPVnKBKYx8ranw8PxHODSvj7m7Xjn8D1pjqgxG2XrfoVc9X/qAEJVpSQ4KxnPoDU3 l2jA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:date:cc:to:from:subject :message-id; bh=amFNfssAymWSkPqgU9rq0pzCSe7p1VtyBgtxzhEgIE0=; b=WdRi6rpflSXAuenAY0X3lvnwU07EunT+CD+BIAZ8r8PgSHLqXOxB6b/QxbmsFhuEMy uNxwhfSCi8jAHa8pp2Usu2IoaiYt6PBRMF3QqHL1ivvX7sDLJriLeMAaRDU2q0B3oE+B /RRUfSxXtXKmb0OuaDbYDoLlLuEuQWYIW3Sqgu8nXTWUWduMDpkVQUD9fWcqLT/m7zRy pAV35i9ESm9ngO+Gl4NoujSXk8LlyOeu9a4Jyr7l3yCnbi4rTLGh0KmFxFD0WwwpiZXS On9e8+MKBVDMqYW4R+w8E8g9b33XNkbDO09oevJpZ3og0wC2R3lBaddEMsjutwTE0ytU RxMA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id v9si8870802ejk.496.2021.09.06.11.17.43; Mon, 06 Sep 2021 11:18:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243640AbhIFSPT (ORCPT + 99 others); Mon, 6 Sep 2021 14:15:19 -0400 Received: from mga04.intel.com ([192.55.52.120]:38840 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230007AbhIFSPS (ORCPT ); Mon, 6 Sep 2021 14:15:18 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10099"; a="218159157" X-IronPort-AV: E=Sophos;i="5.85,272,1624345200"; d="scan'208";a="218159157" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Sep 2021 11:14:12 -0700 X-IronPort-AV: E=Sophos;i="5.85,272,1624345200"; d="scan'208";a="694402799" Received: from ibelagox-mobl1.gar.corp.intel.com ([10.213.76.130]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Sep 2021 11:14:09 -0700 Message-ID: <584a4fad09048e3ea0dbc3515b2e909b745177dd.camel@linux.intel.com> Subject: Re: [PATCH] cpufreq: intel_pstate: Fix for HWP interrupt before driver is ready From: Srinivas Pandruvada To: "Rafael J. Wysocki" Cc: Jens Axboe , "Rafael J. Wysocki" , Len Brown , Viresh Kumar , Linux PM , Linux Kernel Mailing List Date: Mon, 06 Sep 2021 11:14:05 -0700 In-Reply-To: References: <20210904053703.581297-1-srinivas.pandruvada@linux.intel.com> <926ac4b9-1bb5-e96e-ded3-6461f7a215b7@kernel.dk> <8dc57921f157b154e4af2dba26ce697dc4d4fcc2.camel@linux.intel.com> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.40.0-1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2021-09-06 at 19:54 +0200, Rafael J. Wysocki wrote: > On Mon, Sep 6, 2021 at 7:23 PM Srinivas Pandruvada > wrote: > > > > On Mon, 2021-09-06 at 18:58 +0200, Rafael J. Wysocki wrote: > > > On Mon, Sep 6, 2021 at 6:55 PM Srinivas Pandruvada > > > wrote: > > > > > > > > On Mon, 2021-09-06 at 10:43 -0600, Jens Axboe wrote: > > > > > On 9/6/21 10:17 AM, Rafael J. Wysocki wrote: > > > > > > On Sat, Sep 4, 2021 at 7:37 AM Srinivas Pandruvada > > > > > > wrote: > > > > > > > > > > > > > > In Lenovo X1 gen9 laptop, HWP interrupts arrive before > > > > > > > driver > > > > > > > is > > > > > > > ready > > > > > > > to handle on that CPU. Basically didn't even allocated > > > > > > > memory > > > > > > > for > > > > > > > per > > > > > > > cpu data structure and not even started interrupt enable > > > > > > > process > > > > > > > on that > > > > > > > CPU. So interrupt handler observes a NULL pointer to > > > > > > > schedule > > > > > > > work. > > > > > > > > > > > > > > This interrupt was probably for SMM, but since it is > > > > > > > redirected > > > > > > > to > > > > > > > OS by OSC call, OS receives it, but not ready to handle. > > > > > > > That > > > > > > > redirection > > > > > > > of interrupt to OS was also done to solve one SMM crash on > > > > > > > Yoga > > > > > > > 260 for > > > > > > > HWP interrupt a while back. > > > > > > > > > > > > > > To solve this the HWP interrupt handler should ignore such > > > > > > > request if the > > > > > > > driver is not ready. This will require some flag to wait > > > > > > > till > > > > > > > the > > > > > > > driver > > > > > > > setup a workqueue to handle on a CPU. We can't simply > > > > > > > assume > > > > > > > cpudata to > > > > > > > be NULL and avoid processing as it may not be NULL but data > > > > > > > structure is > > > > > > > not in consistent state. > > > > > > > > > > > > > > So created a cpumask which sets the CPU on which interrupt > > > > > > > was > > > > > > > setup. If > > > > > > > not setup, simply clear the interrupt status and return. > > > > > > > Since > > > > > > > the > > > > > > > similar issue can happen during S3 resume, clear the bit > > > > > > > during > > > > > > > offline. > > > > > > > > > > > > > > Since interrupt timing may be before HWP is enabled, use > > > > > > > safe > > > > > > > MSR > > > > > > > read > > > > > > > writes as before the change for HWP interrupt. > > > > > > > > > > > > > > Fixes: d0e936adbd22 ("cpufreq: intel_pstate: Process HWP > > > > > > > Guaranteed change notification") > > > > > > > Reported-and-tested-by: Jens Axboe > > > > > > > Signed-off-by: Srinivas Pandruvada < > > > > > > > srinivas.pandruvada@linux.intel.com> > > > > > > > --- > > > > > > >  drivers/cpufreq/intel_pstate.c | 23 > > > > > > > ++++++++++++++++++++++- > > > > > > >  1 file changed, 22 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/drivers/cpufreq/intel_pstate.c > > > > > > > b/drivers/cpufreq/intel_pstate.c > > > > > > > index b4ffe6c8a0d0..5ac86bfa1080 100644 > > > > > > > --- a/drivers/cpufreq/intel_pstate.c > > > > > > > +++ b/drivers/cpufreq/intel_pstate.c > > > > > > > @@ -298,6 +298,8 @@ static bool hwp_boost __read_mostly; > > > > > > > > > > > > > >  static struct cpufreq_driver *intel_pstate_driver > > > > > > > __read_mostly; > > > > > > > > > > > > > > +static cpumask_t hwp_intr_enable_mask; > > > > > > > + > > > > > > >  #ifdef CONFIG_ACPI > > > > > > >  static bool acpi_ppc; > > > > > > >  #endif > > > > > > > @@ -1067,11 +1069,15 @@ static void > > > > > > > intel_pstate_hwp_set(unsigned > > > > > > > int cpu) > > > > > > >         wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value); > > > > > > >  } > > > > > > > > > > > > > > +static void intel_pstate_disable_hwp_interrupt(struct > > > > > > > cpudata > > > > > > > *cpudata); > > > > > > > + > > > > > > >  static void intel_pstate_hwp_offline(struct cpudata *cpu) > > > > > > >  { > > > > > > >         u64 value = READ_ONCE(cpu->hwp_req_cached); > > > > > > >         int min_perf; > > > > > > > > > > > > > > +       intel_pstate_disable_hwp_interrupt(cpu); > > > > > > > + > > > > > > >         if (boot_cpu_has(X86_FEATURE_HWP_EPP)) { > > > > > > >                 /* > > > > > > >                  * In case the EPP has been set to > > > > > > > "performance" > > > > > > > by the > > > > > > > @@ -1645,20 +1651,35 @@ void notify_hwp_interrupt(void) > > > > > > >         if (!hwp_active || > > > > > > > !boot_cpu_has(X86_FEATURE_HWP_NOTIFY)) > > > > > > >                 return; > > > > > > > > > > > > > > -       rdmsrl(MSR_HWP_STATUS, value); > > > > > > > +       rdmsrl_safe(MSR_HWP_STATUS, &value); > > > > > > >         if (!(value & 0x01)) > > > > > > >                 return; > > > > > > > > > > > > > > +       if (!cpumask_test_cpu(this_cpu, > > > > > > > &hwp_intr_enable_mask)) { > > > > > > > +               wrmsrl_safe(MSR_HWP_STATUS, 0); > > > > > > > +               return; > > > > > > > +       } > > > > > > > > > > > > Without additional locking, there is a race between this and > > > > > > intel_pstate_disable_hwp_interrupt(). > > > > > > > > > > > > 1. notify_hwp_interrupt() checks hwp_intr_enable_mask() and > > > > > > the > > > > > > target > > > > > > CPU is in there, so it will go for scheduling the delayed > > > > > > work. > > > > > > 2. intel_pstate_disable_hwp_interrupt() runs between the > > > > > > check > > > > > > and > > > > > > the > > > > > > cpudata load below. > > > > > > 3. hwp_notify_work is scheduled on the CPU that isn't there > > > > > > in > > > > > > the > > > > > > mask any more. > > > > > > > > > > I noticed that too, not clear to me how much of an issue that > > > > > is > > > > > in > > > > > practice. But there's definitely a race there. > > > > Glad to see how this is possible from code running in ISR > > > > context. > > > > > > intel_pstate_disable_hwp_interrupt() may very well run on a > > > different > > > CPU in parallel with the interrupt handler running on this CPU.  Or > > > is > > > this not possible for some reason? > > I see the offline callback is called from cpufreq core from hotplug > > online/offline callback. So this should run the call on the target > > CPU. > > From Documentation > > "The states CPUHP_AP_OFFLINE … CPUHP_AP_ONLINE are invoked just the > > after the CPU has been brought up. The interrupts are off and the > > scheduler is not yet active on this CPU. Starting with > > CPUHP_AP_OFFLINE > > the callbacks are invoked on the target CPU." > > > > The only other place it is called is from subsys remove callback. Not > > sure how can you remove cpufreq subsys on fly. > > cpufreq_unregister_driver() causes this to happen. > > > Let's say it is possible: > > While running ISR on a local CPU, how can someone pull the CPU before > > its completion? If the CPU is going away after that, the workqueue is > > unbounded. So it will run on some other CPU, here if that happens it > > will call cpufreq update policy, which will be harmless. > > Well, it looks to me like if you are super-unlucky, the ISR may run on > the local CPU in parallel with intel_pstate_update_status() running on > a remote one and so dereferencing cpudata from it is generally unsafe. > In theory.  In practice it is unlikely to become problematic for > timing reasons AFAICS. > > We are handling offline for other thermal interrupt sources from same interrupt in therm-throt.c, where we do similar in offline path (by TGLX). If cpufreq offline can cause such issue of changing CPU, I can call intel_pstate_disable_hwp_interrupt() via override from https://elixir.bootlin.com/linux/latest/C/ident/thermal_throttle_offline after masking APIC interrupt. Thanks, Srinivas > Anyway, I would consider using RCU here to stay on the safe side.