Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S936790AbcCQUgZ (ORCPT ); Thu, 17 Mar 2016 16:36:25 -0400 Received: from mga01.intel.com ([192.55.52.88]:29572 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753117AbcCQUgX (ORCPT ); Thu, 17 Mar 2016 16:36:23 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,351,1455004800"; d="scan'208";a="936297811" Message-ID: <1458246992.3861.23.camel@linux.intel.com> Subject: Re: [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling From: Srinivas Pandruvada To: Linda Knippers , rjw@rjwysocki.net Cc: tony.luck@intel.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, lenb@kernel.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org Date: Thu, 17 Mar 2016 13:36:32 -0700 In-Reply-To: <56EB0D81.1050900@hpe.com> References: <1458239047-14259-1-git-send-email-srinivas.pandruvada@linux.intel.com> <56EB0D81.1050900@hpe.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.18.5.1 (3.18.5.1-1.fc23) Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7134 Lines: 218 On Thu, 2016-03-17 at 16:03 -0400, Linda Knippers wrote: > On 3/17/2016 2:24 PM, Srinivas Pandruvada wrote: > > > > There are several reports of freeze on enabling HWP (Hardware > > PStates) > > feature on Skylake based systems by Intel P states driver. The root > > cause is identified as the HWP interrupts causing BIOS code to > > freeze. > > HWP interrupts uses thermal LVT. > > Linux natively handles thermal interrupts, but in Skylake based > > systems > > SMM will take control of thermal interrupts. This is a problem for > > several > > reasons: > > - It is freezing in BIOS when tries to handle thermal interrupt, > > which > > will require BIOS upgrade > > - With SMM handling thermal we loose all the reporting features of > > Linux arch/x86/kernel/cpu/mcheck/therm_throt driver > > - Some thermal drivers like x86-package-temp driver depends on the > > thermal > > threshold interrupts > > - The HWP interrupts are useful for debugging and tuning > > performance > > > > So we need native handling of thermal interrupts.  > Is this working around a firmware bug? Yes. But Linux always had capability to handle thermal interrupts natively, so we should continue to have this in OS control, when possible. > > > > > To inform SMM that > > OS will handle thermal interrupts, we need to use _OSC under > > processor > > scope very early in ACPI initialization flow.  > Does _OSC say that OS "will" handle thermal interrupts or that it > "can" > handle thermal interrupts? Can handle. >   Couldn't a platform decide it wants to handle > them, regardless of what the OS may be capable of? > Yes. In that case platform can change the delivery mode bits to SMM in LVT. > > > > This needs to be done > > before SMM code path looks for _OSC capabilities. The bit 12 of > > _OSC in processor scope defines whether OS will handle thermal > > interrupts. > > When bit 12 is set to 1, OS will handle thermal interrupts. > > Refer to this document for details on _OSC > > http://www.intel.com/content/www/us/en/standards/processor-vendor- > > specific-acpi-specification.html > Where is bit 12 documented? > In the above document. > > > > This change introduces a new function > > acpi_early_processor_set_osc(), > > which walks acpi name space and finds acpi processor object and > > set capability via _OSC method to take over thermal LVT. > Does this change just affect Skylake platforms or all platforms? Any platform which has Intel ® Speed Shift Technology (aka HWP) feature present and enabled. Thanks, Srinivas  > > -- ljk > > > > > > Also this change writes HWP status bits to 0 to clear any HWP > > status > > bits in intel_thermal_interrupt(). > > > > Signed-off-by: Srinivas Pandruvada > .com> > > --- > > v4: > > Suggestion by Boris for removing use of intermediate variable for > > clearing HWP status and using boot_cpu_has instead of > > static_cpu_has > > > > v3: > > - Added CONFIG_X86 around static_cpu_has() to fix compile error on > > ARCH=ia64, reported by kbuild test robot > > - return AE_CTRL_TERMINATE to terminate acpi name walk space, when > > _OSC > > is set already once. > > v2: > > Unnecessary newline was introduced, removed that in > > acpi_processor.c > > > >  arch/x86/kernel/cpu/mcheck/therm_throt.c |  3 ++ > >  drivers/acpi/acpi_processor.c            | 47 > > ++++++++++++++++++++++++++++++++ > >  drivers/acpi/bus.c                       |  3 ++ > >  drivers/acpi/internal.h                  |  2 ++ > >  4 files changed, 55 insertions(+) > > > > diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c > > b/arch/x86/kernel/cpu/mcheck/therm_throt.c > > index 2c5aaf8..0553858 100644 > > --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c > > +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c > > @@ -385,6 +385,9 @@ static void intel_thermal_interrupt(void) > >  { > >   __u64 msr_val; > >   > > + if (static_cpu_has(X86_FEATURE_HWP)) > > + wrmsrl_safe(MSR_HWP_STATUS, 0); > > + > >   rdmsrl(MSR_IA32_THERM_STATUS, msr_val); > >   > >   /* Check for violation of core thermal thresholds*/ > > diff --git a/drivers/acpi/acpi_processor.c > > b/drivers/acpi/acpi_processor.c > > index 6979186..18da84f 100644 > > --- a/drivers/acpi/acpi_processor.c > > +++ b/drivers/acpi/acpi_processor.c > > @@ -491,6 +491,53 @@ static void acpi_processor_remove(struct > > acpi_device *device) > >  } > >  #endif /* CONFIG_ACPI_HOTPLUG_CPU */ > >   > > +#ifdef CONFIG_X86 > > +static bool acpi_hwp_native_thermal_lvt_set; > > +static acpi_status acpi_set_hwp_native_thermal_lvt_osc(acpi_handle > > handle, > > +        u32 lvl, > > void *context, > > +        void **rv) > > +{ > > + u8 sb_uuid_str[] = "4077A616-290C-47BE-9EBD-D87058713953"; > > + u32 capbuf[2]; > > + struct acpi_osc_context osc_context = { > > + .uuid_str = sb_uuid_str, > > + .rev = 1, > > + .cap.length = 8, > > + .cap.pointer = capbuf, > > + }; > > + > > + if (acpi_hwp_native_thermal_lvt_set) > > + return AE_CTRL_TERMINATE; > > + > > + capbuf[0] = 0x0000; > > + capbuf[1] = 0x1000; /* set bit 12 */ > > + > > + if (ACPI_SUCCESS(acpi_run_osc(handle, &osc_context))) { > > + acpi_hwp_native_thermal_lvt_set = true; > > + kfree(osc_context.ret.pointer); > > + } > > + > > + return AE_OK; > > +} > > + > > +void acpi_early_processor_set_osc(void) > > +{ > > + if (boot_cpu_has(X86_FEATURE_HWP)) { > > + acpi_walk_namespace(ACPI_TYPE_PROCESSOR, > > ACPI_ROOT_OBJECT, > > +     ACPI_UINT32_MAX, > > +     acpi_set_hwp_native_thermal_lv > > t_osc, > > +     NULL, NULL, NULL); > > + acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, > > +  acpi_set_hwp_native_thermal_lvt_o > > sc, > > +  NULL, NULL); > > + } > > +} > > +#else > > + > > +void acpi_early_processor_set_osc(void) {} > > + > > +#endif > > + > >  /* > >   * The following ACPI IDs are known to be suitable for > > representing as > >   * processor devices. > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c > > index 891c42d..7e73aea 100644 > > --- a/drivers/acpi/bus.c > > +++ b/drivers/acpi/bus.c > > @@ -1005,6 +1005,9 @@ static int __init acpi_bus_init(void) > >   goto error1; > >   } > >   > > + /* Set capability bits for _OSC under processor scope */ > > + acpi_early_processor_set_osc(); > > + > >   /* > >    * _OSC method may exist in module level code, > >    * so it must be run after ACPI_FULL_INITIALIZATION > > diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h > > index 1e6833a..5c787ac 100644 > > --- a/drivers/acpi/internal.h > > +++ b/drivers/acpi/internal.h > > @@ -138,6 +138,8 @@ void acpi_early_processor_set_pdc(void); > >  static inline void acpi_early_processor_set_pdc(void) {} > >  #endif > >   > > +void acpi_early_processor_set_osc(void); > > + > >  /* ------------------------------------------------------------- > > ------------- > >                                    Embedded Controller > >     --------------------------------------------------------------- > > ----------- */ > >