Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755543AbcCRSck (ORCPT ); Fri, 18 Mar 2016 14:32:40 -0400 Received: from mga02.intel.com ([134.134.136.20]:18030 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753404AbcCRScf convert rfc822-to-8bit (ORCPT ); Fri, 18 Mar 2016 14:32:35 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,356,1455004800"; d="scan'208";a="672026751" Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\)) Subject: Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c From: Stephane Gasparini In-Reply-To: <1458323535.3861.76.camel@linux.intel.com> Date: Fri, 18 Mar 2016 19:32:31 +0100 Cc: "Rafael J. Wysocki" , Josh Boyer , Philippe Longepe , Len Brown , Viresh Kumar , Linux PM list , "Linux-Kernel@Vger. Kernel. Org" Content-Transfer-Encoding: 8BIT Message-Id: <2FBA4108-2F11-438D-B521-62B577967B94@linux.intel.com> References: <10670722.U2p4NnYGsS@vostro.rjw.lan> <6130650.QZM8FIW1Fl@vostro.rjw.lan> <1458323535.3861.76.camel@linux.intel.com> To: Srinivas Pandruvada X-Mailer: Apple Mail (2.3112) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 14090 Lines: 408 — Steph > On Mar 18, 2016, at 6:52 PM, Srinivas Pandruvada wrote: > > On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote: >> Rafael, >> >> Why in step 3) both atom_set_pstate() and atom_set_pstate() were not >> both >> changed to use wrmsrl ? > Initial Atom support was experimental as there were no users, till > Chrome started using. So it was just a miss. > > We should never have to use wrmsrl_on_cpu. But looks like > cpufreq_driver.init() can't guarantee that. > >> BTW, what is the interest of setting the pstate to LFM during >> initialization ? >> The BIOS is setting the pstate to either LFM, HFM or BFM, and why >> bothering >> changing it. > This is a different issue. BIOS has different configuration option to > enable fast boot modes which are not necessarily optimized for Linux. > Some aggressive setting will force system to reboot on boot. So I will > leave the way it is. > > Thanks, > Srinivas > Still it does not answer my question, why when implementing a4675fbc4a7a we did changed core for wrmsrl and not atom ? My point is that the issue was more due to a miss in the patch a4675fbc4a7a rather than a difference of behavior between atom and core. The commit message is a bit misleading around this. The wrmrl_on_cpu() is needed on both core and atom during init. >> >> — >> Steph >> >> >> >> >>> >>> On Mar 18, 2016, at 3:36 PM, Rafael J. Wysocki >>> wrote: >>> >>> On Friday, March 18, 2016 08:37:15 AM Josh Boyer wrote: >>>> >>>> On Thu, Mar 17, 2016 at 8:20 PM, Rafael J. Wysocki >>> .net> wrote: >>>>> >>>>> On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote: >>>>>> >>>>>> On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki >>>>> socki.net> wrote: >>>>>>> >>>>>>> On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote: >>>>>>>> >>>>>>>> Hello, >>>>>>> Hi, >>>>>>> >>>>>>>> >>>>>>>> I have an Intel Atom based NUC that is producing the >>>>>>>> following >>>>>>>> backtraces on boot of Linus' tree as of last >>>>>>>> evening. This does not >>>>>>>> happen with a tree with top level commit 271ecc5253e2, >>>>>>>> but does happen >>>>>>>> when using the tree mentioned in the subject with top >>>>>>>> level commit >>>>>>>> 63e30271b04c. >>>>>>>> >>>>>>>> The first backtrace appears to be a warning because the >>>>>>>> intel_pstate >>>>>>>> driver is calling wrmsrl_on_cpu when interrupts are >>>>>>>> disabled? Not >>>>>>>> sure on that one. >>>>>>>> >>>>>>>> The second backtrace is a lockdep report. Both are from >>>>>>>> the same boot. >>>>>>> OK, thanks for the report. >>>>>>> >>>>>>> Can you please try the patch below? >>>>>>> >>>>>>> I'm actually unsure if we can do that safely in general for >>>>>>> Atom because >>>>>>> of the initialization, but that's what Core does anyway. >>>>>>> >>>>>>> Srinivas, Philippe, why exactly do we need the >>>>>>> wrmsrl_on_cpu() in >>>>>>> atom_set_pstate()? core_set_pstate() uses wrmsrl() and >>>>>>> seems to be doing fine. >>>>>>> >>>>>>> --- >>>>>>> drivers/cpufreq/intel_pstate.c | 2 +- >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>> >>>>>>> Index: linux-pm/drivers/cpufreq/intel_pstate.c >>>>>>> =========================================================== >>>>>>> ======== >>>>>>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c >>>>>>> +++ linux-pm/drivers/cpufreq/intel_pstate.c >>>>>>> @@ -587,7 +587,7 @@ static void atom_set_pstate(struct >>>>>>> cpuda >>>>>>> >>>>>>> val |= vid; >>>>>>> >>>>>>> - wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, >>>>>>> val); >>>>>>> + wrmsrl(MSR_IA32_PERF_CTL, val); >>>>>>> } >>>>>>> >>>>>>> static int silvermont_get_scaling(void) >>>>>>> >>>>>> I applied this on top of commit 09fd671ccb24 and the >>>>>> backtrace and >>>>>> lockdep report both go away. So yes, this seems to clear up >>>>>> the >>>>>> issue. I tested it on a variety of different CPU types and >>>>>> didn't >>>>>> notice anything wrong on them either. >>>>> The problems may show up during initialization and cleanup >>>>> where one CPU >>>>> may be running code trying to configure a different one. In >>>>> those cases >>>>> wrmsrl_on_cpu() needs to be used. >>>>> >>>>> Let me cut a patch taking that into account. >>>> OK. Happy to test when you have it ready. >>> Thanks! >>> >>> Please test the patch below. >>> >>> --- >>> From: Rafael J. Wysocki >>> Subject: [PATCH] intel_pstate: Do not call wrmsrl_on_cpu() with >>> disabled interrupts >>> >>> After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers >>> with >>> utilization update callbacks) wrmsrl_on_cpu() cannot be called in >>> the >>> intel_pstate_adjust_busy_pstate() path as that is executed with >>> disabled interrupts. However, atom_set_pstate() called from there >>> via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the >>> IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in >>> smp_call_function_single(). >>> >>> The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is >>> because intel_pstate_set_pstate() calling it is also invoked during >>> the initialization and cleanup of the driver and in those cases it >>> is >>> not guaranteed to be run on the CPU that is being >>> updated. However, >>> in the case when intel_pstate_set_pstate() is called by >>> intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update >>> the register safely. Moreover, intel_pstate_set_pstate() already >>> contains code that only is executed if the function is called by >>> intel_pstate_adjust_busy_pstate() and there is a special argument >>> passed to it because of that. >>> >>> To fix the problem at hand, rearrange the code taking the above >>> observations into account. >>> >>> First, replace the ->set() callback in struct pstate_funcs with a >>> ->get_val() one that will return the value to be written to the >>> IA32_PERF_CTL MSR without updating the register. >>> >>> Second, split intel_pstate_set_pstate() into two functions, >>> intel_pstate_update_pstate() to be called by >>> intel_pstate_adjust_busy_pstate() that will contain all of the >>> intel_pstate_set_pstate() code which only needs to be executed in >>> that case and will use wrmsrl() to update the MSR (after obtaining >>> the value to write to it from the ->get_val() callback), and >>> intel_pstate_set_min_pstate() to be invoked during the >>> initialization and cleanup that will set the P-state to the >>> minimum one and will update the MSR using wrmsrl_on_cpu(). >>> >>> Finally, move the code shared between intel_pstate_update_pstate() >>> and intel_pstate_set_min_pstate() to a new static inline function >>> intel_pstate_record_pstate() and make them both call it. >>> >>> Signed-off-by: Rafael J. Wysocki >>> Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with >>> utilization update callbacks) >>> --- >>> drivers/cpufreq/intel_pstate.c | 73 ++++++++++++++++++++++++----- >>> ------------ >>> 1 file changed, 43 insertions(+), 30 deletions(-) >>> >>> Index: linux-pm/drivers/cpufreq/intel_pstate.c >>> =================================================================== >>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c >>> +++ linux-pm/drivers/cpufreq/intel_pstate.c >>> @@ -134,7 +134,7 @@ struct pstate_funcs { >>> int (*get_min)(void); >>> int (*get_turbo)(void); >>> int (*get_scaling)(void); >>> - void (*set)(struct cpudata*, int pstate); >>> + u64 (*get_val)(struct cpudata*, int pstate); >>> void (*get_vid)(struct cpudata *); >>> int32_t (*get_target_pstate)(struct cpudata *); >>> }; >>> @@ -565,7 +565,7 @@ static int atom_get_turbo_pstate(void) >>> return value & 0x7F; >>> } >>> >>> -static void atom_set_pstate(struct cpudata *cpudata, int pstate) >>> +static u64 atom_get_val(struct cpudata *cpudata, int pstate) >>> { >>> u64 val; >>> int32_t vid_fp; >>> @@ -585,9 +585,7 @@ static void atom_set_pstate(struct cpuda >>> if (pstate > cpudata->pstate.max_pstate) >>> vid = cpudata->vid.turbo; >>> >>> - val |= vid; >>> - >>> - wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val); >>> + return val | vid; >>> } >>> >>> static int silvermont_get_scaling(void) >>> @@ -711,7 +709,7 @@ static inline int core_get_scaling(void) >>> return 100000; >>> } >>> >>> -static void core_set_pstate(struct cpudata *cpudata, int pstate) >>> +static u64 core_get_val(struct cpudata *cpudata, int pstate) >>> { >>> u64 val; >>> >>> @@ -719,7 +717,7 @@ static void core_set_pstate(struct cpuda >>> if (limits->no_turbo && !limits->turbo_disabled) >>> val |= (u64)1 << 32; >>> >>> - wrmsrl(MSR_IA32_PERF_CTL, val); >>> + return val; >>> } >>> >>> static int knl_get_turbo_pstate(void) >>> @@ -750,7 +748,7 @@ static struct cpu_defaults core_params = >>> .get_min = core_get_min_pstate, >>> .get_turbo = core_get_turbo_pstate, >>> .get_scaling = core_get_scaling, >>> - .set = core_set_pstate, >>> + .get_val = core_get_val, >>> .get_target_pstate = get_target_pstate_use_performance, >>> }, >>> }; >>> @@ -769,7 +767,7 @@ static struct cpu_defaults silvermont_pa >>> .get_max_physical = atom_get_max_pstate, >>> .get_min = atom_get_min_pstate, >>> .get_turbo = atom_get_turbo_pstate, >>> - .set = atom_set_pstate, >>> + .get_val = atom_get_val, >>> .get_scaling = silvermont_get_scaling, >>> .get_vid = atom_get_vid, >>> .get_target_pstate = get_target_pstate_use_cpu_load, >>> @@ -790,7 +788,7 @@ static struct cpu_defaults airmont_param >>> .get_max_physical = atom_get_max_pstate, >>> .get_min = atom_get_min_pstate, >>> .get_turbo = atom_get_turbo_pstate, >>> - .set = atom_set_pstate, >>> + .get_val = atom_get_val, >>> .get_scaling = airmont_get_scaling, >>> .get_vid = atom_get_vid, >>> .get_target_pstate = get_target_pstate_use_cpu_load, >>> @@ -812,7 +810,7 @@ static struct cpu_defaults knl_params = >>> .get_min = core_get_min_pstate, >>> .get_turbo = knl_get_turbo_pstate, >>> .get_scaling = core_get_scaling, >>> - .set = core_set_pstate, >>> + .get_val = core_get_val, >>> .get_target_pstate = get_target_pstate_use_performance, >>> }, >>> }; >>> @@ -839,25 +837,24 @@ static void intel_pstate_get_min_max(str >>> *min = clamp_t(int, min_perf, cpu->pstate.min_pstate, >>> max_perf); >>> } >>> >>> -static void intel_pstate_set_pstate(struct cpudata *cpu, int >>> pstate, bool force) >>> +static inline void intel_pstate_record_pstate(struct cpudata *cpu, >>> int pstate) >>> { >>> - int max_perf, min_perf; >>> - >>> - if (force) { >>> - update_turbo_state(); >>> - >>> - intel_pstate_get_min_max(cpu, &min_perf, >>> &max_perf); >>> - >>> - pstate = clamp_t(int, pstate, min_perf, max_perf); >>> - >>> - if (pstate == cpu->pstate.current_pstate) >>> - return; >>> - } >>> trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu); >>> - >>> cpu->pstate.current_pstate = pstate; >>> +} >>> >>> - pstate_funcs.set(cpu, pstate); >>> +static void intel_pstate_set_min_pstate(struct cpudata *cpu) >>> +{ >>> + int pstate = cpu->pstate.min_pstate; >>> + >>> + intel_pstate_record_pstate(cpu, pstate); >>> + /* >>> + * Generally, there is no guarantee that this code will >>> always run on >>> + * the CPU being updated, so force the register update to >>> run on the >>> + * right CPU. >>> + */ >>> + wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL, >>> + pstate_funcs.get_val(cpu, pstate)); >>> } >>> >>> static void intel_pstate_get_cpu_pstates(struct cpudata *cpu) >>> @@ -870,7 +867,8 @@ static void intel_pstate_get_cpu_pstates >>> >>> if (pstate_funcs.get_vid) >>> pstate_funcs.get_vid(cpu); >>> - intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, >>> false); >>> + >>> + intel_pstate_set_min_pstate(cpu); >>> } >>> >>> static inline void intel_pstate_calc_busy(struct cpudata *cpu) >>> @@ -997,6 +995,21 @@ static inline int32_t get_target_pstate_ >>> return cpu->pstate.current_pstate - pid_calc(&cpu->pid, >>> core_busy); >>> } >>> >>> +static inline void intel_pstate_update_pstate(struct cpudata *cpu, >>> int pstate) >>> +{ >>> + int max_perf, min_perf; >>> + >>> + update_turbo_state(); >>> + >>> + intel_pstate_get_min_max(cpu, &min_perf, &max_perf); >>> + pstate = clamp_t(int, pstate, min_perf, max_perf); >>> + if (pstate == cpu->pstate.current_pstate) >>> + return; >>> + >>> + intel_pstate_record_pstate(cpu, pstate); >>> + wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, >>> pstate)); >>> +} >>> + >>> static inline void intel_pstate_adjust_busy_pstate(struct cpudata >>> *cpu) >>> { >>> int from, target_pstate; >>> @@ -1006,7 +1019,7 @@ static inline void intel_pstate_adjust_b >>> >>> target_pstate = pstate_funcs.get_target_pstate(cpu); >>> >>> - intel_pstate_set_pstate(cpu, target_pstate, true); >>> + intel_pstate_update_pstate(cpu, target_pstate); >>> >>> sample = &cpu->sample; >>> trace_pstate_sample(fp_toint(sample->core_pct_busy), >>> @@ -1180,7 +1193,7 @@ static void intel_pstate_stop_cpu(struct >>> if (hwp_active) >>> return; >>> >>> - intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, >>> false); >>> + intel_pstate_set_min_pstate(cpu); >>> } >>> >>> static int intel_pstate_cpu_init(struct cpufreq_policy *policy) >>> @@ -1255,7 +1268,7 @@ static void copy_cpu_funcs(struct pstate >>> pstate_funcs.get_min = funcs->get_min; >>> pstate_funcs.get_turbo = funcs->get_turbo; >>> pstate_funcs.get_scaling = funcs->get_scaling; >>> - pstate_funcs.set = funcs->set; >>> + pstate_funcs.get_val = funcs->get_val; >>> pstate_funcs.get_vid = funcs->get_vid; >>> pstate_funcs.get_target_pstate = funcs->get_target_pstate; >>> >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-pm" >>> in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-pm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html