Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753929AbdCOLLU (ORCPT ); Wed, 15 Mar 2017 07:11:20 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:39957 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753877AbdCOLLQ (ORCPT ); Wed, 15 Mar 2017 07:11:16 -0400 Date: Wed, 15 Mar 2017 16:41:04 +0530 From: Gautham R Shenoy To: Nicholas Piggin Cc: "Gautham R. Shenoy" , Michael Ellerman , Michael Neuling , Benjamin Herrenschmidt , "Shreyas B. Prabhu" , Shilpasri G Bhat , Vaidyanathan Srinivasan , Anton Blanchard , Balbir Singh , Akshay Adiga , linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/3] powernv:idle: Don't override default/deepest directly in kernel Reply-To: ego@linux.vnet.ibm.com References: <20170315000543.599ac505@roar.ozlabs.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170315000543.599ac505@roar.ozlabs.ibm.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-TM-AS-GCONF: 00 x-cbid: 17031511-0008-0000-0000-000001C46221 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006786; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000206; SDB=6.00834131; UDB=6.00409601; IPR=6.00611788; BA=6.00005213; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00014654; XFM=3.00000013; UTC=2017-03-15 11:11:11 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17031511-0009-0000-0000-000033D6C9DA Message-Id: <20170315111104.GB23835@in.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-03-15_03:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1702020001 definitions=main-1703150087 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8701 Lines: 240 Hi Nick, Thanks for reviewing the patch. On Wed, Mar 15, 2017 at 12:05:43AM +1000, Nicholas Piggin wrote: > On Mon, 13 Mar 2017 11:31:27 +0530 > "Gautham R. Shenoy" wrote: > > > From: "Gautham R. Shenoy" > > > > Currently during idle-init on power9, if we don't find suitable stop > > states in the device tree that can be used as the > > default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default > > stop state psscr to be used by power9_idle and deepest stop state > > which is used by CPU-Hotplug. > > > > However, if the platform firmware has not configured or enabled a stop > > state, the kernel should not make any assumptions and fallback to a > > default choice. > > > > If the kernel uses a stop state that is not configured by the platform > > firmware, it may lead to further failures which should be avoided. > > > > In this patch, we modify the init code to ensure that the kernel uses > > only the stop states exposed by the firmware through the device > > tree. When a suitable default stop state isn't found, we disable > > ppc_md.power_save for power9. Similarly, when a suitable > > deepest_stop_state is not found in the device tree exported by the > > firmware, fall back to the default busy-wait loop in the CPU-Hotplug > > code. > > Seems reasonable. I have a few comments that you may consider. Nothing > too major. > > Btw., it would be nice to move this hotplug idling selection code to > idle.c. Have the hotplug just ask to enter the best available idle mode > and that's it. I'm not asking you to do that for this series, but perhaps > consider it for the future. That's not a bad idea. I will do it in the respin of the patchset. > > > > diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c > > index 4ee837e..9fde6e4 100644 > > --- a/arch/powerpc/platforms/powernv/idle.c > > +++ b/arch/powerpc/platforms/powernv/idle.c > > @@ -147,7 +147,6 @@ u32 pnv_get_supported_cpuidle_states(void) > > } > > EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states); > > > > - > > static void pnv_fastsleep_workaround_apply(void *info) > > > > { > > @@ -243,6 +242,7 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600, > > */ > > u64 pnv_default_stop_val; > > u64 pnv_default_stop_mask; > > +bool default_stop_found; > > > > /* > > * Used for ppc_md.power_save which needs a function with no parameters > > @@ -264,6 +264,7 @@ static void power9_idle(void) > > */ > > u64 pnv_deepest_stop_psscr_val; > > u64 pnv_deepest_stop_psscr_mask; > > +bool deepest_stop_found; > > > > /* > > * Power ISA 3.0 idle initialization. > > If the hotplug idle code was in idle.c, then all this deepest/default stop > logic and register settings would be static to idle.c, which would be nice. > > If you have a function to check if deepest stop is found, then you don't need > a non-static variable here (or for default_stop_found AFAIKS). Sure! > > > > @@ -352,7 +353,6 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags, > > u32 *residency_ns = NULL; > > u64 max_residency_ns = 0; > > int rc = 0, i; > > - bool default_stop_found = false, deepest_stop_found = false; > > > > psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL); > > psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL); > > @@ -433,20 +433,22 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags, > > } > > > > if (!default_stop_found) { > > - pnv_default_stop_val = PSSCR_HV_DEFAULT_VAL; > > - pnv_default_stop_mask = PSSCR_HV_DEFAULT_MASK; > > - pr_warn("Setting default stop psscr val=0x%016llx,mask=0x%016llx\n", > > + pr_warn("powernv:idle: Default stop not found. Disabling ppcmd.powersave\n"); > > + } else { > > + pr_info("powernv:idle: Default stop: psscr = 0x%016llx,mask=0x%016llx\n", > > pnv_default_stop_val, pnv_default_stop_mask); > > } > > > > if (!deepest_stop_found) { > > - pnv_deepest_stop_psscr_val = PSSCR_HV_DEFAULT_VAL; > > - pnv_deepest_stop_psscr_mask = PSSCR_HV_DEFAULT_MASK; > > - pr_warn("Setting default stop psscr val=0x%016llx,mask=0x%016llx\n", > > + pr_warn("powernv:idle: Deepest stop not found. CPU-Hotplug is affected\n"); > > I guess these warnings are meant for developers, but it might be nice > to make them slightly more meaningful? Prefix of choice seems to be > "cpuidle-powernv:" > > Then you could have "no suitable stop state provided by firmware. Disabling > idle power saving" and "no suitable stop state provided by firmware. Offlined > CPUs will busy-wait", perhaps? How about pr_warn("cpuidle-powernv: No suitable stop for CPU-Hotplug. Offlined CPUs will busy wait\n"); > > Just a suggestion. > > > + } else { > > + pr_info("powernv:idle: Deepest stop: psscr = 0x%016llx,mask=0x%016llx\n", > > pnv_deepest_stop_psscr_val, > > pnv_deepest_stop_psscr_mask); > > } > > > > + pr_info("powernv:idle: RL value of first deep stop = 0x%llx\n", > > + pnv_first_deep_stop_state); > > cpuidle-powernv: prefix for these too? Will fix. > > > out: > > kfree(psscr_val); > > kfree(psscr_mask); > > @@ -454,6 +456,12 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags, > > return rc; > > } > > > > +bool pnv_check_deepest_stop(void) > > +{ > > + return deepest_stop_found; > > +} > > +EXPORT_SYMBOL_GPL(pnv_check_deepest_stop); > > Does this need to be exported? AFAIKS it's not used in a module. No, it is not used in a module. Will get rid of it. > > > + > > /* > > * Probe device tree for supported idle states > > */ > > @@ -526,7 +534,8 @@ static int __init pnv_init_idle_states(void) > > > > if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED) > > ppc_md.power_save = power7_idle; > > - else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST) > > + else if ((supported_cpuidle_states & OPAL_PM_STOP_INST_FAST) && > > + default_stop_found) > > ppc_md.power_save = power9_idle; > > > > out: > > Is this the only use of default_stop_found? & OPAL_PM_STOP_INST_FAST > should always be the same as default_stop_found value, no? In that case > can you just remove the OPAL_PM_STOP_INST_FAST test here? (I like the > boolean and prefer the default idle state selection logic to stay in the > idle init above where you have it commented). Yup. You are right, the check is redundant. We only consider STOP_INST_FAST states for default in power9_idle_init(). Will fix this and move initialization of ppc_md.power_save to the init function. > > > > diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h > > index 6130522..9acd5eb 100644 > > --- a/arch/powerpc/platforms/powernv/powernv.h > > +++ b/arch/powerpc/platforms/powernv/powernv.h > > @@ -18,6 +18,7 @@ static inline void pnv_pci_shutdown(void) { } > > #endif > > > > extern u32 pnv_get_supported_cpuidle_states(void); > > +bool pnv_check_deepest_stop(void); > > extern u64 pnv_deepest_stop_psscr_val; > > extern u64 pnv_deepest_stop_psscr_mask; > > > > diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c > > index 8d5b99e..3f66e6f 100644 > > --- a/arch/powerpc/platforms/powernv/smp.c > > +++ b/arch/powerpc/platforms/powernv/smp.c > > @@ -140,6 +140,7 @@ static void pnv_smp_cpu_kill_self(void) > > unsigned int cpu; > > unsigned long srr1, wmask; > > u32 idle_states; > > + bool deepest_stop_found; > > > > /* Standard hot unplug procedure */ > > local_irq_disable(); > > @@ -155,6 +156,7 @@ static void pnv_smp_cpu_kill_self(void) > > wmask = SRR1_WAKEMASK_P8; > > > > idle_states = pnv_get_supported_cpuidle_states(); > > + deepest_stop_found = pnv_check_deepest_stop(); > > > > /* We don't want to take decrementer interrupts while we are offline, > > * so clear LPCR:PECE1. We keep PECE2 (and LPCR_PECE_HVEE on P9) > > @@ -184,7 +186,11 @@ static void pnv_smp_cpu_kill_self(void) > > > > ppc64_runlatch_off(); > > > > - if (cpu_has_feature(CPU_FTR_ARCH_300)) { > > + if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) { > > + pr_info("CPU %u offlining with psscr = 0x%016llx\n", > > + cpu, pnv_deepest_stop_psscr_val); > > + pr_info("CPU%d down paca pir %016x pir %lx\n", > > + cpu, hard_smp_processor_id(), mfspr(SPRN_PIR)); > > How much log info is appropriate here? This should have been pr_debug. I will clean up this part. > > > srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val, > > pnv_deepest_stop_psscr_mask); > > } else if (idle_states & OPAL_PM_WINKLE_ENABLED) { > -- Thanks and Regards gautham.