Received: by 10.192.165.148 with SMTP id m20csp570914imm; Wed, 25 Apr 2018 04:27:54 -0700 (PDT) X-Google-Smtp-Source: AIpwx4+TdmugDSOVIx4PjE3zPoIxCLLKwQ+Ok0r0fAk9JU3axLO6xGw/f5VlXWnCy5lq1DZMIiOS X-Received: by 10.98.156.152 with SMTP id u24mr27921229pfk.74.1524655674838; Wed, 25 Apr 2018 04:27:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1524655674; cv=none; d=google.com; s=arc-20160816; b=XHpEpvs5Mo9k4OjNTRZFk90DpZyuMddOgRJlKRNRrleNcWR89IVyt2oYWgO8YDUM09 +CI0slCmgena1uP4uLdMm0EtVFRd/pGcslkQ1f8HMO0hZIh0jgFb3HWLBbBQO8LWPaEW HdCDMXk8bd6BzEc2hIsF6i+yxu6oQ57F8mAsIKgIPLvD2X0TNVzpTqndqzx1SwMmbLTY IIFxDpR5HdxEc4lpRFQVFoJEUcPNQaICim9M9k3/Yxs7LdyzPThYgwJ3NqPIe81Y/PAa DQ+JRHLjHvXIjkAmqUcWPD2h9Jy5M28HZiu1hD14eS3oOLOIFVEeXO8rU+FoiM6u9GOm 88cA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:dkim-signature:arc-authentication-results; bh=A28nSBMoA+TxMsW/RVxKESXCP2raCtwjGUJbfkMs1Hs=; b=tvibS5FCHtsO2YaS9rhxMt+AhqdSGRAPTwmK8niIG2ilku3wUQ2NMsip8qtdVvWVp/ aE6co5n0sfiip+F87WTck1Uc/o5qHbmNMC6kKkRT+mLuC488afYA+ZHxwndmJXC86zzq fV0IPzBnmcGXVmbRBvG86JAVkdpQTniCtSmMvK8pEoPqMjOVhTHk7rQfTymDMzlcWV7S yzjXS9tzhLqvT4KCgaDZpGonkFr97lzmV7uH81GyM+TCo4brGhoQr7pDVf45QzVaX8xi cjziRsvizq2eBCoTCCj57g2damdgnrfGqb5mM4tq8kzylZ3GU+xVj0iky5bQdLaKB2yJ 7IGg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=LgjgujKk; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e63si10936282pfd.261.2018.04.25.04.27.40; Wed, 25 Apr 2018 04:27:54 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=LgjgujKk; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753720AbeDYL0j (ORCPT + 99 others); Wed, 25 Apr 2018 07:26:39 -0400 Received: from mail-pf0-f195.google.com ([209.85.192.195]:39678 "EHLO mail-pf0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752156AbeDYL0b (ORCPT ); Wed, 25 Apr 2018 07:26:31 -0400 Received: by mail-pf0-f195.google.com with SMTP id z9so14956703pfe.6; Wed, 25 Apr 2018 04:26:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:in-reply-to:references :organization:mime-version:content-transfer-encoding; bh=A28nSBMoA+TxMsW/RVxKESXCP2raCtwjGUJbfkMs1Hs=; b=LgjgujKkG3ODuCk3iT99Kg/Ilvr3Aib0GV70HbwDctQXcn7oss1aLAdRWAuZdgd3gG uXzMonGsTihXVJ7nuwdXv0LZnMxvoeU2wIYpGSoAZ1Uw0+dpi+GlKbwCFwPI7ueOQS+V Z8+n19pH2uQiZBv2tM9QgrNew/lCgX/y2fhRgLtu1MAoEhvjLusTrIuUWVDzDPtqwD5c xmAwTPbVeFhG8QRWYSD51vSRM1d3bG50w+drhFxdxAdPzJXKbybyvWm1ucJaSByhDl5A qJ4xMiFE0n139Wa1rfS2+COlOKzN04fmugPkRI5YgHzjw66XMrLhhPByiq2U8iCmzPrs KA3Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:organization:mime-version:content-transfer-encoding; bh=A28nSBMoA+TxMsW/RVxKESXCP2raCtwjGUJbfkMs1Hs=; b=Ja+11ui7R9l0WcXsp42Sc6TLhbEVaZfqwmiVv1EOJBGi/HbQTTk0wXFeOqFIYfi29y KKTK891IsJMrQhHRXjcvLuPbU6bDR6JlFfQ1wknH3KoOmjjG+sLfvTmT8SWhbByOAEkQ IUU/z7sm7c8kkSPgzMWvEI21+B0AVOcsKHSWBCQuWqgy1KhdUHognlweAoETF+y3OYfP hOYt2hPxxBLttfuJRu3q7HfO/oDDv3dfR9KwHNBW3J40bYpoeqizGzXRay9YkXd5k0gu BQoYbkpDIVQ8FI0I4jj7MnceML6BFxFjHSeMKES3bs2s43UFpAamzWACiAlVo8T2RutF RHTQ== X-Gm-Message-State: ALQs6tBOqCKzf0RARMOqOrk6HMlmeLuPgTEY8umiyYE+ooMGlgY1/1FZ HZ1tDOhI84PRd60vG3o9aPg= X-Received: by 10.99.136.73 with SMTP id l70mr23956364pgd.49.1524655590399; Wed, 25 Apr 2018 04:26:30 -0700 (PDT) Received: from roar.ozlabs.ibm.com (59-102-70-78.tpgi.com.au. [59.102.70.78]) by smtp.gmail.com with ESMTPSA id e73sm47450592pfj.186.2018.04.25.04.26.24 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 25 Apr 2018 04:26:29 -0700 (PDT) Date: Wed, 25 Apr 2018 21:26:13 +1000 From: Nicholas Piggin To: Shilpasri G Bhat Cc: rjw@rjwysocki.net, viresh.kumar@linaro.org, benh@kernel.crashing.org, mpe@ellerman.id.au, linux-pm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, ppaidipe@linux.vnet.ibm.com, svaidy@linux.vnet.ibm.com, Subject: Re: [PATCH V3] cpufreq: powernv: Fix the hardlockup by synchronus smp_call in timer interrupt Message-ID: <20180425212613.1fc2c468@roar.ozlabs.ibm.com> In-Reply-To: <1524653971-4919-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com> References: <1524653971-4919-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com> Organization: IBM X-Mailer: Claws Mail 3.16.0 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 25 Apr 2018 16:29:31 +0530 Shilpasri G Bhat wrote: > gpstate_timer_handler() uses synchronous smp_call to set the pstate > on the requested core. This causes the below hard lockup: > > [c000003fe566b320] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable) > [c000003fe566b390] [c0000000001d55e0] smp_call_function_any+0x180/0x250 > [c000003fe566b3f0] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 > [c000003fe566b4a0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 > [c000003fe566b520] [c0000000001b4958] expire_timers+0x138/0x1f0 > [c000003fe566b590] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 > [c000003fe566b630] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 > [c000003fe566b710] [c000000000114be8] irq_exit+0xe8/0x120 > [c000003fe566b730] [c000000000024d0c] timer_interrupt+0x9c/0xe0 > [c000003fe566b760] [c000000000009014] decrementer_common+0x114/0x120 > -- interrupt: 901 at doorbell_global_ipi+0x34/0x50 > LR = arch_send_call_function_ipi_mask+0x120/0x130 > [c000003fe566ba50] [c00000000004876c] > arch_send_call_function_ipi_mask+0x4c/0x130 > [c000003fe566ba90] [c0000000001d59f0] smp_call_function_many+0x340/0x450 > [c000003fe566bb00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 > [c000003fe566bb30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 > [c000003fe566bba0] [c000000000349278] change_protection_range+0xb88/0xe40 > [c000003fe566bcf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 > [c000003fe566bdb0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 > [c000003fe566be30] [c00000000000b184] system_call+0x58/0x6c > > One way to avoid this is removing the smp-call. We can ensure that the timer > always runs on one of the policy-cpus. If the timer gets migrated to a > cpu outside the policy then re-queue it back on the policy->cpus. This way > we can get rid of the smp-call which was being used to set the pstate > on the policy->cpus. > > Fixes: 7bc54b652f13 (timers, cpufreq/powernv: Initialize the gpstate timer as pinned) > Cc: [4.8+] > Reported-by: Nicholas Piggin > Reported-by: Pridhiviraj Paidipeddi > Signed-off-by: Shilpasri G Bhat Thanks, this looks good to me. I don't know the code though, so Acked-by: Nicholas Piggin > --- > Changes from V2: > - Remove the check for active policy while requeing the migrated timer > Changes from V1: > - Remove smp_call in the pstate handler. > > drivers/cpufreq/powernv-cpufreq.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c > index 71f8682..e368e1f 100644 > --- a/drivers/cpufreq/powernv-cpufreq.c > +++ b/drivers/cpufreq/powernv-cpufreq.c > @@ -679,6 +679,16 @@ void gpstate_timer_handler(struct timer_list *t) > > if (!spin_trylock(&gpstates->gpstate_lock)) > return; I still think it would be good to do something about the trylock failure. It may be rare, but if it happens it could stop the timer and lead to some rare unpredictable behaviour? Not for this patch, but while you're looking at the code it would be good to consider it. Just queueing up another timer seems like it should be enough. > + /* > + * If the timer has migrated to the different cpu then bring > + * it back to one of the policy->cpus > + */ > + if (!cpumask_test_cpu(raw_smp_processor_id(), policy->cpus)) { > + gpstates->timer.expires = jiffies + msecs_to_jiffies(1); > + add_timer_on(&gpstates->timer, cpumask_first(policy->cpus)); > + spin_unlock(&gpstates->gpstate_lock); > + return; > + } Really small nitpick, but you could use cpumask_any there. Thanks, Nick > > /* > * If PMCR was last updated was using fast_swtich then > @@ -718,10 +728,8 @@ void gpstate_timer_handler(struct timer_list *t) > if (gpstate_idx != gpstates->last_lpstate_idx) > queue_gpstate_timer(gpstates); > > + set_pstate(&freq_data); > spin_unlock(&gpstates->gpstate_lock); > - > - /* Timer may get migrated to a different cpu on cpu hot unplug */ > - smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1); > }