Date: Wed, 25 Apr 2018 14:47:01 +0530
From: Viresh Kumar
To: Shilpasri G Bhat
Cc: rjw@rjwysocki.net, npiggin@gmail.com, linux-pm@vger.kernel.org, ppaidipe@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, svaidy@linux.vnet.ibm.com, stable@vger.kernel.org
Subject: Re: [PATCH V2] cpufreq: powernv: Fix the hardlockup by synchronus smp_call in timer interrupt
Message-ID: <20180425091701.kxxslgalw3sdpiym@vireshk-i7>
References: <1524646968-526-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com>
In-Reply-To: <1524646968-526-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 25-04-18, 14:32, Shilpasri G Bhat wrote:
> gpstate_timer_handler() uses synchronous smp_call to set the pstate
> on the requested core. This causes the below hard lockup:
>
> [c000003fe566b320] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable)
> [c000003fe566b390] [c0000000001d55e0] smp_call_function_any+0x180/0x250
> [c000003fe566b3f0] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
> [c000003fe566b4a0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
> [c000003fe566b520] [c0000000001b4958] expire_timers+0x138/0x1f0
> [c000003fe566b590] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
> [c000003fe566b630] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
> [c000003fe566b710] [c000000000114be8] irq_exit+0xe8/0x120
> [c000003fe566b730] [c000000000024d0c] timer_interrupt+0x9c/0xe0
> [c000003fe566b760] [c000000000009014] decrementer_common+0x114/0x120
> -- interrupt: 901 at doorbell_global_ipi+0x34/0x50
> LR = arch_send_call_function_ipi_mask+0x120/0x130
> [c000003fe566ba50] [c00000000004876c] arch_send_call_function_ipi_mask+0x4c/0x130
> [c000003fe566ba90] [c0000000001d59f0] smp_call_function_many+0x340/0x450
> [c000003fe566bb00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
> [c000003fe566bb30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
> [c000003fe566bba0] [c000000000349278] change_protection_range+0xb88/0xe40
> [c000003fe566bcf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
> [c000003fe566bdb0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
> [c000003fe566be30] [c00000000000b184] system_call+0x58/0x6c
>
> One way to avoid this is removing the smp-call. We can ensure that the timer
> always runs on one of the policy-cpus. If the timer gets migrated to a
> cpu outside the policy then re-queue it back on the policy->cpus. This way
> we can get rid of the smp-call which was being used to set the pstate
> on the policy->cpus.
>
> Fixes: 7bc54b652f13 (timers, cpufreq/powernv: Initialize the gpstate timer as pinned)
> Cc: [4.8+]
> Reported-by: Nicholas Piggin
> Reported-by: Pridhiviraj Paidipeddi
> Signed-off-by: Shilpasri G Bhat
> ---
> Changes from V1:
> - Remove smp_call in the pstate handler.
>
>  drivers/cpufreq/powernv-cpufreq.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index 71f8682..dc8ffb5 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -679,6 +679,25 @@ void gpstate_timer_handler(struct timer_list *t)
>
>  	if (!spin_trylock(&gpstates->gpstate_lock))
>  		return;
> +	/*
> +	 * If the timer has migrated to the different cpu then bring
> +	 * it back to one of the policy->cpus
> +	 */
> +	if (!cpumask_test_cpu(raw_smp_processor_id(), policy->cpus)) {
> +		/*
> +		 * Timer should be deleted if policy is inactive.
> +		 * If policy is active then re-queue on one of the
> +		 * policy->cpus.
> +		 */

This looks racy. Shouldn't you guarantee that the timer is already
removed in a synchronous way before de-activating the policy?

> +		if (!cpumask_empty(policy->cpus)) {
> +			gpstates->timer.expires = jiffies +
> +						msecs_to_jiffies(1);
> +			add_timer_on(&gpstates->timer,
> +					cpumask_first(policy->cpus));
> +		}
> +		spin_unlock(&gpstates->gpstate_lock);
> +		return;
> +	}
>
>  	/*
>  	 * If PMCR was last updated was using fast_swtich then
> @@ -718,10 +737,8 @@ void gpstate_timer_handler(struct timer_list *t)
>  	if (gpstate_idx != gpstates->last_lpstate_idx)
>  		queue_gpstate_timer(gpstates);
>
> +	set_pstate(&freq_data);
>  	spin_unlock(&gpstates->gpstate_lock);
> -
> -	/* Timer may get migrated to a different cpu on cpu hot unplug */
> -	smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
>  }
>
>  /*
> --
> 1.8.3.1

--
viresh