Date: Tue, 24 Apr 2018 17:31:47 +1000
From: Nicholas Piggin
To: Shilpasri G Bhat
Cc: rjw@rjwysocki.net, viresh.kumar@linaro.org, benh@kernel.crashing.org, mpe@ellerman.id.au, linux-pm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, ppaidipe@linux.vnet.ibm.com, svaidy@linux.vnet.ibm.com
Subject: Re: [PATCH] cpufreq: powernv: Fix the hardlockup by synchronus smp_call in timer interrupt
Message-ID: <20180424173147.7bcd86c5@roar.ozlabs.ibm.com>
References: <1524544906-31512-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com> <20180424160034.6e9d2274@roar.ozlabs.ibm.com>
Organization: IBM

On Tue, 24 Apr 2018 12:47:32 +0530
Shilpasri G Bhat wrote:

> Hi,
>
> On 04/24/2018 11:30 AM, Nicholas Piggin wrote:
> > On Tue, 24 Apr 2018 10:11:46 +0530
> > Shilpasri G Bhat wrote:
> >
> >> gpstate_timer_handler() uses synchronous smp_call to set the pstate
> >> on the requested core. This causes the below hard lockup:
> >>
> >> [c000003fe566b320] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable)
> >> [c000003fe566b390] [c0000000001d55e0] smp_call_function_any+0x180/0x250
> >> [c000003fe566b3f0] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
> >> [c000003fe566b4a0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
> >> [c000003fe566b520] [c0000000001b4958] expire_timers+0x138/0x1f0
> >> [c000003fe566b590] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
> >> [c000003fe566b630] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
> >> [c000003fe566b710] [c000000000114be8] irq_exit+0xe8/0x120
> >> [c000003fe566b730] [c000000000024d0c] timer_interrupt+0x9c/0xe0
> >> [c000003fe566b760] [c000000000009014] decrementer_common+0x114/0x120
> >> --- interrupt: 901 at doorbell_global_ipi+0x34/0x50
> >>     LR = arch_send_call_function_ipi_mask+0x120/0x130
> >> [c000003fe566ba50] [c00000000004876c] arch_send_call_function_ipi_mask+0x4c/0x130 (unreliable)
> >> [c000003fe566ba90] [c0000000001d59f0] smp_call_function_many+0x340/0x450
> >> [c000003fe566bb00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
> >> [c000003fe566bb30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
> >> [c000003fe566bba0] [c000000000349278] change_protection_range+0xb88/0xe40
> >> [c000003fe566bcf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
> >> [c000003fe566bdb0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
> >> [c000003fe566be30] [c00000000000b184] system_call+0x58/0x6c
> >>
> >> Fix this by using the asynchronus smp_call in the timer interrupt handler.
> >> We don't have to wait in this handler until the pstates are changed on
> >> the core. This change will not have any impact on the global pstate
> >> ramp-down algorithm.
> >>
> >> Reported-by: Nicholas Piggin
> >> Reported-by: Pridhiviraj Paidipeddi
> >> Signed-off-by: Shilpasri G Bhat
> >> ---
> >>  drivers/cpufreq/powernv-cpufreq.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> >> index 0591874..7e0c752 100644
> >> --- a/drivers/cpufreq/powernv-cpufreq.c
> >> +++ b/drivers/cpufreq/powernv-cpufreq.c
> >> @@ -721,7 +721,7 @@ void gpstate_timer_handler(struct timer_list *t)
> >>  	spin_unlock(&gpstates->gpstate_lock);
> >>  
> >>  	/* Timer may get migrated to a different cpu on cpu hot unplug */
> >> -	smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
> >> +	smp_call_function_any(policy->cpus, set_pstate, &freq_data, 0);
> >>  }
> >>  
> >>  /*
> >
> > This can still deadlock because !wait case still ends up having to wait
> > if another !wait smp_call_function caller had previously used the
> > call single data for this cpu.
> >
> > If you go this way you would have to use smp_call_function_async, which
> > is more work.
> >
> > As a rule it would be better to avoid smp_call_function entirely if
> > possible. Can you ensure the timer is running on the right CPU? Use
> > add_timer_on and try again if the timer is on the wrong CPU, perhaps?
> >
>
> Yeah that is doable we can check for the cpu and re-queue it. We will only
> ramp-down slower in that case which is no harm.

Great, I'd be much happier avoiding that IPI. I guess it should happen
quite rarely that we have to queue on a different CPU.

I would say just do add_timer unless we have migrated to the wrong CPU,
then do add_timer_on in that case (it's a bit slower).

> (If the targeted core turns out to be offline then we will not queue the timer
> again as we would have already set the pstate to min in the cpu-down path.)

Something I noticed is that if we cannot get the lock (trylock fails),
then the timer does not get queued again. Should it?

Thanks,
Nick
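The per-expiry decision discussed in this thread (stay local when already on one of policy->cpus, otherwise re-queue on a policy CPU with add_timer_on instead of sending an IPI; stop when the core is offline) can be sketched as a small user-space model. This is a minimal sketch under stated assumptions, not the driver code: it assumes a cpumask fits in an unsigned long, it assumes the answer to the trylock question is "yes, re-arm", and every name in it (gpstate_decide, first_cpu, cpu_in_mask, GPSTATE_*) is hypothetical. It models only the decision; it does not call spin_trylock(), add_timer() or add_timer_on().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum gpstate_action {
	GPSTATE_STOP,       /* no CPU left in the policy (core offline): do not re-queue */
	GPSTATE_REQUEUE_ON, /* running on the wrong CPU: re-arm on .cpu (add_timer_on), no IPI */
	GPSTATE_RETRY,      /* trylock failed: re-arm locally (add_timer) instead of dropping it */
	GPSTATE_RUN_LOCAL,  /* right CPU and lock held: set the pstate directly, no IPI */
};

struct gpstate_decision {
	enum gpstate_action action;
	int cpu; /* CPU to run on or re-queue to; -1 for GPSTATE_STOP */
};

#define MASK_BITS ((int)(CHAR_BIT * sizeof(unsigned long)))

/* Lowest set bit of mask, or -1 if the mask is empty. */
static int first_cpu(unsigned long mask)
{
	for (int cpu = 0; cpu < MASK_BITS; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return -1;
}

static bool cpu_in_mask(unsigned long mask, int cpu)
{
	return cpu >= 0 && cpu < MASK_BITS && (mask & (1UL << cpu));
}

/*
 * policy_cpus: bitmask standing in for policy->cpus (assumption).
 * this_cpu:    CPU on which the timer callback fired.
 * got_lock:    result of the modelled trylock on gpstate_lock.
 */
static struct gpstate_decision gpstate_decide(unsigned long policy_cpus,
					      int this_cpu, bool got_lock)
{
	struct gpstate_decision d = { GPSTATE_STOP, -1 };

	assert(this_cpu >= 0);

	if (policy_cpus == 0)
		return d; /* cpu-down path already set the pstate to min */

	if (!cpu_in_mask(policy_cpus, this_cpu)) {
		/* Migrated: move the timer back rather than IPI the core. */
		d.action = GPSTATE_REQUEUE_ON;
		d.cpu = first_cpu(policy_cpus);
		return d;
	}

	d.cpu = this_cpu;
	d.action = got_lock ? GPSTATE_RUN_LOCAL : GPSTATE_RETRY;
	return d;
}
```

For example, with policy_cpus = 0x3 a callback that fires on CPU 5 yields GPSTATE_REQUEUE_ON with cpu 0, so the ramp-down continues on the core by re-arming the timer there, at the cost of one extra timer period, rather than through smp_call_function_any().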