Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp902840pxb; Wed, 27 Oct 2021 14:50:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJw43cLXgYEzooYTyiKQXCx6nv+18MXxGzQceiwjOJSuBHCSgPpHRmTOzS9JiAIbo8bZwUjY X-Received: by 2002:a63:b04c:: with SMTP id z12mr229873pgo.371.1635371423712; Wed, 27 Oct 2021 14:50:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635371423; cv=none; d=google.com; s=arc-20160816; b=ei1sPyqLzKMYI4m4hQmQZYt0HANI4WLPqNYuUaEEtT8UmbrW+6/ANXHlfNw1MhfBkL h1ODLm3wh1FxCVcVgb/gIo8nFYMHlwR+q+oDEf8ud2V008qXw0S18Tlrcx/fLbago0iB ot2zcv6N/YK88Sm4IvxvJcXBMuM3vNRUj47xVXxSOcSUAT60CgqSvPy0OYBBl7AYVXUQ 8iDrm73ceomi/DIXOUi24HuSYb5k06dplLUIklhBzIA1IWOamOIQz7UnPdSnEABUg0J0 Ozy2S+Q0X1yHlc4qmU7vtzGzy8vVJRchVoKBmo1P17gpzZDWuf2zTS8MrQplpYBRBcu9 EMMg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=REmhyH977aSmduvzyGxYccWFWjTuSZ1JicmOk/oQceU=; b=PwGavYl5V6v81QBRL5f6t35OATDeDJfin7S3rj/YfXw6ayECyAbOWalXomcnz8rZv0 5IrOPGmRXH+SJ0NErtoaQyQse+w/tKPjh/HPjW8VB855gIXrNBtUO9LOyt79Na7l5j+H rYfQjlycCwXwDuCKpk8HKwMb+mVcRHfyHulJz7Fb4RvccWFKWAHd0HA6pqKZMgUZ7UBb Pu28ysU94IJhdukc9yk6iID+QvotN5PKT37FksffJT8/uvk/1U0L3L6ngTNBx0KURpyD fuLd7WgruJmwze0+jveW9X8zZtlXu8fKHYB+PJDa2TaCzcmgsX/Yblx50uGNU9b+bAn3 7QgA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=MKMYT2z6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id i20si1508647pgk.180.2021.10.27.14.50.10; Wed, 27 Oct 2021 14:50:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=MKMYT2z6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238796AbhJ0IR5 (ORCPT + 99 others); Wed, 27 Oct 2021 04:17:57 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:21750 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S238821AbhJ0IRz (ORCPT ); Wed, 27 Oct 2021 04:17:55 -0400 Received: from pps.filterd (m0098414.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 19R6NiM8010667; Wed, 27 Oct 2021 08:15:06 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=REmhyH977aSmduvzyGxYccWFWjTuSZ1JicmOk/oQceU=; b=MKMYT2z6MlwPcZHvZSd5i+XWdYQXu3GGx1slmn2CRaDPz4Bzu3sRGEVhSrIpOdEsNin8 ojkT0fdQ6Opkpq8JF8p9hg6Q/wj4++owaSjPDoEh16iO8Kg4HfJtWjqpkABK2jnUVCZr 2z9hJmY5LRTy1Y2GW18mtiZ5oCkoHj5M+4SKDdk79G6HNVUXVvIi/5322hH0AXOoUSR6 DejcbJTrdpw3wsjWVSFRQ1r5t6aoOp3sgSPMJ43Jt+CJhDo6amcZns/9Mm7QCFG7a34M iZjar7Ke5PF8WWt0SNkU2ss4ZYz+G2ClZcAbzP4SsSYttlbJvabJ0/Hp30rplXmppc9I 6Q== Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 3by1f8218e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 27 Oct 2021 08:15:05 +0000 Received: from m0098414.ppops.net (m0098414.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 19R7eHIs025849; Wed, 27 Oct 2021 08:15:05 GMT Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 3by1f8216t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 27 Oct 2021 08:15:05 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 19R8BnFI009930; Wed, 27 Oct 2021 08:15:01 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma04ams.nl.ibm.com with ESMTP id 3bx4edvsqt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 27 Oct 2021 08:15:00 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 19R8Ew8S53477732 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 27 Oct 2021 08:14:58 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0972DAE056; Wed, 27 Oct 2021 08:14:58 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BFE69AE045; Wed, 27 Oct 2021 08:14:57 +0000 (GMT) Received: from pomme.local (unknown [9.145.77.240]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 27 Oct 2021 08:14:57 +0000 (GMT) Subject: Re: [PATCH 1/2] powerpc/watchdog: prevent printk and send IPI while holding the wd lock To: Nicholas Piggin , benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org References: <20211026162740.16283-1-ldufour@linux.ibm.com> <20211026162740.16283-2-ldufour@linux.ibm.com> <1635303699.wgz87uxy4c.astroid@bobo.none> From: Laurent Dufour Message-ID: <33e15005-d342-5270-9b9d-64750f8794a7@linux.ibm.com> Date: Wed, 27 Oct 2021 10:14:57 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <1635303699.wgz87uxy4c.astroid@bobo.none> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: AcoByQLg3K9eWGmLBkq27NYptffJ_uUy X-Proofpoint-ORIG-GUID: GoL5SOeyY_mqUWxSmD4LKnruQMbJag3W X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.0.607.475 definitions=2021-10-27_02,2021-10-26_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 impostorscore=0 clxscore=1015 priorityscore=1501 mlxlogscore=999 adultscore=0 suspectscore=0 malwarescore=0 spamscore=0 mlxscore=0 phishscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2110150000 definitions=main-2110270048 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Le 27/10/2021 à 05:29, Nicholas Piggin a écrit : > Excerpts from Laurent Dufour's message of October 27, 2021 2:27 am: >> When handling the Watchdog interrupt, long processing should not be done >> while holding the __wd_smp_lock. This prevents the other CPUs to grab it >> and to process Watchdog timer interrupts. Furhtermore, this could lead to >> the following situation: >> >> CPU x detect lockup on CPU y and grab the __wd_smp_lock >> in watchdog_smp_panic() >> CPU y caught the watchdog interrupt and try to grab the __wd_smp_lock >> in soft_nmi_interrupt() >> CPU x wait for CPU y to catch the IPI for 1s in __smp_send_nmi_ipi() > > CPU y should get the IPI here if it's a NMI IPI (which will be true for >> = POWER9 64s). > > That said, not all platforms support it and the console lock problem > seems real, so okay. > >> CPU x will timeout and so has spent 1s waiting while holding the >> __wd_smp_lock. >> >> A deadlock may also happen between the __wd_smp_lock and the console_owner >> 'lock' this way: >> CPU x grab the console_owner >> CPU y grab the __wd_smp_lock >> CPU x catch the watchdog timer interrupt and needs to grab __wd_smp_lock >> CPU y wants to print something and wait for console_owner >> -> deadlock >> >> Doing all the long processing without holding the _wd_smp_lock prevents >> these situations. > > The intention was to avoid logs getting garbled e.g., if multiple > different CPUs fire at once. > > I wonder if instead we could deal with that by protecting the IPI > sending and printing stuff with a trylock, and if you don't get the > trylock then just return, and you'll come back with the next timer > interrupt. That sounds a bit risky to me, especially on large system when system goes wrong, all the CPU may try lock here. Furthermore, now operation done under the lock protection are quite fast, there is no more spinning like the delay loop done when sending an IPI. Protecting the IPI sending is a nightmare, if the target CPU is later play with the lock we are taking during the IPI processing, furthermore, if the target CPU is not responding the sending CPU is waiting for 1s, which slows all the system due to the lock held. Since I do a copy of the pending CPU mask and clear it under the lock protection, the IPI sending is safe while done without holding the lock. Regarding the interleaved traces, I don't think this has to be managed down here, but rather in the printk/console path. Cheers, Laurent. > > Thanks, > Nick > >> >> Signed-off-by: Laurent Dufour >> --- >> arch/powerpc/kernel/watchdog.c | 31 +++++++++++++++++-------------- >> 1 file changed, 17 insertions(+), 14 deletions(-) >> >> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c >> index f9ea0e5357f9..bc7411327066 100644 >> --- a/arch/powerpc/kernel/watchdog.c >> +++ b/arch/powerpc/kernel/watchdog.c >> @@ -149,6 +149,8 @@ static void set_cpu_stuck(int cpu, u64 tb) >> >> static void watchdog_smp_panic(int cpu, u64 tb) >> { >> + cpumask_t cpus_pending_copy; >> + u64 last_reset_tb_copy; >> unsigned long flags; >> int c; >> >> @@ -161,29 +163,32 @@ static void watchdog_smp_panic(int cpu, u64 tb) >> if (cpumask_weight(&wd_smp_cpus_pending) == 0) >> goto out; >> >> + cpumask_copy(&cpus_pending_copy, &wd_smp_cpus_pending); >> + last_reset_tb_copy = wd_smp_last_reset_tb; >> + >> + /* Take the stuck CPUs out of the watch group */ >> + set_cpumask_stuck(&wd_smp_cpus_pending, tb); >> + >> + wd_smp_unlock(&flags); >> + >> pr_emerg("CPU %d detected hard LOCKUP on other CPUs %*pbl\n", >> - cpu, cpumask_pr_args(&wd_smp_cpus_pending)); >> + cpu, cpumask_pr_args(&cpus_pending_copy)); >> pr_emerg("CPU %d TB:%lld, last SMP heartbeat TB:%lld (%lldms ago)\n", >> - cpu, tb, wd_smp_last_reset_tb, >> - tb_to_ns(tb - wd_smp_last_reset_tb) / 1000000); >> + cpu, tb, last_reset_tb_copy, >> + tb_to_ns(tb - last_reset_tb_copy) / 1000000); >> >> if (!sysctl_hardlockup_all_cpu_backtrace) { >> /* >> * Try to trigger the stuck CPUs, unless we are going to >> * get a backtrace on all of them anyway. >> */ >> - for_each_cpu(c, &wd_smp_cpus_pending) { >> + for_each_cpu(c, &cpus_pending_copy) { >> if (c == cpu) >> continue; >> smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000); >> } >> } >> >> - /* Take the stuck CPUs out of the watch group */ >> - set_cpumask_stuck(&wd_smp_cpus_pending, tb); >> - >> - wd_smp_unlock(&flags); >> - >> if (sysctl_hardlockup_all_cpu_backtrace) >> trigger_allbutself_cpu_backtrace(); >> >> @@ -204,6 +209,8 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb) >> unsigned long flags; >> >> wd_smp_lock(&flags); >> + cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck); >> + wd_smp_unlock(&flags); >> >> pr_emerg("CPU %d became unstuck TB:%lld\n", >> cpu, tb); >> @@ -212,9 +219,6 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb) >> show_regs(regs); >> else >> dump_stack(); >> - >> - cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck); >> - wd_smp_unlock(&flags); >> } >> return; >> } >> @@ -267,6 +271,7 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt) >> return 0; >> } >> set_cpu_stuck(cpu, tb); >> + wd_smp_unlock(&flags); >> >> pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n", >> cpu, (void *)regs->nip); >> @@ -277,8 +282,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt) >> print_irqtrace_events(current); >> show_regs(regs); >> >> - wd_smp_unlock(&flags); >> - >> if (sysctl_hardlockup_all_cpu_backtrace) >> trigger_allbutself_cpu_backtrace(); >> >> -- >> 2.33.1 >> >>