Date: Fri, 6 Oct 2023 13:26:00 -0700
From: "Paul E. McKenney"
To: Jonas Oberhauser
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra, Valentin Schneider, Juergen Gross, Leonardo Bras, Imran Khan
Subject: Re: [PATCH smp,csd] Throw an error if a CSD lock is stuck for too long
Reply-To: paulmck@kernel.org
In-Reply-To: <53b8065f-1f65-6956-279c-05bd461a7284@huaweicloud.com>

On Fri, Oct 06, 2023 at 08:48:23PM +0200, Jonas Oberhauser wrote:
> Is this related to the qspinlock issue you described earlier?

Kind of, in that sometimes qspinlock issues trigger CSD-lock warnings,
but not really directly related.

							Thanx, Paul

> jonas
>
>
> On 10/5/2023 at 6:48 PM, Paul E. McKenney wrote:
> > The CSD lock seems to get stuck in 2 "modes". When it gets stuck
> > temporarily, it usually gets released in a few seconds, and sometimes
> > up to one or two minutes.
> >
> > If the CSD lock stays stuck for more than several minutes, it never
> > seems to get unstuck, and gradually more and more things in the system
> > end up also getting stuck.
> >
> > In the latter case, we should just give up, so the system can dump out
> > a little more information about what went wrong, and, with panic_on_oops
> > and a kdump kernel loaded, dump a whole bunch more information about
> > what might have gone wrong.
> >
> > Question: should this have its own panic_on_ipistall switch in
> > /proc/sys/kernel, or maybe piggyback on panic_on_oops in a different
> > way than via BUG_ON?
> >
> > Signed-off-by: Rik van Riel
> > Signed-off-by: Paul E. McKenney
> >
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index 8455a53465af..059f1f53fc6b 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -230,6 +230,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> >  	}
> >
> >  	ts2 = sched_clock();
> > +	/* How long since we last checked for a stuck CSD lock.*/
> >  	ts_delta = ts2 - *ts1;
> >  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
> >  		return false;
> > @@ -243,9 +244,17 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> >  	else
> >  		cpux = cpu;
> >  	cpu_cur_csd = smp_load_acquire(&per_cpu(cur_csd, cpux)); /* Before func and info. */
> > +	/* How long since this CSD lock was stuck. */
> > +	ts_delta = ts2 - ts0;
> >  	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %llu ns for CPU#%02d %pS(%ps).\n",
> > -		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts2 - ts0,
> > +		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts_delta,
> >  		 cpu, csd->func, csd->info);
> > +	/*
> > +	 * If the CSD lock is still stuck after 5 minutes, it is unlikely
> > +	 * to become unstuck. Use a signed comparison to avoid triggering
> > +	 * on underflows when the TSC is out of sync between sockets.
> > +	 */
> > +	BUG_ON((s64)ts_delta > 300000000000LL);
> >  	if (cpu_cur_csd && csd != cpu_cur_csd) {
> >  		pr_alert("\tcsd: CSD lock (#%d) handling prior %pS(%ps) request.\n",
> >  			 *bug_id, READ_ONCE(per_cpu(cur_csd_func, cpux)),
>
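
For readers following the signed comparison in the quoted patch, here is a
minimal user-space sketch of the idea, not the kernel code itself. The names
stuck_too_long(), stuck_too_long_unsigned() and STUCK_LIMIT_NS are
hypothetical and exist only for illustration; uint64_t and int64_t stand in
for the kernel's u64 and s64. The sketch also assumes the usual two's
complement result when converting an out-of-range unsigned value to a signed
type, which the common compilers used for the kernel provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's 5-minute limit, in nanoseconds. */
#define STUCK_LIMIT_NS 300000000000LL

/*
 * Signed check, mirroring the shape of the BUG_ON() condition in the patch.
 * If ts2 is earlier than ts0 (clocks out of sync), the unsigned subtraction
 * wraps to a huge value; viewed as signed it is negative and so does not
 * exceed the limit.
 */
static int stuck_too_long(uint64_t ts0, uint64_t ts2)
{
	uint64_t ts_delta = ts2 - ts0;

	return (int64_t)ts_delta > STUCK_LIMIT_NS;
}

/*
 * Unsigned check, shown only for contrast: the wrapped value compares as
 * larger than the limit, so a backwards clock step would trigger it.
 */
static int stuck_too_long_unsigned(uint64_t ts0, uint64_t ts2)
{
	uint64_t ts_delta = ts2 - ts0;

	return ts_delta > (uint64_t)STUCK_LIMIT_NS;
}
```

With these helpers, a wait of 301 seconds trips both checks, while a
timestamp pair such as ts0 = 5000 and ts2 = 4000 trips only the unsigned
variant, which is the false positive the signed cast is meant to avoid.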