Message-ID: <925761974f452b3f7afa98f96cf6762dc8d89dba.camel@surriel.com>
Subject: Re: [PATCH,RFC] smp,csd: throw an error if a CSD lock is stuck for too long
From: Rik van Riel
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
    "Paul E. McKenney", Valentin Schneider, Juergen Gross
Date: Wed, 13 Sep 2023 16:17:34 -0400
In-Reply-To: <20230913161749.GK692@noisy.programming.kicks-ass.net>
References: <20230821160409.663b8ba9@imladris.surriel.com>
    <20230913132251.GE22758@noisy.programming.kicks-ass.net>
    <2189326aaca37487b17eb1103830156ff1684c27.camel@surriel.com>
    <20230913161749.GK692@noisy.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2023-09-13 at 18:17 +0200, Peter Zijlstra wrote:
> On Wed, Sep 13, 2023 at 10:33:51AM -0400, Rik van Riel wrote:
>
> >
> > It's more fun than that. We're seeing this on bare metal.
>
> Oh, 'fun' indeed, *groan*.
>
> > Unfortunately, when a system gets wedged that way currently,
> > it ends up being power cycled automatically, and we aren't
> > getting crash dumps with clues on what causes the issue.
> >
> > Doing a BUG_ON() + panic, followed by a kexec into the kdump
> > kernel will hopefully give us some clues on what might be
> > causing the issue.
>
> I'm conflicted on the need to push such a debug patch upstream, otoh.
> given the amount of debug code already in csd, why not.
>
> But yeah, curious hear what comes out of this.
>

Oh, there's more to it than just debugging the issue.
This will also help recover systems faster, since they will
end up panicking, kdumping, and rebooting, faster than the
"hey, that system looks like it's stuck" power cycling scripts
can get to it.

-- 
All Rights Reversed.