Message-ID: <2189326aaca37487b17eb1103830156ff1684c27.camel@surriel.com>
Subject: Re: [PATCH,RFC] smp,csd: throw an error if a CSD lock is stuck for too long
From: Rik van Riel
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, "Paul E. McKenney", Valentin Schneider, Juergen Gross
Date: Wed, 13 Sep 2023 10:33:51 -0400
In-Reply-To: <20230913132251.GE22758@noisy.programming.kicks-ass.net>
References: <20230821160409.663b8ba9@imladris.surriel.com> <20230913132251.GE22758@noisy.programming.kicks-ass.net>

On Wed, 2023-09-13 at 15:22 +0200, Peter Zijlstra wrote:
> On Mon, Aug 21, 2023 at 04:04:09PM -0400, Rik van Riel wrote:
> >
> > +       /* How long since this CSD lock was stuck. */
> > +       ts_delta = ts2 - ts0;
> >
> > +       /*
> > +        * If the CSD lock is still stuck after 5 minutes, it is unlikely
> > +        * to become unstuck. Use a signed comparison to avoid triggering
> > +        * on underflows when the TSC is out of sync between sockets.
> > +        */
> > +       BUG_ON((s64)ts_delta > 300000000000LL);
> >         if (cpu_cur_csd && csd != cpu_cur_csd) {
> >                 pr_alert("\tcsd: CSD lock (#%d) handling prior %pS(%ps) request.\n",
> >                          *bug_id, READ_ONCE(per_cpu(cur_csd_func, cpux)),
>
> How are you guys still seeing this? I thought the KVM APIC thing was
> fixed a while ago?
>

It's more fun than that. We're seeing this on bare metal.
Unfortunately, when a system gets wedged that way currently, it ends
up being power cycled automatically, and we aren't getting crash dumps
with clues on what causes the issue.

Doing a BUG_ON() + panic, followed by a kexec into the kdump kernel
will hopefully give us some clues on what might be causing the issue.

-- 
All Rights Reversed.