From: Doug Anderson
Date: Mon, 24 Apr 2023 08:41:28 -0700
Subject: Re: [PATCH] hardlockup: detect hard lockups using secondary (buddy) cpus
To: Daniel Thompson
Cc: Petr Mladek, Andrew Morton, Lecopzer Chen, Stephen Boyd, Chen-Yu Tsai,
    linux-arm-kernel@lists.infradead.org,
    kgdb-bugreport@lists.sourceforge.net, Marc Zyngier,
    linux-perf-users@vger.kernel.org, Mark Rutland, Masayoshi Mizuma,
    Will Deacon, ito-yuichi@fujitsu.com, Sumit Garg, Catalin Marinas,
    Colin Cross, Matthias Kaehlcke, Guenter Roeck, Tzung-Bi Shih,
    Alexander Potapenko, AngeloGioacchino Del Regno, Dan Williams,
    Geert Uytterhoeven, Ingo Molnar, John Ogness, Josh Poimboeuf,
    Juergen Gross, Kees Cook, Laurent Dufour, Liam Howlett, Marco Elver,
    Matthias Brugger, Michael Ellerman, Miguel Ojeda, Nathan Chancellor,
    Nick Desaulniers, "Paul E. McKenney", Peter Zijlstra, Randy Dunlap,
    Rasmus Villemoes, Sami Tolvanen, Stefano Stabellini, Vlastimil Babka,
    Zhaoyang Huang, Zhen Lei, linux-kernel@vger.kernel.org,
    linux-mediatek@lists.infradead.org
References: <20230421155255.1.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid>
    <20230424125355.GA4054@aspen.lan>
In-Reply-To: <20230424125355.GA4054@aspen.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On Mon, Apr 24, 2023 at 5:54 AM Daniel Thompson wrote:
>
> On Fri, Apr 21, 2023 at 03:53:30PM -0700, Douglas Anderson wrote:
> > From: Colin Cross
> >
> > Implement a hardlockup detector that can be enabled on SMP systems
> > that don't have an arch provided one or one implemented atop perf by
> > using interrupts on other cpus. Each cpu will use its softlockup
> > hrtimer to check that the next cpu is processing hrtimer interrupts by
> > verifying that a counter is increasing.
> >
> > NOTE: unlike the other hard lockup detectors, the buddy one can't
> > easily provide a backtrace on the CPU that locked up. It relies on
> > some other mechanism in the system to get information about the locked
> > up CPUs.
> > This could be support for NMI backtraces like [1], it could
> > be a mechanism for printing the PC of locked CPUs like [2], or it
> > could be something else.
> >
> > This style of hardlockup detector originated in some downstream
> > Android trees and has been rebased on / carried in ChromeOS trees for
> > quite a long time for use on arm and arm64 boards. Historically on
> > these boards we've leveraged mechanism [2] to get information about
> > hung CPUs, but we could move to [1].
>
> On the Arm platforms is this code able to leverage the existing
> infrastructure to extract status from stuck CPUs:
> https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html

Yup! I wasn't explicit about this, but that's where you end up if you
follow the whole bug tracker item that was linked as [2].
Specifically, we used to have downstream patches in ChromeOS that
just reached into the coresight range from a SoC specific driver and
printed out the CPU_DBGPCSR. When Brian was uprevving rk3399
Chromebooks he found that the equivalent functionality had made it
upstream in a generic way through the coresight framework. Brian
confirmed it was working on rk3399 and made all of the device tree
changes needed to get it all hooked up, so (at least for that SoC) it
should work.

[2] https://issuetracker.google.com/172213129
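
For readers following the thread, the buddy check described in the quoted
commit message (each CPU verifies that the next CPU's hrtimer counter is
still increasing) can be sketched roughly as below. This is a user-space
illustration only, not the code from the patch: NR_CPUS, struct buddy_state
and the buddy_* helpers are hypothetical names chosen for the example, and
the sketch assumes the watched CPU gets at least one timer tick between two
consecutive checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 4 /* hypothetical CPU count for this sketch */

/* Illustrative state; the names are assumptions, not the patch's. */
struct buddy_state {
	unsigned long interrupts[NR_CPUS]; /* bumped on each CPU's timer tick */
	unsigned long saved[NR_CPUS];      /* value last seen by the checker */
};

static void buddy_init(struct buddy_state *s)
{
	memset(s, 0, sizeof(*s));
}

/* The "buddy" of a CPU is simply the next CPU, wrapping around. */
static unsigned int buddy_next_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return (cpu + 1) % NR_CPUS;
}

/* Stands in for the hrtimer interrupt running on 'cpu'. */
static void buddy_timer_tick(struct buddy_state *s, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	s->interrupts[cpu]++;
}

/*
 * Run from 'cpu': returns true if the next CPU's counter has not moved
 * since the previous check, i.e. that CPU looks hard locked up.
 */
static bool buddy_check(struct buddy_state *s, unsigned int cpu)
{
	unsigned int next = buddy_next_cpu(cpu);
	unsigned long now = s->interrupts[next];

	if (now == s->saved[next])
		return true;

	s->saved[next] = now;
	return false;
}
```

A caller would invoke buddy_timer_tick() from each CPU's periodic timer and
buddy_check() less often than the tick, so that a healthy buddy always has
time to advance its counter. As the commit message notes, a positive result
only says which CPU is stuck; getting a backtrace or PC from that CPU needs
a separate mechanism such as [1] or [2].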