From: Sumit Garg
Date: Thu, 31 Aug 2023 18:46:41 +0530
Subject: Re: [PATCH] lkdtm/bugs: add test for panic() with stuck secondary CPUs
To: Mark Rutland
Cc: linux-kernel@vger.kernel.org, dianders@chromium.org, keescook@chromium.org, swboyd@chromium.org
References: <20230831101026.3122590-1-mark.rutland@arm.com>

On Thu, 31 Aug 2023 at 18:38, Mark Rutland wrote:
>
> On Thu, Aug 31, 2023 at 06:15:29PM +0530, Sumit Garg wrote:
> > Hi Mark,
> >
> > Thanks for putting up a test case for this.
> >
> > On Thu, 31 Aug 2023 at 15:40, Mark Rutland wrote:
> > >
> > > Upon a panic() the kernel will use either smp_send_stop() or
> > > crash_smp_send_stop() to attempt to stop secondary CPUs via an IPI,
> > > which may or may not be an NMI. Generally it's preferable that this is an
> > > NMI so that CPUs can be stopped in as many situations as possible, but
> > > it's not always possible to provide an NMI, and there are cases where
> > > CPUs may be unable to handle the NMI regardless.
> > >
> > > This patch adds a test for panic() where all other CPUs are stuck with
> > > interrupts disabled, which can be used to check whether the kernel
> > > gracefully handles CPUs failing to respond to a stop, and whe NMIs stops
> >
> > s/whe/when/
> >
> > > work.
> > >
> > > For example, on arm64 *without* an NMI, this results in:
> > >
> > > | # echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
> > > | lkdtm: Performing direct entry PANIC_STOP_IRQOFF
> > > | Kernel panic - not syncing: panic stop irqoff test
> > > | CPU: 2 PID: 24 Comm: migration/2 Not tainted 6.5.0-rc3-00077-ge6c782389895-dirty #4
> > > | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > > | Stopper: multi_cpu_stop+0x0/0x1a0 <- stop_machine_cpuslocked+0x158/0x1a4
> > > | Call trace:
> > > | dump_backtrace+0x94/0xec
> > > | show_stack+0x18/0x24
> > > | dump_stack_lvl+0x74/0xc0
> > > | dump_stack+0x18/0x24
> > > | panic+0x358/0x3e8
> > > | lkdtm_PANIC+0x0/0x18
> > > | multi_cpu_stop+0x9c/0x1a0
> > > | cpu_stopper_thread+0x84/0x118
> > > | smpboot_thread_fn+0x224/0x248
> > > | kthread+0x114/0x118
> > > | ret_from_fork+0x10/0x20
> > > | SMP: stopping secondary CPUs
> > > | SMP: failed to stop secondary CPUs 0-3
> > > | Kernel Offset: 0x401cf3490000 from 0xffff800080000000
> > > | PHYS_OFFSET: 0x40000000
> > > | CPU features: 0x00000000,68c167a1,cce6773f
> > > | Memory Limit: none
> > > | ---[ end Kernel panic - not syncing: panic stop irqoff test ]---
> > >
> > > On arm64 *with* an NMI, this results in:
> >
> > I suppose a more interesting test scenario to show the difference between
> > an NMI stop IPI and a regular stop IPI would be:
> >
> > - First put any CPU into hard lockup state via:
> >   $ echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> >
> > - And then provoke the following from another CPU:
> >   $ echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
>
> I don't follow. IIUC that's only going to test whether a HW watchdog can fire
> and reset the system?
>
> The PANIC_STOP_IRQOFF test has each CPU run panic_stop_irqoff_fn() with IRQs
> disabled, and if one CPU is stuck in the HARDLOCKUP test, we'll never get all
> CPUs into panic_stop_irqoff_fn(), and so all CPUs will be stuck with IRQs
> disabled, spinning.
>
> The PANIC_STOP_IRQOFF test itself tests the difference between an NMI stop IPI
> and regular stop IPI, as the results in the commit message show. Look for the
> line above that says:
>
> | SMP: failed to stop secondary CPUs 0-3
>
> ... which is *not* present in the NMI case (though we don't have an explicit
> "stopped all CPUs" message).

Ah, I see your point. I missed that difference when I first looked at the
panic() logs. So it's the post-panic() CPU stop behaviour that we are testing
here.

Thanks for the explanation.

-Sumit
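
For anyone scripting this check, here is a rough sketch (not part of the patch) of how a captured console log could be classified using the line Mark points at. The function name classify_panic_log and the returned labels are hypothetical, picked only for illustration; the two message strings are copied from the log quoted above.

```python
import re

# Both strings are taken from the console output quoted in this thread.
PANIC_BANNER = "Kernel panic - not syncing: panic stop irqoff test"
FAILED_STOP = re.compile(r"SMP: failed to stop secondary CPUs (\S+)")


def classify_panic_log(log: str) -> str:
    """Classify a captured console log from a PANIC_STOP_IRQOFF run.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if PANIC_BANNER not in log:
        # The test never reached panic(), so nothing can be concluded.
        return "no-panic"
    match = FAILED_STOP.search(log)
    if match:
        # The kernel reported CPUs that did not respond to the stop request.
        return f"stop-failed:{match.group(1)}"
    # No failure line. The kernel prints no explicit success message,
    # so absence of the failure line is the only signal available.
    return "stop-ok"


sample = (
    "| Kernel panic - not syncing: panic stop irqoff test\n"
    "| SMP: stopping secondary CPUs\n"
    "| SMP: failed to stop secondary CPUs 0-3\n"
)
print(classify_panic_log(sample))  # stop-failed:0-3
```

Since there is no explicit "stopped all CPUs" message, a truncated log would also come out as "stop-ok", so it is probably worth also checking that the "---[ end Kernel panic" line was captured before trusting that result.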