Received: by 2002:a05:6358:700f:b0:131:369:b2a3 with SMTP id 15csp1932537rwo; Thu, 3 Aug 2023 02:00:07 -0700 (PDT) X-Google-Smtp-Source: APBJJlGqX5/fk264Omypb/PerboGrfr1hVK6jhwwoHeR8jUlS2Sucma+POxcB710kp3o7ZYlV1gz X-Received: by 2002:a17:906:2002:b0:98e:933:2909 with SMTP id 2-20020a170906200200b0098e09332909mr6769023ejo.22.1691053207591; Thu, 03 Aug 2023 02:00:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691053207; cv=none; d=google.com; s=arc-20160816; b=ICdXl1evaUwt0x45iZen9ar/NGxn5A5lXO4FdFHidKftDGcXlt8PlTscmBXH8xn4sR VruPgswKgBjx3iSwnqcuMvgdZCOGZr8uwXc/gP5mtJwHJYCbMwkOmBkCpqxVu9EAQ/68 TwHcCWsmKzauhdGX4meoU3FlgWZq/cGdX28+P8baDN7PA1+lJxqphgLULV2S8MMjPJsH Y2qiYQ5nuaoOjRF17N8OaN5hPV5JZ80YsFYpLkbUqxjR4OpZYd9ya+z1hSGMNHBIrJJl qHWuK4cZUqGBB4ff1Mw2cFOE60MgWVw0g2QK1WMQAjoHxuA2yIOkUovUfBk5HN0pMvHM mcEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=jCwNjlEPzwo3RqM6Rt46O7/xaBMXEl5SoX/GHbEUVNk=; fh=rmN0vDieyYsHoBj2ifSV+UObQ+B34WtKmsbYgj7nsbc=; b=i0+JuvzjssSwHW1HFkfa99nOZBAyREIaEKChDXM3g6+JeUufH8MgSCCtCi456epFCv kfLHkOTif8RIM1bf7QTrq/Ve6JG9/tnh1zWN8m2cq17frYevp99MZWa+Vq7Iv2cpuuO+ czaVRKb61d/u3dJMf//7gxBQ+5+wnVGFgN6e4bMueNFevggn8Hm9o6eibQmFgc5c91ZD ckbPaWJD4P84kygVBx0HW4HTNTkW6822X6BI3Feo0Ysy30UhDdYKq0HOaJ7AX+b4UK5x NKWJiMQFMcSFmjyRw0yM/vdNWvDZlqs7NfVRhrOWOrjyy5mRvqZx4mE6BxwzRu9Skz2m OASA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=fNh9Zu9f; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id um26-20020a170906cf9a00b0099bc58ab34csi12172760ejb.135.2023.08.03.01.59.43; Thu, 03 Aug 2023 02:00:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=fNh9Zu9f; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233825AbjHCISD (ORCPT + 99 others); Thu, 3 Aug 2023 04:18:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45298 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233847AbjHCIRN (ORCPT ); Thu, 3 Aug 2023 04:17:13 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D6A243A8C for ; Thu, 3 Aug 2023 01:12:16 -0700 (PDT) Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 7E62621901; Thu, 3 Aug 2023 08:12:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1691050335; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jCwNjlEPzwo3RqM6Rt46O7/xaBMXEl5SoX/GHbEUVNk=; b=fNh9Zu9fWa+sHYKqVsXEbll0njtoGVPbKKSizO4R4NJyTdEHrv2DnFp6WLrTg4iM1tI/oH pxx/HVRZTBrMsUPypbZX+LuFWByULmihI1T7D4frMpouZEjrUmLGkwUGUVGTbnNrIShquT Gm1CXoLu6GSdoK+esPWhjmbEOcFNY20= Received: from suse.cz (pmladek.udp.ovpn2.prg.suse.de [10.100.201.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 1A1AA2C142; Thu, 3 Aug 2023 08:12:15 +0000 (UTC) Date: Thu, 3 Aug 2023 10:12:12 +0200 From: Petr Mladek To: Doug Anderson Cc: Michal Hocko , Andrew Morton , kernel test robot , Lecopzer Chen , Pingfan Liu , linux-kernel@vger.kernel.org Subject: Re: [PATCH] watchdog/hardlockup: Avoid large stack frames in watchdog_hardlockup_check() Message-ID: References: <20230731091754.1.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 2023-08-02 07:12:29, Doug Anderson wrote: > Hi, > > On Wed, Aug 2, 2023 at 12:27 AM Michal Hocko wrote: > > > > On Tue 01-08-23 08:41:49, Doug Anderson wrote: > > [...] > > > Ah, I see what you mean. The one issue I have with your solution is > > > that the ordering of the stack crawls is less ideal in the "dump all" > > > case when cpu != this_cpu. We really want to see the stack crawl of > > > the locked up CPU first and _then_ see the stack crawls of other CPUs. > > > With your solution the locked up CPU will be interspersed with all the > > > others and will be harder to find in the output (you've got to match > > > it up with the "Watchdog detected hard LOCKUP on cpu N" message). > > > While that's probably not a huge deal, it's nicer to make the output > > > easy to understand for someone trying to parse it... > > > > Is it worth to waste memory for this arguably nicer output? Identifying > > the stack of the locked up CPU is trivial. > > I guess it's debatable, but as someone who has spent time staring at > trawling through reports generated like this, I'd say "yes", it's > super helpful in understanding the problem to have the hung CPU first. > Putting the memory usage in perspective: nmi_trigger_cpumask_backtrace() has its own copy of the cpu mask. What about changing the @exclude_self parameter to @exclude_cpu and do: if (exclude_cpu >= 0) cpumask_clear_cpu(exclude_cpu, to_cpumask(backtrace_mask)); It would require changing also arch_trigger_cpumask_backtrace() to void arch_trigger_cpumask_backtrace(const struct cpumask *mask, int exclude_cpu); but it looks doable. It might theoretically change the behavior because nmi_trigger_cpumask_backtrace() does this_cpu = get_cpu(); I guess that it was used to make sure that smp_processor_id() the same. But it is too late. It should ignore the callers CPU. So, this might actually fix a possible race. Note that get_cpu() also disabled preemption. IMHO, it is not strictly needed. But it might make sense to keep it because the backtraces are printed when something goes wrong and it should print backtraces from the current state. Best Regards, Petr