Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1643796ybt; Sat, 27 Jun 2020 15:16:47 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwD77y5K2netAtvBYfUbENVtoeM6KFJmR+zcTabX8XmOmIOxHbY+hlmLEgvvYtJHq5vpkm0 X-Received: by 2002:a17:906:2304:: with SMTP id l4mr7920983eja.104.1593296207666; Sat, 27 Jun 2020 15:16:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1593296207; cv=none; d=google.com; s=arc-20160816; b=M6tIwYtQ8WpByMTjH58FLCi6qCyuSE5VSwvUkt8q4BHO99gCAm7OxiQfEfhMS5LsbH NEthFNBMan9C+gsDrX5swzJXhad3pL5y/OVhCqT3x96R2SQjPNYingIFubTq3NnfxEeb CxC2RcAm2U9hIM2hpLsAbt4fcb6y9k3y34sLIc4vhquaGCg6MobeboMJ9LKiNxz7tlNr xq+ToXxLCqPZMgsVUQ9I5qEAJxy+XxdoO3PpyJjlXM0K5X8LpSHVsdrRWQ7dGNhDINfV +VNGx87aYQ/ieQ5IcOrx1gCc3yyKUfIz1n/ureLAJ6dcUQjihrmU3TLpEq5zXRJ7RHo5 7Cig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:in-reply-to:cc:references:message-id :date:subject:mime-version:from:content-transfer-encoding :dkim-signature; bh=Ea+WAVW1cojpak2BnE/W8a2xxeVZM5iKPEaLgDFR2Ys=; b=ziesA3Z5s9M1pcDZw8YfSwEIrHhccGNnDH1tdBDQL2pQDS8L596q6W4AMyVNAOZweb vEYZjYj63ievThk9udl4l//BmJqxtid1gIrifX5VIjdPURAkvFwoIowHv7j+T+KfyipV uyHUUx5saGOee/Js3qk1cipVDT0GOUGCg9D7VhuoY+ZDK0QJvwM6tEHlcJ/uaP7incxA HeYdRfg7YGVy4AdNHQIKZWbcUlTlikkqs9Y6EtHuSocOgRr90MOGfEgX6VFiv+NZOYvt XCZ7VTEZeq9r4FCG45bfyvkrPFd9ZzxcSTjcNhr7+bP75UzJWARKXk8N54zmuVjNYOBo 7Txw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@amacapital-net.20150623.gappssmtp.com header.s=20150623 header.b="veU/BlHD"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ef2si320095ejb.636.2020.06.27.15.16.25; Sat, 27 Jun 2020 15:16:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@amacapital-net.20150623.gappssmtp.com header.s=20150623 header.b="veU/BlHD"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725940AbgF0WOS (ORCPT + 99 others); Sat, 27 Jun 2020 18:14:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52518 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725912AbgF0WOS (ORCPT ); Sat, 27 Jun 2020 18:14:18 -0400 Received: from mail-pf1-x443.google.com (mail-pf1-x443.google.com [IPv6:2607:f8b0:4864:20::443]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16CD9C061794 for ; Sat, 27 Jun 2020 15:14:18 -0700 (PDT) Received: by mail-pf1-x443.google.com with SMTP id f9so6244476pfn.0 for ; Sat, 27 Jun 2020 15:14:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amacapital-net.20150623.gappssmtp.com; s=20150623; h=content-transfer-encoding:from:mime-version:subject:date:message-id :references:cc:in-reply-to:to; bh=Ea+WAVW1cojpak2BnE/W8a2xxeVZM5iKPEaLgDFR2Ys=; b=veU/BlHDO8EFjWqir2bTt6RPxgkjGz1OZ7pEWVjsWCI9+TVnML8nP8JkynGAe8/hO2 hO54A/p6HvZqllq71ivU8oxeMPpeMrQTAsB6EDdyxhOdGnugGmCSd+IetNdsszMxnYMz OcxrmiWaDLMWuTVR+GL5RqiDB7g26Bh0pQZU+kGAbd5cz5GiqEU2qT6XtvdYJLqHlYmc R1uzpfPFgwjgRwVfA4+hSfnQgRo+ohIltL3AR4q0nyLMM0sAgwD1bFYAuD+870PMwzzk colYVIfvxel04n6FJgsW8c+fyersBKzMDANMkStZXQVc7OeA9ijzlGsQw6YJIRjgWR3n qIzg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:content-transfer-encoding:from:mime-version :subject:date:message-id:references:cc:in-reply-to:to; bh=Ea+WAVW1cojpak2BnE/W8a2xxeVZM5iKPEaLgDFR2Ys=; b=VH39WUgK7N5D5BWH3ejFvtT78FebYTjI+B5yZ5YN4yVc6G4Za+tySms4bWG7ttrtz/ jFcjVS+QHOmJyEF2JU7EAKT3E0ivzZcHH97ywYMTbH6Fi9trOLoTx51vDD8pi9uL7Ydu Oy5ixN7Q90move+6Xbm2lwJDpnff78IMqcGAGuizK+HIO7SOArkO0k+IdoT53ranf/Yb A2HGHKCOIG+t7WW1tpgRXkzjwwRhoAu+/JwnPBOe9FFCjLMrePcvpyZ7eOqsqUoGxzsf pgmddLoMv+I7LENF0ZIMZtcsk0hT0frcN0we3nMdeiRXthsAiC+HbyON+DZfE6dBYGym 84dQ== X-Gm-Message-State: AOAM531w9g4aT4GQpM2QKvhbSsDOFcuofljInlUMqxOAs6oehVdJ6iT1 OIdPL7ybmc3fitzjgS4MHtQbP9/dEFA= X-Received: by 2002:a65:6884:: with SMTP id e4mr4516364pgt.283.1593296057450; Sat, 27 Jun 2020 15:14:17 -0700 (PDT) Received: from ?IPv6:2600:1010:b06c:c4ce:a9fe:12c:6f82:8006? ([2600:1010:b06c:c4ce:a9fe:12c:6f82:8006]) by smtp.gmail.com with ESMTPSA id j70sm10483828pfd.208.2020.06.27.15.14.16 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 27 Jun 2020 15:14:16 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable From: Andy Lutomirski Mime-Version: 1.0 (1.0) Subject: Re: [PATCH tick-sched] Clarify "NOHZ: local_softirq_pending" warning Date: Sat, 27 Jun 2020 15:14:14 -0700 Message-Id: <83B12EF8-3792-4943-A548-5DB0C6FC71D1@amacapital.net> References: <20200627214629.GH9247@paulmck-ThinkPad-P72> Cc: Andy Lutomirski , Frederic Weisbecker , Thomas Gleixner , Ingo Molnar , LKML , kernel-team In-Reply-To: <20200627214629.GH9247@paulmck-ThinkPad-P72> To: paulmck@kernel.org X-Mailer: iPhone Mail (17F80) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Jun 27, 2020, at 2:46 PM, Paul E. McKenney wrote: >=20 > =EF=BB=BFOn Sat, Jun 27, 2020 at 02:02:15PM -0700, Andy Lutomirski wrote: >>> On Fri, Jun 26, 2020 at 2:05 PM Paul E. McKenney wr= ote: >>>=20 >>> Currently, can_stop_idle_tick() prints "NOHZ: local_softirq_pending HH" >>> (where "HH" is the hexadecimal softirq vector number) when one or more >>> non-RCU softirq handlers are still enablded when checking to stop the >>> scheduler-tick interrupt. This message is not as enlightening as one >>> might hope, so this commit changes it to "NOHZ tick-stop error: Non-RCU >>> local softirq work is pending, handler #HH. >>=20 >> Thank you! It would be even better if it would explain *why* the >> problem happened, but I suppose this code doesn't actually know. >=20 > Glad to help! >=20 > To your point, is it possible to bisect the appearance of this message, > or is it as usual non-reproducible? (Hey, had to ask!) >=20 > =20 In this particular case, I tracked it down by good old fashioned sleuthing f= or bugs, but it=E2=80=99s still unclear to me precisely how NOHZ gets involv= ed. The bug is that we were entering the kernel from usermode, doing nmi_ent= er(), turning on interrupts, maybe getting a page fault, raising a signal, t= urning off interrupts, nmi_exit(), and back to usermode, with the signal sti= ll queued and undelivered. This is all kinds of bad, but I still don=E2=80=99= t understand what softirqs or idle have to do with it. But I have the bug fixed now!=