Date: Wed, 3 Apr 2024 11:05:11 -0700
From: "Paul E. McKenney"
To: Frederic Weisbecker
Cc: Thomas Gleixner, LKML, Ingo Molnar, Anna-Maria Behnsen
Subject: Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full
Message-ID: <9e1a3be7-839a-44fb-9d10-82784581f7a0@paulmck-laptop>
Reply-To: paulmck@kernel.org
In-Reply-To: <3f2597ea-81ba-4498-a0ee-84d7e4e3da59@paulmck-laptop>

On Tue, Apr 02, 2024 at 09:47:37AM -0700, Paul E. McKenney wrote:
> On Mon, Apr 01, 2024 at 05:04:10PM -0700, Paul E. McKenney wrote:
> > On Mon, Apr 01, 2024 at 11:56:36PM +0200, Frederic Weisbecker wrote:
> > > Le Mon, Apr 01, 2024 at 02:26:25PM -0700, Paul E. McKenney a écrit :
> > > > > > _ The RCU CPU Stall report. I strongly suspect the cause is the hrtimer
> > > > > > enqueue to an offline CPU. Let's solve that and we'll see if it still
> > > > > > triggers.
> > > > >
> > > > > Sounds like a plan!
> > > >
> > > > Just checking in on this one. I did reproduce your RCU CPU stall report
> > > > and also saw a TREE03 OOM that might (or might not) be related. Please
> > > > let me know if hammering TREE03 harder or adding some debug would help.
> > > > Otherwise, I will assume that you are getting sufficient bug reports
> > > > from your own testing to be getting along with.
> > >
> > > Hehe, there are a lot indeed :-)
> > >
> > > So there has been some discussion on CPUSET VS Hotplug, as a problem there
> > > is likely the cause of the hrtimer warning you saw, which in turn might
> > > be the cause of the RCU stalls.
> > >
> > > Do you always see the hrtimer warning along the RCU stalls? Because if so, this
> > > might help:
> > > https://lore.kernel.org/lkml/20240401145858.2656598-1-longman@redhat.com/T/#m1bed4d298715d1a6b8289ed48e9353993c63c896
> >
> > Not always, but why not give it a shot?
>
> And no failures, though I would need to run much longer for this to
> mean much. These were wide-spectrum tests, so my next step will be to
> run only TREE03 and TREE07.

And 600 hours each of TREE03 and TREE07 got me a single TREE07 instance
of the sched_tick_remote() failure. This one:

	WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);

But this is just rcutorture testing out "short" 14-second stalls, which
can only be expected to trigger this from time to time. The point of
this stall is to test the evasive actions that RCU takes when 50% of
the way to the RCU CPU stall timeout.

One approach would be to increase that "3" to "15", but that sounds
quite fragile. Another would be for rcutorture to communicate the fact
that stall testing is in progress, and then this WARN_ON_ONCE() could
silence itself in that case.

But is there a better approach?

							Thanx, Paul
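
For illustration only, the second approach (the warning silences itself while stall testing is in progress) might look something like the user-space model below. This is a sketch under stated assumptions, not kernel code: the flag `stall_test_in_progress` and the helper `tick_delta_should_warn()` are hypothetical names invented here, and the way rcutorture would actually publish such state is left open. Only the three-second threshold is taken from the WARN_ON_ONCE() quoted in the message.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical flag (not an existing kernel symbol): the stall test
 * would set this while it is deliberately stalling a CPU.
 */
static bool stall_test_in_progress;

/*
 * Model of the check in the quoted WARN_ON_ONCE(): warn when the tick
 * delta exceeds three seconds, unless a stall test announced itself.
 */
static bool tick_delta_should_warn(uint64_t delta_ns)
{
	if (stall_test_in_progress)
		return false;	/* the long delta is intentional here */
	return delta_ns > NSEC_PER_SEC * 3;
}
```

With the flag clear, a 14-second delta trips the check; with the flag set, the same delta is ignored, and a delta of three seconds or less never warns in either case.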