Date: Wed, 11 Oct 2023 05:05:04 +0000
From: Joel Fernandes
To: "Paul E. McKenney"
Cc: "Liam R. Howlett", Naresh Kamboju, Greg Kroah-Hartman,
 stable@vger.kernel.org, patches@lists.linux.dev,
 linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
 akpm@linux-foundation.org, linux@roeck-us.net, shuah@kernel.org,
 patches@kernelci.org, lkft-triage@lists.linaro.org, pavel@denx.de,
 jonathanh@nvidia.com, f.fainelli@gmail.com, sudipm.mukherjee@gmail.com,
 srw@sladewatkins.net, rwarsow@gmx.de, conor@kernel.org, Chengming Zhou,
 Peter Zijlstra, Ovidiu Panait, Ingo Molnar, rcu
Subject: Re: [PATCH 5.15 000/183] 5.15.134-rc1 review
Message-ID: <20231011050504.GA201855@google.com>
References: <20231004175203.943277832@linuxfoundation.org>
 <20231006162038.d3q7sl34b4ouvjxf@revolver>
 <57c1ff4d-f138-4f89-8add-c96fb3ba6701@paulmck-laptop>
 <20231006175714.begtgj6wrs46ukmo@revolver>
 <7652477c-a37c-4509-9dc9-7f9d1dc08291@paulmck-laptop>
 <9470dab6-dee5-4505-95a2-f6782b648726@paulmck-laptop>
 <433f5823-059c-4b51-8d18-8b356a5a507f@paulmck-laptop>
In-Reply-To: <433f5823-059c-4b51-8d18-8b356a5a507f@paulmck-laptop>

On Tue, Oct 10, 2023 at 06:34:35PM -0700, Paul E. McKenney wrote:
[...]
> > > > > > > It's also worth noting that the bug this fixes wasn't exposed until the
> > > > > > > maple tree (added in v6.1) was used for the IRQ descriptors (added in
> > > > > > > v6.5).
> > > > > >
> > > > > > Lots of latent bugs, to be sure, even with rcutorture. :-/
> > > > >
> > > > > The Right Thing is to fix the bug all the way back to the introduction,
> > > > > but what fallout makes the backport less desirable than living with the
> > > > > unexposed bug?
> > > >
> > > > You are quite right that it is possible for the risk of a backport to
> > > > exceed the risk of the original bug.
> > > >
> > > > I defer to Joel (CCed) on how best to resolve this in -stable.
> > >
> > > Maybe I am missing something but this issue should also be happening
> > > in mainline right?
> > >
> > > Even though mainline has 897ba84dc5aa ("rcu-tasks: Handle idle tasks
> > > for recently offlined CPUs") , the warning should still be happening
> > > due to Liam's "kernel/sched: Modify initial boot task idle setup"
> > > because the warning is just rearranged a bit but essentially the same.
> > >
> > > IMHO, the right thing to do then is to drop Liam's patch from 5.15 and
> > > fix it in mainline (using the ideas described in this thread), then
> > > backport both that new fix and Liam's patch to 5.15.
> > >
> > > Or is there a reason this warning does not show up on the mainline?
>
> There is not a whole lot of commonality between the v5.15.134 version of
> RCU Tasks Trace and that of mainline. In theory, in mainline, CPU hotplug
> is supposed to be disabled across all calls to trc_inspect_reader(),
> which means that there would not be any CPU coming or going.
>
> But there could potentially be some time between when a CPU was
> marked as online and its idle task was marked PF_IDLE. And in
> fact x86 start_secondary() invokes set_cpu_online() before it calls
> cpu_startup_entry(), and it is the latter that sets PF_IDLE.
>
> The same is true of alpha, arc, arm, arm64, csky, ia64, loongarch, mips,
> openrisc, parisc, powerpc, riscv, s390, sh, sparc32, sparc64, x86 xen,
> and xtensa, which is everybody.
>
> One reason why my testing did not reproduce this is because I was running
> against v6.6-rc1, and cff9b2332ab7 ("kernel/sched: Modify initial boot
> task idle setup") went into v6.6-rc3. An initial run merging in current
> mainline also failed to reproduce this, but I am running overnight.
> If that doesn't reproduce, I will try inserting delays between the
> set_cpu_online() and the cpu_startup_entry().

I thought the warning happens before set_cpu_online() is even called,
because in that situation ofl == true and the task is not yet marked
PF_IDLE:

	WARN_ON_ONCE(ofl && task_curr(t) && !is_idle_task(t));

> If this problem is real, fixes include:
>
> o	Revert Liam's patch and make Tiny RCU's call_rcu() deal with
>	the problem. This is overhead and non-tinyness, but to Joel's
>	point, it might be best.
>
> o	Go back to something more like Liam's original patch, which
>	cleared PF_IDLE only for the boot CPU.
>
> o	Set PF_IDLE before calling set_cpu_online(). This would work,
>	but it would also be rather ugly, reaching into each and every
>	architecture.
>
> o	Move the call to set_cpu_online() into cpu_startup_entry().
>	This would require some serious inspection to prove that it is
>	safe, assuming that it is in fact safe.
>
> o	Drop the WARN_ON_ONCE() from trc_inspect_reader(). Not all
>	that excited by losing this diagnostic, but then again it
>	has been awhile since it has caught anything.
>
> o	Make the WARN_ON_ONCE() condition in trc_inspect_reader() instead
>	to a "return false" to retry later. Ditto, also not liking the
>	possibility of indefinite deferral with no warning.

Just for completeness,

o	Since it is just a warning, check for task_struct::pid == 0 instead
	of is_idle_task()? Though PF_IDLE is also set in play_idle_precise().

o	Change the warning to:

	WARN_ON_ONCE(ofl && task_curr(t) && (!is_idle_task(t) && t->pid != 0));

thanks,

 - Joel
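
For illustration only, here is a userspace sketch (not kernel code) that
compares the existing WARN_ON_ONCE() condition with the two variants
suggested above. Everything named model_* or warns_* is hypothetical and
exists only for this example. It assumes that idle tasks have pid 0, that
is_idle_task() is a test of the PF_IDLE flag, and that task_curr(t) can be
stood in for by a plain boolean. The flag value is an arbitrary
placeholder.

```c
/* Userspace model only; none of this is kernel code. */
#include <stdbool.h>

/* Placeholder bit standing in for the kernel's PF_IDLE flag. */
#define MODEL_PF_IDLE 0x2u

/* Hypothetical, stripped-down stand-in for struct task_struct. */
struct model_task {
	int pid;            /* assumed to be 0 for a per-CPU idle task */
	unsigned int flags; /* MODEL_PF_IDLE once the task is marked idle */
	bool on_cpu;        /* stand-in for task_curr(t) */
};

/* Assumed behaviour of is_idle_task(): a pure flag test. */
static bool model_is_idle_task(const struct model_task *t)
{
	return (t->flags & MODEL_PF_IDLE) != 0;
}

/* Existing condition: ofl && task_curr(t) && !is_idle_task(t) */
static bool warns_current(bool ofl, const struct model_task *t)
{
	return ofl && t->on_cpu && !model_is_idle_task(t);
}

/* First variant: test pid instead of the PF_IDLE flag. */
static bool warns_pid_only(bool ofl, const struct model_task *t)
{
	return ofl && t->on_cpu && t->pid != 0;
}

/* Second variant: warn only if neither PF_IDLE nor pid == 0 applies. */
static bool warns_combined(bool ofl, const struct model_task *t)
{
	return ofl && t->on_cpu &&
	       (!model_is_idle_task(t) && t->pid != 0);
}
```

In this model, an idle task running on a CPU that is not yet online and
not yet marked PF_IDLE ({ .pid = 0, .flags = 0, .on_cpu = true } with
ofl == true) trips only warns_current(). A non-idle task that has
PF_IDLE set, as play_idle_precise() would do, trips only
warns_pid_only(). An ordinary task with neither property trips all
three.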