Date: Wed, 21 Feb 2024 13:19:01 -0500
From: Steven Rostedt
To: "Paul E. McKenney"
Cc: Ankur Arora, linux-kernel@vger.kernel.org, tglx@linutronix.de,
 peterz@infradead.org, torvalds@linux-foundation.org,
 akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
 dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org,
 mgorman@suse.de, jpoimboe@kernel.org, mark.rutland@arm.com,
 jgross@suse.com, andrew.cooper3@citrix.com, bristot@kernel.org,
 mathieu.desnoyers@efficios.com, glaubitz@physik.fu-berlin.de,
 anton.ivanov@cambridgegreys.com, mattst88@gmail.com,
 krypton@ulrich-teichert.org, David.Laight@aculab.com, richard@nod.at,
 jon.grimm@amd.com, bharata@amd.com, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com
Subject: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling
Message-ID: <20240221131901.69c80c47@gandalf.local.home>
In-Reply-To: <2b735ba4-8081-4ddb-9397-4fe83143d97f@paulmck-laptop>

On Mon, 19 Feb 2024 08:48:20 -0800
"Paul E. McKenney" wrote:

> > I will look again -- it is quite possible that I was confused by earlier
> > in-fleet setups that had Tasks RCU enabled even when preemption was
> > disabled. (We don't do that anymore, and, had I been paying sufficient
> > attention, would not have been doing it to start with. Back in the day,
> > enabling rcutorture, even as a module, had the side effect of enabling
> > Tasks RCU. How else to test it, right? Well...)
>
> OK, I got my head straight on this one...
>
> And the problem is in fact that Tasks RCU isn't normally present
> in non-preemptible kernels. This is because normal RCU will wait
> for preemption-disabled regions of code, and in PREEMPT_NONE and
> PREEMPT_VOLUNTARY kernels, that includes pretty much any region of code
> lacking an explicit schedule() or similar. And as I understand it,
> tracing trampolines rely on this implicit lack of preemption.
>
> So, with lazy preemption, we could preempt in the middle of a
> trampoline, and synchronize_rcu() won't save us.
>
> Steve and Mathieu will correct me if I am wrong.
>
> If I do understand this correctly, one workaround is to remove the
> "if PREEMPTIBLE" on all occurrences of "select TASKS_RCU". That way,
> all kernels would use synchronize_rcu_tasks(), which would wait for
> a voluntary context switch.
>
> This workaround does increase the overhead and tracepoint-removal
> latency on non-preemptible kernels, so it might be time to revisit the
> synchronization of trampolines.
> Unfortunately, the things I have come up with thus far have
> disadvantages:
>
> o   Keep a set of permanent trampolines that enter and exit
>     some sort of explicit RCU read-side critical section.
>     If the address for this trampoline to call is in a register,
>     then these permanent trampolines remain constant so that
>     no synchronization of them is required. The selected
>     flavor of RCU can then be used to deal with the non-permanent
>     trampolines.
>
>     The disadvantage here is a significant increase in the complexity
>     and overhead of trampoline code and the code that invokes the
>     trampolines. This overhead limits where tracing may be used
>     in the kernel, which is of course undesirable.

I wonder if we can just see if the instruction pointer at preemption is at
something that was allocated? That is, if __is_kernel(addr) returns
false, then we need to do more work. Of course that means modules will also
trigger this. We could check __is_module_text() but that does a bit more
work and may cause too much overhead. But who knows, if the module check is
only done if the __is_kernel() check fails, maybe it's not that bad.

-- Steve

>
> o   Check for being preempted within a trampoline, and track this
>     within the tasks structure. The disadvantage here is that this
>     requires keeping track of all of the trampolines and adding a
>     check for being in one on a scheduler fast path.
>
> o   Have a variant of Tasks RCU which checks the stack of preempted
>     tasks, waiting until all have been seen without being preempted
>     in a trampoline. This still requires keeping track of all the
>     trampolines in an easy-to-search manner, but gets the overhead
>     of searching off of the scheduler fastpaths.
>
>     It is also necessary to check running tasks, which might have
>     been interrupted from within a trampoline.
>
>     I would have a hard time convincing myself that these return
>     addresses were unconditionally reliable. But maybe they are?
>
> o   Your idea here!
>
> Again, the short-term workaround is to remove the "if PREEMPTIBLE" from
> all of the "select TASKS_RCU" clauses.
>
> > > > My next step is to try this on bare metal on a system configured as
> > > > is the fleet. But good progress for a week!!!
> > >
> > > Yeah this is great. Fingers crossed for the wider set of tests.
> >
> > I got what might be a one-off when hitting rcutorture and KASAN harder.
> > I am running 320*TRACE01 to see if it reproduces.
>
> [ . . . ]
>
> > So, first see if it is reproducible, second enable more diagnostics,
> > third make more grace-period sequence numbers available to rcutorture,
> > fourth recheck the diagnostics code, and then see where we go from there.
> > It might be that lazy preemption needs adjustment, or it might be that
> > it just tickled latent diagnostic issues in rcutorture.
> >
> > (I rarely hit this WARN_ON() except in early development, when the
> > problem is usually glaringly obvious, hence all the uncertainty.)
>
> And it is eminently reproducible. Digging into it...
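
For reference, the instruction-pointer idea above (cheap core-text check
first, module check only when that fails) could be sketched in userspace C
roughly as follows. Everything here is an assumption for illustration: the
address ranges are made up, and addr_in_core_text() and
addr_in_module_text() are hypothetical stand-ins, not the kernel's
__is_kernel() or __is_module_text(). It only models the ordering of the two
checks, not how the scheduler would hook it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical text range: [start, end). */
struct text_range {
	uintptr_t start;
	uintptr_t end;
};

/* Made-up layout: one core text range and two module text ranges. */
static const struct text_range core_text = { 0x1000, 0x9000 };
static const struct text_range module_text[] = {
	{ 0xa000, 0xb000 },
	{ 0xc000, 0xc800 },
};

/* Counts how often the slower module lookup actually runs. */
unsigned int module_lookups;

static bool in_range(const struct text_range *r, uintptr_t addr)
{
	assert(r->start < r->end);	/* sanity check on the made-up table */
	return addr >= r->start && addr < r->end;
}

/* Cheap check: a single range comparison. */
static bool addr_in_core_text(uintptr_t addr)
{
	return in_range(&core_text, addr);
}

/* Slower check: walks every module range. */
static bool addr_in_module_text(uintptr_t addr)
{
	size_t i;

	module_lookups++;
	for (i = 0; i < sizeof(module_text) / sizeof(module_text[0]); i++)
		if (in_range(&module_text[i], addr))
			return true;
	return false;
}

/*
 * Returns true if a task preempted at @ip was outside both core and
 * module text, i.e. possibly in allocated text such as a trampoline,
 * so that "more work" would be needed.  The module lookup is only
 * reached when the cheap core-text check has already failed.
 */
bool preempted_in_allocated_text(uintptr_t ip)
{
	if (addr_in_core_text(ip))
		return false;
	return !addr_in_module_text(ip);
}
```

In this model an address inside the core range never touches the module
table, which is the property the "maybe it's not that bad" guess depends on;
whether the real module lookup is cheap enough on a scheduler path is not
something this sketch can answer.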