Date: Thu, 22 Feb 2024 15:50:02 +0000
From: Mark Rutland
To: "Paul E. McKenney"
Cc: Steven Rostedt, Ankur Arora, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, peterz@infradead.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, willy@infradead.org, mgorman@suse.de,
	jpoimboe@kernel.org, jgross@suse.com, andrew.cooper3@citrix.com,
	bristot@kernel.org, mathieu.desnoyers@efficios.com,
	glaubitz@physik.fu-berlin.de, anton.ivanov@cambridgegreys.com,
	mattst88@gmail.com, krypton@ulrich-teichert.org,
	David.Laight@aculab.com, richard@nod.at, jon.grimm@amd.com,
	bharata@amd.com, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Subject: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling
In-Reply-To: <53020731-e9a9-4561-97db-8848c78172c7@paulmck-laptop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 21, 2024 at 12:22:35PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 21, 2024 at 03:11:57PM -0500, Steven Rostedt wrote:
> > On Wed, 21 Feb 2024 11:41:47 -0800
> > "Paul E. McKenney" wrote:
> >
> > > > I wonder if we can just see if the instruction pointer at preemption is at
> > > > something that was allocated? That is, if it __is_kernel(addr) returns
> > > > false, then we need to do more work.
> > > > Of course that means modules will also
> > > > trigger this. We could check __is_module_text() but that does a bit more
> > > > work and may cause too much overhead. But who knows, if the module check is
> > > > only done if the __is_kernel() check fails, maybe it's not that bad.
> > >
> > > I do like very much that idea, but it requires that we be able to identify
> > > this instruction pointer perfectly, no matter what. It might also require
> > > that we be able to perfectly identify any IRQ return addresses as well,
> > > for example, if the preemption was triggered within an interrupt handler.
> > > And interrupts from softirq environments might require identifying an
> > > additional level of IRQ return address. The original IRQ might have
> > > interrupted a trampoline, and then after transitioning into softirq,
> > > another IRQ might also interrupt a trampoline, and this last IRQ handler
> > > might have instigated a preemption.
> >
> > Note, softirqs still require a real interrupt to happen in order to preempt
> > executing code. Otherwise it should never be running from a trampoline.
>
> Yes, the first interrupt interrupted a trampoline. Then, on return,
> that interrupt transitioned to softirq (as opposed to ksoftirqd).
> While a softirq handler was executing within a trampoline, we got
> another interrupt. We thus have two interrupted trampolines.
>
> Or am I missing something that prevents this?

Surely the problematic case is where the first interrupt is taken from a
trampoline, but the inner interrupt is taken from not-a-trampoline? If the
innermost interrupt context is a trampoline, that's the same as the case
without any nesting.

We could handle nesting with a thread flag (e.g. TIF_IN_TRAMPOLINE) and a flag
in irqentry_state_t (which is on the stack, and so each nested IRQ gets its
own):

* At IRQ exception entry, if TIF_IN_TRAMPOLINE is clear and pt_regs::ip is a
  trampoline, set TIF_IN_TRAMPOLINE and irqentry_state_t::entered_trampoline.

* At IRQ exception exit, if irqentry_state_t::entered_trampoline is set, clear
  TIF_IN_TRAMPOLINE.

That naturally nests since the inner IRQ sees TIF_IN_TRAMPOLINE is already set
and does nothing on entry or exit, and anything in between can inspect
TIF_IN_TRAMPOLINE and see the right value.

On arm64 we don't dynamically allocate trampolines, *but* we potentially have a
similar problem when changing the active ftrace_ops for a callsite, as all
callsites share a common trampoline in the kernel text which reads a pointer to
an ftrace_ops out of the callsite, then reads ftrace_ops::func from that.

Since the ops could be dynamically allocated, we want to wait for reads of that
to complete before reusing the memory, and ideally we wouldn't have new entries
into the func after we think we'd completed the transition. So Tasks RCU might
be preferable as it waits for both the trampoline *and* the func to complete.

> > > Are there additional levels or mechanisms requiring identifying
> > > return addresses?
> >
> > Hmm, could we add to irq_enter_rcu()
> >
> > 	__this_cpu_write(__rcu_ip, instruction_pointer(get_irq_regs()));
> >
> > That is to save off were the ip was when it was interrupted.
> >
> > Hmm, but it looks like the get_irq_regs() is set up outside of
> > irq_enter_rcu() :-(
> >
> > I wonder how hard it would be to change all the architectures to pass in
> > pt_regs to irq_enter_rcu()? All the places it is called, the regs should be
> > available.
> >
> > Either way, it looks like it will be a bit of work around the trampoline or
> > around RCU to get this efficiently done.
>
> One approach would be to make Tasks RCU be present for PREEMPT_AUTO
> kernels as well as PREEMPTIBLE kernels, and then, as architectures provide
> the needed return-address infrastructure, transition those architectures
> to something more precise.

FWIW, that sounds good to me.

Mark.
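
A rough userspace model of the nesting scheme described above (a thread flag
plus a per-IRQ flag in the on-stack entry state), in case it helps to see it
concretely. Everything named here (task_model, irqentry_state_model,
ip_is_trampoline(), the address range) is made up for illustration.
TIF_IN_TRAMPOLINE and irqentry_state_t::entered_trampoline are only proposals,
and real entry code would look different:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel state; none of these are real kernel APIs. */
struct task_model {
	bool tif_in_trampoline;		/* models the proposed TIF_IN_TRAMPOLINE */
};

struct irqentry_state_model {
	bool entered_trampoline;	/* models irqentry_state_t::entered_trampoline */
};

/* Assumed trampoline address range, purely for illustration. */
#define TRAMP_START 0x1000u
#define TRAMP_END   0x2000u

static bool ip_is_trampoline(uintptr_t ip)
{
	return ip >= TRAMP_START && ip < TRAMP_END;
}

/* IRQ entry: only the outermost IRQ taken from a trampoline sets the flag. */
static struct irqentry_state_model irq_entry_model(struct task_model *t,
						   uintptr_t ip)
{
	struct irqentry_state_model state = { .entered_trampoline = false };

	if (!t->tif_in_trampoline && ip_is_trampoline(ip)) {
		t->tif_in_trampoline = true;
		state.entered_trampoline = true;
	}
	return state;
}

/* IRQ exit: only the IRQ that set the flag clears it. */
static void irq_exit_model(struct task_model *t,
			   struct irqentry_state_model state)
{
	/* If this IRQ set the flag, nothing nested should have cleared it. */
	assert(!state.entered_trampoline || t->tif_in_trampoline);

	if (state.entered_trampoline)
		t->tif_in_trampoline = false;
}

/*
 * Outer IRQ taken from a trampoline, inner IRQ taken from ordinary text.
 * Returns true if the flag stayed set for the whole nested region and was
 * cleared only by the outer exit.
 */
static bool nested_demo(void)
{
	struct task_model t = { false };
	struct irqentry_state_model outer = irq_entry_model(&t, 0x1800);
	struct irqentry_state_model inner = irq_entry_model(&t, 0x5000);
	bool seen_in_inner = t.tif_in_trampoline;

	irq_exit_model(&t, inner);
	bool seen_after_inner = t.tif_in_trampoline;

	irq_exit_model(&t, outer);

	return outer.entered_trampoline && !inner.entered_trampoline &&
	       seen_in_inner && seen_after_inner && !t.tif_in_trampoline;
}
```

In this model nested_demo() returns true: the inner IRQ, taken from ordinary
text, leaves the flag alone on both entry and exit, and only the outer IRQ
clears it.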