Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp762661pxb; Thu, 21 Oct 2021 09:02:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzf9HGdb6elJkQJcMpTFYl31WramfnVF6uy5NbcdfAx9IxxTeZWLiNwVa3i+uhUXoPQduw4 X-Received: by 2002:a05:6402:34d0:: with SMTP id w16mr8812234edc.98.1634832142881; Thu, 21 Oct 2021 09:02:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1634832142; cv=none; d=google.com; s=arc-20160816; b=gLd7SAw/9LrOQwSsNYR2PYB33JXX9PpAZnLlulhGK02cGaFZfvZkU1Wj9rw7vFNVix TlaXpBazrg+EPAZw7wcWqdwtq0bLt0XFZ2xUw9c9V7+qhPpcU4wLjKZNRnTgrXiwmiqA jG7dBWVwGU5cTPrFF+QxT584fQ3OLx/eh3j5jhrTYcThgq36d4taG59Y6LQzKnJGp07z th9S7lY4TrZ5C2nPgIM5z52pTrArbMv2ksKl8V5G4yMIrX8jThmyIX+OokNfwcFSjExK l9nDW3gO7MIzXKeDvobriYuqstbqNQo/aXhNahDuhyi8J0+ArDMyM1cVPAfiDF4z9xau r25w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=7v1evPX91RKtvGl+BHA9roQMkP4CrWpKb5r2k4tryX0=; b=r5a8I2eQW7dOaYcj8MDFb/zq/He5VQO3pAHpL8FMj5Ezk7pVoVTrMUkan/TysKvh77 wXCMfEaJ8VU4TLHE9B1xSA3OECAVQYHiRBoNDM9Bv1TfMmII3X/jyh2VIqPu3txTXaRS dTrsQBvggKG7rKrXXu3ttx89uwKBHEo81Z8riUi1tt6Xniod0/ICSZ2NW4gIb6si2hVP GLlgwfsYN9nw/BnF9qCzfe4sPzA6sRfNQ2nwd47rEEyu2v+o9dDmldCZag2/354u3c9q Y5hKxIe35rSkKaL3tkZdd07meLZW5rUa3BwBdtnTphSnb/2UkPEQPY8JCLtAFwCB50wh 1Ajw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f2si9691074edl.338.2021.10.21.09.01.59; Thu, 21 Oct 2021 09:02:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231764AbhJUP7i (ORCPT + 99 others); Thu, 21 Oct 2021 11:59:38 -0400 Received: from foss.arm.com ([217.140.110.172]:44622 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231745AbhJUP73 (ORCPT ); Thu, 21 Oct 2021 11:59:29 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 24098D6E; Thu, 21 Oct 2021 08:57:13 -0700 (PDT) Received: from lakrids.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id AC3BD3F694; Thu, 21 Oct 2021 08:57:11 -0700 (PDT) Date: Thu, 21 Oct 2021 16:57:03 +0100 From: Mark Rutland To: Thomas Gleixner Cc: Linus Torvalds , Catalin Marinas , Will Deacon , Marc Zyngier , Linux Kernel Mailing List , Linux ARM , "Paul E. McKenney" , Peter Zijlstra , Thomas Bogendoerfer , linux-mips@vger.kernel.org Subject: Re: [GIT PULL] arm64 fixes for 5.15-rc5 Message-ID: <20211021155654.GA55459@lakrids.cambridge.arm.com> References: <20211011104729.GB1421@C02TD0UTHF1T.local> <87czoacrfr.ffs@tglx> <20211012140243.GA41546@C02TD0UTHF1T.local> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20211012140243.GA41546@C02TD0UTHF1T.local> User-Agent: Mutt/1.11.1+11 (2f07cb52) (2018-12-01) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 12, 2021 at 03:02:43PM +0100, Mark Rutland wrote: > On Tue, Oct 12, 2021 at 03:18:16PM +0200, Thomas Gleixner wrote: > > On Mon, Oct 11 2021 at 12:54, Linus Torvalds wrote: > > > On Mon, Oct 11, 2021 at 3:47 AM Mark Rutland wrote: > > > And so the reason I really hate that patch is that it introduces a new > > > "different architectures randomly and inexplicably do different > > > things, and the generic behavior is very different on arm64 than it is > > > elsewhere". > > > > > > That's just the worst kind of hack to me. > > > > > > And in this case, it's really *horribly* hard to see what the call > > > chain is. It all ends up being actively obfuscated and obscured > > > through that 'handle_arch_irq' function pointer, that is sometimes set > > > through set_handle_irq(), and sometimes set directly. > > > > > > I really think that if the rule is "we can't do accounting in > > > handle_domain_irq(), because it's too late for arm64", then the fix > > > really should be to just not do that. > > > > > > Move the irq_enter()/irq_exit() to the callers - quite possibly far up > > > the call chain to the root of it all, and just say "architecture code > > > needs to do this in the low-level code before calling > > > handle_arch_irq". I've spent the last few days attacking this, and I have a series which reworks things to pull irq_{enter,exit}() out of the irqchip code and into arch/entry code where it belongs, removig CONFIG_HANDLE_DOMAIN_IRQ entirely in the process. I'll post that out soon once I've cleaned up the commit messages and given it a decent cover letter. > > > Anyway, it _looks_ to me like the pattern is very simple: > > > > > > Step 1: > > > - remove irq_enter/irq_exit from handle_domain_irq(), move it to all callers. > > > > > > This clearly doesn't change anything at all, but also doesn't fix the > > > problem you have. But it's easy to verify that the code is the same > > > before-and-after. > > > > > > Step 2 is the pattern matching step: > > > > > > - if the caller of handle_domain_irq() ends up being a function that > > > is registered with set_handle_irq(), then we > > > (a) remove the irq_enter/irq_exit from it > > > (b) add it to the architectures that use handle_arch_irq. > > > (c) make sure that if there are other callers of it (not through > > > handle_arch_irq) we move that irq_enter/irq_exit into them too > > > > > > I _suspect_ - but didn't check - that Step 2(c) doesn't actually I had a go with the approach suggested above, but that didn't really work out and I ended up splitting the problem a different way. Comments belwo for the sake of posterity. Attacking this as a per-caller issue is *really* chury, and interdependencies force you to fix all drivers and all architectures in one go, which makes it really hard to see the wood for the trees. The underlying issue was with CONFIG_HANDLE_DOMAIN_IRQ, so just looking as set_handle_irq (which indicates CONFIG_GENERIC_IRQ_MULTI_HANDLER) also wasn't sufficient, and I had to go digging through each of the affected architectures' entry code. Instead, I've added a temporary shim, migrated each architecture in turn, then removed the shim and CONFIG_HANDLE_DOMAIN_IRQ entirely, which also ends up simplifying the drivers a bit. Thanks, Mark.