Received: by 2002:ac0:e34a:0:0:0:0:0 with SMTP id g10csp90893imn; Mon, 25 Jul 2022 10:50:49 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vdnyuJv+TFsDGvdePisMr0B83LlJzTxTQCmT/hstEjwn3Ze9ohmrSLl5EFpDACPA6XWNL8 X-Received: by 2002:a05:6402:4016:b0:43a:f310:9522 with SMTP id d22-20020a056402401600b0043af3109522mr14421667eda.200.1658771448977; Mon, 25 Jul 2022 10:50:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658771448; cv=none; d=google.com; s=arc-20160816; b=urKO+OWhd81M2TPVGHKtVCm6IFHNORBqcO7598QKCNweGYk7PnLSTwkOOmNM13CNft cjPLljHdoS+tnpMXKGlcxVYlWIVKZa6jTOmIgmVkid7RTo6SV62js3g79EXcZH7KDb6U 5STth0BFqTcNFtIDBF3/RlEvDWfZXuqA5p5ELhyoJR/rfUB86trlDKt1bLEdcwQK2V2N wtknm5iAOvxvIijfKuByzZ+u9NQ6/1tNtsTITTcmb+lw3vESDQdotRJL7qGu8ko00Db6 w5GyxRIFtx/pAQ0BrwtudZLtzLTs3pll+ciHoIRpAj5MD5rzDXgpuQKpW0HBkDNJurpL 1g3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent:references:in-reply-to :subject:cc:to:from:message-id:date:dkim-signature; bh=EWdJXaJ5TqoryqwuMvq5bnNx+IakyH3fiuxz/gvRKzA=; b=c9m7uIAzV2J/ovluYqAEb3SnVac8shbkgsQPdFxmQhBvrI3unAXYb9ClgIHirE663z T+yozP71qORrOPBUbaV3TqSYkbfU0d5yXmuVRQnrYP+OqX938UmznVp7oQU7Qu6tHki0 3fSVMp6UErdZCYSpML38mXBQ+U9v4Ktvnj4vrK0kGHGHqZLW/0/lKqLz17jPfLdut0j2 6h25i8iLJgAD3BUAwNMXcAYiPz4nWW3n7nkW9n2ngFbtti5M+vE9R7TTDtQZ7/NSYG/6 fh/8sxhFqUtDAD6KywvH46fB7cWPDiITccNcN9Yoh8oAhTW7ItptL5TVdNO7eootmA3n kOFw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=nHQYUm7p; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hp33-20020a1709073e2100b0072b870ca25esi12132879ejc.996.2022.07.25.10.50.24; Mon, 25 Jul 2022 10:50:48 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=nHQYUm7p; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235107AbiGYRff (ORCPT + 99 others); Mon, 25 Jul 2022 13:35:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43286 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233651AbiGYRfe (ORCPT ); Mon, 25 Jul 2022 13:35:34 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E5D4A46D; Mon, 25 Jul 2022 10:35:32 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 8D2986136C; Mon, 25 Jul 2022 17:35:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DE9D7C341C6; Mon, 25 Jul 2022 17:35:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1658770531; bh=kh3tndFuVD2wJowfuXuViF7ZbZfeBVF01vFmDB5+zMA=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=nHQYUm7pAy8gTngWmTf0uFsMzw7YjqQDGiq/xazn19yNqsWr3sA04SOkAC9WtualF Qgfmiz8sL0eYXWIoKcJpiInFK037rMp6t4hJY7h2DjgtTJnEsEEECoKKy3WXnul18p We61RxXPB3R55oQYCXqwnR7HxynqlipEUM+EHw+JJnAL9UbfIz5g51rZ3rrcWnJUIk XXUWfgFXZa0/4qVe5J7OOHRnN5WY6eGD/87mSOVnnkLFWE7ozjAr/yUxXMB35UxJe6 /1nDo3bCkED3R3lirUpgj/HYFi6AtvEZu/wXVI9CRsh2+ho7NOUM/98WHOLtmSacA7 YtaU0s5SeFR0g== Received: from 82-132-215-64.dab.02.net ([82.132.215.64] helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1oG1zN-009tTK-FG; Mon, 25 Jul 2022 18:35:29 +0100 Date: Mon, 25 Jul 2022 18:35:27 +0100 Message-ID: <87zggxaye8.wl-maz@kernel.org> From: Marc Zyngier To: Johan Hovold Cc: Bjorn Helgaas , Pali =?UTF-8?B?Um9ow6Fy?= , Johan Hovold , Kishon Vijay Abraham I , Xiaowei Song , Binghui Wang , Thierry Reding , Ryder Lee , Jianjun Wang , linux-pci@vger.kernel.org, Krzysztof =?UTF-8?B?V2lsY3p5xYRza2k=?= , Ley Foon Tan , linux-kernel@vger.kernel.org Subject: Re: Why set .suppress_bind_attrs even though .remove() implemented? In-Reply-To: References: <20220722143858.GA1818206@bhelgaas> <87czdtxnfn.wl-maz@kernel.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 82.132.215.64 X-SA-Exim-Rcpt-To: johan@kernel.org, helgaas@kernel.org, pali@kernel.org, johan+linaro@kernel.org, kishon@ti.com, songxiaowei@hisilicon.com, wangbinghui@hisilicon.com, thierry.reding@gmail.com, ryder.lee@mediatek.com, jianjun.wang@mediatek.com, linux-pci@vger.kernel.org, kw@linux.com, ley.foon.tan@intel.com, linux-kernel@vger.kernel.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-Spam-Status: No, score=-7.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 25 Jul 2022 16:18:48 +0100, Johan Hovold wrote: > > On Mon, Jul 25, 2022 at 03:43:40PM +0100, Marc Zyngier wrote: > > On Mon, 25 Jul 2022 14:25:49 +0100, > > Johan Hovold wrote: > > > > > > [ +CC: maz ] > > > > > > On Fri, Jul 22, 2022 at 09:38:58AM -0500, Bjorn Helgaas wrote: > > > > On Fri, Jul 22, 2022 at 03:26:44PM +0200, Johan Hovold wrote: > > > > > On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote: > > > > > > > > > > qcom is a DWC driver, so all the IRQ stuff happens in > > > > > > dw_pcie_host_init(). qcom_pcie_remove() does call > > > > > > dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody > > > > > > calls irq_dispose_mapping(). > > > > > > > > > > > > I'm thoroughly confused by all this. But I suspect that maybe I > > > > > > should drop the "make qcom modular" patch because it seems susceptible > > > > > > to this problem: > > > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e > > > > > > > > > > That should not be necessary. > > > > > > > > > > As you note above, interrupt handling is implemented in dwc core so if > > > > > there are any issue here at all, which I doubt, then all of the dwc > > > > > drivers that currently can be built as modules would all be broken and > > > > > this would need to be fixed in core. > > > > > > > > I don't know yet whether there's an issue. We need a clear argument > > > > for why there is or is not. The fact that others might be broken is > > > > not an argument for breaking another one ;) > > > > > > It's not breaking anything that is currently working, and if there's > > > some corner case during module unload, that's not the end of the world > > > either. > > > > It may not be the end of the world for you, but you have absolutely no > > idea of what dangling pointers to kernel memory will do on a user > > machine, nor how this can be further exploited. Unloading a module > > should never result in an unsafe kernel. > > Since when is unloading modules something that is expected to work > perfectly? I keep hearing "well, don't do that then" when someone > complains about unloading this module while doing this or that broke > something. (And it's only root that can unload modules in the first > place.) Well, maybe I have higher standards. For the stuff I maintain, I now point-blank refuse to support module unloading if this can result in a crash. Or worse. > If this was the general understanding, then it seems the only option > would be to disable module unloading completely as module remove code > almost by definition gets less testing and is subject to bit rot. My personal preference would be to prevent module unloading by default if the probing has succeeded, and have modules to actually buy into unloading. But that ship has sailed a long time ago. > It's useful for developers, but use it at your own risk. > > That said, I agree that if something is next to impossible to get right, > as may be the case with interrupt controllers generally, then fine, > let's disable module unloading for that class of drivers. > > And this would mean disabling driver unbind for the 20+ driver PCI > drivers that currently implement it to some degree. That would be Bjorn's and Lorenzo's call. > Also note that we only appear to have some 60 drivers in the tree that > can be built as modules but cannot be unloaded (if my grep patterns > were correct). I'm not surprised. Preventing module unload requires extra "code", and hardly anyone cares. > > > > > > I've been using the modular pcie-qcom patch for months now, unloading > > > > > and reloading the driver repeatedly to test power sequencing, without > > > > > noticing any problems whatsoever. > > > > > > > > Pali's commit log suggests that unloading the module is not, by > > > > itself, enough to trigger the problem: > > > > > > > > https://lore.kernel.org/linux-pci/20220709161858.15031-1-pali@kernel.org/ > > > > > > > > Can you test the scenario he mentions? > > > > > > Turns out the pcie-qcom driver does not support legacy interrupts so > > > there's no risk of there being any lingering mappings if I understand > > > things correctly. > > > > It still does MSIs, thanks to dw_pcie_host_init(). If you can remove > > the driver while devices are up and running with MSIs allocated, > > things may get ugly if things align the wrong way (if a driver still > > has a reference to an irq_desc or irq_data, for example). > > That is precisely the way I've been testing it and everything appears > to be tore down as it should. > > And a PCI driver that has been unbound should have released its > resources, or that's a driver bug. Right? But that's the thing: you can easily remove part of the infrastructure without the endpoint driver even noticing. It may not happen in your particular case if removing the RC driver will also nuke the endpoints in the process, but I can't see this is an absolute guarantee. The crash pointed to by an earlier email is symptomatic of it. > And for the OF INTx case you mentioned earlier, aren't those mapped by > PCI core and could in theory be released by core as well? Potentially, though I haven't tried to follow the life cycle of those. The whole thing is pretty fragile, and this sort of resource is rarely expected to be removed... M. -- Without deviation from the norm, progress is not possible.