Message-ID: <815278cc-7fad-1657-c07a-e9825f137e5c@arm.com>
Date: Mon, 28 Nov 2022 16:56:39 +0000
Subject: Re: [PATCH v2 4/7] iommu: Let iommu.strict override ops->def_domain_type
To: Niklas Schnelle, Jason Gunthorpe
Cc: Baolu Lu, Matthew Rosato, Gerd Bayer, iommu@lists.linux.dev,
 Joerg Roedel, Will Deacon, Wenjia Zhang, Pierre Morel,
 linux-s390@vger.kernel.org, borntraeger@linux.ibm.com,
 hca@linux.ibm.com, gor@linux.ibm.com, gerald.schaefer@linux.ibm.com,
 agordeev@linux.ibm.com, svens@linux.ibm.com,
 linux-kernel@vger.kernel.org, Julian Ruess
From: Robin Murphy

On 2022-11-28 15:54, Niklas Schnelle wrote:
> On Mon, 2022-11-28 at 09:29 -0400, Jason Gunthorpe wrote:
>> On Mon, Nov 28, 2022 at 12:10:39PM +0100, Niklas Schnelle wrote:
>>> On Thu, 2022-11-17 at 09:55 +0800, Baolu Lu wrote:
>>>> On 2022/11/17 1:16, Niklas Schnelle wrote:
>>>>> When iommu.strict=1 is set or iommu_set_dma_strict() was called we
>>>>> should use IOMMU_DOMAIN_DMA irrespective of ops->def_domain_type.
>>>>>
>>>>> Signed-off-by: Niklas Schnelle
>>>>> ---
>>>>>  drivers/iommu/iommu.c | 3 +++
>>>>>  1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>>> index 65a3b3d886dc..d9bf94d198df 100644
>>>>> --- a/drivers/iommu/iommu.c
>>>>> +++ b/drivers/iommu/iommu.c
>>>>> @@ -1562,6 +1562,9 @@ static int iommu_get_def_domain_type(struct device *dev)
>>>>>  {
>>>>>  	const struct iommu_ops *ops = dev_iommu_ops(dev);
>>>>>
>>>>> +	if (iommu_dma_strict)
>>>>> +		return IOMMU_DOMAIN_DMA;
>>>>
>>>> If any quirky device must work in IOMMU identity mapping mode, this
>>>> might introduce functional regression. At least for VT-d platforms, some
>>>> devices do require IOMMU identity mapping mode for functionality.
>>>
>>> That's a good point. How about instead of unconditionally returning
>>> IOMMU_DOMAIN_DMA we just do so if the domain type returned by
>>> ops->def_domain_type uses a flush queue (i.e. the __IOMMU_DOMAIN_DMA_FQ bit
>>> is set). That way a device that only supports identity mapping gets to
>>> set that but iommu_dma_strict at least always prevents use of an IOVA
>>> flush queue.
>>
>> I would prefer we create some formal caps in iommu_ops to describe
>> whatever it is you are trying to do.
>>
>> Jason
>
> I agree that there is currently a lack of distinction between what
> domain types can be used (capability) and which should be used as
> default (iommu.strict=, iommu_set_...(), CONFIG_IOMMU_DEFAULT_DMA,
> ops->def_domain_type.).

As far as I'm concerned, the purpose of .def_domain_type is really just
for quirks where the device needs an identity mapping, based on
knowledge that tends to be sufficiently platform-specific that we prefer
to delegate it to drivers.
What apple-dart is doing is really just a workaround for not being able
to indicate per-instance domain type support at the point of the
.domain_alloc call, and IIRC what mtk_iommu_v1 is doing is a horrible
giant hack around the arch/arm DMA ops that don't understand IOMMU
groups. Both of those situations are on the cards to be cleaned up, so
don't take too much from them.

> My case though is about the latter which I think has some undocumented
> and surprising precedences built in at the moment. With this series we
> can use all of IOMMU_DOMAIN_DMA(_FQ, _SQ) on any PCI device but we want
> to default to either IOMMU_DOMAIN_DMA_FQ or IOMMU_DOMAIN_SQ based on
> whether we're running in a paging hypervisor (z/VM or KVM) to get the
> best performance. From a semantic point of view I felt that this is a
> good match for ops->def_domain_type in that we pick a default but it's
> still possible to change the domain type e.g. via sysfs. Now this had
> the problem that ops->def_domain_type would cause IOMMU_DOMAIN_DMA_FQ
> to be used even if iommu_set_dma_strict() was called (via
> iommu.strict=1) because there is a undocumented override of
> ops->def_domain_type over iommu_def_domain_type which I believe comes from
> the mixing of capability and default you also point at.
>
> I think ideally we need two separate mechanism to determine which
> domain types can be used for a particular device (capability) and for
> which one to default to with the latter part having a clear precedence
> between the options. Put together I think iommu.strict=1 should
> override a device's preference (ops->def_domain_type) and
> CONFIG_IOMMU_DEFAULT_DMA to use IOMMU_DOMAIN_DMA but of course only if
> the device is capable of that. Does that sound reasonable?

That sounds like essentially what we already have, though. The current
logic should be thus:

1: If the device is untrusted, it gets strict translation, nothing else.
If that won't actually work, tough.
2: If .def_domain_type returns a specific type, it is because any other
type will not work correctly at all, so we must use that.
3: Otherwise, we compute the user's preferred default type based on
kernel config and command line options.
4: Then we determine whether the IOMMU driver actually supports that
type, by trying to allocate it. If allocation fails and the preferred
type was more relaxed than IOMMU_DOMAIN_DMA, fall back to the stricter
type and try one last time.

AFAICS the distinction and priority of those steps is pretty clear:

1: Core requirements
2: Driver-specific requirements
3: Core preference
4: Driver-specific support

Now, for step 4 we *could* potentially use static capability flags in
place of the "try allocating different things until one succeeds", but
that doesn't change anything other than saving the repetitive
boilerplate in everyone's .domain_alloc implementations.

The real moral of the story here is not to express a soft preference
where it will be interpreted as a hard requirement.

Thanks,
Robin.
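
For illustration only, the four-step order described above can be
sketched in plain userspace C. This is not the code in
drivers/iommu/iommu.c: the enum, both structs, try_alloc() and
pick_default_domain() are all hypothetical stand-ins, and the "required"
field merely plays the role of what a .def_domain_type callback would
return. One assumption goes beyond the message: the fallback in step 4
is applied only to a step 3 preference, and a hard requirement that
cannot be allocated is reported as a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for domain types; NOT the kernel's definitions. */
enum domain_type {
	DOM_NONE = 0,	/* no requirement, or nothing could be allocated */
	DOM_DMA,	/* strict translation */
	DOM_DMA_FQ,	/* translation with a flush queue */
	DOM_IDENTITY,	/* identity mapping */
};

#define DOM_BIT(t) (1u << (t))

/* Hypothetical per-device facts. */
struct device_info {
	bool untrusted;
	enum domain_type required;	/* DOM_NONE means "no hard requirement" */
};

/* Hypothetical per-driver facts: bitmask of types it can allocate. */
struct driver_info {
	unsigned int supported;
};

/* Models "try to allocate it and see whether that works". */
static bool try_alloc(const struct driver_info *drv, enum domain_type t)
{
	return (drv->supported & DOM_BIT(t)) != 0;
}

static enum domain_type pick_default_domain(const struct device_info *dev,
					    const struct driver_info *drv,
					    enum domain_type user_pref)
{
	enum domain_type want;
	bool is_preference = false;

	assert(dev != NULL && drv != NULL);

	if (dev->untrusted) {
		want = DOM_DMA;			/* 1: core requirement */
	} else if (dev->required != DOM_NONE) {
		want = dev->required;		/* 2: driver-specific requirement */
	} else {
		want = user_pref;		/* 3: core preference */
		is_preference = true;
	}

	/* 4: driver-specific support, discovered by trying the allocation. */
	if (try_alloc(drv, want))
		return want;

	/* Assumed: only a preference falls back to the stricter type. */
	if (is_preference && want != DOM_DMA && try_alloc(drv, DOM_DMA))
		return DOM_DMA;

	return DOM_NONE;
}
```

With these stand-ins, a device whose "required" field is DOM_DMA_FQ keeps
the flush queue even when the user preference is DOM_DMA, because step 2
is evaluated before step 3. That is the soft preference being read as a
hard requirement.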