From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jason Gunthorpe, Lu Baolu, Kevin Tian, Heiko Stuebner, Niklas Schnelle,
    Joerg Roedel, Sasha Levin, joro@8bytes.org, will@kernel.org,
    iommu@lists.linux.dev
Subject: [PATCH AUTOSEL 6.4 14/58] iommu: Make __iommu_group_set_domain() handle error unwind
Date: Sun, 23 Jul 2023 21:12:42 -0400
Message-Id: <20230724011338.2298062-14-sashal@kernel.org>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230724011338.2298062-1-sashal@kernel.org>
References: <20230724011338.2298062-1-sashal@kernel.org>
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
X-stable-base: Linux 6.4.5
Content-Transfer-Encoding: 8bit

From: Jason Gunthorpe

[ Upstream commit dcf40ed3a20d727be054c4a20db47b32cb5036d4 ]

Let's try to have a consistent and clear strategy for error handling
during domain attach failures.
There are two broad categories: the first is callers doing destruction and
trying to set the domain back to a previously good domain. These cases cannot
handle failure during destruction flows and must succeed, or at least avoid a
UAF on the current group->domain, which is likely about to be freed. Many of
the drivers are well behaved here and will not hit the WARN_ONs or a UAF, but
some are doing hypercalls/etc that can fail unpredictably and don't meet the
expectations.

The second case is attaching a domain for the first time in a failable
context; failure should restore the attachment back to group->domain using
the above unfailable operation.

Have __iommu_group_set_domain_internal() execute a common algorithm that
tries to achieve this, and in the worst case, would leave a device "detached"
or assigned to a global blocking domain. This relies on some existing common
driver behaviors where attach failure will also do detach and true
IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail.

Name the first case with __iommu_group_set_domain_nofail() to make it clear.

Pull all the error handling and WARN_ON generation into
__iommu_group_set_domain_internal().

Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful
about what should happen during failures by only touching devices we've
already touched.

Reviewed-by: Lu Baolu
Reviewed-by: Kevin Tian
Tested-by: Heiko Stuebner
Tested-by: Niklas Schnelle
Signed-off-by: Jason Gunthorpe
Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel
Signed-off-by: Sasha Levin
---
 drivers/iommu/iommu.c | 137 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 112 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f1dcfa3f1a1b4..873e66ab1e2d7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -97,8 +97,26 @@ static int __iommu_attach_device(struct iommu_domain *domain,
                                  struct device *dev);
 static int __iommu_attach_group(struct iommu_domain *domain,
                                 struct iommu_group *group);
+
+enum {
+        IOMMU_SET_DOMAIN_MUST_SUCCEED = 1 << 0,
+};
+
+static int __iommu_group_set_domain_internal(struct iommu_group *group,
+                                             struct iommu_domain *new_domain,
+                                             unsigned int flags);
 static int __iommu_group_set_domain(struct iommu_group *group,
-                                    struct iommu_domain *new_domain);
+                                    struct iommu_domain *new_domain)
+{
+        return __iommu_group_set_domain_internal(group, new_domain, 0);
+}
+static void __iommu_group_set_domain_nofail(struct iommu_group *group,
+                                            struct iommu_domain *new_domain)
+{
+        WARN_ON(__iommu_group_set_domain_internal(
+                group, new_domain, IOMMU_SET_DOMAIN_MUST_SUCCEED));
+}
+
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
                                                struct device *dev);
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
@@ -2028,15 +2046,13 @@ EXPORT_SYMBOL_GPL(iommu_domain_free);
 static void __iommu_group_set_core_domain(struct iommu_group *group)
 {
         struct iommu_domain *new_domain;
-        int ret;
 
         if (group->owner)
                 new_domain = group->blocking_domain;
         else
                 new_domain = group->default_domain;
 
-        ret = __iommu_group_set_domain(group, new_domain);
-        WARN(ret, "iommu driver failed to attach the default/blocking domain");
+        __iommu_group_set_domain_nofail(group, new_domain);
 }
 
 static int __iommu_attach_device(struct iommu_domain *domain,
@@ -2221,21 +2237,55 @@ int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_group);
 
-static int iommu_group_do_set_platform_dma(struct device *dev, void *data)
+static int __iommu_device_set_domain(struct iommu_group *group,
+                                     struct device *dev,
+                                     struct iommu_domain *new_domain,
+                                     unsigned int flags)
 {
-        const struct iommu_ops *ops = dev_iommu_ops(dev);
-
-        if (!WARN_ON(!ops->set_platform_dma_ops))
-                ops->set_platform_dma_ops(dev);
+        int ret;
 
+        ret = __iommu_attach_device(new_domain, dev);
+        if (ret) {
+                /*
+                 * If we have a blocking domain then try to attach that in hopes
+                 * of avoiding a UAF. Modern drivers should implement blocking
+                 * domains as global statics that cannot fail.
+                 */
+                if ((flags & IOMMU_SET_DOMAIN_MUST_SUCCEED) &&
+                    group->blocking_domain &&
+                    group->blocking_domain != new_domain)
+                        __iommu_attach_device(group->blocking_domain, dev);
+                return ret;
+        }
         return 0;
 }
 
-static int __iommu_group_set_domain(struct iommu_group *group,
-                                    struct iommu_domain *new_domain)
+/*
+ * If 0 is returned the group's domain is new_domain. If an error is returned
+ * then the group's domain will be set back to the existing domain unless
+ * IOMMU_SET_DOMAIN_MUST_SUCCEED, otherwise an error is returned and the group's
+ * domains is left inconsistent. This is a driver bug to fail attach with a
+ * previously good domain. We try to avoid a kernel UAF because of this.
+ *
+ * IOMMU groups are really the natural working unit of the IOMMU, but the IOMMU
+ * API works on domains and devices. Bridge that gap by iterating over the
+ * devices in a group. Ideally we'd have a single device which represents the
+ * requestor ID of the group, but we also allow IOMMU drivers to create policy
+ * defined minimum sets, where the physical hardware may be able to distiguish
+ * members, but we wish to group them at a higher level (ex. untrusted
+ * multi-function PCI devices). Thus we attach each device.
+ */
+static int __iommu_group_set_domain_internal(struct iommu_group *group,
+                                             struct iommu_domain *new_domain,
+                                             unsigned int flags)
 {
+        struct group_device *last_gdev;
+        struct group_device *gdev;
+        int result;
         int ret;
 
+        lockdep_assert_held(&group->mutex);
+
         if (group->domain == new_domain)
                 return 0;
 
@@ -2245,8 +2295,12 @@ static int __iommu_group_set_domain(struct iommu_group *group,
          * platform specific behavior.
          */
         if (!new_domain) {
-                __iommu_group_for_each_dev(group, NULL,
-                                           iommu_group_do_set_platform_dma);
+                for_each_group_device(group, gdev) {
+                        const struct iommu_ops *ops = dev_iommu_ops(gdev->dev);
+
+                        if (!WARN_ON(!ops->set_platform_dma_ops))
+                                ops->set_platform_dma_ops(gdev->dev);
+                }
                 group->domain = NULL;
                 return 0;
         }
@@ -2256,16 +2310,52 @@ static int __iommu_group_set_domain(struct iommu_group *group,
          * domain. This switch does not have to be atomic and DMA can be
          * discarded during the transition. DMA must only be able to access
          * either new_domain or group->domain, never something else.
-         *
-         * Note that this is called in error unwind paths, attaching to a
-         * domain that has already been attached cannot fail.
          */
-        ret = __iommu_group_for_each_dev(group, new_domain,
-                                         iommu_group_do_attach_device);
-        if (ret)
-                return ret;
+        result = 0;
+        for_each_group_device(group, gdev) {
+                ret = __iommu_device_set_domain(group, gdev->dev, new_domain,
+                                                flags);
+                if (ret) {
+                        result = ret;
+                        /*
+                         * Keep trying the other devices in the group. If a
+                         * driver fails attach to an otherwise good domain, and
+                         * does not support blocking domains, it should at least
+                         * drop its reference on the current domain so we don't
+                         * UAF.
+                         */
+                        if (flags & IOMMU_SET_DOMAIN_MUST_SUCCEED)
+                                continue;
+                        goto err_revert;
+                }
+        }
         group->domain = new_domain;
-        return 0;
+        return result;
+
+err_revert:
+        /*
+         * This is called in error unwind paths. A well behaved driver should
+         * always allow us to attach to a domain that was already attached.
+         */
+        last_gdev = gdev;
+        for_each_group_device(group, gdev) {
+                const struct iommu_ops *ops = dev_iommu_ops(gdev->dev);
+
+                /*
+                 * If set_platform_dma_ops is not present a NULL domain can
+                 * happen only for first probe, in which case we leave
+                 * group->domain as NULL and let release clean everything up.
+                 */
+                if (group->domain)
+                        WARN_ON(__iommu_device_set_domain(
+                                group, gdev->dev, group->domain,
+                                IOMMU_SET_DOMAIN_MUST_SUCCEED));
+                else if (ops->set_platform_dma_ops)
+                        ops->set_platform_dma_ops(gdev->dev);
+                if (gdev == last_gdev)
+                        break;
+        }
+        return ret;
 }
 
 void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
@@ -3182,16 +3272,13 @@ EXPORT_SYMBOL_GPL(iommu_device_claim_dma_owner);
 
 static void __iommu_release_dma_ownership(struct iommu_group *group)
 {
-        int ret;
-
         if (WARN_ON(!group->owner_cnt || !group->owner ||
                     !xa_empty(&group->pasid_array)))
                 return;
 
         group->owner_cnt = 0;
         group->owner = NULL;
-        ret = __iommu_group_set_domain(group, group->default_domain);
-        WARN(ret, "iommu driver failed to attach the default domain");
+        __iommu_group_set_domain_nofail(group, group->default_domain);
 }
 
 /**
-- 
2.39.2
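
For readers who want to trace the unwind strategy described in the commit
message outside the kernel tree, below is a minimal, self-contained C sketch
of the same pattern: attach every device in the group to the new domain, and
on failure either revert the devices already touched back to the old domain
or, in the must-succeed unwind case, keep going and only report the error.
All names in it (struct dev, struct grp, attach(), group_set_domain(),
MUST_SUCCEED) are illustrative stand-ins for this note, not kernel APIs.

/*
 * Illustrative sketch only -- not kernel code. It mirrors the flow of
 * __iommu_group_set_domain_internal(): attach every device in the group
 * to new_domain; on failure, revert the devices already touched back to
 * the old domain, or keep going when the caller cannot accept failure.
 */
#include <stdio.h>
#include <stdbool.h>

#define MUST_SUCCEED (1u << 0)  /* stand-in for IOMMU_SET_DOMAIN_MUST_SUCCEED */

struct dev { const char *name; int domain; bool fail_attach; };
struct grp { struct dev *devs; int ndevs; int domain; };

/* stand-in for __iommu_attach_device(); fails for "badly behaved" devices */
static int attach(struct dev *d, int domain)
{
        if (d->fail_attach && domain != d->domain)
                return -1;
        d->domain = domain;
        return 0;
}

static int group_set_domain(struct grp *g, int new_domain, unsigned int flags)
{
        int i, last, ret = 0, result = 0;

        if (g->domain == new_domain)
                return 0;

        for (i = 0; i < g->ndevs; i++) {
                ret = attach(&g->devs[i], new_domain);
                if (ret) {
                        result = ret;
                        if (flags & MUST_SUCCEED)
                                continue;  /* unwind path: keep trying the rest */
                        goto err_revert;
                }
        }
        g->domain = new_domain;
        return result;

err_revert:
        /* put the devices touched so far (including the failing one) back */
        for (last = i, i = 0; i <= last; i++)
                attach(&g->devs[i], g->domain);
        return ret;
}

int main(void)
{
        struct dev devs[] = {
                { "dev0", 1, false },
                { "dev1", 1, true },   /* attaching this one to a new domain fails */
                { "dev2", 1, false },
        };
        struct grp g = { devs, 3, 1 };

        if (group_set_domain(&g, 2, 0))
                printf("attach failed, group stays on domain %d\n", g.domain);
        for (int i = 0; i < 3; i++)
                printf("%s is on domain %d\n", devs[i].name, devs[i].domain);
        return 0;
}

With dev1 set to fail, the sketch returns an error and leaves all three
devices on domain 1, mirroring the err_revert path in the patch; with
MUST_SUCCEED it would press on through the remaining devices, which is the
behavior the __iommu_group_set_domain_nofail() wrapper relies on.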