Received: by 2002:a05:7412:7c14:b0:fa:6e18:a558 with SMTP id ii20csp281517rdb; Mon, 22 Jan 2024 04:11:29 -0800 (PST) X-Google-Smtp-Source: AGHT+IGwgZn1r1JGi6SJwscHZxmODjcB/QHWFn2PVqF2mZLDGOLRpaJfjssOS+ZKfj8RuvkHOBj/ X-Received: by 2002:a50:fe82:0:b0:55a:3539:bd01 with SMTP id d2-20020a50fe82000000b0055a3539bd01mr1788043edt.2.1705925488968; Mon, 22 Jan 2024 04:11:28 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705925488; cv=pass; d=google.com; s=arc-20160816; b=DESf8yZQmbg9flV87XxJm/Y74rjqeEK/0WiiPo16gWD5Vj1SK5OsDKa/8ahMRMg/RJ I7cbjEvPSGXiuWKlao8PDnDgq0khuipZEpqXs/20/y3sQm4QjzlBn+UC4OR5C8IIG1Z0 Evcy96uDwyO36AS7t1ah3nZAleAxMESn4uRhswz6gAVxPZCyknnBgrnKkrYlVsRbsCVi 4qw1/gnUxkDZ0m549Sx/XjiqH/jnFdEXsN5obQ6BlfXekuK667Ic1l6qp02XxYkqIbe2 wGBX5C+yOPdFrHBFk1oQuvjO1OCdbQBosEbM13rd6R2Pk9wh8L7FnZq5jgMpcWRCmsFA DbzQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=/1M/Vo6PRjgwybK1W/Cxr6KU9BkXkXd+4QXd4wbpeWg=; fh=0puWr8qDwzaogZbtOaymnvHkPoKpvOitlnMkTa2cyaw=; b=Fv4Gt/HHM6uxAtG4aaiakk1Y9tVH+KC9nNJSxxe9M1V+y2aVm1AFMagtwct1CFTX2W oCDWeQhpWTzPYzoTKWclp4oxe6NcAA0rxG2YL+c4XfKEVyFHXjxF2caCWuo7EFoSuM+C K9aS6v79jkGoeLsX3Cl4LiMvIbHikuWEcwGMSTZBJnEmPSMwKT88mXWihSMExJTUf36+ rWtRsgmsMlHRufaLrpDxm6Q7fBHXfsQGkB3jDXAqmpvl027M9PLAwSX4Q/cJex7GiiBO Bjx8kizG9PKNDSJ3xCP53OckfzJUxx2+OiFBbSeMHrlOyK2q0+SMNrPJRtIUrmwX+7Fm k02g== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=GlYwsmyd; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-32357-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-32357-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id et15-20020a056402378f00b005596712d498si7793765edb.70.2024.01.22.04.11.28 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 04:11:28 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-32357-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=GlYwsmyd; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-32357-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-32357-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 014931F24F85 for ; Mon, 22 Jan 2024 05:53:42 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 03E8538DD1; Mon, 22 Jan 2024 05:49:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="GlYwsmyd" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 40395374CF; Mon, 22 Jan 2024 05:49:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=134.134.136.100 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705902566; cv=none; b=G3tFz5BHpaUj1QdxcGVajiRPcKiB/tRE8ey9mmDVSS6G5OelXp0PC7i1OpPhIZZItYkWlXsZslRJveQ7hVMkBipicD3q5r4AECDnY331B3U7GV9ZFsbOALTjk1B3XQdtWsz5rIcW6jtOSagmHmnZ8hHEx83uk7dC0wD91rY0zwg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705902566; c=relaxed/simple; bh=95R07xaQQHoRcNLfIN4PYp5WL6Kt883JL7XRCyDhyQc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=SOOht6S2SR84ZOFcQ2GJuRqV47gNSCixE4UqX9juG+YMQ5v9HN0JEwyIkLUa74zjkDcVi8xreCBdVpI4vvxDI33Lh3LK803z3rxzUvIKZ8XS7xghp6lrFcE39fDQGZljppuRkR2hM3E2pD6L1dJEa2DHlmiEV9KxiAPAty9g9RU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=GlYwsmyd; arc=none smtp.client-ip=134.134.136.100 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1705902564; x=1737438564; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=95R07xaQQHoRcNLfIN4PYp5WL6Kt883JL7XRCyDhyQc=; b=GlYwsmyd2A5fy6S2GUvQeOXC+XesxV9JLg7Xd9BZ72xXCBgj8tn2xd2E WJk2404hU/JQ/vnunmDAYfkWPSofzhLa+b21X8/P2N7VmPy13gz7cjCZa 7/FqCV9zFM84EFTCuaGHtlq1QMjZjQqR1bQWiwwRXPs4YzncYhBvbuFhj kFADuDXEOS1DN6Ug8uMkC6ztEg7S1YADXrf2pQN4510c8R3Br094gvcE8 y0g92MZVxrhyBco7Lx8w5GLOSBB4mNEmr1sC+s6CTD23a9zPzYKU07Vch QZb8a6oMzjFkGMa4SF2EXBbnb8NFSQyzNF8Gh6Za3hBYu86jSwkgAC/6I g==; X-IronPort-AV: E=McAfee;i="6600,9927,10960"; a="467487201" X-IronPort-AV: E=Sophos;i="6.05,211,1701158400"; d="scan'208";a="467487201" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jan 2024 21:49:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10960"; a="1116764017" X-IronPort-AV: E=Sophos;i="6.05,211,1701158400"; d="scan'208";a="1116764017" Received: from allen-box.sh.intel.com ([10.239.159.127]) by fmsmga005.fm.intel.com with ESMTP; 21 Jan 2024 21:49:19 -0800 From: Lu Baolu To: Joerg Roedel , Will Deacon , Robin Murphy , Jason Gunthorpe , Kevin Tian , Jean-Philippe Brucker , Nicolin Chen Cc: Yi Liu , Jacob Pan , Longfang Liu , Yan Zhao , iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Lu Baolu , Jason Gunthorpe Subject: [PATCH v10 12/16] iommu: Use refcount for fault data access Date: Mon, 22 Jan 2024 13:43:04 +0800 Message-Id: <20240122054308.23901-13-baolu.lu@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240122054308.23901-1-baolu.lu@linux.intel.com> References: <20240122054308.23901-1-baolu.lu@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit The per-device fault data structure stores information about faults occurring on a device. Its lifetime spans from IOPF enablement to disablement. Multiple paths, including IOPF reporting, handling, and responding, may access it concurrently. Previously, a mutex protected the fault data from use after free. But this is not performance friendly due to the critical nature of IOPF handling paths. Refine this with a refcount-based approach. The fault data pointer is obtained within an RCU read region with a refcount. The fault data pointer is returned for usage only when the pointer is valid and a refcount is successfully obtained. The fault data is freed with kfree_rcu(), ensuring data is only freed after all RCU critical regions complete. An iopf handling work starts once an iopf group is created. The handling work continues until iommu_page_response() is called to respond to the iopf and the iopf group is freed. During this time, the device fault parameter should always be available. Add a pointer to the device fault parameter in the iopf_group structure and hold the reference until the iopf_group is freed. Make iommu_page_response() static as it is only used in io-pgfault.c. Co-developed-by: Jason Gunthorpe Signed-off-by: Jason Gunthorpe Signed-off-by: Lu Baolu Reviewed-by: Jason Gunthorpe Tested-by: Yan Zhao --- include/linux/iommu.h | 17 +++-- drivers/iommu/io-pgfault.c | 127 +++++++++++++++++++++++-------------- drivers/iommu/iommu-sva.c | 2 +- 3 files changed, 88 insertions(+), 58 deletions(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index fc912aed7886..396d7b0d88b2 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -41,6 +41,7 @@ struct iommu_dirty_ops; struct notifier_block; struct iommu_sva; struct iommu_dma_cookie; +struct iommu_fault_param; #define IOMMU_FAULT_PERM_READ (1 << 0) /* read */ #define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */ @@ -129,8 +130,9 @@ struct iopf_group { struct iopf_fault last_fault; struct list_head faults; struct work_struct work; - struct device *dev; struct iommu_domain *domain; + /* The device's fault data parameter. */ + struct iommu_fault_param *fault_param; }; /** @@ -679,6 +681,8 @@ struct iommu_device { /** * struct iommu_fault_param - per-device IOMMU fault data * @lock: protect pending faults list + * @users: user counter to manage the lifetime of the data + * @ruc: rcu head for kfree_rcu() * @dev: the device that owns this param * @queue: IOPF queue * @queue_list: index into queue->devices @@ -688,6 +692,8 @@ struct iommu_device { */ struct iommu_fault_param { struct mutex lock; + refcount_t users; + struct rcu_head rcu; struct device *dev; struct iopf_queue *queue; @@ -715,7 +721,7 @@ struct iommu_fault_param { */ struct dev_iommu { struct mutex lock; - struct iommu_fault_param *fault_param; + struct iommu_fault_param __rcu *fault_param; struct iommu_fwspec *fwspec; struct iommu_device *iommu_dev; void *priv; @@ -1543,7 +1549,6 @@ void iopf_queue_free(struct iopf_queue *queue); int iopf_queue_discard_partial(struct iopf_queue *queue); void iopf_free_group(struct iopf_group *group); int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt); -int iommu_page_response(struct device *dev, struct iommu_page_response *msg); int iopf_group_response(struct iopf_group *group, enum iommu_page_response_code status); #else @@ -1588,12 +1593,6 @@ iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) return -ENODEV; } -static inline int -iommu_page_response(struct device *dev, struct iommu_page_response *msg) -{ - return -ENODEV; -} - static inline int iopf_group_response(struct iopf_group *group, enum iommu_page_response_code status) { diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index 5aea8402be47..ce7058892b59 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -13,6 +13,32 @@ #include "iommu-priv.h" +/* + * Return the fault parameter of a device if it exists. Otherwise, return NULL. + * On a successful return, the caller takes a reference of this parameter and + * should put it after use by calling iopf_put_dev_fault_param(). + */ +static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *dev) +{ + struct dev_iommu *param = dev->iommu; + struct iommu_fault_param *fault_param; + + rcu_read_lock(); + fault_param = rcu_dereference(param->fault_param); + if (fault_param && !refcount_inc_not_zero(&fault_param->users)) + fault_param = NULL; + rcu_read_unlock(); + + return fault_param; +} + +/* Caller must hold a reference of the fault parameter. */ +static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param) +{ + if (refcount_dec_and_test(&fault_param->users)) + kfree_rcu(fault_param, rcu); +} + void iopf_free_group(struct iopf_group *group) { struct iopf_fault *iopf, *next; @@ -22,6 +48,8 @@ void iopf_free_group(struct iopf_group *group) kfree(iopf); } + /* Pair with iommu_report_device_fault(). */ + iopf_put_dev_fault_param(group->fault_param); kfree(group); } EXPORT_SYMBOL_GPL(iopf_free_group); @@ -135,7 +163,7 @@ static int iommu_handle_iopf(struct iommu_fault *fault, goto cleanup_partial; } - group->dev = dev; + group->fault_param = iopf_param; group->last_fault.fault = *fault; INIT_LIST_HEAD(&group->faults); group->domain = domain; @@ -178,64 +206,61 @@ static int iommu_handle_iopf(struct iommu_fault *fault, */ int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) { + bool last_prq = evt->fault.type == IOMMU_FAULT_PAGE_REQ && + (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE); struct iommu_fault_param *fault_param; - struct iopf_fault *evt_pending = NULL; - struct dev_iommu *param = dev->iommu; - int ret = 0; + struct iopf_fault *evt_pending; + int ret; - mutex_lock(¶m->lock); - fault_param = param->fault_param; - if (!fault_param) { - mutex_unlock(¶m->lock); + fault_param = iopf_get_dev_fault_param(dev); + if (!fault_param) return -EINVAL; - } mutex_lock(&fault_param->lock); - if (evt->fault.type == IOMMU_FAULT_PAGE_REQ && - (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) { + if (last_prq) { evt_pending = kmemdup(evt, sizeof(struct iopf_fault), GFP_KERNEL); if (!evt_pending) { ret = -ENOMEM; - goto done_unlock; + goto err_unlock; } list_add_tail(&evt_pending->list, &fault_param->faults); } ret = iommu_handle_iopf(&evt->fault, fault_param); - if (ret && evt_pending) { + if (ret) + goto err_free; + + mutex_unlock(&fault_param->lock); + /* The reference count of fault_param is now held by iopf_group. */ + if (!last_prq) + iopf_put_dev_fault_param(fault_param); + + return 0; +err_free: + if (last_prq) { list_del(&evt_pending->list); kfree(evt_pending); } -done_unlock: +err_unlock: mutex_unlock(&fault_param->lock); - mutex_unlock(¶m->lock); + iopf_put_dev_fault_param(fault_param); return ret; } EXPORT_SYMBOL_GPL(iommu_report_device_fault); -int iommu_page_response(struct device *dev, - struct iommu_page_response *msg) +static int iommu_page_response(struct iopf_group *group, + struct iommu_page_response *msg) { bool needs_pasid; int ret = -EINVAL; struct iopf_fault *evt; struct iommu_fault_page_request *prm; - struct dev_iommu *param = dev->iommu; - struct iommu_fault_param *fault_param; + struct device *dev = group->fault_param->dev; const struct iommu_ops *ops = dev_iommu_ops(dev); bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID; - - if (!ops->page_response) - return -ENODEV; - - mutex_lock(¶m->lock); - fault_param = param->fault_param; - if (!fault_param) { - mutex_unlock(¶m->lock); - return -EINVAL; - } + struct iommu_fault_param *fault_param = group->fault_param; /* Only send response if there is a fault report pending */ mutex_lock(&fault_param->lock); @@ -276,10 +301,9 @@ int iommu_page_response(struct device *dev, done_unlock: mutex_unlock(&fault_param->lock); - mutex_unlock(¶m->lock); + return ret; } -EXPORT_SYMBOL_GPL(iommu_page_response); /** * iopf_queue_flush_dev - Ensure that all queued faults have been processed @@ -295,22 +319,20 @@ EXPORT_SYMBOL_GPL(iommu_page_response); */ int iopf_queue_flush_dev(struct device *dev) { - int ret = 0; struct iommu_fault_param *iopf_param; - struct dev_iommu *param = dev->iommu; - if (!param) + /* + * It's a driver bug to be here after iopf_queue_remove_device(). + * Therefore, it's safe to dereference the fault parameter without + * holding the lock. + */ + iopf_param = rcu_dereference_check(dev->iommu->fault_param, true); + if (WARN_ON(!iopf_param)) return -ENODEV; - mutex_lock(¶m->lock); - iopf_param = param->fault_param; - if (iopf_param) - flush_workqueue(iopf_param->queue->wq); - else - ret = -ENODEV; - mutex_unlock(¶m->lock); + flush_workqueue(iopf_param->queue->wq); - return ret; + return 0; } EXPORT_SYMBOL_GPL(iopf_queue_flush_dev); @@ -335,7 +357,7 @@ int iopf_group_response(struct iopf_group *group, (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID)) resp.flags = IOMMU_PAGE_RESP_PASID_VALID; - return iommu_page_response(group->dev, &resp); + return iommu_page_response(group, &resp); } EXPORT_SYMBOL_GPL(iopf_group_response); @@ -384,10 +406,15 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev) int ret = 0; struct dev_iommu *param = dev->iommu; struct iommu_fault_param *fault_param; + const struct iommu_ops *ops = dev_iommu_ops(dev); + + if (!ops->page_response) + return -ENODEV; mutex_lock(&queue->lock); mutex_lock(¶m->lock); - if (param->fault_param) { + if (rcu_dereference_check(param->fault_param, + lockdep_is_held(¶m->lock))) { ret = -EBUSY; goto done_unlock; } @@ -402,10 +429,11 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev) INIT_LIST_HEAD(&fault_param->faults); INIT_LIST_HEAD(&fault_param->partial); fault_param->dev = dev; + refcount_set(&fault_param->users, 1); list_add(&fault_param->queue_list, &queue->devices); fault_param->queue = queue; - param->fault_param = fault_param; + rcu_assign_pointer(param->fault_param, fault_param); done_unlock: mutex_unlock(¶m->lock); @@ -429,10 +457,12 @@ int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev) int ret = 0; struct iopf_fault *iopf, *next; struct dev_iommu *param = dev->iommu; - struct iommu_fault_param *fault_param = param->fault_param; + struct iommu_fault_param *fault_param; mutex_lock(&queue->lock); mutex_lock(¶m->lock); + fault_param = rcu_dereference_check(param->fault_param, + lockdep_is_held(¶m->lock)); if (!fault_param) { ret = -ENODEV; goto unlock; @@ -454,8 +484,9 @@ int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev) list_for_each_entry_safe(iopf, next, &fault_param->partial, list) kfree(iopf); - param->fault_param = NULL; - kfree(fault_param); + /* dec the ref owned by iopf_queue_add_device() */ + rcu_assign_pointer(param->fault_param, NULL); + iopf_put_dev_fault_param(fault_param); unlock: mutex_unlock(¶m->lock); mutex_unlock(&queue->lock); diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 9de878e40413..b51995b4fe90 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -251,7 +251,7 @@ static void iommu_sva_handle_iopf(struct work_struct *work) static int iommu_sva_iopf_handler(struct iopf_group *group) { - struct iommu_fault_param *fault_param = group->dev->iommu->fault_param; + struct iommu_fault_param *fault_param = group->fault_param; INIT_WORK(&group->work, iommu_sva_handle_iopf); if (!queue_work(fault_param->queue->wq, &group->work)) -- 2.34.1