From: Keqian Zhu
To: Lu Baolu, Robin Murphy, Will Deacon, Joerg Roedel, Jean-Philippe Brucker, Yi Sun, Tian Kevin
Cc: Alex Williamson, Kirti Wankhede, Cornelia Huck, Jonathan Cameron
Subject: Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework
Date: Sat, 8 May 2021 15:35:57 +0800
Message-ID: <18ac787a-179e-71f7-728b-c43feda80a16@huawei.com>
References: <20210507102211.8836-1-zhukeqian1@huawei.com> <20210507102211.8836-2-zhukeqian1@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Baolu,

On 2021/5/8 11:46, Lu Baolu wrote:
> Hi Keqian,
>
> On 5/7/21 6:21 PM, Keqian Zhu wrote:
>> Some types of IOMMU are capable of tracking DMA dirty log, such as
>> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
>> dirty log tracking framework in the IOMMU base layer.
>>
>> Four new essential interfaces are added, and we maintain the status
>> of dirty log tracking in iommu_domain.
>> 1. iommu_support_dirty_log: Check whether domain supports dirty log tracking
>> 2. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
>> 3. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
>> 4. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
>>
>> Note: Don't concurrently call these interfaces with other ops that
>> access the underlying page table.
>>
>> Signed-off-by: Keqian Zhu
>> Signed-off-by: Kunkun Jiang
>> ---
>>  drivers/iommu/iommu.c        | 201 +++++++++++++++++++++++++++++++++++
>>  include/linux/iommu.h        |  63 +++++++++++
>>  include/trace/events/iommu.h |  63 +++++++++++
>>  3 files changed, 327 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 808ab70d5df5..0d15620d1e90 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1940,6 +1940,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>>      domain->type = type;
>>      /* Assume all sizes by default; the driver may override this later */
>>      domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
>> +    mutex_init(&domain->switch_log_lock);
>>      return domain;
>>  }
>> @@ -2703,6 +2704,206 @@ int iommu_set_pgtable_quirks(struct iommu_domain *domain,
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
>> +bool iommu_support_dirty_log(struct iommu_domain *domain)
>> +{
>> +    const struct iommu_ops *ops = domain->ops;
>> +
>> +    return ops->support_dirty_log && ops->support_dirty_log(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_support_dirty_log);
>
> I suppose this interface is to ask the vendor IOMMU driver to check
> whether each device/IOMMU in the domain supports dirty bit tracking.
> But what will happen if new devices with different tracking capability
> are added afterward?
Yep, this is considered in the vfio part. We will query again after attaching
or detaching devices from the domain. When the domain becomes capable, we
enable dirty log for it; when it is no longer capable, we disable dirty log
for it.
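
To make that concrete, below is a rough, untested sketch of the vfio side
handling I have in mind. The function name and the hw_dirty_log flag are
invented here purely for illustration; they are not part of this series.

/*
 * Illustrative sketch only: re-evaluate the domain's dirty log capability
 * after each device attach/detach and switch hardware dirty log tracking
 * on or off accordingly. When hardware tracking stays off, the caller
 * would report all mapped pages as dirty instead.
 */
static void vfio_iommu_update_dirty_scope(struct iommu_domain *domain,
					  unsigned long iova, size_t size,
					  int prot, bool *hw_dirty_log)
{
	bool capable = iommu_support_dirty_log(domain);

	if (capable && !*hw_dirty_log) {
		/* The domain became capable: start hardware tracking. */
		if (!iommu_switch_dirty_log(domain, true, iova, size, prot))
			*hw_dirty_log = true;
	} else if (!capable && *hw_dirty_log) {
		/* The domain lost the capability: stop hardware tracking. */
		if (!iommu_switch_dirty_log(domain, false, iova, size, prot))
			*hw_dirty_log = false;
	}
}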

>
> To make things simple, is it possible to support this tracking only when
> all underlying IOMMUs support dirty bit tracking?
IIUC, the underlying IOMMUs you refer to are system-wide. I think this idea
may have two issues:
1) The target domain may contain only part of the system IOMMUs.
2) The dirty tracking capability can be related to the capability of devices.
   For example, we can track dirty log based on IOPF, which needs the
   capability of devices. That is to say, we can make this framework more
   general.

>
> Or, the more crazy idea is that we don't need to check this capability
> at all. If dirty bit tracking is not supported by hardware, just mark
> all pages dirty?
Yeah, I think this idea is nice :). Still, one concern is that we may have
other dirty tracking methods in the future; if we can't track dirty pages
through the IOMMU, we can still try other methods. If there is no interface
to check this capability, we have no chance to try other methods. What do
you think?

>
>> +
>> +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
>> +                           unsigned long iova, size_t size, int prot)
>> +{
>> +    const struct iommu_ops *ops = domain->ops;
>> +    unsigned long orig_iova = iova;
>> +    unsigned int min_pagesz;
>> +    size_t orig_size = size;
>> +    bool flush = false;
>> +    int ret = 0;
>> +
>> +    if (unlikely(!ops->switch_dirty_log))
>> +        return -ENODEV;
>> +
>> +    min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
>> +    if (!IS_ALIGNED(iova | size, min_pagesz)) {
>> +        pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
>> +               iova, size, min_pagesz);
>> +        return -EINVAL;
>> +    }
>> +
>> +    mutex_lock(&domain->switch_log_lock);
>> +    if (enable && domain->dirty_log_tracking) {
>> +        ret = -EBUSY;
>> +        goto out;
>> +    } else if (!enable && !domain->dirty_log_tracking) {
>> +        ret = -EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    pr_debug("switch_dirty_log %s for: iova 0x%lx size 0x%zx\n",
>> +             enable ? "enable" : "disable", iova, size);
>> +
>> +    while (size) {
>> +        size_t pgsize = iommu_pgsize(domain, iova, size);
>> +
>> +        flush = true;
>> +        ret = ops->switch_dirty_log(domain, enable, iova, pgsize, prot);
>
> A per-minimal-page callback is much too expensive. How about using (pagesize,
> count), so that all pages with the same page size could be handled in a
> single indirect call? I remember I commented this during last review,
> but I don't mind doing it again.
Thanks for reminding me again :). I'll do that in next version.
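
Roughly, I would batch contiguous ranges that use the same page size before
making the indirect call, so the loop becomes something like the untested
sketch below. The extra "count" argument to ->switch_dirty_log() and the way
the batch is computed here are only illustrative, not the final form.

	while (size) {
		size_t pgsize = iommu_pgsize(domain, iova, size);
		size_t count = 1;

		/* Grow the batch while the next chunk keeps the same page size. */
		while (count * pgsize < size &&
		       iommu_pgsize(domain, iova + count * pgsize,
				    size - count * pgsize) == pgsize)
			count++;

		flush = true;
		/* One indirect call covers 'count' pages of 'pgsize' each. */
		ret = ops->switch_dirty_log(domain, enable, iova, pgsize,
					    count, prot);
		if (ret)
			break;

		iova += pgsize * count;
		size -= pgsize * count;
	}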

Thanks,
Keqian