Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp2975971pxu; Mon, 14 Dec 2020 16:37:54 -0800 (PST) X-Google-Smtp-Source: ABdhPJyHKoFMmOYnWBaypjLlbRpjqnO0DRoyqiMoeUgWhJOz9UToLB3QAH3cWNfp3geFM9irVfup X-Received: by 2002:a17:906:c1c6:: with SMTP id bw6mr25363982ejb.199.1607992674537; Mon, 14 Dec 2020 16:37:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1607992674; cv=none; d=google.com; s=arc-20160816; b=WvXeVrMPbr02dTQtupJ1Djq0TrP/vRyQ2BhyDd2Q93lzdmzpmBfwBZo9/93ahpXwN4 u7t2P6/oZd/x3sc1HBFNlFWeUvh+z6datQagkEGts9zEu4yzwxPlIBXVFKbbTQIJ6AdU Vf4obn507nuj6991AD7GIWCyAOQ8mE+6tvR5Y5Uj1vTLiP+69gqdNZcfRy6tYmVBtiwp WL8VOpAh0T+WiRQ+mnkp+D9fAMvCXwwbrlNV+Hm12RQI4hRIoXzlcbsErWc19HBuAp3c zCccykhRioFaMKkGch0TXjiE/7S9OqHU5YXfFpIYXPgQbcZYEb14m5JVfu9o13ScUXv6 xsQg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=+KH49WHOC8U/vy7q7vIiLi7+DPF7822/r/b8oPP/btI=; b=fwo56w0IZ+hqQ+W4dO139NFpRs8jwYhc85/qk0IrvnHLGm/7h7Y0xemboPnvnaNwpr fC2gA4YfUrslQE7Ngx6gfo6qrUfFCH2VVBarOepKu6IA3FOc+bVgyWZd0aaljiqS1GYG 42RA/2Im8arCLScjuPekBdzJQE3z3tP+h8nzXCdt8hJo6TpIDNd/E4gtidh2sx2axaZo 95pe6Q7bWzBjKgYFvcNfS8m1GActDqHW5Fm126lefJERK9Gz2yOUtk3YNZNqV/t6FaVP P1xSRb/vhxmNjOp4L+7Uf0OfErM+7RWqzSAhbzdkZ9U/VszeyXF2nZl0ujFtsniEBqUJ 0C2A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=eEFc6JrP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y11si6549edp.516.2020.12.14.16.37.31; Mon, 14 Dec 2020 16:37:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=eEFc6JrP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728622AbgLOASA (ORCPT + 99 others); Mon, 14 Dec 2020 19:18:00 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:22354 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727174AbgLOASA (ORCPT ); Mon, 14 Dec 2020 19:18:00 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1607991393; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+KH49WHOC8U/vy7q7vIiLi7+DPF7822/r/b8oPP/btI=; b=eEFc6JrPa7iXH/eDmnj0bI3EVJ6ICdPc5JSoKVjk8q+vBEGtaGAHO8Q8ljRZN/55mWAR2I 1HkMzyB4sNSK7bEbtjH5mV9Fp7Cx7qGklyKlhOMJ1b2XPFsR0A4RjS/npMRWFGhJ9lnBp5 PA8HFZznpbDPWvNua9JWs7nuyTb/KpM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-291-aQP1cyCSMyiy2PSi3XYmag-1; Mon, 14 Dec 2020 19:16:29 -0500 X-MC-Unique: aQP1cyCSMyiy2PSi3XYmag-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id AC33E1922960; Tue, 15 Dec 2020 00:16:25 +0000 (UTC) Received: from omen.home (ovpn-112-193.phx2.redhat.com [10.3.112.193]) by smtp.corp.redhat.com (Postfix) with ESMTP id C55F510023B8; Tue, 15 Dec 2020 00:16:23 +0000 (UTC) Date: Mon, 14 Dec 2020 17:16:23 -0700 From: Alex Williamson To: zhukeqian Cc: , , , , , Cornelia Huck , "Marc Zyngier" , Will Deacon , Robin Murphy , Joerg Roedel , Catalin Marinas , James Morse , "Suzuki K Poulose" , Sean Christopherson , Julien Thierry , Mark Brown , "Thomas Gleixner" , Andrew Morton , Alexios Zavras , , Subject: Re: [PATCH 1/7] vfio: iommu_type1: Clear added dirty bit when unwind pin Message-ID: <20201214171623.6e909138@omen.home> In-Reply-To: References: <20201210073425.25960-1-zhukeqian1@huawei.com> <20201210073425.25960-2-zhukeqian1@huawei.com> <20201210121646.24fb3cd8@omen.home> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 11 Dec 2020 14:51:47 +0800 zhukeqian wrote: > On 2020/12/11 3:16, Alex Williamson wrote: > > On Thu, 10 Dec 2020 15:34:19 +0800 > > Keqian Zhu wrote: > > > >> Currently we do not clear added dirty bit of bitmap when unwind > >> pin, so if pin failed at halfway, we set unnecessary dirty bit > >> in bitmap. Clearing added dirty bit when unwind pin, userspace > >> will see less dirty page, which can save much time to handle them. > >> > >> Note that we should distinguish the bits added by pin and the bits > >> already set before pin, so introduce bitmap_added to record this. > >> > >> Signed-off-by: Keqian Zhu > >> --- > >> drivers/vfio/vfio_iommu_type1.c | 33 ++++++++++++++++++++++----------- > >> 1 file changed, 22 insertions(+), 11 deletions(-) > >> > >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > >> index 67e827638995..f129d24a6ec3 100644 > >> --- a/drivers/vfio/vfio_iommu_type1.c > >> +++ b/drivers/vfio/vfio_iommu_type1.c > >> @@ -637,7 +637,11 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, > >> struct vfio_iommu *iommu = iommu_data; > >> struct vfio_group *group; > >> int i, j, ret; > >> + unsigned long pgshift = __ffs(iommu->pgsize_bitmap); > >> unsigned long remote_vaddr; > >> + unsigned long bitmap_offset; > >> + unsigned long *bitmap_added; > >> + dma_addr_t iova; > >> struct vfio_dma *dma; > >> bool do_accounting; > >> > >> @@ -650,6 +654,12 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, > >> > >> mutex_lock(&iommu->lock); > >> > >> + bitmap_added = bitmap_zalloc(npage, GFP_KERNEL); > >> + if (!bitmap_added) { > >> + ret = -ENOMEM; > >> + goto pin_done; > >> + } > > > > > > This is allocated regardless of whether dirty tracking is enabled, so > > this adds overhead to the common case in order to reduce user overhead > > (not correctness) in the rare condition that dirty tracking is enabled, > > and the even rarer condition that there's a fault during that case. > > This is not a good trade-off. Thanks, > > Hi Alex, > > We can allocate the bitmap when dirty tracking is active, do you accept this? > Or we can set the dirty bitmap after all pins succeed, but this costs cpu time > to locate vfio_dma with iova. TBH I don't see this as a terribly significant problem, in the rare event of an error with dirty tracking enabled, the user might see some pages marked dirty that were not successfully pinned by the mdev vendor driver. The solution shouldn't impose more overhead than the original issue. Thanks, Alex > >> + > >> /* Fail if notifier list is empty */ > >> if (!iommu->notifier.head) { > >> ret = -EINVAL; > >> @@ -664,7 +674,6 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, > >> do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu); > >> > >> for (i = 0; i < npage; i++) { > >> - dma_addr_t iova; > >> struct vfio_pfn *vpfn; > >> > >> iova = user_pfn[i] << PAGE_SHIFT; > >> @@ -699,14 +708,10 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, > >> } > >> > >> if (iommu->dirty_page_tracking) { > >> - unsigned long pgshift = __ffs(iommu->pgsize_bitmap); > >> - > >> - /* > >> - * Bitmap populated with the smallest supported page > >> - * size > >> - */ > >> - bitmap_set(dma->bitmap, > >> - (iova - dma->iova) >> pgshift, 1); > >> + /* Populated with the smallest supported page size */ > >> + bitmap_offset = (iova - dma->iova) >> pgshift; > >> + if (!test_and_set_bit(bitmap_offset, dma->bitmap)) > >> + set_bit(i, bitmap_added); > >> } > >> } > >> ret = i; > >> @@ -722,14 +727,20 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, > >> pin_unwind: > >> phys_pfn[i] = 0; > >> for (j = 0; j < i; j++) { > >> - dma_addr_t iova; > >> - > >> iova = user_pfn[j] << PAGE_SHIFT; > >> dma = vfio_find_dma(iommu, iova, PAGE_SIZE); > >> vfio_unpin_page_external(dma, iova, do_accounting); > >> phys_pfn[j] = 0; > >> + > >> + if (test_bit(j, bitmap_added)) { > >> + bitmap_offset = (iova - dma->iova) >> pgshift; > >> + clear_bit(bitmap_offset, dma->bitmap); > >> + } > >> } > >> pin_done: > >> + if (bitmap_added) > >> + bitmap_free(bitmap_added); > >> + > >> mutex_unlock(&iommu->lock); > >> return ret; > >> } > > > > . > > >