Date: Tue, 18 May 2021 12:58:18 -0600
From: Alex Williamson
To: Shenming Lu
Cc: Cornelia Huck, Will Deacon, Robin Murphy, Joerg Roedel,
 Jean-Philippe Brucker, Eric Auger, Kevin Tian, Lu Baolu,
 Christoph Hellwig, Jonathan Cameron, Barry Song
Subject: Re: [RFC PATCH v3 6/8] vfio/type1: No need to statically pin and map if IOPF enabled
Message-ID: <20210518125818.2282941f.alex.williamson@redhat.com>
In-Reply-To: <20210409034420.1799-7-lushenming@huawei.com>
References: <20210409034420.1799-1-lushenming@huawei.com>
 <20210409034420.1799-7-lushenming@huawei.com>

On Fri, 9 Apr 2021 11:44:18 +0800
Shenming Lu wrote:

> If IOPF enabled for the VFIO container, there is no need to statically
> pin and map the entire DMA range, we can do it on demand. And unmap
> according to the IOPF mapped bitmap when removing vfio_dma.
>
> Note that we still mark all pages dirty even if IOPF enabled, we may
> add IOPF-based fine grained dirty tracking support in the future.
>
> Signed-off-by: Shenming Lu
> ---
>  drivers/vfio/vfio_iommu_type1.c | 38 +++++++++++++++++++++++++++------
>  1 file changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 7df5711e743a..dcc93c3b258c 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -175,6 +175,7 @@ struct vfio_iopf_group {
>  #define IOPF_MAPPED_BITMAP_GET(dma, i)	\
>  		((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG]	\
>  		  >> ((i) % BITS_PER_LONG)) & 0x1)
> +#define IOPF_MAPPED_BITMAP_BYTES(n)	DIRTY_BITMAP_BYTES(n)
>  
>  #define WAITED 1
>  
> @@ -959,7 +960,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>  	 * already pinned and accounted. Accouting should be done if there is no
>  	 * iommu capable domain in the container.
>  	 */
> -	do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
> +	do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
> +			iommu->iopf_enabled;
>  
>  	for (i = 0; i < npage; i++) {
>  		struct vfio_pfn *vpfn;
> @@ -1048,7 +1050,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
>  
>  	mutex_lock(&iommu->lock);
>  
> -	do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
> +	do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
> +			iommu->iopf_enabled;

pin/unpin are actually still pinning pages, why does iopf exempt them
from accounting?

>  	for (i = 0; i < npage; i++) {
>  		struct vfio_dma *dma;
>  		dma_addr_t iova;
> @@ -1169,7 +1172,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>  	if (!dma->size)
>  		return 0;
>  
> -	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
> +	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || iommu->iopf_enabled)
>  		return 0;
>  
>  	/*
> @@ -1306,11 +1309,20 @@ static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu,
>  	}
>  }
>  
> +static void vfio_dma_clean_iopf(struct vfio_iommu *iommu, struct vfio_dma *dma)
> +{
> +	vfio_unmap_partial_iopf(iommu, dma, dma->iova, dma->iova + dma->size);
> +
> +	kfree(dma->iopf_mapped_bitmap);
> +}
> +
>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  {
>  	WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>  	vfio_unmap_unpin(iommu, dma, true);
>  	vfio_unlink_dma(iommu, dma);
> +	if (iommu->iopf_enabled)
> +		vfio_dma_clean_iopf(iommu, dma);
>  	put_task_struct(dma->task);
>  	vfio_dma_bitmap_free(dma);
>  	if (dma->vaddr_invalid) {
> @@ -1359,7 +1371,8 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
>  	 * mark all pages dirty if any IOMMU capable device is not able
>  	 * to report dirty pages and all pages are pinned and mapped.
>  	 */
> -	if (iommu->num_non_pinned_groups && dma->iommu_mapped)
> +	if (iommu->num_non_pinned_groups &&
> +	    (dma->iommu_mapped || iommu->iopf_enabled))
>  		bitmap_set(dma->bitmap, 0, nbits);

This seems like really poor integration of iopf into dirty page
tracking.  I'd expect dirty logging to flush the mapped pages and write
faults to mark pages dirty.  Shouldn't the fault handler also provide
only the access faulted, so for example a read fault wouldn't mark the
page dirty?
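
To make that concrete, I'd picture the fault path doing something along
these lines.  This is only a sketch, not code from this series:
vfio_iopf_handle_fault() and vfio_iopf_map_page() are placeholder names
for however the series pins and maps the faulting page, and I'm
assuming the handler can see the IOMMU_FAULT_PERM_* flags for the
fault:

/* Sketch: mark a page dirty only when the fault requested write access. */
static int vfio_iopf_handle_fault(struct vfio_dma *dma, dma_addr_t iova,
				  u32 perm)
{
	unsigned long pgoff = (iova - dma->iova) >> PAGE_SHIFT;
	int ret;

	/* Placeholder for pinning the backing page and mapping it at iova */
	ret = vfio_iopf_map_page(dma, iova, perm);
	if (ret)
		return ret;

	/* A read-only fault shouldn't dirty the page */
	if ((perm & IOMMU_FAULT_PERM_WRITE) && dma->bitmap)
		bitmap_set(dma->bitmap, pgoff, 1);

	return 0;
}

Dirty logging would then only need to report what has actually been
faulted in writable (plus anything pinned), rather than marking the
whole range dirty whenever iopf_enabled is set.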
>
>  		if (shift) {
> @@ -1772,6 +1785,16 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  		goto out_unlock;
>  	}
>  
> +	if (iommu->iopf_enabled) {
> +		dma->iopf_mapped_bitmap = kvzalloc(IOPF_MAPPED_BITMAP_BYTES(
> +						size >> PAGE_SHIFT), GFP_KERNEL);
> +		if (!dma->iopf_mapped_bitmap) {
> +			ret = -ENOMEM;
> +			kfree(dma);
> +			goto out_unlock;
> +		}

So we're assuming nothing can fault and therefore nothing can reference
the iopf_mapped_bitmap until this point in the series?

> +	}
> +
>  	iommu->dma_avail--;
>  	dma->iova = iova;
>  	dma->vaddr = vaddr;
> @@ -1811,8 +1834,11 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	/* Insert zero-sized and grow as we map chunks of it */
>  	vfio_link_dma(iommu, dma);
>  
> -	/* Don't pin and map if container doesn't contain IOMMU capable domain*/
> -	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
> +	/*
> +	 * Don't pin and map if container doesn't contain IOMMU capable domain,
> +	 * or IOPF enabled for the container.
> +	 */
> +	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || iommu->iopf_enabled)
>  		dma->size = size;
>  	else
>  		ret = vfio_pin_map_dma(iommu, dma, size);
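
On the question above about nothing referencing iopf_mapped_bitmap
before this point: if that ordering ever changes later in the series,
or a fault can race with VFIO_IOMMU_MAP_DMA, I'd expect the fault
handler to need a guard roughly like the below (sketch only, the
function name is a placeholder; it just reuses the existing
vfio_find_dma() lookup):

/* Sketch: don't touch the bitmap until the vfio_dma is fully set up. */
static bool vfio_iopf_dma_ready(struct vfio_iommu *iommu, dma_addr_t iova)
{
	struct vfio_dma *dma;
	bool ready;

	mutex_lock(&iommu->lock);
	dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
	/* No vfio_dma yet, or MAP_DMA hasn't allocated the bitmap yet */
	ready = dma && dma->iopf_mapped_bitmap;
	mutex_unlock(&iommu->lock);

	return ready;
}

(In practice the check would need to stay under iommu->lock across the
actual mapping, otherwise it only moves the race.)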