From: Alistair Popple
Subject: [PATCH v2 4/4] nouveau/svm: Implement atomic SVM access
Date: Fri, 19 Feb 2021 13:07:50 +1100
Message-ID: <20210219020750.16444-5-apopple@nvidia.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210219020750.16444-1-apopple@nvidia.com>
References: <20210219020750.16444-1-apopple@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Some NVIDIA GPUs do not support direct atomic access to system memory via
PCIe. Instead, this must be emulated by granting the GPU exclusive access
to the memory. This is achieved by replacing CPU page table entries with
special swap entries that fault on userspace access.

The driver then grants the GPU permission to update the page undergoing
atomic access via the GPU page tables. When CPU access to the page is
required, a CPU fault is raised which calls into the device driver via MMU
notifiers to revoke the atomic access. The original page table entries are
then restored, allowing CPU access to proceed.
Signed-off-by: Alistair Popple
---
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |  1 +
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 86 ++++++++++++++++---
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |  1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |  6 ++
 4 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvif/if000c.h b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
index d6dd40f21eed..9c7ff56831c5 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/if000c.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
@@ -77,6 +77,7 @@ struct nvif_vmm_pfnmap_v0 {
 #define NVIF_VMM_PFNMAP_V0_APER                   0x00000000000000f0ULL
 #define NVIF_VMM_PFNMAP_V0_HOST                   0x0000000000000000ULL
 #define NVIF_VMM_PFNMAP_V0_VRAM                   0x0000000000000010ULL
+#define NVIF_VMM_PFNMAP_V0_A                      0x0000000000000004ULL
 #define NVIF_VMM_PFNMAP_V0_W                      0x0000000000000002ULL
 #define NVIF_VMM_PFNMAP_V0_V                      0x0000000000000001ULL
 #define NVIF_VMM_PFNMAP_V0_NONE                   0x0000000000000000ULL
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index cd7b47c946cf..d2ce4fb9c8ec 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -421,9 +421,9 @@ nouveau_svm_fault_cmp(const void *a, const void *b)
 		return ret;
 	if ((ret = (s64)fa->addr - fb->addr))
 		return ret;
-	/*XXX: atomic? */
-	return (fa->access == 0 || fa->access == 3) -
-	       (fb->access == 0 || fb->access == 3);
+	/* Atomic access (2) has highest priority */
+	return (-1*(fa->access == 2) + (fa->access == 0 || fa->access == 3)) -
+	       (-1*(fb->access == 2) + (fb->access == 0 || fb->access == 3));
 }
 
 static void
@@ -555,10 +555,57 @@ static void nouveau_hmm_convert_pfn(struct nouveau_drm *drm,
 		args->p.phys[0] |= NVIF_VMM_PFNMAP_V0_W;
 }
 
+static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
+			       struct nouveau_drm *drm,
+			       struct nouveau_pfnmap_args *args, u32 size,
+			       unsigned long hmm_flags, struct mm_struct *mm)
+{
+	struct page *page;
+	unsigned long start = args->p.addr;
+	struct vm_area_struct *vma;
+	int ret = 0;
+
+	mmap_read_lock(mm);
+	vma = find_vma_intersection(mm, start, start + size);
+	if (!vma || !(vma->vm_flags & VM_WRITE)) {
+		ret = -EPERM;
+		goto out;
+	}
+
+	hmm_exclusive_range(mm, start, start + PAGE_SIZE, &page);
+	if (!page) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Map the page on the GPU. */
+	args->p.page = 12;
+	args->p.size = PAGE_SIZE;
+	args->p.addr = start;
+	args->p.phys[0] = page_to_phys(page) |
+		NVIF_VMM_PFNMAP_V0_V |
+		NVIF_VMM_PFNMAP_V0_W |
+		NVIF_VMM_PFNMAP_V0_A |
+		NVIF_VMM_PFNMAP_V0_HOST;
+
+	mutex_lock(&svmm->mutex);
+	svmm->vmm->vmm.object.client->super = true;
+	ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL);
+	svmm->vmm->vmm.object.client->super = false;
+	mutex_unlock(&svmm->mutex);
+
+	unlock_page(page);
+	put_page(page);
+
+out:
+	mmap_read_unlock(mm);
+	return ret;
+}
+
 static int nouveau_range_fault(struct nouveau_svmm *svmm,
 			       struct nouveau_drm *drm,
 			       struct nouveau_pfnmap_args *args, u32 size,
-			       unsigned long hmm_flags,
+			       unsigned long hmm_flags, int atomic,
 			       struct svm_notifier *notifier)
 {
 	unsigned long timeout =
@@ -608,12 +655,18 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
 		break;
 	}
 
-	nouveau_hmm_convert_pfn(drm, &range, args);
+	if (atomic) {
+		mutex_unlock(&svmm->mutex);
+		ret = nouveau_atomic_range_fault(svmm, drm, args,
+						 size, hmm_flags, mm);
+	} else {
+		nouveau_hmm_convert_pfn(drm, &range, args);
 
-	svmm->vmm->vmm.object.client->super = true;
-	ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL);
-	svmm->vmm->vmm.object.client->super = false;
-	mutex_unlock(&svmm->mutex);
+		svmm->vmm->vmm.object.client->super = true;
+		ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL);
+		svmm->vmm->vmm.object.client->super = false;
+		mutex_unlock(&svmm->mutex);
+	}
 
 out:
 	mmu_interval_notifier_remove(&notifier->notifier);
@@ -637,7 +690,7 @@ nouveau_svm_fault(struct nvif_notify *notify)
 	unsigned long hmm_flags;
 	u64 inst, start, limit;
 	int fi, fn;
-	int replay = 0, ret;
+	int replay = 0, atomic = 0, ret;
 
 	/* Parse available fault buffer entries into a cache, and update
 	 * the GET pointer so HW can reuse the entries.
@@ -718,12 +771,15 @@ nouveau_svm_fault(struct nvif_notify *notify)
 		/*
 		 * Determine required permissions based on GPU fault
 		 * access flags.
-		 * XXX: atomic?
 		 */
 		switch (buffer->fault[fi]->access) {
 		case 0: /* READ. */
 			hmm_flags = HMM_PFN_REQ_FAULT;
 			break;
+		case 2: /* ATOMIC. */
+			hmm_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE;
+			atomic = true;
+			break;
 		case 3: /* PREFETCH. */
 			hmm_flags = 0;
 			break;
@@ -740,7 +796,7 @@ nouveau_svm_fault(struct nvif_notify *notify)
 
 		notifier.svmm = svmm;
 		ret = nouveau_range_fault(svmm, svm->drm, &args.i,
-					sizeof(args), hmm_flags, &notifier);
+					sizeof(args), hmm_flags, atomic, &notifier);
 		mmput(mm);
 
 		limit = args.i.p.addr + args.i.p.size;
@@ -760,7 +816,11 @@ nouveau_svm_fault(struct nvif_notify *notify)
 		     !(args.phys[0] & NVIF_VMM_PFNMAP_V0_V)) ||
 		    (buffer->fault[fi]->access != 0 /* READ. */ &&
 		     buffer->fault[fi]->access != 3 /* PREFETCH. */ &&
-		     !(args.phys[0] & NVIF_VMM_PFNMAP_V0_W)))
+		     !(args.phys[0] & NVIF_VMM_PFNMAP_V0_W)) ||
+		    (buffer->fault[fi]->access != 0 /* READ. */ &&
+		     buffer->fault[fi]->access != 1 /* WRITE. */ &&
+		     buffer->fault[fi]->access != 3 /* PREFETCH. */ &&
+		     !(args.phys[0] & NVIF_VMM_PFNMAP_V0_A)))
			break;
	}
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
index a2b179568970..f6188aa9171c 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
@@ -178,6 +178,7 @@ void nvkm_vmm_unmap_region(struct nvkm_vmm *, struct nvkm_vma *);
 #define NVKM_VMM_PFN_APER                         0x00000000000000f0ULL
 #define NVKM_VMM_PFN_HOST                         0x0000000000000000ULL
 #define NVKM_VMM_PFN_VRAM                         0x0000000000000010ULL
+#define NVKM_VMM_PFN_A                            0x0000000000000004ULL
 #define NVKM_VMM_PFN_W                            0x0000000000000002ULL
 #define NVKM_VMM_PFN_V                            0x0000000000000001ULL
 #define NVKM_VMM_PFN_NONE                         0x0000000000000000ULL
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
index 236db5570771..f02abd9cb4dd 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
@@ -88,6 +88,9 @@ gp100_vmm_pgt_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 	if (!(*map->pfn & NVKM_VMM_PFN_W))
 		data |= BIT_ULL(6); /* RO. */
 
+	if (!(*map->pfn & NVKM_VMM_PFN_A))
+		data |= BIT_ULL(7); /* Atomic disable. */
+
 	if (!(*map->pfn & NVKM_VMM_PFN_VRAM)) {
 		addr = *map->pfn >> NVKM_VMM_PFN_ADDR_SHIFT;
 		addr = dma_map_page(dev, pfn_to_page(addr), 0,
@@ -322,6 +325,9 @@ gp100_vmm_pd0_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 	if (!(*map->pfn & NVKM_VMM_PFN_W))
 		data |= BIT_ULL(6); /* RO. */
 
+	if (!(*map->pfn & NVKM_VMM_PFN_A))
+		data |= BIT_ULL(7); /* Atomic disable. */
+
 	if (!(*map->pfn & NVKM_VMM_PFN_VRAM)) {
 		addr = *map->pfn >> NVKM_VMM_PFN_ADDR_SHIFT;
 		addr = dma_map_page(dev, pfn_to_page(addr), 0,
-- 
2.20.1