Date: Thu, 10 May 2018 10:59:46 -0400
From: Jerome Glisse <jglisse@redhat.com>
To: Christian König
Cc: Stephen Bates, Logan Gunthorpe, Alex Williamson, Bjorn Helgaas,
    linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
    linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
    linux-nvdimm@lists.01.org, linux-block@vger.kernel.org,
    Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
    Jason Gunthorpe, Max Gurtovoy, Dan Williams, Benjamin Herrenschmidt
Subject: Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
Message-ID: <20180510145946.GB3652@redhat.com>
In-Reply-To: <73b6a454-bf84-1640-0b5e-137d03c0ad8c@amd.com>

On Thu, May 10, 2018 at 04:29:44PM +0200, Christian König wrote:
> On 10.05.2018 at 16:20, Stephen Bates wrote:
> > Hi Jerome
> >
> > > As it is tied to PASID, this is done using the IOMMU, so look for
> > > callers of amd_iommu_bind_pasid() or intel_svm_bind_mm(); in GPUs
> > > the existing user is the AMD GPU driver, see:
> >
> > Ah thanks. This cleared things up for me. A quick search shows there
> > are still no users of intel_svm_bind_mm(), but I see the AMD version
> > used in that GPU driver.
>
> Just FYI: there is also another effort ongoing to give the AMD, Intel
> as well as ARM IOMMUs a common interface, so that drivers can use
> whatever the platform offers for SVM support.
> > One thing I could not grok from the code is how the GPU driver
> > indicates which DMA events require ATS translations and which do not.
> > I am assuming the driver implements some way of indicating that, and
> > it's not just a global ON or OFF for all DMAs? The reason I ask is
> > that I am looking at what would need to be added in the NVMe spec,
> > above and beyond what we have in PCI ATS, if NVMe were to support
> > efficient use of ATS (for example, would we need a flag in the
> > submission queue entries to indicate that a particular IO's SGL/PRP
> > should undergo ATS).
>
> Oh, well, that is complicated at best.
>
> On very old hardware it wasn't a window; instead you had to use special
> commands in your shader which indicated that you wanted to use an ATS
> transaction instead of a normal PCIe transaction for your
> read/write/atomic.
>
> As Jerome explained, on most hardware we have a window inside the
> internal GPU address space which, when accessed, issues an ATS
> transaction with a configurable PASID.
>
> But on newer hardware that window became a bit in the GPUVM page
> tables, so in theory we can now control it on a 4K-granularity basis
> for the internal 48-bit GPU address space.

To complete this, a 50-line primer on GPUs:

GPUVA - GPU virtual address
GPUPA - GPU physical address

GPUs run programs very much like CPUs do, except that a program will have
many thousands of threads running concurrently. There is a hierarchy of
groups for a given program, i.e. threads are grouped together; the lowest
hierarchy level has a group size of <= 64 threads on most GPUs.

Those programs (called shaders for graphics, think OpenGL or Vulkan, or
compute kernels for GPGPU, think OpenCL or CUDA) are submitted by
userspace against a given address space. In the "old" days (a couple of
years back, when dinosaurs were still roaming the earth) this address
space was specific to the GPU, and each userspace program could create
multiple GPU address spaces.
All the memory operations done by the program are against the GPU address
space just described. Hence every PCIe transaction is spawned from a
program + address space.

GPUs use page tables plus a window aperture (the window aperture is going
away, so you can focus on the page tables) to translate a GPU virtual
address into a physical address. The physical address can point to GPU
local memory, to system memory, or to another PCIe device's memory (i.e.
some PCIe BAR). So every PCIe transaction is spawned through this
process: GPUVA is translated to GPUPA, then the GPUPA is handled by the
GPU MMU unit, which either spawns a PCIe transaction for a non-local
GPUPA or accesses local memory otherwise.

So per se, the kernel driver does not configure which transaction uses
ATS or peer-to-peer. The userspace program creates a GPU virtual address
space and binds objects into it. These objects can be system memory or
some other PCIe device's memory, in which case we would want to do
peer-to-peer. So you won't find any such logic in the kernel; what you
find is the creation of virtual address spaces and the binding of
objects.

Above I talked about the old days. Nowadays we want the GPU virtual
address space to be exactly the same as the CPU virtual address space of
the process which initiated the GPU program. This is where we use PASID
and ATS. Here, userspace creates a special "GPU context" that says that
the GPU virtual address space will be the same as that of the program
which created the GPU context. A process address space ID (PASID) is then
allocated, and the mm_struct is bound to this PASID in the IOMMU driver.
All programs executed on the GPU then use that PASID to identify the
address space against which they are running.

In all of the above I did not talk about the DMA engines, which sit on
the "side" of the GPU to copy memory around. GPUs have multiple DMA
engines with different capabilities; some of those DMA engines use the
same GPU address space as described above, others use GPUPA directly.

Hope this helps in understanding the big picture.
I oversimplified things; the devil is in the details.

Cheers,
Jérôme