Date: Thu, 10 May 2018 15:24:54 -0400
From: Jerome Glisse <jglisse@redhat.com>
To: Alex Williamson
Cc: Stephen Bates, Christian König, Logan Gunthorpe, Bjorn Helgaas,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-block@vger.kernel.org,
	Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	Jason Gunthorpe, Max Gurtovoy, Dan Williams, Benjamin Herrenschmidt
Subject: Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
Message-ID: <20180510192454.GC3652@redhat.com>
References: <20180508205005.GC15608@redhat.com> <7FFB9603-DF9F-4441-82E9-46037CB6C0DE@raithlin.com> <4e0d0b96-ab02-2662-adf3-fa956efd294c@deltatee.com> <2fc61d29-9eb4-d168-a3e5-955c36e5d821@amd.com> <94C8FE12-7FC3-48BD-9DCA-E6A427E71810@raithlin.com> <20180510144137.GA3652@redhat.com> <20180510131015.4ad59477@w520.home>
In-Reply-To: <20180510131015.4ad59477@w520.home>

On Thu, May 10, 2018 at 01:10:15PM -0600, Alex Williamson wrote:
> On Thu, 10 May 2018 18:41:09 +0000
> "Stephen Bates" wrote:
>
> > > The reason is that GPUs are giving up on PCIe (see all the specialized
> > > links like NVLink that are popping up in the GPU space). So for fast
> > > GPU interconnect we have these new links.
> >
> > I look forward to Nvidia open-licensing NVLink to anyone who wants to use it ;-).
>
> No doubt, the marketing for it is quick to point out the mesh topology
> of NVLink, but I haven't seen any technical documents that describe the
> isolation capabilities or IOMMU interaction. Whether this is included
> or an afterthought, I have no idea.

AFAIK there is no IOMMU on NVLink between devices; walking a page table
while sustaining 80GB/s or 160GB/s is hard to achieve :) I think the idea
behind those interconnects is that devices in the mesh are inherently
secure, i.e. each individual device is supposed to make sure that no one
can abuse it. GPUs, with their virtual address spaces and contextualized
program execution units, are supposed to be secure (a Spectre-like bug
might be lurking in them, but I doubt it). So for those interconnects you
program physical addresses directly into the page tables of the devices,
and those physical addresses are untranslated from the hardware's
perspective. Note that the kernel driver that does the actual GPU page
table programming can do sanity checks on the values it is setting, so
checks can also happen at setup time (a rough sketch of such a check is
appended below). But after that the assumption is that the hardware is
secure and no one can abuse it, AFAICT.

> > > Also the IOMMU isolation does matter a lot to us. Think someone using
> > > this peer to peer to gain control of a server in the cloud.
>
> From that perspective, do we have any idea what NVLink means for
> topology and IOMMU provided isolation and translation? I've seen a
> device assignment user report that seems to suggest it might pretend to
> be PCIe compatible, but the assigned GPU ultimately doesn't work
> correctly in a VM, so perhaps the software compatibility is only so
> deep. Thanks,

Note that each individual GPU (in the configurations I am aware of) also
has a PCIe link to the CPU/main memory, so from that point of view they
behave very much like regular PCIe devices. It is just that the GPUs in
the mesh can access each other's memory through the high-bandwidth
interconnect. I am not sure how much is public beyond that; I will ask
NVidia to try to have someone chime in on this thread and shed light on
this, if possible.

Cheers,
Jérôme
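
P.S. Purely as an illustration of the "sanity check at setup time" point
above, here is a minimal sketch. The structure, the helper name and the
allowed-range list are all hypothetical and exist only to make the idea
concrete, not to describe any real driver's API.

#include <linux/types.h>
#include <linux/errno.h>

/* Hypothetical description of memory a peer device is allowed to reach. */
struct p2p_allowed_range {
	phys_addr_t start;
	phys_addr_t end;	/* exclusive */
};

/*
 * Validate a physical address before the driver writes it into the
 * device's page table.  Illustrative only: a real driver would check
 * against whatever regions it has actually set aside for peer access.
 */
static int check_p2p_phys_addr(const struct p2p_allowed_range *ranges,
			       unsigned int nr_ranges, phys_addr_t addr,
			       size_t size)
{
	unsigned int i;

	if (addr + size < addr)		/* reject wrap-around */
		return -EINVAL;

	for (i = 0; i < nr_ranges; i++) {
		if (addr >= ranges[i].start && addr + size <= ranges[i].end)
			return 0;	/* entirely inside an allowed range */
	}

	return -EPERM;	/* never program an address we do not own */
}

The point is only that such a check runs once, when the mapping is set
up; after that the hardware is trusted, which matches the assumption
described above.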