Date: Mon, 2 Apr 2018 15:16:50 -0400
From: Jerome Glisse
To: Logan Gunthorpe
Cc: Christian König, Christoph Hellwig, Will Davis, Joerg Roedel,
    linaro-mm-sig@lists.linaro.org, amd-gfx@lists.freedesktop.org,
    linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
    linux-media@vger.kernel.org, Bjorn Helgaas
Subject: Re: [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()
Message-ID: <20180402191649.GB18231@redhat.com>
In-Reply-To: <6f796779-0ba3-d056-de33-341ee55d6b38@deltatee.com>

On Mon, Apr 02, 2018 at 11:37:07AM -0600, Logan Gunthorpe wrote:
> On 02/04/18 11:20 AM, Jerome Glisse wrote:
> > The point i have been trying to get accross is that you do have this
> > information with dma_map_resource() you know the device to which you
> > are trying to map (dev argument to dma_map_resource()) and you can
> > easily get the device to which the memory belongs because you have the
> > CPU physical address of the memory hence you can lookup the resource
> > and get the device from that.
>
> How do you go from a physical address to a struct device generally and
> in a performant manner?

There isn't a good API for that at the moment AFAIK. The closest things
would be lookup_resource() or region_intersects(), but a more appropriate
one can easily be added; the code to walk down the resource tree is
readily available.
Moreover, this can be optimized the way VMA lookups are, all the more so
because resources are seldom added, so the read side (finding a resource)
can be heavily favored over the write side (adding or registering a new
resource).

> > IIRC CAPI make P2P mandatory but maybe this is with NVLink. We can ask
> > the PowerPC folks to confirm. Note CAPI is Power8 and newer AFAICT.
>
> PowerPC folks recently told us specifically that Power9 does not support
> P2P between PCI root ports. I've said this many times. CAPI has nothing
> to do with it.

I need to check CAPI; I must have confused it with NVLink, which is also
available on some PowerPC platforms.

> > Mapping to userspace have nothing to do here. I am talking at hardware
> > level. How thing are expose to userspace is a completely different
> > problems that do not have one solution fit all. For GPU you want this
> > to be under total control of GPU drivers. For storage like persistent
> > memory, you might want to expose it userspace more directly ...
>
> My understanding (and I worked on this a while ago) is that CAPI
> hardware manages memory maps typically for userspace memory. When a
> userspace program changes it's mapping, the CAPI hardware is updated so
> that hardware is coherent with the user address space and it is safe to
> DMA to any address without having to pin memory. (This is very similar
> to ODP in RNICs.) This is *really* nice but doesn't solve *any* of the
> problems we've been discussing. Moreover, many developers want to keep
> P2P in-kernel, for the time being, where the problem of pinning memory
> does not exist.

What you describe is the ATS (Address Translation Services) / PASID
(Process Address Space ID) part of CAPI, which has also been available
for years on AMD x86 platforms (AMD IOMMU-v2), though it is barely ever
used. The interesting aspect of CAPI is its cache coherency protocol
between devices and CPUs.
It works in both directions: device accesses to system memory can be
cache coherent with CPU accesses and take part in the cache coherency
protocol (going a bit further than PCIe snooping), and, in the other
direction, CPU accesses to device memory can also be cache coherent,
which is not the case with PCIe. This CPU/device cache coherency is what
made me assume that CAPI must support peer-to-peer, since peers would
need to talk to each other for cache coherency purposes. But maybe all
cache coherency arbitration goes through a central directory, removing
the peer-to-peer requirement.

Anyway, as you said, this does not matter for this discussion.
dma_map_resource() can simply be stubbed out on platforms that do not
support this, so those platforms would not allow it. If it gets used on
other platforms and shows enough advantages that users start asking for
it, then maybe those platforms will pay attention to the hardware
requirements.

Note that with mmu_notifier there isn't any need to pin memory (even
without any special hardware capabilities), as long as you can preempt
whatever is running on your hardware in order to update the hardware's
page table.

Cheers,
Jérôme