To: Stephen Bates, Christian König, Jerome Glisse
Cc: Alex Williamson, Bjorn Helgaas, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org, Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams, Benjamin Herrenschmidt
From: Logan Gunthorpe
Date: Thu, 10 May 2018 10:32:22 -0600
Subject: Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

On 10/05/18 08:16 AM, Stephen Bates wrote:
> Hi Christian
>
>> Why would a switch not identify that as a peer address? We use the PASID
>> together with ATS to identify the address space which a transaction
>> should use.
>
> I think you are conflating two types of TLPs here. If the device supports ATS then it will issue a TR TLP to obtain a translated address from the IOMMU.
> This TR TLP will be addressed to the RP and so regardless of ACS it is going up to the Root Port. When it gets the response it gets the physical address and can use that with the TA bit set for the p2pdma. In the case of ATS support we also have more control over ACS as we can disable it just for TA addresses (as per 7.7.7.7.2 of the spec).

Yes. Remember, if we are using the IOMMU, the EP is being programmed (regardless of whether it's a DMA engine, NTB window or GPUVA) with an IOVA address which is separate from the device's PCI bus address. Any packet addressed to an IOVA address is going to go back to the root complex no matter what the ACS bits say. Only once ATS translates the address back into the PCI bus address will the EP send packets to the peer and the switch will attempt to route them to the peer, and only then do the ACS bits apply. And the direct translated ACS bit allows packets that have purportedly been translated through.

> > If I'm not completely mistaken when you disable ACS it is perfectly
> > possible that a bridge identifies a transaction as belonging to a peer
> > address, which isn't what we want here.
>
> You are right here and I think this illustrates a problem for using the IOMMU at all when P2PDMA devices do not support ATS. Let me explain:
>
> If we want to do a P2PDMA and the DMA device does not support ATS then I think we have to disable the IOMMU (something Mike suggested earlier). The reason is that since ATS is not an option the EP must initiate the DMA using the addresses passed down to it. If the IOMMU is on then this is an IOVA that could (with some non-zero probability) point to an IO Memory address in the same PCI domain. So if we disable ACS we are in trouble as we might MemWr to the wrong place but if we enable ACS we lose much of the benefit of P2PDMA. Disabling the IOMMU removes the IOVA risk and ironically also resolves the IOMMU grouping issues.
> So I think if we want to support performant P2PDMA for devices that don't have ATS (and no NVMe SSDs today support ATS) then we have to disable the IOMMU. I know this is problematic for AMD's use case so perhaps we also need to consider a mode for P2PDMA for devices that DO support ATS where we can enable the IOMMU (but in this case EPs without ATS cannot participate as P2PDMA DMA initiators).
>
> Make sense?

Not to me. In the p2pdma code we specifically program DMA engines with the PCI bus address. So regardless of whether we are using the IOMMU or not, the packets will be addressed directly to the peer. If the ACS Redir bits are on they will be forced back to the RC by the switch and the transaction will fail. If we clear the ACS bits, the TLPs will go where we want and everything will work (but we lose the isolation of ACS).

For EPs that support ATS, we should (but don't necessarily have to) program them with the IOVA address so they can go through the translation process, which will allow P2P without disabling the ACS Redir bits -- provided the ACS direct translation bit is set. (And btw, if it is, then we lose the benefit of ACS protecting against malicious EPs.) But, per above, the ATS transaction should involve only the IOVA address, so the ACS bits not being set should not break ATS.

Logan
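
For reference, the routing cases argued in this reply can be written down as a toy model. This is only a sketch under stated assumptions, not kernel code or anything from the patch set: the names AcsFlags and route are hypothetical, only two ACS bits (P2P Request Redirect and Direct Translated P2P) are modelled, and an IOVA is assumed not to alias a peer's BAR window (the aliasing case is exactly the risk Stephen raises above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcsFlags:
    """Hypothetical container for the two ACS bits discussed in the thread."""
    request_redirect: bool   # ACS P2P Request Redirect
    direct_translated: bool  # ACS Direct Translated P2P


def route(hits_peer_window: bool, translated: bool, acs: AcsFlags) -> str:
    """Decide where a switch downstream port sends a memory request.

    hits_peer_window: the address falls inside a peer's BAR on this switch.
    translated: the request carries a translated address obtained via ATS.
    Returns "peer" or "upstream" (toward the root complex).
    """
    if not hits_peer_window:
        # An IOVA that matches no peer window goes upstream; ACS is not consulted.
        return "upstream"
    if translated and acs.direct_translated:
        # Direct Translated P2P lets translated requests through despite redirect.
        return "peer"
    if acs.request_redirect:
        # Redirect forces peer-addressed requests back toward the root complex.
        return "upstream"
    return "peer"


acs_on = AcsFlags(request_redirect=True, direct_translated=False)
acs_off = AcsFlags(request_redirect=False, direct_translated=False)
acs_dt = AcsFlags(request_redirect=True, direct_translated=True)

print(route(False, False, acs_off))  # IOVA, ACS cleared
print(route(True, False, acs_on))    # bus address, redirect on
print(route(True, False, acs_off))   # bus address, ACS cleared
print(route(True, True, acs_dt))     # ATS-translated, DT bit set
```

The model says nothing about what happens after a redirected request reaches the root complex; per the reply above, with a PCI bus address and the Redir bits on, that transaction is expected to fail.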