Message-ID: <1520027052.4592.60.camel@kernel.crashing.org>
Subject: Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory
From: Benjamin Herrenschmidt
To: Logan Gunthorpe, Dan Williams
Cc: Jens Axboe, Keith Busch, Oliver OHalloran, Alex Williamson,
 linux-nvdimm, linux-rdma, linux-pci@vger.kernel.org,
 Linux Kernel Mailing List, linux-nvme@lists.infradead.org,
 linux-block@vger.kernel.org, Jérôme Glisse, Jason Gunthorpe,
 Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig
Date: Sat, 03 Mar 2018 08:44:12 +1100
In-Reply-To: <1519946734.4592.48.camel@au1.ibm.com>
References: <20180228234006.21093-1-logang@deltatee.com>
 <1519876489.4592.3.camel@kernel.crashing.org>
 <1519876569.4592.4.camel@au1.ibm.com>
 <1519936477.4592.23.camel@au1.ibm.com>
 <2079ba48-5ae5-5b44-cce1-8175712dd395@deltatee.com>
 <43ba615f-a6e1-9444-65e1-494169cb415d@deltatee.com>
 <1519945204.4592.45.camel@au1.ibm.com>
 <595acefb-18fc-e650-e172-bae271263c4c@deltatee.com>
 <1519946734.4592.48.camel@au1.ibm.com>

On Fri, 2018-03-02 at 10:25 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2018-03-01 at 16:19 -0700, Logan Gunthorpe wrote:
> > On 01/03/18 04:00 PM, Benjamin Herrenschmidt wrote:
> > > We use only 52 in practice, but yes.
> > >
> > > > That's 64PB. If you need a sparse vmemmap for the entire space,
> > > > it will take 16TB, which leaves you with 63.98PB of address
> > > > space left. (Similar calculations apply for other numbers of
> > > > address bits.)
> > >
> > > We only have 52 bits of virtual space for the kernel with the
> > > radix MMU.
> >
> > Ok, assuming you only have 52 bits of physical address space: the
> > sparse vmemmap takes 1TB and you're left with 3.9PB of address
> > space for other things. So, again, why doesn't that work? Is my
> > math wrong?
>
> The big problem is not the vmemmap, it's the linear mapping.

Alright, so I think I have a plan to fix this, but it will take a
little bit of time.

Basically, the idea is to have firmware pass Linux a region that is
known to be empty, which Linux can use for the vmalloc space, rather
than having Linux arbitrarily cut the address space in half.

I'm pretty sure I can always find large enough "holes" in the physical
address space that are outside of both RAM/OpenCAPI/NVLink and
PCIe/MMIO space. If nothing else, unused chip IDs. But I don't want
Linux to have to know about the intimate HW details, so I'll pass it
from FW. It will take some time to adjust Linux and get updated FW
around, though.

Once that's done, I'll be able to have the linear mapping go through
the entire 52-bit space (minus that hole).
Of course, the hole needs to be large enough to hold a vmemmap for a
52-bit space, which is about 4TB, so I probably need a hole that's at
least 8TB.

As for the mapping attributes, it should be easy for my linear mapping
code to ensure that anything that isn't actual RAM is mapped NC
(non-cacheable).

Cheers,
Ben.