Message-ID: <1519943658.4592.34.camel@kernel.crashing.org>
Subject: Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory
From: Benjamin Herrenschmidt
To: Linus Torvalds
Cc: Jason Gunthorpe, Dan Williams, Logan Gunthorpe,
    Linux Kernel Mailing List, linux-pci@vger.kernel.org, linux-nvme,
    linux-rdma, linux-nvdimm, linux-block, Stephen Bates,
    Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
    Bjorn Helgaas, Max Gurtovoy, Jérôme Glisse, Alex Williamson,
    Oliver OHalloran
Date: Fri, 02 Mar 2018 09:34:18 +1100
In-Reply-To:
References: <20180228234006.21093-1-logang@deltatee.com>
    <1519876489.4592.3.camel@kernel.crashing.org>
    <1519876569.4592.4.camel@au1.ibm.com>
    <1519936477.4592.23.camel@au1.ibm.com>
    <1519936815.4592.25.camel@au1.ibm.com>
    <20180301205315.GJ19007@ziepe.ca>
    <1519942012.4592.31.camel@au1.ibm.com>

On Thu, 2018-03-01 at 14:31 -0800, Linus Torvalds wrote:
> On Thu, Mar 1, 2018 at 2:06 PM, Benjamin Herrenschmidt wrote:
> >
> > Could be that x86 has the smarts to do the right thing, still trying to
> > untangle the code :-)
>
> Afaik, x86 will not cache PCI unless the system is misconfigured, and
> even then it's more likely to just raise a machine check exception
> than cache things.
>
> The last-level cache is going to do fills and spills directly to the
> memory controller, not to the PCIe side of things.
>
> (I guess you *can* do things differently, and I wouldn't be surprised
> if some people inside Intel did try to do things differently with
> trying nvram over PCIe, but in general I think the above is true)
>
> You won't find it in the kernel code either. It's in hardware with
> firmware configuration of what addresses are mapped to the memory
> controllers (and _how_ they are mapped) and which are not.

Ah thanks! That explains it. We can fix that on ppc64 in our linear
mapping code by checking the address against memblock to choose the
right page table attributes (rough sketch at the bottom of this mail).

So the main problem left on our side is dealing with too-big PFNs
(quick arithmetic at the bottom as well). I need to look at this with
Aneesh; we might be able to make things fit with a bit of wrangling.

> You _might_ find it in the BIOS, assuming you understood the tables
> and had the BIOS writer's guide to unravel the magic registers.
>
> But you might not even find it there. Some of the memory unit timing
> programming is done very early, and by code that Intel doesn't even
> release to the BIOS writers except as a magic encrypted blob, afaik.
> Some of the magic might even be in microcode.
>
> The page table settings for cacheability are more like a hint, and
> only _part_ of the whole picture. The memory type range registers are
> another part. And magic low-level uarch, northbridge and memory unit
> specific magic is yet another part.
>
> So you can disable caching for memory, but I'm pretty sure you can't
> enable caching for PCIe at least in the common case. At best you can
> affect how the store buffer works for PCIe.
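For reference, a minimal sketch of the memblock check idea (a
hypothetical helper, not the actual ppc64 code; the real linear
mapping setup lives under arch/powerpc/mm/ and works in large page
sized steps):

	#include <linux/memblock.h>
	#include <asm/pgtable.h>

	/*
	 * Hypothetical: pick the page table attributes for one chunk
	 * of the linear mapping.  Addresses backed by RAM according
	 * to memblock get the normal cacheable attributes; anything
	 * else (e.g. a P2P PCI window that landed inside the linear
	 * range) gets non-cached attributes.
	 */
	static pgprot_t linear_map_prot(phys_addr_t pa)
	{
		if (memblock_is_memory(pa))
			return PAGE_KERNEL;	/* cacheable RAM */

		return pgprot_noncached(PAGE_KERNEL);	/* device memory */
	}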
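And the too-big PFN problem in back-of-the-envelope form (the numbers
are assumptions for illustration, not a statement about any specific
ppc64 config):

	#include <stdio.h>

	/*
	 * Assume the arch sized its linear map and vmemmap for
	 * MAX_PHYSMEM_BITS = 46 (64TB) with 64K pages (PAGE_SHIFT = 16),
	 * and the host bridge places a 64-bit BAR at 2^50.
	 */
	int main(void)
	{
		unsigned long long bar_base = 1ULL << 50;	 /* BAR address */
		unsigned long long bar_pfn  = bar_base >> 16;	 /* 2^34 */
		unsigned long long max_pfn  = 1ULL << (46 - 16); /* 2^30 */

		/*
		 * The BAR's PFN lands 16x past the last PFN we planned
		 * for, so pfn_valid(), the vmemmap and the linear map
		 * are all too small to cover it.
		 */
		printf("bar pfn 0x%llx vs max pfn 0x%llx (%llux too big)\n",
		       bar_pfn, max_pfn, bar_pfn / max_pfn);
		return 0;
	}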