Date: Sat, 01 May 2021 10:30:22 +0100
Message-ID: <87eeeqvm1d.wl-maz@kernel.org>
From: Marc Zyngier
To: Vikram Sethi
Cc: Shanker Donthineni, Alex Williamson, Will Deacon, Catalin Marinas,
    Christoffer Dall, "linux-arm-kernel@lists.infradead.org",
    "kvmarm@lists.cs.columbia.edu", "linux-kernel@vger.kernel.org",
    "kvm@vger.kernel.org", Jason Sequeira
Subject: Re: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA

Hi Vikram,

On Fri, 30 Apr 2021 17:57:14 +0100,
Vikram Sethi wrote:
>
> Hi Marc,
>
> > -----Original Message-----
> > From: Marc Zyngier
> > Sent: Friday, April 30, 2021 10:31 AM
> > On Fri, 30 Apr 2021 15:58:14 +0100,
> > Shanker R Donthineni wrote:
> > >
> > > Hi Marc,
> > >
> > > On 4/30/21 6:47 AM, Marc Zyngier wrote:
> > > >
> > > >>>> We've two concerns here:
> > > >>>> - Performance impacts for pass-through devices.
> > > >>>> - The definition of ioremap_wc() function doesn't match the
> > > >>>> host kernel on ARM64
> > > >>> Performance I can understand, but I think you're also using it to
> > > >>> mask a driver bug which should be resolved first. Thank
> > > >> We’ve already instrumented the driver code and found the code path
> > > >> for the unaligned accesses. We’ll fix this issue if it’s not
> > > >> following WC semantics.
> > > >>
> > > >> Fixing the performance concern will be under KVM stage-2 page-table
> > > >> control. We're looking for a guidance/solution for updating stage-2
> > > >> PTE based on PCI-BAR attribute.
> > > > Before we start discussing the *how*, I'd like to clearly understand
> > > > what *arm64* memory attributes you are relying on. We already have
> > > > established that the unaligned access was a bug, which was the
> > > > biggest argument in favour of NORMAL_NC. What are the other
> > > > requirements?
> > > Sorry, my earlier response was not complete...
> > >
> > > ARMv8 architecture has two features Gathering and Reorder
> > > transactions, very important from a performance point of view. Small
> > > inline packets for NIC cards and accesses to GPU's frame buffer are
> > > CPU-bound operations. We want to take advantages of GRE features to
> > > achieve higher performance.
> > >
> > > Both these features are disabled for prefetchable BARs in VM because
> > > memory-type MT_DEVICE_nGnRE enforced in stage-2.
> >
> > Right, so Normal_NC is a red herring, and it is Device_GRE that
> > you really are after, right?
> >
> I think Device GRE has some practical problems.
> 1. A lot of userspace code which is used to getting write combined
> mappings to GPU memory from kernel drivers does memcpy/memset on it
> which can insert ldp/stp which can crash on Device Memory Type. From
> a quick search I didn't find a memcpy_io or memset_io in
> glibc. Perhaps there are some other functions available, but a lot
> of userspace applications that work on x86 and ARM baremetal won't
> work on ARM VMs without such changes. Changes to all of userspace
> may not always be practical, specially if linking to binaries

This seems to go against what Alex was hinting at earlier, which is
that unaligned accesses were not expected on prefetchable regions, and
Shanker later confirming that it was an actual bug. Where do we stand
here?

>
> 2. Sometimes even if application is not using memset/memcpy directly,
> gcc may insert a builtin memcpy/memset.
>
> 3. Recompiling all applications with gcc -m strict-align has
> performance issues. In our experiments that resulted in an increase
> in code size, and also 3-5% performance decrease reliably. Also, it
> is not always practical to recompile all of userspace, depending on
> who owns the code/linked binaries etc.
>
> From KVM-ARM point of view, what is it about Normal NC at stage 2
> for Prefetchable BAR (however KVM gets the hint, whether from
> userspace or VMA) that is undesirable vs Device GRE? I couldn't
> think of a difference to devices whether the combining or
> prefetching or reordering happened because of one or the other.

The problem I see is that we have VM and userspace being written in
terms of Write-Combine, which is:

- loosely defined even on x86

- subject to interpretations in the way it maps to PCI

- has no direct equivalent in the ARMv8 collection of memory
  attributes (and Normal_NC comes with speculation capabilities which
  strikes me as extremely undesirable on arbitrary devices)

How do we translate this into something consistent? I'd like to see an
actual description of what we *really* expect from WC on prefetchable
PCI regions, turn that into a documented definition agreed across
architectures, and then we can look at implementing it with one memory
type or another on arm64.

Because once we expose that memory type at S2 for KVM guests, it
becomes ABI and there is no turning back. So I want to get it right
once and for all.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
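
For reference, the memcpy/memset concern raised in point 1 of the
quoted text can be illustrated with a minimal userspace sketch: a copy
routine that only issues naturally aligned 64-bit stores and rejects
anything else, instead of relying on a generic memcpy. The name
copy_to_dev_aligned64 is hypothetical (it is not a glibc or kernel
interface), and the sketch assumes a destination mapping on which
unaligned accesses are not tolerated. It does not say anything about
which stage-2 memory type KVM should use.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: copy 'len' bytes to a
 * destination that is assumed to be a device mapping, using nothing
 * but naturally aligned 64-bit stores.
 *
 * The volatile qualifier on 'dst' is intended to stop the compiler
 * from replacing the loop with a call to (or an inlined copy of)
 * memcpy, which may use unaligned or wider accesses.
 *
 * Returns 0 on success, -1 if either pointer is not 8-byte aligned
 * or 'len' is not a multiple of 8.
 */
static int copy_to_dev_aligned64(volatile uint64_t *dst,
                                 const uint64_t *src, size_t len)
{
	/* Reject any request that would need a partial or unaligned access. */
	if (((uintptr_t)dst | (uintptr_t)src | (uintptr_t)len) & 7)
		return -1;

	for (size_t i = 0; i < len / 8; i++)
		dst[i] = src[i];

	return 0;
}
```

A caller would pass the mapped BAR address as 'dst' and fall back to
some other path when the helper returns -1. Whether this kind of
change is practical across existing userspace is exactly the question
raised in the quoted points.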