Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp745434ybt; Fri, 26 Jun 2020 10:29:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzOilO8b04B7k140DJdu2fXRdDFMLAlIcNQynec01WcFUaVdqMVlZ124ZJq1WI/hS+OMEpI X-Received: by 2002:a17:906:3850:: with SMTP id w16mr1080442ejc.205.1593192578381; Fri, 26 Jun 2020 10:29:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1593192578; cv=none; d=google.com; s=arc-20160816; b=El/LlYdWsCb9psWlBsdLJ6+LCtVc+WcE1P15cwPh9I3p13oGSunXHoKfJy5TFtSYVs ZtoEe8fVckXUW6NdSX9ycPQYJT06vAzjuEdWyHX7VuQhIMaqsSLlsJOB/Ry+cgOIwvtL 2WY/LXLzCPIziMd07ztsP/Mf/YUj6NnN55bjnMwhmZZjwEZ929SFzoViYS1klwJq/1zf VeG41hQPwSBJQRvSkoYl+UmEhVHwoSX+x8/AgwlOkZYAyvNYPxg0UuKNjJD3bLbQ2q2b LC2Ik2RyvwEm30LvonIhhugKO1aIXG/l9PuD/MXqeCnucZS+JoDugWxkyd5i9DlgjhpP 949A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=JeNCxhRXMQALqZo7wICk69xjFfUhgOlGUz3dw+pfWwE=; b=gH535vyWvjNcqQVKh1d0woa8JmEy27OsRqwWCfxAur+cbZDlcSToB7THgOzCEk49+9 5qYdnNGO10n15ocjI2FB+/GKLaf4fntM6LU7ZHZFNAW0/YLJCXBwoIwzQfhFyr+CFxhD s80xWYVgunudEjpqpVVgOtacC7xuUIFimIFhsGTWw6MGIxbwxoPpzBz9MzsQpQRPBSX1 TKCqr1iVCS8uf9cXDo1vA399tdaYS6wuRArXPzqVvxflAJ3blhWi+oRHp/nyx/PCGRl+ b5T3qJRfozSgAvwnjRX62UuqgcZU7dGJwXlS4RLI+KG3FCLzZo5tusVDKufpQstcLAbx FTbA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id l7si5437458edv.570.2020.06.26.10.29.15; Fri, 26 Jun 2020 10:29:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727095AbgFZRYn (ORCPT + 99 others); Fri, 26 Jun 2020 13:24:43 -0400 Received: from foss.arm.com ([217.140.110.172]:40306 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725793AbgFZRYj (ORCPT ); Fri, 26 Jun 2020 13:24:39 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 93FDB1FB; Fri, 26 Jun 2020 10:24:38 -0700 (PDT) Received: from [192.168.0.14] (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id F35653F71E; Fri, 26 Jun 2020 10:24:36 -0700 (PDT) Subject: Re: [RFC PATCH 0/2] MTE support for KVM guest To: Catalin Marinas , Steven Price Cc: Marc Zyngier , Will Deacon , Julien Thierry , Suzuki K Poulose , kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Dave Martin , Mark Rutland , Thomas Gleixner , Peter Maydell References: <20200617123844.29960-1-steven.price@arm.com> <20200623174807.GD5180@gaia> <20200624142131.GA27945@gaia> <66ed0732-17ee-8f5a-44af-31ab768d845f@arm.com> <20200624161954.GC27945@gaia> From: James Morse Message-ID: Date: Fri, 26 Jun 2020 18:24:05 +0100 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <20200624161954.GC27945@gaia> Content-Type: text/plain; charset=utf-8 Content-Language: en-GB Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi guys, On 24/06/2020 17:24, Catalin Marinas wrote: > On Wed, Jun 24, 2020 at 03:59:35PM +0100, Steven Price wrote: >> On 24/06/2020 15:21, Catalin Marinas wrote: >>> On Wed, Jun 24, 2020 at 12:16:28PM +0100, Steven Price wrote: >>>> On 23/06/2020 18:48, Catalin Marinas wrote: >>>>> This causes potential issues since we can't guarantee that all the >>>>> Cacheable memory slots allocated by the VMM support MTE. If they do not, >>>>> the arch behaviour is "unpredictable". We also can't trust the guest to >>>>> not enable MTE on such Cacheable mappings. >>>> >>>> Architecturally it seems dodgy to export any address that isn't "normal >>>> memory" (i.e. with tag storage) to the guest as Normal Cacheable. Although >>>> I'm a bit worried this might cause a regression in some existing case. >>> >>> What I had in mind is some persistent memory that may be given to the >>> guest for direct access. This is allowed to be cacheable (write-back) >>> but may not have tag storage. >> >> At the moment we don't have a good idea what would happen if/when the guest >> (or host) attempts to use that memory as tagged. If we have a relatively >> safe hardware behaviour (e.g. the tags are silently dropped/read-as-zero) >> then that's not a big issue. But if the accesses cause some form of abort >> then we need to understand how that would be handled. > > The architecture is not prescriptive here, the behaviour is > "unpredictable". It could mean tags read-as-zero/write-ignored or an > SError. This surely is the same as treating a VFIO device as memory and performing some unsupported operation on it. I thought the DT 'which memory ranges' description for MTE was removed. Wouldn't the rules for a guest be the same? If you enable MTE, everything described as memory must support MTE. Something like persistent memory then can't be described as memory, ... we have the same problem on the host. >>>>> 1. As in your current patches, assume any Cacheable at Stage 2 can have >>>>> MTE enabled at Stage 1. In addition, we need to check whether the >>>>> physical memory supports MTE and it could be something simple like >>>>> pfn_valid(). Is there a way to reject a memory slot passed by the >>>>> VMM? >>>> >>>> Yes pfn_valid() should have been in there. At the moment pfn_to_page() is >>>> called without any checks. >>>> >>>> The problem with attempting to reject a memory slot is that the memory >>>> backing that slot can change. So checking at the time the slot is created >>>> isn't enough (although it might be a useful error checking feature). >>> >>> But isn't the slot changed as a result of another VMM call? So we could >>> always have such check in place. >> >> Once you have created a memslot the guest's view of memory follows the user >> space's address space. This is the KVM_CAP_SYNC_MMU capability. So there's >> nothing stopping a VMM adding a memslot backed with perfectly reasonable >> memory then mmap()ing over the top of it some memory which isn't MTE >> compatible. KVM gets told the memory is being removed (via mmu notifiers) >> but I think it waits for the next fault before (re)creating the stage 2 >> entries. (indeed, stage2 is pretty lazy) > OK, so that's where we could kill the guest if the VMM doesn't play > nicely. It means that we need the check when setting up the stage 2 > entry. I guess it's fine if we only have the check at that point and > ignore it on KVM_SET_USER_MEMORY_REGION. It would be nice if we returned > on error on slot setup but > we may not know (yet) whether the VMM intends to enable MTE for the guest. We don't. Memory slots take the VM-fd, whereas the easy-to-add feature bits are per-vcpu. Packing features into the 'type' that create-vm takes is a problem once we run out, although the existing user is the IPA space size, and MTE is a property of the memory system. The meaning of the flag is then "I described this as memory, only let the guest access memory through this range that is MTE capable". What do we do when that is violated? Tell the VMM is the nicest, but its not something we ever expect to happen. I guess an abort is what real hardware would do, (if firmware magically turned off MTE while it was in use). This would need to be kvm's inject_abt64(), as otherwise the vcpu may take the stage2 fault again, forever. For kvm_set_spte_hva() we can't inject an abort (which vcpu?), so not mapping the page and waiting for the guest to access it is the only option... Thanks, James