Reply-To: Sean Christopherson
Date: Wed, 13 Sep 2023 18:55:05 -0700
In-Reply-To: <20230914015531.1419405-1-seanjc@google.com>
References: <20230914015531.1419405-1-seanjc@google.com>
Message-ID: <20230914015531.1419405-8-seanjc@google.com>
Subject: [RFC PATCH v12 07/33] KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace
From: Sean Christopherson
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen,
 Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 Sean Christopherson, "Matthew Wilcox (Oracle)", Andrew Morton,
 Paul Moore, James Morris, "Serge E. Hallyn"
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
 linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-security-module@vger.kernel.org,
 linux-kernel@vger.kernel.org, Chao Peng, Fuad Tabba, Jarkko Sakkinen,
 Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka,
 Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand,
 Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata,
 "Kirill A . Shutemov"

From: Chao Peng

Add a new KVM exit type to allow userspace to handle memory faults that
KVM cannot resolve, but that userspace *may* be able to handle (without
terminating the guest).

KVM will initially use KVM_EXIT_MEMORY_FAULT to report implicit
conversions between private and shared memory. With guest private
memory, there will be two kinds of memory conversions:

  - explicit conversion: happens when the guest explicitly calls into KVM
    to map a range (as private or shared)

  - implicit conversion: happens when the guest attempts to access a gfn
    that is configured in the "wrong" state (private vs. shared)

On x86 (the first architecture to support guest private memory), explicit
conversions will be reported via KVM_EXIT_HYPERCALL+KVM_HC_MAP_GPA_RANGE,
but reporting KVM_EXIT_HYPERCALL for implicit conversions is undesirable
as there is (obviously) no hypercall, and there is no guarantee that the
guest actually intends to convert between private and shared, i.e. what
KVM thinks is an implicit conversion "request" could actually be the
result of a guest code bug.

KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear
to be implicit conversions.

Place "struct memory_fault" in a second anonymous union so that filling
memory_fault doesn't clobber state from other yet-to-be-fulfilled exits,
and to provide additional information if KVM does NOT ultimately exit to
userspace with KVM_EXIT_MEMORY_FAULT, e.g. if KVM suppresses (or worse,
loses) the exit, as KVM often suppresses exits for memory failures that
occur when accessing paravirt data structures. The initial usage for
private memory will be all-or-nothing, but other features such as the
proposed "userfault on missing mappings" support will use
KVM_EXIT_MEMORY_FAULT for potentially _all_ guest memory accesses, i.e.
will run afoul of KVM's various quirks.

Use bit 3 for flagging private memory so that KVM can use bits 0-2 for
capturing RWX behavior if/when userspace needs such information.

Note! To allow for future possibilities where KVM reports
KVM_EXIT_MEMORY_FAULT and fills run->memory_fault on _any_ unresolved
fault, KVM returns "-EFAULT" (-1 with errno == EFAULT from userspace's
perspective), not '0'! Due to historical baggage within KVM, exiting to
userspace with '0' from deep callstacks, e.g. in emulation paths, is
infeasible as doing so would require a near-complete overhaul of KVM,
whereas KVM already propagates -errno return codes to userspace even when
the -errno originated in a low level helper.
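The "Note!" paragraph means a VMM must look at errno, not only at the KVM_RUN return value, before trusting kvm_run.memory_fault. Below is a minimal userspace sketch of that check, assuming the values proposed in this RFC (exit reason 38, private flag at bit 3). The RFC_* macro names and both helper functions are hypothetical; real code would take the constants from <linux/kvm.h>. Because KVM can suppress or lose the exit after filling memory_fault, treat this as a plausibility check rather than a guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mirrors of the values proposed in this RFC (v12);
 * not taken from a released <linux/kvm.h>.
 */
#define RFC_KVM_EXIT_MEMORY_FAULT		38
#define RFC_KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)

#ifndef EHWPOISON
#define EHWPOISON 133	/* asm-generic Linux value, fallback only */
#endif

/* Bit 3 must stay clear of bits 0-2, reserved for future RWX flags. */
static_assert((RFC_KVM_MEMORY_EXIT_FLAG_PRIVATE & 0x7ULL) == 0,
	      "private flag must not overlap the RWX bits");

/*
 * Decide whether kvm_run's memory_fault data is meaningful after
 * ioctl(vcpu_fd, KVM_RUN, 0) returned 'ret' and set errno to 'err'.
 * Unlike other exits, this one accompanies -1, not 0.
 */
static bool memory_fault_exit_valid(int ret, int err, uint32_t exit_reason)
{
	if (ret != -1)
		return false;	/* this exit is never paired with 0 */
	if (err != EFAULT && err != EHWPOISON)
		return false;	/* exit_reason is stale/undefined */
	return exit_reason == RFC_KVM_EXIT_MEMORY_FAULT;
}

/* True if the faulting access was private; false if it was shared. */
static bool memory_fault_is_private(uint64_t flags)
{
	return (flags & RFC_KVM_MEMORY_EXIT_FLAG_PRIVATE) != 0;
}
```

A caller would pass the ioctl's return value, errno captured immediately after the call, and run->exit_reason from the mmap()ed kvm_run structure.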
Link: https://lore.kernel.org/all/20230908222905.1321305-5-amoorthy@google.com
Cc: Anish Moorthy
Suggested-by: Sean Christopherson
Co-developed-by: Yu Zhang
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
---
 Documentation/virt/kvm/api.rst | 24 ++++++++++++++++++++++++
 include/linux/kvm_host.h       | 15 +++++++++++++++
 include/uapi/linux/kvm.h       | 24 ++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 21a7578142a1..e28a13439a95 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6702,6 +6702,30 @@ array field represents return values. The userspace should update the return
 values of SBI call before resuming the VCPU. For more details on RISC-V SBI
 spec refer, https://github.com/riscv/riscv-sbi-doc.
 
+::
+
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+  #define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
+			__u64 flags;
+			__u64 gpa;
+			__u64 size;
+		} memory_fault;
+
+KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
+could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
+guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
+describes properties of the faulting access that are likely pertinent:
+
+ - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
+   on a private memory access. When clear, indicates the fault occurred on a
+   shared access.
+
+Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it
+accompanies a return code of '-1', not '0'! errno will always be set to EFAULT
+or EHWPOISON when KVM exits with KVM_EXIT_MEMORY_FAULT; userspace should assume
+kvm_run.exit_reason is stale/undefined for all other error numbers.
+
 ::
 
 		/* KVM_EXIT_NOTIFY */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4e741ff27af3..d8c6ce6c8211 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2327,4 +2327,19 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 /* Max number of entries allowed for each kvm dirty ring */
 #define KVM_DIRTY_RING_MAX_ENTRIES 65536
 
+static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
+						 gpa_t gpa, gpa_t size,
+						 bool is_write, bool is_exec,
+						 bool is_private)
+{
+	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+	vcpu->run->memory_fault.gpa = gpa;
+	vcpu->run->memory_fault.size = size;
+
+	/* RWX flags are not (yet) defined or communicated to userspace. */
+	vcpu->run->memory_fault.flags = 0;
+	if (is_private)
+		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index bd1abe067f28..d2d913acf0df 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -274,6 +274,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI 35
 #define KVM_EXIT_RISCV_CSR 36
 #define KVM_EXIT_NOTIFY 37
+#define KVM_EXIT_MEMORY_FAULT 38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -541,6 +542,29 @@ struct kvm_run {
 		struct kvm_sync_regs regs;
 		char padding[SYNC_REGS_SIZE_BYTES];
 	} s;
+
+	/*
+	 * This second exit union holds structs for exit types which may be
+	 * triggered after KVM has already initiated a different exit, or which
+	 * may be ultimately dropped by KVM.
+	 *
+	 * For example, because of limitations in KVM's uAPI, KVM x86 can
+	 * generate a memory fault exit after an MMIO exit is initiated
+	 * (exit_reason and kvm_run.mmio are filled). And conversely, KVM often
+	 * disables paravirt features if a memory fault occurs when accessing
+	 * paravirt data instead of reporting the error to userspace.
+	 */
+	union {
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
+			__u64 flags;
+			__u64 gpa;
+			__u64 size;
+		} memory_fault;
+		/* Fix the size of the union. */
+		char padding2[256];
+	};
 };
 
 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
-- 
2.42.0.283.g2d96d420d3-goog
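The api.rst text describes the fault as the half-open range [gpa, gpa + size), and the new union reserves 256 bytes for it. The sketch below shows one way a VMM might turn that range into the guest frame numbers it covers, for example before converting them between private and shared. It assumes 4 KiB guest pages, and struct rfc_memory_fault, struct gfn_span, and memory_fault_gfn_span() are hypothetical names that only mirror the layout proposed in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of kvm_run.memory_fault from this RFC. */
struct rfc_memory_fault {
	uint64_t flags;
	uint64_t gpa;
	uint64_t size;
};

/* The struct must fit inside the 256-byte padding2 member of the union. */
static_assert(sizeof(struct rfc_memory_fault) <= 256, "exceeds padding2");

#define RFC_FLAG_PRIVATE	(1ULL << 3)
#define RFC_PAGE_SHIFT		12	/* assumption: 4 KiB guest pages */

/* Guest frame numbers touched by the fault, plus the access type. */
struct gfn_span {
	uint64_t first;
	uint64_t last;		/* inclusive */
	bool private_access;
};

/* Returns false for an empty range (size == 0), which covers no page. */
static bool memory_fault_gfn_span(const struct rfc_memory_fault *mf,
				  struct gfn_span *out)
{
	if (mf->size == 0)
		return false;
	out->first = mf->gpa >> RFC_PAGE_SHIFT;
	/* gpa + size is exclusive, so the last faulting byte is gpa + size - 1. */
	out->last = (mf->gpa + mf->size - 1) >> RFC_PAGE_SHIFT;
	out->private_access = (mf->flags & RFC_FLAG_PRIVATE) != 0;
	return true;
}
```

A 16-byte access at gpa 0x1ff8 straddles a page boundary, so it yields gfns 1 through 2; this is why the size field matters even when userspace only converts whole pages.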