Date: Wed, 1 Nov 2023 10:36:48 -0700
In-Reply-To: <482bfea6f54ea1bb7d1ad75e03541d0ba0e5be6f.camel@intel.com>
References: <20231027182217.3615211-1-seanjc@google.com>
 <20231027182217.3615211-10-seanjc@google.com>
 <482bfea6f54ea1bb7d1ad75e03541d0ba0e5be6f.camel@intel.com>
Subject: Re: [PATCH v13 09/35] KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace
From: Sean Christopherson
To: Kai Huang
Cc: "viro@zeniv.linux.org.uk", "aou@eecs.berkeley.edu", "brauner@kernel.org",
 "oliver.upton@linux.dev", "chenhuacai@kernel.org", "paul.walmsley@sifive.com",
 "palmer@dabbelt.com", "maz@kernel.org", "pbonzini@redhat.com",
 "mpe@ellerman.id.au", "willy@infradead.org", "anup@brainfault.org",
 "akpm@linux-foundation.org", Xiaoyao Li, "kvm-riscv@lists.infradead.org",
 "mic@digikod.net", "liam.merwick@oracle.com", "kvm@vger.kernel.org",
 Isaku Yamahata, "kirill.shutemov@linux.intel.com", "david@redhat.com",
 "tabba@google.com", "amoorthy@google.com", "linuxppc-dev@lists.ozlabs.org",
 "michael.roth@amd.com", "kvmarm@lists.linux.dev",
 "linux-kernel@vger.kernel.org", "linux-fsdevel@vger.kernel.org",
 "linux-riscv@lists.infradead.org", "chao.p.peng@linux.intel.com",
 "linux-mips@vger.kernel.org", Vishal Annapurve, "vbabka@suse.cz",
 "mail@maciej.szmigiero.name", "yu.c.zhang@linux.intel.com",
 "qperret@google.com", "dmatlack@google.com", Yilun Xu,
 "isaku.yamahata@gmail.com", "ackerleytng@google.com", "jarkko@kernel.org",
 "linux-arm-kernel@lists.infradead.org", "linux-mm@kvack.org", Wei W Wang

On Wed, Nov 01, 2023, Kai Huang wrote:
>
> > +7.34 KVM_CAP_MEMORY_FAULT_INFO
> > +------------------------------
> > +
> > +:Architectures: x86
> > +:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
> > +
> > +The presence of this capability indicates that KVM_RUN will fill
> > +kvm_run.memory_fault if KVM cannot resolve a guest page fault VM-Exit, e.g. if
> > +there is a valid memslot but no backing VMA for the corresponding host virtual
> > +address.
> > +
> > +The information in kvm_run.memory_fault is valid if and only if KVM_RUN returns
> > +an error with errno=EFAULT or errno=EHWPOISON *and* kvm_run.exit_reason is set
> > +to KVM_EXIT_MEMORY_FAULT.
>
> IIUC returning -EFAULT or whatever -errno is sort of KVM internal
> implementation.

The errno that is returned to userspace is ABI. In KVM, it's a _very_ poorly
defined ABI for the vast majority of ioctls(), but it's still technically ABI.
KVM gets away with being cavalier with errno because the vast majority of errors
are considered fatal by userspace, i.e. in most cases, userspace simply doesn't
care about the exact errno.

A good example is KVM_RUN with -EINTR; if KVM were to return something other
than -EINTR on a pending signal or vcpu->run->immediate_exit, userspace would
fall over.

> Is it better to relax the validity of kvm_run.memory_fault when
> KVM_RUN returns any -errno?

Not unless there's a need to do so, and if there is then we can update the
documentation accordingly. If KVM's ABI is that kvm_run.memory_fault is valid
for any errno, then KVM would need to purge kvm_run.exit_reason super early in
KVM_RUN, e.g. to prevent an -EINTR return due to immediate_exit from being
misinterpreted as KVM_EXIT_MEMORY_FAULT. And purging exit_reason super early is
subtly tricky because KVM's (again, poorly documented) ABI is that *some* exit
reasons are preserved across KVM_RUN with vcpu->run->immediate_exit (or with a
pending signal).

https://lore.kernel.org/all/ZFFbwOXZ5uI%2Fgdaf@google.com

> [...]
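
A rough sketch of the userspace side of the validity rule discussed above. It
is not taken from the patch. struct demo_run, the DEMO_* constants and
demo_memory_fault_valid() are hypothetical stand-ins so the snippet builds
without new kernel headers; real code would use struct kvm_run and
KVM_EXIT_MEMORY_FAULT from <linux/kvm.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Fallback for non-Linux builds only; on Linux <errno.h> provides it. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Placeholder values, NOT the real uapi exit reasons. */
#define DEMO_EXIT_OTHER		1
#define DEMO_EXIT_MEMORY_FAULT	2

/* Hypothetical stand-in for the relevant part of struct kvm_run. */
struct demo_run {
	uint32_t exit_reason;
	struct {
		uint64_t flags;
		uint64_t gpa;
		uint64_t size;
	} memory_fault;
};

/*
 * ret and err are the return value and errno of the KVM_RUN ioctl.
 * memory_fault is trusted only when the errno is EFAULT or EHWPOISON
 * *and* exit_reason says KVM_EXIT_MEMORY_FAULT, per the proposed docs.
 */
static bool demo_memory_fault_valid(int ret, int err,
				    const struct demo_run *run)
{
	if (ret >= 0)
		return false;
	if (err != EFAULT && err != EHWPOISON)
		return false;
	return run->exit_reason == DEMO_EXIT_MEMORY_FAULT;
}
```

With this check, an -EINTR return that happens to leave a stale
exit_reason of KVM_EXIT_MEMORY_FAULT in the run structure is rejected, which
is the misinterpretation the narrow errno rule is meant to avoid.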
>
>
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2327,4 +2327,15 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
> >  /* Max number of entries allowed for each kvm dirty ring */
> >  #define KVM_DIRTY_RING_MAX_ENTRIES	65536
> >
> > +static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> > +						 gpa_t gpa, gpa_t size)
> > +{
> > +	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> > +	vcpu->run->memory_fault.gpa = gpa;
> > +	vcpu->run->memory_fault.size = size;
> > +
> > +	/* Flags are not (yet) defined or communicated to userspace. */
> > +	vcpu->run->memory_fault.flags = 0;
> > +}
> > +
>
> KVM_CAP_MEMORY_FAULT_INFO is x86 only, is it better to put this function to
> ?

I'd prefer to keep it in generic code, as it's highly likely to end up there
sooner than later. There's a known use case for ARM (exit to userspace on
missing userspace mapping[*]), and I'm guessing pKVM (also ARM) will also
utilize this API.

[*] https://lore.kernel.org/all/20230908222905.1321305-8-amoorthy@google.com
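
To illustrate why the helper is arch-neutral, here is a small userspace mock-up
of it together with one possible caller. Everything prefixed demo_ or DEMO_ is
hypothetical, and the caller returning -EFAULT right after the helper is an
assumption about a typical call site; the quoted hunk does not show one.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_EXIT_MEMORY_FAULT	2	/* placeholder, not the uapi value */
#define DEMO_PAGE_SIZE		4096u	/* assumed page size for the demo */

/* Hypothetical stand-ins for struct kvm_run and struct kvm_vcpu. */
struct demo_run {
	uint32_t exit_reason;
	struct {
		uint64_t flags;
		uint64_t gpa;
		uint64_t size;
	} memory_fault;
};

struct demo_vcpu {
	struct demo_run *run;
};

/* Mirrors the quoted helper: nothing in it depends on the architecture. */
static void demo_prepare_memory_fault_exit(struct demo_vcpu *vcpu,
					   uint64_t gpa, uint64_t size)
{
	vcpu->run->exit_reason = DEMO_EXIT_MEMORY_FAULT;
	vcpu->run->memory_fault.gpa = gpa;
	vcpu->run->memory_fault.size = size;
	/* No flags are defined yet, so always report zero. */
	vcpu->run->memory_fault.flags = 0;
}

/* Assumed call site: describe the fault, then fail the run with -EFAULT. */
static int demo_handle_unresolvable_fault(struct demo_vcpu *vcpu, uint64_t gpa)
{
	demo_prepare_memory_fault_exit(vcpu, gpa, DEMO_PAGE_SIZE);
	return -EFAULT;
}
```

Any architecture's fault path could follow the same two steps, which is the
argument for leaving the helper in generic code.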