Date: Tue, 3 Oct 2023 08:59:27 -0700
From: Sean Christopherson
To: Fuad Tabba
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen,
 Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 "Matthew Wilcox (Oracle)", Andrew Morton, Paul Moore, James Morris,
 "Serge E. Hallyn", kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org,
 Chao Peng, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata,
 Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng,
 Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang,
 Liam Merwick, Isaku Yamahata, "Kirill A . Shutemov"
Subject: Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes
References: <20230914015531.1419405-1-seanjc@google.com>
 <20230914015531.1419405-12-seanjc@google.com>

On Tue, Oct 03, 2023, Fuad Tabba wrote:
> Hi,
>
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index d2d913acf0df..f8642ff2eb9d 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt {
> > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
> > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
> > #define KVM_CAP_USER_MEMORY2 230
> > +#define KVM_CAP_MEMORY_ATTRIBUTES 231
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
> >
> > @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op {
> > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
> > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0)
> >
> > +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */
> > +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64)
> > +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes)
> > +
> > +struct kvm_memory_attributes {
> > +	__u64 address;
> > +	__u64 size;
> > +	__u64 attributes;
> > +	__u64 flags;
> > +};
> > +
> > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
> > +
>
> In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED
> attributes from userspace.

Why not? The whole thing falls apart if userspace doesn't *know* the state
of a page, and the only way for userspace to know the state of a page at a
given moment in time is if userspace controls the attributes. E.g. even if
KVM were to provide a way for userspace to query attributes, the attributes
exposed to userspace would become stale the instant KVM drops slots_lock (or
whatever lock protects the attributes), since userspace couldn't prevent
future changes.

Why does pKVM need to prevent userspace from stating *its* view of
attributes?

If the goal is to reduce memory overhead, that can be solved by using an
internal, non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE.
If the guest attempts to access memory where pKVM and userspace don't agree
on the state, generate an exit to userspace. Or kill the guest. Or do
something else entirely.

> However, we'd like to use the attributes xarray to track the sharing state of
> guest pages at the host kernel.
>
> Moreover, we'd rather the default guest page state be PRIVATE, and
> only specify which pages are shared. All pKVM guest pages start off as
> private, and the majority will remain so.

I would rather optimize kvm_vm_set_mem_attributes() to generate range-based
xarray entries, at which point it shouldn't matter all that much whether
PRIVATE or SHARED is the default "empty" state. We opted not to do that for
the initial merge purely to keep the code as simple as possible (which is
obviously still not exactly simple).

With range-based xarray entries, the cost of tagging huge chunks of memory
as PRIVATE should be a non-issue. And if that's not enough for whatever
reason, I would rather define the polarity of PRIVATE on a per-VM basis, but
only for internal storage.

> I'm not sure if this is the best way to do this: One idea would be to move
> the definition of KVM_MEMORY_ATTRIBUTE_PRIVATE to
> arch/*/include/asm/kvm_host.h, which is where kvm_arch_supported_attributes()
> lives as well. This would allow different architectures to specify their own
> attributes (i.e., instead we'd have a KVM_MEMORY_ATTRIBUTE_SHARED for pKVM).
> This wouldn't help in terms of preventing userspace from clearing attributes
> (i.e., setting a 0 attribute) though.
>
> The other thing, which we need for pKVM anyway, is to make
> kvm_vm_set_mem_attributes() global, so that it can be called from outside of
> kvm_main.c (already have a local patch for this that declares it in
> kvm_host.h),

That's no problem, but I am definitely opposed to KVM modifying attributes
that are owned by userspace.

> and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES.

As above, I am opposed to pKVM having a completely different ABI for managing
PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the
attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't
work for pKVM, then we've failed miserably and should revisit the uAPI.
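
To make the "per-VM polarity, but only for internal storage" idea a bit more concrete, here is a rough, standalone sketch. It is not code from this series and not how KVM stores attributes today. The struct vm_attr_policy, the helper names, and ATTR_PRIVATE (which just mirrors the KVM_MEMORY_ATTRIBUTE_PRIVATE bit from the quoted patch) are all made up for illustration. The assumption is that an all-zero stored value needs no xarray entry, so a VM whose pages are mostly PRIVATE would flip the stored bit while the uAPI value stays the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as KVM_MEMORY_ATTRIBUTE_PRIVATE in the quoted patch. */
#define ATTR_PRIVATE (1ULL << 3)

/* Hypothetical per-VM policy; not an existing KVM structure. */
struct vm_attr_policy {
	/* Attribute bits that are stored inverted for this VM. */
	uint64_t inverted_mask;
};

/* Convert the userspace (ABI) view into the internally stored value. */
static inline uint64_t attrs_to_storage(const struct vm_attr_policy *p,
					uint64_t abi_attrs)
{
	return abi_attrs ^ p->inverted_mask;
}

/* Convert a stored value back into the userspace (ABI) view. */
static inline uint64_t attrs_from_storage(const struct vm_attr_policy *p,
					  uint64_t stored)
{
	return stored ^ p->inverted_mask;
}

/* Assumption: a stored value of zero is the "empty" state, no entry needed. */
static inline bool attrs_need_entry(uint64_t stored)
{
	return stored != 0;
}

/* Illustrative self-check for a VM that defaults to PRIVATE internally. */
static inline void polarity_selfcheck(void)
{
	const struct vm_attr_policy pkvm_like = { .inverted_mask = ATTR_PRIVATE };
	uint64_t stored_private = attrs_to_storage(&pkvm_like, ATTR_PRIVATE);
	uint64_t stored_shared = attrs_to_storage(&pkvm_like, 0);

	/* PRIVATE pages cost nothing to track; only SHARED pages need entries. */
	assert(!attrs_need_entry(stored_private));
	assert(attrs_need_entry(stored_shared));

	/* Userspace still reads back exactly what it set. */
	assert(attrs_from_storage(&pkvm_like, stored_private) == ATTR_PRIVATE);
	assert(attrs_from_storage(&pkvm_like, stored_shared) == 0);
}
```

The point of the sketch is only that the inversion lives entirely at the storage boundary: KVM_SET_MEMORY_ATTRIBUTES keeps the same meaning for every VM type, and a VM with inverted_mask == 0 behaves exactly as it would without the policy.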