Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp5761655pxb; Mon, 28 Mar 2022 17:23:57 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyHsdmsQmD/SyIRhWHpqWCYO40yxLUB1mNx6MKP5mOdxyILg0LNuWDzmXhXnGgG+5Ce740C X-Received: by 2002:a05:6870:8188:b0:de:2141:90e4 with SMTP id k8-20020a056870818800b000de214190e4mr933944oae.178.1648513437022; Mon, 28 Mar 2022 17:23:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1648513437; cv=none; d=google.com; s=arc-20160816; b=C6+Lk23l7lUypRj0ftroYD750RPVnwAqWUeJcwd5UD9cfFRkUIheB5HpKO6vCIandi n2FWL8AETRYZ+uVOm6VytwyUmE1PGzsNtBhciFB1QtjXmuaY+T8v537TGzkB1d/SmL4k Hoe2wXmlv/9TD8z8NFfBeNxUJx4pTeTXdlH+cG+ccR2RsD2WyMNLCCX8PA5TxYyORQh0 /km5r+N3VHaes010viDoMiGAH0Evqg7K0imZtU+OQPH8LK918bTPGqfIcoGKLoqlNTO8 s67nPhp7V4SRFJ2+hkAB/q0BWYO4LXGlA3mrfK25BUhv+H/2jPfRbhPpE/YLr9FAxMps M4ZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=R5oHCy312DV02dgL2mbxc6XsFyRp35/nzV9uAHuDVL8=; b=e/fci+/ZnU0SUcfZuHXKdoqSlxk6LGLf6Nzic6+OWcqT+OdOkqiIboPxWq4+HVPZWz qpSRh//kDUJJs4oroGJL+9f53lbjotZfHEYucRn4l4rOgg653eYFW4I/GnILkNRy68QL iggztNzK1VTyk920VVhFTk4MFs857oxCpigExaf2R08AS8e9QIIxA1RkLNH8PHtvK5/b lrzHc4r+zN7EavsjP27CHu4uD8G4BawbSyrtgVTtWUPAw5lmwn+6GUWY72iigi+u5qeT yPe/xCrgSlHONav0BZOm24Mz29FXe7W9AWQBcJa9xPMDMA+9Vg1AUQLT2pRgdVFiI9pK /N2A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=VdeMGDhB; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id z1-20020a9d62c1000000b005cb2fc13877si11991221otk.243.2022.03.28.17.23.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Mar 2022 17:23:56 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=VdeMGDhB; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A9E4BADD50; Mon, 28 Mar 2022 16:56:37 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231226AbiC1X54 (ORCPT + 99 others); Mon, 28 Mar 2022 19:57:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231213AbiC1X5y (ORCPT ); Mon, 28 Mar 2022 19:57:54 -0400 Received: from mail-pj1-x102a.google.com (mail-pj1-x102a.google.com [IPv6:2607:f8b0:4864:20::102a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF15548E46 for ; Mon, 28 Mar 2022 16:56:11 -0700 (PDT) Received: by mail-pj1-x102a.google.com with SMTP id o3-20020a17090a3d4300b001c6bc749227so1043115pjf.1 for ; Mon, 28 Mar 2022 16:56:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=R5oHCy312DV02dgL2mbxc6XsFyRp35/nzV9uAHuDVL8=; b=VdeMGDhBvYtE+tVfMkspuND3/7DIoOMPUCVTDtTlEioK4Zp5hqbOfTFDiZt5K44ayS fnvlBEdPsoSL7x/DbpE7cgK+SfpEKdBSB73Z0G459wlOPyvqyXpjszHn5jZxV954Y1m+ kGEdjSVuItAQwgX97aOEemfU1wnkxTdxglSoMzxoatlN8MoeBrVjAW62nJb0oTxRhDev ZbdqY4WeMFHrp8BIIgBZ/dpZ8uGNofipq0r54d/7/2DNqRhUnOG6G0byQfKsrs8taS9d UJB9bfRxpxBlPxluUE6silYECHfQnTGdtWDbmLuSXWXNy9ykNzkceeJBNRFaESkJx2jL vn/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=R5oHCy312DV02dgL2mbxc6XsFyRp35/nzV9uAHuDVL8=; b=hko7ytpVhJrdKaGZxZIqJsH8lavNHvjkCOxPM0/Wqc4KeA+6merixoIhdqKTTsMZfP mxpJOnEHX8SRTQ272q3w/ldMJN6ffryX1oH8Uu9k7vw9svOH4KOwoRr+1WctnDnT9XUA NaajRoe2S5nHPgrsvV2703s1Z7DhdyPmGEVPMOXBybOo/W3Cn+Oi/z50qCKaa7F4K8gt GO77PSyzFh9gJWAHl+Mv8HayclfnOKTMxzYOsT2WsSY6XmGOo+BRfwIxyr+q7jZoKkje PRpga+NzeLDtUc+lVuBuSMbU3XWChglI18qzYG8K0Il+AFrZI37P/AVaZL3lHyExQLy/ VlRg== X-Gm-Message-State: AOAM531VoFASr8OpsBOeXtdZd5bPdjgBtIvMoj6VZa4OBj0rVY83NrYZ 6id212vyiGqugIDsuI/ea9MdIQKVHZx1cQ== X-Received: by 2002:a17:90a:889:b0:1c9:8baa:3eeb with SMTP id v9-20020a17090a088900b001c98baa3eebmr1612687pjc.44.1648511771099; Mon, 28 Mar 2022 16:56:11 -0700 (PDT) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id p128-20020a625b86000000b004fa666a1327sm16920043pfb.102.2022.03.28.16.56.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Mar 2022 16:56:10 -0700 (PDT) Date: Mon, 28 Mar 2022 23:56:06 +0000 From: Sean Christopherson To: Chao Peng Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini , Jonathan Corbet , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Mike Rapoport , Steven Price , "Maciej S . Szmigiero" , Vlastimil Babka , Vishal Annapurve , Yu Zhang , "Kirill A . Shutemov" , luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: Re: [PATCH v5 08/13] KVM: Use memfile_pfn_ops to obtain pfn for private pages Message-ID: References: <20220310140911.50924-1-chao.p.peng@linux.intel.com> <20220310140911.50924-9-chao.p.peng@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220310140911.50924-9-chao.p.peng@linux.intel.com> X-Spam-Status: No, score=-9.5 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 10, 2022, Chao Peng wrote: > @@ -2217,4 +2220,34 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu) > /* Max number of entries allowed for each kvm dirty ring */ > #define KVM_DIRTY_RING_MAX_ENTRIES 65536 > > +#ifdef CONFIG_MEMFILE_NOTIFIER > +static inline long kvm_memfile_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn, > + int *order) > +{ > + pgoff_t index = gfn - slot->base_gfn + > + (slot->private_offset >> PAGE_SHIFT); This is broken for 32-bit kernels, where gfn_t is a 64-bit value but pgoff_t is a 32-bit value. There's no reason to support this for 32-bit kernels, so... The easiest fix, and likely most maintainable for other code too, would be to add a dedicated CONFIG for private memory, and then have KVM check that for all the memfile stuff. x86 can then select it only for 64-bit kernels, and in turn select MEMFILE_NOTIFIER iff private memory is supported. diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index ca7b2a6a452a..ee9c8c155300 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -48,7 +48,9 @@ config KVM select SRCU select INTERVAL_TREE select HAVE_KVM_PM_NOTIFIER if PM - select MEMFILE_NOTIFIER + select HAVE_KVM_PRIVATE_MEM if X86_64 + select MEMFILE_NOTIFIER if HAVE_KVM_PRIVATE_MEM + help Support hosting fully virtualized guest machines using hardware virtualization extensions. You will need a fairly recent And in addition to replacing checks on CONFIG_MEMFILE_NOTIFIER, the probing of whether or not KVM_MEM_PRIVATE is allowed can be: @@ -1499,23 +1499,19 @@ static void kvm_replace_memslot(struct kvm *kvm, } } -bool __weak kvm_arch_private_memory_supported(struct kvm *kvm) -{ - return false; -} - static int check_memory_region_flags(struct kvm *kvm, const struct kvm_userspace_memory_region *mem) { u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; - if (kvm_arch_private_memory_supported(kvm)) - valid_flags |= KVM_MEM_PRIVATE; - #ifdef __KVM_HAVE_READONLY_MEM valid_flags |= KVM_MEM_READONLY; #endif +#ifdef CONFIG_KVM_HAVE_PRIVATE_MEM + valid_flags |= KVM_MEM_PRIVATE; +#endif + if (mem->flags & ~valid_flags) return -EINVAL; > + > + return slot->pfn_ops->get_lock_pfn(file_inode(slot->private_file), > + index, order); In a similar vein, get_locK_pfn() shouldn't return a "long". KVM likely won't use these APIs on 32-bit kernels, but that may not hold true for other subsystems, and this code is confusing and technically wrong. The pfns for struct page squeeze into an unsigned long because PAE support is capped at 64gb, but casting to a signed long could result in a pfn with bit 31 set being misinterpreted as an error. Even returning an "unsigned long" for the pfn is wrong. It "works" for the shmem code because shmem deals only with struct page, but it's technically wrong, especially since one of the selling points of this approach is that it can work without struct page. OUT params suck, but I don't see a better option than having the return value be 0/-errno, with "pfn_t *pfn" for the resolved pfn. > +} > + > +static inline void kvm_memfile_put_pfn(struct kvm_memory_slot *slot, > + kvm_pfn_t pfn) > +{ > + slot->pfn_ops->put_unlock_pfn(pfn); > +} > + > +#else > +static inline long kvm_memfile_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn, > + int *order) > +{ This should be a WARN_ON() as its usage should be guarded by a KVM_PRIVATE_MEM check, and private memslots should be disallowed in this case. Alternatively, it might be a good idea to #ifdef these out entirely and not provide stubs. That'd likely require a stub or two in arch code, but overall it might be less painful in the long run, e.g. would force us to more carefully consider the touch points for private memory. Definitely not a requirement, just an idea.