Date: Mon, 23 May 2022 21:21:54 +0800
From: Chao Peng
To: Sean Christopherson
Cc: Andy Lutomirski, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
    Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
    H. Peter Anvin, Hugh Dickins, Jeff Layton, J. Bruce Fields, Andrew Morton,
    Mike Rapoport, Steven Price, Maciej S. Szmigiero, Vlastimil Babka,
    Vishal Annapurve, Yu Zhang, Kirill A. Shutemov, jun.nakajima@intel.com,
    dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
    aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
    Quentin Perret, Michael Roth, mhocko@suse.com
Subject: Re: [PATCH v6 4/8] KVM: Extend the memslot to support fd-based private memory
Message-ID: <20220523132154.GA947536@chaop.bj.intel.com>
References: <20220519153713.819591-1-chao.p.peng@linux.intel.com>
    <20220519153713.819591-5-chao.p.peng@linux.intel.com>
    <8840b360-cdb2-244c-bfb6-9a0e7306c188@kernel.org>

On Fri, May 20, 2022 at 06:31:02PM +0000, Sean Christopherson wrote:
> On Fri, May 20, 2022, Andy Lutomirski wrote:
> > The alternative would be to have some kind of separate table or bitmap (part
> > of the memslot?) that tells KVM whether a GPA should map to the fd.
> >
> > What do you all think?
>
> My original proposal was to have explicit shared vs. private memslots, and punch
> holes in KVM's memslots on conversion, but due to the way KVM (and userspace)
> handle memslot updates, conversions would be painfully slow. That's how we ended
> up with the current proposal.
>
> But a dedicated KVM ioctl() to add/remove shared ranges would be easy to implement
> and wouldn't necessarily even need to interact with the memslots. It could be a
> consumer of memslots, e.g.
> if we wanted to disallow registering regions without an
> associated memslot, but I think we'd want to avoid even that because things will
> get messy during memslot updates, e.g. if dirty logging is toggled or a shared
> memory region is temporarily removed then we wouldn't want to destroy the tracking.

Even if we don't tie that to memslots, that info can only be effective for
private memslots, right? Applying this ioctl to memory ranges defined in
traditional non-private memslots just makes no sense; I guess we can note
that in the API document.

>
> I don't think we'd want to use a bitmap, e.g. for a well-behaved guest, XArray
> should be far more efficient.

What about a misbehaving guest? I don't want to design for the worst case,
but people may raise concerns about attacks from such a guest.

>
> One benefit to explicitly tracking this in KVM is that it might be useful for
> software-only protected VMs, e.g. KVM could mark a region in the XArray as "pending"
> based on guest hypercalls to share/unshare memory, and then complete the transaction
> when userspace invokes the ioctl() to complete the share/unshare.

OK, then this can be another field of states/flags/attributes. Let me work
out some of the details:

First, introduce the KVM ioctl below:

  KVM_SET_MEMORY_ATTR

  struct kvm_memory_attr {
          __u64 addr;     /* page aligned */
          __u64 size;     /* page aligned */
  #define KVM_MEMORY_ATTR_SHARED  (1 << 0)
  #define KVM_MEMORY_ATTR_PRIVATE (1 << 1)
          __u64 flags;
  };

Second, check the KVM-maintained guest memory attributes in the page fault
handler (instead of checking for memory existence in the private fd).

Third, the memfile_notifier_ops (populate/invalidate) will be removed from
the current code; zapping the old mappings can be handled directly in this
new KVM ioctl().

Thoughts?

This info is stored in KVM, which I think is reasonable. But for other
potential memfile_notifier users like VFIO, some KVM-to-VFIO APIs might be
needed, depending on the implementation.
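To make the proposed semantics concrete, here is a minimal userspace C model (not kernel code) that applies KVM_SET_MEMORY_ATTR-style requests to a per-page attribute table and answers the question the page fault handler would ask. Only struct kvm_memory_attr and the two KVM_MEMORY_ATTR_* flags come from the proposal above; PAGE_SHIFT, GUEST_PAGES, set_memory_attr() and gpa_is_private() are hypothetical names. The flat array stands in for the XArray Sean suggests purely to keep the sketch short, and the validation rules (page alignment, non-zero size, exactly one flag per request) are assumptions.

```c
/*
 * Userspace model of the proposed KVM_SET_MEMORY_ATTR semantics.
 * Not kernel code: names other than struct kvm_memory_attr and the two
 * KVM_MEMORY_ATTR_* flags are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

#define KVM_MEMORY_ATTR_SHARED  (1 << 0)
#define KVM_MEMORY_ATTR_PRIVATE (1 << 1)

/* Same layout as the proposal, with uint64_t in place of __u64. */
struct kvm_memory_attr {
	uint64_t addr;  /* page aligned */
	uint64_t size;  /* page aligned */
	uint64_t flags;
};

/*
 * Hypothetical per-page attribute table for a 16-page guest. A flat array
 * stands in for the XArray KVM would use; 0 means "no attribute set yet".
 */
#define GUEST_PAGES 16
static uint8_t page_attr[GUEST_PAGES];

/* Apply one request; returns 0 on success, -1 if the request is rejected. */
static int set_memory_attr(const struct kvm_memory_attr *attr)
{
	uint64_t first, npages, i;

	/* Assumed rule: addr and size must both be page aligned, size > 0. */
	if (attr->size == 0 || ((attr->addr | attr->size) & (PAGE_SIZE - 1)))
		return -1;
	/* Assumed rule: a request sets exactly one of shared/private. */
	if (attr->flags != KVM_MEMORY_ATTR_SHARED &&
	    attr->flags != KVM_MEMORY_ATTR_PRIVATE)
		return -1;

	first = attr->addr >> PAGE_SHIFT;
	npages = attr->size >> PAGE_SHIFT;
	/* Reject ranges outside the modeled guest (written to avoid overflow). */
	if (first >= GUEST_PAGES || npages > GUEST_PAGES - first)
		return -1;

	for (i = 0; i < npages; i++)
		page_attr[first + i] = (uint8_t)attr->flags;
	/* Real KVM would also zap stale mappings for the range here. */
	return 0;
}

/*
 * The check a page fault handler would make: should this GPA map to the
 * private fd? Pages with no attribute set are treated as not private.
 */
static bool gpa_is_private(uint64_t gpa)
{
	uint64_t gfn = gpa >> PAGE_SHIFT;

	return gfn < GUEST_PAGES && page_attr[gfn] == KVM_MEMORY_ATTR_PRIVATE;
}
```

For example, marking GPAs 0x2000-0x4fff private and then converting the page at 0x3000 back to shared leaves gpa_is_private() true only for the pages at 0x2000 and 0x4000. Requiring exactly one flag per request keeps every conversion explicit, which is also where a "pending" state for software-only protected VMs could be slotted in as a third value.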
It is also possible to maintain this info purely in userspace. The only
tricky bit is implicit conversion support, which has to be checked in the
KVM page fault handler and is in the fast path.

Thanks,
Chao