Date: Wed, 15 Sep 2021 23:04:52 +0300
From: "Kirill A. Shutemov"
To: David Hildenbrand
Cc: "Kirill A. Shutemov", Chao Peng, Andy Lutomirski, Sean Christopherson,
    Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Borislav Petkov,
    Andrew Morton, Joerg Roedel, Andi Kleen, David Rientjes, Vlastimil Babka,
    Tom Lendacky, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Varad Gautam,
    Dario Faggioli, x86@kernel.org, linux-mm@kvack.org,
    linux-coco@lists.linux.dev, Kuppuswamy Sathyanarayanan, Dave Hansen,
    Yu Zhang
Subject: Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory
Message-ID: <20210915200452.wp6ippdvjz6zpv6a@box.shutemov.name>
References: <20210824005248.200037-1-seanjc@google.com>
    <20210902184711.7v65p5lwhpr2pvk7@box.shutemov.name>
    <20210903191414.g7tfzsbzc7tpkx37@box.shutemov.name>
    <02806f62-8820-d5f9-779c-15c0e9cd0e85@kernel.org>
    <20210910171811.xl3lms6xoj3kx223@box.shutemov.name>
    <20210915195857.GA52522@chaop.bj.intel.com>
    <51a6f74f-6c05-74b9-3fd7-b7cd900fb8cc@redhat.com>
    <20210915142921.bxxsap6xktkt4bek@black.fi.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 15, 2021 at 04:59:46PM +0200, David Hildenbrand wrote:
>
> > > I don't think we are, it still feels like we are in the early prototype
> > > phase (even way before a PoC). I'd be happy to see something "cleaner" so to
> > > say -- it still feels kind of hacky to me, especially there seem to be many
> > > pieces of the big puzzle missing so far. Unfortunately, this series hasn't
> > > caught the attention of many -MM people so far, maybe because other people
> > > miss the big picture as well and are waiting for a complete design proposal.
> > >
> > > For example, what's unclear to me: we'll be allocating pages with
> > > GFP_HIGHUSER_MOVABLE, making them land on MIGRATE_CMA or ZONE_MOVABLE; then
> > > we silently turn them unmovable, which breaks these concepts. Who'd migrate
> > > these pages away just like when doing long-term pinning, or how is that
> > > supposed to work?
> >
> > That's fair point. We can fix it by changing mapping->gfp_mask.
> > That's essentially what secretmem does when setting up a file.
> > >
> > > Also unclear to me is how refcount and mapcount will be handled to prevent
> > > swapping,
> >
> > refcount and mapcount are unchanged. Pages not pinned per se. Swapping
> > prevented with the change in shmem_writepage().
>
> So when mapping into the guest, we'd increment the refcount but not the
> mapcount I assume?

No. The only refcount is the one held by the page cache. But we inform KVM
via a callback before removing the page from the page cache. It is similar
to the mmu_notifier scheme KVM uses at the moment.

> > >
> > > who will actually do some kind of gfn-epfn etc. mapping, how we'll
> > > forbid access to this memory e.g., via /proc/kcore or when dumping memory
> >
> > It's not aimed to prevent root to shoot into his leg. Root do root.
>
> IMHO being root is not an excuse to read some random file (actually used in
> production environments) to result in the machine crashing. Not acceptable
> for distributions.

Reading does not cause problems. Writing does.

> I'm still missing the whole gfn-epfn 1:1 mapping discussion we identified as
> requirements. Is that supposed to be done by KVM? How?

A KVM memslot that represents a range of GFNs refers to a memfd (and holds a
file pin) plus an offset in the file. This info is enough to calculate the
offset in the file and find the PFN. The memfd is tied 1:1 to struct kvm, and
KVM would make sure that there's only one possible gfn for a file offset.

> > > ... and how it would ever work with migration/swapping/rmap (it's clearly
> > > future work, but it's been raised that this would be the way to make it
> > > work, I don't quite see how it would all come together).
> >
> > Given that hardware supports it migration and swapping can be implemented
> > by providing new callbacks in guest_ops. Like ->migrate_page would
> > transfer encrypted data between pages and ->swapout would provide
> > encrypted blob that can be put on disk or handled back to ->swapin to
> > bring back to memory.
>
> Again, I'm missing the complete picture. To make swapping decisions vmscan
> code needs track+handle dirty+reference information. How would we be able to
> track references? Does the hardware allow for temporary unmapping of
> encrypted memory and faulting on it? How would page_referenced() continue
> working? "we can add callbacks" is not a satisfying answer, at least for me.
> Especially, when it comes to eventual locking problems and races.

HW doesn't support swapping yet, so any details would be just speculation.
IIUC, there's an accessed bit in EPT that can be used for tracking.

> Maybe saying "migration+swap is not supported" is clearer than "we can add
> callbacks" and missing some details on the bigger picture.
>
> Again, a complete design proposal would be highly valuable, especially to
> get some more review from other -MM folks. Otherwise there is a high chance
> that this will be rejected late when trying to upstream and -MM people
> stumbling over it (we've had some similar thing happening just recently
> unfortunately ...).

I only work on the core-mm side of the story. We will definitely need to
look at the whole picture again once all the pieces are somewhat ready.

-- 
 Kirill A. Shutemov