Date: Mon, 4 Apr 2022 15:01:21 +0000
From: Quentin Perret
To: Andy Lutomirski
Cc: Sean Christopherson, Steven Price, Chao Peng, kvm list,
    Linux Kernel Mailing List, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, Linux API, qemu-devel@nongnu.org,
    Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin",
    Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton,
    Mike Rapoport, "Maciej S . Szmigiero", Vlastimil Babka,
    Vishal Annapurve, Yu Zhang, "Kirill A. Shutemov", "Nakajima, Jun",
    Dave Hansen, Andi Kleen, David Hildenbrand, Marc Zyngier, Will Deacon
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting
    KVM guest private memory
In-Reply-To: <83fd55f8-cd42-4588-9bf6-199cbce70f33@www.fastmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Friday 01 Apr 2022 at 12:56:50 (-0700), Andy Lutomirski wrote:
> On Fri, Apr 1, 2022, at 7:59 AM, Quentin Perret wrote:
> > On Thursday 31 Mar 2022 at 09:04:56 (-0700), Andy Lutomirski wrote:
> >
> > To answer your original question about memory 'conversion', the key
> > thing is that the pKVM hypervisor controls the stage-2 page-tables for
> > everyone in the system, all guests as well as the host. As such, a page
> > 'conversion' is nothing more than a permission change in the relevant
> > page-tables.
> >
>
> So I can see two different ways to approach this.
>
> One is that you split the whole address space in half and, just like
> SEV and TDX, allocate one bit to indicate the shared/private status of
> a page. This makes it work a lot like SEV and TDX.
>
> The other is to have shared and private pages be distinguished only by
> their hypercall history and the (protected) page tables. This saves
> some address space and some page table allocations, but it opens some
> cans of worms too.
> In particular, the guest and the hypervisor need to coordinate, in a
> way that the guest can trust, to ensure that the guest's idea of which
> pages are private match the host's. This model seems a bit harder to
> support nicely with the private memory fd model, but not necessarily
> impossible.

Right. Perhaps one thing I should clarify as well: pKVM (as opposed to
TDX) has only _one_ page-table per guest, and it is controlled by the
hypervisor only. So the hypervisor needs to be involved for both shared
and private mappings. As such, shared pages have relatively similar
constraints when it comes to host mm stuff -- we can't migrate shared
pages or swap them out without getting the hypervisor involved.

> Also, what are you trying to accomplish by having the host userspace
> mmap private pages?

What I would really like to have is non-destructive in-place conversions
of pages. mmap-ing the pages that have been shared back felt like a good
fit for the private=>shared conversion, but in fact I'm not all that
opinionated about the API as long as the behaviour and the performance
are there. Happy to look into alternatives.

FWIW, there are a couple of reasons why I'd like to have in-place
conversions:

 - one goal of pKVM is to migrate some things away from the Arm
   TrustZone environment (e.g. DRM and the like) and into protected VMs
   instead. This will give Linux a fighting chance to defend itself
   against these things -- they currently have access to _all_ memory.
   And transitioning pages between Linux and TrustZone (donations and
   shares) is fast and non-destructive, so we really do not want pKVM
   to regress by requiring the hypervisor to memcpy things;

 - it can be very useful for protected VMs to do shared=>private
   conversions. Think of a VM receiving some data from the host in a
   shared buffer, and then it wants to operate on that buffer without
   risking leaking confidential information in a transient state. In
   that case the most logical thing to do is to convert the buffer back
   to private, do whatever needs to be done on that buffer (decrypting
   a frame, ...), and then share it back with the host to consume it;

 - similar to the previous point, a protected VM might want to
   temporarily turn a buffer private to avoid ToCToU issues;

 - once we're able to do device assignment to protected VMs, this might
   allow DMA-ing to a private buffer, and make it shared later w/o
   bouncing.

And there is probably more.

IIUC, the private fd proposal as it stands requires shared and private
pages to come from entirely distinct places. So it's not entirely clear
to me how any of the above could be supported without having the
hypervisor memcpy the data during conversions, which I really don't want
to do for performance reasons.

> Is the idea that multiple guest could share the same page until such
> time as one of them tries to write to it?

That would certainly be possible to implement in the pKVM environment
with the right tracking, so I think it is worth considering as a future
goal.

Thanks,
Quentin
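
For illustration only, the in-place conversion flow discussed above can be modelled with a small toy sketch. This is not pKVM code: `ToyStage2`, `Owner`, `share`, `unshare` and `process_frame` are hypothetical names invented for this example, and the XOR step merely stands in for whatever the guest does with the buffer (e.g. decrypting a frame). The point it tries to show is that a conversion only changes who may access a page, while the page contents stay where they are, so no memcpy is needed.

```python
from enum import Enum


class Owner(Enum):
    HOST = "host"
    GUEST = "guest"


class ToyStage2:
    """Toy stand-in for hypervisor-controlled stage-2 tables (hypothetical)."""

    def __init__(self, num_pages, page_size=16):
        # One backing buffer per physical page; the contents never move.
        self.pages = [bytearray(page_size) for _ in range(num_pages)]
        # Per-page set of owners allowed to access it; pages start guest-private.
        self.access = [{Owner.GUEST} for _ in range(num_pages)]

    def share(self, pfn):
        """private => shared: grant host access, data stays in place."""
        self.access[pfn] = {Owner.GUEST, Owner.HOST}

    def unshare(self, pfn):
        """shared => private: revoke host access, data stays in place."""
        self.access[pfn] = {Owner.GUEST}

    def _check(self, who, pfn):
        if who not in self.access[pfn]:
            raise PermissionError(f"{who.value} has no access to page {pfn}")

    def write(self, who, pfn, data):
        self._check(who, pfn)
        self.pages[pfn][:len(data)] = data

    def read(self, who, pfn, n):
        self._check(who, pfn)
        return bytes(self.pages[pfn][:n])


def process_frame(s2, pfn, n):
    """Guest takes a shared buffer private, transforms it, shares it back."""
    s2.unshare(pfn)  # host can no longer observe the transient state
    raw = s2.read(Owner.GUEST, pfn, n)
    # Stand-in for the real work (e.g. decryption): flip every bit.
    s2.write(Owner.GUEST, pfn, bytes(b ^ 0xFF for b in raw))
    s2.share(pfn)  # result becomes visible to the host again
```

In this model the host fills a shared page, the guest calls `process_frame()` on it, and the host then reads the result from the same backing buffer; any host access while the page is private raises `PermissionError`.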