Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
To: Sean Christopherson, "Kirill A. Shutemov"
Cc: "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski, Peter Zijlstra,
 Jim Mattson, David Rientjes, "Edgecombe, Rick P", "Kleen, Andi",
 "Yamahata, Isaku", Erdem Aktas, Steve Rutherford, Peter Gonda,
 x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com>
 <20210416154106.23721-14-kirill.shutemov@linux.intel.com>
 <20210419142602.khjbzktk5tk5l6lk@box.shutemov.name>
 <20210419164027.dqiptkebhdt5cfmy@box.shutemov.name>
From: David Hildenbrand
Organization: Red Hat
Date: Mon, 19 Apr 2021 20:12:06 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 19.04.21 20:09, Sean Christopherson wrote:
> On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
>> On Mon, Apr 19, 2021 at 04:01:46PM +0000, Sean Christopherson wrote:
>>> But fundamentally the private pages, are well, private. They can't be shared
>>> across processes, so I think we could (should?) require the VMA to always be
>>> MAP_PRIVATE. Does that buy us enough to rely on the VMA alone? I.e. is that
>>> enough to prevent userspace and unaware kernel code from acquiring a reference
>>> to the underlying page?
>>
>> Shared pages should be fine too (you folks wanted tmpfs support).
>
> Is that a conflict though? If the private->shared conversion request is kicked
> out to userspace, then userspace can re-mmap() the files as MAP_SHARED, no?
>
> Allowing MAP_SHARED for guest private memory feels wrong. The data can't be
> shared, and dirty data can't be written back to the file.
>
>> The poisoned pages must be useless outside of the process with the blessed
>> struct kvm. See kvm_pfn_map in the patch.
>
> The big requirement for kernel TDX support is that the pages are useless in the
> host. Regarding the guest, for TDX, the TDX Module guarantees that at most a
> single KVM guest can have access to a page at any given time. I believe the RMP
> provides the same guarantees for SEV-SNP.
>
> SEV/SEV-ES could still end up with corruption if multiple guests map the same
> private page, but that's obviously not the end of the world since it's the status
> quo today. Living with that shortcoming might be a worthy tradeoff if punting
> mutual exclusion between guests to firmware/hardware allows us to simplify the
> kernel implementation.
>
>>>> - Add a new GUP flag to retrieve such pages from the userspace mapping.
>>>>   Used only for private mapping population.
>>>
>>>> - Shared gfn ranges managed by userspace, based on hypercalls from the
>>>>   guest.
>>>>
>>>> - Shared mappings get populated via normal VMA. Any poisoned pages here
>>>>   would lead to SIGBUS.
>>>>
>>>> So far it looks pretty straight-forward.
>>>>
>>>> The only thing that I don't understand is at what point the page gets tied
>>>> to the KVM instance. Currently we do it just before populating shadow
>>>> entries, but it would not work with the new scheme: as we poison pages
>>>> on fault, they may never get inserted into shadow entries. That's not
>>>> good as we rely on the info to unpoison the page on free.
>>>
>>> Can you elaborate on what you mean by "unpoison"? If the page is never actually
>>> mapped into the guest, then its poisoned status is nothing more than a software
>>> flag, i.e. nothing extra needs to be done on free.
>>
>> Normally, the poisoned flag is preserved for freed pages as it usually indicates
>> a hardware issue. In this case we need to return the page to normal circulation.
>> So we need a way to differentiate two kinds of page poison. The current patch
>> does this by adding the page's pfn to kvm_pfn_map. But this will not work if
>> we uncouple poisoning and adding to shadow PTE.
>
> Why use PG_hwpoison then?
>

I already raised that reusing PG_hwpoison is not what we want. And I
repeat, to me this all looks like a big hack; some things you (Sean)
propose sound cleaner, at least to me.

--
Thanks,

David / dhildenb