Subject: Re: [RFC PATCH 1/2] x86: Don't let pgprot_modify() change the page encryption bit
From: Andy Lutomirski
Date: Wed, 11 Sep 2019 11:03:19 -0700
To: Thomas Hellström (VMware)
Cc: Andy Lutomirski, Christoph Hellwig, Dave Hansen, LKML, X86 ML, pv-drivers@vmware.com, Thomas Hellstrom, Dave Hansen, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin", Christian König, Marek Szyprowski, Tom Lendacky
In-Reply-To: <76f89b46-7b14-9483-e552-eb52762adca0@shipmail.org>
References: <20190905103541.4161-1-thomas_os@shipmail.org> <20190905103541.4161-2-thomas_os@shipmail.org> <608bbec6-448e-f9d5-b29a-1984225eb078@intel.com> <20190905152438.GA18286@infradead.org> <10185AAF-BFB8-4193-A20B-B97794FB7E2F@amacapital.net> <92171412-eed7-40e9-2554-adb358e65767@shipmail.org> <76f89b46-7b14-9483-e552-eb52762adca0@shipmail.org>

On Wed, Sep 11, 2019 at 12:49 AM Thomas Hellström (VMware) wrote:
>
> Hi, Andy.
>
> On 9/11/19 6:18 AM, Andy Lutomirski wrote:
> >
> > As a for-real example, take a look at arch/x86/entry/vdso/vma.c. The
> > "vvar" VMA contains multiple pages that are backed by different types
> > of memory. There's a page of ordinary kernel memory. Historically
> > there was a page of genuine MMIO memory, but that's gone now. If the
> > system is a SEV guest with pvclock enabled, then there's a page of
> > decrypted memory. They all share a VMA, they're instantiated in
> > .fault, and there is no hackery involved. Look at vvar_fault().
>
> So this is conceptually identical to TTM. The difference is that it uses
> vmf_insert_pfn_prot() instead of vmf_insert_mixed() with the vma hack.
> Had there been a vmf_insert_mixed_prot(), the hack in TTM wouldn't be
> needed. I guess providing a vmf_insert_mixed_prot() is a to-do for me to
> pick up. :)
>
> Having said that, the code you point out is as fragile and suffers from
> the same shortcomings as TTM since
> a) Running things like split_huge_pmd() that takes the vm_page_prot and
> applies it to new PTEs will make things break, (although probably never
> applicable in this case).

Hmm. There are no vvar huge pages, so this is okay. I wonder how hard
it would be to change the huge page splitting code to copy the
encryption and cacheability bits from the source entry instead of
getting them from vm_page_prot, at least in the cases relevant to
VM_MIXEDMAP and VM_PFNMAP.

> b) Running mprotect() on that VMA will only work if sme_me_mask is part
> of _PAGE_CHG_MASK (which is addressed in a reliable way in my recent
> patchset), otherwise, the encryption setting will be overwritten.
>

Indeed. Thanks for the fix!

>
> >> We could probably get away with a WRITE_ONCE() update of the
> >> vm_page_prot before taking the page table lock since
> >>
> >> a) We're locking out all other writers.
> >> b) We can't race with another fault to the same vma since we hold an
> >> address space lock ("buffer object reservation")
> >> c) When we need to update there are no valid page table entries in the
> >> vma, since it only happens directly after mmap(), or after an
> >> unmap_mapping_range() with the same address space lock. When another
> >> reader (for example split_huge_pmd()) sees a valid page table entry, it
> >> also sees the new page protection and things are fine.
> >>
> >> But that would really be a special case. To solve this properly we'd
> >> probably need an additional lock to protect the vm_flags and
> >> vm_page_prot, taken after mmap_sem and i_mmap_lock.
> >>
> > This is all horrible IMO.
>
> I'd prefer to call it fragile and potentially troublesome to maintain.

Fair enough.
>
> That distinction is important because if it ever comes to a choice
> between adding a new lock to protect vm_page_prot (and consequently slow
> down the whole vm system) and using the WRITE_ONCE solution in TTM, we
> should know what the implications are. As it turns out previous choices
> in this area actually seem to have opted for the lockless WRITE_ONCE /
> READ_ONCE / ptl solution. See __split_huge_pmd_locked() and
> vma_set_page_prot().

I think it would be even better if the whole thing could work without
ever writing to vm_page_prot.  This would be a requirement for vvar in
the unlikely event that the vvar vma ever supported splittable huge
pages.  Fortunately, that seems unlikely :)