Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp691020ybe; Wed, 11 Sep 2019 03:13:22 -0700 (PDT) X-Google-Smtp-Source: APXvYqxozkZ33mm6gL7tC4ZxBkkTT1OSYPi1ahBcUGnuAPD4CxZEIgu3bUkDo6IxvqLeajLj5MEd X-Received: by 2002:a17:906:5915:: with SMTP id h21mr28326535ejq.28.1568196802373; Wed, 11 Sep 2019 03:13:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1568196802; cv=none; d=google.com; s=arc-20160816; b=wP808MqFfD1QAu+gHjS/LByi9O/4UoxKZTQPu9JRdUELQ6kfFOfdhXv4aEmIfkPBvk hrb6KIt/DSKv0akIo0Ey6Ly/IOvxbV64dYceQuJpre3z8951+XQLwJJZogPfLVSr2guw 04MonBuzznk6SKVStOF2VogzIFcXP+jXrbq2gztOuCvIsTW2pTs867gmkuUDI29VZUzi L+gVnaPDOdltlOxw/ZI+0acfswe8GnMTmWGEtn67sEpHdHmpa8FafknxLAM+Ij30WYAt jVre1B5/p1XnNXwMkihO5J95gBAgoQB568PPQaMNwDgMtHyVvIR+ii7566PKd+n1axRQ JjKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:organization:from:references:cc:to:subject :dkim-signature; bh=Ju9WXAHj9Tifay1JzfmcN/rESWw4qxRWPIzegyvikz4=; b=EJXGdXlpHYlyWVouAgYFmHq56ZZ6TUBYI/2lZHu6Wq2UwH0e6C0xAT6gIj9Jk9qpTV eIuSLrpNVbxDcplRDkVzd3OBaTjhKOI5x6cNtDtC9Cl3Xoq41+tq/cih8ps7WPtAZ74B nekwfc439CkXK1JTDFjzbg/hc8/3j/K6XvP9hzOfR+b3EO7cpN54VYKIhDP269EfI0eW byauApDsnMrfMKdPpk18YrVpP3qPftTMnwDj90xOw6KhFpz6TlwyoGsAQ+uYxzGR/LRa 9WRPmjmLF5+o3mnyam0Boq3yGn7l+JSCpJM6WxjevKgDQMFBeprMSCk0ujurr2MnYn1T 1yCg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail (test mode) header.i=@shipmail.org header.s=mail header.b=O71ZZZYV; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id c26si10585162edx.353.2019.09.11.03.12.58; Wed, 11 Sep 2019 03:13:22 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail (test mode) header.i=@shipmail.org header.s=mail header.b=O71ZZZYV; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727601AbfIKKK1 (ORCPT + 99 others); Wed, 11 Sep 2019 06:10:27 -0400 Received: from pio-pvt-msa1.bahnhof.se ([79.136.2.40]:40280 "EHLO pio-pvt-msa1.bahnhof.se" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726781AbfIKKK0 (ORCPT ); Wed, 11 Sep 2019 06:10:26 -0400 Received: from localhost (localhost [127.0.0.1]) by pio-pvt-msa1.bahnhof.se (Postfix) with ESMTP id B58023F83F; Wed, 11 Sep 2019 12:10:24 +0200 (CEST) Authentication-Results: pio-pvt-msa1.bahnhof.se; dkim=pass (1024-bit key; unprotected) header.d=shipmail.org header.i=@shipmail.org header.b="O71ZZZYV"; dkim-atps=neutral X-Virus-Scanned: Debian amavisd-new at bahnhof.se X-Spam-Flag: NO X-Spam-Score: -2.099 X-Spam-Level: X-Spam-Status: No, score=-2.099 tagged_above=-999 required=6.31 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no Received: from pio-pvt-msa1.bahnhof.se ([127.0.0.1]) by localhost (pio-pvt-msa1.bahnhof.se [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cjPnNEQH7o4U; Wed, 11 Sep 2019 12:10:23 +0200 (CEST) Received: from mail1.shipmail.org (h-205-35.A357.priv.bahnhof.se [155.4.205.35]) (Authenticated sender: mb878879) by pio-pvt-msa1.bahnhof.se (Postfix) with ESMTPA id 88D913F838; Wed, 11 Sep 2019 12:10:21 +0200 (CEST) Received: from localhost.localdomain (h-205-35.A357.priv.bahnhof.se [155.4.205.35]) by mail1.shipmail.org (Postfix) with ESMTPSA id B1E233601AA; Wed, 11 Sep 2019 12:10:21 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=shipmail.org; s=mail; t=1568196621; bh=O+WFacO8LkYZdAVKiWt+E7uv69sjpf0qYY4jHLX9jcs=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=O71ZZZYV7VMDsK0EwDguVfzwg6Ow42jbyj71uKdKEVds7ezFVuYWYPyMbZnglbgXM nQNLrdzksE1NId8PB4XmDp8MTSBSTkLwieS2q907KAo2TZK860ohPEbwLT+Gw89LY6 nS65tVMswln2/GZZ2wrxs7SnfAboPGRsdTHJ/B04= Subject: TTM huge page-faults WAS: Re: [RFC PATCH 1/2] x86: Don't let pgprot_modify() change the page encryption bit To: "Koenig, Christian" Cc: "linux-kernel@vger.kernel.org" , "pv-drivers@vmware.com" , Thomas Hellstrom , "dri-devel@lists.freedesktop.org" References: <20190905103541.4161-1-thomas_os@shipmail.org> <20190905103541.4161-2-thomas_os@shipmail.org> <608bbec6-448e-f9d5-b29a-1984225eb078@intel.com> <20190905152438.GA18286@infradead.org> <10185AAF-BFB8-4193-A20B-B97794FB7E2F@amacapital.net> <92171412-eed7-40e9-2554-adb358e65767@shipmail.org> From: =?UTF-8?Q?Thomas_Hellstr=c3=b6m_=28VMware=29?= Organization: VMware Inc. Message-ID: Date: Wed, 11 Sep 2019 12:10:21 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org removing people that are probably not interested from CC adding dri-devel On 9/11/19 11:08 AM, Koenig, Christian wrote: > Am 10.09.19 um 21:26 schrieb Thomas Hellström (VMware): >> On 9/10/19 6:11 PM, Andy Lutomirski wrote: >>>> On Sep 5, 2019, at 8:24 AM, Christoph Hellwig >>>> wrote: >>>> >>>>> On Thu, Sep 05, 2019 at 05:21:24PM +0200, Thomas Hellström (VMware) >>>>> wrote: >>>>>> On 9/5/19 4:15 PM, Dave Hansen wrote: >>>>>> Hi Thomas, >>>>>> >>>>>> Thanks for the second batch of patches!  These look much improved >>>>>> on all >>>>>> fronts. >>>>> Yes, although the TTM functionality isn't in yet. Hopefully we >>>>> won't have to >>>>> bother you with those though, since this assumes TTM will be using >>>>> the dma >>>>> API. >>>> Please take a look at dma_mmap_prepare and dma_mmap_fault in this >>>> branch: >>>> >>>> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-mmap-improvements >>>> >>>> they should allow to fault dma api pages in the page fault handler. >>>> But >>>> this is totally hot off the press and not actually tested for the last >>>> few patches.  Note that I've also included your two patches from this >>>> series to handle SEV. >>> I read that patch, and it seems like you’ve built in the assumption >>> that all pages in the mapping use identical protection or, if not, >>> that the same fake vma hack that TTM already has is used to fudge >>> around it.  Could it be reworked slightly to avoid this? >>> >>> I wonder if it’s a mistake to put the encryption bits in vm_page_prot >>> at all. >> From my POW, the encryption bits behave quite similar in behaviour to >> the caching mode bits, and they're also in vm_page_prot. They're the >> reason TTM needs to modify the page protection in the fault handler in >> the first place. >> >> The problem seen in TTM is that we want to be able to change the >> vm_page_prot from the fault handler, but it's problematic since we >> have the mmap_sem typically only in read mode. Hence the fake vma >> hack. From what I can tell it's reasonably well-behaved, since >> pte_modify() skips the bits TTM updates, so mprotect() and mremap() >> works OK. I think split_huge_pmd may run into trouble, but we don't >> support it (yet) with TTM. > Ah! I actually ran into this while implementing huge page support for > TTM and never figured out why that doesn't work. Dropped CPU huge page > support because of this. By incident, I got slightly sidetracked the other day and started looking at this as well. Got to the point where I figured out all the hairy alignment issues and actually got huge_fault() calls, but never implemented the handler. I think that's definitely something worth having. Not sure it will work for IO memory, though, (split_huge_pmd will just skip non-page-backed memory) but if we only support VM_SHARED (non COW) vmas there's no reason to split the huge pmds anyway. Definitely something we should have IMO. >> We could probably get away with a WRITE_ONCE() update of the >> vm_page_prot before taking the page table lock since >> >> a) We're locking out all other writers. >> b) We can't race with another fault to the same vma since we hold an >> address space lock ("buffer object reservation") >> c) When we need to update there are no valid page table entries in the >> vma, since it only happens directly after mmap(), or after an >> unmap_mapping_range() with the same address space lock. When another >> reader (for example split_huge_pmd()) sees a valid page table entry, >> it also sees the new page protection and things are fine. > Yeah, that's exactly why I always wondered why we need this hack with > the vma copy on the stack. > >> But that would really be a special case. To solve this properly we'd >> probably need an additional lock to protect the vm_flags and >> vm_page_prot, taken after mmap_sem and i_mmap_lock. > Well we already have a special lock for this: The reservation object. So > memory barriers etc should be in place and I also think we can just > update the vm_page_prot on the fly. I agree. This is needed for huge pages. We should make this change, and perhaps add the justification above as a comment. /Thomas > Christian. > >> /Thomas >> >> >> >>