Received: by 2002:a05:6a10:2785:0:0:0:0 with SMTP id ia5csp2928511pxb; Tue, 12 Jan 2021 01:59:39 -0800 (PST) X-Google-Smtp-Source: ABdhPJwnCUu5TNZMX9y0DZWb7lFrWkwJUvg6Qu9I8RriK+hz3aGZU5/tGXC6BCZogk95/xMxAgcp X-Received: by 2002:a17:906:d81:: with SMTP id m1mr2701415eji.550.1610445579166; Tue, 12 Jan 2021 01:59:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1610445579; cv=none; d=google.com; s=arc-20160816; b=WaG/Gl8Op8PTqFLspuRUwfk9TarHt6UbHNPUDp5tjizN4z+uYJLy/ihQg9PNS6Dtqk eAcfUz61spEdBLu84lV6pzcqSExuvQrRRrpO4ysP4Bu6rouSIearJW4378Keg0dYOsX1 wLC2VoX8hUEzM21WeZYj0YOivz/GKyZFdS5i20h+JOMxao1fr6JxxpvgZZVEV1Zvr2aq pn9Om3lIzQZglGMYrCuU07UaYqdDyi6d52IYBESUS8DgtPImpTOzDLhwhen4fVLD2kuL i4H9Gx5DkKIPmdqjng9wmSSdbHO8APU+zHvMinVJYsq9nrQfAxpxbIZgbOkgPMuZp0tq +OAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=6Ose25Stsa51S7paVM2cR2t4IRVZY53vBBpbLwwbkAA=; b=JAWxSsX1NBc59Vvq9oM24Np2bbZ71bizBv/U44T3BlyDOwjmgdy9A/KCiKG+aweDEq 1+72LagzrRg+2la419o3Aulpz9yfx2Yv5tXO9VkJ8JYvuLdRLuonxSMhd3xBrdr7je7H Wj4RD7SOXihVRRORTIp1VpsRSW53gD5tmAQKg8jk4+80t4sJ/1LNyC+Y3LBnJAexRag9 aBlr0nufk24LTOziEmlHvepxhC6DLGSZxa7a83Q+Otwy2iaFo9FS2C1iauTnIQmptML6 3fu83tXnGfG7Z5g20azTeh0zqcIzeKDrf/CYq/zFeF7m73nVu5Ldo3rSwAizhDP61WOc GYew== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2020-01-29 header.b=OghPzvJl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id cb16si923936ejb.392.2021.01.12.01.59.15; Tue, 12 Jan 2021 01:59:39 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2020-01-29 header.b=OghPzvJl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727016AbhALAYh (ORCPT + 99 others); Mon, 11 Jan 2021 19:24:37 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:56074 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2390497AbhAKWot (ORCPT ); Mon, 11 Jan 2021 17:44:49 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 10BMdih7051839; Mon, 11 Jan 2021 22:43:02 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=6Ose25Stsa51S7paVM2cR2t4IRVZY53vBBpbLwwbkAA=; b=OghPzvJl8YKoQgZy+h246WtI8gJr1vTaM4fGqtfvDD4fahBYOUU4hiOIcq3Dng8s4Hzk 3H+t61iDmRgbzOPeXjjsZUYcV76d9XnKjpoAOGe+z1hqA93Ar7+wzFDKyb7YJP6UQnTH w+/T3kNBCao6O3DsdnYRGzlEajcS8pn1VQ1PDdfEky7U1AhlB2ZTffNTRc7W5E13fx/x zo7RA6wj49BDQjBD+OqpjHrBgwPNFwM2wa/A3lGd2Tv4YLVAjjlbwuaLXZ/lB9oascl2 HUTkDh33+PDz2W4Y0DoO+osPTxLC40i3M5JyIQz/r9XAlMytVg3bRKrRviOmYKcssNWB qw== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by aserp2120.oracle.com with ESMTP id 360kcykpnx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Mon, 11 Jan 2021 22:43:01 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 10BMeTU6011908; Mon, 11 Jan 2021 22:43:01 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 360kefuysq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 11 Jan 2021 22:43:01 +0000 Received: from abhmp0013.oracle.com (abhmp0013.oracle.com [141.146.116.19]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 10BMgpJ8003688; Mon, 11 Jan 2021 22:42:51 GMT Received: from [192.168.2.112] (/50.38.35.18) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 11 Jan 2021 14:42:51 -0800 Subject: Re: [RFC PATCH 0/2] userfaultfd: handle minor faults, add UFFDIO_CONTINUE To: Axel Rasmussen , Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , =?UTF-8?Q?Michal_Koutn=c3=bd?= , Michel Lespinasse , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Oliver Upton References: <20210107190453.3051110-1-axelrasmussen@google.com> From: Mike Kravetz Message-ID: <48f4f43f-eadd-f37d-bd8f-bddba03a7d39@oracle.com> Date: Mon, 11 Jan 2021 14:42:48 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1 MIME-Version: 1.0 In-Reply-To: <20210107190453.3051110-1-axelrasmussen@google.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9861 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 phishscore=0 spamscore=0 malwarescore=0 suspectscore=0 mlxlogscore=999 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2101110127 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9861 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 phishscore=0 impostorscore=0 bulkscore=0 adultscore=0 suspectscore=0 malwarescore=0 lowpriorityscore=0 clxscore=1011 mlxlogscore=999 mlxscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2101110127 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 1/7/21 11:04 AM, Axel Rasmussen wrote: > Overview > ======== > > This series adds a new userfaultfd registration mode, > UFFDIO_REGISTER_MODE_MINOR. This allows userspace to intercept "minor" faults. > By "minor" fault, I mean the following situation: > > Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). > One of the mappings is registered with userfaultfd (in minor mode), and the > other is not. Via the non-UFFD mapping, the underlying pages have already been > allocated & filled with some contents. The UFFD mapping has not yet been > faulted in; when it is touched for the first time, this results in what I'm > calling a "minor" fault. As a concrete example, when working with hugetlbfs, we > have huge_pte_none(), but find_lock_page() finds an existing page. > > We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, > userspace resolves the fault by either a) doing nothing if the contents are > already correct, or b) updating the underlying contents using the second, > non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, > or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel > "I have ensured the page contents are correct, carry on setting up the mapping". > One quick thought. This is not going to work as expected with hugetlbfs pmd sharing. If you are not familiar with hugetlbfs pmd sharing, you are not alone. :) pmd sharing is enabled for x86 and arm64 architectures. If there are multiple shared mappings of the same underlying hugetlbfs file or shared memory segment that are 'suitably aligned', then the PMD pages associated with those regions are shared by all the mappings. Suitably aligned means 'on a 1GB boundary' and 1GB in size. When pmds are shared, your mappings will never see a 'minor fault'. This is because the PMD (page table entries) is shared. -- Mike Kravetz