Subject: Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, Paolo Bonzini, Hugh Dickins,
 Luis Chamberlain, Maxime Coquelin, kvm@vger.kernel.org, Jerome Glisse,
 Pavel Emelyanov, Johannes Weiner, Martin Cracauer, Denis Plotnikov,
 linux-mm@kvack.org, Marty McFadden, Maya Gokhale, Andrea Arcangeli,
 Mike Rapoport, Kees Cook, Mel Gorman, "Kirill A. Shutemov",
 linux-fsdevel@vger.kernel.org, "Dr.
 David Alan Gilbert", Andrew Morton
References: <20190311093701.15734-1-peterx@redhat.com>
 <58e63635-fc1b-cb53-a4d1-237e6b8b7236@oracle.com>
 <20190313060023.GD2433@xz-x1>
From: Mike Kravetz
Date: Wed, 13 Mar 2019 10:50:48 -0700
In-Reply-To: <20190313060023.GD2433@xz-x1>
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/12/19 11:00 PM, Peter Xu wrote:
> On Tue, Mar 12, 2019 at 12:59:34PM -0700, Mike Kravetz wrote:
>> On 3/11/19 2:36 AM, Peter Xu wrote:
>>>
>>> The "kvm" entry is a bit special here only to make sure that existing
>>> users like QEMU/KVM won't break by this newly introduced flag.  What
>>> we need to do is simply set the "unprivileged_userfaultfd" flag to
>>> "kvm" here to automatically grant userfaultfd permission for processes
>>> like QEMU/KVM without extra code to tweak these flags in the admin
>>> code.
>>
>> Another user is Oracle DB, specifically with hugetlbfs.  For them, we would
>> like to add a special case like kvm described above.  The admin controls
>> who can have access to hugetlbfs, so I think adding code to the open
>> routine as in patch 2 of this series would seem to work.
>
> Yes I think if there's an explicit and safe place we can hook for
> hugetlbfs then we can do the similar trick as KVM case.
> Though I noticed that we can not only create hugetlbfs files under the
> mountpoint (which the admin can control), but also using some other
> ways.  The question (of me... sorry if it's a silly one!) is whether
> all other ways to use hugetlbfs is still under control of the admin.
> One I know of is memfd_create() which seems to be doable even as
> unprivileged users.  If so, should we only limit the uffd privilege to
> those hugetlbfs users who use the mountpoint directly?

Wow!  I did not realize that apps which specify mmap(MAP_HUGETLB) do not
need any special privilege to use huge pages.  Honestly, I am not sure if
that was by design or a bug.  The memfd_create code is based on the
MAP_HUGETLB code and also does not need any special privilege.  Not to
sidetrack this discussion, but people on Cc may know if this is a bug or
by design.  My opinion is that huge pages are a limited resource and should
be under control.  One needs to be a member of a special group (or root) to
access via System V interfaces.

The DB use case only does mmap of files in an explicitly mounted filesystem.
So, limiting it in that manner would work for them.

> Another question is about fork() of privileged processes - for KVM we
> only grant privilege for the exact process that opened the /dev/kvm
> node, and the privilege will be lost for any forked childrens.  Is
> that the same thing for OracleDB/Hugetlbfs?

I need to confirm with the DB people, but it is my understanding that the
exact process which does the open/mmap will be the one using userfaultfd.

-- 
Mike Kravetz
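
For anyone who wants to check the mmap(MAP_HUGETLB) and memfd_create()
behavior discussed above, here is a minimal C sketch, not code from this
patch series.  It only probes whether the calling user can obtain a
hugetlb mapping or a hugetlb-backed memfd.  The function names, the memfd
name "uffd-probe" and the 2 MiB length are assumptions made for
illustration; the actual default huge page size is listed in
/proc/meminfo.  The sketch also assumes a glibc that provides the
memfd_create() wrapper.  Note that mmap() is expected to fail with ENOMEM
when no huge pages are available in the pool, which says nothing about
privilege.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U	/* value from <linux/memfd.h> */
#endif

/* Assumed probe length for this sketch (one 2 MiB huge page). */
#define PROBE_LEN (2UL * 1024 * 1024)

/* Try an anonymous MAP_HUGETLB mapping.  Returns 0 on success, else errno. */
int probe_mmap_hugetlb(void)
{
	void *p = mmap(NULL, PROBE_LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED)
		return errno;
	munmap(p, PROBE_LEN);
	return 0;
}

/* Try to create a hugetlb-backed memfd.  Returns 0 on success, else errno. */
int probe_memfd_hugetlb(void)
{
	int fd = memfd_create("uffd-probe", MFD_HUGETLB);

	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}

/* Print what the current (possibly unprivileged) user was allowed to do. */
void report_hugetlb_access(void)
{
	int m = probe_mmap_hugetlb();
	int f = probe_memfd_hugetlb();

	printf("euid=%d\n", (int)geteuid());
	printf("mmap(MAP_HUGETLB): %s\n", m ? strerror(m) : "ok");
	printf("memfd_create(MFD_HUGETLB): %s\n", f ? strerror(f) : "ok");
}
```

Calling report_hugetlb_access() from main() as a non-root user, on a
system with a few huge pages reserved, should show whether either path
is open without membership in any special group.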