Date: Thu, 14 Mar 2019 12:16:30 -0400
From: Andrea Arcangeli
To: Paolo Bonzini
Cc: Peter Xu, Mike Kravetz, linux-kernel@vger.kernel.org, Hugh Dickins,
    Luis Chamberlain, Maxime Coquelin, kvm@vger.kernel.org, Jerome Glisse,
    Pavel Emelyanov, Johannes Weiner, Martin Cracauer, Denis Plotnikov,
    linux-mm@kvack.org, Marty McFadden, Maya Gokhale, Mike Rapoport,
    Kees Cook, Mel Gorman, "Kirill A. Shutemov",
    linux-fsdevel@vger.kernel.org, "Dr. David Alan Gilbert", Andrew Morton
Subject: Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
Message-ID: <20190314161630.GS25147@redhat.com>
In-Reply-To: <298b9469-abd2-b02b-5c71-529b8976a46c@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 14, 2019 at 11:58:15AM +0100, Paolo Bonzini wrote:
> On 14/03/19 00:44, Andrea Arcangeli wrote:
> > Then I thought we can add a tristate so an open of /dev/kvm would also
> > allow the syscall to make things more user friendly because
> > unprivileged containers ideally should have writable mounts done with
> > nodev and no matter the privilege they shouldn't ever get an hold on
> > the KVM driver (and those who do, like kubevirt, will then just work).
>
> I wouldn't even bother with the KVM special case. Containers can use
> seccomp if they want a fine-grained policy.

We can have a single boolean 0|1 and stick to a simpler sysctl with no
gid: if you want to use userfaultfd, you need to enable it for all
users. I agree seccomp already provides more than enough granularity to
make more fine-grained choices.

So this will be for those who are paranoid and prefer to disable
userfaultfd as a whole as a hardening feature, like the bpf sysctl
allows: it will allow blocking the uffd syscall without having to
rebuild the kernel with CONFIG_USERFAULTFD=n in environments where
seccomp cannot be easily enabled (i.e. without requiring userland
changes).

That's very fine with me, but then it wasn't me complaining in the
first place. Kees?

If the above is ok, we can implement it as a static key. Not that the
syscall itself is particularly performance critical, but it'll be
simple enough as a boolean (only the ioctls are performance critical,
but those are unaffected).

The blog post about UAF is not particularly interesting in my view,
unless both of the following points are true:

1) it can also be proven that the very same two UAF bugs cannot be
   exploited by other means (as far as I can tell they can be exploited
   by other means regardless of userfaultfd), and

2) the slab randomization was actually enabled.

99% of the time, in all PoCs, all randomization features like KASLR are
incidentally disabled first to facilitate publishing papers and blog
posts. Those are really the features intended to reduce the
reproducibility of exploits against UAF bugs, not disabling
userfaultfd, which only provides a minor advantage. Unlike in PoC
environments, we enable slab randomization in production 100% of the
time whenever it is available in the kernel.

Thanks,
Andrea
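
For readers following the thread, the single-boolean design discussed above can be modeled in user space roughly as follows. This is only an illustrative sketch, not the patch and not kernel code: the class name `UffdGateModel`, the choice of `EPERM` as the error, and the behavior of already-open descriptors are all assumptions made for the example. The one property it is meant to show is that the switch is consulted only when a new userfaultfd is created, while the ioctl path never looks at it.

```python
import errno


class UffdGateModel:
    """Toy user-space model of a global on/off switch for userfaultfd."""

    def __init__(self, enabled=True):
        # Hypothetical stand-in for the proposed 0|1 sysctl value.
        self.enabled = enabled
        self._next_fd = 3
        self._open_fds = set()

    def userfaultfd(self):
        # The only place the boolean is consulted: creating a new uffd.
        # EPERM is an assumed error code for this sketch.
        if not self.enabled:
            raise PermissionError(errno.EPERM, "userfaultfd disabled")
        fd = self._next_fd
        self._next_fd += 1
        self._open_fds.add(fd)
        return fd

    def ioctl(self, fd, request):
        # Hot path: no check of the boolean, only that the fd is known.
        if fd not in self._open_fds:
            raise OSError(errno.EBADF, "not a userfaultfd")
        return 0


gate = UffdGateModel(enabled=True)
fd = gate.userfaultfd()

# The administrator flips the switch to 0.
gate.enabled = False

# In this model, ioctls on an already-open descriptor are unaffected.
ioctl_result = gate.ioctl(fd, "UFFDIO_COPY")

# Creating a new userfaultfd is now refused.
try:
    gate.userfaultfd()
    blocked = False
except PermissionError:
    blocked = True
```

Keeping the check out of `ioctl` mirrors the point made in the message: the creation call is not performance critical, so that is where a cheap boolean test belongs.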