Date: Tue, 19 Jul 2022 17:18:34 -0400
From: Peter Xu
To: Axel Rasmussen
Cc: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
    Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
    Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Shuah Khan,
    Suren Baghdasaryan, Vlastimil Babka, zhangyi,
    linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v4 2/5] userfaultfd: add /dev/userfaultfd for fine grained access control
References: <20220719195628.3415852-1-axelrasmussen@google.com> <20220719195628.3415852-3-axelrasmussen@google.com>
In-Reply-To: <20220719195628.3415852-3-axelrasmussen@google.com>

On Tue, Jul 19, 2022 at 12:56:25PM -0700, Axel Rasmussen wrote:
> Historically, it has been shown that intercepting kernel faults with
> userfaultfd (thereby forcing the kernel to wait for an arbitrary amount
> of time) can be exploited, or at least can make some kinds of exploits
> easier. So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we
> changed things so, in order for kernel faults to be handled by
> userfaultfd, either the process needs CAP_SYS_PTRACE, or this sysctl
> must be configured so that any unprivileged user can do it.
>
> In a typical implementation of a hypervisor with live migration (take
> QEMU/KVM as one such example), we do indeed need to be able to handle
> kernel faults. But, both options above are less than ideal:
>
> - Toggling the sysctl increases attack surface by allowing any
>   unprivileged user to do it.
>
> - Granting the live migration process CAP_SYS_PTRACE gives it this
>   ability, but *also* the ability to "observe and control the
>   execution of another process [...], and examine and change [its]
>   memory and registers" (from ptrace(2)). This isn't something we need
>   or want to be able to do, so granting this permission violates the
>   "principle of least privilege".
>
> This is all a long winded way to say: we want a more fine-grained way to
> grant access to userfaultfd, without granting other additional
> permissions at the same time.
>
> To achieve this, add a /dev/userfaultfd misc device. This device
> provides an alternative to the userfaultfd(2) syscall for the creation
> of new userfaultfds. The idea is, any userfaultfds created this way will
> be able to handle kernel faults, without the caller having any special
> capabilities. Access to this mechanism is instead restricted using e.g.
> standard filesystem permissions.
>
> Signed-off-by: Axel Rasmussen

Thanks, this looks much better.

Acked-by: Peter Xu

-- 
Peter Xu
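
For readers following the thread, here is a rough sketch of how userspace
might obtain a userfaultfd through the device described in the quoted
commit message. It is only an illustration under stated assumptions: the
ioctl name USERFAULTFD_IOC_NEW and its number (_IO(0xAA, 0x00)) are taken
from the uapi header of later kernels, not from this mail, and the
numeric encoding below assumes an architecture where _IOC_NONE is 0 (for
example x86-64 or arm64). The helper name open_userfaultfd is
hypothetical.

```python
import fcntl
import os

# Assumption: value of _IO(0xAA, 0x00) as defined in later kernels'
# <linux/userfaultfd.h>. This encoding is only valid where _IOC_NONE == 0
# (e.g. x86-64, arm64); other architectures encode it differently.
USERFAULTFD_IOC_NEW = (0xAA << 8) | 0x00


def open_userfaultfd(dev_path="/dev/userfaultfd",
                     flags=os.O_CLOEXEC | os.O_NONBLOCK):
    """Hypothetical helper: return a new userfaultfd created via the misc
    device, or None if the device is missing or access is denied.

    Whether the caller may do this is decided by the permissions on
    dev_path, not by CAP_SYS_PTRACE or the sysctl.
    """
    try:
        dev_fd = os.open(dev_path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        # No such device, or filesystem permissions deny access.
        return None
    try:
        # With an integer argument, fcntl.ioctl returns the ioctl's
        # return value, which here would be the new file descriptor.
        return fcntl.ioctl(dev_fd, USERFAULTFD_IOC_NEW, flags)
    except OSError:
        return None
    finally:
        # The device fd is only needed to create the userfaultfd.
        os.close(dev_fd)
```

A caller would still have to perform the usual UFFDIO_API handshake on
the returned descriptor before registering memory ranges; that part is
the same as for a descriptor obtained from userfaultfd(2).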