References: <20230511182426.1898675-1-axelrasmussen@google.com>
 <32fdc2c8-b86b-92f3-1d5e-64db6be29126@redhat.com>
From: Axel Rasmussen
Date: Fri, 19 May 2023 10:32:13 -0700
Subject: Re: [PATCH 1/3] mm: userfaultfd: add new UFFDIO_SIGBUS ioctl
To: Peter Xu
Cc: Jiaqi Yan, David Hildenbrand, James Houghton, Alexander Viro,
 Andrew Morton, Christian Brauner, Hongchen Zhang, Huang Ying,
 "Liam R. Howlett", Miaohe Lin, "Mike Rapoport (IBM)", Nadav Amit,
 Naoya Horiguchi, Shuah Khan, ZhangPeng, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, Anish Moorthy
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 19, 2023 at 9:20 AM Peter Xu wrote:
>
> Hi, Jiaqi,
>
> On Fri, May 19, 2023 at 08:04:09AM -0700, Jiaqi Yan wrote:
> > I don't think CAP_ADMIN is something we can work around: a VMM must be
> > a good citizen to avoid introducing any vulnerability to the host or
> > guest.
> >
> > On the other hand, "Userfaults allow the implementation of on-demand
> > paging from userland and more generally they allow userland to take
> > control of various memory page faults, something otherwise only the
> > kernel code could do." [3]. I am not familiar with the UFFD internals,
> > but our use case seems to match what UFFD wants to provide: without
> > affecting the whole world, give a specific userspace (without
> > CAP_ADMIN) the ability to handle page faults (indirectly emulate a
> > HWPOISON page (in my mind I treat it as SetHWPOISON(page) +
> > TestHWPOISON(page) operation in kernel's PF code)). So is it fair to
> > say what Axel provided here is "provide !ADMIN somehow"?
> >
> > [3]https://docs.kernel.org/admin-guide/mm/userfaultfd.html
>
> Userfault keywords on "user", IMHO.  We don't strictly need userfault to
> resolve anything regarding CAP_ADMIN problems.
> MADV_DONTNEED also doesn't need CAP_ADMIN, same to any new madvise() if we
> want to make it useful for injecting poisoned ptes with !ADMIN and limit
> it within current->mm.
>
> But I think you're right that userfaultfd always tried to avoid having
> ADMIN and keep everything within its own scope of permissions.
>
> So again, totally no objection on make it uffd specific for now if you guys
> are all happy with it, but just to be clear that it's (to me) mostly for
> avoiding another WAKE, and afaics that's not really for solving the ADMIN
> issue here.

How about this plan:

Since the concrete use case we have (postcopy live migration) is
UFFD-specific, let's leave it as a UFFDIO_* operation for now.

If in the future we come up with a non-UFFD use case, we can add a new
MADV_* which does this operation at that point. From my perspective
they could even share most of the same implementation code.

I don't think it's a big problem keeping the UFFDIO_* version too at
that point, because it still provides some (perhaps small) value:

- Combines the operation + waking into one syscall
- It allows us to support additional UFFD flags which modify / extend
  the operation in UFFD-specific ways, if we want to add those in the
  future

Seem reasonable? If so, I'll send a v2 with documentation updates.

>
> Thanks,
>
> --
> Peter Xu
>