References: <20230923013148.1390521-1-surenb@google.com>
    <20230923013148.1390521-3-surenb@google.com>
    <03f95e90-82bd-6ee2-7c0d-d4dc5d3e15ee@redhat.com>
    <9101f70c-0c0a-845b-4ab7-82edf71c7bac@redhat.com>
From: Suren Baghdasaryan
Date: Thu, 28 Sep 2023 13:11:12 -0700
Subject: Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI
To: David Hildenbrand
Cc: Jann Horn, akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
    brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com,
    lokeshgidra@google.com, peterx@redhat.com, hughd@google.com,
    mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org,
    willy@infradead.org,
    Liam.Howlett@oracle.com, zhangpeng362@huawei.com, bgeffon@google.com,
    kaleshsingh@google.com, ngeoffray@google.com, jdduke@google.com,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    kernel-team@android.com

On Thu, Sep 28, 2023 at 11:32 AM Suren Baghdasaryan wrote:
>
> On Thu, Sep 28, 2023 at 10:15 AM David Hildenbrand wrote:
> >
> > On 27.09.23 20:25, Suren Baghdasaryan wrote:
> > >>
> > >> I have some cleanups pending for page_move_anon_rmap(), that moves the
> > >> SetPageAnonExclusive hunk out. Here we should be using
> > >> page_move_anon_rmap() [or rather, folio_move_anon_rmap() after my cleanups]
> > >>
> > >> I'll send them out soonish.
> > >
> > > Should I keep this as is in my next version until you post the
> > > cleanups? I can add a TODO comment to convert it to
> > > folio_move_anon_rmap() once it's ready.
> >
> > You should just be able to use page_move_anon_rmap() and whatever gets
> > in first cleans it up :)
>
> Ack.
>
> >
> > >
> > >>
> > >>>> +       WRITE_ONCE(src_folio->index, linear_page_index(dst_vma,
> > >>>> +                                                       dst_addr));
> > >>>> +
> > >>>> +       orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
> > >>>> +       orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
> > >>>> +       orig_dst_pte = maybe_mkwrite(pte_mkdirty(orig_dst_pte),
> > >>>> +                                    dst_vma);
> > >>>
> > >>> I think there's still a theoretical issue here that you could fix by
> > >>> checking for the AnonExclusive flag, similar to the huge page case.
> > >>>
> > >>> Consider the following scenario:
> > >>>
> > >>> 1. process P1 does a write fault in a private anonymous VMA, creating
> > >>> and mapping a new anonymous page A1
> > >>> 2. process P1 forks and creates two children P2 and P3. afterwards, A1
> > >>> is mapped in P1, P2 and P3 as a COW page, with mapcount 3.
> > >>> 3. process P1 removes its mapping of A1, dropping its mapcount to 2.
> > >>> 4. process P2 uses vmsplice() to grab a reference to A1 with get_user_pages()
> > >>> 5. process P2 removes its mapping of A1, dropping its mapcount to 1.
> > >>>
> > >>> If at this point P3 does a write fault on its mapping of A1, it will
> > >>> still trigger copy-on-write thanks to the AnonExclusive mechanism; and
> > >>> this is necessary to avoid P3 mapping A1 as writable and writing data
> > >>> into it that will become visible to P2, if P2 and P3 are in different
> > >>> security contexts.
> > >>>
> > >>> But if P3 instead moves its mapping of A1 to another address with
> > >>> remap_anon_pte() which only does a page mapcount check, the
> > >>> maybe_mkwrite() will directly make the mapping writable, circumventing
> > >>> the AnonExclusive mechanism.
> > >>>
> > >>
> > >> Yes, can_change_pte_writable() contains the exact logic when we can turn
> > >> something easily writable even if it wasn't writable before, which
> > >> includes that PageAnonExclusive is set. (but with uffd-wp or softdirty
> > >> tracking, there is more to consider)
> > >
> > > For uffd_remap can_change_pte_writable() would fail if VM_WRITE is not
> > > set, but we want remapping to work for RO memory as well. Are you
> >
> > In a VMA without VM_WRITE you certainly wouldn't want to make PTEs
> > writable :) That's why that function just does a sanity check that it is
> > not called in strange context. So one would only call it if VM_WRITE is set.
> >
> > > saying that a PageAnonExclusive() check alone would not be enough
> > > here?
> >
> > There are some interesting questions to ask here:
> >
> > 1) What happens if the old VMA has VM_SOFTDIRTY set but the new one not?
> > You most probably have to mark the PTE softdirty and not make it writable.
> >
> > 2) VM_UFFD_WP requires similar care I assume? Peter might know.
>
> Let me look closer into these cases.
> I'll also double-check if we need to support uffd_remap for R/O vmas.
> I assumed we do but I actually never checked.

Ok, I confirmed that we don't need remapping for R/O areas. So, I can
use can_change_pte_writable() and keep things simple. Does that sound
good?

> Thanks!
>
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
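
For reference, the five-step P1/P2/P3 scenario quoted above can be replayed in a
small user-space toy model. This is only a sketch under simplifying
assumptions, not kernel code: struct toy_page and every toy_* function are
hypothetical names made up for illustration, and the model assumes that
sharing a page at fork clears its exclusive flag and that a get_user_pages()
reference raises only the refcount, never the mapcount.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
struct toy_page {
	int mapcount;        /* number of page-table mappings of the page */
	int refcount;        /* mappings plus extra references (e.g. GUP) */
	bool anon_exclusive; /* stands in for PageAnonExclusive */
};

/* Step 1: a write fault creates and maps a fresh, exclusive anon page. */
void toy_first_write_fault(struct toy_page *p)
{
	p->mapcount = 1;
	p->refcount = 1;
	p->anon_exclusive = true;
}

/* Step 2: fork shares the page COW with one child (assumed to clear the flag). */
void toy_fork_share(struct toy_page *p)
{
	p->mapcount++;
	p->refcount++;
	p->anon_exclusive = false;
}

/* Steps 3 and 5: one process drops its mapping of the page. */
void toy_unmap(struct toy_page *p)
{
	assert(p->mapcount > 0);
	p->mapcount--;
	p->refcount--;
}

/* Step 4: a GUP-style reference; mapcount is untouched. */
void toy_gup_ref(struct toy_page *p)
{
	p->refcount++;
}

/* Mapcount-only check: treats "mapped once" as "safe to map writable". */
bool toy_writable_mapcount_only(const struct toy_page *p)
{
	return p->mapcount == 1;
}

/* Flag-based check: only an exclusive page may be mapped writable. */
bool toy_writable_if_exclusive(const struct toy_page *p)
{
	return p->anon_exclusive;
}

/* Replays steps 1 to 5 and returns the resulting state of page A1. */
struct toy_page toy_run_scenario(void)
{
	struct toy_page a1 = {0};

	toy_first_write_fault(&a1); /* 1. P1 faults in A1 */
	toy_fork_share(&a1);        /* 2. P1 forks P2 ... */
	toy_fork_share(&a1);        /*    ... and P3: mapcount 3 */
	toy_unmap(&a1);             /* 3. P1 unmaps: mapcount 2 */
	toy_gup_ref(&a1);           /* 4. P2 takes a GUP reference */
	toy_unmap(&a1);             /* 5. P2 unmaps: mapcount 1 */
	return a1;
}
```

In this model, toy_run_scenario() leaves A1 with mapcount 1 while P2 still
holds a reference and the exclusive flag is clear, so the mapcount-only check
would allow a writable mapping and the flag-based check would not.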