From: Suren Baghdasaryan
Date: Thu, 28 Sep 2023 11:32:34 -0700
Subject: Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI
To: David Hildenbrand
Cc: Jann Horn, akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
    brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com,
    lokeshgidra@google.com, peterx@redhat.com, hughd@google.com,
    mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org,
    willy@infradead.org, Liam.Howlett@oracle.com, zhangpeng362@huawei.com,
    bgeffon@google.com, kaleshsingh@google.com, ngeoffray@google.com,
    jdduke@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    kernel-team@android.com
References: <20230923013148.1390521-1-surenb@google.com>
    <20230923013148.1390521-3-surenb@google.com>
    <03f95e90-82bd-6ee2-7c0d-d4dc5d3e15ee@redhat.com>
    <9101f70c-0c0a-845b-4ab7-82edf71c7bac@redhat.com>
In-Reply-To: <9101f70c-0c0a-845b-4ab7-82edf71c7bac@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 28, 2023 at 10:15 AM David Hildenbrand wrote:
>
> On 27.09.23 20:25, Suren Baghdasaryan wrote:
> >>
> >> I have some cleanups pending for page_move_anon_rmap(), that moves the
> >> SetPageAnonExclusive hunk out. Here we should be using
> >> page_move_anon_rmap() [or rather, folio_move_anon_rmap() after my cleanups]
> >>
> >> I'll send them out soonish.
> >
> > Should I keep this as is in my next version until you post the
> > cleanups? I can add a TODO comment to convert it to
> > folio_move_anon_rmap() once it's ready.
>
> You should just be able to use page_move_anon_rmap() and whatever gets
> in first cleans it up :)

Ack.

>
> >
> >>
> >>>> +       WRITE_ONCE(src_folio->index, linear_page_index(dst_vma,
> >>>> +                                                      dst_addr));
> >>>> +
> >>>> +       orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
> >>>> +       orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
> >>>> +       orig_dst_pte = maybe_mkwrite(pte_mkdirty(orig_dst_pte),
> >>>> +                                    dst_vma);
> >>>
> >>> I think there's still a theoretical issue here that you could fix by
> >>> checking for the AnonExclusive flag, similar to the huge page case.
> >>>
> >>> Consider the following scenario:
> >>>
> >>> 1. process P1 does a write fault in a private anonymous VMA, creating
> >>> and mapping a new anonymous page A1
> >>> 2. process P1 forks and creates two children P2 and P3. afterwards, A1
> >>> is mapped in P1, P2 and P3 as a COW page, with mapcount 3.
> >>> 3. process P1 removes its mapping of A1, dropping its mapcount to 2.
> >>> 4. process P2 uses vmsplice() to grab a reference to A1 with get_user_pages()
> >>> 5. process P2 removes its mapping of A1, dropping its mapcount to 1.
> >>>
> >>> If at this point P3 does a write fault on its mapping of A1, it will
> >>> still trigger copy-on-write thanks to the AnonExclusive mechanism; and
> >>> this is necessary to avoid P3 mapping A1 as writable and writing data
> >>> into it that will become visible to P2, if P2 and P3 are in different
> >>> security contexts.
> >>>
> >>> But if P3 instead moves its mapping of A1 to another address with
> >>> remap_anon_pte() which only does a page mapcount check, the
> >>> maybe_mkwrite() will directly make the mapping writable, circumventing
> >>> the AnonExclusive mechanism.
> >>>
> >>
> >> Yes, can_change_pte_writable() contains the exact logic when we can turn
> >> something easily writable even if it wasn't writable before, which
> >> includes that PageAnonExclusive is set. (but with uffd-wp or softdirty
> >> tracking, there is more to consider)
> >
> > For uffd_remap can_change_pte_writable() would fail if VM_WRITE is not
> > set, but we want remapping to work for RO memory as well. Are you
>
> In a VMA without VM_WRITE you certainly wouldn't want to make PTEs
> writable :) That's why that function just does a sanity check that it is
> not called in a strange context. So one would only call it if VM_WRITE is set.
>
> > saying that a PageAnonExclusive() check alone would not be enough
> > here?
>
> There are some interesting questions to ask here:
>
> 1) What happens if the old VMA has VM_SOFTDIRTY set but the new one not?
> You most probably have to mark the PTE softdirty and not make it writable.
>
> 2) VM_UFFD_WP requires similar care I assume? Peter might know.

Let me look closer into these cases.
I'll also double-check if we need to support uffd_remap for R/O vmas.
I assumed we do but I actually never checked.
Thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
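
For anyone following the P1/P2/P3 scenario quoted above, here is a small
userspace toy model of it. This is only a sketch and not kernel code: struct
toy_page and every toy_* function are hypothetical names made up for
illustration, and the model assumes that fork() leaves the shared page
non-exclusive, which is what the scenario implies. It replays steps 1-5 and
exposes the two candidate checks side by side: the mapcount-only check and a
check on the exclusive marker.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one anonymous page; NOT the kernel's struct page/folio. */
struct toy_page {
	int mapcount;        /* number of page-table mappings */
	int extra_refs;      /* references not backed by a mapping (e.g. GUP) */
	bool anon_exclusive; /* models the PageAnonExclusive marker */
};

/* Step 1: a write fault maps a fresh page that is exclusive to P1. */
struct toy_page toy_write_fault_new(void)
{
	struct toy_page p = { .mapcount = 1, .extra_refs = 0,
			      .anon_exclusive = true };
	return p;
}

/* Step 2: fork() shares the page COW with each child.
 * Assumption of this model: the page is no longer exclusive afterwards. */
void toy_fork(struct toy_page *p, int children)
{
	p->mapcount += children;
	p->anon_exclusive = false;
}

/* Steps 3 and 5: one process removes its mapping. */
void toy_unmap(struct toy_page *p)
{
	assert(p->mapcount > 0);
	p->mapcount--;
}

/* Step 4: a get_user_pages()-style reference that outlives the mapping. */
void toy_gup_ref(struct toy_page *p)
{
	p->extra_refs++;
}

/* The check the thread calls insufficient: look at mapcount only. */
bool toy_may_write_mapcount_only(const struct toy_page *p)
{
	return p->mapcount == 1;
}

/* The check suggested in the thread: require the exclusive marker. */
bool toy_may_write_anon_exclusive(const struct toy_page *p)
{
	return p->anon_exclusive;
}

/* Replays steps 1-5 and returns the state of A1 as P3 sees it. */
struct toy_page toy_replay_scenario(void)
{
	struct toy_page a1 = toy_write_fault_new(); /* 1. P1 faults in A1 */

	toy_fork(&a1, 2);  /* 2. P2 and P3 share A1, mapcount 3 */
	toy_unmap(&a1);    /* 3. P1 unmaps, mapcount 2 */
	toy_gup_ref(&a1);  /* 4. P2 grabs a reference */
	toy_unmap(&a1);    /* 5. P2 unmaps, mapcount 1 */
	return a1;
}
```

After toy_replay_scenario(), mapcount is 1 while P2 still holds a reference,
so the mapcount-only check would let P3 map A1 writable, whereas the
exclusive-marker check refuses. The model deliberately ignores the softdirty
and uffd-wp questions raised above.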