Message-ID: <9101f70c-0c0a-845b-4ab7-82edf71c7bac@redhat.com>
Date: Thu, 28 Sep 2023 19:15:13 +0200
From: David Hildenbrand
Organization: Red Hat
To: Suren Baghdasaryan
Cc: Jann Horn, akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
    brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com,
    lokeshgidra@google.com, peterx@redhat.com, hughd@google.com,
    mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org,
    willy@infradead.org, Liam.Howlett@oracle.com, zhangpeng362@huawei.com,
    bgeffon@google.com, kaleshsingh@google.com, ngeoffray@google.com,
    jdduke@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    kernel-team@android.com
Subject: Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI
References: <20230923013148.1390521-1-surenb@google.com>
    <20230923013148.1390521-3-surenb@google.com>
    <03f95e90-82bd-6ee2-7c0d-d4dc5d3e15ee@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 27.09.23 20:25, Suren Baghdasaryan wrote:
>>
>> I have some cleanups pending for page_move_anon_rmap() that move the
>> SetPageAnonExclusive hunk out. Here we should be using
>> page_move_anon_rmap() [or rather, folio_move_anon_rmap() after my cleanups].
>>
>> I'll send them out soonish.
>
> Should I keep this as is in my next version until you post the
> cleanups? I can add a TODO comment to convert it to
> folio_move_anon_rmap() once it's ready.

You should just be able to use page_move_anon_rmap(), and whatever gets
in first cleans it up :)

>
>>
>>>> +	WRITE_ONCE(src_folio->index, linear_page_index(dst_vma,
>>>> +			dst_addr));
>> +
>>>> +	orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
>>>> +	orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
>>>> +	orig_dst_pte = maybe_mkwrite(pte_mkdirty(orig_dst_pte),
>>>> +			dst_vma);
>>>
>>> I think there's still a theoretical issue here that you could fix by
>>> checking for the AnonExclusive flag, similar to the huge page case.
>>>
>>> Consider the following scenario:
>>>
>>> 1. Process P1 does a write fault in a private anonymous VMA, creating
>>>    and mapping a new anonymous page A1.
>>> 2. Process P1 forks and creates two children P2 and P3. Afterwards, A1
>>>    is mapped in P1, P2 and P3 as a COW page, with mapcount 3.
>>> 3. Process P1 removes its mapping of A1, dropping its mapcount to 2.
>>> 4. Process P2 uses vmsplice() to grab a reference to A1 with
>>>    get_user_pages().
>>> 5. Process P2 removes its mapping of A1, dropping its mapcount to 1.
>>>
>>> If at this point P3 does a write fault on its mapping of A1, it will
>>> still trigger copy-on-write thanks to the AnonExclusive mechanism; and
>>> this is necessary to avoid P3 mapping A1 as writable and writing data
>>> into it that will become visible to P2, if P2 and P3 are in different
>>> security contexts.
>>>
>>> But if P3 instead moves its mapping of A1 to another address with
>>> remap_anon_pte(), which only does a page mapcount check, the
>>> maybe_mkwrite() will directly make the mapping writable, circumventing
>>> the AnonExclusive mechanism.
>>>
>>
>> Yes, can_change_pte_writable() contains the exact logic for when we can
>> easily make something writable even if it wasn't writable before, which
>> includes checking that PageAnonExclusive is set. (But with uffd-wp or
>> softdirty tracking, there is more to consider.)
>
> For uffd_remap, can_change_pte_writable() would fail if VM_WRITE is not
> set, but we want remapping to work for RO memory as well. Are you

In a VMA without VM_WRITE you certainly wouldn't want to make PTEs
writable :) That's why that function just does a sanity check that it is
not called in a strange context. So one would only call it if VM_WRITE
is set.

> saying that a PageAnonExclusive() check alone would not be enough
> here?

There are some interesting questions to ask here:

1) What happens if the old VMA has VM_SOFTDIRTY set but the new one
   does not? You most probably have to mark the PTE softdirty and not
   make it writable.

2) VM_UFFD_WP requires similar care, I assume? Peter might know.

-- 
Cheers,

David / dhildenb