From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Ives van Hoorne, Peter Xu, stable@vger.kernel.org, Andrew Morton, Hugh Dickins, Alistair Popple, Mike Rapoport, Nadav Amit, Andrea Arcangeli
Subject: [PATCH RFC] mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
Date: Fri, 2 Dec 2022 13:27:48 +0100
Message-Id: <20221202122748.113774-1-david@redhat.com>

Currently, we don't enable writenotify when enabling userfaultfd-wp on a
shared writable mapping (for now, we only support SHMEM). The consequence
is that vma->vm_page_prot will still include write permissions, which are
used as the default for all PTEs that get remapped (e.g., by mprotect(),
NUMA hinting, page migration, ...).

This is problematic for uffd-wp: we'd have to manually check for a uffd-wp
PTE and manually write-protect that PTE, which is error-prone, and the
logic is the wrong way around.
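To make the failure mode concrete, here is a minimal user-space C sketch. It is a toy model, not kernel code: the PTE bit layout and the names model_pte_modify(), model_pte_write() and model_remap_demo() are hypothetical. model_pte_modify() only loosely mimics what pte_modify() does, keeping a preserved set of bits (here, just the uffd-wp marker) and taking all permission bits from the VMA's default protection, so in this model a uffd-wp'ed PTE ends up writable whenever that default includes write permissions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified PTE bits (not any real architecture's layout). */
#define MODEL_PTE_PRESENT (1u << 0)
#define MODEL_PTE_WRITE   (1u << 1)
#define MODEL_PTE_UFFD_WP (1u << 2)

/* Bits a protection change preserves: the uffd-wp marker survives. */
#define MODEL_PTE_CHG_MASK MODEL_PTE_UFFD_WP

typedef uint32_t model_pte_t;

/*
 * Rough model of pte_modify(): keep the preserved bits of the old PTE
 * and take all permission bits from the VMA's default protection.
 */
static model_pte_t model_pte_modify(model_pte_t pte, model_pte_t vm_page_prot)
{
	return (pte & MODEL_PTE_CHG_MASK) | vm_page_prot;
}

static bool model_pte_write(model_pte_t pte)
{
	return (pte & MODEL_PTE_WRITE) != 0;
}

/* Walks through the failure mode described in the commit message. */
static void model_remap_demo(void)
{
	/* A PTE that userfaultfd-wp has write-protected. */
	const model_pte_t wp_pte = MODEL_PTE_PRESENT | MODEL_PTE_UFFD_WP;
	/* Shared mapping without writenotify: the default allows writes. */
	const model_pte_t prot_rw = MODEL_PTE_PRESENT | MODEL_PTE_WRITE;
	/* Shared mapping with writenotify: the default is read-only. */
	const model_pte_t prot_ro = MODEL_PTE_PRESENT;

	/* Without writenotify, the uffd-wp'ed PTE silently becomes writable. */
	assert(model_pte_write(model_pte_modify(wp_pte, prot_rw)));
	/* With writenotify, it stays write-protected and keeps its marker. */
	assert(!model_pte_write(model_pte_modify(wp_pte, prot_ro)));
	assert(model_pte_modify(wp_pte, prot_ro) & MODEL_PTE_UFFD_WP);
}
```

The uffd-wp marker itself survives the remap in both cases; only the write bit differs, and that bit comes entirely from the default protection.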
Any code that uses vma->vm_page_prot to set PTE permissions is prone to
such issues: primarily pte_modify() and mk_pte(), but there might be more
(move_pte() looked suspicious at first, but its protection parameter is
essentially unused).

Instead, let's enable writenotify, just like we do for softdirty
tracking, so that PTEs are mapped write-protected by default and only
selected PTEs that are definitely safe to be mapped without
write-protection (see can_change_pte_writable()) are made writable. This
reverses the logic and implicitly fixes and prevents any such uffd-wp
issues.

Note that when enabling userfaultfd-wp, there is no need to walk page
tables to enforce the new default protection for the PTEs: we know that
they cannot be uffd-wp'ed yet, because that can only happen afterwards.

For example, this fixes page migration and mprotect() to not map a
uffd-wp'ed PTE writable. In theory, this should also fix the case where
NUMA hinting remaps pages in such (shmem) mappings, if NUMA hinting is
applicable to shmem with uffd as well.

Running the mprotect() reproducer [1] without this commit:
  $ ./uffd-wp-mprotect
  FAIL: uffd-wp did not fire

Running the mprotect() reproducer with this commit:
  $ ./uffd-wp-mprotect
  PASS: uffd-wp fired

[1] https://lore.kernel.org/all/222fc0b2-6ec0-98e7-833f-ea868b248446@redhat.com/T/#u

Reported-by: Ives van Hoorne
Debugged-by: Peter Xu
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Cc: stable@vger.kernel.org
Cc: Andrew Morton
Cc: Hugh Dickins
Cc: Alistair Popple
Cc: Mike Rapoport
Cc: Nadav Amit
Cc: Andrea Arcangeli
Signed-off-by: David Hildenbrand
---

Based on latest upstream. userfaultfd selftests seem to pass.
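Why reversing the logic helps can be illustrated with another small user-space model. Everything here is hypothetical: struct model_pte is a simplified PTE, and model_can_change_pte_writable() is only loosely inspired by can_change_pte_writable() (the real function checks more conditions). A remap path either starts from a writable default and must remember to write-protect uffd-wp'ed PTEs itself, or starts from a read-only default and may only grant write access through one central check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a PTE in a shared writable mapping. */
struct model_pte {
	bool write;   /* write permission in the PTE */
	bool dirty;   /* already dirty: no write fault needed for tracking */
	bool uffd_wp; /* userfaultfd-wp marker */
};

/*
 * Loosely inspired by can_change_pte_writable() for shared mappings: only
 * grant write access when no write fault is required. Illustration only.
 */
static bool model_can_change_pte_writable(struct model_pte pte)
{
	if (!pte.dirty)
		return false;
	if (pte.uffd_wp)
		return false;
	return true;
}

/*
 * Old logic (no writenotify): the default protection is writable, so each
 * remap path must remember to write-protect uffd-wp'ed PTEs itself.
 */
static struct model_pte model_remap_default_rw(struct model_pte pte,
					       bool path_handles_uffd_wp)
{
	pte.write = true;
	if (path_handles_uffd_wp && pte.uffd_wp)
		pte.write = false;
	return pte;
}

/*
 * New logic (writenotify): the default protection is read-only, and a path
 * may only grant write access through the central check.
 */
static struct model_pte model_remap_default_ro(struct model_pte pte,
					       bool path_uses_check)
{
	pte.write = false;
	if (path_uses_check && model_can_change_pte_writable(pte))
		pte.write = true;
	return pte;
}

static void model_allowlist_demo(void)
{
	const struct model_pte wp = { .write = false, .dirty = true, .uffd_wp = true };

	/* Old logic: one forgetful remap path maps the PTE writable. */
	assert(model_remap_default_rw(wp, false).write);
	/* New logic: the PTE stays write-protected whether or not the path opts in. */
	assert(!model_remap_default_ro(wp, true).write);
	assert(!model_remap_default_ro(wp, false).write);
}
```

In this model, a path that forgets the central check under the new logic merely leaves a PTE read-only, which costs an extra write fault rather than silently bypassing uffd-wp.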
---
 fs/userfaultfd.c | 28 ++++++++++++++++++++++------
 mm/mmap.c        |  4 ++++
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 98ac37e34e3d..fb0733f2e623 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -108,6 +108,21 @@ static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
 	return ctx->features & UFFD_FEATURE_INITIALIZED;
 }
 
+static void userfaultfd_set_vm_flags(struct vm_area_struct *vma,
+				     vm_flags_t flags)
+{
+	const bool uffd_wp = !!((vma->vm_flags | flags) & VM_UFFD_WP);
+
+	vma->vm_flags = flags;
+	/*
+	 * For shared mappings, we want to enable writenotify while
+	 * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply
+	 * recalculate vma->vm_page_prot whenever userfaultfd-wp is involved.
+	 */
+	if ((vma->vm_flags & VM_SHARED) && uffd_wp)
+		vma_set_page_prot(vma);
+}
+
 static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
 				     int wake_flags, void *key)
 {
@@ -618,7 +633,8 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
 		for_each_vma(vmi, vma) {
 			if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) {
 				vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-				vma->vm_flags &= ~__VM_UFFD_FLAGS;
+				userfaultfd_set_vm_flags(vma,
+							 vma->vm_flags & ~__VM_UFFD_FLAGS);
 			}
 		}
 		mmap_write_unlock(mm);
@@ -652,7 +668,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
 	octx = vma->vm_userfaultfd_ctx.ctx;
 	if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) {
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-		vma->vm_flags &= ~__VM_UFFD_FLAGS;
+		userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS);
 		return 0;
 	}
 
@@ -733,7 +749,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
 	} else {
 		/* Drop uffd context if remap feature not enabled */
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-		vma->vm_flags &= ~__VM_UFFD_FLAGS;
+		userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS);
 	}
 }
 
@@ -895,7 +911,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 			prev = vma;
 		}
 
-		vma->vm_flags = new_flags;
+		userfaultfd_set_vm_flags(vma, new_flags);
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
 	}
 	mmap_write_unlock(mm);
@@ -1463,7 +1479,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		 * the next vma was merged into the current one and
 		 * the current one has not been updated yet.
 		 */
-		vma->vm_flags = new_flags;
+		userfaultfd_set_vm_flags(vma, new_flags);
 		vma->vm_userfaultfd_ctx.ctx = ctx;
 
 		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
@@ -1651,7 +1667,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 		 * the next vma was merged into the current one and
 		 * the current one has not been updated yet.
 		 */
-		vma->vm_flags = new_flags;
+		userfaultfd_set_vm_flags(vma, new_flags);
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
 
 	skip:
diff --git a/mm/mmap.c b/mm/mmap.c
index 74a84eb33b90..ce7526aa5d61 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1525,6 +1525,10 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
 	if (vma_soft_dirty_enabled(vma) && !is_vm_hugetlb_page(vma))
 		return 1;
 
+	/* Do we need write faults for uffd-wp tracking? */
+	if (userfaultfd_wp(vma))
+		return 1;
+
 	/* Specialty mapping? */
 	if (vm_flags & VM_PFNMAP)
 		return 0;

base-commit: a4412fdd49dc011bcc2c0d81ac4cab7457092650
-- 
2.38.1
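The interplay between userfaultfd_set_vm_flags() and the new check in vma_wants_writenotify() can be sketched as a user-space toy model. This is an assumption-laden illustration, not the kernel implementation: the MODEL_VM_* flag values, struct model_vma and its page_prot_writable field (standing in for "vma->vm_page_prot includes write permissions") are hypothetical, and model_vma_wants_writenotify() leaves out every reason for writenotify other than uffd-wp. It shows why the helper looks at the union of the old and new flags: the default protection has to be recalculated both when uffd-wp is enabled and when it is disabled again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for VM_WRITE, VM_SHARED and VM_UFFD_WP. */
#define MODEL_VM_WRITE   0x1u
#define MODEL_VM_SHARED  0x2u
#define MODEL_VM_UFFD_WP 0x4u

struct model_vma {
	unsigned int vm_flags;
	bool page_prot_writable; /* "vm_page_prot includes write permissions" */
};

/*
 * Simplified model of vma_wants_writenotify() with this patch applied:
 * only shared writable mappings qualify, and uffd-wp forces writenotify.
 * All other reasons for writenotify are left out.
 */
static bool model_vma_wants_writenotify(const struct model_vma *vma)
{
	const unsigned int rw_shared = MODEL_VM_WRITE | MODEL_VM_SHARED;

	if ((vma->vm_flags & rw_shared) != rw_shared)
		return false;
	return (vma->vm_flags & MODEL_VM_UFFD_WP) != 0;
}

/*
 * Model of vma_set_page_prot(): the default protection only includes write
 * permissions for shared writable mappings that don't want writenotify.
 */
static void model_vma_set_page_prot(struct model_vma *vma)
{
	const unsigned int rw_shared = MODEL_VM_WRITE | MODEL_VM_SHARED;

	vma->page_prot_writable = (vma->vm_flags & rw_shared) == rw_shared &&
				  !model_vma_wants_writenotify(vma);
}

/* Same shape as userfaultfd_set_vm_flags() in the patch above. */
static void model_set_vm_flags(struct model_vma *vma, unsigned int flags)
{
	/* Check old and new flags: uffd-wp may be getting enabled or disabled. */
	const bool uffd_wp = ((vma->vm_flags | flags) & MODEL_VM_UFFD_WP) != 0;

	vma->vm_flags = flags;
	if ((vma->vm_flags & MODEL_VM_SHARED) && uffd_wp)
		model_vma_set_page_prot(vma);
}

static void model_register_unregister_demo(void)
{
	struct model_vma shm = { .vm_flags = MODEL_VM_WRITE | MODEL_VM_SHARED };

	model_vma_set_page_prot(&shm);
	assert(shm.page_prot_writable);

	/* Registering uffd-wp drops write from the default protection... */
	model_set_vm_flags(&shm, shm.vm_flags | MODEL_VM_UFFD_WP);
	assert(!shm.page_prot_writable);

	/* ...and unregistering restores it. */
	model_set_vm_flags(&shm, shm.vm_flags & ~MODEL_VM_UFFD_WP);
	assert(shm.page_prot_writable);
}
```

Private mappings are skipped by the helper in this model, matching the VM_SHARED check in the patch; their default protection never includes write permissions here anyway.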