Date: Tue, 2 May 2023 19:18:21 -0700
From: Andrew Morton
To: Lorenzo Stoakes
Cc:
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jason Gunthorpe,
	Jens Axboe, Matthew Wilcox, Dennis Dalessandro, Leon Romanovsky,
	Christian Benvenuti, Nelson Escobar, Bernard Metzler, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, "David S . Miller", Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Christian Brauner, Richard Cochran, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org, Oleg Nesterov,
	Jason Gunthorpe, John Hubbard, Jan Kara, "Kirill A . Shutemov",
	Pavel Begunkov, Mika Penttila, David Hildenbrand, Dave Chinner,
	Theodore Ts'o, Peter Xu, Matthew Rosato, "Paul E . McKenney",
	Christian Borntraeger
Subject: Re: [PATCH v8 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings
Message-Id: <20230502191821.71c86a2c25f19fe342aa72db@linux-foundation.org>

On Tue, 2 May 2023 23:51:35 +0100 Lorenzo Stoakes wrote:

> Writing to file-backed dirty-tracked mappings via GUP is inherently broken
> as we cannot rule out folios being cleaned and then a GUP user writing to
> them again and possibly marking them dirty unexpectedly.
>
> This is especially egregious for long-term mappings (as indicated by the
> use of the FOLL_LONGTERM flag), so we disallow this case in GUP-fast as
> we have already done in the slow path.
>
> We have access to less information in the fast path as we cannot examine
> the VMA containing the mapping, however we can determine whether the folio
> is anonymous or belonging to a whitelisted filesystem - specifically
> hugetlb and shmem mappings.
>
> We take special care to ensure that both the folio and mapping are safe to
> access when performing these checks and document folio_fast_pin_allowed()
> accordingly.
>
> It's important to note that there are no APIs allowing users to specify
> FOLL_FAST_ONLY for a PUP-fast let alone with FOLL_LONGTERM, so we can
> always rely on the fact that if we fail to pin on the fast path, the code
> will fall back to the slow path which can perform the more thorough check.

arm allnoconfig said

mm/gup.c:115:13: warning: 'folio_fast_pin_allowed' defined but not used [-Wunused-function]
  115 | static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
      |             ^~~~~~~~~~~~~~~~~~~~~~

so I moved the definition inside CONFIG_ARCH_HAS_PTE_SPECIAL.

 mm/gup.c |  154 ++++++++++++++++++++++++++---------------------------
 1 file changed, 77 insertions(+), 77 deletions(-)

--- a/mm/gup.c~mm-gup-disallow-foll_longterm-gup-fast-writing-to-file-backed-mappings-fix
+++ a/mm/gup.c
@@ -96,83 +96,6 @@ retry:
 	return folio;
 }
 
-/*
- * Used in the GUP-fast path to determine whether a pin is permitted for a
- * specific folio.
- *
- * This call assumes the caller has pinned the folio, that the lowest page table
- * level still points to this folio, and that interrupts have been disabled.
- *
- * Writing to pinned file-backed dirty tracked folios is inherently problematic
- * (see comment describing the writable_file_mapping_allowed() function). We
- * therefore try to avoid the most egregious case of a long-term mapping doing
- * so.
- *
- * This function cannot be as thorough as that one as the VMA is not available
- * in the fast path, so instead we whitelist known good cases and if in doubt,
- * fall back to the slow path.
- */
-static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
-{
-	struct address_space *mapping;
-	unsigned long mapping_flags;
-
-	/*
-	 * If we aren't pinning then no problematic write can occur. A long term
-	 * pin is the most egregious case so this is the one we disallow.
-	 */
-	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) !=
-	    (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
-		return true;
-
-	/* The folio is pinned, so we can safely access folio fields. */
-
-	/* Neither of these should be possible, but check to be sure. */
-	if (unlikely(folio_test_slab(folio) || folio_test_swapcache(folio)))
-		return false;
-
-	/* hugetlb mappings do not require dirty-tracking. */
-	if (folio_test_hugetlb(folio))
-		return true;
-
-	/*
-	 * GUP-fast disables IRQs. When IRQS are disabled, RCU grace periods
-	 * cannot proceed, which means no actions performed under RCU can
-	 * proceed either.
-	 *
-	 * inodes and thus their mappings are freed under RCU, which means the
-	 * mapping cannot be freed beneath us and thus we can safely dereference
-	 * it.
-	 */
-	lockdep_assert_irqs_disabled();
-
-	/*
-	 * However, there may be operations which _alter_ the mapping, so ensure
-	 * we read it once and only once.
-	 */
-	mapping = READ_ONCE(folio->mapping);
-
-	/*
-	 * The mapping may have been truncated, in any case we cannot determine
-	 * if this mapping is safe - fall back to slow path to determine how to
-	 * proceed.
-	 */
-	if (!mapping)
-		return false;
-
-	/* Anonymous folios are fine, other non-file backed cases are not. */
-	mapping_flags = (unsigned long)mapping & PAGE_MAPPING_FLAGS;
-	if (mapping_flags)
-		return mapping_flags == PAGE_MAPPING_ANON;
-
-	/*
-	 * At this point, we know the mapping is non-null and points to an
-	 * address_space object. The only remaining whitelisted file system is
-	 * shmem.
-	 */
-	return shmem_mapping(mapping);
-}
-
 /**
  * try_grab_folio() - Attempt to get or pin a folio.
  * @page: pointer to page to be grabbed
@@ -2474,6 +2397,83 @@ static void __maybe_unused undo_dev_page
 
 #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
 /*
+ * Used in the GUP-fast path to determine whether a pin is permitted for a
+ * specific folio.
+ *
+ * This call assumes the caller has pinned the folio, that the lowest page table
+ * level still points to this folio, and that interrupts have been disabled.
+ *
+ * Writing to pinned file-backed dirty tracked folios is inherently problematic
+ * (see comment describing the writable_file_mapping_allowed() function). We
+ * therefore try to avoid the most egregious case of a long-term mapping doing
+ * so.
+ *
+ * This function cannot be as thorough as that one as the VMA is not available
+ * in the fast path, so instead we whitelist known good cases and if in doubt,
+ * fall back to the slow path.
+ */
+static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
+{
+	struct address_space *mapping;
+	unsigned long mapping_flags;
+
+	/*
+	 * If we aren't pinning then no problematic write can occur. A long term
+	 * pin is the most egregious case so this is the one we disallow.
+	 */
+	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) !=
+	    (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
+		return true;
+
+	/* The folio is pinned, so we can safely access folio fields. */
+
+	/* Neither of these should be possible, but check to be sure. */
+	if (unlikely(folio_test_slab(folio) || folio_test_swapcache(folio)))
+		return false;
+
+	/* hugetlb mappings do not require dirty-tracking. */
+	if (folio_test_hugetlb(folio))
+		return true;
+
+	/*
+	 * GUP-fast disables IRQs. When IRQS are disabled, RCU grace periods
+	 * cannot proceed, which means no actions performed under RCU can
+	 * proceed either.
+	 *
+	 * inodes and thus their mappings are freed under RCU, which means the
+	 * mapping cannot be freed beneath us and thus we can safely dereference
+	 * it.
+	 */
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * However, there may be operations which _alter_ the mapping, so ensure
+	 * we read it once and only once.
+	 */
+	mapping = READ_ONCE(folio->mapping);
+
+	/*
+	 * The mapping may have been truncated, in any case we cannot determine
+	 * if this mapping is safe - fall back to slow path to determine how to
+	 * proceed.
+	 */
+	if (!mapping)
+		return false;
+
+	/* Anonymous folios are fine, other non-file backed cases are not. */
+	mapping_flags = (unsigned long)mapping & PAGE_MAPPING_FLAGS;
+	if (mapping_flags)
+		return mapping_flags == PAGE_MAPPING_ANON;
+
+	/*
+	 * At this point, we know the mapping is non-null and points to an
+	 * address_space object. The only remaining whitelisted file system is
+	 * shmem.
+	 */
+	return shmem_mapping(mapping);
+}
+
+/*
  * Fast-gup relies on pte change detection to avoid concurrent pgtable
  * operations.
  *
_
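
For anyone who wants to poke at the decision order outside the kernel, here is a rough userspace model of the checks in folio_fast_pin_allowed(). It is only a sketch. Every MODEL_* name, struct model_mapping and both helper functions are made up for illustration, and the flag values are stand-ins rather than the kernel's FOLL_* or PAGE_MAPPING_* definitions. The slab, swapcache and hugetlb tests are left out, as are the IRQ and READ_ONCE() requirements, so this shows only the flag mask test and the low-bit decoding of the mapping pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; NOT the kernel's FOLL_* values. */
#define MODEL_FOLL_WRITE	(1u << 0)
#define MODEL_FOLL_LONGTERM	(1u << 1)
#define MODEL_FOLL_PIN		(1u << 2)

/* Model of the type tags kept in the low bits of the mapping pointer. */
#define MODEL_MAPPING_ANON	((uintptr_t)0x1)
#define MODEL_MAPPING_MOVABLE	((uintptr_t)0x2)
#define MODEL_MAPPING_FLAGS	(MODEL_MAPPING_ANON | MODEL_MAPPING_MOVABLE)

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
	unsigned long id;	/* gives the struct word alignment */
	bool is_shmem;
};

/* Build a tagged mapping word; tagging needs the low bits to be free. */
static uintptr_t model_tag(const struct model_mapping *m, uintptr_t tag)
{
	assert(((uintptr_t)m & MODEL_MAPPING_FLAGS) == 0);
	return (uintptr_t)m | tag;
}

/* Same decision order as the patch, minus the folio type tests. */
static bool model_fast_pin_allowed(uintptr_t mapping_word, unsigned int flags)
{
	const unsigned int need =
		MODEL_FOLL_PIN | MODEL_FOLL_LONGTERM | MODEL_FOLL_WRITE;
	uintptr_t tag;

	/* Only a long-term write pin is restricted. */
	if ((flags & need) != need)
		return true;

	/* No mapping: cannot decide here, defer to the slow path. */
	if (!mapping_word)
		return false;

	/* Tagged pointer: anonymous is fine, anything else is not. */
	tag = mapping_word & MODEL_MAPPING_FLAGS;
	if (tag)
		return tag == MODEL_MAPPING_ANON;

	/* Untagged pointer: a real mapping, only shmem is allowed. */
	return ((const struct model_mapping *)mapping_word)->is_shmem;
}
```

A false return in this model corresponds to the fast path giving up, so that the slow path makes the final call.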