Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp1523848rwr; Fri, 5 May 2023 15:53:37 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4brgOiM/Axg6PNR79V6MK3Ax+atsn208+WVHcgZzWNS1m9lKfpxej4V+/uI+uslCWXAxtk X-Received: by 2002:a05:6a00:198a:b0:641:39cb:1716 with SMTP id d10-20020a056a00198a00b0064139cb1716mr4101917pfl.20.1683327216978; Fri, 05 May 2023 15:53:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683327216; cv=none; d=google.com; s=arc-20160816; b=CEuXDuvRV4hbg4lUH7auvXOjQ8QU8n8NgD1/kNyXfiLx/GBXjYiKTB7/UvXRkvx4UZ GxwgEWC1BNWg63MdXBUYqELohI5k/JP24BBIV7wL+vKPsqZsS/nH0fa2+mcq9FDCXpHr Q7y9tDhpVX9aQI87za0Yd9i13aNwT8oIU5JcGmW3GJ2j3/E7c5xWsF3rCuQ9IwbE0DeA i525K2nejaCC7QvbPqU3bWCRMcEJBRz10uKdvBImbdnaGm7WqXvNQflPKScsj30dQX1X sAWwQBXDsgkwYh/m0UbeV0NxK1TnvhIBB98ucrdh4Q861bO7Y1zCOtoz1oasXhCCpthm bAqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=hTk40pEIa6sggozNPGL/VTk1LAwsz/xKnNcnmk0Uw40=; b=XjB5y+v7lqQ8LcFzhK/DRAlEwgYhbjFHrysoBzPFUQwcEzmzebhkfWR3orrmwfgYrm N/m6563UkIsGjAhIdY99lgd+8+IdyMj19fmmEkiSWTXjPAYabyxTEo5/wvzCPhhpmtm7 037ZH8c4hXAtFzdgzNcRn54cWlUfVsyk1dib3mcPwFt8EuKOAAQSnoMvBe0f98VPfOqT vToV/XLyu6kbNOkF1JGzhAqwWSSfMtNcl80S26XTfMDVsee3F7V+n4yqj/UL5aua5lES X08Jv4Jn1FIsa9G2umaIfOP6ovrml/n/y28qagT+cNekXr3R3FZe41vtG9cYyeWxmPJe LweA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=qUXPIJOE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h129-20020a625387000000b0063b652ff9cdsi2825350pfb.404.2023.05.05.15.53.22; Fri, 05 May 2023 15:53:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=qUXPIJOE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232137AbjEEWIN (ORCPT + 99 others); Fri, 5 May 2023 18:08:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37082 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231863AbjEEWIL (ORCPT ); Fri, 5 May 2023 18:08:11 -0400 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30A945599; Fri, 5 May 2023 15:08:01 -0700 (PDT) Received: by mail-wm1-x331.google.com with SMTP id 5b1f17b1804b1-3f4000ec6ecso23411115e9.0; Fri, 05 May 2023 15:08:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1683324479; x=1685916479; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=hTk40pEIa6sggozNPGL/VTk1LAwsz/xKnNcnmk0Uw40=; b=qUXPIJOEcmVya4t4Z8BVExBOKsEpEyw/+X5BvfMyQWS4PlaTFXXK19eB3j72mNNlXx 5KwlgukleyM6txF/8ewbTiGaIVAL0aS5iPiL2DguJHX1EKvTqQeBFw9kNcGNauP4Rmnq N53ZSsL+dWBOp24Ek5DYYwdR+UEXfMDCZWNDAvOZAZHqaeE4rpT6pLpAIKnPiKqG8wEN /tUv8TLuf2pvomk+ViSv52atgQ6+iy8lhTHOt6WBz7zIPkomsTHnPv5yCkfCgX1fE956 foFDLn8x4GwP8/4sRIH7I2dBPzR4l0uV0HdkxC8YxRxom0JmqU7dC872WFhKm6lwVuYL umag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683324479; x=1685916479; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=hTk40pEIa6sggozNPGL/VTk1LAwsz/xKnNcnmk0Uw40=; b=Wh2R/Eer+dPWFZP/yTT/NU3V4rnFcqVkn0P9OA/iuAV+8YfJY+sSIvZ70Yrefc3b6G G29ePhymbo1LMPSRdVtH6HkL4DMyLoStMP4ylG8xiF0GQKHKmamy2YEjTtCKmEYMBsW1 FqjwVRcadby1eP7qOK+qr+xmtcMH0Wr4MEpwo39962g/YVCRlgK0dsNRNyFuorLh83eR jqsl4nvf3Pg72N/O+/IMbm1tFdf6VdKHXFNDPgxsK4vXu4LTMQAvBnxa5ZRJ0Wwjrb8J O8atZGBL6n7Ar28uolEapDpLBLHh5T4f6Nl6a/waLXwW5SeLNYGxF6esdxjt55FfzK2J oVVg== X-Gm-Message-State: AC+VfDx7LSA4WMNcMeUNRAzNwBf0p9wz2yA1SFjEzJsEpyFfwMVMmnrg EgsdW0r4+SpwVP/FsIhiGpE= X-Received: by 2002:a05:600c:218f:b0:3f0:a0bb:58ef with SMTP id e15-20020a05600c218f00b003f0a0bb58efmr2184588wme.25.1683324479379; Fri, 05 May 2023 15:07:59 -0700 (PDT) Received: from localhost (host86-156-84-164.range86-156.btcentralplus.com. [86.156.84.164]) by smtp.gmail.com with ESMTPSA id a20-20020a1cf014000000b003f173c566b5sm9137978wmb.5.2023.05.05.15.07.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 May 2023 15:07:58 -0700 (PDT) Date: Fri, 5 May 2023 23:07:57 +0100 From: Lorenzo Stoakes To: David Hildenbrand Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Jason Gunthorpe , Jens Axboe , Matthew Wilcox , Dennis Dalessandro , Leon Romanovsky , Christian Benvenuti , Nelson Escobar , Bernard Metzler , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Bjorn Topel , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , "David S . Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Richard Cochran , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Oleg Nesterov , Jason Gunthorpe , John Hubbard , Jan Kara , "Kirill A . Shutemov" , Pavel Begunkov , Mika Penttila , Dave Chinner , Theodore Ts'o , Peter Xu , Matthew Rosato , "Paul E . McKenney" , Christian Borntraeger Subject: Re: [PATCH v8 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings Message-ID: <52ac98ca-378e-452c-9dbf-93ea39bb5583@lucifer.local> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 05, 2023 at 04:17:38PM +0200, David Hildenbrand wrote: > > > And there is nothing wrong about pinning an anon page that's still in the > > > swapcache. The following folio_test_anon() check will allow them. > > > > > > The check made sense in page_mapping(), but here it's not required. > > > > Waaaaaaaaaait a second, you were saying before:- > > > > "Folios in the swap cache return the swap mapping" -- you might disallow > > pinning anonymous pages that are in the swap cache. > > > > I recall that there are corner cases where we can end up with an anon > > page that's mapped writable but still in the swap cache ... so you'd > > fallback to the GUP slow path (acceptable for these corner cases, I > > guess), however especially the comment is a bit misleading then. > > > > So are we allowing or disallowing pinning anon swap cache pages? :P > > If we have an exclusive anon page that's still in the swap cache, sure! :) > > I think there are ways that can be done, and nothing would actually break. > (I even wrote selftests in the cow selftests for that to amke sure it works > as expected) > > > > > I mean slow path would allow them if they are just marked anon so I'm inclined > > to allow them. > > Exactly my reasoning. > > The less checks the better (especially if ordinary GUP just allows for > pinning it) :) Yeah a lot of my decision making on this has been trying to be conservative about what we filter for but you get this weird inversion whereby if you're too conservative about what you allow, you are actually being more outlandish about what you disallow and vice-versa. > > > > > > > > > I do agree regarding folio_test_slab(), though. Should we WARN in case we > > > would have one? > > > > > > if (WARN_ON_ONCE(folio_test_slab(folio))) > > > return false; > > > > > > > God help us if we have a slab page at this point, so agreed worth doing, it > > would surely have to arise from some dreadful bug/memory corruption. > > > > Or some nasty race condition that we managed to ignore with rechecking if > the PTEs/PMDs changed :) Well that should be sorted now :) > > > > > + if (unlikely(folio_test_slab(folio) || folio_test_swapcache(folio))) > > > > + return false; > > > > + > > > > + /* hugetlb mappings do not require dirty-tracking. */ > > > > + if (folio_test_hugetlb(folio)) > > > > + return true; > > > > + > > > > + /* > > > > + * GUP-fast disables IRQs. When IRQS are disabled, RCU grace periods > > > > + * cannot proceed, which means no actions performed under RCU can > > > > + * proceed either. > > > > + * > > > > + * inodes and thus their mappings are freed under RCU, which means the > > > > + * mapping cannot be freed beneath us and thus we can safely dereference > > > > + * it. > > > > + */ > > > > + lockdep_assert_irqs_disabled(); > > > > + > > > > + /* > > > > + * However, there may be operations which _alter_ the mapping, so ensure > > > > + * we read it once and only once. > > > > + */ > > > > + mapping = READ_ONCE(folio->mapping); > > > > + > > > > + /* > > > > + * The mapping may have been truncated, in any case we cannot determine > > > > + * if this mapping is safe - fall back to slow path to determine how to > > > > + * proceed. > > > > + */ > > > > + if (!mapping) > > > > + return false; > > > > + > > > > + /* Anonymous folios are fine, other non-file backed cases are not. */ > > > > + mapping_flags = (unsigned long)mapping & PAGE_MAPPING_FLAGS; > > > > + if (mapping_flags) > > > > + return mapping_flags == PAGE_MAPPING_ANON; > > > > > > KSM pages are also (shared) anonymous folios, and that check would fail -- > > > which is ok (the following unsharing checks rejects long-term pinning them), > > > but a bit inconstent with your comment and folio_test_anon(). > > > > > > It would be more consistent (with your comment and also the folio_test_anon > > > implementation) to have here: > > > > > > return mapping_flags & PAGE_MAPPING_ANON; > > > > > > > I explicitly excluded KSM out of fear that could be some breakage given they're > > wrprotect'd + expected to CoW though? But I guess you mean they'd get picked up > > by the unshare and so it doesn't matter + we wouldn't want to exclude an > > PG_anon_exclusive case? > > Yes, unsharing handles that in the ordinary GUP and GUP-fast case. And > unsharing is neither GUP-fast nor even longterm specific (for anon pages). > > Reason I'm brining this up is that I think it's best if we let > folio_fast_pin_allowed() just check for what's absolutely GUP-fast specific. Ack, indeed it's a separate thing, see above for the contradictory nature of wanting to be cautious but then accidentally making your change _more radical_ than you intended...! > > > > > I'll make the change in any case given the unshare check! > > > > I notice the gup_huge_pgd() doesn't do an unshare but I mean, a PGD-sized huge > > page probably isn't going to be CoW'd :P > > I spotted exactly the same thing and wondered about that (after all I added > all that unsharing logic ... so I should know). I'm sure there must be a > reason I didn't add it ;) > > ... probably we should just add it even though it might essentially be dead > code for now (at least the cow selftests would try with each and every > hugetlb size and eventually reveal the problem on whatever arch ends up > using that code ... ). > > Do you want to send a patch to add unsharing to gup_huge_pgd() as well? > Sure will do! > -- > Thanks, > > David / dhildenb >