From: David Hildenbrand
To: Jason Gunthorpe
Cc: Linus Torvalds, Nadav Amit, Linux Kernel Mailing List, Andrew Morton,
    Hugh Dickins, David Rientjes, Shakeel Butt, John Hubbard, Mike Kravetz,
    Mike Rapoport, Yang Shi, "Kirill A. Shutemov", Matthew Wilcox,
    Vlastimil Babka, Jann Horn, Michal Hocko, Rik van Riel, Roman Gushchin,
    Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig,
    Oleg Nesterov, Jan Kara, Linux-MM,
    "open list:KERNEL SELFTEST FRAMEWORK", "open list:DOCUMENTATION"
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
Date: Wed, 22 Dec 2021 10:58:36 +0100
In-Reply-To: <3e0868e6-c714-1bf8-163f-389989bf5189@redhat.com>
Organization: Red Hat

On 22.12.21 09:51, David Hildenbrand wrote:
> On 21.12.21 20:07, Jason Gunthorpe wrote:
>> On Tue, Dec 21, 2021 at 06:40:30PM +0100, David Hildenbrand wrote:
>>
>>> 2) is certainly the cherry on top. But it just means that R/O pins don't
>>> have to be the weird kid. And yes, achieving 2) would require
>>> FAULT_FLAG_EXCLUSIVE / FAULT_FLAG_UNSHARED, but it would really 99% do
>>> what existing COW logic does, just bypass the "map writable" and
>>> "trigger write fault" semantics.
>>
>> I still don't agree with this - when you come to patches can you have
>> this work at the end and under a good cover letter? Maybe it will make
>> more sense then.
>
> Yes. But really, I think it's the logical consequence of what Linus said
> [1]:
>
> "And then all GUP-fast would need to do is to refuse to look up a page
> that isn't exclusive to that VM. We already have the situation that
> GUP-fast can fail for non-writable pages etc, so it's just another
> test."
>
> We must not FOLL_PIN a page that is not exclusive (not only on gup-fast,
> but really, on any gup).
> If we special case R/O FOLL_PIN, we cannot
> enable the sanity check on unpin as suggested by Linus [2]:
>
> "If we only set the exclusive VM bit on pages that get mapped into
> user space, and we guarantee that GUP only looks up such pages, then
> we can also add a debug test to the "unpin" case that the bit is
> still set."
>
> There are really only two feasible options I see when we want to take a
> R/O FOLL_PIN on a !PageAnonExclusive() anon page
>
> (1) Fail the pinning completely. This implies that we'll have to fail
>     O_DIRECT once converted to FOLL_PIN.
> (2) Request to mark the page PageAnonExclusive() via a
>     FAULT_FLAG_UNSHARE and let it succeed.
>
>
> Anything else would require additional accounting that we already
> discussed in the past is hard -- for example, to differentiate R/O from
> R/W pins requiring two pin counters.
>
> The only impact would be that FOLL_PIN after fork() has to go via a
> FAULT_FLAG_UNSHARE once, to turn the page PageAnonExclusive. IMHO this
> is the right thing to do for FOLL_LONGTERM. For !FOLL_LONGTERM it would
> be nice to optimize this, to *not* do that, but again ... this would
> require even more counters I think, for example, to differentiate
> between "R/W short/long-term or R/O long-term pin" and "R/O short-term pin".
>
> So unless we discover a way to do additional accounting for ordinary 4k
> pages, I think we really can only do (1) or (2) to make sure we never
> ever pin a !PageAnonExclusive() page.

BTW, I just wondered if the optimization should actually be that R/O
short-term FOLL_PIN users should actually be using FOLL_GET instead. So
O_DIRECT with R/O would already be doing the right thing.

And it somewhat aligns with what we found: only R/W short-term FOLL_GET
is problematic, where we can lose writes to the page from the device via
O_DIRECT.
IIUC, our COW logic makes sure that a shared anonymous page that
might still be used by a R/O FOLL_GET cannot be modified, because any
attempt to modify it would result in a copy.

But I might be missing something, just an idea.

-- 
Thanks,

David / dhildenb