Date: Tue, 21 Dec 2021 19:28:42 +0100
To: Linus Torvalds
Cc: Jason Gunthorpe, Nadav Amit, Linux Kernel Mailing List, Andrew Morton,
 Hugh Dickins, David Rientjes, Shakeel Butt, John Hubbard, Mike Kravetz,
 Mike Rapoport, Yang Shi, "Kirill A . Shutemov", Matthew Wilcox,
 Vlastimil Babka, Jann Horn, Michal Hocko, Rik van Riel, Roman Gushchin,
 Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig,
 Oleg Nesterov, Jan Kara, Linux-MM,
 "open list:KERNEL SELFTEST FRAMEWORK", "open list:DOCUMENTATION"
References: <20211218184233.GB1432915@nvidia.com>
 <5CA1D89F-9DDB-4F91-8929-FE29BB79A653@vmware.com>
 <4D97206A-3B32-4818-9980-8F24BC57E289@vmware.com>
 <5A7D771C-FF95-465E-95F6-CD249FE28381@vmware.com>
 <20211221010312.GC1432915@nvidia.com>
 <900b7d4a-a5dc-5c7b-a374-c4a8cc149232@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
X-Mailing-List: linux-kernel@vger.kernel.org

On 21.12.21 19:00, Linus Torvalds wrote:
> On Tue, Dec 21, 2021 at 9:40 AM David Hildenbrand wrote:
>>
>>> I do think the existing "maybe_pinned()" logic is fine for that. The
>>> "exclusive to this VM" bit can be used to *help* that decision -
>>> because only an exclusive page can be pinned - but I don't think it
>>> should _replace_ that logic.
>>
>> The issue is that O_DIRECT uses FOLL_GET and cannot easily be changed to
>> FOLL_PIN unfortunately. So I'm *trying* to make it more generic such
>> that such corner cases can be handled correctly as well. But yeah, I'll
>> see where this goes ... O_DIRECT has to be fixed one way or the other.
>>
>> John H. mentioned that he wants to look into converting that to
>> FOLL_PIN. So maybe that will work eventually.
>
> I'd really prefer that as the plan.
>
> What exactly is the issue with O_DIRECT? Is it purely that it uses
> "put_page()" instead of "unpin", or what?
>
> I really think that if people look up pages and expect those pages to
> stay coherent with the VM they looked it up for, they _have_ to
> actively tell the VM layer - which means using FOLL_PIN.
>
> Note that this is in absolutely no way a "new" issue. It has *always*
> been true. If some O_DIRECT path depends on pinning behavior, it has
> never worked correctly, and it is entirely on O_DIRECT, and not at all
> a VM issue. We've had people doing GUP games forever, and being burnt
> by those games not working reliably.
>
> GUP (before we even had the notion of pinning) would always just take
> a reference to the page, but it would not guarantee that that exact
> page then kept an association with the VM.
>
> Now, in *practice* this all works if:
>
>  (a) the GUP user had always written to the page since the fork
> (either explicitly, or with FOLL_WRITE obviously acting as such)
>
>  (b) the GUP user never forks afterwards until the IO is done
>
>  (c) the GUP user plays no other VM games on that address
>
> and it's also very possible that it has worked by pure luck (ie we've
> had a lot of random code that actively mis-used things and it would
> work in practice just because COW would happen to cut the right
> direction etc).
>
> Is there some particular GUP user you happen to care about more than
> others? I think it's a valid option to try to fix things up one by
> one, even if you don't perhaps fix _all_ cases.

Yes, of course. The important part for me is to have a rough idea of how
to tackle all pieces and have a reliable design/approach. Besides the
security issue, highest priority is getting R/W pins (FOLL_WRITE) right,
including O_DIRECT, because that can silently break existing use cases.

Lower priority is getting R/O pins on anonymous memory right, because
that never worked reliably. Lowest priority is getting R/O pins on
MAP_PRIVATE file memory right.

I'd appreciate it if someone could work on the O_DIRECT FOLL_PIN
conversion while I struggle with PageAnonExclusive() and R/W pins :)

[noting that I'll not get too much done within the next 2 weeks]

-- 
Thanks,

David / dhildenb