Message-ID: <03e591ce-debc-bba1-c55e-ce590cc1f38d@redhat.com>
Date: Tue, 2 May 2023 19:16:36 +0200
Subject: Re: [PATCH v7 1/3] mm/mmap: separate writenotify and dirty tracking logic
To: Lorenzo Stoakes
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
 Jason Gunthorpe, Jens Axboe, Matthew Wilcox, Dennis Dalessandro,
 Leon Romanovsky, Christian Benvenuti, Nelson Escobar, Bernard Metzler,
 Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
 Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
 "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Christian Brauner, Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
 Jesper Dangaard Brouer, John Fastabend, linux-fsdevel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, netdev@vger.kernel.org,
 bpf@vger.kernel.org, Oleg Nesterov, Jason Gunthorpe, John Hubbard,
 Jan Kara, "Kirill A. Shutemov", Pavel Begunkov, Mika Penttila,
 Dave Chinner, Theodore Ts'o, Peter Xu, Matthew Rosato,
 "Paul E. McKenney", Christian Borntraeger
References: <72a90af5a9e4445a33ae44efa710f112c2694cb1.1683044162.git.lstoakes@gmail.com>
 <56696a72-24fa-958e-e6a1-7a17c9e54081@redhat.com>
From: David Hildenbrand
Organization: Red Hat

On 02.05.23 19:09, Lorenzo Stoakes wrote:
> On Tue, May 02, 2023 at 05:53:46PM +0100, Lorenzo Stoakes wrote:
>> On Tue, May 02, 2023 at 06:38:53PM +0200, David Hildenbrand wrote:
>>> On 02.05.23 18:34, Lorenzo Stoakes wrote:
>>>> vma_wants_writenotify() is specifically intended for setting PTE page table
>>>> flags, accounting for existing PTE flag state and whether that might
>>>> already be read-only while mixing this check with a check whether the
>>>> filesystem performs dirty tracking.
>>>>
>>>> Separate out the notions of dirty tracking and a PTE write notify checking
>>>> in order that we can invoke the dirty tracking check from elsewhere.
>>>>
>>>> Note that this change introduces a very small duplicate check of the
>>>> separated out vm_ops_needs_writenotify(). This is necessary to avoid making
>>>> vma_needs_dirty_tracking() needlessly complicated (e.g. passing a
>>>> check_writenotify flag or having it assume this check was already
>>>> performed). This is such a small check that it doesn't seem too egregious
>>>> to do this.
>>>>
>>>> Signed-off-by: Lorenzo Stoakes
>>>> Reviewed-by: John Hubbard
>>>> Reviewed-by: Mika Penttilä
>>>> Reviewed-by: Jan Kara
>>>> Reviewed-by: Jason Gunthorpe
>>>> ---
>>>>   include/linux/mm.h |  1 +
>>>>   mm/mmap.c          | 36 +++++++++++++++++++++++++++---------
>>>>   2 files changed, 28 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 27ce77080c79..7b1d4e7393ef 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -2422,6 +2422,7 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>  #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
>>>>                            MM_CP_UFFD_WP_RESOLVE)
>>>> +bool vma_needs_dirty_tracking(struct vm_area_struct *vma);
>>>>  int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot);
>>>>  static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma)
>>>>  {
>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>> index 5522130ae606..295c5f2e9bd9 100644
>>>> --- a/mm/mmap.c
>>>> +++ b/mm/mmap.c
>>>> @@ -1475,6 +1475,31 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
>>>>  }
>>>>  #endif /* __ARCH_WANT_SYS_OLD_MMAP */
>>>> +/* Do VMA operations imply write notify is required? */
>>>> +static bool vm_ops_needs_writenotify(const struct vm_operations_struct *vm_ops)
>>>> +{
>>>> +	return vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Does this VMA require the underlying folios to have their dirty state
>>>> + * tracked?
>>>> + */
>>>> +bool vma_needs_dirty_tracking(struct vm_area_struct *vma)
>>>> +{
>>>
>>> Sorry for not noticing this earlier, but ...
>>
>> pints_owed++

Having tired eyes and jumping back and forth between tasks really seems to
start getting expensive ;)

>>
>>>
>>> what about MAP_PRIVATE mappings? When we write, we populate an anon page,
>>> which will work as expected ... because we don't have to notify the fs?
>>>
>>> I think you really also want the "If it was private or non-writable, the
>>> write bit is already clear */" part as well and remove "false" in that case.
>>>
>>
>> Not sure a 'write bit is already clear' case is relevant to checking
>> whether a filesystem dirty tracks? That seems specific entirely to the page
>> table bits.
>>
>> That's why I didn't include it.
>>
>> A !VM_WRITE shouldn't be GUP-writable except for FOLL_FORCE, and that
>> surely could be problematic if VM_MAYWRITE later?
>>
>> Thinking about it though a !VM_SHARED can probably be safely assumed
>> to not be dirty-trackable, so we probably do need to add a check for
>> !VM_SHARED -> !vma_needs_dirty_tracking
>>
>
> On second thoughts, we explicitly check FOLL_FORCE && !is_cow_mapping() in
> check_vma_flags() so that case cannot occur.
>
> So actually yes we should probably include this on the basis of that and
> the fact that a FOLL_WRITE operation will CoW the MAP_PRIVATE mapping.
>

Yes, we only allow FOLL_FORCE writes to (exclusive) anonymous pages that
are mapped read-only. In all other cases, we trigger a (fake) write fault.

-- 
Thanks,

David / dhildenb
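A minimal userspace C model of the split under review may help when following the MAP_PRIVATE argument above. It is a sketch under stated assumptions, not kernel code: the model_* types and functions, the MODEL_VM_* flag values and the file_can_writeback field are hypothetical stand-ins. Only the body of model_vm_ops_needs_writenotify() mirrors the quoted vm_ops_needs_writenotify(); the shared-and-writable early return in model_vma_needs_dirty_tracking() models the check the thread converges on, not any merged version of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's VM_WRITE/VM_SHARED differ. */
#define MODEL_VM_WRITE  0x1UL
#define MODEL_VM_SHARED 0x2UL

/* Hypothetical stand-in for a vm_operations_struct mkwrite callback. */
typedef int (*model_mkwrite_fn)(void);

struct model_vm_ops {
	model_mkwrite_fn page_mkwrite;
	model_mkwrite_fn pfn_mkwrite;
};

struct model_vma {
	unsigned long vm_flags;
	const struct model_vm_ops *vm_ops;
	/* Hypothetical: does the backing filesystem perform dirty tracking? */
	bool file_can_writeback;
};

/* Same logic as the quoted vm_ops_needs_writenotify(). */
static bool model_vm_ops_needs_writenotify(const struct model_vm_ops *vm_ops)
{
	return vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite);
}

static bool model_vma_needs_dirty_tracking(const struct model_vma *vma)
{
	const unsigned long shared_writable = MODEL_VM_WRITE | MODEL_VM_SHARED;

	assert(vma != NULL);

	/*
	 * As argued in the thread: writes to a private mapping CoW into
	 * anonymous pages, and a non-writable shared mapping is not written
	 * through, so the file's folios are not dirtied via this VMA.
	 */
	if ((vma->vm_flags & shared_writable) != shared_writable)
		return false;

	/* The filesystem asked to be notified on first write. */
	if (model_vm_ops_needs_writenotify(vma->vm_ops))
		return true;

	/* Otherwise it depends on whether the filesystem tracks dirty state. */
	return vma->file_can_writeback;
}

/* Example ops for a hypothetical filesystem that implements page_mkwrite. */
static int model_page_mkwrite(void)
{
	return 0;
}

static const struct model_vm_ops model_fs_ops = {
	.page_mkwrite = model_page_mkwrite,
	.pfn_mkwrite = NULL,
};
```

With these definitions, a writable private VMA (MODEL_VM_WRITE only) whose ops provide page_mkwrite yields false, because a write there CoWs into an anonymous page rather than dirtying the file's folios; the same VMA with MODEL_VM_SHARED also set yields true.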