Received: by 2002:a05:6a10:2785:0:0:0:0 with SMTP id ia5csp1392152pxb; Sat, 9 Jan 2021 18:56:30 -0800 (PST) X-Google-Smtp-Source: ABdhPJzxRB4BHROrzi7RHvUhyM/H0P+TiGcQ8YEBC++/JB3AL1eSrwh+GP3MzhluEUhZFn0FHdN+ X-Received: by 2002:a17:907:98ec:: with SMTP id ke12mr6605407ejc.554.1610247390457; Sat, 09 Jan 2021 18:56:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1610247390; cv=none; d=google.com; s=arc-20160816; b=B4Ra+pp/0tbMl9HgjHqyjB/6EtbQ1EQWw7+l/1WXXnyjFokYSopYcCQg8jeq4ifb+e u0d/zG+ol3nIPkjSqWYGyNFZaGUse2qCE5Cwo66fqTmTQVF4Wy4QnBmUcCMmr/RgpiHL 2VNPsz4By1seuLN3x3dsHMgCKPG32Ht7e5b7ImLUHIOGyQDzCT6oQ+DyuwtELZJaNEau JPqHx4hurqTLpK5HnKilnH1A1oSx5IzSceQOeI5MtaNrNX/4H7sKd2c4R8l9X1r1c2Wz 4bWWf4xvN1gCNKMuyFDBmo4kH9MC3Je6E+KdZ4ljSrX9MvgbAOqmyDV+jaqzu18GcGcq Np8w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=aIoIQn4ojBUCPSiaZ09DjloVv1kHMY3eIPXoRrSNrA8=; b=WEYlGCFuZjYzxZO/cl4p4GjsxXnCWnKA2YFlZHX6ZXTYq99bgeRy0S2vLXzp2QKa13 tcaOBuXutf/+e6jyuAiknE9npNAwB04GNQreS0AMaOvS+c9VYHnoPoGCy6Czhb2kgmqy LK2tr8F8D2PDukqKsFGSoLUFWlPysrypotm7y+fdHXzG52Pf8QKptK9V1rfTwQBRtGNe FCYoGuuCA0a7XoTCM1U7iaW3aLspPy6GF6W2CTyLAM5HfrLmk9pWqcGKUcHAJiyymtPM oWSKBfXqJWWYTudUMbd5oh925/NLaaDMzyCsrJpXMhY+RMG3eRSRO54j2jJgPVUL7fuz vKXw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=LLQHE5ax; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id n17si5122631eja.271.2021.01.09.18.55.39; Sat, 09 Jan 2021 18:56:30 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=LLQHE5ax; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726253AbhAJCw4 (ORCPT + 99 others); Sat, 9 Jan 2021 21:52:56 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:34717 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726062AbhAJCwz (ORCPT ); Sat, 9 Jan 2021 21:52:55 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1610247089; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=aIoIQn4ojBUCPSiaZ09DjloVv1kHMY3eIPXoRrSNrA8=; b=LLQHE5axSpO2ezhwdxZ+28WWyCM0oEK6YiOLtwP5mT2VNrS4er/PzvDU3QC5ZWPTR5ox+S IIQ8akdwG22wDWKgTMuR4SxfwMO3z7cerZjjrMM3hG2Eod5C2eMmux0Qa0B3t1EsD2p/o7 +ptUV9GRluYS7YBpZVg8zBEtkg81xBY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-507-bCKjKWr4Mgm76VpykG7jOQ-1; Sat, 09 Jan 2021 21:51:25 -0500 X-MC-Unique: bCKjKWr4Mgm76VpykG7jOQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id DDE7D801817; Sun, 10 Jan 2021 02:51:21 +0000 (UTC) Received: from mail (ovpn-112-222.rdu2.redhat.com [10.10.112.222]) by smtp.corp.redhat.com (Postfix) with ESMTPS id DEADF5D9D5; Sun, 10 Jan 2021 02:51:14 +0000 (UTC) Date: Sat, 9 Jan 2021 21:51:14 -0500 From: Andrea Arcangeli To: Linus Torvalds Cc: Andrew Morton , Linux-MM , Linux Kernel Mailing List , Yu Zhao , Andy Lutomirski , Peter Xu , Pavel Emelyanov , Mike Kravetz , Mike Rapoport , Minchan Kim , Will Deacon , Peter Zijlstra , Hugh Dickins , "Kirill A. Shutemov" , Matthew Wilcox , Oleg Nesterov , Jann Horn , Kees Cook , John Hubbard , Leon Romanovsky , Jason Gunthorpe , Jan Kara , Kirill Tkhai , Nadav Amit , Jens Axboe Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse Message-ID: References: <20210110004435.26382-1-aarcange@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/2.0.4 (2020-12-30) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Linus, On Sat, Jan 09, 2021 at 05:19:51PM -0800, Linus Torvalds wrote: > +#define is_cow_mapping(flags) (((flags) & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE) > + > +static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr, pte_t pte) > +{ > + struct page *page; > + > + if (!is_cow_mapping(vma->vm_flags)) > + return false; > + if (likely(!atomic_read(&vma->vm_mm->has_pinned))) > + return false; > + page = vm_normal_page(vma, addr, pte); > + if (!page) > + return false; > + if (page_mapcount(page) != 1) > + return false; > + return page_maybe_dma_pinned(page); > +} I just don't see the simplification coming from 09854ba94c6aad7886996bfbee2530b3d8a7f4f4. Instead of checking page_mapcount above as an optimization, to me it looks much simpler to check it in a single place, in do_wp_page, that in addition of optimizing away the superfluous copy, would optimize away the above complexity as well. And I won't comment if it's actually safe to skip random pages or not. All I know is for mprotect and uffd-wp, definitely the above approach wouldn't work. In addition I dislike has_pinned and FOLL_PIN. has_pinned 450 include/linux/mm_types.h * for instance during page table copying for fork(). has_pinned 1021 kernel/fork.c atomic64_set(&mm->pinned_vm, 0); has_pinned 1239 mm/gup.c atomic_set(&mm->has_pinned, 1); has_pinned 2579 mm/gup.c atomic_set(¤t->mm->has_pinned, 1); has_pinned 1099 mm/huge_memory.c atomic_read(&src_mm->has_pinned) && has_pinned 1213 mm/huge_memory.c atomic_read(&src_mm->has_pinned) && has_pinned 819 mm/memory.c if (likely(!atomic_read(&src_mm->has_pinned))) Are we spending 32bit in mm_struct atomic_t just to call atomic_set(1) on it? Why isn't it a MMF_HAS_PINNED that already can be set atomically under mmap_read_lock too? There's bit left free there, we didn't run out yet to justify wasting another 31 bits. I hope I'm overlooking something. The existence of FOLL_LONGTERM is good and makes a difference at times for writeback if it's on a MAP_SHARED, or it makes difference during GUP to do a page_migrate before taking the pin, but for the whole rest of the VM it's irrelevant if it's long or short term, so I'm also concerned from what Jason mentioned about long term pins being treated differently within the VM. That to me looks fundamentally as flawed as introducing inaccuracies in do_wp_page, that even ignoring the performance implications caused by the inaccuracy, simply prevent to do some useful things. I obviously agree all common workloads with no GUP pins and no clear_refs and no uffd, are way more important to be optimal, but I haven't seen a single benchmark showing the benefit of not taking the PG_lock before a page copy or any other runtime benefit coming from page_count in do_wp_page. To the contrary now I see additional branches in fork and in various other paths. The only thing again where I see page_count provides a tangible benefit, is the vmsplice attack from child to parent, but that has not been fully fixed and if page_count is added to fix it in all COW faults, it'll introduce extra inefficiency to the the very common important workloads, not only to the special GUP/clear_refs/uffd-wp workloads as your patch above shows. Thanks, Andrea