Date: Fri, 18 Sep 2020 21:28:50 -0300
From: Jason Gunthorpe
To: Linus Torvalds
Cc: Peter Xu, John Hubbard, Leon Romanovsky, Linux-MM,
    Linux Kernel Mailing List, "Maya B. Gokhale", Yang Shi,
    Marty Mcfadden, Kirill Shutemov, Oleg Nesterov, Jann Horn, Jan Kara,
    Kirill Tkhai, Andrea Arcangeli, Christoph Hellwig, Andrew Morton
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Message-ID: <20200919002850.GA8409@ziepe.ca>

On Fri, Sep 18, 2020 at 01:59:41PM -0700, Linus Torvalds wrote:

> Honestly, if we had a completely *reliable* sign of "this page is
> pinned", then I think the much nicer option would be to just say
> "pinned pages will not be copied at all". Kind of an implicit
> VM_DONTCOPY.

It would be simpler to implement, but it makes the programming model
really sketchy. For instance O_DIRECT is using FOLL_PIN, so imagine
this program:

CPU0                                     CPU1

a = malloc(1024);
b = malloc(1024);
read(fd, a, 1024); // FD is O_DIRECT
  ...                                    fork()
                                         *b = ...
  read completes

Here a and b got lucky and both come from the same page due to the
allocator.

In this case the fork() child on CPU1 would be very surprised that
'b' was not mapped into the child.

Similarly, CPU0 would have silent data corruption if the read didn't
deposit data into 'a' - which is a bug we have today. In this race the
COW break of *b might steal the physical page for the child, and *a
won't see the data.

For this reason John is right: fork needs to eventually do this for
O_DIRECT as well.

The copy on fork nicely fixes all of this weird oddball stuff.

Jason
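
The scenario above depends on 'a' and 'b' landing on the same page, because pinning and COW both work on whole pages. Here is a rough userspace sketch of that check. The helpers page_of(), same_page() and demo_shared_page() are hypothetical names made up for this illustration, not kernel or libc APIs. The demo carves both buffers out of one page-aligned allocation to stand in for two small malloc() results that happen to share a page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: start address of the page containing p. */
static uintptr_t page_of(const void *p)
{
	long psz = sysconf(_SC_PAGESIZE);

	assert(psz > 0);
	/* Page sizes are powers of two, so masking rounds down. */
	return (uintptr_t)p & ~((uintptr_t)psz - 1);
}

/* Hypothetical helper: true if a and b live on the same page. */
static bool same_page(const void *a, const void *b)
{
	return page_of(a) == page_of(b);
}

/*
 * Hypothetical demo: place a and b 1024 bytes apart inside a single
 * page-aligned allocation, standing in for two malloc(1024) results
 * that the allocator happened to put on one page.
 */
static bool demo_shared_page(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	void *page = NULL;
	bool shared;

	if (psz <= 0 || posix_memalign(&page, (size_t)psz, (size_t)psz) != 0)
		return false;

	char *a = page;
	char *b = a + 1024;

	shared = same_page(a, b);
	free(page);
	return shared;
}
```

When same_page(a, b) is true, a pin taken for the O_DIRECT read into 'a' also covers the page holding 'b', so whatever fork() does with that page affects both buffers at once.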