Date: Mon, 21 Sep 2020 10:35:05 +0200
From: Jan Kara
To: Jason Gunthorpe
Cc: John Hubbard, Peter Xu, Linus Torvalds, Leon Romanovsky, Linux-MM, Linux Kernel Mailing List, "Maya B.
 Gokhale", Yang Shi, Marty Mcfadden, Kirill Shutemov, Oleg Nesterov, Jann Horn, Jan Kara, Kirill Tkhai, Andrea Arcangeli, Christoph Hellwig, Andrew Morton
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Message-ID: <20200921083505.GA5862@quack2.suse.cz>

On Fri 18-09-20 21:01:53, Jason Gunthorpe wrote:
> On Fri, Sep 18, 2020 at 02:06:23PM -0700, John Hubbard wrote:
> > On 9/18/20 1:40 PM, Peter Xu wrote:
> > > On Fri, Sep 18, 2020 at 02:32:40PM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Sep 18, 2020 at 12:40:32PM -0400, Peter Xu wrote:
> > > >
> > > > > Firstly in the draft patch mm->has_pinned is introduced and it's written to 1
> > > > > as long as FOLL_GUP is called once. It's never reset after set.
> > > >
> > > > Worth thinking about also adding FOLL_LONGTERM here, at least as long
> > > > as it is not a counter. That further limits the impact.
> > >
> > > But theoretically we should also trigger COW here for pages even with PIN &&
> > > !LONGTERM, am I right? Assuming that FOLL_PIN is already a corner case.
> > >
> >
> > This note, plus Linus' comment about "I'm a normal process, I've never
> > done any special rdma page pinning", has me a little worried. Because
> > page_maybe_dma_pinned() is counting both short- and long-term pins,
> > actually. And that includes O_DIRECT callers.
> >
> > O_DIRECT pins are short-term, and RDMA systems are long-term (and should
> > be setting FOLL_LONGTERM).
> > But there's no way right now to discern
> > between them, once the initial pin_user_pages*() call is complete. All
> > we can do today is to count the number of FOLL_PIN calls, not the number
> > of FOLL_PIN | FOLL_LONGTERM calls.
>
> My thinking is to hit this issue you have to already be doing
> FOLL_LONGTERM, and if some driver hasn't been properly marked and
> regresses, the fix is to mark it.
>
> Remember, this use case requires the pin to extend after a system
> call, past another fork() system call, and still have data-coherence.
>
> IMHO that can only happen in the FOLL_LONGTERM case as it inherently
> means the lifetime of the pin is being controlled by userspace, not by
> the kernel. Otherwise userspace could not cause new DMA touches after
> fork.

I agree that the new aggressive COW behavior is probably causing issues only
for FOLL_LONGTERM users. That being said, it would be nice if even ordinary
threaded FOLL_PIN users did not have to be that careful about fork(2) and
possible data loss due to COW. We certainly had reports of O_DIRECT IO
losing data due to fork(2) and COW, exactly because its behavior is very
subtle. But as I wrote above, this is not urgent since that problematic
behavior has existed since the beginning of O_DIRECT IO in Linux.

								Honza
-- 
Jan Kara
SUSE Labs, CR
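
To illustrate the point quoted above, that short- and long-term pins cannot be told apart once the pin has been taken, here is a minimal user-space model. It is only a sketch and not kernel code: ToyPage, pin(), unpin() and maybe_dma_pinned() are hypothetical names, and PIN_BIAS is an assumption that loosely mirrors the idea of adding a large bias to the page reference count for each FOLL_PIN pin.

```python
# Toy user-space model, not kernel code. All names here are hypothetical.
from dataclasses import dataclass

# Assumption: each pin adds a large bias to the page reference count,
# loosely mirroring how FOLL_PIN pins are accounted.
PIN_BIAS = 1024


@dataclass
class ToyPage:
    refcount: int = 1  # one ordinary reference, e.g. from the mapping


def pin(page: ToyPage, longterm: bool) -> None:
    """Take a pin. The longterm flag is deliberately not recorded anywhere."""
    page.refcount += PIN_BIAS


def unpin(page: ToyPage) -> None:
    """Drop a pin taken with pin()."""
    page.refcount -= PIN_BIAS


def maybe_dma_pinned(page: ToyPage) -> bool:
    """True if the count is high enough that some pin probably exists."""
    return page.refcount >= PIN_BIAS


o_direct_page = ToyPage()
rdma_page = ToyPage()
pin(o_direct_page, longterm=False)  # short-term, O_DIRECT-style pin
pin(rdma_page, longterm=True)       # long-term, RDMA-style pin

# Both pages look identical to the check: the pin type is lost.
print(maybe_dma_pinned(o_direct_page), maybe_dma_pinned(rdma_page))
```

In this model the only way to treat long-term pins differently would be to record the flag separately at pin time, which is roughly what the suggestion of tying mm->has_pinned to FOLL_LONGTERM amounts to.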