Date: Tue, 12 Feb 2019 17:34:33 +0100
From: Jan Kara
To: Jason Gunthorpe
Cc: Dan Williams, Matthew Wilcox, Ira Weiny, Jan Kara, Dave Chinner,
    Christopher Lameter, Doug Ledford, lsf-pc@lists.linux-foundation.org,
    linux-rdma, Linux MM, Linux Kernel Mailing List, John Hubbard,
    Jerome Glisse, Michal Hocko
Subject: Re: [LSF/MM TOPIC] Discuss least bad options for resolving
    longterm-GUP usage by RDMA
Message-ID: <20190212163433.GD19076@quack2.suse.cz>
References: <20190211102402.GF19029@quack2.suse.cz>
    <20190211180654.GB24692@ziepe.ca>
    <20190211181921.GA5526@iweiny-DESK2.sc.intel.com>
    <20190211182649.GD24692@ziepe.ca>
    <20190211184040.GF12668@bombadil.infradead.org>
    <20190211204945.GF24692@ziepe.ca>
    <20190211210956.GG24692@ziepe.ca>
In-Reply-To: <20190211210956.GG24692@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 11-02-19 14:09:56, Jason Gunthorpe wrote:
> On Mon, Feb 11, 2019 at 01:02:37PM -0800, Dan Williams wrote:
> > On Mon, Feb 11, 2019 at 12:49 PM Jason Gunthorpe wrote:
> > >
> > > On Mon, Feb 11, 2019 at 11:58:47AM -0800, Dan Williams wrote:
> > > > On Mon, Feb 11, 2019 at 10:40 AM Matthew Wilcox wrote:
> > > > >
> > > > > On Mon, Feb 11, 2019 at 11:26:49AM -0700, Jason Gunthorpe wrote:
> > > > > > On Mon, Feb 11, 2019 at 10:19:22AM -0800, Ira Weiny wrote:
> > > > > > > What if user space then writes to the end of the file with a regular write?
> > > > > > > Does that write end up at the point they truncated to or off the end of the
> > > > > > > mmaped area (old length)?
> > > > > >
> > > > > > IIRC it depends how the user does the write..
> > > > > >
> > > > > > pwrite() with a given offset will write to that offset, re-extending
> > > > > > the file if needed
> > > > > >
> > > > > > A file opened with O_APPEND and a write done with write() should
> > > > > > append to the new end
> > > > > >
> > > > > > A normal file with a normal write should write to the FD's current
> > > > > > seek pointer.
> > > > > >
> > > > > > I'm not sure what happens if you write via mmap/msync.
> > > > > >
> > > > > > RDMA is similar to pwrite() and mmap.
> > > > >
> > > > > A pertinent point that you didn't mention is that ftruncate() does not change
> > > > > the file offset. So there's no user-visible change in behaviour.
> > > >
> > > > ...but there is. The blocks you thought you freed, especially if the
> > > > system was under -ENOSPC pressure, won't actually be free after the
> > > > successful ftruncate().
> > >
> > > They won't be free after something dirties the existing mmap either.
> > >
> > > Blocks also won't be free if you unlink a file that is currently still
> > > open.
> > >
> > > This isn't really new behavior for a FS.
> >
> > An mmap write after a fault due to a hole punch is free to trigger
> > SIGBUS if the subsequent page allocation fails.
>
> Isn't that already racy? If the mmap user is fast enough can't it
> prevent the page from becoming freed in the first place today?

No, it cannot. We block page faulting for the file (via a lock), tear down
page tables, free pages and blocks. Then we resume faults and return SIGBUS
(if the page ends up being after the new end of file in case of truncate)
or do a new page fault and fresh block allocation (which can end with SIGBUS
if the filesystem cannot allocate a new block to back the page).

								Honza
--
Jan Kara
SUSE Labs, CR
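
The user-visible half of the sequence above can be checked from userspace.
What follows is only a minimal sketch, assuming Linux (or another POSIX
system) and Python 3. The helper name store_after_truncate is made up for
illustration. It exercises just the truncate case: it does not cover hole
punching, the block allocation failure path, or the kernel-side locking.

```python
import mmap
import os
import signal
import tempfile

PAGE = mmap.PAGESIZE


def store_after_truncate(new_size: int, offset: int) -> str:
    """Map two pages of a temp file, truncate the file to new_size, then
    store one byte at offset through the old mapping from a child process.

    Returns "ok" if the store succeeded, the name of the fatal signal if
    the child was killed, or "error" for any other child failure.
    (Hypothetical helper, not part of any kernel or library API.)
    """
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 2 * PAGE)
        # On Unix, mmap.mmap defaults to a shared, read/write mapping.
        with mmap.mmap(fd, 2 * PAGE) as m:
            m[0] = 1       # fault both pages in before truncating
            m[PAGE] = 1
            os.ftruncate(fd, new_size)
            pid = os.fork()
            if pid == 0:
                # Child: the store either works or the kernel kills us.
                try:
                    m[offset] = 2
                    os._exit(0)
                except BaseException:
                    os._exit(1)
            _, status = os.waitpid(pid, 0)
    finally:
        os.close(fd)
        os.unlink(path)
    if os.WIFSIGNALED(status):
        return signal.Signals(os.WTERMSIG(status)).name
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
        return "ok"
    return "error"


if __name__ == "__main__":
    # Store inside the new file size, then store past the new end of file.
    print(store_after_truncate(PAGE, 0))
    print(store_after_truncate(PAGE, PAGE))
```

The store runs in a forked child only so that the parent can observe the
fatal signal instead of dying from it. The mapping keeps its original length
after ftruncate(), so the second call touches a page that lies wholly past
the new end of file, which is the case where SIGBUS is returned.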