Date: Wed, 19 Dec 2018 21:28:25 +1100
From: Dave Chinner
To: Jason Gunthorpe
Cc: Jan Kara, Jerome Glisse, John Hubbard, Matthew Wilcox, Dan Williams,
    John Hubbard, Andrew Morton, Linux MM, tom@talpey.com, Al Viro,
    benve@cisco.com, Christoph Hellwig, Christopher Lameter,
    "Dalessandro, Dennis", Doug Ledford, Michal Hocko,
    mike.marciniszyn@intel.com, rcampbell@nvidia.com,
    Linux Kernel Mailing List, linux-fsdevel
Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
Message-ID: <20181219102825.GN6311@dastard>
References: <3c4d46c0-aced-f96f-1bf3-725d02f11b60@nvidia.com>
    <20181208022445.GA7024@redhat.com>
    <20181210102846.GC29289@quack2.suse.cz>
    <20181212150319.GA3432@redhat.com>
    <20181212214641.GB29416@dastard>
    <20181214154321.GF8896@quack2.suse.cz>
    <20181216215819.GC10644@dastard>
    <20181218103306.GC18032@quack2.suse.cz>
    <20181218234254.GC31274@dastard>
    <20181219030329.GI21992@ziepe.ca>
In-Reply-To: <20181219030329.GI21992@ziepe.ca>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 18, 2018 at 08:03:29PM -0700, Jason Gunthorpe wrote:
> On Wed, Dec 19, 2018 at 10:42:54AM +1100, Dave Chinner wrote:
>
> > Essentially, what we are talking about is how to handle broken
> > hardware. I say we should just burn it with napalm and thermite
> > (i.e. taint the kernel with "unsupportable hardware") and force
> > wait_for_stable_page() to trigger when there are GUP mappings if
> > the underlying storage doesn't already require it.
>
> If you want to ban O_DIRECT/etc from writing to file backed pages,
> then just do it.

O_DIRECT IO *isn't the problem*. O_DIRECT IO uses a short term pin
that the existing prefaulting during GUP works just fine for. The
problem we have is the long term pins where pages can be cleaned
while the pages are pinned. i.e. the use case we currently have to
disable for DAX because *we can't make it work sanely* without
either revokable file leases and/or hardware that is able to trigger
page faults when it needs write access to a clean page.

> Otherwise I'm not sure demanding some unrealistic HW design is
> reasonable. ie nvme drives are not likely to add page faulting to
> their IO path any time soon.

Direct IO on nvme drives is not the problem. It's RDMA pinning pages
for hours or days and expecting everyone else to jump through hoops
to support their broken page access model.

> A SW architecture that relies on page faulting is just not going to
> support real world block IO devices.

The existing software architecture for file backed pages has been
based around page faulting for write notifications since ~2005. That
horse bolted many, many years ago.

> GPUs and one RDMA are about the only things that can do this today,
> and they are basically irrelevant to O_DIRECT.

It's RDMA that we need these changes for, not O_DIRECT.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com