From: Dan Williams
Subject: Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
Date: Mon, 2 May 2016 09:49:59 -0700
References: <1461878218-3844-1-git-send-email-vishal.l.verma@intel.com> <1461878218-3844-6-git-send-email-vishal.l.verma@intel.com> <5727753F.6090104@plexistor.com> <57277EDA.9000803@plexistor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: Vishal Verma, "linux-nvdimm@lists.01.org", linux-block@vger.kernel.org, Jan Kara, Matthew Wilcox, Dave Chinner, "linux-kernel@vger.kernel.org", XFS Developers, Jens Axboe, Linux MM, Al Viro, Christoph Hellwig, linux-fsdevel, Andrew Morton, linux-ext4
To: Boaz Harrosh
In-Reply-To: <57277EDA.9000803@plexistor.com>
Sender: owner-linux-mm@kvack.org
List-Id: linux-ext4.vger.kernel.org

On Mon, May 2, 2016 at 9:22 AM, Boaz Harrosh wrote:
> On 05/02/2016 07:01 PM, Dan Williams wrote:
>> On Mon, May 2, 2016 at 8:41 AM, Boaz Harrosh wrote:
>>> On 04/29/2016 12:16 AM, Vishal Verma wrote:
>>>> All IO in a dax filesystem used to go through dax_do_io, which cannot
>>>> handle media errors, and thus cannot provide a recovery path that can
>>>> send a write through the driver to clear errors.
>>>>
>>>> Add a new iocb flag for DAX, and set it only for DAX mounts. In the IO
>>>> path for DAX filesystems, use the same direct_IO path for both DAX and
>>>> direct_io iocbs, but use the flags to identify when we are in O_DIRECT
>>>> mode vs non O_DIRECT with DAX, and for O_DIRECT, use the conventional
>>>> direct_IO path instead of DAX.
>>>>
>>>
>>> Really? What is your thinking here?
>>>
>>> What about all the current users of O_DIRECT? You have just made them
>>> 4 times slower and "less concurrent*" than "buffered io" users, since
>>> the direct_IO path will queue an IO request and all.
>>> (And if it is not so slow, then why do we need dax_do_io at all? [Rhetorical])
>>>
>>> I hate it that you overload the semantics of a known and expected
>>> O_DIRECT flag for special pmem quirks. This is an incompatible
>>> and unrelated overload of the semantics of O_DIRECT.
>>
>> I think it is the opposite situation; it is undoing the premature
>> overloading of O_DIRECT that went in without performance numbers.
>
> We have tons of measurements. It is not hard to imagine the results, though,
> especially the 1000-threads case.
>
>> This implementation clarifies that dax_do_io() handles the lack of a
>> page cache for buffered I/O and O_DIRECT behaves as it nominally would
>> by sending an I/O to the driver.
>
>> It has the benefit of matching the
>> error semantics of a typical block device where a buffered write could
>> hit an error filling the page cache, but an O_DIRECT write potentially
>> triggers the drive to remap the block.
>>
>
> I fail to see how, for writes, the device error semantics regarding remapping of
> blocks are any different between buffered and direct IO. As far as the block
> device is concerned it is the exact same code path. The big difference is higher
> up, in the VFS.
>
> And ... so you are willing to sacrifice the 99% hot path for the sake of the
> 1% error path? And piggybacking on poor O_DIRECT.
>
> Again, there are tons of O_DIRECT apps out there; why are you forcing them to
> change if they want true pmem performance?

This isn't forcing them to change. This is the path of least surprise, as the
error semantics are identical to those of a typical block device.
Yes, an application can go faster by switching to the "buffered" / dax_do_io()
path, and it can go even faster by switching to mmap() I/O and using DAX
directly. If we can later optimize the O_DIRECT path to bring its performance
more in line with dax_do_io(), great, but the implementation should be correct
first and optimized later.
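[Editor's note: a minimal sketch of the dispatch described in the quoted patch
summary above. The IOCB_DAX flag, the example_* names, and the get_block
callback are placeholders, and the dax_do_io()/blockdev_direct_IO() calls
approximate the v4.6-era interfaces; this is an illustration, not the code
from the actual patch.]

    #include <linux/fs.h>
    #include <linux/dax.h>
    #include <linux/uio.h>

    /* Hypothetical iocb flag set by the filesystem only on DAX mounts. */
    #define IOCB_DAX	(1 << 7)

    /* Placeholder for the filesystem's get_block_t callback. */
    static int example_get_block(struct inode *inode, sector_t iblock,
                                 struct buffer_head *bh_result, int create);

    static ssize_t example_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
                                     loff_t offset)
    {
            struct inode *inode = file_inode(iocb->ki_filp);

            /*
             * On a DAX mount every read/write reaches ->direct_IO, since
             * there is no page cache.  Non-O_DIRECT ("buffered") I/O on DAX
             * still goes through dax_do_io(), while plain O_DIRECT falls
             * through to the conventional block path, which can send the
             * write to the driver and clear media errors.
             */
            if ((iocb->ki_flags & IOCB_DAX) && !(iocb->ki_flags & IOCB_DIRECT))
                    return dax_do_io(iocb, inode, iter, offset,
                                     example_get_block, NULL, DIO_LOCKING);

            return blockdev_direct_IO(iocb, inode, iter, offset,
                                      example_get_block);
    }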