Date: Thu, 9 Apr 2020 08:29:44 -0700
From: Ira Weiny
To: "Darrick J. Wong"
Cc: Dave Chinner, linux-kernel@vger.kernel.org, Dan Williams, Christoph Hellwig, "Theodore Y.
    Ts'o", Jan Kara, Jeff Moyer, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
Message-ID: <20200409152944.GA801705@iweiny-DESK2.sc.intel.com>
References: <20200407182958.568475-1-ira.weiny@intel.com>
    <20200407182958.568475-7-ira.weiny@intel.com>
    <20200408020827.GI24067@dread.disaster.area>
    <20200408170923.GC569068@iweiny-DESK2.sc.intel.com>
    <20200408210236.GK24067@dread.disaster.area>
    <20200408220734.GA664132@iweiny-DESK2.sc.intel.com>
    <20200408232106.GO24067@dread.disaster.area>
    <20200409001206.GD664132@iweiny-DESK2.sc.intel.com>
    <20200409003021.GJ6742@magnolia>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20200409003021.GJ6742@magnolia>
User-Agent: Mutt/1.11.1 (2018-12-01)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

On Wed, Apr 08, 2020 at 05:30:21PM -0700, Darrick J. Wong wrote:

[snip]

>
> But you're right, this thing keeps swirling around and around and around
> because we can't ever get to agreement on this. Maybe I'll just become
> XFS BOFH MAINTAINER and make a decision like this:
>
>  1 Applications must call statx to discover the current S_DAX state.
>
>  2 There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
>    the parent directory FS_XFLAG_DAX inode flag. This advisory flag can be
>    changed after file creation, but it does not immediately affect the S_DAX
>    state.
>
>    If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
>    inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
>    Unless overridden...
>
>  3 There exists a dax= mount option.
>
>    "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
>    "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
>    "-o dax" by itself means "dax=always"
>    "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default

per-Dave '-o dax=inode'

>
>  4 There exists an advisory directory inode flag FS_XFLAG_DAX that can be
>    changed at any time. The flag state is copied into any files or
>    subdirectories when they are created within that directory.

Good.

>    If programs
>    require file access runs in S_DAX mode, they must create those files
>    inside a directory with FS_XFLAG_DAX set, or mount the fs with an
>    appropriate dax mount option.

Why do we need this to be true? If the FS_XFLAG_DAX flag can be cleared why not
set it and allow the S_DAX change to occur later just like clearing it? The
logic is exactly the same.

>
>  5 Programs that require a specific file access mode (DAX or not DAX) must

s/must/can/

>    do one of the following:
>
>    (a) create files in directories with the FS_XFLAG_DAX flag set as needed;

Again if we allow clearing the flag why not setting? So this is 1 option they
'can' do.

>
>    (b) have the administrator set an override via mount option;
>
>    (c) if they need to change a file's FS_XFLAG_DAX flag so that it does not
>        match the S_DAX state (as reported by statx), they must cause the
>        kernel to evict the inode from memory. This can be done by:
>
>        i> closing the file;
>        ii> re-opening the file and using statx to see if the fs has
>            changed the S_DAX flag;

i and ii need to be 1 step the user must follow.

>        iii> if not, either unmount and remount the filesystem, or
>            closing the file and using drop_caches.
>
>  6 I no longer think it's too wild to require that users who want to
>    squeeze every last bit of performance out of the particular rough and
>    tumble bits of their storage also be exposed to the difficulties of
>    what happens when the operating system can't totally virtualize those
>    hardware capabilities.
>    Your high performance sports car is not a
>    Toyota minivan, as it were.

I'm good with this statement. But I think we need to clean up the verbiage for
the documentation... ;-)

Thanks for the summary. I like these to get everyone on the same page. :-D

Ira

>
> I think (like Dave said) that if you set XFS_IDONTCACHE on the inode
> when you change the DAX flag, the VFS will kill the inode the instant
> the last user close()s the file. Then 5.c.ii will actually work.
>
> --D
>
> > >
> > > > Furthermore, if we did want an interface like that why not allow
> > > > the on-disk flag to be set as well as cleared?
> > > >
> > > Well, why not - it's why I implemented the flag in the first place!
> > > The only problem we have here is how to safely change the in-memory
> > > DAX state, and that largely has nothing to do with setting/clearing
> > > the on-disk flag....
> >
> > With the above change to xfs_diflags_to_iflags() I think we are ok here.
> >
> > Ira
> >
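
For reference, the S_DAX rules summarized in items 2 and 3 of the quoted
proposal can be modeled as a small decision function. This is only a userspace
sketch to check the wording against, not kernel code: the enum, the function
name and its parameters are hypothetical, the default mode uses the
'-o dax=inode' spelling, and treating a non-pmem device as never DAX under
"dax=always" is one reading of "at least on pmem".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposed dax= mount option values. */
enum dax_mount_mode {
	DAX_MOUNT_INODE,	/* "-o dax=inode": follow FS_XFLAG_DAX (default) */
	DAX_MOUNT_NEVER,	/* "-o dax=never": never set S_DAX */
	DAX_MOUNT_ALWAYS,	/* "-o dax=always", or "-o dax" by itself */
};

/*
 * Decide whether S_DAX would be enabled when an inode is loaded.
 *
 * xflag_dax: state of the advisory on-disk FS_XFLAG_DAX flag
 * on_pmem:   whether the filesystem sits on a DAX-capable device
 *
 * Assumption: without pmem, S_DAX is never enabled, whatever the mode.
 */
bool dax_enabled_at_load(enum dax_mount_mode mode, bool xflag_dax,
			 bool on_pmem)
{
	if (!on_pmem)
		return false;

	switch (mode) {
	case DAX_MOUNT_NEVER:
		return false;		/* ignore FS_XFLAG_DAX */
	case DAX_MOUNT_ALWAYS:
		return true;		/* ignore FS_XFLAG_DAX */
	case DAX_MOUNT_INODE:
		return xflag_dax;	/* follow FS_XFLAG_DAX */
	}

	/* Not reached for valid enum values. */
	assert(0 && "unknown dax mount mode");
	return false;
}
```

In this model a file with FS_XFLAG_DAX set on a pmem filesystem mounted with
"-o dax=never" loads without S_DAX, which matches "ignore FS_XFLAG_DAX". The
result only describes the state at inode load time; changing the flag on a
cached inode still needs the eviction steps in 5(c).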