Received: by 2002:a25:d7c1:0:0:0:0:0 with SMTP id o184csp3608660ybg; Sun, 20 Oct 2019 17:46:10 -0700 (PDT) X-Google-Smtp-Source: APXvYqy6Nx7P7Vr1oaF7EgHZAVp75KNf/Vhq/KPfvU0RYNADWaDRY6QMF7huSUUMU48GvfhsgXEu X-Received: by 2002:a17:906:5f8d:: with SMTP id a13mr8960410eju.11.1571618769941; Sun, 20 Oct 2019 17:46:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571618769; cv=none; d=google.com; s=arc-20160816; b=Wn1/ll2BLTPNI56Mb8FZYqj26YVB9286x5NVbFrvbvytYn/0A4pgQWH9rKh0IVGUJC ZY4h2eNTnERThuEszrziPGqqCiAcnZDrp4psYQeVO8JhU97lDm+38NMLNpK7kg/2Or59 +julCxAt6v42OyCNmUZZa4MJ3lJfsfAaY6QqDnyXEvr2Kne/XwYk5re9S9oLc772gZu+ Lsf2bVzkRKq0YAiJZ1dFisL/hHXhrCshq98QCStzyYQR2xu7gsdME9Hmi2qN7DuR7bJ3 Qymx6OQOHRdCrMGEJ5KPNDfBNyAyU0+ZkQmfQM45m6oMLGugYUhG+T0TVk1lGbd0xMsD MUXA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=qJdTQSNpokA/DuY6pIbjEJ4pSbGEapqCtnpg1GWuOh0=; b=RArtagrjT3Md1RsyETNSEN+2J4HMV8lqpijtR92BbCerpWSq2U1hayMEh6e6H06JId LSM8GEryMlwRyxuE0rOfO/YRJCI2uoendViRatD0T6h6msapZaacQafNZxcm4DkgZVqY 50D0+zwdOxKnOVWgKw/8g0epvHX9WFflx+mW7wClAaLqI+hZJ5c6GsxSLZUiKYWKFyIy tzXQxA1o0j9VRRtQkCR4W+DoUZJeIrSZEsfGjjBthNbEB1n8FQJyIZX4GZGJbULveEH4 gAVx8NXbQjQdnyp4J/+2y1zD36ZswAmBY1Wm38GBaJlbdLCvEXPfhZNo7eB6YyspUr/C KreA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g15si8299109edm.349.2019.10.20.17.45.45; Sun, 20 Oct 2019 17:46:09 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726767AbfJUApm (ORCPT + 99 others); Sun, 20 Oct 2019 20:45:42 -0400 Received: from mail105.syd.optusnet.com.au ([211.29.132.249]:35038 "EHLO mail105.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726597AbfJUApm (ORCPT ); Sun, 20 Oct 2019 20:45:42 -0400 Received: from dread.disaster.area (pa49-180-40-48.pa.nsw.optusnet.com.au [49.180.40.48]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id EA6A936401C; Mon, 21 Oct 2019 11:45:37 +1100 (AEDT) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1iMLpI-0002rU-GA; Mon, 21 Oct 2019 11:45:36 +1100 Date: Mon, 21 Oct 2019 11:45:36 +1100 From: Dave Chinner To: ira.weiny@intel.com Cc: linux-kernel@vger.kernel.org, Alexander Viro , "Darrick J. Wong" , Dan Williams , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 5/5] fs/xfs: Allow toggle of physical DAX flag Message-ID: <20191021004536.GD8015@dread.disaster.area> References: <20191020155935.12297-1-ira.weiny@intel.com> <20191020155935.12297-6-ira.weiny@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191020155935.12297-6-ira.weiny@intel.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=G6BsK5s5 c=1 sm=1 tr=0 a=y881pOMu+B+mZdf5UrsJdA==:117 a=y881pOMu+B+mZdf5UrsJdA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=kj9zAlcOel0A:10 a=XobE76Q3jBoA:10 a=QyXUC8HyAAAA:8 a=7-415B0cAAAA:8 a=hWqqfTRvkZgBDQNmCT4A:9 a=cV2GjQ2AOK1rOS_g:21 a=QTE3axTs5XAi6-A_:21 a=CjuIK1q_8ugA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Oct 20, 2019 at 08:59:35AM -0700, ira.weiny@intel.com wrote: > From: Ira Weiny > > Switching between DAX and non-DAX on a file is racy with respect to data > operations. However, if no data is involved the flag is safe to switch. > > Allow toggling the physical flag if a file is empty. The file length > check is not racy with respect to other operations as it is performed > under XFS_MMAPLOCK_EXCL and XFS_IOLOCK_EXCL locks. > > Signed-off-by: Ira Weiny > --- > fs/xfs/xfs_ioctl.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c > index d3a7340d3209..3839684c249b 100644 > --- a/fs/xfs/xfs_ioctl.c > +++ b/fs/xfs/xfs_ioctl.c > @@ -33,6 +33,7 @@ > #include "xfs_sb.h" > #include "xfs_ag.h" > #include "xfs_health.h" > +#include "libxfs/xfs_dir2.h" > > #include > #include > @@ -1232,12 +1233,10 @@ xfs_diflags_to_linux( > inode->i_flags |= S_NOATIME; > else > inode->i_flags &= ~S_NOATIME; > -#if 0 /* disabled until the flag switching races are sorted out */ > if (xflags & FS_XFLAG_DAX) > inode->i_flags |= S_DAX; > else > inode->i_flags &= ~S_DAX; > -#endif This code has bit-rotted. See xfs_setup_iops(), where we now have a different inode->i_mapping->a_ops for DAX inodes. That, fundamentally, is the issue here - it's not setting/clearing the DAX flag that is the issue, it's doing a swap of the mapping->a_ops while there may be other code using that ops structure. IOWs, if there is any code anywhere in the kernel that calls an address space op without holding one of the three locks we hold here (i_rwsem, MMAPLOCK, ILOCK) then it can race with the swap of the address space operations. By limiting the address space swap to file sizes of zero, we rule out the page fault path (mmap of a zero length file segv's with an access beyond EOF on the first read/write page fault, right?). However, other aops callers that might run unlocked and do the wrong thing if the aops pointer is swapped between check of the aop method existing and actually calling it even if the file size is zero? A quick look shows that FIBMAP (ioctl_fibmap())) looks susceptible to such a race condition with the current definitions of the XFS DAX aops. I'm guessing there will be others, but I haven't looked further than this... > /* lock, flush and invalidate mapping in preparation for flag change */ > xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); > + > + if (i_size_read(inode) != 0) { > + error = -EOPNOTSUPP; > + goto out_unlock; > + } Wrong error. Should be the same as whatever is returned when we try to change the extent size hint and can't because the file is non-zero in length (-EINVAL, I think). Also needs a comment explainging why this check exists, and probably better written as i_size_read() > 0 .... Cheers, Dave. -- Dave Chinner david@fromorbit.com