Date: Fri, 8 Nov 2019 14:12:38 +0100
From: Jan Kara
To: Ira Weiny
Cc: Dave Chinner, linux-kernel@vger.kernel.org, Alexander Viro,
    "Darrick J. Wong", Dan Williams, Christoph Hellwig,
    "Theodore Y. Ts'o", Jan Kara, linux-ext4@vger.kernel.org,
    linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 5/5] fs/xfs: Allow toggle of physical DAX flag
Message-ID: <20191108131238.GK20863@quack2.suse.cz>
References: <20191020155935.12297-1-ira.weiny@intel.com>
    <20191020155935.12297-6-ira.weiny@intel.com>
    <20191021004536.GD8015@dread.disaster.area>
    <20191021224931.GA25526@iweiny-DESK2.sc.intel.com>
In-Reply-To: <20191021224931.GA25526@iweiny-DESK2.sc.intel.com>

On Mon 21-10-19 15:49:31, Ira Weiny wrote:
> On Mon, Oct 21, 2019 at 11:45:36AM +1100, Dave Chinner wrote:
> > On Sun, Oct 20, 2019 at 08:59:35AM -0700, ira.weiny@intel.com wrote:
> > That, fundamentally, is the issue here - it's not setting/clearing
> > the DAX flag that is the issue, it's doing a swap of the
> > mapping->a_ops while there may be other code using that ops
> > structure.
> >
> > IOWs, if there is any code anywhere in the kernel that
> > calls an address space op without holding one of the three locks we
> > hold here (i_rwsem, MMAPLOCK, ILOCK), then it can race with the swap
> > of the address space operations.
> >
> > By limiting the address space swap to file sizes of zero, we rule
> > out the page fault path (mmap of a zero-length file segv's with an
> > access beyond EOF on the first read/write page fault, right?).
>
> Yes, I checked that and thought we were safe here...
>
> > However, what about other aops callers that might run unlocked and
> > do the wrong thing if the aops pointer is swapped between the check
> > that the aop method exists and the actual call, even if the file
> > size is zero?
> >
> > A quick look suggests that FIBMAP (ioctl_fibmap()) is susceptible
> > to such a race condition with the current definitions of the XFS DAX
> > aops. I'm guessing there will be others, but I haven't looked
> > further than this...
>
> I'll check for others and think about what to do about this. ext4 will
> have the same problem, I think. :-(

Just as a datapoint, ext4 is bold and sets inode->i_mapping->a_ops on
existing inodes when switching the journal data flag, and so far it has
not blown up. To deal with the issues Dave describes, we introduced a
percpu rw-semaphore guarding the switching of aops, and then, inside the
problematic functions, we redirect the callbacks to the right
implementation under this semaphore. Somewhat ugly, but it seems to work.

								Honza
-- 
Jan Kara
SUSE Labs, CR