Date: Tue, 25 Apr 2017 16:59:36 -0600
From: Ross Zwisler
To: Jan Kara
Cc: Ross Zwisler, Andrew Morton, linux-kernel@vger.kernel.org,
    Alexander Viro, Alexey Kuznetsov, Andrey Ryabinin, Anna Schumaker,
    Christoph Hellwig, Dan Williams, "Darrick J. Wong",
    Eric Van Hensbergen, Jens Axboe, Johannes Weiner,
    Konrad Rzeszutek Wilk, Latchesar Ionkov, linux-cifs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-nfs@vger.kernel.org, linux-nvdimm@ml01.01.org, Matthew Wilcox,
    Ron Minnich, samba-technical@lists.samba.org, Steve French,
    Trond Myklebust, v9fs-developer@lists.sourceforge.net
Subject: Re: [PATCH 2/2] dax: fix data corruption due to stale mmap reads
Message-ID: <20170425225936.GA29655@linux.intel.com>
In-Reply-To: <20170425111043.GH2793@quack2.suse.cz>
References: <20170420191446.GA21694@linux.intel.com>
    <20170421034437.4359-1-ross.zwisler@linux.intel.com>
    <20170421034437.4359-2-ross.zwisler@linux.intel.com>
    <20170425111043.GH2793@quack2.suse.cz>

On Tue, Apr 25, 2017 at 01:10:43PM +0200, Jan Kara wrote:
<>
> Hum, but now thinking more about it I have hard time figuring out why write
> vs fault cannot actually still race:
>
> CPU1 - write(2)                         CPU2 - read fault
>
>                                         dax_iomap_pte_fault()
>                                           ->iomap_begin() - sees hole
> dax_iomap_rw()
>   iomap_apply()
>     ->iomap_begin - allocates blocks
>     dax_iomap_actor()
>       invalidate_inode_pages2_range()
>         - there's nothing to invalidate
>                                           grab_mapping_entry()
>                                           - we add zero page in the radix
>                                             tree & map it to page tables
>
> Similarly read vs write fault may end up racing in a wrong way and try to
> replace already existing exceptional entry with a hole page?

Yep, this race seems real to me, too.  This seems very much like the issues
that exist when a thread is doing direct I/O.  One thread is doing I/O to an
intermediate buffer (page cache for direct I/O case, zero page for us), and
the other is going around it directly to media, and they can get out of sync.

IIRC the direct I/O code looked something like:

1/ invalidate existing mappings
2/ do direct I/O to media
3/ invalidate mappings again, just in case.
   Should be cheap if there weren't any conflicting faults.  This makes sure
   any new allocations we made are faulted in.

I guess one option would be to replicate that logic in the DAX I/O path, or we
could try and enhance our locking so page faults can't race with I/O since
both can allocate blocks.

I'm not sure, but will think on it.
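
For illustration only, here is a minimal userspace sketch of the three-step pattern described above, replaying the interleaving from the quoted diagram in a single thread. Every `toy_*` name and the `struct toy_file` layout are hypothetical stand-ins; this is not kernel code and makes no claim about how the direct I/O or DAX paths are actually implemented.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBLOCKS 4

/* Toy stand-in for a DAX file; none of this is a kernel structure. */
struct toy_file {
	char media[TOY_NBLOCKS];        /* data as stored on media */
	bool allocated[TOY_NBLOCKS];    /* false means the block is a hole */
	bool zero_mapped[TOY_NBLOCKS];  /* zero page installed by a read fault */
};

/* Drop any zero-page mapping for blk (hypothetical helper). */
static void toy_invalidate(struct toy_file *f, int blk)
{
	assert(blk >= 0 && blk < TOY_NBLOCKS);
	f->zero_mapped[blk] = false;
}

/* First half of a read fault: look up the block and report a hole. */
static bool toy_fault_sees_hole(const struct toy_file *f, int blk)
{
	assert(blk >= 0 && blk < TOY_NBLOCKS);
	return !f->allocated[blk];
}

/* Second half of a read fault: map the zero page if a hole was seen. */
static void toy_fault_install(struct toy_file *f, int blk, bool saw_hole)
{
	assert(blk >= 0 && blk < TOY_NBLOCKS);
	if (saw_hole)
		f->zero_mapped[blk] = true;
}

/* What a load through the mapping returns. */
static char toy_mmap_read(const struct toy_file *f, int blk)
{
	assert(blk >= 0 && blk < TOY_NBLOCKS);
	return f->zero_mapped[blk] ? 0 : f->media[blk];
}

/*
 * Replay the write-vs-read-fault interleaving, optionally applying
 * step 3/. Returns the byte an mmap reader sees afterwards.
 */
static char toy_replay_race(char val, bool invalidate_after)
{
	struct toy_file f = {0};
	const int blk = 0;
	bool saw_hole;

	toy_invalidate(&f, blk);                  /* 1/ invalidate existing mappings */
	saw_hole = toy_fault_sees_hole(&f, blk);  /* CPU2: lookup sees a hole */
	f.allocated[blk] = true;                  /* CPU1: block gets allocated */
	f.media[blk] = val;                       /* 2/ I/O goes straight to media */
	toy_fault_install(&f, blk, saw_hole);     /* CPU2: zero page gets mapped */
	if (invalidate_after)
		toy_invalidate(&f, blk);          /* 3/ invalidate again */

	return toy_mmap_read(&f, blk);
}
```

In this model, `toy_replay_race(val, false)` leaves the stale zero page shadowing the newly written block, while `toy_replay_race(val, true)` lets the reader see the written byte. The sketch pins down one interleaving only: a fault that installs its zero page after step 3/ would still leave a stale mapping, which is presumably why stronger locking between faults and I/O is the other option on the table.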