Date: Wed, 12 Aug 2015 07:48:22 +1000
From: Dave Chinner
To: "Wilcox, Matthew R"
Cc: Boaz Harrosh, Jan Kara, "Kirill A. Shutemov", Andrew Morton,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, Davidlohr Bueso
Subject: Re: [PATCH, RFC 2/2] dax: use range_lock instead of i_mmap_lock
Message-ID: <20150811214822.GA20596@dastard>
References: <1439219664-88088-1-git-send-email-kirill.shutemov@linux.intel.com>
 <1439219664-88088-3-git-send-email-kirill.shutemov@linux.intel.com>
 <20150811081909.GD2650@quack.suse.cz>
 <20150811093708.GB906@dastard>
 <20150811135004.GC2659@quack.suse.cz>
 <55CA0728.7060001@plexistor.com>
 <100D68C7BA14664A8938383216E40DE040914C3E@FMSMSX114.amr.corp.intel.com>
In-Reply-To: <100D68C7BA14664A8938383216E40DE040914C3E@FMSMSX114.amr.corp.intel.com>

On Tue, Aug 11, 2015 at 04:51:22PM +0000, Wilcox, Matthew R wrote:
> The race that you're not seeing is page fault vs page fault. Two
> threads each attempt to store a byte to different locations on the
> same page. With a read-mutex to exclude truncates, each thread
> calls ->get_block. One of the threads gets back a buffer marked
> as BH_New and calls memset() to clear the page. The other thread
> gets back a buffer which isn't marked as BH_New and simply inserts
> the mapping, returning to userspace, which stores the byte ...
> just in time for the other thread's memset() to write a zero over
> the top of it.

So, this is not a truncate race that the XFS MMAPLOCK solves. However,
that doesn't mean that the DAX code needs to add locking to solve it.

The race here is caused by block initialisation being unserialised
after a ->get_block call allocates the block (the allocation itself is
serialised by the filesystem's internal locking). Hence two
simultaneous ->get_block calls to the same block are guaranteed to have
the DAX block initialisation race with the second ->get_block call,
which reports that the block is already allocated.

IOWs, the way to handle this is to have the ->get_block call do the
block zeroing for new blocks instead of doing it after the fact in the
generic DAX code, where there is no fine-grained serialisation object
available. By calling dax_clear_blocks() in the ->get_block callback,
the filesystem can ensure that the second racing call only makes
progress once the block has been fully initialised by the first call.

IMO the fix is - again - to move the functionality into the filesystem,
where we already have the necessary exclusion in place to avoid this
race condition entirely.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
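
For illustration only, here is a minimal sketch of what the suggested
->get_block-side zeroing could look like. This is not code from the
thread: the "myfs" names, MYFS_I(), alloc_lock and
myfs_map_or_alloc_block() are hypothetical placeholders standing in for
a filesystem's own allocation path, and it assumes the
dax_clear_blocks(inode, block, size) helper that fs/dax.c provided at
the time.

/*
 * Hypothetical sketch: zero newly allocated blocks inside ->get_block,
 * under the filesystem's own allocation lock, instead of letting the
 * generic DAX fault path memset() the block after the fact.
 */
#include <linux/fs.h>
#include <linux/buffer_head.h>
#include <linux/mutex.h>
#include <linux/kernel.h>

struct myfs_inode_info {
	struct mutex	alloc_lock;	/* serialises block allocation */
	struct inode	vfs_inode;
};

static inline struct myfs_inode_info *MYFS_I(struct inode *inode)
{
	return container_of(inode, struct myfs_inode_info, vfs_inode);
}

/*
 * Placeholder: map iblock to a physical block, allocating if @create is
 * set; *new is true when the block was freshly allocated.
 */
static int myfs_map_or_alloc_block(struct inode *inode, sector_t iblock,
				   sector_t *pblock, int create, bool *new);

static int myfs_dax_get_block(struct inode *inode, sector_t iblock,
			      struct buffer_head *bh_result, int create)
{
	sector_t pblock;
	bool new = false;
	int err;

	/* Filesystem-internal allocation lock serialises racing faults. */
	mutex_lock(&MYFS_I(inode)->alloc_lock);

	err = myfs_map_or_alloc_block(inode, iblock, &pblock, create, &new);
	if (err)
		goto out_unlock;

	if (new) {
		/*
		 * Zero the freshly allocated block while still holding the
		 * allocation lock. A second fault racing on the same block
		 * waits on alloc_lock above and only sees the mapping once
		 * the block is fully initialised.
		 */
		err = dax_clear_blocks(inode, pblock, bh_result->b_size);
		if (err)
			goto out_unlock;
	}

	map_bh(bh_result, inode->i_sb, pblock);
	/*
	 * Deliberately do not set BH_New: the block is already zeroed, so
	 * the generic DAX fault path must not memset() it again and wipe
	 * out a store the other thread may already have made.
	 */

out_unlock:
	mutex_unlock(&MYFS_I(inode)->alloc_lock);
	return err;
}

The key property of this arrangement is that zeroing happens under the
same lock that serialises the allocation itself, so the second racing
fault never sees a mapped-but-uninitialised block.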