From: Ross Zwisler
To: linux-kernel@vger.kernel.org
Cc: Ross Zwisler, "H. Peter Anvin", "J. Bruce Fields", "Theodore Ts'o",
    Alexander Viro, Andreas Dilger, Dave Chinner, Ingo Molnar, Jan Kara,
    Jeff Layton, Matthew Wilcox, Thomas Gleixner,
    linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-nvdimm@ml01.01.org, x86@kernel.org,
    xfs@oss.sgi.com, Andrew Morton, Dan Williams, Matthew Wilcox,
    Dave Hansen
Subject: [PATCH v3 0/7] DAX fsync/msync support
Date: Tue, 8 Dec 2015 12:18:38 -0700
Message-Id: <1449602325-20572-1-git-send-email-ross.zwisler@linux.intel.com>

This patch series adds a slimmed down version of fsync/msync support to
DAX.

The major change versus v2 of this patch series is that we no longer
remove DAX entries from the radix tree during fsync/msync calls. Instead
the list of DAX entries in the radix tree grows for the lifetime of the
mapping. We reclaim DAX entries from the radix tree via
clear_exceptional_entry() for truncate, when the filesystem is unmounted,
etc.

This change was made because if we try and remove radix tree entries
during writeback operations there are a number of race conditions that
exist between those writeback operations and page faults. In the non-DAX
case these races are dealt with using the page lock, but we don't have a
good replacement lock with the same granularity.
These races could leave us in a place where we have a DAX page that is
dirty and writeable from userspace but no longer in the radix tree. This
page would then be skipped during subsequent writeback operations, which
is unacceptable.

I do plan to continue to try to solve these race conditions so that we
can have a more optimal fsync/msync solution for DAX, but I wanted to get
this set out for v4.5 consideration while I continue working. While
suboptimal, the solution in this series gives us correct behavior for DAX
fsync/msync and seems like a reasonable short-term compromise.

This series is built upon v4.4-rc4 plus the recent ext4 DAX series from
Jan Kara (http://www.spinics.net/lists/linux-ext4/msg49951.html) and a
recent XFS fix from Dave Chinner (https://lkml.org/lkml/2015/12/2/923).

The tree with all this working can be found here:

https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=fsync_v3

Other changes versus v2:
 - Renamed dax_fsync() to dax_writeback_mapping_range(). (Dave Chinner)
 - Removed REQ_FUA/REQ_FLUSH support from the PMEM driver and instead
   just make the call to wmb_pmem() in dax_writeback_mapping_range().
   (Dan)
 - Reworked some BUG_ON() calls to be a WARN_ON() followed by an error
   return.
 - Moved call to dax_writeback_mapping_range() from the filesystems down
   into filemap_write_and_wait_range(). (Dave Chinner)
 - Fixed handling of DAX read faults so they create a radix tree entry
   but don't mark it as dirty until the follow-up dax_pfn_mkwrite() call.
 - Updated clear_exceptional_entry() and dax_writeback_one() so they
   validate the DAX radix tree entry before they use it. (Dave Chinner)
 - Added a comment to find_get_entries_tag() to explain the restart
   condition.
   (Dave Chinner)

Ross Zwisler (7):
  pmem: add wb_cache_pmem() to the PMEM API
  dax: support dirty DAX entries in radix tree
  mm: add find_get_entries_tag()
  dax: add support for fsync/sync
  ext2: call dax_pfn_mkwrite() for DAX fsync/msync
  ext4: call dax_pfn_mkwrite() for DAX fsync/msync
  xfs: call dax_pfn_mkwrite() for DAX fsync/msync

 arch/x86/include/asm/pmem.h |  11 ++--
 fs/block_dev.c              |   3 +-
 fs/dax.c                    | 147 ++++++++++++++++++++++++++++++++++++++++++--
 fs/ext2/file.c              |   4 +-
 fs/ext4/file.c              |   4 +-
 fs/inode.c                  |   1 +
 fs/xfs/xfs_file.c           |   7 ++-
 include/linux/dax.h         |   7 +++
 include/linux/fs.h          |   1 +
 include/linux/pagemap.h     |   3 +
 include/linux/pmem.h        |  22 ++++++-
 include/linux/radix-tree.h  |   9 +++
 mm/filemap.c                |  84 ++++++++++++++++++++++++-
 mm/truncate.c               |  64 +++++++++++--------
 14 files changed, 319 insertions(+), 48 deletions(-)

-- 
2.5.0
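
For anyone who wants the entry lifecycle described above in concrete
terms, here is a rough userspace toy model. It is not code from this
series and does not use any kernel API: the toy_* names, the fixed-size
array standing in for the radix tree, and the TOY_SLOTS value are all
made up for illustration. It only shows the intended rules: a read fault
creates a clean entry, mkwrite tags it dirty, writeback clears the tag
but keeps the entry, and only truncate removes entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8 /* hypothetical number of page offsets in the mapping */

/* Hypothetical stand-in for one radix tree slot; not a kernel structure. */
struct toy_entry {
	bool present; /* an entry exists for this page offset */
	bool dirty;   /* stands in for the dirty tag on that entry */
};

struct toy_mapping {
	struct toy_entry slot[TOY_SLOTS];
};

/* Read fault: create the entry, but leave it clean. */
void toy_read_fault(struct toy_mapping *m, size_t idx)
{
	assert(idx < TOY_SLOTS);
	m->slot[idx].present = true;
}

/* Write fault or mkwrite: make sure the entry exists and tag it dirty. */
void toy_mkwrite(struct toy_mapping *m, size_t idx)
{
	assert(idx < TOY_SLOTS);
	m->slot[idx].present = true;
	m->slot[idx].dirty = true;
}

/*
 * fsync/msync over [start, end]: "flush" every dirty entry and clear its
 * tag. The entry itself stays, so a later mkwrite can re-tag it and a
 * later writeback pass will find it again.
 * Returns the number of entries flushed.
 */
size_t toy_writeback_range(struct toy_mapping *m, size_t start, size_t end)
{
	size_t flushed = 0;

	for (size_t i = start; i <= end && i < TOY_SLOTS; i++) {
		if (!m->slot[i].present || !m->slot[i].dirty)
			continue;
		/* Real code would write back the CPU cache for the page here. */
		m->slot[i].dirty = false;
		flushed++;
	}
	return flushed;
}

/* Truncate: the one place in this model where entries are removed. */
void toy_truncate(struct toy_mapping *m, size_t from)
{
	for (size_t i = from; i < TOY_SLOTS; i++) {
		m->slot[i].present = false;
		m->slot[i].dirty = false;
	}
}

/* Count entries currently held, dirty or clean. */
size_t toy_count_entries(const struct toy_mapping *m)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (m->slot[i].present)
			n++;
	return n;
}
```

In this model the entry count never drops across a writeback pass, which
mirrors the trade-off in this series: memory for entries grows with the
mapping, but a dirty page can never fall out of the set that writeback
walks.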