2017-05-10 08:54:15

by Jan Kara

Subject: [PATCH 0/4 v4] mm,dax: Fix data corruption due to mmap inconsistency

Hello,

this series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of sync
with block mappings in the filesystem and thus data seen through mmap is
different from data seen through read(2).

The series passes testing with the t_mmap_stale test program from Ross and
also other mmap-related tests on a DAX filesystem.

Andrew, can you please merge these patches? Thanks!

Changes since v3:
* Rebased on top of current Linus' tree due to non-trivial conflicts with
added tracepoint

Changes since v2:
* Added reviewed-by tag from Ross

Changes since v1:
* Improved performance of unmapping pages
* Changed fault locking to fix another write vs fault race

Honza

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .


2017-05-10 08:54:26

by Jan Kara

Subject: [PATCH 2/4] mm: Fix data corruption due to stale mmap reads

Currently, we don't invalidate page tables during
invalidate_inode_pages2() for DAX. That can result in e.g. a 2MiB zero
page being mapped into page tables while there are already underlying
blocks allocated, and thus data seen through mmap is different from
data seen by read(2). The following sequence reproduces the problem:

- open an mmap over a 2MiB hole

- read from a 2MiB hole, faulting in a 2MiB zero page

- write to the hole with write(2). The write succeeds but we
incorrectly leave the 2MiB zero page mapping intact.

- via the mmap, read the data that was just written. Since the zero
page mapping is still intact we read back zeroes instead of the new
data.
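
The sequence above can be sketched as a small user-space check. This is
only an illustration, not the t_mmap_stale program mentioned in the cover
letter: the helper name check_mmap_stale() and the 64-byte pattern are
made up for the example, and it assumes the file lives on the filesystem
under test. On a non-DAX filesystem it should always report a coherent
mapping.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define HOLE_SIZE (2UL * 1024 * 1024)	/* 2MiB, as in the description */

/*
 * Hypothetical helper, not part of the patch.
 * Returns 0 if mmap and read(2) agree after the write, 1 if the mapping
 * shows stale data, -1 if the setup itself failed.
 */
int check_mmap_stale(const char *path)
{
	char pattern[64], via_read[64];
	volatile char sink;
	char *map;
	int fd, ret = -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Create a 2MiB hole: the file has a size but no allocated blocks. */
	if (ftruncate(fd, HOLE_SIZE) < 0)
		goto out_close;

	map = mmap(NULL, HOLE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* Read fault on the hole; on DAX this may map a 2MiB zero page. */
	sink = map[0];
	(void)sink;

	/* Allocate blocks under the mapping with write(2) at offset 0. */
	memset(pattern, 'x', sizeof(pattern));
	if (write(fd, pattern, sizeof(pattern)) != (ssize_t)sizeof(pattern))
		goto out_unmap;

	/* Fetch the same range through the read path... */
	if (pread(fd, via_read, sizeof(via_read), 0) != (ssize_t)sizeof(via_read))
		goto out_unmap;

	/* ...and compare it with what the mapping shows. */
	ret = memcmp(map, via_read, sizeof(via_read)) != 0;

out_unmap:
	munmap(map, HOLE_SIZE);
out_close:
	close(fd);
	return ret;
}
```

A caller would pass a path on the DAX mount and treat a return value of 1
as the stale mapping described above.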

Fix the problem by unconditionally calling
invalidate_inode_pages2_range() in dax_iomap_actor() for new block
allocations and by properly invalidating page tables in
invalidate_inode_pages2_range() for DAX mappings.
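
For reference, the page index arguments passed to
invalidate_inode_pages2_range() in the fs/dax.c hunk form an inclusive
range. A minimal user-space sketch of that arithmetic follows; it assumes
4KiB pages (PAGE_SHIFT of 12) and that end is the exclusive byte offset
pos + length. The struct and helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4KiB pages */

struct page_range {
	uint64_t first;	/* index of first page touched */
	uint64_t last;	/* index of last page touched, inclusive */
};

/*
 * Hypothetical helper: inclusive page index range covering the byte
 * range [pos, pos + length). length must be greater than zero.
 */
struct page_range byte_range_to_pages(uint64_t pos, uint64_t length)
{
	uint64_t end = pos + length;	/* exclusive end offset */
	struct page_range r = { pos >> PAGE_SHIFT, (end - 1) >> PAGE_SHIFT };

	return r;
}
```

Subtracting one from end before shifting keeps a write that stops exactly
on a page boundary from pulling in the following page.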

Fixes: c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55
CC: [email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/dax.c | 2 +-
mm/truncate.c | 12 +++++++++++-
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 38deebb8c86e..123d9903c77d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1015,7 +1015,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
* into page tables. We have to tear down these mappings so that data
* written by write(2) is visible in mmap.
*/
- if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
+ if (iomap->flags & IOMAP_F_NEW) {
invalidate_inode_pages2_range(inode->i_mapping,
pos >> PAGE_SHIFT,
(end - 1) >> PAGE_SHIFT);
diff --git a/mm/truncate.c b/mm/truncate.c
index 706cff171a15..6479ed2afc53 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -686,7 +686,17 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
cond_resched();
index++;
}

2017-05-10 17:27:55

by Ross Zwisler

Subject: Re: [PATCH 0/4 v4] mm,dax: Fix data corruption due to mmap inconsistency

On Wed, May 10, 2017 at 10:54:15AM +0200, Jan Kara wrote:
> Hello,
>
> this series fixes data corruption that can happen for DAX mounts when
> page faults race with write(2) and as a result page tables get out of sync
> with block mappings in the filesystem and thus data seen through mmap is
> different from data seen through read(2).
>
> The series passes testing with the t_mmap_stale test program from Ross and
> also other mmap-related tests on a DAX filesystem.
>
> Andrew, can you please merge these patches? Thanks!
>
> Changes since v3:
> * Rebased on top of current Linus' tree due to non-trivial conflicts with
> added tracepoint

Cool, the merge update looks correct to me.