2022-12-01 15:40:20

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2 0/8] fsdax,xfs: fix warning messages

Changes since v1:
1. Added a snippet of the warning message and some of the failed cases
2. Separated the patch for easily review
3. Added page->share and its helper functions
4. Included the patch[1] that removes the restrictions of fsdax and reflink
[1] https://lore.kernel.org/linux-xfs/[email protected]/

Many testcases failed in dax+reflink mode with warning message in dmesg.
Such as generic/051,075,127. The warning message is like this:
[ 775.509337] ------------[ cut here ]------------
[ 775.509636] WARNING: CPU: 1 PID: 16815 at fs/dax.c:386 dax_insert_entry.cold+0x2e/0x69
[ 775.510151] Modules linked in: auth_rpcgss oid_registry nfsv4 algif_hash af_alg af_packet nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables x_tables dax_pmem nd_pmem nd_btt sch_fq_codel configfs xfs libcrc32c fuse
[ 775.524288] CPU: 1 PID: 16815 Comm: fsx Kdump: loaded Tainted: G W 6.1.0-rc4+ #164 eb34e4ee4200c7cbbb47de2b1892c5a3e027fd6d
[ 775.524904] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.0-3-3 04/01/2014
[ 775.525460] RIP: 0010:dax_insert_entry.cold+0x2e/0x69
[ 775.525797] Code: c7 c7 18 eb e0 81 48 89 4c 24 20 48 89 54 24 10 e8 73 6d ff ff 48 83 7d 18 00 48 8b 54 24 10 48 8b 4c 24 20 0f 84 e3 e9 b9 ff <0f> 0b e9 dc e9 b9 ff 48 c7 c6 a0 20 c3 81 48 c7 c7 f0 ea e0 81 48
[ 775.526708] RSP: 0000:ffffc90001d57b30 EFLAGS: 00010082
[ 775.527042] RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000042
[ 775.527396] RDX: ffffea000a0f6c80 RSI: ffffffff81dfab1b RDI: 00000000ffffffff
[ 775.527819] RBP: ffffea000a0f6c40 R08: 0000000000000000 R09: ffffffff820625e0
[ 775.528241] R10: ffffc90001d579d8 R11: ffffffff820d2628 R12: ffff88815fc98320
[ 775.528598] R13: ffffc90001d57c18 R14: 0000000000000000 R15: 0000000000000001
[ 775.528997] FS: 00007f39fc75d740(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000
[ 775.529474] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 775.529800] CR2: 00007f39fc772040 CR3: 0000000107eb6001 CR4: 00000000003706e0
[ 775.530214] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 775.530592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 775.531002] Call Trace:
[ 775.531230] <TASK>
[ 775.531444] dax_fault_iter+0x267/0x6c0
[ 775.531719] dax_iomap_pte_fault+0x198/0x3d0
[ 775.532002] __xfs_filemap_fault+0x24a/0x2d0 [xfs aa8d25411432b306d9554da38096f4ebb86bdfe7]
[ 775.532603] __do_fault+0x30/0x1e0
[ 775.532903] do_fault+0x314/0x6c0
[ 775.533166] __handle_mm_fault+0x646/0x1250
[ 775.533480] handle_mm_fault+0xc1/0x230
[ 775.533810] do_user_addr_fault+0x1ac/0x610
[ 775.534110] exc_page_fault+0x63/0x140
[ 775.534389] asm_exc_page_fault+0x22/0x30
[ 775.534678] RIP: 0033:0x7f39fc55820a
[ 775.534950] Code: 00 01 00 00 00 74 99 83 f9 c0 0f 87 7b fe ff ff c5 fe 6f 4e 20 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 29 f9 <f3> a4 c4 c1 7e 7f 00 c4 c1 7e 7f 48 20 c5 f8 77 c3 0f 1f 44 00 00
[ 775.535839] RSP: 002b:00007ffc66a08118 EFLAGS: 00010202
[ 775.536157] RAX: 00007f39fc772001 RBX: 0000000000042001 RCX: 00000000000063c1
[ 775.536537] RDX: 0000000000006400 RSI: 00007f39fac42050 RDI: 00007f39fc772040
[ 775.536919] RBP: 0000000000006400 R08: 00007f39fc772001 R09: 0000000000042000
[ 775.537304] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
[ 775.537694] R13: 00007f39fc772000 R14: 0000000000006401 R15: 0000000000000003
[ 775.538086] </TASK>
[ 775.538333] ---[ end trace 0000000000000000 ]---

This also effects dax+noreflink mode if we run the test after a
dax+reflink test. So, the most urgent thing is solving the warning
messages.

With these fixes, most warning messages in dax_associate_entry() are
gone. But honestly, generic/388 will randomly failed with the warning.
The case shutdown the xfs when fsstress is running, and do it for many
times. I think the reason is that dax pages in use are not able to be
invalidated in time when fs is shutdown. The next time dax page to be
associated, it still remains the mapping value set last time. I'll keep
on solving it.

The warning message in dax_writeback_one() can also be fixed because of
the dax unshare.


Shiyang Ruan (8):
fsdax: introduce page->share for fsdax in reflink mode
fsdax: invalidate pages when CoW
fsdax: zero the edges if source is HOLE or UNWRITTEN
fsdax,xfs: set the shared flag when file extent is shared
fsdax: dedupe: iter two files at the same time
xfs: use dax ops for zero and truncate in fsdax mode
fsdax,xfs: port unshare to fsdax
xfs: remove restrictions for fsdax and reflink

fs/dax.c | 220 +++++++++++++++++++++++++------------
fs/xfs/xfs_ioctl.c | 4 -
fs/xfs/xfs_iomap.c | 6 +-
fs/xfs/xfs_iops.c | 4 -
fs/xfs/xfs_reflink.c | 8 +-
include/linux/dax.h | 2 +
include/linux/mm_types.h | 5 +-
include/linux/page-flags.h | 2 +-
8 files changed, 166 insertions(+), 85 deletions(-)

--
2.38.1


2022-12-01 15:40:53

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2 7/8] fsdax,xfs: port unshare to fsdax

Implement unshare in fsdax mode: copy data from srcmap to iomap.

Signed-off-by: Shiyang Ruan <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/dax.c | 52 ++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_reflink.c | 8 +++++--
include/linux/dax.h | 2 ++
3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 354be56750c2..a57e320e7971 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1244,6 +1244,58 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
}
#endif /* CONFIG_FS_DAX_PMD */

+static s64 dax_unshare_iter(struct iomap_iter *iter)
+{
+ struct iomap *iomap = &iter->iomap;
+ const struct iomap *srcmap = iomap_iter_srcmap(iter);
+ loff_t pos = iter->pos;
+ loff_t length = iomap_length(iter);
+ int id = 0;
+ s64 ret = 0;
+ void *daddr = NULL, *saddr = NULL;
+
+ /* don't bother with blocks that are not shared to start with */
+ if (!(iomap->flags & IOMAP_F_SHARED))
+ return length;
+ /* don't bother with holes or unwritten extents */
+ if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+ return length;
+
+ id = dax_read_lock();
+ ret = dax_iomap_direct_access(iomap, pos, length, &daddr, NULL);
+ if (ret < 0)
+ goto out_unlock;
+
+ ret = dax_iomap_direct_access(srcmap, pos, length, &saddr, NULL);
+ if (ret < 0)
+ goto out_unlock;
+
+ ret = copy_mc_to_kernel(daddr, saddr, length);
+ if (ret)
+ ret = -EIO;
+
+out_unlock:
+ dax_read_unlock(id);
+ return ret;
+}
+
+int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len,
+ const struct iomap_ops *ops)
+{
+ struct iomap_iter iter = {
+ .inode = inode,
+ .pos = pos,
+ .len = len,
+ .flags = IOMAP_WRITE | IOMAP_UNSHARE | IOMAP_DAX,
+ };
+ int ret;
+
+ while ((ret = iomap_iter(&iter, ops)) > 0)
+ iter.processed = dax_unshare_iter(&iter);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(dax_file_unshare);
+
static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size)
{
const struct iomap *iomap = &iter->iomap;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 93bdd25680bc..fe46bce8cae6 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1693,8 +1693,12 @@ xfs_reflink_unshare(

inode_dio_wait(inode);

- error = iomap_file_unshare(inode, offset, len,
- &xfs_buffered_write_iomap_ops);
+ if (IS_DAX(inode))
+ error = dax_file_unshare(inode, offset, len,
+ &xfs_dax_write_iomap_ops);
+ else
+ error = iomap_file_unshare(inode, offset, len,
+ &xfs_buffered_write_iomap_ops);
if (error)
goto out;

diff --git a/include/linux/dax.h b/include/linux/dax.h
index ba985333e26b..2b5ecb591059 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -205,6 +205,8 @@ static inline void dax_unlock_mapping_entry(struct address_space *mapping,
}
#endif

+int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len,
+ const struct iomap_ops *ops);
int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
const struct iomap_ops *ops);
int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
--
2.38.1

2022-12-01 15:58:44

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2 4/8] fsdax,xfs: set the shared flag when file extent is shared

If a dax page is shared, mapread at different offsets can also trigger
page fault on same dax page. So, change the flag from "cow" to
"shared". And get the shared flag from filesystem when read.

Signed-off-by: Shiyang Ruan <[email protected]>
---
fs/dax.c | 19 +++++++------------
fs/xfs/xfs_iomap.c | 2 +-
2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 6b6e07ad8d80..f1eb59bee0b5 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -846,12 +846,6 @@ static bool dax_fault_is_synchronous(const struct iomap_iter *iter,
(iter->iomap.flags & IOMAP_F_DIRTY);
}

-static bool dax_fault_is_cow(const struct iomap_iter *iter)
-{
- return (iter->flags & IOMAP_WRITE) &&
- (iter->iomap.flags & IOMAP_F_SHARED);
-}
-
/*
* By this point grab_mapping_entry() has ensured that we have a locked entry
* of the appropriate size so we don't have to worry about downgrading PMDs to
@@ -865,13 +859,14 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf,
{
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
void *new_entry = dax_make_entry(pfn, flags);
- bool dirty = !dax_fault_is_synchronous(iter, vmf->vma);
- bool cow = dax_fault_is_cow(iter);
+ bool write = iter->flags & IOMAP_WRITE;
+ bool dirty = write && !dax_fault_is_synchronous(iter, vmf->vma);
+ bool shared = iter->iomap.flags & IOMAP_F_SHARED;

if (dirty)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);

- if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
+ if (shared || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
unsigned long index = xas->xa_index;
/* we are replacing a zero page with block mapping */
if (dax_is_pmd_entry(entry))
@@ -883,12 +878,12 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf,

xas_reset(xas);
xas_lock_irq(xas);
- if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
+ if (shared || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
void *old;

dax_disassociate_entry(entry, mapping, false);
dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address,
- cow);
+ shared);
/*
* Only swap our new entry into the page cache if the current
* entry is a zero page or an empty entry. If a normal PTE or
@@ -908,7 +903,7 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf,
if (dirty)
xas_set_mark(xas, PAGECACHE_TAG_DIRTY);

- if (cow)
+ if (write && shared)
xas_set_mark(xas, PAGECACHE_TAG_TOWRITE);

xas_unlock_irq(xas);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 07da03976ec1..881de99766ca 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1215,7 +1215,7 @@ xfs_read_iomap_begin(
return error;
error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
&nimaps, 0);
- if (!error && (flags & IOMAP_REPORT))
+ if (!error && ((flags & IOMAP_REPORT) || IS_DAX(inode)))
error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
xfs_iunlock(ip, lockmode);

--
2.38.1

2022-12-01 16:09:38

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2 6/8] xfs: use dax ops for zero and truncate in fsdax mode

Zero and truncate on a dax file may execute CoW. So use dax ops which
contains end work for CoW.

Signed-off-by: Shiyang Ruan <[email protected]>
---
fs/xfs/xfs_iomap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 881de99766ca..d9401d0300ad 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1370,7 +1370,7 @@ xfs_zero_range(

if (IS_DAX(inode))
return dax_zero_range(inode, pos, len, did_zero,
- &xfs_direct_write_iomap_ops);
+ &xfs_dax_write_iomap_ops);
return iomap_zero_range(inode, pos, len, did_zero,
&xfs_buffered_write_iomap_ops);
}
@@ -1385,7 +1385,7 @@ xfs_truncate_page(

if (IS_DAX(inode))
return dax_truncate_page(inode, pos, did_zero,
- &xfs_direct_write_iomap_ops);
+ &xfs_dax_write_iomap_ops);
return iomap_truncate_page(inode, pos, did_zero,
&xfs_buffered_write_iomap_ops);
}
--
2.38.1

2022-12-01 16:15:09

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2 5/8] fsdax: dedupe: iter two files at the same time

The iomap_iter() on a range of one file may loop more than once. In
this case, the inner dst_iter can update its iomap but the outer
src_iter can't. This may cause the wrong remapping in filesystem. Let
them called at the same time.

Signed-off-by: Shiyang Ruan <[email protected]>
---
fs/dax.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index f1eb59bee0b5..354be56750c2 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1964,15 +1964,15 @@ int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
.len = len,
.flags = IOMAP_DAX,
};
- int ret;
+ int ret, compared = 0;

- while ((ret = iomap_iter(&src_iter, ops)) > 0) {
- while ((ret = iomap_iter(&dst_iter, ops)) > 0) {
- dst_iter.processed = dax_range_compare_iter(&src_iter,
- &dst_iter, len, same);
- }
- if (ret <= 0)
- src_iter.processed = ret;
+ while ((ret = iomap_iter(&src_iter, ops)) > 0 &&
+ (ret = iomap_iter(&dst_iter, ops)) > 0) {
+ compared = dax_range_compare_iter(&src_iter, &dst_iter, len,
+ same);
+ if (compared < 0)
+ return ret;
+ src_iter.processed = dst_iter.processed = compared;
}
return ret;
}
--
2.38.1

2022-12-01 16:16:04

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2 2/8] fsdax: invalidate pages when CoW

CoW changes the share state of a dax page, but the share count of the
page isn't updated. The next time access this page, it should have been
a newly accessed, but old association exists. So, we need to clear the
share state when CoW happens, in both dax_iomap_rw() and
dax_zero_iter().

Signed-off-by: Shiyang Ruan <[email protected]>
---
fs/dax.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 85b81963ea31..482dda85ccaf 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1264,6 +1264,15 @@ static s64 dax_zero_iter(struct iomap_iter *iter, bool *did_zero)
if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
return length;

+ /*
+ * invalidate the pages whose sharing state is to be changed
+ * because of CoW.
+ */
+ if (iomap->flags & IOMAP_F_SHARED)
+ invalidate_inode_pages2_range(iter->inode->i_mapping,
+ pos >> PAGE_SHIFT,
+ (pos + length - 1) >> PAGE_SHIFT);
+
do {
unsigned offset = offset_in_page(pos);
unsigned size = min_t(u64, PAGE_SIZE - offset, length);
@@ -1324,12 +1333,13 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
struct iov_iter *iter)
{
const struct iomap *iomap = &iomi->iomap;
- const struct iomap *srcmap = &iomi->srcmap;
+ const struct iomap *srcmap = iomap_iter_srcmap(iomi);
loff_t length = iomap_length(iomi);
loff_t pos = iomi->pos;
struct dax_device *dax_dev = iomap->dax_dev;
loff_t end = pos + length, done = 0;
bool write = iov_iter_rw(iter) == WRITE;
+ bool cow = write && iomap->flags & IOMAP_F_SHARED;
ssize_t ret = 0;
size_t xfer;
int id;
@@ -1356,7 +1366,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
* into page tables. We have to tear down these mappings so that data
* written by write(2) is visible in mmap.
*/
- if (iomap->flags & IOMAP_F_NEW) {
+ if (iomap->flags & IOMAP_F_NEW || cow) {
invalidate_inode_pages2_range(iomi->inode->i_mapping,
pos >> PAGE_SHIFT,
(end - 1) >> PAGE_SHIFT);
@@ -1390,8 +1400,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
break;
}

- if (write &&
- srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) {
+ if (cow) {
ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
kaddr);
if (ret)
--
2.38.1

2022-12-01 16:38:39

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2 8/8] xfs: remove restrictions for fsdax and reflink

Since the basic function for fsdax and reflink has been implemented,
remove the restrictions of them for widly test.

Signed-off-by: Shiyang Ruan <[email protected]>
---
fs/xfs/xfs_ioctl.c | 4 ----
fs/xfs/xfs_iops.c | 4 ----
2 files changed, 8 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 1f783e979629..13f1b2add390 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1138,10 +1138,6 @@ xfs_ioctl_setattr_xflags(
if ((fa->fsx_xflags & FS_XFLAG_REALTIME) && xfs_is_reflink_inode(ip))
ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK;

- /* Don't allow us to set DAX mode for a reflinked file for now. */
- if ((fa->fsx_xflags & FS_XFLAG_DAX) && xfs_is_reflink_inode(ip))
- return -EINVAL;
-
/* diflags2 only valid for v3 inodes. */
i_flags2 = xfs_flags2diflags2(ip, fa->fsx_xflags);
if (i_flags2 && !xfs_has_v3inodes(mp))
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 2e10e1c66ad6..bf0495f7a5e1 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1185,10 +1185,6 @@ xfs_inode_supports_dax(
if (!S_ISREG(VFS_I(ip)->i_mode))
return false;

- /* Only supported on non-reflinked files. */
- if (xfs_is_reflink_inode(ip))
- return false;
-
/* Block size must match page size */
if (mp->m_sb.sb_blocksize != PAGE_SIZE)
return false;
--
2.38.1

2022-12-01 16:40:21

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH v2 2/8] fsdax: invalidate pages when CoW

On Thu, Dec 01, 2022 at 03:28:52PM +0000, Shiyang Ruan wrote:
> CoW changes the share state of a dax page, but the share count of the
> page isn't updated. The next time access this page, it should have been
> a newly accessed, but old association exists. So, we need to clear the
> share state when CoW happens, in both dax_iomap_rw() and
> dax_zero_iter().
>
> Signed-off-by: Shiyang Ruan <[email protected]>

Looks ok,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

> ---
> fs/dax.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 85b81963ea31..482dda85ccaf 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1264,6 +1264,15 @@ static s64 dax_zero_iter(struct iomap_iter *iter, bool *did_zero)
> if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> return length;
>
> + /*
> + * invalidate the pages whose sharing state is to be changed
> + * because of CoW.
> + */
> + if (iomap->flags & IOMAP_F_SHARED)
> + invalidate_inode_pages2_range(iter->inode->i_mapping,
> + pos >> PAGE_SHIFT,
> + (pos + length - 1) >> PAGE_SHIFT);
> +
> do {
> unsigned offset = offset_in_page(pos);
> unsigned size = min_t(u64, PAGE_SIZE - offset, length);
> @@ -1324,12 +1333,13 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
> struct iov_iter *iter)
> {
> const struct iomap *iomap = &iomi->iomap;
> - const struct iomap *srcmap = &iomi->srcmap;
> + const struct iomap *srcmap = iomap_iter_srcmap(iomi);
> loff_t length = iomap_length(iomi);
> loff_t pos = iomi->pos;
> struct dax_device *dax_dev = iomap->dax_dev;
> loff_t end = pos + length, done = 0;
> bool write = iov_iter_rw(iter) == WRITE;
> + bool cow = write && iomap->flags & IOMAP_F_SHARED;
> ssize_t ret = 0;
> size_t xfer;
> int id;
> @@ -1356,7 +1366,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
> * into page tables. We have to tear down these mappings so that data
> * written by write(2) is visible in mmap.
> */
> - if (iomap->flags & IOMAP_F_NEW) {
> + if (iomap->flags & IOMAP_F_NEW || cow) {
> invalidate_inode_pages2_range(iomi->inode->i_mapping,
> pos >> PAGE_SHIFT,
> (end - 1) >> PAGE_SHIFT);
> @@ -1390,8 +1400,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
> break;
> }
>
> - if (write &&
> - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) {
> + if (cow) {
> ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
> kaddr);
> if (ret)
> --
> 2.38.1
>

2022-12-01 16:40:21

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2 3/8] fsdax: zero the edges if source is HOLE or UNWRITTEN

If srcmap contains invalid data, such as HOLE and UNWRITTEN, the dest
page should be zeroed. Otherwise, since it's a pmem, old data may
remains on the dest page, the result of CoW will be incorrect.

The function name is also not easy to understand, rename it to
"dax_iomap_copy_around()", which means it copys data around the range.

Signed-off-by: Shiyang Ruan <[email protected]>
---
fs/dax.c | 78 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 48 insertions(+), 30 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 482dda85ccaf..6b6e07ad8d80 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1092,7 +1092,7 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
}

/**
- * dax_iomap_cow_copy - Copy the data from source to destination before write
+ * dax_iomap_copy_around - Copy the data from source to destination before write
* @pos: address to do copy from.
* @length: size of copy operation.
* @align_size: aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
@@ -1101,35 +1101,50 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
*
* This can be called from two places. Either during DAX write fault (page
* aligned), to copy the length size data to daddr. Or, while doing normal DAX
- * write operation, dax_iomap_actor() might call this to do the copy of either
+ * write operation, dax_iomap_iter() might call this to do the copy of either
* start or end unaligned address. In the latter case the rest of the copy of
- * aligned ranges is taken care by dax_iomap_actor() itself.
+ * aligned ranges is taken care by dax_iomap_iter() itself.
+ * If the srcmap contains invalid data, such as HOLE and UNWRITTEN, zero the
+ * area to make sure no old data remains.
*/
-static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size,
+static int dax_iomap_copy_around(loff_t pos, uint64_t length, size_t align_size,
const struct iomap *srcmap, void *daddr)
{
loff_t head_off = pos & (align_size - 1);
size_t size = ALIGN(head_off + length, align_size);
loff_t end = pos + length;
loff_t pg_end = round_up(end, align_size);
+ /* copy_all is usually in page fault case */
bool copy_all = head_off == 0 && end == pg_end;
+ /* zero the edges if srcmap is a HOLE or IOMAP_UNWRITTEN */
+ bool zero_edge = srcmap->flags & IOMAP_F_SHARED ||
+ srcmap->type == IOMAP_UNWRITTEN;
void *saddr = 0;
int ret = 0;

- ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
- if (ret)
- return ret;
+ if (!zero_edge) {
+ ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
+ if (ret)
+ return ret;
+ }

if (copy_all) {
- ret = copy_mc_to_kernel(daddr, saddr, length);
- return ret ? -EIO : 0;
+ if (zero_edge)
+ memset(daddr, 0, size);
+ else
+ ret = copy_mc_to_kernel(daddr, saddr, length);
+ goto out;
}

/* Copy the head part of the range */
if (head_off) {
- ret = copy_mc_to_kernel(daddr, saddr, head_off);
- if (ret)
- return -EIO;
+ if (zero_edge)
+ memset(daddr, 0, head_off);
+ else {
+ ret = copy_mc_to_kernel(daddr, saddr, head_off);
+ if (ret)
+ return -EIO;
+ }
}

/* Copy the tail part of the range */
@@ -1137,12 +1152,19 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size,
loff_t tail_off = head_off + length;
loff_t tail_len = pg_end - end;

- ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off,
- tail_len);
- if (ret)
- return -EIO;
+ if (zero_edge)
+ memset(daddr + tail_off, 0, tail_len);
+ else {
+ ret = copy_mc_to_kernel(daddr + tail_off,
+ saddr + tail_off, tail_len);
+ if (ret)
+ return -EIO;
+ }
}
- return 0;
+out:
+ if (zero_edge)
+ dax_flush(srcmap->dax_dev, daddr, size);
+ return ret ? -EIO : 0;
}

/*
@@ -1241,13 +1263,10 @@ static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size)
if (ret < 0)
return ret;
memset(kaddr + offset, 0, size);
- if (srcmap->addr != iomap->addr) {
- ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap,
- kaddr);
- if (ret < 0)
- return ret;
- dax_flush(iomap->dax_dev, kaddr, PAGE_SIZE);
- } else
+ if (iomap->flags & IOMAP_F_SHARED)
+ ret = dax_iomap_copy_around(pos, size, PAGE_SIZE, srcmap,
+ kaddr);
+ else
dax_flush(iomap->dax_dev, kaddr + offset, size);
return ret;
}
@@ -1401,8 +1420,8 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
}

if (cow) {
- ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
- kaddr);
+ ret = dax_iomap_copy_around(pos, length, PAGE_SIZE,
+ srcmap, kaddr);
if (ret)
break;
}
@@ -1547,7 +1566,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
struct xa_state *xas, void **entry, bool pmd)
{
const struct iomap *iomap = &iter->iomap;
- const struct iomap *srcmap = &iter->srcmap;
+ const struct iomap *srcmap = iomap_iter_srcmap(iter);
size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
bool write = iter->flags & IOMAP_WRITE;
@@ -1578,9 +1597,8 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,

*entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags);

- if (write &&
- srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) {
- err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr);
+ if (write && iomap->flags & IOMAP_F_SHARED) {
+ err = dax_iomap_copy_around(pos, size, size, srcmap, kaddr);
if (err)
return dax_fault_return(err);
}
--
2.38.1

2022-12-01 16:42:44

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2 1/8] fsdax: introduce page->share for fsdax in reflink mode

fsdax page is used not only when CoW, but also mapread. To make the it
easily understood, use 'share' to indicate that the dax page is shared
by more than one extent. And add helper functions to use it.

Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.

Signed-off-by: Shiyang Ruan <[email protected]>
---
fs/dax.c | 38 ++++++++++++++++++++++----------------
include/linux/mm_types.h | 5 ++++-
include/linux/page-flags.h | 2 +-
3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 1c6867810cbd..85b81963ea31 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -334,35 +334,41 @@ static unsigned long dax_end_pfn(void *entry)
for (pfn = dax_to_pfn(entry); \
pfn < dax_end_pfn(entry); pfn++)

-static inline bool dax_mapping_is_cow(struct address_space *mapping)
+static inline bool dax_mapping_is_shared(struct page *page)
{
- return (unsigned long)mapping == PAGE_MAPPING_DAX_COW;
+ return (unsigned long)page->mapping == PAGE_MAPPING_DAX_SHARED;
}

/*
- * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount.
+ * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase the
+ * refcount.
*/
-static inline void dax_mapping_set_cow(struct page *page)
+static inline void dax_mapping_set_shared(struct page *page)
{
- if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
+ if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_SHARED) {
/*
* Reset the index if the page was already mapped
* regularly before.
*/
if (page->mapping)
- page->index = 1;
- page->mapping = (void *)PAGE_MAPPING_DAX_COW;
+ page->share = 1;
+ page->mapping = (void *)PAGE_MAPPING_DAX_SHARED;
}
- page->index++;
+ page->share++;
+}
+
+static inline unsigned long dax_mapping_decrease_shared(struct page *page)
+{
+ return --page->share;
}

/*
- * When it is called in dax_insert_entry(), the cow flag will indicate that
+ * When it is called in dax_insert_entry(), the shared flag will indicate that
* whether this entry is shared by multiple files. If so, set the page->mapping
- * FS_DAX_MAPPING_COW, and use page->index as refcount.
+ * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
*/
static void dax_associate_entry(void *entry, struct address_space *mapping,
- struct vm_area_struct *vma, unsigned long address, bool cow)
+ struct vm_area_struct *vma, unsigned long address, bool shared)
{
unsigned long size = dax_entry_size(entry), pfn, index;
int i = 0;
@@ -374,8 +380,8 @@ static void dax_associate_entry(void *entry, struct address_space *mapping,
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);

- if (cow) {
- dax_mapping_set_cow(page);
+ if (shared) {
+ dax_mapping_set_shared(page);
} else {
WARN_ON_ONCE(page->mapping);
page->mapping = mapping;
@@ -396,9 +402,9 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
struct page *page = pfn_to_page(pfn);

WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
- if (dax_mapping_is_cow(page->mapping)) {
- /* keep the CoW flag if this page is still shared */
- if (page->index-- > 0)
+ if (dax_mapping_is_shared(page)) {
+ /* keep the shared flag if this page is still shared */
+ if (dax_mapping_decrease_shared(page) > 0)
continue;
} else
WARN_ON_ONCE(page->mapping && page->mapping != mapping);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 500e536796ca..f46cac3657ad 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -103,7 +103,10 @@ struct page {
};
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
- pgoff_t index; /* Our offset within mapping. */
+ union {
+ pgoff_t index; /* Our offset within mapping. */
+ unsigned long share; /* share count for fsdax */
+ };
/**
* @private: Mapping-private opaque data.
* Usually used for buffer_heads if PagePrivate.
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0b0ae5084e60..c8a3aa02278d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -641,7 +641,7 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
* Different with flags above, this flag is used only for fsdax mode. It
* indicates that this page->mapping is now under reflink case.
*/
-#define PAGE_MAPPING_DAX_COW 0x1
+#define PAGE_MAPPING_DAX_SHARED 0x1

static __always_inline bool folio_mapping_flags(struct folio *folio)
{
--
2.38.1

2022-12-02 00:17:50

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH v2 3/8] fsdax: zero the edges if source is HOLE or UNWRITTEN

On Thu, Dec 01, 2022 at 03:28:53PM +0000, Shiyang Ruan wrote:
> If srcmap contains invalid data, such as HOLE and UNWRITTEN, the dest
> page should be zeroed. Otherwise, since it's a pmem, old data may
> remains on the dest page, the result of CoW will be incorrect.
>
> The function name is also not easy to understand, rename it to
> "dax_iomap_copy_around()", which means it copys data around the range.
>
> Signed-off-by: Shiyang Ruan <[email protected]>
> ---
> fs/dax.c | 78 ++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 48 insertions(+), 30 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 482dda85ccaf..6b6e07ad8d80 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1092,7 +1092,7 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
> }
>
> /**
> - * dax_iomap_cow_copy - Copy the data from source to destination before write
> + * dax_iomap_copy_around - Copy the data from source to destination before write

* dax_iomap_copy_around - Prepare for an unaligned write to a
* shared/cow page by copying the data before and after the range to be
* written.

Other than that, this make sense,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

> * @pos: address to do copy from.
> * @length: size of copy operation.
> * @align_size: aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
> @@ -1101,35 +1101,50 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
> *
> * This can be called from two places. Either during DAX write fault (page
> * aligned), to copy the length size data to daddr. Or, while doing normal DAX
> - * write operation, dax_iomap_actor() might call this to do the copy of either
> + * write operation, dax_iomap_iter() might call this to do the copy of either
> * start or end unaligned address. In the latter case the rest of the copy of
> - * aligned ranges is taken care by dax_iomap_actor() itself.
> + * aligned ranges is taken care by dax_iomap_iter() itself.
> + * If the srcmap contains invalid data, such as HOLE and UNWRITTEN, zero the
> + * area to make sure no old data remains.
> */
> -static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size,
> +static int dax_iomap_copy_around(loff_t pos, uint64_t length, size_t align_size,
> const struct iomap *srcmap, void *daddr)
> {
> loff_t head_off = pos & (align_size - 1);
> size_t size = ALIGN(head_off + length, align_size);
> loff_t end = pos + length;
> loff_t pg_end = round_up(end, align_size);
> + /* copy_all is usually in page fault case */
> bool copy_all = head_off == 0 && end == pg_end;
> + /* zero the edges if srcmap is a HOLE or IOMAP_UNWRITTEN */
> + bool zero_edge = srcmap->flags & IOMAP_F_SHARED ||
> + srcmap->type == IOMAP_UNWRITTEN;
> void *saddr = 0;
> int ret = 0;
>
> - ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
> - if (ret)
> - return ret;
> + if (!zero_edge) {
> + ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
> + if (ret)
> + return ret;
> + }
>
> if (copy_all) {
> - ret = copy_mc_to_kernel(daddr, saddr, length);
> - return ret ? -EIO : 0;
> + if (zero_edge)
> + memset(daddr, 0, size);
> + else
> + ret = copy_mc_to_kernel(daddr, saddr, length);
> + goto out;
> }
>
> /* Copy the head part of the range */
> if (head_off) {
> - ret = copy_mc_to_kernel(daddr, saddr, head_off);
> - if (ret)
> - return -EIO;
> + if (zero_edge)
> + memset(daddr, 0, head_off);
> + else {
> + ret = copy_mc_to_kernel(daddr, saddr, head_off);
> + if (ret)
> + return -EIO;
> + }
> }
>
> /* Copy the tail part of the range */
> @@ -1137,12 +1152,19 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size,
> loff_t tail_off = head_off + length;
> loff_t tail_len = pg_end - end;
>
> - ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off,
> - tail_len);
> - if (ret)
> - return -EIO;
> + if (zero_edge)
> + memset(daddr + tail_off, 0, tail_len);
> + else {
> + ret = copy_mc_to_kernel(daddr + tail_off,
> + saddr + tail_off, tail_len);
> + if (ret)
> + return -EIO;
> + }
> }
> - return 0;
> +out:
> + if (zero_edge)
> + dax_flush(srcmap->dax_dev, daddr, size);
> + return ret ? -EIO : 0;
> }
>
> /*
> @@ -1241,13 +1263,10 @@ static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size)
> if (ret < 0)
> return ret;
> memset(kaddr + offset, 0, size);
> - if (srcmap->addr != iomap->addr) {
> - ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap,
> - kaddr);
> - if (ret < 0)
> - return ret;
> - dax_flush(iomap->dax_dev, kaddr, PAGE_SIZE);
> - } else
> + if (iomap->flags & IOMAP_F_SHARED)
> + ret = dax_iomap_copy_around(pos, size, PAGE_SIZE, srcmap,
> + kaddr);
> + else
> dax_flush(iomap->dax_dev, kaddr + offset, size);
> return ret;
> }
> @@ -1401,8 +1420,8 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
> }
>
> if (cow) {
> - ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
> - kaddr);
> + ret = dax_iomap_copy_around(pos, length, PAGE_SIZE,
> + srcmap, kaddr);
> if (ret)
> break;
> }
> @@ -1547,7 +1566,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
> struct xa_state *xas, void **entry, bool pmd)
> {
> const struct iomap *iomap = &iter->iomap;
> - const struct iomap *srcmap = &iter->srcmap;
> + const struct iomap *srcmap = iomap_iter_srcmap(iter);
> size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
> loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
> bool write = iter->flags & IOMAP_WRITE;
> @@ -1578,9 +1597,8 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
>
> *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags);
>
> - if (write &&
> - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) {
> - err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr);
> + if (write && iomap->flags & IOMAP_F_SHARED) {
> + err = dax_iomap_copy_around(pos, size, size, srcmap, kaddr);
> if (err)
> return dax_fault_return(err);
> }
> --
> 2.38.1
>

2022-12-02 00:32:35

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH v2 4/8] fsdax,xfs: set the shared flag when file extent is shared

On Thu, Dec 01, 2022 at 03:28:54PM +0000, Shiyang Ruan wrote:
> If a dax page is shared, mapread at different offsets can also trigger
> page fault on same dax page. So, change the flag from "cow" to
> "shared". And get the shared flag from filesystem when read.
>
> Signed-off-by: Shiyang Ruan <[email protected]>

Makes sense.
Reviewed-by: Darrick J. Wong <[email protected]>

--D

> ---
> fs/dax.c | 19 +++++++------------
> fs/xfs/xfs_iomap.c | 2 +-
> 2 files changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 6b6e07ad8d80..f1eb59bee0b5 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -846,12 +846,6 @@ static bool dax_fault_is_synchronous(const struct iomap_iter *iter,
> (iter->iomap.flags & IOMAP_F_DIRTY);
> }
>
> -static bool dax_fault_is_cow(const struct iomap_iter *iter)
> -{
> - return (iter->flags & IOMAP_WRITE) &&
> - (iter->iomap.flags & IOMAP_F_SHARED);
> -}
> -
> /*
> * By this point grab_mapping_entry() has ensured that we have a locked entry
> * of the appropriate size so we don't have to worry about downgrading PMDs to
> @@ -865,13 +859,14 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf,
> {
> struct address_space *mapping = vmf->vma->vm_file->f_mapping;
> void *new_entry = dax_make_entry(pfn, flags);
> - bool dirty = !dax_fault_is_synchronous(iter, vmf->vma);
> - bool cow = dax_fault_is_cow(iter);
> + bool write = iter->flags & IOMAP_WRITE;
> + bool dirty = write && !dax_fault_is_synchronous(iter, vmf->vma);
> + bool shared = iter->iomap.flags & IOMAP_F_SHARED;
>
> if (dirty)
> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>
> - if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
> + if (shared || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
> unsigned long index = xas->xa_index;
> /* we are replacing a zero page with block mapping */
> if (dax_is_pmd_entry(entry))
> @@ -883,12 +878,12 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf,
>
> xas_reset(xas);
> xas_lock_irq(xas);
> - if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
> + if (shared || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
> void *old;
>
> dax_disassociate_entry(entry, mapping, false);
> dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address,
> - cow);
> + shared);
> /*
> * Only swap our new entry into the page cache if the current
> * entry is a zero page or an empty entry. If a normal PTE or
> @@ -908,7 +903,7 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf,
> if (dirty)
> xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
>
> - if (cow)
> + if (write && shared)
> xas_set_mark(xas, PAGECACHE_TAG_TOWRITE);
>
> xas_unlock_irq(xas);
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 07da03976ec1..881de99766ca 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1215,7 +1215,7 @@ xfs_read_iomap_begin(
> return error;
> error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> &nimaps, 0);
> - if (!error && (flags & IOMAP_REPORT))
> + if (!error && ((flags & IOMAP_REPORT) || IS_DAX(inode)))
> error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> xfs_iunlock(ip, lockmode);
>
> --
> 2.38.1
>

2022-12-02 01:04:00

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH v2 5/8] fsdax: dedupe: iter two files at the same time

On Thu, Dec 01, 2022 at 03:31:41PM +0000, Shiyang Ruan wrote:
> The iomap_iter() on a range of one file may loop more than once. In
> this case, the inner dst_iter can update its iomap but the outer
> src_iter can't. This may cause the wrong remapping in filesystem. Let
> them called at the same time.
>
> Signed-off-by: Shiyang Ruan <[email protected]>

Thank you for adding that explanation, it makes the problem much more
obvious. :)

Reviewed-by: Darrick J. Wong <[email protected]>

--D

> ---
> fs/dax.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index f1eb59bee0b5..354be56750c2 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1964,15 +1964,15 @@ int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
> .len = len,
> .flags = IOMAP_DAX,
> };
> - int ret;
> + int ret, compared = 0;
>
> - while ((ret = iomap_iter(&src_iter, ops)) > 0) {
> - while ((ret = iomap_iter(&dst_iter, ops)) > 0) {
> - dst_iter.processed = dax_range_compare_iter(&src_iter,
> - &dst_iter, len, same);
> - }
> - if (ret <= 0)
> - src_iter.processed = ret;
> + while ((ret = iomap_iter(&src_iter, ops)) > 0 &&
> + (ret = iomap_iter(&dst_iter, ops)) > 0) {
> + compared = dax_range_compare_iter(&src_iter, &dst_iter, len,
> + same);
> + if (compared < 0)
> + return ret;
> + src_iter.processed = dst_iter.processed = compared;
> }
> return ret;
> }
> --
> 2.38.1
>

2022-12-02 01:06:37

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH v2 6/8] xfs: use dax ops for zero and truncate in fsdax mode

On Thu, Dec 01, 2022 at 03:32:10PM +0000, Shiyang Ruan wrote:
> Zero and truncate on a dax file may execute CoW. So use dax ops which
> contains end work for CoW.
>
> Signed-off-by: Shiyang Ruan <[email protected]>

LGTM
Reviewed-by: Darrick J. Wong <[email protected]>

--D

> ---
> fs/xfs/xfs_iomap.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 881de99766ca..d9401d0300ad 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1370,7 +1370,7 @@ xfs_zero_range(
>
> if (IS_DAX(inode))
> return dax_zero_range(inode, pos, len, did_zero,
> - &xfs_direct_write_iomap_ops);
> + &xfs_dax_write_iomap_ops);
> return iomap_zero_range(inode, pos, len, did_zero,
> &xfs_buffered_write_iomap_ops);
> }
> @@ -1385,7 +1385,7 @@ xfs_truncate_page(
>
> if (IS_DAX(inode))
> return dax_truncate_page(inode, pos, did_zero,
> - &xfs_direct_write_iomap_ops);
> + &xfs_dax_write_iomap_ops);
> return iomap_truncate_page(inode, pos, did_zero,
> &xfs_buffered_write_iomap_ops);
> }
> --
> 2.38.1
>

2022-12-02 01:07:23

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] xfs: remove restrictions for fsdax and reflink

On Thu, Dec 01, 2022 at 03:32:53PM +0000, Shiyang Ruan wrote:
> Since the basic function for fsdax and reflink has been implemented,
> remove the restrictions of them for widly test.
>
> Signed-off-by: Shiyang Ruan <[email protected]>

Looks ok,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

> ---
> fs/xfs/xfs_ioctl.c | 4 ----
> fs/xfs/xfs_iops.c | 4 ----
> 2 files changed, 8 deletions(-)
>
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 1f783e979629..13f1b2add390 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1138,10 +1138,6 @@ xfs_ioctl_setattr_xflags(
> if ((fa->fsx_xflags & FS_XFLAG_REALTIME) && xfs_is_reflink_inode(ip))
> ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK;
>
> - /* Don't allow us to set DAX mode for a reflinked file for now. */
> - if ((fa->fsx_xflags & FS_XFLAG_DAX) && xfs_is_reflink_inode(ip))
> - return -EINVAL;
> -
> /* diflags2 only valid for v3 inodes. */
> i_flags2 = xfs_flags2diflags2(ip, fa->fsx_xflags);
> if (i_flags2 && !xfs_has_v3inodes(mp))
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 2e10e1c66ad6..bf0495f7a5e1 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1185,10 +1185,6 @@ xfs_inode_supports_dax(
> if (!S_ISREG(VFS_I(ip)->i_mode))
> return false;
>
> - /* Only supported on non-reflinked files. */
> - if (xfs_is_reflink_inode(ip))
> - return false;
> -
> /* Block size must match page size */
> if (mp->m_sb.sb_blocksize != PAGE_SIZE)
> return false;
> --
> 2.38.1
>

2022-12-02 01:11:05

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 3/8] fsdax: zero the edges if source is HOLE or UNWRITTEN

On Thu, 1 Dec 2022 15:58:11 -0800 "Darrick J. Wong" <[email protected]> wrote:

> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1092,7 +1092,7 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
> > }
> >
> > /**
> > - * dax_iomap_cow_copy - Copy the data from source to destination before write
> > + * dax_iomap_copy_around - Copy the data from source to destination before write
>
> * dax_iomap_copy_around - Prepare for an unaligned write to a
> * shared/cow page by copying the data before and after the range to be
> * written.

Thanks, I added this:

--- a/fs/dax.c~fsdax-zero-the-edges-if-source-is-hole-or-unwritten-fix
+++ a/fs/dax.c
@@ -1092,7 +1092,8 @@ out:
}

/**
- * dax_iomap_copy_around - Copy the data from source to destination before write
+ * dax_iomap_copy_around - Prepare for an unaligned write to a shared/cow page
+ * by copying the data before and after the range to be written.
* @pos: address to do copy from.
* @length: size of copy operation.
* @align_size: aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
_

2022-12-02 10:12:54

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2.1 1/8] fsdax: introduce page->share for fsdax in reflink mode

fsdax page is used not only when CoW, but also mapread. To make the it
easily understood, use 'share' to indicate that the dax page is shared
by more than one extent. And add helper functions to use it.

Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.

Signed-off-by: Shiyang Ruan <[email protected]>
---
fs/dax.c | 38 ++++++++++++++++++++++----------------
include/linux/mm_types.h | 5 ++++-
include/linux/page-flags.h | 2 +-
3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 1c6867810cbd..edbacb273ab5 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -334,35 +334,41 @@ static unsigned long dax_end_pfn(void *entry)
for (pfn = dax_to_pfn(entry); \
pfn < dax_end_pfn(entry); pfn++)

-static inline bool dax_mapping_is_cow(struct address_space *mapping)
+static inline bool dax_page_is_shared(struct page *page)
{
- return (unsigned long)mapping == PAGE_MAPPING_DAX_COW;
+ return (unsigned long)page->mapping == PAGE_MAPPING_DAX_SHARED;
}

/*
- * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount.
+ * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase the
+ * refcount.
*/
-static inline void dax_mapping_set_cow(struct page *page)
+static inline void dax_page_bump_sharing(struct page *page)
{
- if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
+ if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_SHARED) {
/*
* Reset the index if the page was already mapped
* regularly before.
*/
if (page->mapping)
- page->index = 1;
- page->mapping = (void *)PAGE_MAPPING_DAX_COW;
+ page->share = 1;
+ page->mapping = (void *)PAGE_MAPPING_DAX_SHARED;
}
- page->index++;
+ page->share++;
+}
+
+static inline unsigned long dax_page_drop_sharing(struct page *page)
+{
+ return --page->share;
}

/*
- * When it is called in dax_insert_entry(), the cow flag will indicate that
+ * When it is called in dax_insert_entry(), the shared flag will indicate that
* whether this entry is shared by multiple files. If so, set the page->mapping
- * FS_DAX_MAPPING_COW, and use page->index as refcount.
+ * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
*/
static void dax_associate_entry(void *entry, struct address_space *mapping,
- struct vm_area_struct *vma, unsigned long address, bool cow)
+ struct vm_area_struct *vma, unsigned long address, bool shared)
{
unsigned long size = dax_entry_size(entry), pfn, index;
int i = 0;
@@ -374,8 +380,8 @@ static void dax_associate_entry(void *entry, struct address_space *mapping,
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);

- if (cow) {
- dax_mapping_set_cow(page);
+ if (shared) {
+ dax_page_bump_sharing(page);
} else {
WARN_ON_ONCE(page->mapping);
page->mapping = mapping;
@@ -396,9 +402,9 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
struct page *page = pfn_to_page(pfn);

WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
- if (dax_mapping_is_cow(page->mapping)) {
- /* keep the CoW flag if this page is still shared */
- if (page->index-- > 0)
+ if (dax_page_is_shared(page)) {
+ /* keep the shared flag if this page is still shared */
+ if (dax_page_drop_sharing(page) > 0)
continue;
} else
WARN_ON_ONCE(page->mapping && page->mapping != mapping);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 500e536796ca..f46cac3657ad 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -103,7 +103,10 @@ struct page {
};
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
- pgoff_t index; /* Our offset within mapping. */
+ union {
+ pgoff_t index; /* Our offset within mapping. */
+ unsigned long share; /* share count for fsdax */
+ };
/**
* @private: Mapping-private opaque data.
* Usually used for buffer_heads if PagePrivate.
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0b0ae5084e60..c8a3aa02278d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -641,7 +641,7 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
* Different with flags above, this flag is used only for fsdax mode. It
* indicates that this page->mapping is now under reflink case.
*/
-#define PAGE_MAPPING_DAX_COW 0x1
+#define PAGE_MAPPING_DAX_SHARED 0x1

static __always_inline bool folio_mapping_flags(struct folio *folio)
{
--
2.38.1

2022-12-02 10:54:44

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2.1 3/8] fsdax: zero the edges if source is HOLE or UNWRITTEN

If srcmap contains invalid data, such as HOLE and UNWRITTEN, the dest
page should be zeroed. Otherwise, since it's a pmem, old data may
remains on the dest page, the result of CoW will be incorrect.

The function name is also not easy to understand, rename it to
"dax_iomap_copy_around()", which means it copys data around the range.

Signed-off-by: Shiyang Ruan <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/dax.c | 79 +++++++++++++++++++++++++++++++++++---------------------
1 file changed, 49 insertions(+), 30 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index a77739f2abe7..f12645d6f3c8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1092,7 +1092,8 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
}

/**
- * dax_iomap_cow_copy - Copy the data from source to destination before write
+ * dax_iomap_copy_around - Prepare for an unaligned write to a shared/cow page
+ * by copying the data before and after the range to be written.
* @pos: address to do copy from.
* @length: size of copy operation.
* @align_size: aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
@@ -1101,35 +1102,50 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
*
* This can be called from two places. Either during DAX write fault (page
* aligned), to copy the length size data to daddr. Or, while doing normal DAX
- * write operation, dax_iomap_actor() might call this to do the copy of either
+ * write operation, dax_iomap_iter() might call this to do the copy of either
* start or end unaligned address. In the latter case the rest of the copy of
- * aligned ranges is taken care by dax_iomap_actor() itself.
+ * aligned ranges is taken care by dax_iomap_iter() itself.
+ * If the srcmap contains invalid data, such as HOLE and UNWRITTEN, zero the
+ * area to make sure no old data remains.
*/
-static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size,
+static int dax_iomap_copy_around(loff_t pos, uint64_t length, size_t align_size,
const struct iomap *srcmap, void *daddr)
{
loff_t head_off = pos & (align_size - 1);
size_t size = ALIGN(head_off + length, align_size);
loff_t end = pos + length;
loff_t pg_end = round_up(end, align_size);
+ /* copy_all is usually in page fault case */
bool copy_all = head_off == 0 && end == pg_end;
+ /* zero the edges if srcmap is a HOLE or IOMAP_UNWRITTEN */
+ bool zero_edge = srcmap->flags & IOMAP_F_SHARED ||
+ srcmap->type == IOMAP_UNWRITTEN;
void *saddr = 0;
int ret = 0;

- ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
- if (ret)
- return ret;
+ if (!zero_edge) {
+ ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
+ if (ret)
+ return ret;
+ }

if (copy_all) {
- ret = copy_mc_to_kernel(daddr, saddr, length);
- return ret ? -EIO : 0;
+ if (zero_edge)
+ memset(daddr, 0, size);
+ else
+ ret = copy_mc_to_kernel(daddr, saddr, length);
+ goto out;
}

/* Copy the head part of the range */
if (head_off) {
- ret = copy_mc_to_kernel(daddr, saddr, head_off);
- if (ret)
- return -EIO;
+ if (zero_edge)
+ memset(daddr, 0, head_off);
+ else {
+ ret = copy_mc_to_kernel(daddr, saddr, head_off);
+ if (ret)
+ return -EIO;
+ }
}

/* Copy the tail part of the range */
@@ -1137,12 +1153,19 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size,
loff_t tail_off = head_off + length;
loff_t tail_len = pg_end - end;

- ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off,
- tail_len);
- if (ret)
- return -EIO;
+ if (zero_edge)
+ memset(daddr + tail_off, 0, tail_len);
+ else {
+ ret = copy_mc_to_kernel(daddr + tail_off,
+ saddr + tail_off, tail_len);
+ if (ret)
+ return -EIO;
+ }
}
- return 0;
+out:
+ if (zero_edge)
+ dax_flush(srcmap->dax_dev, daddr, size);
+ return ret ? -EIO : 0;
}

/*
@@ -1241,13 +1264,10 @@ static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size)
if (ret < 0)
return ret;
memset(kaddr + offset, 0, size);
- if (srcmap->addr != iomap->addr) {
- ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap,
- kaddr);
- if (ret < 0)
- return ret;
- dax_flush(iomap->dax_dev, kaddr, PAGE_SIZE);
- } else
+ if (iomap->flags & IOMAP_F_SHARED)
+ ret = dax_iomap_copy_around(pos, size, PAGE_SIZE, srcmap,
+ kaddr);
+ else
dax_flush(iomap->dax_dev, kaddr + offset, size);
return ret;
}
@@ -1401,8 +1421,8 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
}

if (cow) {
- ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
- kaddr);
+ ret = dax_iomap_copy_around(pos, length, PAGE_SIZE,
+ srcmap, kaddr);
if (ret)
break;
}
@@ -1547,7 +1567,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
struct xa_state *xas, void **entry, bool pmd)
{
const struct iomap *iomap = &iter->iomap;
- const struct iomap *srcmap = &iter->srcmap;
+ const struct iomap *srcmap = iomap_iter_srcmap(iter);
size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
bool write = iter->flags & IOMAP_WRITE;
@@ -1578,9 +1598,8 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,

*entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags);

- if (write &&
- srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) {
- err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr);
+ if (write && iomap->flags & IOMAP_F_SHARED) {
+ err = dax_iomap_copy_around(pos, size, size, srcmap, kaddr);
if (err)
return dax_fault_return(err);
}
--
2.38.1

2022-12-02 20:59:34

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2.1 1/8] fsdax: introduce page->share for fsdax in reflink mode

On Fri, 2 Dec 2022 09:23:11 +0000 Shiyang Ruan <[email protected]> wrote:

> fsdax page is used not only when CoW, but also mapread. To make the it
> easily understood, use 'share' to indicate that the dax page is shared
> by more than one extent. And add helper functions to use it.
>
> Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.
>

For those who are wondering what changed, I queued the below incremental
patch.


From: Shiyang Ruan <[email protected]>
Subject: fsdax: introduce page->share for fsdax in reflink mode
Date: Fri, 2 Dec 2022 09:23:11 +0000

rename several functions

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shiyang Ruan <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

fs/dax.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

--- a/fs/dax.c~fsdax-introduce-page-share-for-fsdax-in-reflink-mode-fix
+++ a/fs/dax.c
@@ -334,7 +334,7 @@ static unsigned long dax_end_pfn(void *e
for (pfn = dax_to_pfn(entry); \
pfn < dax_end_pfn(entry); pfn++)

-static inline bool dax_mapping_is_shared(struct page *page)
+static inline bool dax_page_is_shared(struct page *page)
{
return (unsigned long)page->mapping == PAGE_MAPPING_DAX_SHARED;
}
@@ -343,7 +343,7 @@ static inline bool dax_mapping_is_shared
* Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase the
* refcount.
*/
-static inline void dax_mapping_set_shared(struct page *page)
+static inline void dax_page_bump_sharing(struct page *page)
{
if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_SHARED) {
/*
@@ -357,7 +357,7 @@ static inline void dax_mapping_set_share
page->share++;
}

-static inline unsigned long dax_mapping_decrease_shared(struct page *page)
+static inline unsigned long dax_page_drop_sharing(struct page *page)
{
return --page->share;
}
@@ -381,7 +381,7 @@ static void dax_associate_entry(void *en
struct page *page = pfn_to_page(pfn);

if (shared) {
- dax_mapping_set_shared(page);
+ dax_page_bump_sharing(page);
} else {
WARN_ON_ONCE(page->mapping);
page->mapping = mapping;
@@ -402,9 +402,9 @@ static void dax_disassociate_entry(void
struct page *page = pfn_to_page(pfn);

WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
- if (dax_mapping_is_shared(page)) {
+ if (dax_page_is_shared(page)) {
/* keep the shared flag if this page is still shared */
- if (dax_mapping_decrease_shared(page) > 0)
+ if (dax_page_drop_sharing(page) > 0)
continue;
} else
WARN_ON_ONCE(page->mapping && page->mapping != mapping);
_

2022-12-03 00:33:07

by Allison Henderson

[permalink] [raw]
Subject: Re: [PATCH v2.1 3/8] fsdax: zero the edges if source is HOLE or UNWRITTEN

On Fri, 2022-12-02 at 09:25 +0000, Shiyang Ruan wrote:
> If srcmap contains invalid data, such as HOLE and UNWRITTEN, the dest
> page should be zeroed.  Otherwise, since it's a pmem, old data may
> remains on the dest page, the result of CoW will be incorrect.
>
> The function name is also not easy to understand, rename it to
> "dax_iomap_copy_around()", which means it copys data around the
> range.
>
> Signed-off-by: Shiyang Ruan <[email protected]>
> Reviewed-by: Darrick J. Wong <[email protected]>
>
I think the new changes look good
Reviewed-by: Allison Henderson <[email protected]>

> ---
>  fs/dax.c | 79 +++++++++++++++++++++++++++++++++++-------------------
> --
>  1 file changed, 49 insertions(+), 30 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index a77739f2abe7..f12645d6f3c8 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1092,7 +1092,8 @@ static int dax_iomap_direct_access(const struct
> iomap *iomap, loff_t pos,
>  }
>  
>  /**
> - * dax_iomap_cow_copy - Copy the data from source to destination
> before write
> + * dax_iomap_copy_around - Prepare for an unaligned write to a
> shared/cow page
> + * by copying the data before and after the range to be written.
>   * @pos:       address to do copy from.
>   * @length:    size of copy operation.
>   * @align_size:        aligned w.r.t align_size (either PMD_SIZE or
> PAGE_SIZE)
> @@ -1101,35 +1102,50 @@ static int dax_iomap_direct_access(const
> struct iomap *iomap, loff_t pos,
>   *
>   * This can be called from two places. Either during DAX write fault
> (page
>   * aligned), to copy the length size data to daddr. Or, while doing
> normal DAX
> - * write operation, dax_iomap_actor() might call this to do the copy
> of either
> + * write operation, dax_iomap_iter() might call this to do the copy
> of either
>   * start or end unaligned address. In the latter case the rest of
> the copy of
> - * aligned ranges is taken care by dax_iomap_actor() itself.
> + * aligned ranges is taken care by dax_iomap_iter() itself.
> + * If the srcmap contains invalid data, such as HOLE and UNWRITTEN,
> zero the
> + * area to make sure no old data remains.
>   */
> -static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t
> align_size,
> +static int dax_iomap_copy_around(loff_t pos, uint64_t length, size_t
> align_size,
>                 const struct iomap *srcmap, void *daddr)
>  {
>         loff_t head_off = pos & (align_size - 1);
>         size_t size = ALIGN(head_off + length, align_size);
>         loff_t end = pos + length;
>         loff_t pg_end = round_up(end, align_size);
> +       /* copy_all is usually in page fault case */
>         bool copy_all = head_off == 0 && end == pg_end;
> +       /* zero the edges if srcmap is a HOLE or IOMAP_UNWRITTEN */
> +       bool zero_edge = srcmap->flags & IOMAP_F_SHARED ||
> +                        srcmap->type == IOMAP_UNWRITTEN;
>         void *saddr = 0;
>         int ret = 0;
>  
> -       ret = dax_iomap_direct_access(srcmap, pos, size, &saddr,
> NULL);
> -       if (ret)
> -               return ret;
> +       if (!zero_edge) {
> +               ret = dax_iomap_direct_access(srcmap, pos, size,
> &saddr, NULL);
> +               if (ret)
> +                       return ret;
> +       }
>  
>         if (copy_all) {
> -               ret = copy_mc_to_kernel(daddr, saddr, length);
> -               return ret ? -EIO : 0;
> +               if (zero_edge)
> +                       memset(daddr, 0, size);
> +               else
> +                       ret = copy_mc_to_kernel(daddr, saddr,
> length);
> +               goto out;
>         }
>  
>         /* Copy the head part of the range */
>         if (head_off) {
> -               ret = copy_mc_to_kernel(daddr, saddr, head_off);
> -               if (ret)
> -                       return -EIO;
> +               if (zero_edge)
> +                       memset(daddr, 0, head_off);
> +               else {
> +                       ret = copy_mc_to_kernel(daddr, saddr,
> head_off);
> +                       if (ret)
> +                               return -EIO;
> +               }
>         }
>  
>         /* Copy the tail part of the range */
> @@ -1137,12 +1153,19 @@ static int dax_iomap_cow_copy(loff_t pos,
> uint64_t length, size_t align_size,
>                 loff_t tail_off = head_off + length;
>                 loff_t tail_len = pg_end - end;
>  
> -               ret = copy_mc_to_kernel(daddr + tail_off, saddr +
> tail_off,
> -                                       tail_len);
> -               if (ret)
> -                       return -EIO;
> +               if (zero_edge)
> +                       memset(daddr + tail_off, 0, tail_len);
> +               else {
> +                       ret = copy_mc_to_kernel(daddr + tail_off,
> +                                               saddr + tail_off,
> tail_len);
> +                       if (ret)
> +                               return -EIO;
> +               }
>         }
> -       return 0;
> +out:
> +       if (zero_edge)
> +               dax_flush(srcmap->dax_dev, daddr, size);
> +       return ret ? -EIO : 0;
>  }
>  
>  /*
> @@ -1241,13 +1264,10 @@ static int dax_memzero(struct iomap_iter
> *iter, loff_t pos, size_t size)
>         if (ret < 0)
>                 return ret;
>         memset(kaddr + offset, 0, size);
> -       if (srcmap->addr != iomap->addr) {
> -               ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE,
> srcmap,
> -                                        kaddr);
> -               if (ret < 0)
> -                       return ret;
> -               dax_flush(iomap->dax_dev, kaddr, PAGE_SIZE);
> -       } else
> +       if (iomap->flags & IOMAP_F_SHARED)
> +               ret = dax_iomap_copy_around(pos, size, PAGE_SIZE,
> srcmap,
> +                                           kaddr);
> +       else
>                 dax_flush(iomap->dax_dev, kaddr + offset, size);
>         return ret;
>  }
> @@ -1401,8 +1421,8 @@ static loff_t dax_iomap_iter(const struct
> iomap_iter *iomi,
>                 }
>  
>                 if (cow) {
> -                       ret = dax_iomap_cow_copy(pos, length,
> PAGE_SIZE, srcmap,
> -                                                kaddr);
> +                       ret = dax_iomap_copy_around(pos, length,
> PAGE_SIZE,
> +                                                   srcmap, kaddr);
>                         if (ret)
>                                 break;
>                 }
> @@ -1547,7 +1567,7 @@ static vm_fault_t dax_fault_iter(struct
> vm_fault *vmf,
>                 struct xa_state *xas, void **entry, bool pmd)
>  {
>         const struct iomap *iomap = &iter->iomap;
> -       const struct iomap *srcmap = &iter->srcmap;
> +       const struct iomap *srcmap = iomap_iter_srcmap(iter);
>         size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
>         loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
>         bool write = iter->flags & IOMAP_WRITE;
> @@ -1578,9 +1598,8 @@ static vm_fault_t dax_fault_iter(struct
> vm_fault *vmf,
>  
>         *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn,
> entry_flags);
>  
> -       if (write &&
> -           srcmap->type != IOMAP_HOLE && srcmap->addr != iomap-
> >addr) {
> -               err = dax_iomap_cow_copy(pos, size, size, srcmap,
> kaddr);
> +       if (write && iomap->flags & IOMAP_F_SHARED) {
> +               err = dax_iomap_copy_around(pos, size, size, srcmap,
> kaddr);
>                 if (err)
>                         return dax_fault_return(err);
>         }

2022-12-03 00:37:02

by Allison Henderson

[permalink] [raw]
Subject: Re: [PATCH v2.1 1/8] fsdax: introduce page->share for fsdax in reflink mode

On Fri, 2022-12-02 at 09:23 +0000, Shiyang Ruan wrote:
> fsdax page is used not only when CoW, but also mapread. To make the
> it
> easily understood, use 'share' to indicate that the dax page is
> shared
> by more than one extent.  And add helper functions to use it.
>
> Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.
>
The new changes look reasonable to me
Reviewed-by: Allison Henderson <[email protected]>

> Signed-off-by: Shiyang Ruan <[email protected]>
> ---
>  fs/dax.c                   | 38 ++++++++++++++++++++++--------------
> --
>  include/linux/mm_types.h   |  5 ++++-
>  include/linux/page-flags.h |  2 +-
>  3 files changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 1c6867810cbd..edbacb273ab5 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -334,35 +334,41 @@ static unsigned long dax_end_pfn(void *entry)
>         for (pfn = dax_to_pfn(entry); \
>                         pfn < dax_end_pfn(entry); pfn++)
>  
> -static inline bool dax_mapping_is_cow(struct address_space *mapping)
> +static inline bool dax_page_is_shared(struct page *page)
>  {
> -       return (unsigned long)mapping == PAGE_MAPPING_DAX_COW;
> +       return (unsigned long)page->mapping ==
> PAGE_MAPPING_DAX_SHARED;
>  }
>  
>  /*
> - * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the
> refcount.
> + * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase
> the
> + * refcount.
>   */
> -static inline void dax_mapping_set_cow(struct page *page)
> +static inline void dax_page_bump_sharing(struct page *page)
>  {
> -       if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
> +       if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_SHARED) {
>                 /*
>                  * Reset the index if the page was already mapped
>                  * regularly before.
>                  */
>                 if (page->mapping)
> -                       page->index = 1;
> -               page->mapping = (void *)PAGE_MAPPING_DAX_COW;
> +                       page->share = 1;
> +               page->mapping = (void *)PAGE_MAPPING_DAX_SHARED;
>         }
> -       page->index++;
> +       page->share++;
> +}
> +
> +static inline unsigned long dax_page_drop_sharing(struct page *page)
> +{
> +       return --page->share;
>  }
>  
>  /*
> - * When it is called in dax_insert_entry(), the cow flag will
> indicate that
> + * When it is called in dax_insert_entry(), the shared flag will
> indicate that
>   * whether this entry is shared by multiple files.  If so, set the
> page->mapping
> - * FS_DAX_MAPPING_COW, and use page->index as refcount.
> + * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
>   */
>  static void dax_associate_entry(void *entry, struct address_space
> *mapping,
> -               struct vm_area_struct *vma, unsigned long address,
> bool cow)
> +               struct vm_area_struct *vma, unsigned long address,
> bool shared)
>  {
>         unsigned long size = dax_entry_size(entry), pfn, index;
>         int i = 0;
> @@ -374,8 +380,8 @@ static void dax_associate_entry(void *entry,
> struct address_space *mapping,
>         for_each_mapped_pfn(entry, pfn) {
>                 struct page *page = pfn_to_page(pfn);
>  
> -               if (cow) {
> -                       dax_mapping_set_cow(page);
> +               if (shared) {
> +                       dax_page_bump_sharing(page);
>                 } else {
>                         WARN_ON_ONCE(page->mapping);
>                         page->mapping = mapping;
> @@ -396,9 +402,9 @@ static void dax_disassociate_entry(void *entry,
> struct address_space *mapping,
>                 struct page *page = pfn_to_page(pfn);
>  
>                 WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
> -               if (dax_mapping_is_cow(page->mapping)) {
> -                       /* keep the CoW flag if this page is still
> shared */
> -                       if (page->index-- > 0)
> +               if (dax_page_is_shared(page)) {
> +                       /* keep the shared flag if this page is still
> shared */
> +                       if (dax_page_drop_sharing(page) > 0)
>                                 continue;
>                 } else
>                         WARN_ON_ONCE(page->mapping && page->mapping
> != mapping);
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 500e536796ca..f46cac3657ad 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -103,7 +103,10 @@ struct page {
>                         };
>                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
>                         struct address_space *mapping;
> -                       pgoff_t index;          /* Our offset within
> mapping. */
> +                       union {
> +                               pgoff_t index;          /* Our offset
> within mapping. */
> +                               unsigned long share;    /* share
> count for fsdax */
> +                       };
>                         /**
>                          * @private: Mapping-private opaque data.
>                          * Usually used for buffer_heads if
> PagePrivate.
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 0b0ae5084e60..c8a3aa02278d 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -641,7 +641,7 @@ PAGEFLAG_FALSE(VmemmapSelfHosted,
> vmemmap_self_hosted)
>   * Different with flags above, this flag is used only for fsdax
> mode.  It
>   * indicates that this page->mapping is now under reflink case.
>   */
> -#define PAGE_MAPPING_DAX_COW   0x1
> +#define PAGE_MAPPING_DAX_SHARED        0x1
>  
>  static __always_inline bool folio_mapping_flags(struct folio *folio)
>  {

2022-12-03 02:15:59

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v2.1 1/8] fsdax: introduce page->share for fsdax in reflink mode

Shiyang Ruan wrote:
> fsdax page is used not only when CoW, but also mapread. To make the it
> easily understood, use 'share' to indicate that the dax page is shared
> by more than one extent. And add helper functions to use it.
>
> Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.
>
> Signed-off-by: Shiyang Ruan <[email protected]>
> ---
> fs/dax.c | 38 ++++++++++++++++++++++----------------
> include/linux/mm_types.h | 5 ++++-
> include/linux/page-flags.h | 2 +-
> 3 files changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 1c6867810cbd..edbacb273ab5 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -334,35 +334,41 @@ static unsigned long dax_end_pfn(void *entry)
> for (pfn = dax_to_pfn(entry); \
> pfn < dax_end_pfn(entry); pfn++)
>
> -static inline bool dax_mapping_is_cow(struct address_space *mapping)
> +static inline bool dax_page_is_shared(struct page *page)
> {
> - return (unsigned long)mapping == PAGE_MAPPING_DAX_COW;
> + return (unsigned long)page->mapping == PAGE_MAPPING_DAX_SHARED;
> }
>
> /*
> - * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount.
> + * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase the
> + * refcount.
> */
> -static inline void dax_mapping_set_cow(struct page *page)
> +static inline void dax_page_bump_sharing(struct page *page)

Similar to page_ref naming I would call this page_share_get() and the
corresponding function page_share_put().

> {
> - if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
> + if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_SHARED) {
> /*
> * Reset the index if the page was already mapped
> * regularly before.
> */
> if (page->mapping)
> - page->index = 1;
> - page->mapping = (void *)PAGE_MAPPING_DAX_COW;
> + page->share = 1;
> + page->mapping = (void *)PAGE_MAPPING_DAX_SHARED;

Small nit, You could save a cast here by defining
PAGE_MAPPING_DAX_SHARED as "((void *) 1)".

> }
> - page->index++;
> + page->share++;
> +}
> +
> +static inline unsigned long dax_page_drop_sharing(struct page *page)
> +{
> + return --page->share;
> }
>
> /*
> - * When it is called in dax_insert_entry(), the cow flag will indicate that
> + * When it is called in dax_insert_entry(), the shared flag will indicate that
> * whether this entry is shared by multiple files. If so, set the page->mapping
> - * FS_DAX_MAPPING_COW, and use page->index as refcount.
> + * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
> */
> static void dax_associate_entry(void *entry, struct address_space *mapping,
> - struct vm_area_struct *vma, unsigned long address, bool cow)
> + struct vm_area_struct *vma, unsigned long address, bool shared)
> {
> unsigned long size = dax_entry_size(entry), pfn, index;
> int i = 0;
> @@ -374,8 +380,8 @@ static void dax_associate_entry(void *entry, struct address_space *mapping,
> for_each_mapped_pfn(entry, pfn) {
> struct page *page = pfn_to_page(pfn);
>
> - if (cow) {
> - dax_mapping_set_cow(page);
> + if (shared) {
> + dax_page_bump_sharing(page);
> } else {
> WARN_ON_ONCE(page->mapping);
> page->mapping = mapping;
> @@ -396,9 +402,9 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
> struct page *page = pfn_to_page(pfn);
>
> WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
> - if (dax_mapping_is_cow(page->mapping)) {
> - /* keep the CoW flag if this page is still shared */
> - if (page->index-- > 0)
> + if (dax_page_is_shared(page)) {
> + /* keep the shared flag if this page is still shared */
> + if (dax_page_drop_sharing(page) > 0)
> continue;

I think part of what makes this hard to read is trying to preserve the
same code paths for shared pages and typical pages.

page_share_put() should, in addition to decrementing the share, clear
out page->mapping value.

> } else
> WARN_ON_ONCE(page->mapping && page->mapping != mapping);
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 500e536796ca..f46cac3657ad 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -103,7 +103,10 @@ struct page {
> };
> /* See page-flags.h for PAGE_MAPPING_FLAGS */
> struct address_space *mapping;
> - pgoff_t index; /* Our offset within mapping. */
> + union {
> + pgoff_t index; /* Our offset within mapping. */
> + unsigned long share; /* share count for fsdax */
> + };
> /**
> * @private: Mapping-private opaque data.
> * Usually used for buffer_heads if PagePrivate.
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 0b0ae5084e60..c8a3aa02278d 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -641,7 +641,7 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
> * Different with flags above, this flag is used only for fsdax mode. It
> * indicates that this page->mapping is now under reflink case.
> */
> -#define PAGE_MAPPING_DAX_COW 0x1
> +#define PAGE_MAPPING_DAX_SHARED 0x1
>
> static __always_inline bool folio_mapping_flags(struct folio *folio)
> {
> --
> 2.38.1
>


2022-12-03 02:18:00

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v2 0/8] fsdax,xfs: fix warning messages

Shiyang Ruan wrote:
> Changes since v1:
> 1. Added a snippet of the warning message and some of the failed cases
> 2. Separated the patch for easily review
> 3. Added page->share and its helper functions
> 4. Included the patch[1] that removes the restrictions of fsdax and reflink
> [1] https://lore.kernel.org/linux-xfs/[email protected]/
>
> Many testcases failed in dax+reflink mode with warning message in dmesg.
> Such as generic/051,075,127. The warning message is like this:
> [ 775.509337] ------------[ cut here ]------------
> [ 775.509636] WARNING: CPU: 1 PID: 16815 at fs/dax.c:386 dax_insert_entry.cold+0x2e/0x69
> [ 775.510151] Modules linked in: auth_rpcgss oid_registry nfsv4 algif_hash af_alg af_packet nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables x_tables dax_pmem nd_pmem nd_btt sch_fq_codel configfs xfs libcrc32c fuse
> [ 775.524288] CPU: 1 PID: 16815 Comm: fsx Kdump: loaded Tainted: G W 6.1.0-rc4+ #164 eb34e4ee4200c7cbbb47de2b1892c5a3e027fd6d
> [ 775.524904] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.0-3-3 04/01/2014
> [ 775.525460] RIP: 0010:dax_insert_entry.cold+0x2e/0x69
> [ 775.525797] Code: c7 c7 18 eb e0 81 48 89 4c 24 20 48 89 54 24 10 e8 73 6d ff ff 48 83 7d 18 00 48 8b 54 24 10 48 8b 4c 24 20 0f 84 e3 e9 b9 ff <0f> 0b e9 dc e9 b9 ff 48 c7 c6 a0 20 c3 81 48 c7 c7 f0 ea e0 81 48
> [ 775.526708] RSP: 0000:ffffc90001d57b30 EFLAGS: 00010082
> [ 775.527042] RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000042
> [ 775.527396] RDX: ffffea000a0f6c80 RSI: ffffffff81dfab1b RDI: 00000000ffffffff
> [ 775.527819] RBP: ffffea000a0f6c40 R08: 0000000000000000 R09: ffffffff820625e0
> [ 775.528241] R10: ffffc90001d579d8 R11: ffffffff820d2628 R12: ffff88815fc98320
> [ 775.528598] R13: ffffc90001d57c18 R14: 0000000000000000 R15: 0000000000000001
> [ 775.528997] FS: 00007f39fc75d740(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000
> [ 775.529474] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 775.529800] CR2: 00007f39fc772040 CR3: 0000000107eb6001 CR4: 00000000003706e0
> [ 775.530214] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 775.530592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 775.531002] Call Trace:
> [ 775.531230] <TASK>
> [ 775.531444] dax_fault_iter+0x267/0x6c0
> [ 775.531719] dax_iomap_pte_fault+0x198/0x3d0
> [ 775.532002] __xfs_filemap_fault+0x24a/0x2d0 [xfs aa8d25411432b306d9554da38096f4ebb86bdfe7]
> [ 775.532603] __do_fault+0x30/0x1e0
> [ 775.532903] do_fault+0x314/0x6c0
> [ 775.533166] __handle_mm_fault+0x646/0x1250
> [ 775.533480] handle_mm_fault+0xc1/0x230
> [ 775.533810] do_user_addr_fault+0x1ac/0x610
> [ 775.534110] exc_page_fault+0x63/0x140
> [ 775.534389] asm_exc_page_fault+0x22/0x30
> [ 775.534678] RIP: 0033:0x7f39fc55820a
> [ 775.534950] Code: 00 01 00 00 00 74 99 83 f9 c0 0f 87 7b fe ff ff c5 fe 6f 4e 20 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 29 f9 <f3> a4 c4 c1 7e 7f 00 c4 c1 7e 7f 48 20 c5 f8 77 c3 0f 1f 44 00 00
> [ 775.535839] RSP: 002b:00007ffc66a08118 EFLAGS: 00010202
> [ 775.536157] RAX: 00007f39fc772001 RBX: 0000000000042001 RCX: 00000000000063c1
> [ 775.536537] RDX: 0000000000006400 RSI: 00007f39fac42050 RDI: 00007f39fc772040
> [ 775.536919] RBP: 0000000000006400 R08: 00007f39fc772001 R09: 0000000000042000
> [ 775.537304] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
> [ 775.537694] R13: 00007f39fc772000 R14: 0000000000006401 R15: 0000000000000003
> [ 775.538086] </TASK>
> [ 775.538333] ---[ end trace 0000000000000000 ]---
>
> This also effects dax+noreflink mode if we run the test after a
> dax+reflink test. So, the most urgent thing is solving the warning
> messages.
>
> With these fixes, most warning messages in dax_associate_entry() are
> gone. But honestly, generic/388 will randomly failed with the warning.
> The case shutdown the xfs when fsstress is running, and do it for many
> times. I think the reason is that dax pages in use are not able to be
> invalidated in time when fs is shutdown. The next time dax page to be
> associated, it still remains the mapping value set last time. I'll keep
> on solving it.

This one also sounds like it is going to be relevant for CXL PMEM, and
the improvements to the reference counting. CXL has a facility where the
driver asserts that no more writes are in-flight to the device so that
the device can assert a clean shutdown. Part of that will be making sure
that page access ends at fs shutdown.

> The warning message in dax_writeback_one() can also be fixed because of
> the dax unshare.
>
>
> Shiyang Ruan (8):
> fsdax: introduce page->share for fsdax in reflink mode
> fsdax: invalidate pages when CoW
> fsdax: zero the edges if source is HOLE or UNWRITTEN
> fsdax,xfs: set the shared flag when file extent is shared
> fsdax: dedupe: iter two files at the same time
> xfs: use dax ops for zero and truncate in fsdax mode
> fsdax,xfs: port unshare to fsdax
> xfs: remove restrictions for fsdax and reflink
>
> fs/dax.c | 220 +++++++++++++++++++++++++------------
> fs/xfs/xfs_ioctl.c | 4 -
> fs/xfs/xfs_iomap.c | 6 +-
> fs/xfs/xfs_iops.c | 4 -
> fs/xfs/xfs_reflink.c | 8 +-
> include/linux/dax.h | 2 +
> include/linux/mm_types.h | 5 +-
> include/linux/page-flags.h | 2 +-
> 8 files changed, 166 insertions(+), 85 deletions(-)
>
> --
> 2.38.1
>
>


2022-12-05 06:43:05

by Shiyang Ruan

[permalink] [raw]
Subject: Re: [PATCH v2.1 1/8] fsdax: introduce page->share for fsdax in reflink mode



在 2022/12/3 10:07, Dan Williams 写道:
> Shiyang Ruan wrote:
>> fsdax page is used not only when CoW, but also mapread. To make the it
>> easily understood, use 'share' to indicate that the dax page is shared
>> by more than one extent. And add helper functions to use it.
>>
>> Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.
>>
>> Signed-off-by: Shiyang Ruan <[email protected]>
>> ---
>> fs/dax.c | 38 ++++++++++++++++++++++----------------
>> include/linux/mm_types.h | 5 ++++-
>> include/linux/page-flags.h | 2 +-
>> 3 files changed, 27 insertions(+), 18 deletions(-)
>>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index 1c6867810cbd..edbacb273ab5 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -334,35 +334,41 @@ static unsigned long dax_end_pfn(void *entry)
>> for (pfn = dax_to_pfn(entry); \
>> pfn < dax_end_pfn(entry); pfn++)
>>
>> -static inline bool dax_mapping_is_cow(struct address_space *mapping)
>> +static inline bool dax_page_is_shared(struct page *page)
>> {
>> - return (unsigned long)mapping == PAGE_MAPPING_DAX_COW;
>> + return (unsigned long)page->mapping == PAGE_MAPPING_DAX_SHARED;
>> }
>>
>> /*
>> - * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount.
>> + * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase the
>> + * refcount.
>> */
>> -static inline void dax_mapping_set_cow(struct page *page)
>> +static inline void dax_page_bump_sharing(struct page *page)
>
> Similar to page_ref naming I would call this page_share_get() and the
> corresponding function page_share_put().
>
>> {
>> - if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
>> + if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_SHARED) {
>> /*
>> * Reset the index if the page was already mapped
>> * regularly before.
>> */
>> if (page->mapping)
>> - page->index = 1;
>> - page->mapping = (void *)PAGE_MAPPING_DAX_COW;
>> + page->share = 1;
>> + page->mapping = (void *)PAGE_MAPPING_DAX_SHARED;
>
> Small nit, You could save a cast here by defining
> PAGE_MAPPING_DAX_SHARED as "((void *) 1)".

Ok.

>
>> }
>> - page->index++;
>> + page->share++;
>> +}
>> +
>> +static inline unsigned long dax_page_drop_sharing(struct page *page)
>> +{
>> + return --page->share;
>> }
>>
>> /*
>> - * When it is called in dax_insert_entry(), the cow flag will indicate that
>> + * When it is called in dax_insert_entry(), the shared flag will indicate that
>> * whether this entry is shared by multiple files. If so, set the page->mapping
>> - * FS_DAX_MAPPING_COW, and use page->index as refcount.
>> + * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
>> */
>> static void dax_associate_entry(void *entry, struct address_space *mapping,
>> - struct vm_area_struct *vma, unsigned long address, bool cow)
>> + struct vm_area_struct *vma, unsigned long address, bool shared)
>> {
>> unsigned long size = dax_entry_size(entry), pfn, index;
>> int i = 0;
>> @@ -374,8 +380,8 @@ static void dax_associate_entry(void *entry, struct address_space *mapping,
>> for_each_mapped_pfn(entry, pfn) {
>> struct page *page = pfn_to_page(pfn);
>>
>> - if (cow) {
>> - dax_mapping_set_cow(page);
>> + if (shared) {
>> + dax_page_bump_sharing(page);
>> } else {
>> WARN_ON_ONCE(page->mapping);
>> page->mapping = mapping;
>> @@ -396,9 +402,9 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
>> struct page *page = pfn_to_page(pfn);
>>
>> WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
>> - if (dax_mapping_is_cow(page->mapping)) {
>> - /* keep the CoW flag if this page is still shared */
>> - if (page->index-- > 0)
>> + if (dax_page_is_shared(page)) {
>> + /* keep the shared flag if this page is still shared */
>> + if (dax_page_drop_sharing(page) > 0)
>> continue;
>
> I think part of what makes this hard to read is trying to preserve the
> same code paths for shared pages and typical pages.
>
> page_share_put() should, in addition to decrementing the share, clear
> out page->mapping value.

In order to be consistent, how about naming the 3 helper functions like
this:

bool dax_page_is_shared(struct page *page);
void dax_page_share_get(struct page *page);
unsigned long dax_page_share_put(struct page *page);


--
Thanks,
Ruan.

>
>> } else
>> WARN_ON_ONCE(page->mapping && page->mapping != mapping);
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 500e536796ca..f46cac3657ad 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -103,7 +103,10 @@ struct page {
>> };
>> /* See page-flags.h for PAGE_MAPPING_FLAGS */
>> struct address_space *mapping;
>> - pgoff_t index; /* Our offset within mapping. */
>> + union {
>> + pgoff_t index; /* Our offset within mapping. */
>> + unsigned long share; /* share count for fsdax */
>> + };
>> /**
>> * @private: Mapping-private opaque data.
>> * Usually used for buffer_heads if PagePrivate.
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 0b0ae5084e60..c8a3aa02278d 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -641,7 +641,7 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
>> * Different with flags above, this flag is used only for fsdax mode. It
>> * indicates that this page->mapping is now under reflink case.
>> */
>> -#define PAGE_MAPPING_DAX_COW 0x1
>> +#define PAGE_MAPPING_DAX_SHARED 0x1
>>
>> static __always_inline bool folio_mapping_flags(struct folio *folio)
>> {
>> --
>> 2.38.1
>>
>
>

2022-12-05 08:04:09

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH v2.1 1/8] fsdax: introduce page->share for fsdax in reflink mode

On Mon, Dec 05, 2022 at 01:56:24PM +0800, Shiyang Ruan wrote:
>
>
> 在 2022/12/3 10:07, Dan Williams 写道:
> > Shiyang Ruan wrote:
> > > fsdax page is used not only when CoW, but also mapread. To make the it
> > > easily understood, use 'share' to indicate that the dax page is shared
> > > by more than one extent. And add helper functions to use it.
> > >
> > > Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.
> > >
> > > Signed-off-by: Shiyang Ruan <[email protected]>
> > > ---
> > > fs/dax.c | 38 ++++++++++++++++++++++----------------
> > > include/linux/mm_types.h | 5 ++++-
> > > include/linux/page-flags.h | 2 +-
> > > 3 files changed, 27 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/fs/dax.c b/fs/dax.c
> > > index 1c6867810cbd..edbacb273ab5 100644
> > > --- a/fs/dax.c
> > > +++ b/fs/dax.c
> > > @@ -334,35 +334,41 @@ static unsigned long dax_end_pfn(void *entry)
> > > for (pfn = dax_to_pfn(entry); \
> > > pfn < dax_end_pfn(entry); pfn++)
> > > -static inline bool dax_mapping_is_cow(struct address_space *mapping)
> > > +static inline bool dax_page_is_shared(struct page *page)
> > > {
> > > - return (unsigned long)mapping == PAGE_MAPPING_DAX_COW;
> > > + return (unsigned long)page->mapping == PAGE_MAPPING_DAX_SHARED;
> > > }
> > > /*
> > > - * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount.
> > > + * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase the
> > > + * refcount.
> > > */
> > > -static inline void dax_mapping_set_cow(struct page *page)
> > > +static inline void dax_page_bump_sharing(struct page *page)
> >
> > Similar to page_ref naming I would call this page_share_get() and the
> > corresponding function page_share_put().
> >
> > > {
> > > - if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
> > > + if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_SHARED) {
> > > /*
> > > * Reset the index if the page was already mapped
> > > * regularly before.
> > > */
> > > if (page->mapping)
> > > - page->index = 1;
> > > - page->mapping = (void *)PAGE_MAPPING_DAX_COW;
> > > + page->share = 1;
> > > + page->mapping = (void *)PAGE_MAPPING_DAX_SHARED;
> >
> > Small nit, You could save a cast here by defining
> > PAGE_MAPPING_DAX_SHARED as "((void *) 1)".
>
> Ok.

It's sort of a pity you can't pass around a pointer to a privately
defined const struct in dax.c. But yeah, you might as well include the
cast in the macro definition.

> >
> > > }
> > > - page->index++;
> > > + page->share++;
> > > +}
> > > +
> > > +static inline unsigned long dax_page_drop_sharing(struct page *page)
> > > +{
> > > + return --page->share;
> > > }
> > > /*
> > > - * When it is called in dax_insert_entry(), the cow flag will indicate that
> > > + * When it is called in dax_insert_entry(), the shared flag will indicate that
> > > * whether this entry is shared by multiple files. If so, set the page->mapping
> > > - * FS_DAX_MAPPING_COW, and use page->index as refcount.
> > > + * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
> > > */
> > > static void dax_associate_entry(void *entry, struct address_space *mapping,
> > > - struct vm_area_struct *vma, unsigned long address, bool cow)
> > > + struct vm_area_struct *vma, unsigned long address, bool shared)
> > > {
> > > unsigned long size = dax_entry_size(entry), pfn, index;
> > > int i = 0;
> > > @@ -374,8 +380,8 @@ static void dax_associate_entry(void *entry, struct address_space *mapping,
> > > for_each_mapped_pfn(entry, pfn) {
> > > struct page *page = pfn_to_page(pfn);
> > > - if (cow) {
> > > - dax_mapping_set_cow(page);
> > > + if (shared) {
> > > + dax_page_bump_sharing(page);
> > > } else {
> > > WARN_ON_ONCE(page->mapping);
> > > page->mapping = mapping;
> > > @@ -396,9 +402,9 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
> > > struct page *page = pfn_to_page(pfn);
> > > WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
> > > - if (dax_mapping_is_cow(page->mapping)) {
> > > - /* keep the CoW flag if this page is still shared */
> > > - if (page->index-- > 0)
> > > + if (dax_page_is_shared(page)) {
> > > + /* keep the shared flag if this page is still shared */
> > > + if (dax_page_drop_sharing(page) > 0)
> > > continue;
> >
> > I think part of what makes this hard to read is trying to preserve the
> > same code paths for shared pages and typical pages.
> >
> > page_share_put() should, in addition to decrementing the share, clear
> > out page->mapping value.
>
> In order to be consistent, how about naming the 3 helper functions like
> this:
>
> bool dax_page_is_shared(struct page *page);
> void dax_page_share_get(struct page *page);
> unsigned long dax_page_share_put(struct page *page);

_sharing_get/_sharing_put ?

Either way sounds fine to me.

--D

>
> --
> Thanks,
> Ruan.
>
> >
> > > } else
> > > WARN_ON_ONCE(page->mapping && page->mapping != mapping);
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index 500e536796ca..f46cac3657ad 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -103,7 +103,10 @@ struct page {
> > > };
> > > /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > struct address_space *mapping;
> > > - pgoff_t index; /* Our offset within mapping. */
> > > + union {
> > > + pgoff_t index; /* Our offset within mapping. */
> > > + unsigned long share; /* share count for fsdax */
> > > + };
> > > /**
> > > * @private: Mapping-private opaque data.
> > > * Usually used for buffer_heads if PagePrivate.
> > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > index 0b0ae5084e60..c8a3aa02278d 100644
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -641,7 +641,7 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
> > > * Different with flags above, this flag is used only for fsdax mode. It
> > > * indicates that this page->mapping is now under reflink case.
> > > */
> > > -#define PAGE_MAPPING_DAX_COW 0x1
> > > +#define PAGE_MAPPING_DAX_SHARED 0x1
> > > static __always_inline bool folio_mapping_flags(struct folio *folio)
> > > {
> > > --
> > > 2.38.1
> > >
> >
> >

2022-12-07 03:02:48

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v2.2 1/8] fsdax: introduce page->share for fsdax in reflink mode

fsdax page is used not only when CoW, but also mapread. To make the it
easily understood, use 'share' to indicate that the dax page is shared
by more than one extent. And add helper functions to use it.

Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.

Signed-off-by: Shiyang Ruan <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
---
fs/dax.c | 38 ++++++++++++++++++++++----------------
include/linux/mm_types.h | 5 ++++-
include/linux/page-flags.h | 2 +-
3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 1c6867810cbd..84fadea08705 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -334,35 +334,41 @@ static unsigned long dax_end_pfn(void *entry)
for (pfn = dax_to_pfn(entry); \
pfn < dax_end_pfn(entry); pfn++)

-static inline bool dax_mapping_is_cow(struct address_space *mapping)
+static inline bool dax_page_is_shared(struct page *page)
{
- return (unsigned long)mapping == PAGE_MAPPING_DAX_COW;
+ return page->mapping == PAGE_MAPPING_DAX_SHARED;
}

/*
- * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount.
+ * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase the
+ * refcount.
*/
-static inline void dax_mapping_set_cow(struct page *page)
+static inline void dax_page_share_get(struct page *page)
{
- if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
+ if (page->mapping != PAGE_MAPPING_DAX_SHARED) {
/*
* Reset the index if the page was already mapped
* regularly before.
*/
if (page->mapping)
- page->index = 1;
- page->mapping = (void *)PAGE_MAPPING_DAX_COW;
+ page->share = 1;
+ page->mapping = PAGE_MAPPING_DAX_SHARED;
}
- page->index++;
+ page->share++;
+}
+
+static inline unsigned long dax_page_share_put(struct page *page)
+{
+ return --page->share;
}

/*
- * When it is called in dax_insert_entry(), the cow flag will indicate that
+ * When it is called in dax_insert_entry(), the shared flag will indicate that
* whether this entry is shared by multiple files. If so, set the page->mapping
- * FS_DAX_MAPPING_COW, and use page->index as refcount.
+ * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
*/
static void dax_associate_entry(void *entry, struct address_space *mapping,
- struct vm_area_struct *vma, unsigned long address, bool cow)
+ struct vm_area_struct *vma, unsigned long address, bool shared)
{
unsigned long size = dax_entry_size(entry), pfn, index;
int i = 0;
@@ -374,8 +380,8 @@ static void dax_associate_entry(void *entry, struct address_space *mapping,
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);

- if (cow) {
- dax_mapping_set_cow(page);
+ if (shared) {
+ dax_page_share_get(page);
} else {
WARN_ON_ONCE(page->mapping);
page->mapping = mapping;
@@ -396,9 +402,9 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
struct page *page = pfn_to_page(pfn);

WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
- if (dax_mapping_is_cow(page->mapping)) {
- /* keep the CoW flag if this page is still shared */
- if (page->index-- > 0)
+ if (dax_page_is_shared(page)) {
+ /* keep the shared flag if this page is still shared */
+ if (dax_page_share_put(page) > 0)
continue;
} else
WARN_ON_ONCE(page->mapping && page->mapping != mapping);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 500e536796ca..f46cac3657ad 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -103,7 +103,10 @@ struct page {
};
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
- pgoff_t index; /* Our offset within mapping. */
+ union {
+ pgoff_t index; /* Our offset within mapping. */
+ unsigned long share; /* share count for fsdax */
+ };
/**
* @private: Mapping-private opaque data.
* Usually used for buffer_heads if PagePrivate.
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0b0ae5084e60..d8e94f2f704a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -641,7 +641,7 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
* Different with flags above, this flag is used only for fsdax mode. It
* indicates that this page->mapping is now under reflink case.
*/
-#define PAGE_MAPPING_DAX_COW 0x1
+#define PAGE_MAPPING_DAX_SHARED ((void *)0x1)

static __always_inline bool folio_mapping_flags(struct folio *folio)
{
--
2.38.1

2022-12-08 01:39:12

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH v2.2 1/8] fsdax: introduce page->share for fsdax in reflink mode

On Wed, Dec 07, 2022 at 02:49:19AM +0000, Shiyang Ruan wrote:
> fsdax page is used not only when CoW, but also mapread. To make the it
> easily understood, use 'share' to indicate that the dax page is shared
> by more than one extent. And add helper functions to use it.
>
> Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.
>
> Signed-off-by: Shiyang Ruan <[email protected]>
> Reviewed-by: Allison Henderson <[email protected]>

Looks fine to me,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

> ---
> fs/dax.c | 38 ++++++++++++++++++++++----------------
> include/linux/mm_types.h | 5 ++++-
> include/linux/page-flags.h | 2 +-
> 3 files changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 1c6867810cbd..84fadea08705 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -334,35 +334,41 @@ static unsigned long dax_end_pfn(void *entry)
> for (pfn = dax_to_pfn(entry); \
> pfn < dax_end_pfn(entry); pfn++)
>
> -static inline bool dax_mapping_is_cow(struct address_space *mapping)
> +static inline bool dax_page_is_shared(struct page *page)
> {
> - return (unsigned long)mapping == PAGE_MAPPING_DAX_COW;
> + return page->mapping == PAGE_MAPPING_DAX_SHARED;
> }
>
> /*
> - * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the refcount.
> + * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase the
> + * refcount.
> */
> -static inline void dax_mapping_set_cow(struct page *page)
> +static inline void dax_page_share_get(struct page *page)
> {
> - if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
> + if (page->mapping != PAGE_MAPPING_DAX_SHARED) {
> /*
> * Reset the index if the page was already mapped
> * regularly before.
> */
> if (page->mapping)
> - page->index = 1;
> - page->mapping = (void *)PAGE_MAPPING_DAX_COW;
> + page->share = 1;
> + page->mapping = PAGE_MAPPING_DAX_SHARED;
> }
> - page->index++;
> + page->share++;
> +}
> +
> +static inline unsigned long dax_page_share_put(struct page *page)
> +{
> + return --page->share;
> }
>
> /*
> - * When it is called in dax_insert_entry(), the cow flag will indicate that
> + * When it is called in dax_insert_entry(), the shared flag will indicate that
> * whether this entry is shared by multiple files. If so, set the page->mapping
> - * FS_DAX_MAPPING_COW, and use page->index as refcount.
> + * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
> */
> static void dax_associate_entry(void *entry, struct address_space *mapping,
> - struct vm_area_struct *vma, unsigned long address, bool cow)
> + struct vm_area_struct *vma, unsigned long address, bool shared)
> {
> unsigned long size = dax_entry_size(entry), pfn, index;
> int i = 0;
> @@ -374,8 +380,8 @@ static void dax_associate_entry(void *entry, struct address_space *mapping,
> for_each_mapped_pfn(entry, pfn) {
> struct page *page = pfn_to_page(pfn);
>
> - if (cow) {
> - dax_mapping_set_cow(page);
> + if (shared) {
> + dax_page_share_get(page);
> } else {
> WARN_ON_ONCE(page->mapping);
> page->mapping = mapping;
> @@ -396,9 +402,9 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
> struct page *page = pfn_to_page(pfn);
>
> WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
> - if (dax_mapping_is_cow(page->mapping)) {
> - /* keep the CoW flag if this page is still shared */
> - if (page->index-- > 0)
> + if (dax_page_is_shared(page)) {
> + /* keep the shared flag if this page is still shared */
> + if (dax_page_share_put(page) > 0)
> continue;
> } else
> WARN_ON_ONCE(page->mapping && page->mapping != mapping);
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 500e536796ca..f46cac3657ad 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -103,7 +103,10 @@ struct page {
> };
> /* See page-flags.h for PAGE_MAPPING_FLAGS */
> struct address_space *mapping;
> - pgoff_t index; /* Our offset within mapping. */
> + union {
> + pgoff_t index; /* Our offset within mapping. */
> + unsigned long share; /* share count for fsdax */
> + };
> /**
> * @private: Mapping-private opaque data.
> * Usually used for buffer_heads if PagePrivate.
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 0b0ae5084e60..d8e94f2f704a 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -641,7 +641,7 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
> * Different with flags above, this flag is used only for fsdax mode. It
> * indicates that this page->mapping is now under reflink case.
> */
> -#define PAGE_MAPPING_DAX_COW 0x1
> +#define PAGE_MAPPING_DAX_SHARED ((void *)0x1)
>
> static __always_inline bool folio_mapping_flags(struct folio *folio)
> {
> --
> 2.38.1
>

2022-12-29 08:46:03

by Shiyang Ruan

[permalink] [raw]
Subject: Re: [PATCH v2 0/8] fsdax,xfs: fix warning messages



在 2022/12/3 9:21, Dan Williams 写道:
> Shiyang Ruan wrote:
>> Changes since v1:
>> 1. Added a snippet of the warning message and some of the failed cases
>> 2. Separated the patch for easily review
>> 3. Added page->share and its helper functions
>> 4. Included the patch[1] that removes the restrictions of fsdax and reflink
>> [1] https://lore.kernel.org/linux-xfs/[email protected]/
>>
...
>>
>> This also effects dax+noreflink mode if we run the test after a
>> dax+reflink test. So, the most urgent thing is solving the warning
>> messages.
>>
>> With these fixes, most warning messages in dax_associate_entry() are
>> gone. But honestly, generic/388 will randomly failed with the warning.
>> The case shutdown the xfs when fsstress is running, and do it for many
>> times. I think the reason is that dax pages in use are not able to be
>> invalidated in time when fs is shutdown. The next time dax page to be
>> associated, it still remains the mapping value set last time. I'll keep
>> on solving it.
>
> This one also sounds like it is going to be relevant for CXL PMEM, and
> the improvements to the reference counting. CXL has a facility where the
> driver asserts that no more writes are in-flight to the device so that
> the device can assert a clean shutdown. Part of that will be making sure
> that page access ends at fs shutdown.

I was trying to locate the root cause of the fail on generic/388. But
since it's a fsstress test, I can't relpay the operation sequence to
help me locate the operations. So, I tried to replace fsstress with
fsx, which can do replay after the case fails, but it can't reproduce
the fail. I think another important factor is that fsstress tests with
multiple threads. So, for now, it's hard for me to locate the cause by
running the test.

Then I updated the kernel to the latest v6.2-rc1 and run generic/388 for
many times. The warning dmesg doesn't show any more.

How is your test on this case? Does it still fail on the latest kernel?
If so, I think I have to keep on locating the cause, and need your advice.


--
Thanks,
Ruan.

>
>> The warning message in dax_writeback_one() can also be fixed because of
>> the dax unshare.
>>
>>
>> Shiyang Ruan (8):
>> fsdax: introduce page->share for fsdax in reflink mode
>> fsdax: invalidate pages when CoW
>> fsdax: zero the edges if source is HOLE or UNWRITTEN
>> fsdax,xfs: set the shared flag when file extent is shared
>> fsdax: dedupe: iter two files at the same time
>> xfs: use dax ops for zero and truncate in fsdax mode
>> fsdax,xfs: port unshare to fsdax
>> xfs: remove restrictions for fsdax and reflink
>>
>> fs/dax.c | 220 +++++++++++++++++++++++++------------
>> fs/xfs/xfs_ioctl.c | 4 -
>> fs/xfs/xfs_iomap.c | 6 +-
>> fs/xfs/xfs_iops.c | 4 -
>> fs/xfs/xfs_reflink.c | 8 +-
>> include/linux/dax.h | 2 +
>> include/linux/mm_types.h | 5 +-
>> include/linux/page-flags.h | 2 +-
>> 8 files changed, 166 insertions(+), 85 deletions(-)
>>
>> --
>> 2.38.1
>>
>>
>
>