2023-02-04 14:59:16

by Shiyang Ruan

[permalink] [raw]
Subject: [RESEND PATCH v9 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

Changes since v9:
1. Rebase on 6.2-rc6

Changes since v8:
1. P2: rename drop_pagecache_sb() to super_drop_pagecache().
2. P2: let super_drop_pagecache() accept invalidate method.
3. P3: invalidate all dax mappings by invalidate_inode_pages2().
4. P3: shutdown the filesystem when it is to be removed.
5. Rebase on 6.0-rc6 + Darrick's patch[1] + Dan's patch[2].

[1]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
[2]: https://lore.kernel.org/linux-xfs/166153426798.2758201.15108211981034512993.stgit@dwillia2-xfh.jf.intel.com/

Shiyang Ruan (3):
xfs: fix the calculation of length and end
fs: move drop_pagecache_sb() for others to use
mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

drivers/dax/super.c | 3 ++-
fs/drop_caches.c | 35 ++----------------------------
fs/super.c | 43 +++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_notify_failure.c | 36 ++++++++++++++++++++++++++-----
include/linux/fs.h | 1 +
include/linux/mm.h | 1 +
include/linux/pagemap.h | 1 +
mm/truncate.c | 20 +++++++++++++++--
8 files changed, 99 insertions(+), 41 deletions(-)

--
2.39.1



2023-02-04 14:59:39

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v9 1/3] xfs: fix the calculation of length and end

The end should be start + length - 1. Also fix the calculation of the
length when seeking for intersection of notify range and device.

Signed-off-by: Shiyang Ruan <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/xfs/xfs_notify_failure.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
index c4078d0ec108..3830f908e215 100644
--- a/fs/xfs/xfs_notify_failure.c
+++ b/fs/xfs/xfs_notify_failure.c
@@ -114,7 +114,7 @@ xfs_dax_notify_ddev_failure(
int error = 0;
xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, daddr);
xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, fsbno);
- xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen);
+ xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen - 1);
xfs_agnumber_t end_agno = XFS_FSB_TO_AGNO(mp, end_fsbno);

error = xfs_trans_alloc_empty(mp, &tp);
@@ -210,7 +210,7 @@ xfs_dax_notify_failure(
ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;

/* Ignore the range out of filesystem area */
- if (offset + len < ddev_start)
+ if (offset + len - 1 < ddev_start)
return -ENXIO;
if (offset > ddev_end)
return -ENXIO;
@@ -222,8 +222,8 @@ xfs_dax_notify_failure(
len -= ddev_start - offset;
offset = 0;
}
- if (offset + len > ddev_end)
- len -= ddev_end - offset;
+ if (offset + len - 1 > ddev_end)
+ len -= offset + len - 1 - ddev_end;

return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len),
mf_flags);
--
2.39.1


2023-02-04 14:59:45

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v9 2/3] fs: move drop_pagecache_sb() for others to use

xfs_notify_failure.c requires a method to invalidate all dax mappings.
drop_pagecache_sb() can do this but it is a static function and only
build with CONFIG_SYSCTL. Now, move it to super.c and make it available
for others. And use its second argument to choose which invalidate
method to use.

Signed-off-by: Shiyang Ruan <[email protected]>
---
fs/drop_caches.c | 35 ++-------------------------------
fs/super.c | 43 +++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 1 +
include/linux/pagemap.h | 1 +
mm/truncate.c | 20 +++++++++++++++++--
5 files changed, 65 insertions(+), 35 deletions(-)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index e619c31b6bd9..4c9281885077 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -15,38 +15,6 @@
/* A global variable is a bit ugly, but it keeps the code simple */
int sysctl_drop_caches;

-static void drop_pagecache_sb(struct super_block *sb, void *unused)
-{
- struct inode *inode, *toput_inode = NULL;
-
- spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
- spin_lock(&inode->i_lock);
- /*
- * We must skip inodes in unusual state. We may also skip
- * inodes without pages but we deliberately won't in case
- * we need to reschedule to avoid softlockups.
- */
- if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
- (mapping_empty(inode->i_mapping) && !need_resched())) {
- spin_unlock(&inode->i_lock);
- continue;
- }
- __iget(inode);
- spin_unlock(&inode->i_lock);
- spin_unlock(&sb->s_inode_list_lock);
-
- invalidate_mapping_pages(inode->i_mapping, 0, -1);
- iput(toput_inode);
- toput_inode = inode;
-
- cond_resched();
- spin_lock(&sb->s_inode_list_lock);
- }
- spin_unlock(&sb->s_inode_list_lock);
- iput(toput_inode);
-}
-
int drop_caches_sysctl_handler(struct ctl_table *table, int write,
void *buffer, size_t *length, loff_t *ppos)
{
@@ -59,7 +27,8 @@ int drop_caches_sysctl_handler(struct ctl_table *table, int write,
static int stfu;

if (sysctl_drop_caches & 1) {
- iterate_supers(drop_pagecache_sb, NULL);
+ iterate_supers(super_drop_pagecache,
+ invalidate_inode_pages);
count_vm_event(DROP_PAGECACHE);
}
if (sysctl_drop_caches & 2) {
diff --git a/fs/super.c b/fs/super.c
index 12c08cb20405..d788b73f93f0 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -36,6 +36,7 @@
#include <linux/lockdep.h>
#include <linux/user_namespace.h>
#include <linux/fs_context.h>
+#include <linux/pagemap.h>
#include <uapi/linux/mount.h>
#include "internal.h"

@@ -678,6 +679,48 @@ void drop_super_exclusive(struct super_block *sb)
}
EXPORT_SYMBOL(drop_super_exclusive);

+/*
+ * super_drop_pagecache - drop all page caches of a filesystem
+ * @sb: superblock to invalidate
+ * @arg: invalidate method, such as invalidate_inode_pages(),
+ * invalidate_inode_pages2()
+ *
+ * Scans the inodes of a filesystem, drop all page caches.
+ */
+void super_drop_pagecache(struct super_block *sb, void *arg)
+{
+ struct inode *inode, *toput_inode = NULL;
+ int (*invalidator)(struct address_space *) = arg;
+
+ spin_lock(&sb->s_inode_list_lock);
+ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ spin_lock(&inode->i_lock);
+ /*
+ * We must skip inodes in unusual state. We may also skip
+ * inodes without pages but we deliberately won't in case
+ * we need to reschedule to avoid softlockups.
+ */
+ if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+ (mapping_empty(inode->i_mapping) && !need_resched())) {
+ spin_unlock(&inode->i_lock);
+ continue;
+ }
+ __iget(inode);
+ spin_unlock(&inode->i_lock);
+ spin_unlock(&sb->s_inode_list_lock);
+
+ invalidator(inode->i_mapping);
+ iput(toput_inode);
+ toput_inode = inode;
+
+ cond_resched();
+ spin_lock(&sb->s_inode_list_lock);
+ }
+ spin_unlock(&sb->s_inode_list_lock);
+ iput(toput_inode);
+}
+EXPORT_SYMBOL(super_drop_pagecache);
+
static void __iterate_supers(void (*f)(struct super_block *))
{
struct super_block *sb, *p = NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c1769a2c5d70..b853632e76cd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3308,6 +3308,7 @@ extern struct super_block *get_super(struct block_device *);
extern struct super_block *get_active_super(struct block_device *bdev);
extern void drop_super(struct super_block *sb);
extern void drop_super_exclusive(struct super_block *sb);
+void super_drop_pagecache(struct super_block *sb, void *unused);
extern void iterate_supers(void (*)(struct super_block *, void *), void *);
extern void iterate_supers_type(struct file_system_type *,
void (*)(struct super_block *, void *), void *);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 29e1f9e76eb6..d0a180268baa 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -27,6 +27,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
S_ISLNK(inode->i_mode))
invalidate_mapping_pages(inode->i_mapping, 0, -1);
}
+int invalidate_inode_pages(struct address_space *mapping);
int invalidate_inode_pages2(struct address_space *mapping);
int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
diff --git a/mm/truncate.c b/mm/truncate.c
index 7b4ea4c4a46b..131f2ab2d566 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -540,12 +540,13 @@ unsigned long invalidate_mapping_pagevec(struct address_space *mapping,
}

/**
- * invalidate_mapping_pages - Invalidate all clean, unlocked cache of one inode
+ * invalidate_mapping_pages - Invalidate range of clean, unlocked cache of one
+ * inode
* @mapping: the address_space which holds the cache to invalidate
* @start: the offset 'from' which to invalidate
* @end: the offset 'to' which to invalidate (inclusive)
*
- * This function removes pages that are clean, unmapped and unlocked,
+ * This function removes range of pages that are clean, unmapped and unlocked,
* as well as shadow entries. It will not block on IO activity.
*
* If you want to remove all the pages of one inode, regardless of
@@ -560,6 +561,21 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
}
EXPORT_SYMBOL(invalidate_mapping_pages);

+/**
+ * invalidate_inode_pages - Invalidate all clean, unlocked cache of one inode
+ * @mapping: the address_space which holds the cache to invalidate
+ *
+ * This function removes all pages that are clean, unmapped and unlocked,
+ * as well as shadow entries. It will not block on IO activity.
+ */
+int invalidate_inode_pages(struct address_space *mapping)
+{
+ invalidate_mapping_pages(mapping, 0, -1);
+
+ return 0;
+}
+EXPORT_SYMBOL(invalidate_inode_pages);
+
/*
* This is like invalidate_inode_page(), except it ignores the page's
* refcount. We do this because invalidate_inode_pages2() needs stronger
--
2.39.1


2023-02-04 14:59:48

by Shiyang Ruan

[permalink] [raw]
Subject: [PATCH v9 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

This patch is inspired by Dan's "mm, dax, pmem: Introduce
dev_pagemap_failure()"[1]. With the help of dax_holder and
->notify_failure() mechanism, the pmem driver is able to ask filesystem
(or mapped device) on it to unmap all files in use and notify processes
who are using those files.

Call trace:
trigger unbind
-> unbind_store()
-> ... (skip)
-> devres_release_all() # was pmem driver ->remove() in v1
-> kill_dax()
-> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
-> xfs_dax_notify_failure()

Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
event. So do not shutdown filesystem directly if something not
supported, or if failure range includes metadata area. Make sure all
files and processes are handled correctly.

[1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/

Signed-off-by: Shiyang Ruan <[email protected]>
---
drivers/dax/super.c | 3 ++-
fs/xfs/xfs_notify_failure.c | 28 +++++++++++++++++++++++++++-
include/linux/mm.h | 1 +
3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index da4438f3188c..40274d19f4f9 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
return;

if (dax_dev->holder_data != NULL)
- dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
+ dax_holder_notify_failure(dax_dev, 0, U64_MAX,
+ MF_MEM_PRE_REMOVE);

clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
synchronize_srcu(&dax_srcu);
diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
index 3830f908e215..5c1e678a1285 100644
--- a/fs/xfs/xfs_notify_failure.c
+++ b/fs/xfs/xfs_notify_failure.c
@@ -22,6 +22,7 @@

#include <linux/mm.h>
#include <linux/dax.h>
+#include <linux/fs.h>

struct xfs_failure_info {
xfs_agblock_t startblock;
@@ -77,6 +78,9 @@ xfs_dax_failure_fn(

if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
(rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
+ /* The device is about to be removed. Not a really failure. */
+ if (notify->mf_flags & MF_MEM_PRE_REMOVE)
+ return 0;
notify->want_shutdown = true;
return 0;
}
@@ -168,7 +172,9 @@ xfs_dax_notify_ddev_failure(
xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
if (!error)
error = -EFSCORRUPTED;
- }
+ } else if (mf_flags & MF_MEM_PRE_REMOVE)
+ xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
+
return error;
}

@@ -182,12 +188,24 @@ xfs_dax_notify_failure(
struct xfs_mount *mp = dax_holder(dax_dev);
u64 ddev_start;
u64 ddev_end;
+ int error;

if (!(mp->m_super->s_flags & SB_BORN)) {
xfs_warn(mp, "filesystem is not ready for notify_failure()!");
return -EIO;
}

+ if (mf_flags & MF_MEM_PRE_REMOVE) {
+ xfs_info(mp, "device is about to be removed!");
+ down_write(&mp->m_super->s_umount);
+ error = sync_filesystem(mp->m_super);
+ /* invalidate_inode_pages2() invalidates dax mapping */
+ super_drop_pagecache(mp->m_super, invalidate_inode_pages2);
+ up_write(&mp->m_super->s_umount);
+ if (error)
+ return error;
+ }
+
if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) {
xfs_debug(mp,
"notify_failure() not supported on realtime device!");
@@ -196,6 +214,8 @@ xfs_dax_notify_failure(

if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
mp->m_logdev_targp != mp->m_ddev_targp) {
+ if (mf_flags & MF_MEM_PRE_REMOVE)
+ return 0;
xfs_err(mp, "ondisk log corrupt, shutting down fs!");
xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
return -EFSCORRUPTED;
@@ -209,6 +229,12 @@ xfs_dax_notify_failure(
ddev_start = mp->m_ddev_targp->bt_dax_part_off;
ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;

+ /* Notify failure on the whole device */
+ if (offset == 0 && len == U64_MAX) {
+ offset = ddev_start;
+ len = bdev_nr_bytes(mp->m_ddev_targp->bt_bdev);
+ }
+
/* Ignore the range out of filesystem area */
if (offset + len - 1 < ddev_start)
return -ENXIO;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..9711dbc9451f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3424,6 +3424,7 @@ enum mf_flags {
MF_UNPOISON = 1 << 4,
MF_SW_SIMULATED = 1 << 5,
MF_NO_RETRY = 1 << 6,
+ MF_MEM_PRE_REMOVE = 1 << 7,
};
int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
unsigned long count, int mf_flags);
--
2.39.1


2023-02-05 11:42:35

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v9 1/3] xfs: fix the calculation of length and end

On Sat, Feb 04, 2023 at 02:58:36PM +0000, Shiyang Ruan wrote:
> @@ -222,8 +222,8 @@ xfs_dax_notify_failure(
> len -= ddev_start - offset;
> offset = 0;
> }
> - if (offset + len > ddev_end)
> - len -= ddev_end - offset;
> + if (offset + len - 1 > ddev_end)
> + len -= offset + len - 1 - ddev_end;

This _looks_ wrong. Are you sure it shouldn't be:

len = ddev_end - offset + 1;


2023-02-05 11:51:59

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v9 2/3] fs: move drop_pagecache_sb() for others to use

On Sat, Feb 04, 2023 at 02:58:37PM +0000, Shiyang Ruan wrote:
> @@ -678,6 +679,48 @@ void drop_super_exclusive(struct super_block *sb)
> }
> EXPORT_SYMBOL(drop_super_exclusive);
>
> +/*

You've gone to the trouble of writing kernel-doc, just add the extra '*'
and make it actually part of the documentation!

> + * super_drop_pagecache - drop all page caches of a filesystem
> + * @sb: superblock to invalidate
> + * @arg: invalidate method, such as invalidate_inode_pages(),
> + * invalidate_inode_pages2()
> + *
> + * Scans the inodes of a filesystem, drop all page caches.
> + */

> +++ b/include/linux/fs.h
> @@ -3308,6 +3308,7 @@ extern struct super_block *get_super(struct block_device *);
> extern struct super_block *get_active_super(struct block_device *bdev);
> extern void drop_super(struct super_block *sb);
> extern void drop_super_exclusive(struct super_block *sb);
> +void super_drop_pagecache(struct super_block *sb, void *unused);

But the arg isn't unused. Call it 'invalidator' here.

> +/**
> + * invalidate_inode_pages - Invalidate all clean, unlocked cache of one inode
> + * @mapping: the address_space which holds the cache to invalidate
> + *
> + * This function removes all pages that are clean, unmapped and unlocked,
> + * as well as shadow entries. It will not block on IO activity.
> + */
> +int invalidate_inode_pages(struct address_space *mapping)
> +{
> + invalidate_mapping_pages(mapping, 0, -1);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(invalidate_inode_pages);

I might make this a static function in super.c but maybe you need it
in the next patch.

2023-02-05 21:10:56

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v9 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

On Sat, Feb 04, 2023 at 02:58:38PM +0000, Shiyang Ruan wrote:
> + if (mf_flags & MF_MEM_PRE_REMOVE) {
> + xfs_info(mp, "device is about to be removed!");
> + down_write(&mp->m_super->s_umount);
> + error = sync_filesystem(mp->m_super);
> + /* invalidate_inode_pages2() invalidates dax mapping */
> + super_drop_pagecache(mp->m_super, invalidate_inode_pages2);

OK, there's a better way to handle all of this.

First, an essentially untyped interface with a void * argument is
bad. Second, we can do all this with invalidate_inode_pages2_range()
and invalidate_mapping_pages() without introducing a new function.

Here's the proposal:

void super_drop_pagecache(struct super_block *sb,
int (*invalidate)(struct address_space *, pgoff_t start, pgoff_t end))

In fs/drop-caches.c:

static void drop_pagecache_sb(struct super_block *sb, void *unused)
{
super_drop_pagecache(sb, invalidate_mapping_pages);
}

In XFS:

super_drop_pagecache(mp->m_super,
invalidate_inode_pages2_range);

Much smaller change ...

2023-02-05 21:50:11

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH v9 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

On Sat, Feb 04, 2023 at 02:58:38PM +0000, Shiyang Ruan wrote:
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1]. With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> (or mapped device) on it to unmap all files in use and notify processes
> who are using those files.

.....

> @@ -182,12 +188,24 @@ xfs_dax_notify_failure(
> struct xfs_mount *mp = dax_holder(dax_dev);
> u64 ddev_start;
> u64 ddev_end;
> + int error;
>
> if (!(mp->m_super->s_flags & SB_BORN)) {
> xfs_warn(mp, "filesystem is not ready for notify_failure()!");
> return -EIO;
> }
>
> + if (mf_flags & MF_MEM_PRE_REMOVE) {
> + xfs_info(mp, "device is about to be removed!");
> + down_write(&mp->m_super->s_umount);
> + error = sync_filesystem(mp->m_super);
> + /* invalidate_inode_pages2() invalidates dax mapping */
> + super_drop_pagecache(mp->m_super, invalidate_inode_pages2);
> + up_write(&mp->m_super->s_umount);

I really don't like this.

super_drop_pagecache() doesn't guarantee that everything is removed
from cache. It is racy - it doesn't touch inodes being freed or
being instantiated, nor does it prevent concurrent accesses to the
inodes from re-instantiating page cache pages and dirtying them
after the inode has been scanned by super_drop_pagecache().

If we are about to remove the block device and we want to guarantee
that the filesystem is cleaned and stable before the device gets
yanked out from under running applications, then we have to
guarantee that we stall the running applications trying to modify
the filesystem between the MF_MEM_PRE_REMOVE and the actual removal
event that then shuts down the filesystem. Invalidating the page
cache is not enough to guarantee this.

Keep in mind that we're going to walk the rmap after writing the
data to kill any processes that have mmap()d files in the filesystem
after we've dropped the page cache - the page cache invalidation
doesn't change this at all - and this will kill any active userspace
DAX mappings before the device is unplugged. So I don't actually see
how walking the page cache to invalidate it here actually helps
"invalidate dax mapping" reliably as new write page faults on dax
VMAs can still occur between super_drop_pagecache() and the rmap
walk triggering kills on processes with DAX mapped VMAs.

We also don't care if read-only operations race with device unplug -
they are going to get EIO the moment the device is actually
unplugged or the filesystem is shutdown anyway, so it doesn't matter
if reads race with the device remove event. Hence all we really
care about here is not dirtying the filesystem after we've started
processing the MF_MEM_PRE_REMOVE event.

Realistically, I think we need to freeze the filesystem here to
prevent racing modifications occurring during the rmap + VMA walk +
proc kill. That could be write() IO dirtying new data or other
transactions running dirtying the journal/metadata. Both
sync_filesystem() and super_drop_pagecache() operate on current
state - they don't prevent future dax mapping instantiation or
dirtying from happening on the device, so they don't prevent this...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2023-02-06 06:46:20

by Shiyang Ruan

[permalink] [raw]
Subject: Re: [PATCH v9 1/3] xfs: fix the calculation of length and end



在 2023/2/5 19:42, Matthew Wilcox 写道:
> On Sat, Feb 04, 2023 at 02:58:36PM +0000, Shiyang Ruan wrote:
>> @@ -222,8 +222,8 @@ xfs_dax_notify_failure(
>> len -= ddev_start - offset;
>> offset = 0;
>> }
>> - if (offset + len > ddev_end)
>> - len -= ddev_end - offset;
>> + if (offset + len - 1 > ddev_end)
>> + len -= offset + len - 1 - ddev_end;
>
> This _looks_ wrong. Are you sure it shouldn't be:
>
> len = ddev_end - offset + 1;
>

It is to make sure the range won't beyond the end of device.

But actually, both of us are rgiht.
Mine: len -= offset + len - 1 - ddev_end;
=> len = len - (offset + len - 1 - ddev_end);
=> len = len - offset - len + 1 + ddev_end;
=> len = ddev_end - offset + 1; --> Yours

I forgot to simplify it. Will fix.


--
Thanks,
Ruan.