A memory failure that occurs in fsdax mode will ultimately be handled by the
filesystem. We introduce this interface to find the files or metadata
affected by the corrupted range, and to try to recover the corrupted data
if possible.
Signed-off-by: Shiyang Ruan <[email protected]>
---
include/linux/fs.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..92af36c4225f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2176,6 +2176,8 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
+			       loff_t offset, size_t len, void *data);
 };
 
 /*
--
2.31.1
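To make the proposed hook concrete, here is a minimal user-space sketch of how a caller might dispatch a poisoned range to ->corrupted_range(). It is not kernel code: the stand-in types (model_loff_t, the trimmed super_operations and super_block), the helper notify_sb_corrupted_range(), the demo filesystem, and the choice of -EOPNOTSUPP when a filesystem does not implement the hook are all hypothetical. It uses the kernel's field name s_op (the call traces in this thread abbreviate it as s_ops).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types. */
typedef long long model_loff_t;
struct block_device { int id; };
struct super_block;

/* Only the member added by the patch is modelled here. */
struct super_operations {
	int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
			       model_loff_t offset, size_t len, void *data);
};

struct super_block {
	const struct super_operations *s_op;
	struct block_device *s_bdev;
};

/*
 * Hypothetical caller: forward a poisoned range to the filesystem if it
 * implements the hook, otherwise report that nobody handled it.
 */
static int notify_sb_corrupted_range(struct super_block *sb,
				     model_loff_t offset, size_t len, void *data)
{
	assert(sb != NULL);
	if (!sb->s_op || !sb->s_op->corrupted_range)
		return -EOPNOTSUPP;
	return sb->s_op->corrupted_range(sb, sb->s_bdev, offset, len, data);
}

/* Hypothetical filesystem implementation: records the last range seen. */
static model_loff_t last_offset = -1;
static size_t last_len;

static int demo_fs_corrupted_range(struct super_block *sb,
				   struct block_device *bdev,
				   model_loff_t offset, size_t len, void *data)
{
	(void)sb; (void)bdev; (void)data;
	last_offset = offset;
	last_len = len;
	return 0;
}

static const struct super_operations demo_sops = {
	.corrupted_range = demo_fs_corrupted_range,
};
```

A superblock whose s_op leaves corrupted_range NULL gets -EOPNOTSUPP back, so the caller can fall back to generic handling.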
[ drop old [email protected], add [email protected] ]
On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan <[email protected]> wrote:
>
> A memory failure that occurs in fsdax mode will ultimately be handled by the
> filesystem. We introduce this interface to find the files or metadata
> affected by the corrupted range, and to try to recover the corrupted data
> if possible.
>
> Signed-off-by: Shiyang Ruan <[email protected]>
> ---
> include/linux/fs.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c3c88fdb9b2a..92af36c4225f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2176,6 +2176,8 @@ struct super_operations {
>  				  struct shrink_control *);
>  	long (*free_cached_objects)(struct super_block *,
>  				    struct shrink_control *);
> +	int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
> +			       loff_t offset, size_t len, void *data);
Why does the superblock need a new operation? Wouldn't whatever
function is specified here just be specified to the dax_dev as the
->notify_failure() holder callback?
> -----Original Message-----
> From: Dan Williams <[email protected]>
> Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
>
> Why does the superblock need a new operation? Wouldn't whatever function is
> specified here just be specified to the dax_dev as the
> ->notify_failure() holder callback?
Because we need to find out which file is affected by the given poisoned page so that the memory-failure code can do its collect_procs() and kill_procs() work. That requires the filesystem to use its rmap feature to look up the file from a given offset. So we need this operation implemented by the specific filesystem and called by the dax_device's holder.
This is the call trace I described in the cover letter:

memory_failure()
 * fsdax case
   pgmap->ops->memory_failure()      => pmem_pgmap_memory_failure()
     dax_device->holder_ops->corrupted_range() =>
                                      - fs_dax_corrupted_range()
                                      - md_dax_corrupted_range()
       sb->s_ops->corrupted_range()  => xfs_fs_corrupted_range()  <== **HERE**
         xfs_rmap_query_range()
           xfs_corrupt_helper()
             * corrupted on metadata
                 try to recover data, call xfs_force_shutdown()
             * corrupted on file data
                 try to recover data, call mf_dax_kill_procs()
 * normal case
   mf_generic_kill_procs()
As you can see, this newly added operation is an important part of the whole process.
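The two branches in the call trace (metadata vs. file data) can be illustrated with a small model of an rmap lookup. This is a hedged sketch only: the record layout, the owner kinds and classify_corruption() are hypothetical simplifications, not XFS's rmap btree or xfs_rmap_query_range(), and the "try to recover data" step is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical owner kinds; real XFS rmap owners are much richer. */
enum model_owner_kind { OWNER_METADATA, OWNER_FILE };

/* One hypothetical reverse-mapping record: physical range -> owner. */
struct model_rmap {
	long long start;    /* physical byte offset on the device */
	long long len;      /* length in bytes */
	enum model_owner_kind kind;
	unsigned long ino;  /* owning inode when kind == OWNER_FILE */
	long long file_off; /* file offset corresponding to 'start' */
};

enum model_action { ACT_NONE, ACT_SHUTDOWN, ACT_KILL_PROCS };

/*
 * Walk the records overlapping [off, off + len). Any metadata overlap
 * escalates to a shutdown; otherwise the first overlapping file extent is
 * reported (inode + file offset) so the processes mapping it can be found.
 */
static enum model_action classify_corruption(const struct model_rmap *rmap,
					     size_t nr, long long off,
					     long long len,
					     unsigned long *ino_out,
					     long long *file_off_out)
{
	enum model_action act = ACT_NONE;
	size_t i;

	assert(len > 0);
	for (i = 0; i < nr; i++) {
		const struct model_rmap *r = &rmap[i];
		long long first;

		if (r->start >= off + len || off >= r->start + r->len)
			continue; /* no overlap with the poisoned range */
		if (r->kind == OWNER_METADATA)
			return ACT_SHUTDOWN;
		if (act == ACT_NONE) {
			/* first poisoned byte inside this extent */
			first = off > r->start ? off : r->start;
			*ino_out = r->ino;
			*file_off_out = r->file_off + (first - r->start);
			act = ACT_KILL_PROCS;
		}
	}
	return act;
}
```

Metadata damage wins over file damage here because losing metadata affects the whole filesystem, which matches the xfs_force_shutdown() branch above.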
--
Thanks,
Ruan.
On Wed, Jun 16, 2021 at 11:51 PM [email protected]
<[email protected]> wrote:
>
> > -----Original Message-----
> > From: Dan Williams <[email protected]>
> > Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
> >
> > Why does the superblock need a new operation? Wouldn't whatever function is
> > specified here just be specified to the dax_dev as the
> > ->notify_failure() holder callback?
>
> Because we need to find out which file is affected by the given poisoned page so that the memory-failure code can do its collect_procs() and kill_procs() work. That requires the filesystem to use its rmap feature to look up the file from a given offset. So we need this operation implemented by the specific filesystem and called by the dax_device's holder.
>
> This is the call trace I described in the cover letter:
>
> memory_failure()
>  * fsdax case
>    pgmap->ops->memory_failure()      => pmem_pgmap_memory_failure()
>      dax_device->holder_ops->corrupted_range() =>
>                                       - fs_dax_corrupted_range()
>                                       - md_dax_corrupted_range()
>        sb->s_ops->corrupted_range()  => xfs_fs_corrupted_range()  <== **HERE**
>          xfs_rmap_query_range()
>            xfs_corrupt_helper()
>              * corrupted on metadata
>                  try to recover data, call xfs_force_shutdown()
>              * corrupted on file data
>                  try to recover data, call mf_dax_kill_procs()
>  * normal case
>    mf_generic_kill_procs()
>
> As you can see, this newly added operation is an important part of the whole process.
I don't think you need either fs_dax_corrupted_range() or
sb->s_ops->corrupted_range(). In fact, fs_dax_corrupted_range()
looks broken because the filesystem may not even be mounted on the
device associated with the error. The holder_data and holder_op should
be sufficient for communicating the stack of notifications:

pgmap->notify_memory_failure() => pmem_pgmap_notify_failure()
  pmem_dax_dev->holder_ops->notify_failure(pmem_dax_dev) =>
    md_dax_notify_failure()
      md_dax_dev->holder_ops->notify_failure() => xfs_notify_failure()

I.e. the entire chain just walks dax_dev holder ops.
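The chain above can be modelled as each dax_dev carrying a holder_ops table plus opaque holder_data, with every layer remapping the offset and calling the next holder up. A minimal user-space sketch, assuming hypothetical model_* types, a linear md layout described by a single data_start offset, and -ENXIO as the "nobody claimed this device" result (the unmounted case, where the driver would handle the error itself):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_dax_dev;

/* Hypothetical holder callback table, after the shape described above. */
struct model_holder_ops {
	int (*notify_failure)(struct model_dax_dev *dax_dev, long long off,
			      long long len);
};

struct model_dax_dev {
	const struct model_holder_ops *holder_ops; /* NULL: not claimed */
	void *holder_data;                          /* claimer's context */
};

/* Entry point a driver (e.g. pmem) would call when it sees poison. */
static int model_dax_notify_failure(struct model_dax_dev *dax_dev,
				    long long off, long long len)
{
	if (!dax_dev->holder_ops || !dax_dev->holder_ops->notify_failure)
		return -ENXIO; /* nothing mounted: leave it to the driver */
	return dax_dev->holder_ops->notify_failure(dax_dev, off, len);
}

/* Hypothetical stacked device: remap the offset, then notify its holder. */
struct model_md {
	long long data_start;        /* where the md target starts on pmem */
	struct model_dax_dev md_dax; /* the dax_dev this md exposes upward */
};

static int model_md_notify_failure(struct model_dax_dev *pmem_dax,
				   long long off, long long len)
{
	struct model_md *md = pmem_dax->holder_data;

	return model_dax_notify_failure(&md->md_dax, off - md->data_start, len);
}

static const struct model_holder_ops model_md_holder_ops = {
	.notify_failure = model_md_notify_failure,
};

/* Hypothetical filesystem holder: the rmap lookup would start here. */
static long long fs_seen_off = -1;

static int model_fs_notify_failure(struct model_dax_dev *md_dax,
				   long long off, long long len)
{
	(void)md_dax; (void)len;
	fs_seen_off = off;
	return 0;
}

static const struct model_holder_ops model_fs_holder_ops = {
	.notify_failure = model_fs_notify_failure,
};
```

In this shape the filesystem only ever sees offsets relative to the dax_dev it claimed, so no superblock lookup is needed anywhere in the chain.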
> -----Original Message-----
> From: Dan Williams <[email protected]>
> Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
>
> On Wed, Jun 16, 2021 at 11:51 PM [email protected]
> <[email protected]> wrote:
> >
> > > -----Original Message-----
> > > From: Dan Williams <[email protected]>
> > > Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
> > >
> > > Why does the superblock need a new operation? Wouldn't whatever
> > > function is specified here just be specified to the dax_dev as the
> > > ->notify_failure() holder callback?
> >
> > Because we need to find out which file is affected by the given poisoned
> > page so that the memory-failure code can do its collect_procs() and
> > kill_procs() work. That requires the filesystem to use its rmap feature to
> > look up the file from a given offset. So we need this operation
> > implemented by the specific filesystem and called by the dax_device's
> > holder.
> >
> > This is the call trace I described in the cover letter:
> >
> > memory_failure()
> >  * fsdax case
> >    pgmap->ops->memory_failure()      => pmem_pgmap_memory_failure()
> >      dax_device->holder_ops->corrupted_range() =>
> >                                       - fs_dax_corrupted_range()
> >                                       - md_dax_corrupted_range()
> >        sb->s_ops->corrupted_range()  => xfs_fs_corrupted_range()  <== **HERE**
> >          xfs_rmap_query_range()
> >            xfs_corrupt_helper()
> >              * corrupted on metadata
> >                  try to recover data, call xfs_force_shutdown()
> >              * corrupted on file data
> >                  try to recover data, call mf_dax_kill_procs()
> >  * normal case
> >    mf_generic_kill_procs()
> >
> > As you can see, this newly added operation is an important part of the
> > whole process.
>
> I don't think you need either fs_dax_corrupted_range() or
> sb->s_ops->corrupted_range(). In fact, fs_dax_corrupted_range()
> looks broken because the filesystem may not even be mounted on the device
> associated with the error.
If the filesystem is not mounted, then there won't be any process using the broken page, and no process needs to be killed by memory-failure. So I think we can just return and handle the error at the driver level if needed.
> The holder_data and holder_op should be sufficient
> for communicating the stack of notifications:
>
> pgmap->notify_memory_failure() => pmem_pgmap_notify_failure()
>   pmem_dax_dev->holder_ops->notify_failure(pmem_dax_dev) =>
>     md_dax_notify_failure()
>       md_dax_dev->holder_ops->notify_failure() => xfs_notify_failure()
>
> I.e. the entire chain just walks dax_dev holder ops.
Oh, I see. I just need to implement holder_ops in the filesystem or in mapped_device directly. I made the routine more complicated than it needs to be.
--
Thanks,
Ruan.