Subject: [RFC PATCH v2 2/5] jbd2: introduce journal callbacks j_submit|finish_inode_data_buffers

Add the callbacks as opt-in to override the default behavior for
the transaction's inode list, instead of moving that code around.

This is important as not only ext4 uses the inode list: ocfs2 too,
via jbd2_journal_inode_ranged_write(), and maybe out-of-tree code.

To opt-out of the default behavior (i.e., to do nothing), one has
to opt-in with a no-op function.
---
fs/jbd2/commit.c | 21 ++++++++++++++++-----
include/linux/jbd2.h | 21 ++++++++++++++++++++-
2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 51f713089e35..b98d227b50d8 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -237,10 +237,14 @@ static int journal_submit_data_buffers(journal_t *journal,
* instead of writepages. Because writepages can do
* block allocation with delalloc. We need to write
* only allocated blocks here.
+ * This can be overriden with a custom callback.
*/
trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
- err = journal_submit_inode_data_buffers(mapping, dirty_start,
- dirty_end);
+ if (journal->j_submit_inode_data_buffers)
+ err = journal->j_submit_inode_data_buffers(jinode);
+ else
+ err = journal_submit_inode_data_buffers(mapping,
+ dirty_start, dirty_end);
if (!ret)
ret = err;
spin_lock(&journal->j_list_lock);
@@ -274,9 +278,16 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
continue;
jinode->i_flags |= JI_COMMIT_RUNNING;
spin_unlock(&journal->j_list_lock);
- err = filemap_fdatawait_range_keep_errors(
- jinode->i_vfs_inode->i_mapping, dirty_start,
- dirty_end);
+ /*
+ * Wait for the inode data buffers writeout.
+ * This can be overriden with a custom callback.
+ */
+ if (journal->j_finish_inode_data_buffers)
+ err = journal->j_finish_inode_data_buffers(jinode);
+ else
+ err = filemap_fdatawait_range_keep_errors(
+ jinode->i_vfs_inode->i_mapping,
+ dirty_start, dirty_end);
if (!ret)
ret = err;
spin_lock(&journal->j_list_lock);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index d56128df2aff..24efe88eda1b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -628,7 +628,8 @@ struct transaction_s
struct journal_head *t_shadow_list;

/*
- * List of inodes whose data we've modified in data=ordered mode.
+ * List of inodes whose data we've modified in data=ordered mode
+ * or whose pages we should write-protect in data=journaled mode.
* [j_list_lock]
*/
struct list_head t_inode_list;
@@ -1110,6 +1111,24 @@ struct journal_s
void (*j_commit_callback)(journal_t *,
transaction_t *);

+ /**
+ * @j_submit_inode_data_buffers:
+ *
+ * This function is called before flushing metadata buffers.
+ * This overrides the default behavior (writeout data buffers.)
+ */
+ int (*j_submit_inode_data_buffers)
+ (struct jbd2_inode *);
+
+ /**
+ * @j_finish_inode_data_buffers:
+ *
+ * This function is called after flushing metadata buffers.
+ * This overrides the default behavior (wait writeout.)
+ */
+ int (*j_finish_inode_data_buffers)
+ (struct jbd2_inode *);
+
/*
* Journal statistics
*/
--
2.17.1


2020-08-18 14:52:41

by Jan Kara

[permalink] [raw]
Subject: Re: [RFC PATCH v2 2/5] jbd2: introduce journal callbacks j_submit|finish_inode_data_buffers

On Sun 09-08-20 22:02:05, Mauricio Faria de Oliveira wrote:
> Add the callbacks as opt-in to override the default behavior for
> the transaction's inode list, instead of moving that code around.
>
> This is important as not only ext4 uses the inode list: ocfs2 too,
> via jbd2_journal_inode_ranged_write(), and maybe out-of-tree code.

I'd prefer if the callback is called unconditionally, jbd2 exports the
callback that implements the current behavior and and both ext4 & ocfs2
are adapted to use this callback. We don't care about out of tree code.
That way things are cleaner long term...

> To opt-out of the default behavior (i.e., to do nothing), one has
> to opt-in with a no-op function.

Your Signed-off-by is missing for this patch.

> ---
> fs/jbd2/commit.c | 21 ++++++++++++++++-----
> include/linux/jbd2.h | 21 ++++++++++++++++++++-
> 2 files changed, 36 insertions(+), 6 deletions(-)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 51f713089e35..b98d227b50d8 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -237,10 +237,14 @@ static int journal_submit_data_buffers(journal_t *journal,
> * instead of writepages. Because writepages can do
> * block allocation with delalloc. We need to write
> * only allocated blocks here.
> + * This can be overriden with a custom callback.
> */
> trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
> - err = journal_submit_inode_data_buffers(mapping, dirty_start,
> - dirty_end);
> + if (journal->j_submit_inode_data_buffers)
> + err = journal->j_submit_inode_data_buffers(jinode);
> + else
> + err = journal_submit_inode_data_buffers(mapping,
> + dirty_start, dirty_end);
> if (!ret)
> ret = err;
> spin_lock(&journal->j_list_lock);
> @@ -274,9 +278,16 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
> continue;
> jinode->i_flags |= JI_COMMIT_RUNNING;
> spin_unlock(&journal->j_list_lock);
> - err = filemap_fdatawait_range_keep_errors(
> - jinode->i_vfs_inode->i_mapping, dirty_start,
> - dirty_end);
> + /*
> + * Wait for the inode data buffers writeout.
> + * This can be overriden with a custom callback.
> + */
> + if (journal->j_finish_inode_data_buffers)
> + err = journal->j_finish_inode_data_buffers(jinode);
> + else
> + err = filemap_fdatawait_range_keep_errors(
> + jinode->i_vfs_inode->i_mapping,
> + dirty_start, dirty_end);
> if (!ret)
> ret = err;
> spin_lock(&journal->j_list_lock);
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index d56128df2aff..24efe88eda1b 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -628,7 +628,8 @@ struct transaction_s
> struct journal_head *t_shadow_list;
>
> /*
> - * List of inodes whose data we've modified in data=ordered mode.
> + * List of inodes whose data we've modified in data=ordered mode
> + * or whose pages we should write-protect in data=journaled mode.

I'd rather change the comment to generic "List of inodes associated with
the transaction. E.g. ext4 uses this to track inodes in data=ordered and
data=journal mode that need special handling on transaction commit.".

> * [j_list_lock]
> */
> struct list_head t_inode_list;
> @@ -1110,6 +1111,24 @@ struct journal_s
> void (*j_commit_callback)(journal_t *,
> transaction_t *);
>
> + /**
> + * @j_submit_inode_data_buffers:
> + *
> + * This function is called before flushing metadata buffers.
> + * This overrides the default behavior (writeout data buffers.)
> + */

I'd change the comment to:
* This function is called for all inodes associated with the
* committing transaction marked with JI_WRITE_DATA flag before we
* start to write out the transaction to the journal.

> + int (*j_submit_inode_data_buffers)
> + (struct jbd2_inode *);
> +
> + /**
> + * @j_finish_inode_data_buffers:
> + *
> + * This function is called after flushing metadata buffers.
> + * This overrides the default behavior (wait writeout.)
> + */

And here:
* This function is called for all inodes associated with the
* committing transaction marked with JI_WAIT_DATA flag after we
* we have written the transaction to the journal but before we
* write out the commit block.


> + int (*j_finish_inode_data_buffers)
> + (struct jbd2_inode *);
> +
> /*
> * Journal statistics
> */

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

Subject: Re: [RFC PATCH v2 2/5] jbd2: introduce journal callbacks j_submit|finish_inode_data_buffers

Hi Jan,

Thanks for reviewing.

On Tue, Aug 18, 2020 at 11:52 AM Jan Kara <[email protected]> wrote:
>
> On Sun 09-08-20 22:02:05, Mauricio Faria de Oliveira wrote:
> > Add the callbacks as opt-in to override the default behavior for
> > the transaction's inode list, instead of moving that code around.
> >
> > This is important as not only ext4 uses the inode list: ocfs2 too,
> > via jbd2_journal_inode_ranged_write(), and maybe out-of-tree code.
>
> I'd prefer if the callback is called unconditionally, jbd2 exports the
> callback that implements the current behavior and and both ext4 & ocfs2
> are adapted to use this callback. We don't care about out of tree code.
> That way things are cleaner long term...

Understood.

>
> > To opt-out of the default behavior (i.e., to do nothing), one has
> > to opt-in with a no-op function.
>
> Your Signed-off-by is missing for this patch.

Oh, I thought it wasn't needed in RFC patches.

Thanks for the suggestions below; they're more precise and descriptive.

I had a few questions in the cover letter, but in case you didn't have
the time, I'll just try harder on them; no worries.

Kind regards,
Mauricio

>
> > ---
> > fs/jbd2/commit.c | 21 ++++++++++++++++-----
> > include/linux/jbd2.h | 21 ++++++++++++++++++++-
> > 2 files changed, 36 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> > index 51f713089e35..b98d227b50d8 100644
> > --- a/fs/jbd2/commit.c
> > +++ b/fs/jbd2/commit.c
> > @@ -237,10 +237,14 @@ static int journal_submit_data_buffers(journal_t *journal,
> > * instead of writepages. Because writepages can do
> > * block allocation with delalloc. We need to write
> > * only allocated blocks here.
> > + * This can be overriden with a custom callback.
> > */
> > trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
> > - err = journal_submit_inode_data_buffers(mapping, dirty_start,
> > - dirty_end);
> > + if (journal->j_submit_inode_data_buffers)
> > + err = journal->j_submit_inode_data_buffers(jinode);
> > + else
> > + err = journal_submit_inode_data_buffers(mapping,
> > + dirty_start, dirty_end);
> > if (!ret)
> > ret = err;
> > spin_lock(&journal->j_list_lock);
> > @@ -274,9 +278,16 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
> > continue;
> > jinode->i_flags |= JI_COMMIT_RUNNING;
> > spin_unlock(&journal->j_list_lock);
> > - err = filemap_fdatawait_range_keep_errors(
> > - jinode->i_vfs_inode->i_mapping, dirty_start,
> > - dirty_end);
> > + /*
> > + * Wait for the inode data buffers writeout.
> > + * This can be overriden with a custom callback.
> > + */
> > + if (journal->j_finish_inode_data_buffers)
> > + err = journal->j_finish_inode_data_buffers(jinode);
> > + else
> > + err = filemap_fdatawait_range_keep_errors(
> > + jinode->i_vfs_inode->i_mapping,
> > + dirty_start, dirty_end);
> > if (!ret)
> > ret = err;
> > spin_lock(&journal->j_list_lock);
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index d56128df2aff..24efe88eda1b 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -628,7 +628,8 @@ struct transaction_s
> > struct journal_head *t_shadow_list;
> >
> > /*
> > - * List of inodes whose data we've modified in data=ordered mode.
> > + * List of inodes whose data we've modified in data=ordered mode
> > + * or whose pages we should write-protect in data=journaled mode.
>
> I'd rather change the comment to generic "List of inodes associated with
> the transaction. E.g. ext4 uses this to track inodes in data=ordered and
> data=journal mode that need special handling on transaction commit.".
>
> > * [j_list_lock]
> > */
> > struct list_head t_inode_list;
> > @@ -1110,6 +1111,24 @@ struct journal_s
> > void (*j_commit_callback)(journal_t *,
> > transaction_t *);
> >
> > + /**
> > + * @j_submit_inode_data_buffers:
> > + *
> > + * This function is called before flushing metadata buffers.
> > + * This overrides the default behavior (writeout data buffers.)
> > + */
>
> I'd change the comment to:
> * This function is called for all inodes associated with the
> * committing transaction marked with JI_WRITE_DATA flag before we
> * start to write out the transaction to the journal.
>
> > + int (*j_submit_inode_data_buffers)
> > + (struct jbd2_inode *);
> > +
> > + /**
> > + * @j_finish_inode_data_buffers:
> > + *
> > + * This function is called after flushing metadata buffers.
> > + * This overrides the default behavior (wait writeout.)
> > + */
>
> And here:
> * This function is called for all inodes associated with the
> * committing transaction marked with JI_WAIT_DATA flag after we
> * we have written the transaction to the journal but before we
> * write out the commit block.
>
>
> > + int (*j_finish_inode_data_buffers)
> > + (struct jbd2_inode *);
> > +
> > /*
> > * Journal statistics
> > */
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR



--
Mauricio Faria de Oliveira

2020-08-19 09:17:11

by Jan Kara

[permalink] [raw]
Subject: Re: [RFC PATCH v2 2/5] jbd2: introduce journal callbacks j_submit|finish_inode_data_buffers

On Tue 18-08-20 22:20:08, Mauricio Faria de Oliveira wrote:
> > > To opt-out of the default behavior (i.e., to do nothing), one has
> > > to opt-in with a no-op function.
> >
> > Your Signed-off-by is missing for this patch.
>
> Oh, I thought it wasn't needed in RFC patches.

Yes, it's not strictly needed if you don't want patches included yet. But
usually they are present so that people have less things to worry about
when preparing final submission :)

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR