2011-05-17 10:28:45

by Jan Kara

[permalink] [raw]
Subject: [PATCH 0/2] Fix sending of a barrier and transaction waiting in ext4_sync_file()


Hi Ted,

the two patches below fix ext4_sync_file() to wait for a transaction commit
properly (return value of jbd2_log_start_commit() is not exactly what
ext4_sync_file() thinks it is). It also optimizes / fixes sending of a barrier
in some cases (e.g. in ordered mode with external journal we cannot really
depend on transaction commit to issue a barrier to fs device because we don't
seem to add inode to a transaction for overwrites).

I already fixed ext3 in this regard some time ago but didn't get to porting the
patches to ext4.

Honza


2011-05-17 10:28:48

by Jan Kara

[permalink] [raw]
Subject: [PATCH 1/2] jbd2: Provide function to check whether transaction will issue data barrier

Provide a function which returns whether a transaction with given tid
will send a barrier to the filesystem device. The function will be used
by ext4 to detect whether fsync needs to send a separate barrier or not.

Signed-off-by: Jan Kara <[email protected]>
---
fs/jbd2/commit.c | 6 +++++-
fs/jbd2/journal.c | 37 +++++++++++++++++++++++++++++++++++++
include/linux/jbd2.h | 3 ++-
3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 6e28000..e7812ed 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -800,6 +800,10 @@ wait_for_iobuf:
jbd2_journal_abort(journal, err);

jbd_debug(3, "JBD: commit phase 5\n");
+ write_lock(&journal->j_state_lock);
+ J_ASSERT(commit_transaction->t_state == T_COMMIT);
+ commit_transaction->t_state = T_COMMIT_RECORD;
+ write_unlock(&journal->j_state_lock);

if (!JBD2_HAS_INCOMPAT_FEATURE(journal,
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
@@ -955,7 +959,7 @@ restart_loop:

jbd_debug(3, "JBD: commit phase 7\n");

- J_ASSERT(commit_transaction->t_state == T_COMMIT);
+ J_ASSERT(commit_transaction->t_state == T_COMMIT_RECORD);

commit_transaction->t_start = jiffies;
stats.run.rs_logging = jbd2_time_diff(stats.run.rs_logging,
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e0ec3db..dbafbc5 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -577,6 +577,43 @@ int jbd2_journal_start_commit(journal_t *journal, tid_t *ptid)
}

/*
+ * Return 1 if a given transaction has not yet sent barrier request
+ * connected with a transaction commit. If 0 is returned, transaction
+ * may or may not have sent the barrier. Used to avoid sending barrier
+ * twice in common cases.
+ */
+int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid)
+{
+ int ret = 0;
+ transaction_t *commit_trans;
+
+ /*
+ * For external journal sending data barrier is more complex so just
+ * admit we don't know...
+ */
+ if (!(journal->j_flags & JBD2_BARRIER) ||
+ journal->j_fs_dev != journal->j_dev)
+ return 0;
+ read_lock(&journal->j_state_lock);
+ /* Transaction already committed? */
+ if (tid_geq(journal->j_commit_sequence, tid))
+ goto out;
+ /*
+ * Transaction is being committed and we already proceeded to
+ * submitting barrier?
+ */
+ commit_trans = journal->j_committing_transaction;
+ if (commit_trans && commit_trans->t_tid == tid &&
+ commit_trans->t_state >= T_COMMIT_RECORD)
+ goto out;
+ ret = 1;
+out:
+ read_unlock(&journal->j_state_lock);
+ return ret;
+}
+EXPORT_SYMBOL(jbd2_trans_will_send_data_barrier);
+
+/*
* Wait for a specified commit to complete.
* The caller may not hold the journal lock.
*/
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index a32dcae..fea6d6f 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -529,9 +529,9 @@ struct transaction_s
enum {
T_RUNNING,
T_LOCKED,
- T_RUNDOWN,
T_FLUSH,
T_COMMIT,
+ T_COMMIT_RECORD,
T_FINISHED
} t_state;

@@ -1228,6 +1228,7 @@ int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
int jbd2_journal_force_commit_nested(journal_t *journal);
int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
int jbd2_log_do_checkpoint(journal_t *journal);
+int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);

void __jbd2_log_wait_for_space(journal_t *journal);
extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *);
--
1.7.1


2011-05-17 10:28:50

by Jan Kara

[permalink] [raw]
Subject: [PATCH 2/2] ext4: Fix waiting and sending of a barrier in ext4_sync_file()

jbd2_log_start_commit() returns 1 only when we really start a transaction. But
we also need to wait for a transaction when the commit is already running. Fix
this problem by waiting for transaction commit unconditionally (which is just a
quick check if the transaction is already committed).

Also we have to be more careful with sending of a barrier because when
transaction is being committed in parallel to ext4_sync_file() running, we
cannot be sure that the barrier the journalling code sends happens after we
wrote all the data for fsync (note that not every data writeout needs to
trigger metadata changes thus commit of some metadata changes can be running
while other data is still written out). So use jbd2_will_send_data_barrier()
helper to detect the common cases when we can be sure barrier will be issued by
the commit code and issue the barrier ourselves in the remaining cases. This
code change will cause ext4_sync_file() to unnecessarily issue a barrier in
data=ordered and data=journal mode with external journal but solving this would
require ext4 to mess with jbd2 internals since jbd2 does not know about ext4
data journalling modes.

Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/fsync.c | 23 +++++++----------------
1 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index e9473cb..1c44f1a 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -172,6 +172,7 @@ int ext4_sync_file(struct file *file, int datasync)
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
int ret;
tid_t commit_tid;
+ bool needs_barrier = false;

J_ASSERT(ext4_journal_current_handle() == NULL);

@@ -211,22 +212,12 @@ int ext4_sync_file(struct file *file, int datasync)
}

commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
- if (jbd2_log_start_commit(journal, commit_tid)) {
- /*
- * When the journal is on a different device than the
- * fs data disk, we need to issue the barrier in
- * writeback mode. (In ordered mode, the jbd2 layer
- * will take care of issuing the barrier. In
- * data=journal, all of the data blocks are written to
- * the journal device.)
- */
- if (ext4_should_writeback_data(inode) &&
- (journal->j_fs_dev != journal->j_dev) &&
- (journal->j_flags & JBD2_BARRIER))
- blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL,
- NULL);
- ret = jbd2_log_wait_commit(journal, commit_tid);
- } else if (journal->j_flags & JBD2_BARRIER)
+ if (journal->j_flags & JBD2_BARRIER &&
+ !jbd2_trans_will_send_data_barrier(journal, commit_tid))
+ needs_barrier = true;
+ jbd2_log_start_commit(journal, commit_tid);
+ ret = jbd2_log_wait_commit(journal, commit_tid);
+ if (needs_barrier)
blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
out:
trace_ext4_sync_file_exit(inode, ret);
--
1.7.1


2011-05-18 17:43:28

by djwong

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix sending of a barrier and transaction waiting in ext4_sync_file()

On Tue, May 17, 2011 at 12:28:13PM +0200, Jan Kara wrote:
>
> Hi Ted,
>
> the two patches below fix ext4_sync_file() to wait for a transaction commit
> properly (return value of jbd2_log_start_commit() is not exactly what
> ext4_sync_file() thinks it is). It also optimizes / fixes sending of a barrier
> in some cases (e.g. in ordered mode with external journal we cannot really
> depend on transaction commit to issue a barrier to fs device because we don't
> seem to add inode to a transaction for overwrites).

Probably a silly nit to pick, but do you mean "flush" instead of "barrier"?

--D
>
> I already fixed ext3 in this regard some time ago but didn't get to porting the
> patches to ext4.
>
> Honza
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2011-05-19 10:52:12

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix sending of a barrier and transaction waiting in ext4_sync_file()

On Wed 18-05-11 10:43:27, Darrick J. Wong wrote:
> On Tue, May 17, 2011 at 12:28:13PM +0200, Jan Kara wrote:
> >
> > Hi Ted,
> >
> > the two patches below fix ext4_sync_file() to wait for a transaction commit
> > properly (return value of jbd2_log_start_commit() is not exactly what
> > ext4_sync_file() thinks it is). It also optimizes / fixes sending of a barrier
> > in some cases (e.g. in ordered mode with external journal we cannot really
> > depend on transaction commit to issue a barrier to fs device because we don't
> > seem to add inode to a transaction for overwrites).
>
> Probably a silly nit to pick, but do you mean "flush" instead of "barrier"?
Oh, right. I mean flush :). Thanks.

Honza

> > I already fixed ext3 in this regard some time ago but didn't get to porting the
> > patches to ext4.
> >
> > Honza
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html

2011-05-22 21:12:44

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 2/2] ext4: Fix waiting and sending of a barrier in ext4_sync_file()

On Tue, May 17, 2011 at 12:28:15PM +0200, Jan Kara wrote:
> This code change will cause ext4_sync_file() to unnecessarily issue
> a barrier in data=ordered and data=journal mode with external
> journal but solving this would require ext4 to mess with jbd2
> internals since jbd2 does not know about ext4 data journalling
> modes.

I really don't like this. I think the following two patches should
address this. Could you take a look please?

-- Ted

2011-05-22 21:13:47

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 1/2] jbd2: Add function jbd2_trans_will_send_data_barrier()

From: Jan Kara <[email protected]>

Provide a function which returns whether a transaction with given tid
will send a barrier to the filesystem device. The function will be
used by ext4 to detect whether fsync needs to send a separate barrier
or not.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/jbd2/commit.c | 6 +++++-
fs/jbd2/journal.c | 32 ++++++++++++++++++++++++++++++++
include/linux/jbd2.h | 3 ++-
3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 78c2992..861262a 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -805,6 +805,10 @@ wait_for_iobuf:
jbd2_journal_abort(journal, err);

jbd_debug(3, "JBD: commit phase 5\n");
+ write_lock(&journal->j_state_lock);
+ J_ASSERT(commit_transaction->t_state == T_COMMIT);
+ commit_transaction->t_state = T_COMMIT_RECORD;
+ write_unlock(&journal->j_state_lock);

if (!JBD2_HAS_INCOMPAT_FEATURE(journal,
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
@@ -960,7 +964,7 @@ restart_loop:

jbd_debug(3, "JBD: commit phase 7\n");

- J_ASSERT(commit_transaction->t_state == T_COMMIT);
+ J_ASSERT(commit_transaction->t_state == T_COMMIT_RECORD);

commit_transaction->t_start = jiffies;
stats.run.rs_logging = jbd2_time_diff(stats.run.rs_logging,
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index cd2d341..287ba3f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -588,6 +588,38 @@ int jbd2_journal_start_commit(journal_t *journal, tid_t *ptid)
}

/*
+ * Return 1 if a given transaction has not yet sent a barrier request
+ * connected with a transaction commit. If 0 is returned, transaction
+ * may or may not have sent the barrier. Used to avoid sending barrier
+ * twice in common cases.
+ */
+int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid)
+{
+ int ret = 0;
+ transaction_t *commit_trans;
+
+ if (!(journal->j_flags & JBD2_BARRIER))
+ return 0;
+ read_lock(&journal->j_state_lock);
+ /* Transaction already committed? */
+ if (tid_geq(journal->j_commit_sequence, tid))
+ goto out;
+ /*
+ * Transaction is being committed and we already proceeded to
+ * submitting barrier?
+ */
+ commit_trans = journal->j_committing_transaction;
+ if (commit_trans && commit_trans->t_tid == tid &&
+ commit_trans->t_state >= T_COMMIT_RECORD)
+ goto out;
+ ret = 1;
+out:
+ read_unlock(&journal->j_state_lock);
+ return ret;
+}
+EXPORT_SYMBOL(jbd2_trans_will_send_data_barrier);
+
+/*
* Wait for a specified commit to complete.
* The caller may not hold the journal lock.
*/
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index a32dcae..fea6d6f 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -529,9 +529,9 @@ struct transaction_s
enum {
T_RUNNING,
T_LOCKED,
- T_RUNDOWN,
T_FLUSH,
T_COMMIT,
+ T_COMMIT_RECORD,
T_FINISHED
} t_state;

@@ -1228,6 +1228,7 @@ int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
int jbd2_journal_force_commit_nested(journal_t *journal);
int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
int jbd2_log_do_checkpoint(journal_t *journal);
+int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);

void __jbd2_log_wait_for_space(journal_t *journal);
extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *);
--
1.7.3.1


2011-05-22 21:13:47

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 2/2] ext4: Fix waiting and sending of a barrier in ext4_sync_file()

From: Jan Kara <[email protected]>

jbd2_log_start_commit() returns 1 only when we really start a
transaction. But we also need to wait for a transaction when the
commit is already running. Fix this problem by waiting for
transaction commit unconditionally (which is just a quick check if the
transaction is already committed).

Also we have to be more careful with sending of a barrier because when
transaction is being committed in parallel to ext4_sync_file()
running, we cannot be sure that the barrier the journalling code sends
happens after we wrote all the data for fsync (note that not every
data writeout needs to trigger metadata changes thus commit of some
metadata changes can be running while other data is still written
out). So use jbd2_will_send_data_barrier() helper to detect the common
cases when we can be sure barrier will be issued by the commit code
and issue the barrier ourselves in the remaining cases.

[ Modified by tytso so that the external journal cases are handled in
ext4_sync_file() to avoid needlessly issuing extra flush requests in
the data=ordered and data=journalled cases. ]

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/fsync.c | 31 ++++++++++++++++++++-----------
1 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 60fe572..b0e03fa 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -172,6 +172,7 @@ int ext4_sync_file(struct file *file, int datasync)
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
int ret;
tid_t commit_tid;
+ bool needs_barrier = false;

J_ASSERT(ext4_journal_current_handle() == NULL);

@@ -211,22 +212,30 @@ int ext4_sync_file(struct file *file, int datasync)
}

commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
- if (jbd2_log_start_commit(journal, commit_tid)) {
+ if (journal->j_flags & JBD2_BARRIER) {
/*
* When the journal is on a different device than the
- * fs data disk, we need to issue the barrier in
- * writeback mode. (In ordered mode, the jbd2 layer
- * will take care of issuing the barrier. In
+ * fs data disk, when data=writeback, we need to issue
+ * a barrier unconditionally. (In ordered mode, the
+ * jbd2 layer will take care of issuing the barrier if
+ * there were any writes associated with the inode; in
* data=journal, all of the data blocks are written to
* the journal device.)
*/
- if (ext4_should_writeback_data(inode) &&
- (journal->j_fs_dev != journal->j_dev) &&
- (journal->j_flags & JBD2_BARRIER))
- blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL,
- NULL);
- ret = jbd2_log_wait_commit(journal, commit_tid);
- } else if (journal->j_flags & JBD2_BARRIER)
+ if ((journal->j_fs_dev != journal->j_dev) &&
+ ext4_should_writeback_data(inode))
+ needs_barrier = true;
+ else if (!jbd2_trans_will_send_data_barrier(journal,
+ commit_tid))
+ /*
+ * If the journal layer isn't going to issue
+ * the barrier, then we'd better.
+ */
+ needs_barrier = true;
+ }
+ jbd2_log_start_commit(journal, commit_tid);
+ ret = jbd2_log_wait_commit(journal, commit_tid);
+ if (needs_barrier)
blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
out:
trace_ext4_sync_file_exit(inode, ret);
--
1.7.3.1


2011-05-23 17:17:50

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 2/2] ext4: Fix waiting and sending of a barrier in ext4_sync_file()

On Sun 22-05-11 17:13:45, Ted Tso wrote:
> From: Jan Kara <[email protected]>
>
> jbd2_log_start_commit() returns 1 only when we really start a
> transaction. But we also need to wait for a transaction when the
> commit is already running. Fix this problem by waiting for
> transaction commit unconditionally (which is just a quick check if the
> transaction is already committed).
>
> Also we have to be more careful with sending of a barrier because when
> transaction is being committed in parallel to ext4_sync_file()
> running, we cannot be sure that the barrier the journalling code sends
> happens after we wrote all the data for fsync (note that not every
> data writeout needs to trigger metadata changes thus commit of some
> metadata changes can be running while other data is still written
> out). So use jbd2_will_send_data_barrier() helper to detect the common
> cases when we can be sure barrier will be issued by the commit code
> and issue the barrier ourselves in the remaining cases.
>
> [ Modified by tytso so that the external journal cases are handled in
> ext4_sync_file() to avoid needlessly issuing extra flush requests in
> the data=ordered and data=journalled cases. ]
Well, in data=journal case I agree your change will work (but that's your
minor concern I guess). In data=ordered it's harder:
a) The flush of j_fs_dev in jbd2_journal_commit_transaction() is issued
earlier than we set T_COMMIT_RECORD but that's easy to handle.
b) Whether we do or don't send the flush in
jbd2_journal_commit_transaction() depends on whether t_flushed_data_blocks
is set. We can't know in advance whether it gets set or not because it
depends on whether some inode is in transaction's t_inode_list and inodes
can get removed from there when flusher thread has written all the pages
and inode has been reclaimed. OTOH this looks like a bug in the commit code
anyway - I guess t_flushed_data_blocks (or better named equivalent) should
be set in jbd2_journal_file_inode(). Then such variable will also become
a reliable indicator whether the data flush is going to be sent or not.

I'll update the patch set to reflect this...

Honza
> Signed-off-by: Jan Kara <[email protected]>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/fsync.c | 31 ++++++++++++++++++++-----------
> 1 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 60fe572..b0e03fa 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -172,6 +172,7 @@ int ext4_sync_file(struct file *file, int datasync)
> journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
> int ret;
> tid_t commit_tid;
> + bool needs_barrier = false;
>
> J_ASSERT(ext4_journal_current_handle() == NULL);
>
> @@ -211,22 +212,30 @@ int ext4_sync_file(struct file *file, int datasync)
> }
>
> commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
> - if (jbd2_log_start_commit(journal, commit_tid)) {
> + if (journal->j_flags & JBD2_BARRIER) {
> /*
> * When the journal is on a different device than the
> - * fs data disk, we need to issue the barrier in
> - * writeback mode. (In ordered mode, the jbd2 layer
> - * will take care of issuing the barrier. In
> + * fs data disk, when data=writeback, we need to issue
> + * a barrier unconditionally. (In ordered mode, the
> + * jbd2 layer will take care of issuing the barrier if
> + * there were any writes associated with the inode; in
> * data=journal, all of the data blocks are written to
> * the journal device.)
> */
> - if (ext4_should_writeback_data(inode) &&
> - (journal->j_fs_dev != journal->j_dev) &&
> - (journal->j_flags & JBD2_BARRIER))
> - blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL,
> - NULL);
> - ret = jbd2_log_wait_commit(journal, commit_tid);
> - } else if (journal->j_flags & JBD2_BARRIER)
> + if ((journal->j_fs_dev != journal->j_dev) &&
> + ext4_should_writeback_data(inode))
> + needs_barrier = true;
> + else if (!jbd2_trans_will_send_data_barrier(journal,
> + commit_tid))
> + /*
> + * If the journal layer isn't going to issue
> + * the barrier, then we'd better.
> + */
> + needs_barrier = true;
> + }
> + jbd2_log_start_commit(journal, commit_tid);
> + ret = jbd2_log_wait_commit(journal, commit_tid);
> + if (needs_barrier)
> blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
> out:
> trace_ext4_sync_file_exit(inode, ret);
> --
> 1.7.3.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-05-23 19:28:38

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 2/2] ext4: Fix waiting and sending of a barrier in ext4_sync_file()

On Mon, May 23, 2011 at 07:17:47PM +0200, Jan Kara wrote:
> b) Whether we do or don't send the flush in
> jbd2_journal_commit_transaction() depends on whether t_flushed_data_blocks
> is set. We can't know in advance whether it gets set or not because it
> depends on whether some inode is in transaction's t_inode_list and inodes
> can get removed from there when flusher thread has written all the pages
> and inode has been reclaimed. OTOH this looks like a bug in the commit code
> anyway - I guess t_flushed_data_blocks (or better named equivalent) should
> be set in jbd2_journal_file_inode(). Then such variable will also become
> a reliable indicator whether the data flush is going to be sent or not.

Um, I guess I don't see where an inode gets removed from t_inode_list
after the writeback daemon is done with an inode?

- Ted

2011-05-23 20:25:22

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 2/2] ext4: Fix waiting and sending of a barrier in ext4_sync_file()

On Mon 23-05-11 15:28:34, Ted Tso wrote:
> On Mon, May 23, 2011 at 07:17:47PM +0200, Jan Kara wrote:
> > b) Whether we do or don't send the flush in
> > jbd2_journal_commit_transaction() depends on whether t_flushed_data_blocks
> > is set. We can't know in advance whether it gets set or not because it
> > depends on whether some inode is in transaction's t_inode_list and inodes
> > can get removed from there when flusher thread has written all the pages
> > and inode has been reclaimed. OTOH this looks like a bug in the commit code
> > anyway - I guess t_flushed_data_blocks (or better named equivalent) should
> > be set in jbd2_journal_file_inode(). Then such variable will also become
> > a reliable indicator whether the data flush is going to be sent or not.
>
> Um, I guess I don't see where an inode gets removed from t_inode_list
> after the writeback daemon is done with an inode?
ext4_evict_inode()->ext4_clear_inode()->jbd2_journal_release_jbd_inode()
removes the inode from transaction's list. Note that nothing prevents inode
which is still part of the running transaction to be cleaned by a flusher
thread and thus inode shrinker can reap it...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR