2022-07-21 06:03:52

by harshad shirwadkar

[permalink] [raw]
Subject: [RFC PATCH v4 0/8] Ext4 fast commit performance patch series

This is the V4 of the patch series. This patch series supersedes the
patch "ext4: remove journal barrier during fast commit" sent in Feb
2022.

The main difference from V3 is that this patch series makes use extent
status tree in the commit path. While this results in no regressions
in the "quick" group, there is a failure in "log" group (generic/311)
when fast commit is turned on. I am in the middle of getting to the
root of it, but my experiments so far suggest that there is an
inconsistency in ext4_map_blocks()'s behavior when
"EXT4_GET_BLOCKS_CACHED_NOWAIT" flag is passed even if shrinking es
tree is disabled. Due to that failure, I have converted this patch
series to RFC. I will update the series once I fix the inconsistency
issue. But, I wanted to get the current version out any way to get
feedback on the direction this patch series going and also to
potentially catch issues that I might have missed.

Original Cover Letter
---------------------

This patch series reworks the fast commit's commit path to improve
overall performance of the commit path. Following optimizations have
been added in this series:

* Avoid having to lock the journal throughout the fast commit.
* Remove tracking of open handles per inode.

With the changes introduced in this patch series, now the commit path
for fast commits is as follows:

[1] Lock the journal by calling jbd2_journal_lock_updates. This
ensures that all the exsiting handles finish and no new handles
can start.
[2] Mark all the fast commit eligible inodes as undergoing fast commit
by setting "EXT4_STATE_FC_COMMITTING" state.
[3] Unlock the journal by calling jbd2_journal_unlock_updates. This allows
starting of new handles. If new handles try to start an update on
any of the inodes that are being committed, ext4_fc_track_inode()
will block until those inodes have finished the fast commit.
[4] Submit data buffers of all the committing inodes.
[5] Wait for [4] to complete.
[6] Commit all the directory entry updates in the fast commit space.
[7] Commit all the changed inodes in the fast commit space and clear
"EXT4_STATE_FC_COMMITTING" for all the inodes.
[8] Write tail tag to ensure atomicity of commits.

(The above flow has been documented in the code as well)

I verified that the patch series introduces no regressions in "quick"
and "log" groups when "fast_commit" feature is enabled.

Harshad Shirwadkar (8):
ext4: convert i_fc_lock to spinlock
ext4: for committing inode, make ext4_fc_track_inode wait
ext4: use extent status tree in fast commit path
ext4: rework fast commit commit path
ext4: ext4_fc_track_inode() bug fix
ext4: drop i_fc_updates from inode fc info
ext4: add lockdep annotation in fc commit path
ext4: update code documentation

fs/ext4/ext4.h | 21 ++--
fs/ext4/extents_status.c | 3 +-
fs/ext4/fast_commit.c | 248 ++++++++++++++++++++-------------------
fs/ext4/inline.c | 3 +
fs/ext4/inode.c | 5 +-
fs/ext4/super.c | 2 +-
fs/jbd2/journal.c | 2 -
7 files changed, 149 insertions(+), 135 deletions(-)

--
2.37.0.170.g444d1eabd0-goog


2022-07-21 06:05:07

by harshad shirwadkar

[permalink] [raw]
Subject: [RFC PATCH v4 8/8] ext4: update code documentation

This patch updates code documentation to reflect the commit path changes
made in this series.

Signed-off-by: Harshad Shirwadkar <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/fast_commit.c | 39 ++++++++++++++++++++++++++-------------
1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 727a87073e6a..7a7bb540904f 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -49,14 +49,21 @@
* that need to be committed during a fast commit in another in memory queue of
* inodes. During the commit operation, we commit in the following order:
*
- * [1] Lock inodes for any further data updates by setting COMMITTING state
- * [2] Submit data buffers of all the inodes
- * [3] Wait for [2] to complete
- * [4] Commit all the directory entry updates in the fast commit space
- * [5] Commit all the changed inode structures
- * [6] Write tail tag (this tag ensures the atomicity, please read the following
+ * [1] Lock the journal by calling jbd2_journal_lock_updates. This ensures that
+ * all the exsiting handles finish and no new handles can start.
+ * [2] Mark all the fast commit eligible inodes as undergoing fast commit
+ * by setting "EXT4_STATE_FC_COMMITTING" state.
+ * [3] Unlock the journal by calling jbd2_journal_unlock_updates. This allows
+ * starting of new handles. If new handles try to start an update on
+ * any of the inodes that are being committed, ext4_fc_track_inode()
+ * will block until those inodes have finished the fast commit.
+ * [4] Submit data buffers of all the committing inodes.
+ * [5] Wait for [4] to complete.
+ * [6] Commit all the directory entry updates in the fast commit space.
+ * [7] Commit all the changed inodes in the fast commit space and clear
+ * "EXT4_STATE_FC_COMMITTING" for these inodes.
+ * [8] Write tail tag (this tag ensures the atomicity, please read the following
* section for more details).
- * [7] Wait for [4], [5] and [6] to complete.
*
* All the inode updates must call ext4_fc_start_update() before starting an
* update. If such an ongoing update is present, fast commit waits for it to
@@ -142,6 +149,13 @@
* similarly. Thus, by converting a non-idempotent procedure into a series of
* idempotent outcomes, fast commits ensured idempotence during the replay.
*
+ * Locking
+ * -------
+ * sbi->s_fc_lock protects the fast commit inodes queue and the fast commit
+ * dentry queue. ei->i_fc_lock protects the fast commit related info in a given
+ * inode. Most of the code avoids acquiring both the locks, but if one must do
+ * that then sbi->s_fc_lock must be acquired before ei->i_fc_lock.
+ *
* TODOs
* -----
*
@@ -156,13 +170,12 @@
* fast commit recovery even if that area is invalidated by later full
* commits.
*
- * 1) Fast commit's commit path locks the entire file system during fast
- * commit. This has significant performance penalty. Instead of that, we
- * should use ext4_fc_start/stop_update functions to start inode level
- * updates from ext4_journal_start/stop. Once we do that we can drop file
- * system locking during commit path.
+ * 1) Handle more ineligible cases.
*
- * 2) Handle more ineligible cases.
+ * 2) Change ext4_fc_commit() to lookup logical to physical mapping using extent
+ * status tree. This would get rid of the need to call ext4_fc_track_inode()
+ * before acquiring i_data_sem. To do that we would need to ensure that
+ * modified extents from the extent status tree are not evicted from memory.
*/

#include <trace/events/ext4.h>
--
2.37.0.170.g444d1eabd0-goog

2022-07-21 06:05:10

by harshad shirwadkar

[permalink] [raw]
Subject: [RFC PATCH v4 6/8] ext4: drop i_fc_updates from inode fc info

The new logic introduced in this series does not require tracking number
of active handles open on an inode. So, drop it.

Signed-off-by: Harshad Shirwadkar <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 5 ----
fs/ext4/fast_commit.c | 70 -------------------------------------------
2 files changed, 75 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 12c8691f08d3..51ed372faff0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1061,9 +1061,6 @@ struct ext4_inode_info {
/* End of lblk range that needs to be committed in this fast commit */
ext4_lblk_t i_fc_lblk_len;

- /* Number of ongoing updates on this inode */
- atomic_t i_fc_updates;
-
/* Fast commit wait queue for this inode */
wait_queue_head_t i_fc_wait;

@@ -2882,8 +2879,6 @@ void __ext4_fc_track_create(handle_t *handle, struct inode *inode,
void ext4_fc_track_create(handle_t *handle, struct dentry *dentry);
void ext4_fc_track_inode(handle_t *handle, struct inode *inode);
void ext4_fc_mark_ineligible(struct super_block *sb, int reason, handle_t *handle);
-void ext4_fc_start_update(struct inode *inode);
-void ext4_fc_stop_update(struct inode *inode);
void ext4_fc_del(struct inode *inode);
bool ext4_fc_replay_check_excluded(struct super_block *sb, ext4_fsblk_t block);
void ext4_fc_replay_cleanup(struct super_block *sb);
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 0307e21e5b29..a9cac460de26 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -201,76 +201,6 @@ void ext4_fc_init_inode(struct inode *inode)
INIT_LIST_HEAD(&ei->i_fc_list);
INIT_LIST_HEAD(&ei->i_fc_dilist);
init_waitqueue_head(&ei->i_fc_wait);
- atomic_set(&ei->i_fc_updates, 0);
-}
-
-/* This function must be called with sbi->s_fc_lock held. */
-static void ext4_fc_wait_committing_inode(struct inode *inode)
-__releases(&EXT4_SB(inode->i_sb)->s_fc_lock)
-{
- wait_queue_head_t *wq;
- struct ext4_inode_info *ei = EXT4_I(inode);
-
-#if (BITS_PER_LONG < 64)
- DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
- EXT4_STATE_FC_COMMITTING);
- wq = bit_waitqueue(&ei->i_state_flags,
- EXT4_STATE_FC_COMMITTING);
-#else
- DEFINE_WAIT_BIT(wait, &ei->i_flags,
- EXT4_STATE_FC_COMMITTING);
- wq = bit_waitqueue(&ei->i_flags,
- EXT4_STATE_FC_COMMITTING);
-#endif
- lockdep_assert_held(&EXT4_SB(inode->i_sb)->s_fc_lock);
- prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
- spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
- schedule();
- finish_wait(wq, &wait.wq_entry);
-}
-
-/*
- * Inform Ext4's fast about start of an inode update
- *
- * This function is called by the high level call VFS callbacks before
- * performing any inode update. This function blocks if there's an ongoing
- * fast commit on the inode in question.
- */
-void ext4_fc_start_update(struct inode *inode)
-{
- struct ext4_inode_info *ei = EXT4_I(inode);
-
- if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
- (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
- return;
-
-restart:
- spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
- if (list_empty(&ei->i_fc_list))
- goto out;
-
- if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
- ext4_fc_wait_committing_inode(inode);
- goto restart;
- }
-out:
- atomic_inc(&ei->i_fc_updates);
- spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
-}
-
-/*
- * Stop inode update and wake up waiting fast commits if any.
- */
-void ext4_fc_stop_update(struct inode *inode)
-{
- struct ext4_inode_info *ei = EXT4_I(inode);
-
- if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
- (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
- return;
-
- if (atomic_dec_and_test(&ei->i_fc_updates))
- wake_up_all(&ei->i_fc_wait);
}

/*
--
2.37.0.170.g444d1eabd0-goog

2022-07-21 06:05:10

by harshad shirwadkar

[permalink] [raw]
Subject: [RFC PATCH v4 3/8] ext4: use extent status tree in fast commit path

This patch moves fc commit path to use extent status tree to lookup
logical to physical mappings. In order to preserve the uncommitted
entries in the es cache, this patch makes all the inodes on fast
commit list as shrinker ineligible. Making the uncommitted entries in
es cache to stick around is left as a future enhancement.

Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/extents_status.c | 3 ++-
fs/ext4/fast_commit.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 9a3a8996aacf..07fb86746534 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -1500,7 +1500,8 @@ static int __es_shrink(struct ext4_sb_info *sbi, int nr_to_scan,
continue;
}

- if (ei == locked_ei || !write_trylock(&ei->i_es_lock)) {
+ if (!list_empty(&ei->i_fc_list) || ei == locked_ei ||
+ !write_trylock(&ei->i_es_lock)) {
nr_skipped++;
continue;
}
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 4d2384adcbb0..916f62cfa7f7 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -951,7 +951,8 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
while (cur_lblk_off <= new_blk_size) {
map.m_lblk = cur_lblk_off;
map.m_len = new_blk_size - cur_lblk_off + 1;
- ret = ext4_map_blocks(NULL, inode, &map, 0);
+ ret = ext4_map_blocks(NULL, inode, &map,
+ EXT4_GET_BLOCKS_CACHED_NOWAIT);
if (ret < 0)
return -ECANCELED;

--
2.37.0.170.g444d1eabd0-goog

2022-07-21 06:05:10

by harshad shirwadkar

[permalink] [raw]
Subject: [RFC PATCH v4 4/8] ext4: rework fast commit commit path

This patch reworks fast commit's commit path to remove locking the
journal for the entire duration of a fast commit. Instead, we only lock
the journal while marking all the eligible inodes as "committing". This
allows handles to make progress in parallel with the fast commit.

Signed-off-by: Harshad Shirwadkar <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/fast_commit.c | 73 ++++++++++++++++++++++++++++---------------
fs/jbd2/journal.c | 2 --
2 files changed, 48 insertions(+), 27 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 916f62cfa7f7..608ae16afcd6 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -287,20 +287,30 @@ void ext4_fc_del(struct inode *inode)
(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
return;

-restart:
spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
if (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {
spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
return;
}

- if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
- ext4_fc_wait_committing_inode(inode);
- goto restart;
- }
-
- if (!list_empty(&ei->i_fc_list))
- list_del_init(&ei->i_fc_list);
+ /*
+ * Since ext4_fc_del is called from ext4_evict_inode while having a
+ * handle open, there is no need for us to wait here even if a fast
+ * commit is going on. That is because, if this inode is being
+ * committed, ext4_mark_inode_dirty would have waited for inode commit
+ * operation to finish before we come here. So, by the time we come
+ * here, inode's EXT4_STATE_FC_COMMITTING would have been cleared. So,
+ * we shouldn't see EXT4_STATE_FC_COMMITTING to be set on this inode
+ * here.
+ *
+ * We may come here without any handles open in the "no_delete" case of
+ * ext4_evict_inode as well. However, if that happens, we first mark the
+ * file system as fast commit ineligible anyway. So, even in that case,
+ * it is okay to remove the inode from the fc list.
+ */
+ WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)
+ && !ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE));
+ list_del_init(&ei->i_fc_list);

/*
* Since this inode is getting removed, let's also remove all FC
@@ -323,8 +333,6 @@ void ext4_fc_del(struct inode *inode)
fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
kfree(fc_dentry->fcd_name.name);
kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
-
- return;
}

/*
@@ -1006,19 +1014,6 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)

spin_lock(&sbi->s_fc_lock);
list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
- ext4_set_inode_state(&ei->vfs_inode, EXT4_STATE_FC_COMMITTING);
- while (atomic_read(&ei->i_fc_updates)) {
- DEFINE_WAIT(wait);
-
- prepare_to_wait(&ei->i_fc_wait, &wait,
- TASK_UNINTERRUPTIBLE);
- if (atomic_read(&ei->i_fc_updates)) {
- spin_unlock(&sbi->s_fc_lock);
- schedule();
- spin_lock(&sbi->s_fc_lock);
- }
- finish_wait(&ei->i_fc_wait, &wait);
- }
spin_unlock(&sbi->s_fc_lock);
ret = jbd2_submit_inode_data(ei->jinode);
if (ret)
@@ -1131,6 +1126,16 @@ static int ext4_fc_perform_commit(journal_t *journal)
int ret = 0;
u32 crc = 0;

+ /* Lock the journal */
+ jbd2_journal_lock_updates(journal);
+ spin_lock(&sbi->s_fc_lock);
+ list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
+ ext4_set_inode_state(&iter->vfs_inode,
+ EXT4_STATE_FC_COMMITTING);
+ }
+ spin_unlock(&sbi->s_fc_lock);
+ jbd2_journal_unlock_updates(journal);
+
ret = ext4_fc_submit_inode_data_all(journal);
if (ret)
return ret;
@@ -1183,6 +1188,20 @@ static int ext4_fc_perform_commit(journal_t *journal)
goto out;
spin_lock(&sbi->s_fc_lock);
}
+ list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
+ ext4_clear_inode_state(inode, EXT4_STATE_FC_COMMITTING);
+ /*
+ * Make sure clearing of EXT4_STATE_FC_COMMITTING is
+ * visible before we send the wakeup. Pairs with implicit
+ * barrier in prepare_to_wait() in ext4_fc_track_inode().
+ */
+ smp_mb();
+#if (BITS_PER_LONG < 64)
+ wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
+#else
+ wake_up_bit(&iter->i_flags, EXT4_STATE_FC_COMMITTING);
+#endif
+ }
spin_unlock(&sbi->s_fc_lock);

ret = ext4_fc_write_tail(sb, crc);
@@ -1318,13 +1337,17 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
spin_lock(&sbi->s_fc_lock);
list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
i_fc_list) {
- list_del_init(&iter->i_fc_list);
ext4_clear_inode_state(&iter->vfs_inode,
EXT4_STATE_FC_COMMITTING);
if (iter->i_sync_tid <= tid)
ext4_fc_reset_inode(&iter->vfs_inode);
- /* Make sure EXT4_STATE_FC_COMMITTING bit is clear */
+ /*
+ * Make sure clearing of EXT4_STATE_FC_COMMITTING is
+ * visible before we send the wakeup. Pairs with implicit
+ * barrier in prepare_to_wait() in ext4_fc_track_inode().
+ */
smp_mb();
+ list_del_init(&iter->i_fc_list);
#if (BITS_PER_LONG < 64)
wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
#else
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index c0cbeeaec2d1..64b56bf02c52 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -757,7 +757,6 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid)
}
journal->j_flags |= JBD2_FAST_COMMIT_ONGOING;
write_unlock(&journal->j_state_lock);
- jbd2_journal_lock_updates(journal);

return 0;
}
@@ -769,7 +768,6 @@ EXPORT_SYMBOL(jbd2_fc_begin_commit);
*/
static int __jbd2_fc_end_commit(journal_t *journal, tid_t tid, bool fallback)
{
- jbd2_journal_unlock_updates(journal);
if (journal->j_fc_cleanup_callback)
journal->j_fc_cleanup_callback(journal, 0, tid);
write_lock(&journal->j_state_lock);
--
2.37.0.170.g444d1eabd0-goog

2022-07-21 06:05:10

by harshad shirwadkar

[permalink] [raw]
Subject: [RFC PATCH v4 5/8] ext4: ext4_fc_track_inode() bug fix

This patch fixes a bug in ext4_fc_track_inode() where we should not
immediately return from ext4_fc_track_inode() if the inode is on fast
commit list. It is possible that the inode is on fast commit list and
then somebody calls ext4_fc_track_inode(). If we immediately return if
the inode is on fc list, we will let the caller modify the inode while
it is being committed.

Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/fast_commit.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 608ae16afcd6..0307e21e5b29 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -611,8 +611,7 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)

if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) ||
- ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE) ||
- !list_empty(&ei->i_fc_list))
+ ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
return;

while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
--
2.37.0.170.g444d1eabd0-goog

2022-07-21 06:05:10

by harshad shirwadkar

[permalink] [raw]
Subject: [RFC PATCH v4 7/8] ext4: add lockdep annotation in fc commit path

This patch adds lockdep annotation in fast commit commit path.

Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/ext4.h | 9 +++++++++
fs/ext4/fast_commit.c | 12 ++++++++++--
2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 51ed372faff0..eabf0c9a3765 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1070,6 +1070,15 @@ struct ext4_inode_info {
*/
spinlock_t i_fc_lock;

+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ /*
+ * Lockdep entity to track fast commit waiting dependencies. Any caller
+ * who sets / resets EXT4_STATE_FC_COMMITTING flag should let lockdep
+ * know that they are doing so by locking / unlocking fc_commit_map.
+ */
+ struct lockdep_map i_fc_commit_map;
+ struct lock_class_key i_fc_commit_key;
+#endif
/*
* i_disksize keeps track of what the inode size is ON DISK, not
* in memory. During truncate, i_size is set to the new size by
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index a9cac460de26..727a87073e6a 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -201,6 +201,8 @@ void ext4_fc_init_inode(struct inode *inode)
INIT_LIST_HEAD(&ei->i_fc_list);
INIT_LIST_HEAD(&ei->i_fc_dilist);
init_waitqueue_head(&ei->i_fc_wait);
+ lockdep_init_map(&ei->i_fc_commit_map, "i_fc_commit",
+ &ei->i_fc_commit_key, 0);
}

/*
@@ -1061,6 +1063,7 @@ static int ext4_fc_perform_commit(journal_t *journal)
list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
ext4_set_inode_state(&iter->vfs_inode,
EXT4_STATE_FC_COMMITTING);
+ mutex_acquire(&iter->i_fc_commit_map, 0, 0, _THIS_IP_);
}
spin_unlock(&sbi->s_fc_lock);
jbd2_journal_unlock_updates(journal);
@@ -1119,6 +1122,7 @@ static int ext4_fc_perform_commit(journal_t *journal)
}
list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
ext4_clear_inode_state(inode, EXT4_STATE_FC_COMMITTING);
+ mutex_release(&iter->i_fc_commit_map, _THIS_IP_);
/*
* Make sure clearing of EXT4_STATE_FC_COMMITTING is
* visible before we send the wakeup. Pairs with implicit
@@ -1266,8 +1270,12 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
spin_lock(&sbi->s_fc_lock);
list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
i_fc_list) {
- ext4_clear_inode_state(&iter->vfs_inode,
- EXT4_STATE_FC_COMMITTING);
+ if (ext4_test_inode_state(&iter->vfs_inode,
+ EXT4_STATE_FC_COMMITTING)) {
+ ext4_clear_inode_state(&iter->vfs_inode,
+ EXT4_STATE_FC_COMMITTING);
+ mutex_release(&iter->i_fc_commit_map, _THIS_IP_);
+ }
if (iter->i_sync_tid <= tid)
ext4_fc_reset_inode(&iter->vfs_inode);
/*
--
2.37.0.170.g444d1eabd0-goog

2022-07-21 06:05:26

by harshad shirwadkar

[permalink] [raw]
Subject: [RFC PATCH v4 2/8] ext4: for committing inode, make ext4_fc_track_inode wait

If the inode that's being requested to track using ext4_fc_track_inode
is being committed, then wait until the inode finishes the
commit. Also, add calls to ext4_fc_track_inode at the right places.

With this patch, now calling ext4_reserve_inode_write() results in
inode being tracked for next fast commit. A subtle lock ordering
requirement with i_data_sem (which is documented in the code) requires
that ext4_fc_track_inode() be called before grabbing i_data_sem. So,
this patch also adds explicit ext4_fc_track_inode() calls in places
where i_data_sem grabbed.

Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/fast_commit.c | 30 ++++++++++++++++++++++++++++++
fs/ext4/inline.c | 3 +++
fs/ext4/inode.c | 5 ++++-
3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 8d6d5155a646..4d2384adcbb0 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -574,8 +574,14 @@ static int __track_inode(struct inode *inode, void *arg, bool update)
return 0;
}

+/*
+ * Track inode as part of the next fast commit. If the inode is being
+ * committed, this function will wait for the commit to finish.
+ */
void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
{
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ wait_queue_head_t *wq;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int ret;

@@ -595,6 +601,30 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
return;

+ if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
+ (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) ||
+ ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE) ||
+ !list_empty(&ei->i_fc_list))
+ return;
+
+ while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
+#if (BITS_PER_LONG < 64)
+ DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
+ EXT4_STATE_FC_COMMITTING);
+ wq = bit_waitqueue(&ei->i_state_flags,
+ EXT4_STATE_FC_COMMITTING);
+#else
+ DEFINE_WAIT_BIT(wait, &ei->i_flags,
+ EXT4_STATE_FC_COMMITTING);
+ wq = bit_waitqueue(&ei->i_flags,
+ EXT4_STATE_FC_COMMITTING);
+#endif
+ prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
+ if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
+ schedule();
+ finish_wait(wq, &wait.wq_entry);
+ }
+
ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1);
trace_ext4_fc_track_inode(handle, inode, ret);
}
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index cff52ff6549d..7f5edfb34095 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -586,6 +586,7 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
goto out;
}

+ ext4_fc_track_inode(handle, inode);
ret = ext4_destroy_inline_data_nolock(handle, inode);
if (ret)
goto out;
@@ -686,6 +687,7 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
goto convert;
}

+ ext4_fc_track_inode(handle, inode);
ret = ext4_journal_get_write_access(handle, inode->i_sb, iloc.bh,
EXT4_JTR_NONE);
if (ret)
@@ -964,6 +966,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
if (ret < 0)
goto out_release_page;
}
+ ext4_fc_track_inode(handle, inode);
ret = ext4_journal_get_write_access(handle, inode->i_sb, iloc.bh,
EXT4_JTR_NONE);
if (ret)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 84c0eb55071d..f3de6748a96b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -627,6 +627,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
*/
map->m_flags &= ~EXT4_MAP_FLAGS;

+ ext4_fc_track_inode(handle, inode);
/*
* New blocks allocate and/or writing to unwritten extent
* will possibly result in updating i_data, so we take
@@ -4076,7 +4077,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)

/* If there are blocks to remove, do it */
if (stop_block > first_block) {
-
+ ext4_fc_track_inode(handle, inode);
down_write(&EXT4_I(inode)->i_data_sem);
ext4_discard_preallocations(inode, 0);

@@ -4232,6 +4233,7 @@ int ext4_truncate(struct inode *inode)
if (err)
goto out_stop;

+ ext4_fc_track_inode(handle, inode);
down_write(&EXT4_I(inode)->i_data_sem);

ext4_discard_preallocations(inode, 0);
@@ -5752,6 +5754,7 @@ ext4_reserve_inode_write(handle_t *handle, struct inode *inode,
brelse(iloc->bh);
iloc->bh = NULL;
}
+ ext4_fc_track_inode(handle, inode);
}
ext4_std_error(inode->i_sb, err);
return err;
--
2.37.0.170.g444d1eabd0-goog