Hello,
Sending this RFC out for getting some review comments/suggestions on the
problem listed in this cover letter. It would be helpful if someone has any
pointers around these.
1. Patch-2 fixes the problem reported by Steven about dereferencing pointers from
the ring buffer [1].
One of the problem in this patch which I might need some help is, how to
expose EXT4_FC_REASON_MAX enum for
+ __array(unsigned int, fc_ineligible_rc, EXT4_FC_REASON_MAX)
My observation is that this trace_event is correctly printing the array
values when called from cat /sys/kernel/debug/tracing/trace_pipe.
But when I am using perf record -e ext4:ext4_fc_stats to record these trace
events, I am seeing the array values to be all 0.
Do you know what will be the right and easy way to fix above, so that this works
properly for perf tools too? And we should be able to backport this too.
2. Patch-9 discusses one of the problem where FC might miss to track/commit
inode's update. This also needs some discussion. I have added my observations
on the commit log of patch-9 itself.
3. Remaining are FC trace event improvement patches, which I found useful while
debugging some of the recent fast_commit issues.
Also a careful review of patches is always helpful :)
[1]: https://lore.kernel.org/linux-ext4/[email protected]/T/#u
Ritesh Harjani (9):
ext4: Remove unused enum EXT4_FC_COMMIT_FAILED
ext4: Fix ext4_fc_stats trace point
ext4: Add couple of more fast_commit tracepoints
ext4: Do not call FC trace event if FS does not support FC
ext4: Add commit_tid info in jbd debug log
ext4: Add commit tid info in ext4_fc_commit_start/stop trace events
ext4: Fix remaining two trace events to use same printk convention
ext4: Convert ext4_fc_track_dentry type events to use event class
ext4: fast_commit missing tracking updates to a file
fs/ext4/fast_commit.c | 30 +++--
fs/ext4/fast_commit.h | 1 -
include/trace/events/ext4.h | 229 +++++++++++++++++++++++++-----------
3 files changed, 182 insertions(+), 78 deletions(-)
--
2.31.1
This adds commit_tid argument in ext4_fc_update_stats()
so that we can add this information too in jbd_debug logs.
This is also required in a later patch to pass the commit_tid info in
ext4_fc_commit_start/stop() trace events.
Signed-off-by: Ritesh Harjani <[email protected]>
---
fs/ext4/fast_commit.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 7fb1eceef30c..ee32aac0cbbf 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1127,11 +1127,12 @@ static int ext4_fc_perform_commit(journal_t *journal)
}
static void ext4_fc_update_stats(struct super_block *sb, int status,
- u64 commit_time, int nblks)
+ u64 commit_time, int nblks, tid_t commit_tid)
{
struct ext4_fc_stats *stats = &EXT4_SB(sb)->s_fc_stats;
- jbd_debug(1, "Fast commit ended with status = %d", status);
+ jbd_debug(1, "Fast commit ended with status = %d for tid %u",
+ status, commit_tid);
if (status == EXT4_FC_STATUS_OK) {
stats->fc_num_commits++;
stats->fc_numblks += nblks;
@@ -1181,14 +1182,16 @@ int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
if (atomic_read(&sbi->s_fc_subtid) <= subtid &&
commit_tid > journal->j_commit_sequence)
goto restart_fc;
- ext4_fc_update_stats(sb, EXT4_FC_STATUS_SKIPPED, 0, 0);
+ ext4_fc_update_stats(sb, EXT4_FC_STATUS_SKIPPED, 0, 0,
+ commit_tid);
return 0;
} else if (ret) {
/*
* Commit couldn't start. Just update stats and perform a
* full commit.
*/
- ext4_fc_update_stats(sb, EXT4_FC_STATUS_FAILED, 0, 0);
+ ext4_fc_update_stats(sb, EXT4_FC_STATUS_FAILED, 0, 0,
+ commit_tid);
return jbd2_complete_transaction(journal, commit_tid);
}
@@ -1220,12 +1223,12 @@ int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
* don't react too strongly to vast changes in the commit time
*/
commit_time = ktime_to_ns(ktime_sub(ktime_get(), start_time));
- ext4_fc_update_stats(sb, status, commit_time, nblks);
+ ext4_fc_update_stats(sb, status, commit_time, nblks, commit_tid);
return ret;
fallback:
ret = jbd2_fc_end_commit_fallback(journal);
- ext4_fc_update_stats(sb, status, 0, 0);
+ ext4_fc_update_stats(sb, status, 0, 0, commit_tid);
return ret;
}
--
2.31.1
<DO NOT MERGE THIS YET>
Testcase
==========
1. i=0; while [ $i -lt 1000 ]; do xfs_io -f -c "pwrite -S 0xaa -b 32k 0 32k" -c "fsync" /mnt/$i; i=$(($i+1)); done && sudo ./src/godown -v /mnt && sudo umount /mnt && sudo mount /dev/loop2 /mnt'
2. ls -alih /mnt/ -> In this you will observe one such file with 0 bytes (which ideally should not happen)
^^^ say if you don't see the issue because your underlying storage
device is very fast, then maybe try with commit=1 mount option.
Analysis
==========
It seems a file's updates can be a part of two transaction tid.
Below are the sequence of events which could cause this issue.
jbd2_handle_start -> (t_tid = 38)
__ext4_new_inode
ext4_fc_track_template -> __track_inode -> (i_sync_tid = 38, t_tid = 38)
<track more updates>
jbd2_start_commit -> (t_tid = 38)
jbd2_handle_start (tid = 39)
ext4_fc_track_template -> __track_inode -> (i_sync_tid = 38, t_tid 39)
-> ext4_fc_reset_inode & ei->i_sync_tid = t_tid
ext4_fc_commit_start -> (will wait since jbd2 full commit is in progress)
jbd2_end_commit (t_tid = 38)
-> jbd2_fc_cleanup() -> this will cleanup entries in sbi->s_fc_q[FC_Q_MAIN]
-> And the above could result inode size as 0 as after effect.
ext4_fc_commit_stop
You could find the logs for the above behavior for inode 979 at [1].
-> So what is happening here is since the ei->i_fc_list is not empty
(because it is already part of sb's MAIN queue), we don't add this inode
again into neither sb's MAIN or STAGING queue.
And after jbd2_fc_cleanup() is called from jbd2 full commit, we
just remove this inode from the main queue.
So as a simple fix, what I did below was to check if it is a jbd2 full commit
in ext4_fc_cleanup(), and if the ei->i_sync_tid > tid, that means we
need not remove that from MAIN queue. This is since neither jbd2 nor FC
has committed updates of those inodes for this new txn tid yet.
But below are some quick queries on this
=========================================
1. why do we call ext4_fc_reset_inode() when inode tid and
running txn tid does not match?
2. Also is this an expected behavior from the design perspective of
fast_commit. i.e.
a. the inode can be part of two tids?
b. And that while a full commit is in progress, the inode can still
receive updates but using a new transaction tid.
Frankly speaking, since I was also working on other things, so I haven't
yet got the chance to completely analyze the situation yet.
Once I have those things sorted, I will spend more time on this, to
understand it more. Meanwhile if you already have some answers to above
queries/observations, please do share those here.
Links
=========
[1] https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/ext4/fast_commit/fc_inode_missing_updates_ino_979.txt
Signed-off-by: Ritesh Harjani <[email protected]>
---
fs/ext4/fast_commit.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 8803ba087b07..769b584c2552 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1252,6 +1252,8 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
spin_lock(&sbi->s_fc_lock);
list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
i_fc_list) {
+ if (full && iter->i_sync_tid > tid)
+ continue;
list_del_init(&iter->i_fc_list);
ext4_clear_inode_state(&iter->vfs_inode,
EXT4_STATE_FC_COMMITTING);
--
2.31.1
On 22/03/09 12:48PM, Theodore Ts'o wrote:
> Ritesh,
>
> Were you going to be sending a revised version of this patch series?
Hello Ted,
Due to some unexpected guests at home, I was on leave since last weekend.
I am starting to work from today. Let me work on the revised version of this
patch series. I will try to complete it before end of day i.e. before
our call.
-ritesh