2020-10-31 20:06:29

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 00/10] Ext4 fast commit fixes

ext4: fast commit fixes

This patch series adds several documentation and code-only (no on disk
format changes) fixes for the fast commit code. I verified that there
were no regressions introduced by this patch series in xfstests auto
and log groups in fast_commit and 4k configurations. Thanks Jan for
suggesting most of the fixes.

Signed-off-by: Harshad Shirwadkar <[email protected]>

Harshad Shirwadkar (10):
ext4: describe fast_commit feature flags
ext4: mark fc ineligible if inode gets evictied due to mem pressure
ext4: pass handle to ext4_fc_track_* functions
ext4: clean up the JBD2 API that initializes fast commits
jbd2: fix fast commit journalling APIs
ext4: dedpulicate the code to wait on inode that's being committed
ext4: misc fast commit fixes
ext4: fix inode dirty check in case of fast commits
ext4: disable fast commit with data journalling
ext4: issue fsdev cache flush before starting fast commit

Documentation/filesystems/ext4/journal.rst | 6 +
Documentation/filesystems/ext4/super.rst | 7 ++
fs/ext4/ext4.h | 22 ++--
fs/ext4/ext4_jbd2.h | 6 +-
fs/ext4/extents.c | 11 +-
fs/ext4/fast_commit.c | 138 ++++++++++-----------
fs/ext4/fast_commit.h | 3 +-
fs/ext4/file.c | 2 -
fs/ext4/inode.c | 14 +--
fs/ext4/namei.c | 50 ++++----
fs/ext4/super.c | 5 +-
fs/jbd2/commit.c | 11 +-
fs/jbd2/journal.c | 94 ++++++++------
fs/jbd2/recovery.c | 2 +-
include/linux/jbd2.h | 13 +-
include/trace/events/ext4.h | 10 +-
16 files changed, 220 insertions(+), 174 deletions(-)

--
2.29.1.341.ge80a0c044ae-goog


2020-10-31 20:06:29

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 02/10] ext4: mark fc ineligible if inode gets evictied due to mem pressure

If inode gets evicted due to memory pressure, we have to remove it
from the fast commit list. However, that inode may have uncommitted
changes that fast commits will lose. So, just fall back to full
commits in this case. Also, rename the fast commit ineligiblity reason
from "EXT4_FC_REASON_MEM" to "EXT4_FC_REASON_MEM_CRUNCH" for better
expression.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/fast_commit.c | 4 ++--
fs/ext4/fast_commit.h | 2 +-
fs/ext4/inode.c | 1 +
include/trace/events/ext4.h | 6 +++---
4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 8d43058386c3..354f81ff819d 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -384,7 +384,7 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
mutex_unlock(&ei->i_fc_lock);
node = kmem_cache_alloc(ext4_fc_dentry_cachep, GFP_NOFS);
if (!node) {
- ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_MEM);
+ ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_MEM_CRUNCH);
mutex_lock(&ei->i_fc_lock);
return -ENOMEM;
}
@@ -397,7 +397,7 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
if (!node->fcd_name.name) {
kmem_cache_free(ext4_fc_dentry_cachep, node);
ext4_fc_mark_ineligible(inode->i_sb,
- EXT4_FC_REASON_MEM);
+ EXT4_FC_REASON_MEM_CRUNCH);
mutex_lock(&ei->i_fc_lock);
return -ENOMEM;
}
diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index 06907d485989..cde86747faf8 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -100,7 +100,7 @@ enum {
EXT4_FC_REASON_XATTR = 0,
EXT4_FC_REASON_CROSS_RENAME,
EXT4_FC_REASON_JOURNAL_FLAG_CHANGE,
- EXT4_FC_REASON_MEM,
+ EXT4_FC_REASON_MEM_CRUNCH,
EXT4_FC_REASON_SWAP_BOOT,
EXT4_FC_REASON_RESIZE,
EXT4_FC_REASON_RENAME_DIR,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b96a18679a27..52ff71236290 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -327,6 +327,7 @@ void ext4_evict_inode(struct inode *inode)
ext4_xattr_inode_array_free(ea_inode_array);
return;
no_delete:
+ ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_MEM_CRUNCH);
ext4_clear_inode(inode); /* We must guarantee clearing of inode... */
}

diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index b14314fcf732..239e98014f9b 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -100,7 +100,7 @@ TRACE_DEFINE_ENUM(ES_REFERENCED_B);
{ EXT4_FC_REASON_XATTR, "XATTR"}, \
{ EXT4_FC_REASON_CROSS_RENAME, "CROSS_RENAME"}, \
{ EXT4_FC_REASON_JOURNAL_FLAG_CHANGE, "JOURNAL_FLAG_CHANGE"}, \
- { EXT4_FC_REASON_MEM, "NO_MEM"}, \
+ { EXT4_FC_REASON_MEM_CRUNCH, "MEM_CRUNCH"}, \
{ EXT4_FC_REASON_SWAP_BOOT, "SWAP_BOOT"}, \
{ EXT4_FC_REASON_RESIZE, "RESIZE"}, \
{ EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}, \
@@ -2917,13 +2917,13 @@ TRACE_EVENT(ext4_fc_stats,
),

TP_printk("dev %d:%d fc ineligible reasons:\n"
- "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s,%d; "
+ "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d; "
"num_commits:%ld, ineligible: %ld, numblks: %ld",
MAJOR(__entry->dev), MINOR(__entry->dev),
FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR),
FC_REASON_NAME_STAT(EXT4_FC_REASON_CROSS_RENAME),
FC_REASON_NAME_STAT(EXT4_FC_REASON_JOURNAL_FLAG_CHANGE),
- FC_REASON_NAME_STAT(EXT4_FC_REASON_MEM),
+ FC_REASON_NAME_STAT(EXT4_FC_REASON_MEM_CRUNCH),
FC_REASON_NAME_STAT(EXT4_FC_REASON_SWAP_BOOT),
FC_REASON_NAME_STAT(EXT4_FC_REASON_RESIZE),
FC_REASON_NAME_STAT(EXT4_FC_REASON_RENAME_DIR),
--
2.29.1.341.ge80a0c044ae-goog

2020-10-31 20:06:29

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 06/10] ext4: dedpulicate the code to wait on inode that's being committed

This patch removes the deduplicates the code that implements waiting
on inode that's being committed. That code is moved into a new
function.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/fast_commit.c | 59 ++++++++++++++++++-------------------------
1 file changed, 25 insertions(+), 34 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index b1ca55c7d32a..0f2543220d1d 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -155,6 +155,28 @@ void ext4_fc_init_inode(struct inode *inode)
ei->i_fc_committed_subtid = 0;
}

+static void ext4_fc_wait_committing_inode(struct inode *inode)
+{
+ wait_queue_head_t *wq;
+ struct ext4_inode_info *ei = EXT4_I(inode);
+
+#if (BITS_PER_LONG < 64)
+ DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
+ EXT4_STATE_FC_COMMITTING);
+ wq = bit_waitqueue(&ei->i_state_flags,
+ EXT4_STATE_FC_COMMITTING);
+#else
+ DEFINE_WAIT_BIT(wait, &ei->i_flags,
+ EXT4_STATE_FC_COMMITTING);
+ wq = bit_waitqueue(&ei->i_flags,
+ EXT4_STATE_FC_COMMITTING);
+#endif
+ prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
+ spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
+ schedule();
+ finish_wait(wq, &wait.wq_entry);
+}
+
/*
* Inform Ext4's fast about start of an inode update
*
@@ -176,22 +198,7 @@ void ext4_fc_start_update(struct inode *inode)
goto out;

if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
- wait_queue_head_t *wq;
-#if (BITS_PER_LONG < 64)
- DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
- EXT4_STATE_FC_COMMITTING);
- wq = bit_waitqueue(&ei->i_state_flags,
- EXT4_STATE_FC_COMMITTING);
-#else
- DEFINE_WAIT_BIT(wait, &ei->i_flags,
- EXT4_STATE_FC_COMMITTING);
- wq = bit_waitqueue(&ei->i_flags,
- EXT4_STATE_FC_COMMITTING);
-#endif
- prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
- spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
- schedule();
- finish_wait(wq, &wait.wq_entry);
+ ext4_fc_wait_committing_inode(inode);
goto restart;
}
out:
@@ -234,26 +241,10 @@ void ext4_fc_del(struct inode *inode)
}

if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
- wait_queue_head_t *wq;
-#if (BITS_PER_LONG < 64)
- DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
- EXT4_STATE_FC_COMMITTING);
- wq = bit_waitqueue(&ei->i_state_flags,
- EXT4_STATE_FC_COMMITTING);
-#else
- DEFINE_WAIT_BIT(wait, &ei->i_flags,
- EXT4_STATE_FC_COMMITTING);
- wq = bit_waitqueue(&ei->i_flags,
- EXT4_STATE_FC_COMMITTING);
-#endif
- prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
- spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
- schedule();
- finish_wait(wq, &wait.wq_entry);
+ ext4_fc_wait_committing_inode(inode);
goto restart;
}
- if (!list_empty(&ei->i_fc_list))
- list_del_init(&ei->i_fc_list);
+ list_del_init(&ei->i_fc_list);
spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
}

--
2.29.1.341.ge80a0c044ae-goog

2020-10-31 20:06:29

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 01/10] ext4: describe fast_commit feature flags

Fast commit feature has flags in the file system as well in JBD2. The
meaning of fast commit feature flags can get confusing. Update docs
and code to add more documentation about it.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
---
Documentation/filesystems/ext4/journal.rst | 6 ++++++
Documentation/filesystems/ext4/super.rst | 7 +++++++
fs/ext4/ext4.h | 7 +++++++
3 files changed, 20 insertions(+)

diff --git a/Documentation/filesystems/ext4/journal.rst b/Documentation/filesystems/ext4/journal.rst
index 805a1e9ea3a5..849d5b119eb8 100644
--- a/Documentation/filesystems/ext4/journal.rst
+++ b/Documentation/filesystems/ext4/journal.rst
@@ -256,6 +256,10 @@ which is 1024 bytes long:
- s\_padding2
-
* - 0x54
+ - \_\_be32
+ - s\_num\_fc\_blocks
+ - Number of fast commit blocks in the journal.
+ * - 0x58
- \_\_u32
- s\_padding[42]
-
@@ -310,6 +314,8 @@ The journal incompat features are any combination of the following:
- This journal uses v3 of the checksum on-disk format. This is the same as
v2, but the journal block tag size is fixed regardless of the size of
block numbers. (JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3)
+ * - 0x20
+ - Journal has fast commit blocks. (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT)

.. _jbd2_checksum_type:

diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
index 93e55d7c1d40..2eb1ab20498d 100644
--- a/Documentation/filesystems/ext4/super.rst
+++ b/Documentation/filesystems/ext4/super.rst
@@ -596,6 +596,13 @@ following:
- Sparse Super Block, v2. If this flag is set, the SB field s\_backup\_bgs
points to the two block groups that contain backup superblocks
(COMPAT\_SPARSE\_SUPER2).
+ * - 0x400
+ - Fast commits supported. Although fast commits blocks are
+ backward incompatible, fast commit blocks are not always
+ present in the journal. If fast commit blocks are present in
+ the journal, JBD2 incompat feature
+ (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets
+ set (COMPAT\_FAST\_COMMIT).

.. _super_incompat:

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2337e443fa30..12673f9ec880 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1875,6 +1875,13 @@ static inline bool ext4_verity_in_progress(struct inode *inode)
#define EXT4_FEATURE_COMPAT_RESIZE_INODE 0x0010
#define EXT4_FEATURE_COMPAT_DIR_INDEX 0x0020
#define EXT4_FEATURE_COMPAT_SPARSE_SUPER2 0x0200
+/*
+ * The reason why "FAST_COMMIT" is a compat feature is that, FS becomes
+ * incompatible only if fast commit blocks are present in the FS. Since we
+ * clear the journal (and thus the fast commit blocks), we don't mark FS as
+ * incompatible. We also have a JBD2 incompat feature, which gets set when
+ * there are fast commit blocks present in the journal.
+ */
#define EXT4_FEATURE_COMPAT_FAST_COMMIT 0x0400
#define EXT4_FEATURE_COMPAT_STABLE_INODES 0x0800

--
2.29.1.341.ge80a0c044ae-goog

2020-10-31 20:06:29

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 05/10] jbd2: fix fast commit journalling APIs

This patch adds a few misc fixes for jbd2 fast commit functions.

Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/fast_commit.c | 2 +-
fs/jbd2/commit.c | 9 +++++++++
fs/jbd2/journal.c | 28 +++++++++++++---------------
include/linux/jbd2.h | 10 +++++++---
4 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 1b62d82b9622..b1ca55c7d32a 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1134,7 +1134,7 @@ int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
"Fast commit ended with blks = %d, reason = %d, subtid - %d",
nblks, reason, subtid);
if (reason == EXT4_FC_REASON_FC_FAILED)
- return jbd2_fc_end_commit_fallback(journal, commit_tid);
+ return jbd2_fc_end_commit_fallback(journal);
if (reason == EXT4_FC_REASON_FC_START_FAILED ||
reason == EXT4_FC_REASON_INELIGIBLE)
return jbd2_complete_transaction(journal, commit_tid);
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 353534403769..2444414ad89e 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -450,6 +450,15 @@ void jbd2_journal_commit_transaction(journal_t *journal)
schedule();
write_lock(&journal->j_state_lock);
finish_wait(&journal->j_fc_wait, &wait);
+ /*
+ * TODO: by blocking fast commits here, we are increasing
+ * fsync() latency slightly. Strictly speaking, we don't need
+ * to block fast commits until the transaction enters T_FLUSH
+ * state. So an optimization is possible where we block new fast
+ * commits here and wait for existing ones to complete
+ * just before we enter T_FLUSH. That way, the existing fast
+ * commits and this full commit can proceed parallely.
+ */
}
write_unlock(&journal->j_state_lock);

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index ea15f55aff5c..368727ef3912 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -734,10 +734,12 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid)
if (!journal->j_stats.ts_tid)
return -EINVAL;

- if (tid <= journal->j_commit_sequence)
+ write_lock(&journal->j_state_lock);
+ if (tid <= journal->j_commit_sequence) {
+ write_unlock(&journal->j_state_lock);
return -EALREADY;
+ }

- write_lock(&journal->j_state_lock);
if (journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
(journal->j_flags & JBD2_FAST_COMMIT_ONGOING)) {
DEFINE_WAIT(wait);
@@ -777,13 +779,19 @@ static int __jbd2_fc_end_commit(journal_t *journal, tid_t tid, bool fallback)

int jbd2_fc_end_commit(journal_t *journal)
{
- return __jbd2_fc_end_commit(journal, 0, 0);
+ return __jbd2_fc_end_commit(journal, 0, false);
}
EXPORT_SYMBOL(jbd2_fc_end_commit);

-int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid)
+int jbd2_fc_end_commit_fallback(journal_t *journal)
{
- return __jbd2_fc_end_commit(journal, tid, 1);
+ tid_t tid;
+
+ read_lock(&journal->j_state_lock);
+ tid = journal->j_running_transaction ?
+ journal->j_running_transaction->t_tid : 0;
+ read_unlock(&journal->j_state_lock);
+ return __jbd2_fc_end_commit(journal, tid, true);
}
EXPORT_SYMBOL(jbd2_fc_end_commit_fallback);

@@ -865,7 +873,6 @@ int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out)
int fc_off;

*bh_out = NULL;
- write_lock(&journal->j_state_lock);

if (journal->j_fc_off + journal->j_fc_first < journal->j_fc_last) {
fc_off = journal->j_fc_off;
@@ -874,7 +881,6 @@ int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out)
} else {
ret = -EINVAL;
}
- write_unlock(&journal->j_state_lock);

if (ret)
return ret;
@@ -887,11 +893,7 @@ int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out)
if (!bh)
return -ENOMEM;

- lock_buffer(bh);

- clear_buffer_uptodate(bh);
- set_buffer_dirty(bh);
- unlock_buffer(bh);
journal->j_fc_wbuf[fc_off] = bh;

*bh_out = bh;
@@ -909,9 +911,7 @@ int jbd2_fc_wait_bufs(journal_t *journal, int num_blks)
struct buffer_head *bh;
int i, j_fc_off;

- read_lock(&journal->j_state_lock);
j_fc_off = journal->j_fc_off;
- read_unlock(&journal->j_state_lock);

/*
* Wait in reverse order to minimize chances of us being woken up before
@@ -939,9 +939,7 @@ int jbd2_fc_release_bufs(journal_t *journal)
struct buffer_head *bh;
int i, j_fc_off;

- read_lock(&journal->j_state_lock);
j_fc_off = journal->j_fc_off;
- read_unlock(&journal->j_state_lock);

/*
* Wait in reverse order to minimize chances of us being woken up before
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 9b4e87a0068b..df1285da7276 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -946,7 +946,9 @@ struct journal_s
* @j_fc_off:
*
* Number of fast commit blocks currently allocated.
- * [j_state_lock].
+ * [j_state_lock]. During the commit path, this variable is not
+ * protected by j_state_lock since only one process performs commit
+ * at a time.
*/
unsigned long j_fc_off;

@@ -1110,7 +1112,9 @@ struct journal_s

/**
* @j_fc_wbuf: Array of fast commit bhs for
- * jbd2_journal_commit_transaction.
+ * jbd2_journal_commit_transaction. During the commit path, this
+ * variable is not protected by j_state_lock since only one process
+ * performs commit at a time.
*/
struct buffer_head **j_fc_wbuf;

@@ -1618,7 +1622,7 @@ extern int jbd2_cleanup_journal_tail(journal_t *);
int jbd2_fc_init(journal_t *journal);
int jbd2_fc_begin_commit(journal_t *journal, tid_t tid);
int jbd2_fc_end_commit(journal_t *journal);
-int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid);
+int jbd2_fc_end_commit_fallback(journal_t *journal);
int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out);
int jbd2_submit_inode_data(struct jbd2_inode *jinode);
int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode);
--
2.29.1.341.ge80a0c044ae-goog

2020-10-31 20:06:29

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 04/10] ext4: clean up the JBD2 API that initializes fast commits

This patch cleans up the jbd2_fc_init() API and its related functions
to simplify enabling fast commits and configuring the number of blocks
that fast commits will use. With this change, the number of fast
commit blocks to use is solely determined by the JBD2 layer. However,
whether or not to use fast commits is determined by the file
system. The file system just calls jbd2_fc_init() to tell the JBD2
layer that it is interested in enabling fast commits. JBD2 layer
determines how many blocks to use for fast commits (based on the value
found in the JBD2 superblock).

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/fast_commit.c | 12 +-------
fs/jbd2/commit.c | 2 +-
fs/jbd2/journal.c | 66 +++++++++++++++++++++++++++----------------
fs/jbd2/recovery.c | 2 +-
include/linux/jbd2.h | 3 +-
5 files changed, 46 insertions(+), 39 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 5c3af472287a..1b62d82b9622 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -2082,8 +2082,6 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,

void ext4_fc_init(struct super_block *sb, journal_t *journal)
{
- int num_fc_blocks;
-
/*
* We set replay callback even if fast commit disabled because we may
* could still have fast commit blocks that need to be replayed even if
@@ -2093,15 +2091,7 @@ void ext4_fc_init(struct super_block *sb, journal_t *journal)
if (!test_opt2(sb, JOURNAL_FAST_COMMIT))
return;
journal->j_fc_cleanup_callback = ext4_fc_cleanup;
- if (!buffer_uptodate(journal->j_sb_buffer)
- && ext4_read_bh_lock(journal->j_sb_buffer, REQ_META | REQ_PRIO,
- true)) {
- ext4_msg(sb, KERN_ERR, "I/O error on journal");
- return;
- }
- num_fc_blocks = be32_to_cpu(journal->j_superblock->s_num_fc_blks);
- if (jbd2_fc_init(journal, num_fc_blocks ? num_fc_blocks :
- EXT4_NUM_FC_BLKS)) {
+ if (jbd2_fc_init(journal)) {
pr_warn("Error while enabling fast commits, turning off.");
ext4_clear_feature_fast_commit(sb);
}
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index fa688e163a80..353534403769 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -801,7 +801,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (first_block < journal->j_tail)
freed += journal->j_last - journal->j_first;
/* Update tail only if we free significant amount of space */
- if (freed < journal->j_maxlen / 4)
+ if (freed < (journal->j_maxlen - journal->j_fc_wbufsize) / 4)
update_tail = 0;
}
J_ASSERT(commit_transaction->t_state == T_COMMIT);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 0c7c42bd530f..ea15f55aff5c 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1357,14 +1357,6 @@ static journal_t *journal_init_common(struct block_device *bdev,
if (!journal->j_wbuf)
goto err_cleanup;

- if (journal->j_fc_wbufsize > 0) {
- journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
- sizeof(struct buffer_head *),
- GFP_KERNEL);
- if (!journal->j_fc_wbuf)
- goto err_cleanup;
- }
-
bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize);
if (!bh) {
pr_err("%s: Cannot get buffer for journal superblock\n",
@@ -1378,19 +1370,21 @@ static journal_t *journal_init_common(struct block_device *bdev,

err_cleanup:
kfree(journal->j_wbuf);
- kfree(journal->j_fc_wbuf);
jbd2_journal_destroy_revoke(journal);
kfree(journal);
return NULL;
}

-int jbd2_fc_init(journal_t *journal, int num_fc_blks)
+int jbd2_fc_init(journal_t *journal)
{
- journal->j_fc_wbufsize = num_fc_blks;
- journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
- sizeof(struct buffer_head *), GFP_KERNEL);
- if (!journal->j_fc_wbuf)
- return -ENOMEM;
+ /*
+ * Only set j_fc_wbufsize here to indicate that the client file
+ * system is interested in using fast commits. The actual number of
+ * fast commit blocks is found inside jbd2_superblock and is only
+ * valid if j_fc_wbufsize is non-zero. The real value of j_fc_wbufsize
+ * gets set in journal_reset().
+ */
+ journal->j_fc_wbufsize = JBD2_MIN_FC_BLOCKS;
return 0;
}
EXPORT_SYMBOL(jbd2_fc_init);
@@ -1500,7 +1494,7 @@ static void journal_fail_superblock(journal_t *journal)
static int journal_reset(journal_t *journal)
{
journal_superblock_t *sb = journal->j_superblock;
- unsigned long long first, last;
+ unsigned long long first, last, num_fc_blocks;

first = be32_to_cpu(sb->s_first);
last = be32_to_cpu(sb->s_maxlen);
@@ -1513,6 +1507,28 @@ static int journal_reset(journal_t *journal)

journal->j_first = first;

+ /*
+ * At this point, fast commit recovery has finished. Now, we solely
+ * rely on the file system to decide whether it wants fast commits
+ * or not. File system that wishes to use fast commits must have
+ * already called jbd2_fc_init() before we get here.
+ */
+ if (journal->j_fc_wbufsize > 0)
+ jbd2_journal_set_features(journal, 0, 0,
+ JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
+ else
+ jbd2_journal_clear_features(journal, 0, 0,
+ JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
+
+ /* If valid, prefer the value found in superblock over the default */
+ num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks);
+ if (num_fc_blocks > 0 && num_fc_blocks < last)
+ journal->j_fc_wbufsize = num_fc_blocks;
+
+ if (jbd2_has_feature_fast_commit(journal))
+ journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
+ sizeof(struct buffer_head *), GFP_KERNEL);
+
if (jbd2_has_feature_fast_commit(journal) &&
journal->j_fc_wbufsize > 0) {
journal->j_fc_last = last;
@@ -1531,7 +1547,8 @@ static int journal_reset(journal_t *journal)
journal->j_commit_sequence = journal->j_transaction_sequence - 1;
journal->j_commit_request = journal->j_commit_sequence;

- journal->j_max_transaction_buffers = journal->j_maxlen / 4;
+ journal->j_max_transaction_buffers =
+ (journal->j_maxlen - journal->j_fc_wbufsize) / 4;

/*
* As a special case, if the on-disk copy is already marked as needing
@@ -1872,6 +1889,7 @@ static int load_superblock(journal_t *journal)
{
int err;
journal_superblock_t *sb;
+ int num_fc_blocks;

err = journal_get_superblock(journal);
if (err)
@@ -1884,10 +1902,12 @@ static int load_superblock(journal_t *journal)
journal->j_first = be32_to_cpu(sb->s_first);
journal->j_errno = be32_to_cpu(sb->s_errno);

- if (jbd2_has_feature_fast_commit(journal) &&
- journal->j_fc_wbufsize > 0) {
+ if (jbd2_has_feature_fast_commit(journal)) {
journal->j_fc_last = be32_to_cpu(sb->s_maxlen);
- journal->j_last = journal->j_fc_last - journal->j_fc_wbufsize;
+ num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks);
+ if (!num_fc_blocks || num_fc_blocks >= journal->j_fc_last)
+ num_fc_blocks = JBD2_MIN_FC_BLOCKS;
+ journal->j_last = journal->j_fc_last - num_fc_blocks;
journal->j_fc_first = journal->j_last + 1;
journal->j_fc_off = 0;
} else {
@@ -1954,9 +1974,6 @@ int jbd2_journal_load(journal_t *journal)
*/
journal->j_flags &= ~JBD2_ABORT;

- if (journal->j_fc_wbufsize > 0)
- jbd2_journal_set_features(journal, 0, 0,
- JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
/* OK, we've finished with the dynamic journal bits:
* reinitialise the dynamic contents of the superblock in memory
* and reset them on disk. */
@@ -2040,8 +2057,7 @@ int jbd2_journal_destroy(journal_t *journal)
jbd2_journal_destroy_revoke(journal);
if (journal->j_chksum_driver)
crypto_free_shash(journal->j_chksum_driver);
- if (journal->j_fc_wbufsize > 0)
- kfree(journal->j_fc_wbuf);
+ kfree(journal->j_fc_wbuf);
kfree(journal->j_wbuf);
kfree(journal);

diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index eb2606133cd8..822f16cbf9b3 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -134,7 +134,7 @@ static int jread(struct buffer_head **bhp, journal_t *journal,

*bhp = NULL;

- if (offset >= journal->j_maxlen) {
+ if (offset >= journal->j_maxlen + journal->j_fc_wbufsize) {
printk(KERN_ERR "JBD2: corrupted journal superblock\n");
return -EFSCORRUPTED;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 1d5566af48ac..9b4e87a0068b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -68,6 +68,7 @@ extern void *jbd2_alloc(size_t size, gfp_t flags);
extern void jbd2_free(void *ptr, size_t size);

#define JBD2_MIN_JOURNAL_BLOCKS 1024
+#define JBD2_MIN_FC_BLOCKS 256

#ifdef __KERNEL__

@@ -1614,7 +1615,7 @@ extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *);
extern int jbd2_cleanup_journal_tail(journal_t *);

/* Fast commit related APIs */
-int jbd2_fc_init(journal_t *journal, int num_fc_blks);
+int jbd2_fc_init(journal_t *journal);
int jbd2_fc_begin_commit(journal_t *journal, tid_t tid);
int jbd2_fc_end_commit(journal_t *journal);
int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid);
--
2.29.1.341.ge80a0c044ae-goog

2020-10-31 20:06:29

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 03/10] ext4: pass handle to ext4_fc_track_* functions

Use transaction id found in handle->h_transaction->h_tid for tracking
fast commit updates. This patch also restructures ext4_unlink to make
handle available inside ext4_unlink for fast commit tracking.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/ext4.h | 12 +++++------
fs/ext4/extents.c | 11 +++++-----
fs/ext4/fast_commit.c | 39 +++++++++++++++++----------------
fs/ext4/inode.c | 10 ++++-----
fs/ext4/namei.c | 50 +++++++++++++++++++++----------------------
fs/ext4/super.c | 2 +-
6 files changed, 63 insertions(+), 61 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 12673f9ec880..573db158382f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2750,12 +2750,12 @@ extern void ext4_end_bitmap_read(struct buffer_head *bh, int uptodate);
int ext4_fc_info_show(struct seq_file *seq, void *v);
void ext4_fc_init(struct super_block *sb, journal_t *journal);
void ext4_fc_init_inode(struct inode *inode);
-void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start,
+void ext4_fc_track_range(handle_t *handle, struct inode *inode, ext4_lblk_t start,
ext4_lblk_t end);
-void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry);
-void ext4_fc_track_link(struct inode *inode, struct dentry *dentry);
-void ext4_fc_track_create(struct inode *inode, struct dentry *dentry);
-void ext4_fc_track_inode(struct inode *inode);
+void ext4_fc_track_unlink(handle_t *handle, struct inode *inode, struct dentry *dentry);
+void ext4_fc_track_link(handle_t *handle, struct inode *inode, struct dentry *dentry);
+void ext4_fc_track_create(handle_t *handle, struct inode *inode, struct dentry *dentry);
+void ext4_fc_track_inode(handle_t *handle, struct inode *inode);
void ext4_fc_mark_ineligible(struct super_block *sb, int reason);
void ext4_fc_start_ineligible(struct super_block *sb, int reason);
void ext4_fc_stop_ineligible(struct super_block *sb);
@@ -3471,7 +3471,7 @@ extern int ext4_handle_dirty_dirblock(handle_t *handle, struct inode *inode,
extern int ext4_ci_compare(const struct inode *parent,
const struct qstr *fname,
const struct qstr *entry, bool quick);
-extern int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
+extern int __ext4_unlink(handle_t *handle, struct inode *dir, const struct qstr *d_name,
struct inode *inode);
extern int __ext4_link(struct inode *dir, struct inode *inode,
struct dentry *dentry);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 559100f3e23c..7c8f05db9402 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3723,7 +3723,7 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
err = ext4_ext_dirty(handle, inode, path + path->p_depth);
out:
ext4_ext_show_leaf(inode, path);
- ext4_fc_track_range(inode, ee_block, ee_block + ee_len - 1);
+ ext4_fc_track_range(handle, inode, ee_block, ee_block + ee_len - 1);
return err;
}

@@ -3795,7 +3795,7 @@ convert_initialized_extent(handle_t *handle, struct inode *inode,
if (*allocated > map->m_len)
*allocated = map->m_len;
map->m_len = *allocated;
- ext4_fc_track_range(inode, ee_block, ee_block + ee_len - 1);
+ ext4_fc_track_range(handle, inode, ee_block, ee_block + ee_len - 1);
return 0;
}

@@ -4329,7 +4329,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
map->m_len = ar.len;
allocated = map->m_len;
ext4_ext_show_leaf(inode, path);
- ext4_fc_track_range(inode, map->m_lblk, map->m_lblk + map->m_len - 1);
+ ext4_fc_track_range(handle, inode, map->m_lblk,
+ map->m_lblk + map->m_len - 1);
out:
ext4_ext_drop_refs(path);
kfree(path);
@@ -4602,7 +4603,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
ret = ext4_mark_inode_dirty(handle, inode);
if (unlikely(ret))
goto out_handle;
- ext4_fc_track_range(inode, offset >> inode->i_sb->s_blocksize_bits,
+ ext4_fc_track_range(handle, inode, offset >> inode->i_sb->s_blocksize_bits,
(offset + len - 1) >> inode->i_sb->s_blocksize_bits);
/* Zero out partial block at the edges of the range */
ret = ext4_zero_partial_blocks(handle, inode, offset, len);
@@ -4651,8 +4652,6 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;
- ext4_fc_track_range(inode, offset >> blkbits,
- (offset + len - 1) >> blkbits);

ext4_fc_start_update(inode);

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 354f81ff819d..5c3af472287a 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -323,15 +323,18 @@ static inline int ext4_fc_is_ineligible(struct super_block *sb)
* If enqueue is set, this function enqueues the inode in fast commit list.
*/
static int ext4_fc_track_template(
- struct inode *inode, int (*__fc_track_fn)(struct inode *, void *, bool),
+ handle_t *handle, struct inode *inode,
+ int (*__fc_track_fn)(struct inode *, void *, bool),
void *args, int enqueue)
{
- tid_t running_txn_tid;
bool update = false;
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ tid_t tid = 0;
int ret;

+ if (ext4_handle_valid(handle) && handle->h_transaction)
+ tid = handle->h_transaction->t_tid;
if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
(sbi->s_mount_state & EXT4_FC_REPLAY))
return -EOPNOTSUPP;
@@ -339,15 +342,12 @@ static int ext4_fc_track_template(
if (ext4_fc_is_ineligible(inode->i_sb))
return -EINVAL;

- running_txn_tid = sbi->s_journal ?
- sbi->s_journal->j_commit_sequence + 1 : 0;
-
mutex_lock(&ei->i_fc_lock);
- if (running_txn_tid == ei->i_sync_tid) {
+ if (tid == ei->i_sync_tid) {
update = true;
} else {
ext4_fc_reset_inode(inode);
- ei->i_sync_tid = running_txn_tid;
+ ei->i_sync_tid = tid;
}
ret = __fc_track_fn(inode, args, update);
mutex_unlock(&ei->i_fc_lock);
@@ -422,7 +422,8 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
return 0;
}

-void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry)
+void ext4_fc_track_unlink(handle_t *handle,
+ struct inode *inode, struct dentry *dentry)
{
struct __track_dentry_update_args args;
int ret;
@@ -430,12 +431,13 @@ void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry)
args.dentry = dentry;
args.op = EXT4_FC_TAG_UNLINK;

- ret = ext4_fc_track_template(inode, __track_dentry_update,
+ ret = ext4_fc_track_template(handle, inode, __track_dentry_update,
(void *)&args, 0);
trace_ext4_fc_track_unlink(inode, dentry, ret);
}

-void ext4_fc_track_link(struct inode *inode, struct dentry *dentry)
+void ext4_fc_track_link(handle_t *handle,
+ struct inode *inode, struct dentry *dentry)
{
struct __track_dentry_update_args args;
int ret;
@@ -443,12 +445,13 @@ void ext4_fc_track_link(struct inode *inode, struct dentry *dentry)
args.dentry = dentry;
args.op = EXT4_FC_TAG_LINK;

- ret = ext4_fc_track_template(inode, __track_dentry_update,
+ ret = ext4_fc_track_template(handle, inode, __track_dentry_update,
(void *)&args, 0);
trace_ext4_fc_track_link(inode, dentry, ret);
}

-void ext4_fc_track_create(struct inode *inode, struct dentry *dentry)
+void ext4_fc_track_create(handle_t *handle, struct inode *inode,
+ struct dentry *dentry)
{
struct __track_dentry_update_args args;
int ret;
@@ -456,7 +459,7 @@ void ext4_fc_track_create(struct inode *inode, struct dentry *dentry)
args.dentry = dentry;
args.op = EXT4_FC_TAG_CREAT;

- ret = ext4_fc_track_template(inode, __track_dentry_update,
+ ret = ext4_fc_track_template(handle, inode, __track_dentry_update,
(void *)&args, 0);
trace_ext4_fc_track_create(inode, dentry, ret);
}
@@ -472,14 +475,14 @@ static int __track_inode(struct inode *inode, void *arg, bool update)
return 0;
}

-void ext4_fc_track_inode(struct inode *inode)
+void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
{
int ret;

if (S_ISDIR(inode->i_mode))
return;

- ret = ext4_fc_track_template(inode, __track_inode, NULL, 1);
+ ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1);
trace_ext4_fc_track_inode(inode, ret);
}

@@ -515,7 +518,7 @@ static int __track_range(struct inode *inode, void *arg, bool update)
return 0;
}

-void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start,
+void ext4_fc_track_range(handle_t *handle, struct inode *inode, ext4_lblk_t start,
ext4_lblk_t end)
{
struct __track_range_args args;
@@ -527,7 +530,7 @@ void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start,
args.start = start;
args.end = end;

- ret = ext4_fc_track_template(inode, __track_range, &args, 1);
+ ret = ext4_fc_track_template(handle, inode, __track_range, &args, 1);

trace_ext4_fc_track_range(inode, start, end, ret);
}
@@ -1263,7 +1266,7 @@ static int ext4_fc_replay_unlink(struct super_block *sb, struct ext4_fc_tl *tl)
return 0;
}

- ret = __ext4_unlink(old_parent, &entry, inode);
+ ret = __ext4_unlink(NULL, old_parent, &entry, inode);
/* -ENOENT ok coz it might not exist anymore. */
if (ret == -ENOENT)
ret = 0;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 52ff71236290..7f6af784e74f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -731,7 +731,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
if (ret)
return ret;
}
- ext4_fc_track_range(inode, map->m_lblk,
+ ext4_fc_track_range(handle, inode, map->m_lblk,
map->m_lblk + map->m_len - 1);
}

@@ -4110,7 +4110,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)

up_write(&EXT4_I(inode)->i_data_sem);
}
- ext4_fc_track_range(inode, first_block, stop_block);
+ ext4_fc_track_range(handle, inode, first_block, stop_block);
if (IS_SYNC(inode))
ext4_handle_sync(handle);

@@ -5443,14 +5443,14 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
}

if (shrink)
- ext4_fc_track_range(inode,
+ ext4_fc_track_range(handle, inode,
(attr->ia_size > 0 ? attr->ia_size - 1 : 0) >>
inode->i_sb->s_blocksize_bits,
(oldsize > 0 ? oldsize - 1 : 0) >>
inode->i_sb->s_blocksize_bits);
else
ext4_fc_track_range(
- inode,
+ handle, inode,
(oldsize > 0 ? oldsize - 1 : oldsize) >>
inode->i_sb->s_blocksize_bits,
(attr->ia_size > 0 ? attr->ia_size - 1 : 0) >>
@@ -5700,7 +5700,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
put_bh(iloc->bh);
return -EIO;
}
- ext4_fc_track_inode(inode);
+ ext4_fc_track_inode(handle, inode);

if (IS_I_VERSION(inode))
inode_inc_iversion(inode);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5159830dacb8..f3f8bf61072e 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2631,7 +2631,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
inode_save = inode;
ihold(inode_save);
err = ext4_add_nondir(handle, dentry, &inode);
- ext4_fc_track_create(inode_save, dentry);
+ ext4_fc_track_create(handle, inode_save, dentry);
iput(inode_save);
}
if (handle)
@@ -2668,7 +2668,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
ihold(inode_save);
err = ext4_add_nondir(handle, dentry, &inode);
if (!err)
- ext4_fc_track_create(inode_save, dentry);
+ ext4_fc_track_create(handle, inode_save, dentry);
iput(inode_save);
}
if (handle)
@@ -2833,7 +2833,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
iput(inode);
goto out_retry;
}
- ext4_fc_track_create(inode, dentry);
+ ext4_fc_track_create(handle, inode, dentry);
ext4_inc_count(dir);

ext4_update_dx_flag(dir);
@@ -3175,7 +3175,7 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
goto end_rmdir;
ext4_dec_count(dir);
ext4_update_dx_flag(dir);
- ext4_fc_track_unlink(inode, dentry);
+ ext4_fc_track_unlink(handle, inode, dentry);
retval = ext4_mark_inode_dirty(handle, dir);

#ifdef CONFIG_UNICODE
@@ -3196,13 +3196,12 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
return retval;
}

-int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
+int __ext4_unlink(handle_t *handle, struct inode *dir, const struct qstr *d_name,
struct inode *inode)
{
int retval = -ENOENT;
struct buffer_head *bh;
struct ext4_dir_entry_2 *de;
- handle_t *handle = NULL;
int skip_remove_dentry = 0;

bh = ext4_find_entry(dir, d_name, &de, NULL);
@@ -3221,14 +3220,7 @@ int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
skip_remove_dentry = 1;
else
- goto out_bh;
- }
-
- handle = ext4_journal_start(dir, EXT4_HT_DIR,
- EXT4_DATA_TRANS_BLOCKS(dir->i_sb));
- if (IS_ERR(handle)) {
- retval = PTR_ERR(handle);
- goto out_bh;
+ goto out;
}

if (IS_DIRSYNC(dir))
@@ -3237,12 +3229,12 @@ int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
if (!skip_remove_dentry) {
retval = ext4_delete_entry(handle, dir, de, bh);
if (retval)
- goto out_handle;
+ goto out;
dir->i_ctime = dir->i_mtime = current_time(dir);
ext4_update_dx_flag(dir);
retval = ext4_mark_inode_dirty(handle, dir);
if (retval)
- goto out_handle;
+ goto out;
} else {
retval = 0;
}
@@ -3256,15 +3248,14 @@ int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
inode->i_ctime = current_time(inode);
retval = ext4_mark_inode_dirty(handle, inode);

-out_handle:
- ext4_journal_stop(handle);
-out_bh:
+out:
brelse(bh);
return retval;
}

static int ext4_unlink(struct inode *dir, struct dentry *dentry)
{
+ handle_t *handle;
int retval;

if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
@@ -3282,9 +3273,16 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
if (retval)
goto out_trace;

- retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry));
+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
+ EXT4_DATA_TRANS_BLOCKS(dir->i_sb));
+ if (IS_ERR(handle)) {
+ retval = PTR_ERR(handle);
+ goto out_trace;
+ }
+
+ retval = __ext4_unlink(handle, dir, &dentry->d_name, d_inode(dentry));
if (!retval)
- ext4_fc_track_unlink(d_inode(dentry), dentry);
+ ext4_fc_track_unlink(handle, d_inode(dentry), dentry);
#ifdef CONFIG_UNICODE
/* VFS negative dentries are incompatible with Encoding and
* Case-insensitiveness. Eventually we'll want avoid
@@ -3295,6 +3293,8 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
if (IS_CASEFOLDED(dir))
d_invalidate(dentry);
#endif
+ if (handle)
+ ext4_journal_stop(handle);

out_trace:
trace_ext4_unlink_exit(dentry, retval);
@@ -3451,7 +3451,7 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)

err = ext4_add_entry(handle, dentry, inode);
if (!err) {
- ext4_fc_track_link(inode, dentry);
+ ext4_fc_track_link(handle, inode, dentry);
err = ext4_mark_inode_dirty(handle, inode);
/* this can happen only for tmpfile being
* linked the first time
@@ -3919,9 +3919,9 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
EXT4_FC_REASON_RENAME_DIR);
} else {
if (new.inode)
- ext4_fc_track_unlink(new.inode, new.dentry);
- ext4_fc_track_link(old.inode, new.dentry);
- ext4_fc_track_unlink(old.inode, old.dentry);
+ ext4_fc_track_unlink(handle, new.inode, new.dentry);
+ ext4_fc_track_link(handle, old.inode, new.dentry);
+ ext4_fc_track_unlink(handle, old.inode, old.dentry);
}

if (new.inode) {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d92de21212e9..e67d2fa41a78 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6561,7 +6561,7 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
brelse(bh);
out:
if (inode->i_size < off + len) {
- ext4_fc_track_range(inode,
+ ext4_fc_track_range(handle, inode,
(inode->i_size > 0 ? inode->i_size - 1 : 0)
>> inode->i_sb->s_blocksize_bits,
(off + len) >> inode->i_sb->s_blocksize_bits);
--
2.29.1.341.ge80a0c044ae-goog

2020-10-31 20:07:12

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 08/10] ext4: fix inode dirty check in case of fast commits

In case of fast commits, determine if the inode is dirty by checking
if the inode is on fast commit list. This also helps us get rid of
ext4_inode_info.i_fc_committed_subtid field.

Reported-by: Andrea Righi <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/ext4.h | 3 ---
fs/ext4/fast_commit.c | 3 ---
fs/ext4/inode.c | 3 +--
3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 573db158382f..7222a9ba5d66 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1028,9 +1028,6 @@ struct ext4_inode_info {
* protected by sbi->s_fc_lock.
*/

- /* Fast commit subtid when this inode was committed */
- unsigned int i_fc_committed_subtid;
-
/* Start of lblk range that needs to be committed in this fast commit */
ext4_lblk_t i_fc_lblk_start;

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index b7b1fe6dbb24..4c0a3e858ea3 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -152,7 +152,6 @@ void ext4_fc_init_inode(struct inode *inode)
INIT_LIST_HEAD(&ei->i_fc_list);
init_waitqueue_head(&ei->i_fc_wait);
atomic_set(&ei->i_fc_updates, 0);
- ei->i_fc_committed_subtid = 0;
}

static void ext4_fc_wait_committing_inode(struct inode *inode)
@@ -1026,8 +1025,6 @@ static int ext4_fc_perform_commit(journal_t *journal)
if (ret)
goto out;
spin_lock(&sbi->s_fc_lock);
- EXT4_I(inode)->i_fc_committed_subtid =
- atomic_read(&sbi->s_fc_subtid);
}
spin_unlock(&sbi->s_fc_lock);

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7f6af784e74f..d36c3908272f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3311,8 +3311,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
EXT4_I(inode)->i_datasync_tid))
return false;
if (test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT))
- return atomic_read(&EXT4_SB(inode->i_sb)->s_fc_subtid) <
- EXT4_I(inode)->i_fc_committed_subtid;
+ return !list_empty(&EXT4_I(inode)->i_fc_list);
return true;
}

--
2.29.1.341.ge80a0c044ae-goog

2020-10-31 20:07:20

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 09/10] ext4: disable fast commit with data journalling

Fast commits don't work with data journalling. This patch disables the
fast commit support when data journalling is turned on.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/fast_commit.c | 7 +++++++
fs/ext4/fast_commit.h | 1 +
fs/ext4/super.c | 3 ++-
include/trace/events/ext4.h | 6 ++++--
4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 4c0a3e858ea3..9ae8ba213961 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -472,6 +472,12 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
if (S_ISDIR(inode->i_mode))
return;

+ if (ext4_should_journal_data(inode)) {
+ ext4_fc_mark_ineligible(inode->i_sb,
+ EXT4_FC_REASON_INODE_JOURNAL_DATA);
+ return;
+ }
+
ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1);
trace_ext4_fc_track_inode(inode, ret);
}
@@ -2095,6 +2101,7 @@ const char *fc_ineligible_reasons[] = {
"Resize",
"Dir renamed",
"Falloc range op",
+ "Data journalling",
"FC Commit Failed"
};

diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index cde86747faf8..cdb36a10dfd0 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -105,6 +105,7 @@ enum {
EXT4_FC_REASON_RESIZE,
EXT4_FC_REASON_RENAME_DIR,
EXT4_FC_REASON_FALLOC_RANGE,
+ EXT4_FC_REASON_INODE_JOURNAL_DATA,
EXT4_FC_COMMIT_FAILED,
EXT4_FC_REASON_MAX
};
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e67d2fa41a78..9333475737ac 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4340,9 +4340,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
#endif

if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) {
- printk_once(KERN_WARNING "EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!\n");
+ printk_once(KERN_WARNING "EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, O_DIRECT and fast_commit support!\n");
/* can't mount with both data=journal and dioread_nolock. */
clear_opt(sb, DIOREAD_NOLOCK);
+ clear_opt2(sb, JOURNAL_FAST_COMMIT);
if (test_opt2(sb, EXPLICIT_DELALLOC)) {
ext4_msg(sb, KERN_ERR, "can't mount with "
"both data=journal and delalloc");
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 239e98014f9b..ee7362f31eb6 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -104,7 +104,8 @@ TRACE_DEFINE_ENUM(ES_REFERENCED_B);
{ EXT4_FC_REASON_SWAP_BOOT, "SWAP_BOOT"}, \
{ EXT4_FC_REASON_RESIZE, "RESIZE"}, \
{ EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}, \
- { EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"})
+ { EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"}, \
+ { EXT4_FC_REASON_INODE_JOURNAL_DATA, "INODE_JOURNAL_DATA"})

TRACE_EVENT(ext4_other_inode_update_time,
TP_PROTO(struct inode *inode, ino_t orig_ino),
@@ -2917,7 +2918,7 @@ TRACE_EVENT(ext4_fc_stats,
),

TP_printk("dev %d:%d fc ineligible reasons:\n"
- "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d; "
+ "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d; "
"num_commits:%ld, ineligible: %ld, numblks: %ld",
MAJOR(__entry->dev), MINOR(__entry->dev),
FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR),
@@ -2928,6 +2929,7 @@ TRACE_EVENT(ext4_fc_stats,
FC_REASON_NAME_STAT(EXT4_FC_REASON_RESIZE),
FC_REASON_NAME_STAT(EXT4_FC_REASON_RENAME_DIR),
FC_REASON_NAME_STAT(EXT4_FC_REASON_FALLOC_RANGE),
+ FC_REASON_NAME_STAT(EXT4_FC_REASON_INODE_JOURNAL_DATA),
__entry->sbi->s_fc_stats.fc_num_commits,
__entry->sbi->s_fc_stats.fc_ineligible_commits,
__entry->sbi->s_fc_stats.fc_numblks)
--
2.29.1.341.ge80a0c044ae-goog

2020-10-31 20:07:36

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 07/10] ext4: misc fast commit fixes

This patch adds a small number of misc fast commit fixes. Along with
functional fixes such as setting the right buffer flags, there also
typo fixes and comment additions.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/ext4_jbd2.h | 6 +++++-
fs/ext4/fast_commit.c | 5 +++--
fs/ext4/file.c | 2 --
3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 00dc668e052b..10855cd230c7 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -422,9 +422,13 @@ static inline int ext4_journal_force_commit(journal_t *journal)
static inline int ext4_jbd2_inode_add_write(handle_t *handle,
struct inode *inode, loff_t start_byte, loff_t length)
{
- if (ext4_handle_valid(handle))
+ if (ext4_handle_valid(handle)) {
+ ext4_fc_track_range(handle, inode,
+ start_byte >> inode->i_sb->s_blocksize_bits,
+ (start_byte + length) >> inode->i_sb->s_blocksize_bits);
return jbd2_journal_inode_ranged_write(handle,
EXT4_I(inode)->jinode, start_byte, length);
+ }
return 0;
}

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 0f2543220d1d..b7b1fe6dbb24 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -83,7 +83,7 @@
*
* Atomicity of commits
* --------------------
- * In order to gaurantee atomicity during the commit operation, fast commit
+ * In order to guarantee atomicity during the commit operation, fast commit
* uses "EXT4_FC_TAG_TAIL" tag that marks a fast commit as complete. Tail
* tag contains CRC of the contents and TID of the transaction after which
* this fast commit should be applied. Recovery code replays fast commit
@@ -531,10 +531,11 @@ static void ext4_fc_submit_bh(struct super_block *sb)
int write_flags = REQ_SYNC;
struct buffer_head *bh = EXT4_SB(sb)->s_fc_bh;

+ /* TODO: REQ_FUA | REQ_PREFLUSH is unnecessarily expensive. */
if (test_opt(sb, BARRIER))
write_flags |= REQ_FUA | REQ_PREFLUSH;
lock_buffer(bh);
- clear_buffer_dirty(bh);
+ set_buffer_dirty(bh);
set_buffer_uptodate(bh);
bh->b_end_io = ext4_end_buffer_io_sync;
submit_bh(REQ_OP_WRITE, write_flags, bh);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d85412d12e3a..80ad5ccc0288 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -761,7 +761,6 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;

- ext4_fc_start_update(inode);
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
@@ -769,7 +768,6 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
} else {
vma->vm_ops = &ext4_file_vm_ops;
}
- ext4_fc_stop_update(inode);
return 0;
}

--
2.29.1.341.ge80a0c044ae-goog

2020-10-31 20:09:06

by harshad shirwadkar

[permalink] [raw]
Subject: [PATCH 10/10] ext4: issue fsdev cache flush before starting fast commit

If the journal dev is different from fsdev, issue a cache flush before
committing fast commit blocks to disk.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
---
fs/ext4/fast_commit.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 9ae8ba213961..facf2f59b895 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -996,6 +996,13 @@ static int ext4_fc_perform_commit(journal_t *journal)
if (ret)
return ret;

+ /*
+ * If file system device is different from journal device, issue a cache
+ * flush before we start writing fast commit blocks.
+ */
+ if (journal->j_fs_dev != journal->j_dev)
+ blkdev_issue_flush(journal->j_fs_dev, GFP_NOFS);
+
blk_start_plug(&plug);
if (sbi->s_fc_bytes == 0) {
/*
--
2.29.1.341.ge80a0c044ae-goog

2020-11-01 07:10:24

by Andrea Righi

[permalink] [raw]
Subject: Re: [PATCH 08/10] ext4: fix inode dirty check in case of fast commits

On Sat, Oct 31, 2020 at 01:05:16PM -0700, Harshad Shirwadkar wrote:
> In case of fast commits, determine if the inode is dirty by checking
> if the inode is on fast commit list. This also helps us get rid of
> ext4_inode_info.i_fc_committed_subtid field.
>
> Reported-by: Andrea Righi <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

Tested and looks good to me. Thanks Harshad!

Tested-by: Andrea Righi <[email protected]>

> ---
> fs/ext4/ext4.h | 3 ---
> fs/ext4/fast_commit.c | 3 ---
> fs/ext4/inode.c | 3 +--
> 3 files changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 573db158382f..7222a9ba5d66 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1028,9 +1028,6 @@ struct ext4_inode_info {
> * protected by sbi->s_fc_lock.
> */
>
> - /* Fast commit subtid when this inode was committed */
> - unsigned int i_fc_committed_subtid;
> -
> /* Start of lblk range that needs to be committed in this fast commit */
> ext4_lblk_t i_fc_lblk_start;
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index b7b1fe6dbb24..4c0a3e858ea3 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -152,7 +152,6 @@ void ext4_fc_init_inode(struct inode *inode)
> INIT_LIST_HEAD(&ei->i_fc_list);
> init_waitqueue_head(&ei->i_fc_wait);
> atomic_set(&ei->i_fc_updates, 0);
> - ei->i_fc_committed_subtid = 0;
> }
>
> static void ext4_fc_wait_committing_inode(struct inode *inode)
> @@ -1026,8 +1025,6 @@ static int ext4_fc_perform_commit(journal_t *journal)
> if (ret)
> goto out;
> spin_lock(&sbi->s_fc_lock);
> - EXT4_I(inode)->i_fc_committed_subtid =
> - atomic_read(&sbi->s_fc_subtid);
> }
> spin_unlock(&sbi->s_fc_lock);
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 7f6af784e74f..d36c3908272f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3311,8 +3311,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
> EXT4_I(inode)->i_datasync_tid))
> return false;
> if (test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT))
> - return atomic_read(&EXT4_SB(inode->i_sb)->s_fc_subtid) <
> - EXT4_I(inode)->i_fc_committed_subtid;
> + return !list_empty(&EXT4_I(inode)->i_fc_list);
> return true;
> }
>
> --
> 2.29.1.341.ge80a0c044ae-goog

2020-11-03 10:36:34

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 01/10] ext4: describe fast_commit feature flags

On Sat 31-10-20 13:05:09, Harshad Shirwadkar wrote:
> Fast commit feature has flags in the file system as well in JBD2. The
> meaning of fast commit feature flags can get confusing. Update docs
> and code to add more documentation about it.
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

Looks good to me. Thanks! You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> Documentation/filesystems/ext4/journal.rst | 6 ++++++
> Documentation/filesystems/ext4/super.rst | 7 +++++++
> fs/ext4/ext4.h | 7 +++++++
> 3 files changed, 20 insertions(+)
>
> diff --git a/Documentation/filesystems/ext4/journal.rst b/Documentation/filesystems/ext4/journal.rst
> index 805a1e9ea3a5..849d5b119eb8 100644
> --- a/Documentation/filesystems/ext4/journal.rst
> +++ b/Documentation/filesystems/ext4/journal.rst
> @@ -256,6 +256,10 @@ which is 1024 bytes long:
> - s\_padding2
> -
> * - 0x54
> + - \_\_be32
> + - s\_num\_fc\_blocks
> + - Number of fast commit blocks in the journal.
> + * - 0x58
> - \_\_u32
> - s\_padding[42]
> -
> @@ -310,6 +314,8 @@ The journal incompat features are any combination of the following:
> - This journal uses v3 of the checksum on-disk format. This is the same as
> v2, but the journal block tag size is fixed regardless of the size of
> block numbers. (JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3)
> + * - 0x20
> + - Journal has fast commit blocks. (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT)
>
> .. _jbd2_checksum_type:
>
> diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
> index 93e55d7c1d40..2eb1ab20498d 100644
> --- a/Documentation/filesystems/ext4/super.rst
> +++ b/Documentation/filesystems/ext4/super.rst
> @@ -596,6 +596,13 @@ following:
> - Sparse Super Block, v2. If this flag is set, the SB field s\_backup\_bgs
> points to the two block groups that contain backup superblocks
> (COMPAT\_SPARSE\_SUPER2).
> + * - 0x400
> + - Fast commits supported. Although fast commits blocks are
> + backward incompatible, fast commit blocks are not always
> + present in the journal. If fast commit blocks are present in
> + the journal, JBD2 incompat feature
> + (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets
> + set (COMPAT\_FAST\_COMMIT).
>
> .. _super_incompat:
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 2337e443fa30..12673f9ec880 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1875,6 +1875,13 @@ static inline bool ext4_verity_in_progress(struct inode *inode)
> #define EXT4_FEATURE_COMPAT_RESIZE_INODE 0x0010
> #define EXT4_FEATURE_COMPAT_DIR_INDEX 0x0020
> #define EXT4_FEATURE_COMPAT_SPARSE_SUPER2 0x0200
> +/*
> + * The reason why "FAST_COMMIT" is a compat feature is that, FS becomes
> + * incompatible only if fast commit blocks are present in the FS. Since we
> + * clear the journal (and thus the fast commit blocks), we don't mark FS as
> + * incompatible. We also have a JBD2 incompat feature, which gets set when
> + * there are fast commit blocks present in the journal.
> + */
> #define EXT4_FEATURE_COMPAT_FAST_COMMIT 0x0400
> #define EXT4_FEATURE_COMPAT_STABLE_INODES 0x0800
>
> --
> 2.29.1.341.ge80a0c044ae-goog
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 14:15:41

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 02/10] ext4: mark fc ineligible if inode gets evictied due to mem pressure

On Sat 31-10-20 13:05:10, Harshad Shirwadkar wrote:
> If inode gets evicted due to memory pressure, we have to remove it
> from the fast commit list. However, that inode may have uncommitted
> changes that fast commits will lose. So, just fall back to full
> commits in this case. Also, rename the fast commit ineligiblity reason
> from "EXT4_FC_REASON_MEM" to "EXT4_FC_REASON_MEM_CRUNCH" for better
> expression.
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

...

> diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
> index 06907d485989..cde86747faf8 100644
> --- a/fs/ext4/fast_commit.h
> +++ b/fs/ext4/fast_commit.h
> @@ -100,7 +100,7 @@ enum {
> EXT4_FC_REASON_XATTR = 0,
> EXT4_FC_REASON_CROSS_RENAME,
> EXT4_FC_REASON_JOURNAL_FLAG_CHANGE,
> - EXT4_FC_REASON_MEM,
> + EXT4_FC_REASON_MEM_CRUNCH,

Well MEM_CRUNCH doesn't really sound more understandable to me :). I'd
rather call it MEM_RECLAIM or ENOMEM or something like that...

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index b96a18679a27..52ff71236290 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -327,6 +327,7 @@ void ext4_evict_inode(struct inode *inode)
> ext4_xattr_inode_array_free(ea_inode_array);
> return;
> no_delete:
> + ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_MEM_CRUNCH);
> ext4_clear_inode(inode); /* We must guarantee clearing of inode... */
> }

This will make fs ineligible on every inode reclaim. Even if the inode was
clean, not part of any FC. I guess this is too aggressive...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 14:47:28

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 03/10] ext4: pass handle to ext4_fc_track_* functions

On Sat 31-10-20 13:05:11, Harshad Shirwadkar wrote:
> Use transaction id found in handle->h_transaction->h_tid for tracking
> fast commit updates. This patch also restructures ext4_unlink to make
> handle available inside ext4_unlink for fast commit tracking.
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

Thanks for the patch. Couple of comments below:

> @@ -4651,8 +4652,6 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> FALLOC_FL_INSERT_RANGE))
> return -EOPNOTSUPP;
> - ext4_fc_track_range(inode, offset >> blkbits,
> - (offset + len - 1) >> blkbits);

Why do you delete the ext4_fc_track_range() call here?

> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 354f81ff819d..5c3af472287a 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -323,15 +323,18 @@ static inline int ext4_fc_is_ineligible(struct super_block *sb)
> * If enqueue is set, this function enqueues the inode in fast commit list.
> */
> static int ext4_fc_track_template(
> - struct inode *inode, int (*__fc_track_fn)(struct inode *, void *, bool),
> + handle_t *handle, struct inode *inode,
> + int (*__fc_track_fn)(struct inode *, void *, bool),
> void *args, int enqueue)
> {
> - tid_t running_txn_tid;
> bool update = false;
> struct ext4_inode_info *ei = EXT4_I(inode);
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> + tid_t tid = 0;
> int ret;
>
> + if (ext4_handle_valid(handle) && handle->h_transaction)
> + tid = handle->h_transaction->t_tid;

The handle->h_transaction check is pointless here. It is always true. And
if you move the tid fetching after the JOURNAL_FAST_COMMIT check below, you
don't need the ext4_handle_valid() check either as fastcommit cannot be
enabled without a journal.

> if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
> (sbi->s_mount_state & EXT4_FC_REPLAY))
> return -EOPNOTSUPP;


> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 5159830dacb8..f3f8bf61072e 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2631,7 +2631,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
> inode_save = inode;
> ihold(inode_save);
> err = ext4_add_nondir(handle, dentry, &inode);
> - ext4_fc_track_create(inode_save, dentry);
> + ext4_fc_track_create(handle, inode_save, dentry);
> iput(inode_save);
> }
> if (handle)
> @@ -2668,7 +2668,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
> ihold(inode_save);
> err = ext4_add_nondir(handle, dentry, &inode);
> if (!err)
> - ext4_fc_track_create(inode_save, dentry);
> + ext4_fc_track_create(handle, inode_save, dentry);
> iput(inode_save);
> }

Not directly related to this patch but why do you bother with 'inode_save'
in the above cases? I guess you're afraid by the comment that "inode
reference is consumed by the dentry" but since you have a dentry reference
as well, you can be sure that the inode stays around...

> if (handle)
> @@ -2833,7 +2833,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
> iput(inode);
> goto out_retry;
> }
> - ext4_fc_track_create(inode, dentry);
> + ext4_fc_track_create(handle, inode, dentry);
> ext4_inc_count(dir);

And I was also wondering why all the directory tracking functions take both
dentry and the inode. You can fetch inode from a dentry with d_inode()
helper so I don't see a reason for passing it separately. That is, in a
couple of places you call ext4_fc_track_*() before d_instantiate[_new]() so
dentry isn't fully setup yet but there's nothing which prevents you from
calling it after d_instantiate().

The only possible exception to this is the ext4_rename() code. There you
don't have suitable dentry for the link tracking so this would need to
explicitely pass the inode & dentry. But that place can just call a low
level wrapper allowing that. All the other places can use a higher level
function which just takes the dentry.

> static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> {
> + handle_t *handle;
> int retval;
>
> if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
> @@ -3282,9 +3273,16 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> if (retval)
> goto out_trace;
>
> - retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry));
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> + EXT4_DATA_TRANS_BLOCKS(dir->i_sb));
> + if (IS_ERR(handle)) {
> + retval = PTR_ERR(handle);
> + goto out_trace;
> + }
> +
> + retval = __ext4_unlink(handle, dir, &dentry->d_name, d_inode(dentry));
> if (!retval)
> - ext4_fc_track_unlink(d_inode(dentry), dentry);
> + ext4_fc_track_unlink(handle, d_inode(dentry), dentry);
> #ifdef CONFIG_UNICODE
> /* VFS negative dentries are incompatible with Encoding and
> * Case-insensitiveness. Eventually we'll want avoid
> @@ -3295,6 +3293,8 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> if (IS_CASEFOLDED(dir))
> d_invalidate(dentry);
> #endif
> + if (handle)
> + ext4_journal_stop(handle);

How could 'handle' be NULL here?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 16:30:21

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 04/10] ext4: clean up the JBD2 API that initializes fast commits

On Sat 31-10-20 13:05:12, Harshad Shirwadkar wrote:
> This patch cleans up the jbd2_fc_init() API and its related functions
> to simplify enabling fast commits and configuring the number of blocks
> that fast commits will use. With this change, the number of fast
> commit blocks to use is solely determined by the JBD2 layer. However,
> whether or not to use fast commits is determined by the file
> system. The file system just calls jbd2_fc_init() to tell the JBD2
> layer that it is interested in enabling fast commits. JBD2 layer
> determines how many blocks to use for fast commits (based on the value
> found in the JBD2 superblock).
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

Thanks for the cleanup. Some comments below...

> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index fa688e163a80..353534403769 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -801,7 +801,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> if (first_block < journal->j_tail)
> freed += journal->j_last - journal->j_first;
> /* Update tail only if we free significant amount of space */
> - if (freed < journal->j_maxlen / 4)
> + if (freed < (journal->j_maxlen - journal->j_fc_wbufsize) / 4)
> update_tail = 0;

This change seems unrelated to the API change in jbd2_fc_init(). Can you
please separate fix for journal length handling into a separate patch with
a proper changelog etc.?

Also can you perhaps rename j_maxlen to j_total_len to give better hint
that there may be multiple parts of the journal and provide wrapper
jbd2_transaction_space(journal) for the
(journal->j_maxlen - journal->j_fc_wbufsize) expression because that's kind
of implementation detail of the current fastcommit code.

> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 0c7c42bd530f..ea15f55aff5c 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -1357,14 +1357,6 @@ static journal_t *journal_init_common(struct block_device *bdev,
> if (!journal->j_wbuf)
> goto err_cleanup;
>
> - if (journal->j_fc_wbufsize > 0) {
> - journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> - sizeof(struct buffer_head *),
> - GFP_KERNEL);
> - if (!journal->j_fc_wbuf)
> - goto err_cleanup;
> - }
> -
> bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize);
> if (!bh) {
> pr_err("%s: Cannot get buffer for journal superblock\n",
> @@ -1378,19 +1370,21 @@ static journal_t *journal_init_common(struct block_device *bdev,
>
> err_cleanup:
> kfree(journal->j_wbuf);
> - kfree(journal->j_fc_wbuf);
> jbd2_journal_destroy_revoke(journal);
> kfree(journal);
> return NULL;
> }
>
> -int jbd2_fc_init(journal_t *journal, int num_fc_blks)
> +int jbd2_fc_init(journal_t *journal)
> {
> - journal->j_fc_wbufsize = num_fc_blks;
> - journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> - sizeof(struct buffer_head *), GFP_KERNEL);
> - if (!journal->j_fc_wbuf)
> - return -ENOMEM;
> + /*
> + * Only set j_fc_wbufsize here to indicate that the client file
> + * system is interested in using fast commits. The actual number of
> + * fast commit blocks is found inside jbd2_superblock and is only
> + * valid if j_fc_wbufsize is non-zero. The real value of j_fc_wbufsize
> + * gets set in journal_reset().
> + */
> + journal->j_fc_wbufsize = JBD2_MIN_FC_BLOCKS;
> return 0;
> }

When looking at this, is there a reason why jbd2_fc_init() still exists? I
mean why not just make the rule that the journal has FC block number set
iff FC gets enabled? Anything else seems a bit confusing to me and also
dangerous - imagine we have fs with FC running, we write some FCs and then
crash. Then on system recovery we mount with no_fc mount option. We have
just lost data on the filesystem AFAIU... So I'd just remove all the mount
options related to fastcommits and leave everything to the journal setup
(which can be modified with e2fsprogs if needed) to keep things simple.

> EXPORT_SYMBOL(jbd2_fc_init);
> @@ -1500,7 +1494,7 @@ static void journal_fail_superblock(journal_t *journal)
> static int journal_reset(journal_t *journal)
> {
> journal_superblock_t *sb = journal->j_superblock;
> - unsigned long long first, last;
> + unsigned long long first, last, num_fc_blocks;
>
> first = be32_to_cpu(sb->s_first);
> last = be32_to_cpu(sb->s_maxlen);
> @@ -1513,6 +1507,28 @@ static int journal_reset(journal_t *journal)
>
> journal->j_first = first;
>
> + /*
> + * At this point, fast commit recovery has finished. Now, we solely
> + * rely on the file system to decide whether it wants fast commits
> + * or not. File system that wishes to use fast commits must have
> + * already called jbd2_fc_init() before we get here.
> + */
> + if (journal->j_fc_wbufsize > 0)
> + jbd2_journal_set_features(journal, 0, 0,
> + JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
> + else
> + jbd2_journal_clear_features(journal, 0, 0,
> + JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
> +
> + /* If valid, prefer the value found in superblock over the default */
> + num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks);
> + if (num_fc_blocks > 0 && num_fc_blocks < last)
> + journal->j_fc_wbufsize = num_fc_blocks;
> +
> + if (jbd2_has_feature_fast_commit(journal))
> + journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> + sizeof(struct buffer_head *), GFP_KERNEL);
> +
> if (jbd2_has_feature_fast_commit(journal) &&
> journal->j_fc_wbufsize > 0) {
> journal->j_fc_last = last;
> @@ -1531,7 +1547,8 @@ static int journal_reset(journal_t *journal)
> journal->j_commit_sequence = journal->j_transaction_sequence - 1;
> journal->j_commit_request = journal->j_commit_sequence;
>
> - journal->j_max_transaction_buffers = journal->j_maxlen / 4;
> + journal->j_max_transaction_buffers =
> + (journal->j_maxlen - journal->j_fc_wbufsize) / 4;
>
> /*
> * As a special case, if the on-disk copy is already marked as needing
> @@ -1872,6 +1889,7 @@ static int load_superblock(journal_t *journal)
> {
> int err;
> journal_superblock_t *sb;
> + int num_fc_blocks;
>
> err = journal_get_superblock(journal);
> if (err)
> @@ -1884,10 +1902,12 @@ static int load_superblock(journal_t *journal)
> journal->j_first = be32_to_cpu(sb->s_first);
> journal->j_errno = be32_to_cpu(sb->s_errno);
>
> - if (jbd2_has_feature_fast_commit(journal) &&
> - journal->j_fc_wbufsize > 0) {
> + if (jbd2_has_feature_fast_commit(journal)) {
> journal->j_fc_last = be32_to_cpu(sb->s_maxlen);
> - journal->j_last = journal->j_fc_last - journal->j_fc_wbufsize;
> + num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks);
> + if (!num_fc_blocks || num_fc_blocks >= journal->j_fc_last)

I think this needs to be stricter - we need the check that the journal is
at least JBD2_MIN_JOURNAL_BLOCKS long (which happens at the beginning of
journal_reset()) to happen after we've subtracted fastcommit blocks...

> + num_fc_blocks = JBD2_MIN_FC_BLOCKS;
> + journal->j_last = journal->j_fc_last - num_fc_blocks;
> journal->j_fc_first = journal->j_last + 1;
> journal->j_fc_off = 0;
> } else {
...
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index eb2606133cd8..822f16cbf9b3 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -134,7 +134,7 @@ static int jread(struct buffer_head **bhp, journal_t *journal,
>
> *bhp = NULL;
>
> - if (offset >= journal->j_maxlen) {
> + if (offset >= journal->j_maxlen + journal->j_fc_wbufsize) {

This looks wrong since j_maxlen is currently including fastcommit blocks...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 16:39:19

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 05/10] jbd2: fix fast commit journalling APIs

On Sat 31-10-20 13:05:13, Harshad Shirwadkar wrote:
> This patch adds a few misc fixes for jbd2 fast commit functions.
>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

Please no "misc fixes" patches. If you have trouble writing good
explanatory changelog, it's usually a sign you're trying to cram too much
into a single commit :). In this case I'd split it into 3 changes:

1) TODO update.
2) Removal of j_state_lock protection (with comment updates)
3) Fix of journal->j_running_transaction->t_tid handling.

> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 9b4e87a0068b..df1285da7276 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -946,7 +946,9 @@ struct journal_s
> * @j_fc_off:
> *
> * Number of fast commit blocks currently allocated.
> - * [j_state_lock].
> + * [j_state_lock]. During the commit path, this variable is not

Please remove the [j_state_lock] annotation when the entry isn't really
protected by j_state_lock... Also I'd maybe rephrase the comment like
"Accessed only during fastcommit, currenly only a single process can
perform fastcommit at a time."

> + * protected by j_state_lock since only one process performs commit
> + * at a time.
> */
> unsigned long j_fc_off;
>
> @@ -1110,7 +1112,9 @@ struct journal_s
>
> /**
> * @j_fc_wbuf: Array of fast commit bhs for
> - * jbd2_journal_commit_transaction.
> + * jbd2_journal_commit_transaction. During the commit path, this
> + * variable is not protected by j_state_lock since only one process
> + * performs commit at a time.
> */
> struct buffer_head **j_fc_wbuf;

Here the bh's aren't really used in jbd2_journal_commit_transaction() are
they? Please fix that when updating the comment. Also I'd find a
reformulation like I suggested for the comment above more comprehensible.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 16:42:34

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 06/10] ext4: dedpulicate the code to wait on inode that's being committed

On Sat 31-10-20 13:05:14, Harshad Shirwadkar wrote:
> This patch removes the deduplicates the code that implements waiting
> on inode that's being committed. That code is moved into a new
> function.
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

Looks good to me. Just one nit below:

> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index b1ca55c7d32a..0f2543220d1d 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -155,6 +155,28 @@ void ext4_fc_init_inode(struct inode *inode)
> ei->i_fc_committed_subtid = 0;
> }
>
> +static void ext4_fc_wait_committing_inode(struct inode *inode)
> +{
> + wait_queue_head_t *wq;
> + struct ext4_inode_info *ei = EXT4_I(inode);
> +

Maybe add lockdep_assert_held(&EXT4_SB(inode->i_sb)->s_fc_lock) here to
make sure the function is called properly? It's kind of unobvious
requirement (but hard to avoid)...

> +#if (BITS_PER_LONG < 64)
> + DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> + EXT4_STATE_FC_COMMITTING);
> + wq = bit_waitqueue(&ei->i_state_flags,
> + EXT4_STATE_FC_COMMITTING);
> +#else
> + DEFINE_WAIT_BIT(wait, &ei->i_flags,
> + EXT4_STATE_FC_COMMITTING);
> + wq = bit_waitqueue(&ei->i_flags,
> + EXT4_STATE_FC_COMMITTING);
> +#endif
> + prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> + spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> + schedule();
> + finish_wait(wq, &wait.wq_entry);
> +}
> +

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 16:53:15

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 07/10] ext4: misc fast commit fixes

On Sat 31-10-20 13:05:15, Harshad Shirwadkar wrote:
> This patch adds a small number of misc fast commit fixes. Along with
> functional fixes such as setting the right buffer flags, there also
> typo fixes and comment additions.
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

Again, please don't merge logically separate fixes into one commit.

> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index 00dc668e052b..10855cd230c7 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -422,9 +422,13 @@ static inline int ext4_journal_force_commit(journal_t *journal)
> static inline int ext4_jbd2_inode_add_write(handle_t *handle,
> struct inode *inode, loff_t start_byte, loff_t length)
> {
> - if (ext4_handle_valid(handle))
> + if (ext4_handle_valid(handle)) {
> + ext4_fc_track_range(handle, inode,
> + start_byte >> inode->i_sb->s_blocksize_bits,
> + (start_byte + length) >> inode->i_sb->s_blocksize_bits);
> return jbd2_journal_inode_ranged_write(handle,
> EXT4_I(inode)->jinode, start_byte, length);
> + }

Why this change? A good changelog would tell me... I'm suspicious here
because ext4_jbd2_inode_add_write() gets called only in data=ordered mode
but fastcommit can run also in data=writeback mode...

I suppose this is for the mmap coverage we were speaking about. Now that
I'm speaking about it again maybe the ext4_fc_track_range() call in
ext4_map_blocks() is actually enough? I mean once we allocate blocks for a
range (either from page fault, write, or writeback of delalloc), they will
become properly tracked in ext4_map_blocks() and that's all we need? But
then I'm missing why we have so many ext4_fc_track_range() calls around the
code... Can you please explain?

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 16:59:13

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 08/10] ext4: fix inode dirty check in case of fast commits

On Sat 31-10-20 13:05:16, Harshad Shirwadkar wrote:
> In case of fast commits, determine if the inode is dirty by checking
> if the inode is on fast commit list. This also helps us get rid of
> ext4_inode_info.i_fc_committed_subtid field.
>
> Reported-by: Andrea Righi <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

Nice cleanup and looks good to me! You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/ext4.h | 3 ---
> fs/ext4/fast_commit.c | 3 ---
> fs/ext4/inode.c | 3 +--
> 3 files changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 573db158382f..7222a9ba5d66 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1028,9 +1028,6 @@ struct ext4_inode_info {
> * protected by sbi->s_fc_lock.
> */
>
> - /* Fast commit subtid when this inode was committed */
> - unsigned int i_fc_committed_subtid;
> -
> /* Start of lblk range that needs to be committed in this fast commit */
> ext4_lblk_t i_fc_lblk_start;
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index b7b1fe6dbb24..4c0a3e858ea3 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -152,7 +152,6 @@ void ext4_fc_init_inode(struct inode *inode)
> INIT_LIST_HEAD(&ei->i_fc_list);
> init_waitqueue_head(&ei->i_fc_wait);
> atomic_set(&ei->i_fc_updates, 0);
> - ei->i_fc_committed_subtid = 0;
> }
>
> static void ext4_fc_wait_committing_inode(struct inode *inode)
> @@ -1026,8 +1025,6 @@ static int ext4_fc_perform_commit(journal_t *journal)
> if (ret)
> goto out;
> spin_lock(&sbi->s_fc_lock);
> - EXT4_I(inode)->i_fc_committed_subtid =
> - atomic_read(&sbi->s_fc_subtid);
> }
> spin_unlock(&sbi->s_fc_lock);
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 7f6af784e74f..d36c3908272f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3311,8 +3311,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
> EXT4_I(inode)->i_datasync_tid))
> return false;
> if (test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT))
> - return atomic_read(&EXT4_SB(inode->i_sb)->s_fc_subtid) <
> - EXT4_I(inode)->i_fc_committed_subtid;
> + return !list_empty(&EXT4_I(inode)->i_fc_list);
> return true;
> }
>
> --
> 2.29.1.341.ge80a0c044ae-goog
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 17:02:42

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 10/10] ext4: issue fsdev cache flush before starting fast commit

On Sat 31-10-20 13:05:18, Harshad Shirwadkar wrote:
> If the journal dev is different from fsdev, issue a cache flush before
> committing fast commit blocks to disk.
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/fast_commit.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 9ae8ba213961..facf2f59b895 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -996,6 +996,13 @@ static int ext4_fc_perform_commit(journal_t *journal)
> if (ret)
> return ret;
>
> + /*
> + * If file system device is different from journal device, issue a cache
> + * flush before we start writing fast commit blocks.
> + */
> + if (journal->j_fs_dev != journal->j_dev)
> + blkdev_issue_flush(journal->j_fs_dev, GFP_NOFS);
> +
> blk_start_plug(&plug);
> if (sbi->s_fc_bytes == 0) {
> /*
> --
> 2.29.1.341.ge80a0c044ae-goog
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 17:04:02

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 09/10] ext4: disable fast commit with data journalling

On Sat 31-10-20 13:05:17, Harshad Shirwadkar wrote:
> Fast commits don't work with data journalling. This patch disables the
> fast commit support when data journalling is turned on.
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Harshad Shirwadkar <[email protected]>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/fast_commit.c | 7 +++++++
> fs/ext4/fast_commit.h | 1 +
> fs/ext4/super.c | 3 ++-
> include/trace/events/ext4.h | 6 ++++--
> 4 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 4c0a3e858ea3..9ae8ba213961 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -472,6 +472,12 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
> if (S_ISDIR(inode->i_mode))
> return;
>
> + if (ext4_should_journal_data(inode)) {
> + ext4_fc_mark_ineligible(inode->i_sb,
> + EXT4_FC_REASON_INODE_JOURNAL_DATA);
> + return;
> + }
> +
> ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1);
> trace_ext4_fc_track_inode(inode, ret);
> }
> @@ -2095,6 +2101,7 @@ const char *fc_ineligible_reasons[] = {
> "Resize",
> "Dir renamed",
> "Falloc range op",
> + "Data journalling",
> "FC Commit Failed"
> };
>
> diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
> index cde86747faf8..cdb36a10dfd0 100644
> --- a/fs/ext4/fast_commit.h
> +++ b/fs/ext4/fast_commit.h
> @@ -105,6 +105,7 @@ enum {
> EXT4_FC_REASON_RESIZE,
> EXT4_FC_REASON_RENAME_DIR,
> EXT4_FC_REASON_FALLOC_RANGE,
> + EXT4_FC_REASON_INODE_JOURNAL_DATA,
> EXT4_FC_COMMIT_FAILED,
> EXT4_FC_REASON_MAX
> };
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e67d2fa41a78..9333475737ac 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4340,9 +4340,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> #endif
>
> if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) {
> - printk_once(KERN_WARNING "EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!\n");
> + printk_once(KERN_WARNING "EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, O_DIRECT and fast_commit support!\n");
> /* can't mount with both data=journal and dioread_nolock. */
> clear_opt(sb, DIOREAD_NOLOCK);
> + clear_opt2(sb, JOURNAL_FAST_COMMIT);
> if (test_opt2(sb, EXPLICIT_DELALLOC)) {
> ext4_msg(sb, KERN_ERR, "can't mount with "
> "both data=journal and delalloc");
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index 239e98014f9b..ee7362f31eb6 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -104,7 +104,8 @@ TRACE_DEFINE_ENUM(ES_REFERENCED_B);
> { EXT4_FC_REASON_SWAP_BOOT, "SWAP_BOOT"}, \
> { EXT4_FC_REASON_RESIZE, "RESIZE"}, \
> { EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}, \
> - { EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"})
> + { EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"}, \
> + { EXT4_FC_REASON_INODE_JOURNAL_DATA, "INODE_JOURNAL_DATA"})
>
> TRACE_EVENT(ext4_other_inode_update_time,
> TP_PROTO(struct inode *inode, ino_t orig_ino),
> @@ -2917,7 +2918,7 @@ TRACE_EVENT(ext4_fc_stats,
> ),
>
> TP_printk("dev %d:%d fc ineligible reasons:\n"
> - "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d; "
> + "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d; "
> "num_commits:%ld, ineligible: %ld, numblks: %ld",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR),
> @@ -2928,6 +2929,7 @@ TRACE_EVENT(ext4_fc_stats,
> FC_REASON_NAME_STAT(EXT4_FC_REASON_RESIZE),
> FC_REASON_NAME_STAT(EXT4_FC_REASON_RENAME_DIR),
> FC_REASON_NAME_STAT(EXT4_FC_REASON_FALLOC_RANGE),
> + FC_REASON_NAME_STAT(EXT4_FC_REASON_INODE_JOURNAL_DATA),
> __entry->sbi->s_fc_stats.fc_num_commits,
> __entry->sbi->s_fc_stats.fc_ineligible_commits,
> __entry->sbi->s_fc_stats.fc_numblks)
> --
> 2.29.1.341.ge80a0c044ae-goog
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-03 18:34:46

by harshad shirwadkar

[permalink] [raw]
Subject: Re: [PATCH 02/10] ext4: mark fc ineligible if inode gets evictied due to mem pressure

On Tue, Nov 3, 2020 at 6:13 AM Jan Kara <[email protected]> wrote:
>
> On Sat 31-10-20 13:05:10, Harshad Shirwadkar wrote:
> > If inode gets evicted due to memory pressure, we have to remove it
> > from the fast commit list. However, that inode may have uncommitted
> > changes that fast commits will lose. So, just fall back to full
> > commits in this case. Also, rename the fast commit ineligiblity reason
> > from "EXT4_FC_REASON_MEM" to "EXT4_FC_REASON_MEM_CRUNCH" for better
> > expression.
> >
> > Suggested-by: Jan Kara <[email protected]>
> > Signed-off-by: Harshad Shirwadkar <[email protected]>
>
> ...
>
> > diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
> > index 06907d485989..cde86747faf8 100644
> > --- a/fs/ext4/fast_commit.h
> > +++ b/fs/ext4/fast_commit.h
> > @@ -100,7 +100,7 @@ enum {
> > EXT4_FC_REASON_XATTR = 0,
> > EXT4_FC_REASON_CROSS_RENAME,
> > EXT4_FC_REASON_JOURNAL_FLAG_CHANGE,
> > - EXT4_FC_REASON_MEM,
> > + EXT4_FC_REASON_MEM_CRUNCH,
>
> Well MEM_CRUNCH doesn't really sound more understandable to me :). I'd
> rather call it MEM_RECLAIM or ENOMEM or something like that...
Okay, ENOMEM sounds good, since this is also used in case of memory
allocation failures.
>
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index b96a18679a27..52ff71236290 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -327,6 +327,7 @@ void ext4_evict_inode(struct inode *inode)
> > ext4_xattr_inode_array_free(ea_inode_array);
> > return;
> > no_delete:
> > + ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_MEM_CRUNCH);
> > ext4_clear_inode(inode); /* We must guarantee clearing of inode... */
> > }
>
> This will make fs ineligible on every inode reclaim. Even if the inode was
> clean, not part of any FC. I guess this is too aggressive...
Right, I missed that, so first checking if the inode is on FC list and
then marking the FS as ineligible should suffice?

Thanks,
Harshad
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2020-11-04 09:53:11

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 02/10] ext4: mark fc ineligible if inode gets evictied due to mem pressure

On Tue 03-11-20 10:33:47, harshad shirwadkar wrote:
> On Tue, Nov 3, 2020 at 6:13 AM Jan Kara <[email protected]> wrote:
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index b96a18679a27..52ff71236290 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -327,6 +327,7 @@ void ext4_evict_inode(struct inode *inode)
> > > ext4_xattr_inode_array_free(ea_inode_array);
> > > return;
> > > no_delete:
> > > + ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_MEM_CRUNCH);
> > > ext4_clear_inode(inode); /* We must guarantee clearing of inode... */
> > > }
> >
> > This will make fs ineligible on every inode reclaim. Even if the inode was
> > clean, not part of any FC. I guess this is too aggressive...
> Right, I missed that, so first checking if the inode is on FC list and
> then marking the FS as ineligible should suffice?

Yes, that looks good to me.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-04 20:43:12

by harshad shirwadkar

[permalink] [raw]
Subject: Re: [PATCH 04/10] ext4: clean up the JBD2 API that initializes fast commits

On Tue, Nov 3, 2020 at 8:29 AM Jan Kara <[email protected]> wrote:
>
> On Sat 31-10-20 13:05:12, Harshad Shirwadkar wrote:
> > This patch cleans up the jbd2_fc_init() API and its related functions
> > to simplify enabling fast commits and configuring the number of blocks
> > that fast commits will use. With this change, the number of fast
> > commit blocks to use is solely determined by the JBD2 layer. However,
> > whether or not to use fast commits is determined by the file
> > system. The file system just calls jbd2_fc_init() to tell the JBD2
> > layer that it is interested in enabling fast commits. JBD2 layer
> > determines how many blocks to use for fast commits (based on the value
> > found in the JBD2 superblock).
> >
> > Suggested-by: Jan Kara <[email protected]>
> > Signed-off-by: Harshad Shirwadkar <[email protected]>
>
> Thanks for the cleanup. Some comments below...
>
> > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> > index fa688e163a80..353534403769 100644
> > --- a/fs/jbd2/commit.c
> > +++ b/fs/jbd2/commit.c
> > @@ -801,7 +801,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> > if (first_block < journal->j_tail)
> > freed += journal->j_last - journal->j_first;
> > /* Update tail only if we free significant amount of space */
> > - if (freed < journal->j_maxlen / 4)
> > + if (freed < (journal->j_maxlen - journal->j_fc_wbufsize) / 4)
> > update_tail = 0;
>
> This change seems unrelated to the API change in jbd2_fc_init(). Can you
> please separate fix for journal length handling into a separate patch with
> a proper changelog etc.?
Sounds good, will do that.
>
> Also can you perhaps rename j_maxlen to j_total_len to give better hint
> that there may be multiple parts of the journal and provide wrapper
> jbd2_transaction_space(journal) for the
> (journal->j_maxlen - journal->j_fc_wbufsize) expression because that's kind
> of implementation detail of the current fastcommit code.
Ack
>
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index 0c7c42bd530f..ea15f55aff5c 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -1357,14 +1357,6 @@ static journal_t *journal_init_common(struct block_device *bdev,
> > if (!journal->j_wbuf)
> > goto err_cleanup;
> >
> > - if (journal->j_fc_wbufsize > 0) {
> > - journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> > - sizeof(struct buffer_head *),
> > - GFP_KERNEL);
> > - if (!journal->j_fc_wbuf)
> > - goto err_cleanup;
> > - }
> > -
> > bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize);
> > if (!bh) {
> > pr_err("%s: Cannot get buffer for journal superblock\n",
> > @@ -1378,19 +1370,21 @@ static journal_t *journal_init_common(struct block_device *bdev,
> >
> > err_cleanup:
> > kfree(journal->j_wbuf);
> > - kfree(journal->j_fc_wbuf);
> > jbd2_journal_destroy_revoke(journal);
> > kfree(journal);
> > return NULL;
> > }
> >
> > -int jbd2_fc_init(journal_t *journal, int num_fc_blks)
> > +int jbd2_fc_init(journal_t *journal)
> > {
> > - journal->j_fc_wbufsize = num_fc_blks;
> > - journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> > - sizeof(struct buffer_head *), GFP_KERNEL);
> > - if (!journal->j_fc_wbuf)
> > - return -ENOMEM;
> > + /*
> > + * Only set j_fc_wbufsize here to indicate that the client file
> > + * system is interested in using fast commits. The actual number of
> > + * fast commit blocks is found inside jbd2_superblock and is only
> > + * valid if j_fc_wbufsize is non-zero. The real value of j_fc_wbufsize
> > + * gets set in journal_reset().
> > + */
> > + journal->j_fc_wbufsize = JBD2_MIN_FC_BLOCKS;
> > return 0;
> > }
>
> When looking at this, is there a reason why jbd2_fc_init() still exists? I
> mean why not just make the rule that the journal has FC block number set
> iff FC gets enabled? Anything else seems a bit confusing to me and also
> dangerous - imagine we have fs with FC running, we write some FCs and then
> crash. Then on system recovery we mount with no_fc mount option. We have
> just lost data on the filesystem AFAIU... So I'd just remove all the mount
> options related to fastcommits and leave everything to the journal setup
> (which can be modified with e2fsprogs if needed) to keep things simple.
The problem is whether or not to use fast commits is the file system's
call. The JBD2 feature flag will be cleared on a clean unmount and if
we rely solely on the JBD2 feature flag, fast commit will be turned
off after a clean unmount. Whereas the FS compat flag is the source of
truth about whether fast commit needs to be used or not. That's why we
need an API for the file system to tell JBD2 to still do fast commits.
Mount options that override the feature flag in Ext4 were mainly meant
for debugging purposes. So, perhaps there should be a clear warning
message in the kernel if any of these options are used? Even if we get
rid of the mount options, we still need the jbd2_fc_init() API for the
FS to tell JBD2 that it wants to use fast commit. Note that even if
jbd2_fc_init() is not called, JBD2 will still try to replay fast
commit blocks.
>
> > EXPORT_SYMBOL(jbd2_fc_init);
> > @@ -1500,7 +1494,7 @@ static void journal_fail_superblock(journal_t *journal)
> > static int journal_reset(journal_t *journal)
> > {
> > journal_superblock_t *sb = journal->j_superblock;
> > - unsigned long long first, last;
> > + unsigned long long first, last, num_fc_blocks;
> >
> > first = be32_to_cpu(sb->s_first);
> > last = be32_to_cpu(sb->s_maxlen);
> > @@ -1513,6 +1507,28 @@ static int journal_reset(journal_t *journal)
> >
> > journal->j_first = first;
> >
> > + /*
> > + * At this point, fast commit recovery has finished. Now, we solely
> > + * rely on the file system to decide whether it wants fast commits
> > + * or not. File system that wishes to use fast commits must have
> > + * already called jbd2_fc_init() before we get here.
> > + */
> > + if (journal->j_fc_wbufsize > 0)
> > + jbd2_journal_set_features(journal, 0, 0,
> > + JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
> > + else
> > + jbd2_journal_clear_features(journal, 0, 0,
> > + JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
> > +
> > + /* If valid, prefer the value found in superblock over the default */
> > + num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks);
> > + if (num_fc_blocks > 0 && num_fc_blocks < last)
> > + journal->j_fc_wbufsize = num_fc_blocks;
> > +
> > + if (jbd2_has_feature_fast_commit(journal))
> > + journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> > + sizeof(struct buffer_head *), GFP_KERNEL);
> > +
> > if (jbd2_has_feature_fast_commit(journal) &&
> > journal->j_fc_wbufsize > 0) {
> > journal->j_fc_last = last;
> > @@ -1531,7 +1547,8 @@ static int journal_reset(journal_t *journal)
> > journal->j_commit_sequence = journal->j_transaction_sequence - 1;
> > journal->j_commit_request = journal->j_commit_sequence;
> >
> > - journal->j_max_transaction_buffers = journal->j_maxlen / 4;
> > + journal->j_max_transaction_buffers =
> > + (journal->j_maxlen - journal->j_fc_wbufsize) / 4;
> >
> > /*
> > * As a special case, if the on-disk copy is already marked as needing
> > @@ -1872,6 +1889,7 @@ static int load_superblock(journal_t *journal)
> > {
> > int err;
> > journal_superblock_t *sb;
> > + int num_fc_blocks;
> >
> > err = journal_get_superblock(journal);
> > if (err)
> > @@ -1884,10 +1902,12 @@ static int load_superblock(journal_t *journal)
> > journal->j_first = be32_to_cpu(sb->s_first);
> > journal->j_errno = be32_to_cpu(sb->s_errno);
> >
> > - if (jbd2_has_feature_fast_commit(journal) &&
> > - journal->j_fc_wbufsize > 0) {
> > + if (jbd2_has_feature_fast_commit(journal)) {
> > journal->j_fc_last = be32_to_cpu(sb->s_maxlen);
> > - journal->j_last = journal->j_fc_last - journal->j_fc_wbufsize;
> > + num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks);
> > + if (!num_fc_blocks || num_fc_blocks >= journal->j_fc_last)
>
> I think this needs to be stricter - we need the check that the journal is
> at least JBD2_MIN_JOURNAL_BLOCKS long (which happens at the beginning of
> journal_reset()) to happen after we've subtracted fastcommit blocks...
So are you saying that with FC, the minimum journal size is
JBD2_MIN_JOURNAL_BLOCKS + JBD2_MIN_FC_BLOCKS? I was assuming that we
will reserve JBD2_MIN_FC_BLOCKS (256) blocks out of the total journal
size. That way the users who rely on the journal size to be 1024
blocks, won't see a difference in journal size even after turning FC
on. But I'm not sure if that's something we should care about.
>
> > + num_fc_blocks = JBD2_MIN_FC_BLOCKS;
> > + journal->j_last = journal->j_fc_last - num_fc_blocks;
> > journal->j_fc_first = journal->j_last + 1;
> > journal->j_fc_off = 0;
> > } else {
> ...
> > diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> > index eb2606133cd8..822f16cbf9b3 100644
> > --- a/fs/jbd2/recovery.c
> > +++ b/fs/jbd2/recovery.c
> > @@ -134,7 +134,7 @@ static int jread(struct buffer_head **bhp, journal_t *journal,
> >
> > *bhp = NULL;
> >
> > - if (offset >= journal->j_maxlen) {
> > + if (offset >= journal->j_maxlen + journal->j_fc_wbufsize) {
>
> This looks wrong since j_maxlen is currently including fastcommit blocks...
Ack,

Thanks,
Harshad
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2020-11-04 21:34:01

by harshad shirwadkar

[permalink] [raw]
Subject: Re: [PATCH 05/10] jbd2: fix fast commit journalling APIs

On Tue, Nov 3, 2020 at 8:38 AM Jan Kara <[email protected]> wrote:
>
> On Sat 31-10-20 13:05:13, Harshad Shirwadkar wrote:
> > This patch adds a few misc fixes for jbd2 fast commit functions.
> >
> > Signed-off-by: Harshad Shirwadkar <[email protected]>
>
> Please no "misc fixes" patches. If you have trouble writing good
> explanatory changelog, it's usually a sign you're trying to cram too much
> into a single commit :). In this case I'd split it into 3 changes:
>
> 1) TODO update.
> 2) Removal of j_state_lock protection (with comment updates)
> 3) Fix of journal->j_running_transaction->t_tid handling.
Okay thanks for pointing this out. I'll break this commit into logical
patches for the next version.
>
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index 9b4e87a0068b..df1285da7276 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -946,7 +946,9 @@ struct journal_s
> > * @j_fc_off:
> > *
> > * Number of fast commit blocks currently allocated.
> > - * [j_state_lock].
> > + * [j_state_lock]. During the commit path, this variable is not
>
> Please remove the [j_state_lock] annotation when the entry isn't really
> protected by j_state_lock... Also I'd maybe rephrase the comment like
> "Accessed only during fastcommit, currenly only a single process can
> perform fastcommit at a time."
Ack
>
> > + * protected by j_state_lock since only one process performs commit
> > + * at a time.
> > */
> > unsigned long j_fc_off;
> >
> > @@ -1110,7 +1112,9 @@ struct journal_s
> >
> > /**
> > * @j_fc_wbuf: Array of fast commit bhs for
> > - * jbd2_journal_commit_transaction.
> > + * jbd2_journal_commit_transaction. During the commit path, this
> > + * variable is not protected by j_state_lock since only one process
> > + * performs commit at a time.
> > */
> > struct buffer_head **j_fc_wbuf;
>
> Here the bh's aren't really used in jbd2_journal_commit_transaction() are
> they? Please fix that when updating the comment. Also I'd find a
> reformulation like I suggested for the comment above more comprehensible.
Ack,

Thanks,
Harshad
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2020-11-04 21:40:12

by harshad shirwadkar

[permalink] [raw]
Subject: Re: [PATCH 06/10] ext4: dedpulicate the code to wait on inode that's being committed

On Tue, Nov 3, 2020 at 8:42 AM Jan Kara <[email protected]> wrote:
>
> On Sat 31-10-20 13:05:14, Harshad Shirwadkar wrote:
> > This patch removes the deduplicates the code that implements waiting
> > on inode that's being committed. That code is moved into a new
> > function.
> >
> > Suggested-by: Jan Kara <[email protected]>
> > Signed-off-by: Harshad Shirwadkar <[email protected]>
>
> Looks good to me. Just one nit below:
>
> > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> > index b1ca55c7d32a..0f2543220d1d 100644
> > --- a/fs/ext4/fast_commit.c
> > +++ b/fs/ext4/fast_commit.c
> > @@ -155,6 +155,28 @@ void ext4_fc_init_inode(struct inode *inode)
> > ei->i_fc_committed_subtid = 0;
> > }
> >
> > +static void ext4_fc_wait_committing_inode(struct inode *inode)
> > +{
> > + wait_queue_head_t *wq;
> > + struct ext4_inode_info *ei = EXT4_I(inode);
> > +
>
> Maybe add lockdep_assert_held(&EXT4_SB(inode->i_sb)->s_fc_lock) here to
> make sure the function is called properly? It's kind of unobvious
> requirement (but hard to avoid)...
Sounds good. I had to add it after the #ifdef and before the
"prepare_to_wait()" call in order to avoid "ISO C90 forbids mixed
declarations and code [-Wd
eclaration-after-statement]" warning.

Thanks,
Harshad
>
> > +#if (BITS_PER_LONG < 64)
> > + DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > + wq = bit_waitqueue(&ei->i_state_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > +#else
> > + DEFINE_WAIT_BIT(wait, &ei->i_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > + wq = bit_waitqueue(&ei->i_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > +#endif
> > + prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> > + spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> > + schedule();
> > + finish_wait(wq, &wait.wq_entry);
> > +}
> > +
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2020-11-05 04:39:54

by harshad shirwadkar

[permalink] [raw]
Subject: Re: [PATCH 03/10] ext4: pass handle to ext4_fc_track_* functions

On Tue, Nov 3, 2020 at 6:46 AM Jan Kara <[email protected]> wrote:
>
> On Sat 31-10-20 13:05:11, Harshad Shirwadkar wrote:
> > Use transaction id found in handle->h_transaction->h_tid for tracking
> > fast commit updates. This patch also restructures ext4_unlink to make
> > handle available inside ext4_unlink for fast commit tracking.
> >
> > Suggested-by: Jan Kara <[email protected]>
> > Signed-off-by: Harshad Shirwadkar <[email protected]>
>
> Thanks for the patch. Couple of comments below:
>
> > @@ -4651,8 +4652,6 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> > FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> > FALLOC_FL_INSERT_RANGE))
> > return -EOPNOTSUPP;
> > - ext4_fc_track_range(inode, offset >> blkbits,
> > - (offset + len - 1) >> blkbits);
>
> Why do you delete the ext4_fc_track_range() call here?
I realized that all the different ext4_fallocate functions such as
punch_hole, zero_range etc have their own track calls. Collapse_range
and insert_range are fc ineligible operations, and the default
fallocate calls ext4_map_blocks(), so this call here isn't really
needed. I also took a look at all the other calls to
ext4_fc_track_range() and I think we only need ext4_fc_track_range()
calls in following situations:

1) ext4_map_blocks()
2) ext4_punch_hole()
3) ext4_zero_range()
4) ext4_setattr() --> for handling truncate

I'll remove all the other callers in the next version.

>
> > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> > index 354f81ff819d..5c3af472287a 100644
> > --- a/fs/ext4/fast_commit.c
> > +++ b/fs/ext4/fast_commit.c
> > @@ -323,15 +323,18 @@ static inline int ext4_fc_is_ineligible(struct super_block *sb)
> > * If enqueue is set, this function enqueues the inode in fast commit list.
> > */
> > static int ext4_fc_track_template(
> > - struct inode *inode, int (*__fc_track_fn)(struct inode *, void *, bool),
> > + handle_t *handle, struct inode *inode,
> > + int (*__fc_track_fn)(struct inode *, void *, bool),
> > void *args, int enqueue)
> > {
> > - tid_t running_txn_tid;
> > bool update = false;
> > struct ext4_inode_info *ei = EXT4_I(inode);
> > struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> > + tid_t tid = 0;
> > int ret;
> >
> > + if (ext4_handle_valid(handle) && handle->h_transaction)
> > + tid = handle->h_transaction->t_tid;
>
> The handle->h_transaction check is pointless here. It is always true. And
> if you move the tid fetching after the JOURNAL_FAST_COMMIT check below, you
> don't need the ext4_handle_valid() check either as fastcommit cannot be
> enabled without a journal.
Ack
>
> > if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
> > (sbi->s_mount_state & EXT4_FC_REPLAY))
> > return -EOPNOTSUPP;
>
>
> > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > index 5159830dacb8..f3f8bf61072e 100644
> > --- a/fs/ext4/namei.c
> > +++ b/fs/ext4/namei.c
> > @@ -2631,7 +2631,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
> > inode_save = inode;
> > ihold(inode_save);
> > err = ext4_add_nondir(handle, dentry, &inode);
> > - ext4_fc_track_create(inode_save, dentry);
> > + ext4_fc_track_create(handle, inode_save, dentry);
> > iput(inode_save);
> > }
> > if (handle)
> > @@ -2668,7 +2668,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
> > ihold(inode_save);
> > err = ext4_add_nondir(handle, dentry, &inode);
> > if (!err)
> > - ext4_fc_track_create(inode_save, dentry);
> > + ext4_fc_track_create(handle, inode_save, dentry);
> > iput(inode_save);
> > }
>
> Not directly related to this patch but why do you bother with 'inode_save'
> in the above cases? I guess you're afraid by the comment that "inode
> reference is consumed by the dentry" but since you have a dentry reference
> as well, you can be sure that the inode stays around...
Yeah, I think I was confused by that. I'll get rid of inode_save stuff
and just use d_inode(dentry) instead.
>
> > if (handle)
> > @@ -2833,7 +2833,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
> > iput(inode);
> > goto out_retry;
> > }
> > - ext4_fc_track_create(inode, dentry);
> > + ext4_fc_track_create(handle, inode, dentry);
> > ext4_inc_count(dir);
>
> And I was also wondering why all the directory tracking functions take both
> dentry and the inode. You can fetch inode from a dentry with d_inode()
> helper so I don't see a reason for passing it separately. That is, in a
> couple of places you call ext4_fc_track_*() before d_instantiate[_new]() so
> dentry isn't fully setup yet but there's nothing which prevents you from
> calling it after d_instantiate().
>
> The only possible exception to this is the ext4_rename() code. There you
> don't have suitable dentry for the link tracking so this would need to
> explicitely pass the inode & dentry. But that place can just call a low
> level wrapper allowing that. All the other places can use a higher level
> function which just takes the dentry.
Yeah, it's the rename that's the problem. Rename calls
ext4_fc_track_link() and ext4_fc_track_unlink(). I'd need to define
two lower level wrappers just for this one function call. But I agree
that it will make the code look much cleaner. I'll do that in V2.
>
> > static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> > {
> > + handle_t *handle;
> > int retval;
> >
> > if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
> > @@ -3282,9 +3273,16 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> > if (retval)
> > goto out_trace;
> >
> > - retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry));
> > + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> > + EXT4_DATA_TRANS_BLOCKS(dir->i_sb));
> > + if (IS_ERR(handle)) {
> > + retval = PTR_ERR(handle);
> > + goto out_trace;
> > + }
> > +
> > + retval = __ext4_unlink(handle, dir, &dentry->d_name, d_inode(dentry));
> > if (!retval)
> > - ext4_fc_track_unlink(d_inode(dentry), dentry);
> > + ext4_fc_track_unlink(handle, d_inode(dentry), dentry);
> > #ifdef CONFIG_UNICODE
> > /* VFS negative dentries are incompatible with Encoding and
> > * Case-insensitiveness. Eventually we'll want avoid
> > @@ -3295,6 +3293,8 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> > if (IS_CASEFOLDED(dir))
> > d_invalidate(dentry);
> > #endif
> > + if (handle)
> > + ext4_journal_stop(handle);
>
> How could 'handle' be NULL here?
Ack

Thanks,
Harshad

>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2020-11-05 10:31:06

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 04/10] ext4: clean up the JBD2 API that initializes fast commits

On Wed 04-11-20 11:52:24, harshad shirwadkar wrote:
> On Tue, Nov 3, 2020 at 8:29 AM Jan Kara <[email protected]> wrote:
> > > -int jbd2_fc_init(journal_t *journal, int num_fc_blks)
> > > +int jbd2_fc_init(journal_t *journal)
> > > {
> > > - journal->j_fc_wbufsize = num_fc_blks;
> > > - journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> > > - sizeof(struct buffer_head *), GFP_KERNEL);
> > > - if (!journal->j_fc_wbuf)
> > > - return -ENOMEM;
> > > + /*
> > > + * Only set j_fc_wbufsize here to indicate that the client file
> > > + * system is interested in using fast commits. The actual number of
> > > + * fast commit blocks is found inside jbd2_superblock and is only
> > > + * valid if j_fc_wbufsize is non-zero. The real value of j_fc_wbufsize
> > > + * gets set in journal_reset().
> > > + */
> > > + journal->j_fc_wbufsize = JBD2_MIN_FC_BLOCKS;
> > > return 0;
> > > }
> >
> > When looking at this, is there a reason why jbd2_fc_init() still exists? I
> > mean why not just make the rule that the journal has FC block number set
> > iff FC gets enabled? Anything else seems a bit confusing to me and also
> > dangerous - imagine we have fs with FC running, we write some FCs and then
> > crash. Then on system recovery we mount with no_fc mount option. We have
> > just lost data on the filesystem AFAIU... So I'd just remove all the mount
> > options related to fastcommits and leave everything to the journal setup
> > (which can be modified with e2fsprogs if needed) to keep things simple.
> The problem is whether or not to use fast commits is the file system's
> call. The JBD2 feature flag will be cleared on a clean unmount and if
> we rely solely on the JBD2 feature flag, fast commit will be turned
> off after a clean unmount. Whereas the FS compat flag is the source of
> truth about whether fast commit needs to be used or not. That's why we
> need an API for the file system to tell JBD2 to still do fast commits.

Yes, I meant the API could be just that the filesystem either calls
jbd2_journal_set_features() with FASTCOMMIT feature or it won't. Similarly
to how e.g. JBD2_FEATURE_INCOMPAT_64BIT is handled. No need for
jbd2_fc_init() function AFAICT.

> Mount options that override the feature flag in Ext4 were mainly meant
> for debugging purposes. So, perhaps there should be a clear warning
> message in the kernel if any of these options are used? Even if we get
> rid of the mount options, we still need the jbd2_fc_init() API for the
> FS to tell JBD2 that it wants to use fast commit. Note that even if
> jbd2_fc_init() is not called, JBD2 will still try to replay fast
> commit blocks.



> > > EXPORT_SYMBOL(jbd2_fc_init);
> > > @@ -1500,7 +1494,7 @@ static void journal_fail_superblock(journal_t *journal)
> > > static int journal_reset(journal_t *journal)
> > > {
> > > journal_superblock_t *sb = journal->j_superblock;
> > > - unsigned long long first, last;
> > > + unsigned long long first, last, num_fc_blocks;
> > >
> > > first = be32_to_cpu(sb->s_first);
> > > last = be32_to_cpu(sb->s_maxlen);
> > > @@ -1513,6 +1507,28 @@ static int journal_reset(journal_t *journal)
> > >
> > > journal->j_first = first;
> > >
> > > + /*
> > > + * At this point, fast commit recovery has finished. Now, we solely
> > > + * rely on the file system to decide whether it wants fast commits
> > > + * or not. File system that wishes to use fast commits must have
> > > + * already called jbd2_fc_init() before we get here.
> > > + */
> > > + if (journal->j_fc_wbufsize > 0)
> > > + jbd2_journal_set_features(journal, 0, 0,
> > > + JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
> > > + else
> > > + jbd2_journal_clear_features(journal, 0, 0,
> > > + JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
> > > +
> > > + /* If valid, prefer the value found in superblock over the default */
> > > + num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks);
> > > + if (num_fc_blocks > 0 && num_fc_blocks < last)
> > > + journal->j_fc_wbufsize = num_fc_blocks;
> > > +
> > > + if (jbd2_has_feature_fast_commit(journal))
> > > + journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> > > + sizeof(struct buffer_head *), GFP_KERNEL);
> > > +
> > > if (jbd2_has_feature_fast_commit(journal) &&
> > > journal->j_fc_wbufsize > 0) {
> > > journal->j_fc_last = last;
> > > @@ -1531,7 +1547,8 @@ static int journal_reset(journal_t *journal)
> > > journal->j_commit_sequence = journal->j_transaction_sequence - 1;
> > > journal->j_commit_request = journal->j_commit_sequence;
> > >
> > > - journal->j_max_transaction_buffers = journal->j_maxlen / 4;
> > > + journal->j_max_transaction_buffers =
> > > + (journal->j_maxlen - journal->j_fc_wbufsize) / 4;
> > >
> > > /*
> > > * As a special case, if the on-disk copy is already marked as needing
> > > @@ -1872,6 +1889,7 @@ static int load_superblock(journal_t *journal)
> > > {
> > > int err;
> > > journal_superblock_t *sb;
> > > + int num_fc_blocks;
> > >
> > > err = journal_get_superblock(journal);
> > > if (err)
> > > @@ -1884,10 +1902,12 @@ static int load_superblock(journal_t *journal)
> > > journal->j_first = be32_to_cpu(sb->s_first);
> > > journal->j_errno = be32_to_cpu(sb->s_errno);
> > >
> > > - if (jbd2_has_feature_fast_commit(journal) &&
> > > - journal->j_fc_wbufsize > 0) {
> > > + if (jbd2_has_feature_fast_commit(journal)) {
> > > journal->j_fc_last = be32_to_cpu(sb->s_maxlen);
> > > - journal->j_last = journal->j_fc_last - journal->j_fc_wbufsize;
> > > + num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks);
> > > + if (!num_fc_blocks || num_fc_blocks >= journal->j_fc_last)
> >
> > I think this needs to be stricter - we need the check that the journal is
> > at least JBD2_MIN_JOURNAL_BLOCKS long (which happens at the beginning of
> > journal_reset()) to happen after we've subtracted fastcommit blocks...
> So are you saying that with FC, the minimum journal size is
> JBD2_MIN_JOURNAL_BLOCKS + JBD2_MIN_FC_BLOCKS? I was assuming that we

Yes. JBD2_MIN_JOURNAL_BLOCKS is minimum number of blocks we need for normal
commits to get reasonable behavior. So as you say with fastcommits enabled,
the minimal journal size is JBD2_MIN_JOURNAL_BLOCKS + JBD2_MIN_FC_BLOCKS.

> will reserve JBD2_MIN_FC_BLOCKS (256) blocks out of the total journal
> size. That way the users who rely on the journal size to be 1024
> blocks, won't see a difference in journal size even after turning FC
> on. But I'm not sure if that's something we should care about.

Well, e2fsprogs need to check journal size when enabling fastcommits so
that we don't get invalid configurations.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-05 12:45:49

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 04/10] ext4: clean up the JBD2 API that initializes fast commits

On Thu 05-11-20 11:30:24, Jan Kara wrote:
> On Wed 04-11-20 11:52:24, harshad shirwadkar wrote:
> > On Tue, Nov 3, 2020 at 8:29 AM Jan Kara <[email protected]> wrote:
> > > > -int jbd2_fc_init(journal_t *journal, int num_fc_blks)
> > > > +int jbd2_fc_init(journal_t *journal)
> > > > {
> > > > - journal->j_fc_wbufsize = num_fc_blks;
> > > > - journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> > > > - sizeof(struct buffer_head *), GFP_KERNEL);
> > > > - if (!journal->j_fc_wbuf)
> > > > - return -ENOMEM;
> > > > + /*
> > > > + * Only set j_fc_wbufsize here to indicate that the client file
> > > > + * system is interested in using fast commits. The actual number of
> > > > + * fast commit blocks is found inside jbd2_superblock and is only
> > > > + * valid if j_fc_wbufsize is non-zero. The real value of j_fc_wbufsize
> > > > + * gets set in journal_reset().
> > > > + */
> > > > + journal->j_fc_wbufsize = JBD2_MIN_FC_BLOCKS;
> > > > return 0;
> > > > }
> > >
> > > When looking at this, is there a reason why jbd2_fc_init() still exists? I
> > > mean why not just make the rule that the journal has FC block number set
> > > iff FC gets enabled? Anything else seems a bit confusing to me and also
> > > dangerous - imagine we have fs with FC running, we write some FCs and then
> > > crash. Then on system recovery we mount with no_fc mount option. We have
> > > just lost data on the filesystem AFAIU... So I'd just remove all the mount
> > > options related to fastcommits and leave everything to the journal setup
> > > (which can be modified with e2fsprogs if needed) to keep things simple.
> > The problem is whether or not to use fast commits is the file system's
> > call. The JBD2 feature flag will be cleared on a clean unmount and if
> > we rely solely on the JBD2 feature flag, fast commit will be turned
> > off after a clean unmount. Whereas the FS compat flag is the source of
> > truth about whether fast commit needs to be used or not. That's why we
> > need an API for the file system to tell JBD2 to still do fast commits.
>
> Yes, I meant the API could be just that the filesystem either calls
> jbd2_journal_set_features() with FASTCOMMIT feature or it won't. Similarly
> to how e.g. JBD2_FEATURE_INCOMPAT_64BIT is handled. No need for
> jbd2_fc_init() function AFAICT.
>
> > Mount options that override the feature flag in Ext4 were mainly meant
> > for debugging purposes. So, perhaps there should be a clear warning
> > message in the kernel if any of these options are used? Even if we get
> > rid of the mount options, we still need the jbd2_fc_init() API for the
> > FS to tell JBD2 that it wants to use fast commit. Note that even if
> > jbd2_fc_init() is not called, JBD2 will still try to replay fast
> > commit blocks.

I forgot to add here: I don't like "debug-only" mount options in production
kernels because users tend to try them out and:
a) occasionally get burnt by unexpected behavior
b) the options become hard to get rid of because someone starts to depend
on them.

So I'd prefer that the options are removed unless they are really essential
for debugging the feature and if they are essential, they should be clearly
marked as debug aid... (e.g. with debug in the name or so).

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-11-06 03:03:33

by harshad shirwadkar

[permalink] [raw]
Subject: Re: [PATCH 04/10] ext4: clean up the JBD2 API that initializes fast commits

Thanks Jan and Amy for the feedback. In V2, I'll drop "no_fc" mount
option which was very confusing. Also, we have a mount option called
"fc_debug_force" that forcefully turns fast commits on even if it was
not enabled by mke2fs. It's handy for running performance tests
without relying on e2fsprogs. But I understand that this option could
also be confusing. There's "debug" in its name and I will also move it
inside #ifdef CONFIG_EXT4_DEBUG so that for production, this gets
compiled out.

On Thu, Nov 5, 2020 at 5:30 AM Amy Parker <[email protected]> wrote:
>
> On Thu, Nov 5, 2020, 4:45 AM Jan Kara <[email protected]> wrote:
>>
>> On Thu 05-11-20 11:30:24, Jan Kara wrote:
>> > On Wed 04-11-20 11:52:24, harshad shirwadkar wrote:
>> > > On Tue, Nov 3, 2020 at 8:29 AM Jan Kara <[email protected]> wrote:
>> > > > > -int jbd2_fc_init(journal_t *journal, int num_fc_blks)
>> > > > > +int jbd2_fc_init(journal_t *journal)
>> > > > > {
>> > > > > - journal->j_fc_wbufsize = num_fc_blks;
>> > > > > - journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
>> > > > > - sizeof(struct buffer_head *), GFP_KERNEL);
>> > > > > - if (!journal->j_fc_wbuf)
>> > > > > - return -ENOMEM;
>> > > > > + /*
>> > > > > + * Only set j_fc_wbufsize here to indicate that the client file
>> > > > > + * system is interested in using fast commits. The actual number of
>> > > > > + * fast commit blocks is found inside jbd2_superblock and is only
>> > > > > + * valid if j_fc_wbufsize is non-zero. The real value of j_fc_wbufsize
>> > > > > + * gets set in journal_reset().
>> > > > > + */
>> > > > > + journal->j_fc_wbufsize = JBD2_MIN_FC_BLOCKS;
>> > > > > return 0;
>> > > > > }
>> > > >
>> > > > When looking at this, is there a reason why jbd2_fc_init() still exists? I
>> > > > mean why not just make the rule that the journal has FC block number set
>> > > > iff FC gets enabled? Anything else seems a bit confusing to me and also
>> > > > dangerous - imagine we have fs with FC running, we write some FCs and then
>> > > > crash. Then on system recovery we mount with no_fc mount option. We have
>> > > > just lost data on the filesystem AFAIU... So I'd just remove all the mount
>> > > > options related to fastcommits and leave everything to the journal setup
>> > > > (which can be modified with e2fsprogs if needed) to keep things simple.
>> > > The problem is whether or not to use fast commits is the file system's
>> > > call. The JBD2 feature flag will be cleared on a clean unmount and if
>> > > we rely solely on the JBD2 feature flag, fast commit will be turned
>> > > off after a clean unmount. Whereas the FS compat flag is the source of
>> > > truth about whether fast commit needs to be used or not. That's why we
>> > > need an API for the file system to tell JBD2 to still do fast commits.
>> >
>> > Yes, I meant the API could be just that the filesystem either calls
>> > jbd2_journal_set_features() with FASTCOMMIT feature or it won't. Similarly
>> > to how e.g. JBD2_FEATURE_INCOMPAT_64BIT is handled. No need for
>> > jbd2_fc_init() function AFAICT.
Sounds good, I'll drop it.

Thanks,
Harshad
>> >
>> > > Mount options that override the feature flag in Ext4 were mainly meant
>> > > for debugging purposes. So, perhaps there should be a clear warning
>> > > message in the kernel if any of these options are used? Even if we get
>> > > rid of the mount options, we still need the jbd2_fc_init() API for the
>> > > FS to tell JBD2 that it wants to use fast commit. Note that even if
>> > > jbd2_fc_init() is not called, JBD2 will still try to replay fast
>> > > commit blocks.
>>
>> I forgot to add here: I don't like "debug-only" mount options in production
>> kernels because users tend to try them out and:
>> a) occasionally get burnt by unexpected behavior
>
>
> Seen that happen enough times myself, there's a
> reason I always leave notes to users of systems I
> set up to never turn on debug-only mode in day to
>
>> b) the options become hard to get rid of because someone starts to depend
>> on them.
>
>
> This occurs not just with kernel things but with all
> software in general. Leaving options in long-term
> that then need to be removed gets problematic
> pretty quickly.
>
>>
>> So I'd prefer that the options are removed unless they are really essential
>> for debugging the feature
>
>
> Yeah - remove them if we can, if they aren't essential
> then no need to keep them.
>
>> and if they are essential, they should be clearly
>> marked as debug aid... (e.g. with debug in the name or so).
>
>
> At *least* that if not more.
>
>>
>> Honza
>>
>> --
>> Jan Kara <[email protected]>
>> SUSE Labs, CR
>
>
> Best regards,
> Amy Parker
> (they/them)

2020-11-06 03:08:42

by harshad shirwadkar

[permalink] [raw]
Subject: Re: [PATCH 07/10] ext4: misc fast commit fixes

On Tue, Nov 3, 2020 at 8:52 AM Jan Kara <[email protected]> wrote:
>
> On Sat 31-10-20 13:05:15, Harshad Shirwadkar wrote:
> > This patch adds a small number of misc fast commit fixes. Along with
> > functional fixes such as setting the right buffer flags, there also
> > typo fixes and comment additions.
> >
> > Suggested-by: Jan Kara <[email protected]>
> > Signed-off-by: Harshad Shirwadkar <[email protected]>
>
> Again, please don't merge logically separate fixes into one commit.
Sure, I'll break this up.
>
> > diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> > index 00dc668e052b..10855cd230c7 100644
> > --- a/fs/ext4/ext4_jbd2.h
> > +++ b/fs/ext4/ext4_jbd2.h
> > @@ -422,9 +422,13 @@ static inline int ext4_journal_force_commit(journal_t *journal)
> > static inline int ext4_jbd2_inode_add_write(handle_t *handle,
> > struct inode *inode, loff_t start_byte, loff_t length)
> > {
> > - if (ext4_handle_valid(handle))
> > + if (ext4_handle_valid(handle)) {
> > + ext4_fc_track_range(handle, inode,
> > + start_byte >> inode->i_sb->s_blocksize_bits,
> > + (start_byte + length) >> inode->i_sb->s_blocksize_bits);
> > return jbd2_journal_inode_ranged_write(handle,
> > EXT4_I(inode)->jinode, start_byte, length);
> > + }
>
> Why this change? A good changelog would tell me... I'm suspicious here
> because ext4_jbd2_inode_add_write() gets called only in data=ordered mode
> but fastcommit can run also in data=writeback mode...
>
> I suppose this is for the mmap coverage we were speaking about. Now that
> I'm speaking about it again maybe the ext4_fc_track_range() call in
> ext4_map_blocks() is actually enough? I mean once we allocate blocks for a
> range (either from page fault, write, or writeback of delalloc), they will
> become properly tracked in ext4_map_blocks() and that's all we need? But
> then I'm missing why we have so many ext4_fc_track_range() calls around the
> code... Can you please explain?
You are right. I was trying to handle the mmap case as we discussed on
the previous patch set. But as I mentioned in one of the other patches
in this series, I scanned all the callers of ext4_fc_track_range() and
realized that there were a few redundant callers of this function.
Ultimately, this function needs to get called only when blocks are
added or removed from an inode. So the places where this call remains
are - fallocate ops, ext4_map_blocks, truncate.

Thanks,
Harshad
>
> Honza
>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR