2022-01-13 07:46:52

by Ritesh Harjani

Subject: [PATCH 0/6] ext4/jbd2: inline_data fixes and some cleanups

Hello,

Patch[1]: fixes a BUG_ON with inline_data which was reported [1] with generic/475.

Patch[2]: is mostly a cleanup found during code review of the inline_data code.

Patch[3]: fixes a possible memory corruption in case of krealloc() failure.

Patch[4-5]: Cleanups.

Patch[6]: needs careful review, as it gets rid of the t_handle_lock spinlock
in jbd2_journal_wait_updates(). From the code review it does not appear to be
required, but let me know if I missed anything here.

[1]: https://lore.kernel.org/linux-ext4/[email protected]/

Ritesh Harjani (6):
ext4: Fix error handling in ext4_restore_inline_data()
ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin()
ext4: Fix error handling in ext4_fc_record_modified_inode()
jbd2: Cleanup unused functions declarations from jbd2.h
jbd2: Refactor wait logic for transaction updates into a common function
jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates

fs/ext4/fast_commit.c | 64 ++++++++++++++++++++-----------------------
fs/ext4/inline.c | 23 +++++++++-------
fs/jbd2/commit.c | 19 ++-----------
fs/jbd2/transaction.c | 24 ++--------------
include/linux/jbd2.h | 34 +++++++++++++++++------
5 files changed, 74 insertions(+), 90 deletions(-)

--
2.31.1


2022-01-13 07:46:56

by Ritesh Harjani

Subject: [PATCH 2/6] ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin()

ext4_prepare_inline_data() already checks ext4_get_max_inline_size()
and returns -ENOSPC if the data does not fit, so there is no need to check it
twice within ext4_da_write_inline_data_begin(). This patch removes the extra
check.

It also makes the code cleaner.

No functionality change in this patch.

Signed-off-by: Ritesh Harjani <[email protected]>
---
fs/ext4/inline.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 31741e8a462e..c52b0037983d 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -913,7 +913,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
struct page **pagep,
void **fsdata)
{
- int ret, inline_size;
+ int ret;
handle_t *handle;
struct page *page;
struct ext4_iloc iloc;
@@ -930,14 +930,9 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
goto out;
}

- inline_size = ext4_get_max_inline_size(inode);
-
- ret = -ENOSPC;
- if (inline_size >= pos + len) {
- ret = ext4_prepare_inline_data(handle, inode, pos + len);
- if (ret && ret != -ENOSPC)
- goto out_journal;
- }
+ ret = ext4_prepare_inline_data(handle, inode, pos + len);
+ if (ret && ret != -ENOSPC)
+ goto out_journal;

/*
* We cannot recurse into the filesystem as the transaction
--
2.31.1

2022-01-13 07:46:58

by Ritesh Harjani

Subject: [PATCH 4/6] jbd2: Cleanup unused functions declarations from jbd2.h

During code review, no references were found for a few of the function
declarations below. This patch removes them from jbd2.h.

Signed-off-by: Ritesh Harjani <[email protected]>
---
include/linux/jbd2.h | 7 -------
1 file changed, 7 deletions(-)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index fd933c45281a..f76598265896 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1419,9 +1419,7 @@ extern void jbd2_journal_unfile_buffer(journal_t *, struct journal_head *);
extern bool __jbd2_journal_refile_buffer(struct journal_head *);
extern void jbd2_journal_refile_buffer(journal_t *, struct journal_head *);
extern void __jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
-extern void __journal_free_buffer(struct journal_head *bh);
extern void jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
-extern void __journal_clean_data_list(transaction_t *transaction);
static inline void jbd2_file_log_bh(struct list_head *head, struct buffer_head *bh)
{
list_add_tail(&bh->b_assoc_buffers, head);
@@ -1486,9 +1484,6 @@ extern int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
struct buffer_head **bh_out,
sector_t blocknr);

-/* Transaction locking */
-extern void __wait_on_journal (journal_t *);
-
/* Transaction cache support */
extern void jbd2_journal_destroy_transaction_cache(void);
extern int __init jbd2_journal_init_transaction_cache(void);
@@ -1774,8 +1769,6 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
#define BJ_Reserved 4 /* Buffer is reserved for access by journal */
#define BJ_Types 5

-extern int jbd_blocks_per_page(struct inode *inode);
-
/* JBD uses a CRC32 checksum */
#define JBD_MAX_CHECKSUM_SIZE 4

--
2.31.1

2022-01-13 07:47:04

by Ritesh Harjani

Subject: [PATCH 3/6] ext4: Fix error handling in ext4_fc_record_modified_inode()

The current code does not fully take care of the krealloc() error case,
which could lead to silent memory corruption or a kernel bug.
This patch fixes that.

It also cleans up some duplicated error handling logic from various
functions in fast_commit.c.

Reported-by: luo penghao <[email protected]>
Suggested-by: Lukas Czerner <[email protected]>
Signed-off-by: Ritesh Harjani <[email protected]>
---
fs/ext4/fast_commit.c | 64 ++++++++++++++++++++-----------------------
1 file changed, 29 insertions(+), 35 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 5ae8026a0c56..4541c8468c01 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1392,14 +1392,15 @@ static int ext4_fc_record_modified_inode(struct super_block *sb, int ino)
if (state->fc_modified_inodes[i] == ino)
return 0;
if (state->fc_modified_inodes_used == state->fc_modified_inodes_size) {
- state->fc_modified_inodes_size +=
- EXT4_FC_REPLAY_REALLOC_INCREMENT;
state->fc_modified_inodes = krealloc(
- state->fc_modified_inodes, sizeof(int) *
- state->fc_modified_inodes_size,
- GFP_KERNEL);
+ state->fc_modified_inodes,
+ sizeof(int) * (state->fc_modified_inodes_size +
+ EXT4_FC_REPLAY_REALLOC_INCREMENT),
+ GFP_KERNEL);
if (!state->fc_modified_inodes)
return -ENOMEM;
+ state->fc_modified_inodes_size +=
+ EXT4_FC_REPLAY_REALLOC_INCREMENT;
}
state->fc_modified_inodes[state->fc_modified_inodes_used++] = ino;
return 0;
@@ -1431,7 +1432,9 @@ static int ext4_fc_replay_inode(struct super_block *sb, struct ext4_fc_tl *tl,
}
inode = NULL;

- ext4_fc_record_modified_inode(sb, ino);
+ ret = ext4_fc_record_modified_inode(sb, ino);
+ if (ret)
+ goto out;

raw_fc_inode = (struct ext4_inode *)
(val + offsetof(struct ext4_fc_inode, fc_raw_inode));
@@ -1621,6 +1624,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
}

ret = ext4_fc_record_modified_inode(sb, inode->i_ino);
+ if (ret)
+ goto out;

start = le32_to_cpu(ex->ee_block);
start_pblk = ext4_ext_pblock(ex);
@@ -1638,18 +1643,14 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
map.m_pblk = 0;
ret = ext4_map_blocks(NULL, inode, &map, 0);

- if (ret < 0) {
- iput(inode);
- return 0;
- }
+ if (ret < 0)
+ goto out;

if (ret == 0) {
/* Range is not mapped */
path = ext4_find_extent(inode, cur, NULL, 0);
- if (IS_ERR(path)) {
- iput(inode);
- return 0;
- }
+ if (IS_ERR(path))
+ goto out;
memset(&newex, 0, sizeof(newex));
newex.ee_block = cpu_to_le32(cur);
ext4_ext_store_pblock(
@@ -1663,10 +1664,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
up_write((&EXT4_I(inode)->i_data_sem));
ext4_ext_drop_refs(path);
kfree(path);
- if (ret) {
- iput(inode);
- return 0;
- }
+ if (ret)
+ goto out;
goto next;
}

@@ -1679,10 +1678,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
ret = ext4_ext_replay_update_ex(inode, cur, map.m_len,
ext4_ext_is_unwritten(ex),
start_pblk + cur - start);
- if (ret) {
- iput(inode);
- return 0;
- }
+ if (ret)
+ goto out;
/*
* Mark the old blocks as free since they aren't used
* anymore. We maintain an array of all the modified
@@ -1702,10 +1699,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
ext4_ext_is_unwritten(ex), map.m_pblk);
ret = ext4_ext_replay_update_ex(inode, cur, map.m_len,
ext4_ext_is_unwritten(ex), map.m_pblk);
- if (ret) {
- iput(inode);
- return 0;
- }
+ if (ret)
+ goto out;
/*
* We may have split the extent tree while toggling the state.
* Try to shrink the extent tree now.
@@ -1717,6 +1712,7 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
}
ext4_ext_replay_shrink_inode(inode, i_size_read(inode) >>
sb->s_blocksize_bits);
+out:
iput(inode);
return 0;
}
@@ -1746,6 +1742,8 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
}

ret = ext4_fc_record_modified_inode(sb, inode->i_ino);
+ if (ret)
+ goto out;

jbd_debug(1, "DEL_RANGE, inode %ld, lblk %d, len %d\n",
inode->i_ino, le32_to_cpu(lrange.fc_lblk),
@@ -1755,10 +1753,8 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
map.m_len = remaining;

ret = ext4_map_blocks(NULL, inode, &map, 0);
- if (ret < 0) {
- iput(inode);
- return 0;
- }
+ if (ret < 0)
+ goto out;
if (ret > 0) {
remaining -= ret;
cur += ret;
@@ -1773,15 +1769,13 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
ret = ext4_ext_remove_space(inode, lrange.fc_lblk,
lrange.fc_lblk + lrange.fc_len - 1);
up_write(&EXT4_I(inode)->i_data_sem);
- if (ret) {
- iput(inode);
- return 0;
- }
+ if (ret)
+ goto out;
ext4_ext_replay_shrink_inode(inode,
i_size_read(inode) >> sb->s_blocksize_bits);
ext4_mark_inode_dirty(NULL, inode);
+out:
iput(inode);
-
return 0;
}

--
2.31.1

2022-01-13 07:48:13

by Ritesh Harjani

Subject: [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function

No functional change in this patch. It only refactors the common piece of
code which waits for t_updates to finish into a new helper,
jbd2_journal_wait_updates(journal_t *).

Signed-off-by: Ritesh Harjani <[email protected]>
---
fs/jbd2/commit.c | 19 +++----------------
fs/jbd2/transaction.c | 24 +++---------------------
include/linux/jbd2.h | 31 ++++++++++++++++++++++++++++++-
3 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 3cc4ab2ba7f4..428364f107be 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -484,22 +484,9 @@ void jbd2_journal_commit_transaction(journal_t *journal)
stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
stats.run.rs_locked);

- spin_lock(&commit_transaction->t_handle_lock);
- while (atomic_read(&commit_transaction->t_updates)) {
- DEFINE_WAIT(wait);
+ // waits for any t_updates to finish
+ jbd2_journal_wait_updates(journal);

- prepare_to_wait(&journal->j_wait_updates, &wait,
- TASK_UNINTERRUPTIBLE);
- if (atomic_read(&commit_transaction->t_updates)) {
- spin_unlock(&commit_transaction->t_handle_lock);
- write_unlock(&journal->j_state_lock);
- schedule();
- write_lock(&journal->j_state_lock);
- spin_lock(&commit_transaction->t_handle_lock);
- }
- finish_wait(&journal->j_wait_updates, &wait);
- }
- spin_unlock(&commit_transaction->t_handle_lock);
commit_transaction->t_state = T_SWITCH;
write_unlock(&journal->j_state_lock);

@@ -817,7 +804,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
commit_transaction->t_state = T_COMMIT_DFLUSH;
write_unlock(&journal->j_state_lock);

- /*
+ /*
* If the journal is not located on the file system device,
* then we must flush the file system device before we issue
* the commit record
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 6a3caedd2285..89a955ab1557 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -449,7 +449,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
}

/* OK, account for the buffers that this operation expects to
- * use and add the handle to the running transaction.
+ * use and add the handle to the running transaction.
*/
update_t_max_wait(transaction, ts);
handle->h_transaction = transaction;
@@ -863,27 +863,9 @@ void jbd2_journal_lock_updates(journal_t *journal)
write_lock(&journal->j_state_lock);
}

- /* Wait until there are no running updates */
- while (1) {
- transaction_t *transaction = journal->j_running_transaction;
+ /* Wait until there are no running t_updates */
+ jbd2_journal_wait_updates(journal);

- if (!transaction)
- break;
-
- spin_lock(&transaction->t_handle_lock);
- prepare_to_wait(&journal->j_wait_updates, &wait,
- TASK_UNINTERRUPTIBLE);
- if (!atomic_read(&transaction->t_updates)) {
- spin_unlock(&transaction->t_handle_lock);
- finish_wait(&journal->j_wait_updates, &wait);
- break;
- }
- spin_unlock(&transaction->t_handle_lock);
- write_unlock(&journal->j_state_lock);
- schedule();
- finish_wait(&journal->j_wait_updates, &wait);
- write_lock(&journal->j_state_lock);
- }
write_unlock(&journal->j_state_lock);

/*
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f76598265896..34b051aa9009 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -594,7 +594,7 @@ struct transaction_s
*/
unsigned long t_log_start;

- /*
+ /*
* Number of buffers on the t_buffers list [j_list_lock, no locks
* needed for jbd2 thread]
*/
@@ -1757,6 +1757,35 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
return max_t(long, free, 0);
}

+/*
+ * Waits for any outstanding t_updates to finish.
+ * This is called with write j_state_lock held.
+ */
+static inline void jbd2_journal_wait_updates(journal_t *journal)
+{
+ transaction_t *commit_transaction = journal->j_running_transaction;
+
+ if (!commit_transaction)
+ return;
+
+ spin_lock(&commit_transaction->t_handle_lock);
+ while (atomic_read(&commit_transaction->t_updates)) {
+ DEFINE_WAIT(wait);
+
+ prepare_to_wait(&journal->j_wait_updates, &wait,
+ TASK_UNINTERRUPTIBLE);
+ if (atomic_read(&commit_transaction->t_updates)) {
+ spin_unlock(&commit_transaction->t_handle_lock);
+ write_unlock(&journal->j_state_lock);
+ schedule();
+ write_lock(&journal->j_state_lock);
+ spin_lock(&commit_transaction->t_handle_lock);
+ }
+ finish_wait(&journal->j_wait_updates, &wait);
+ }
+ spin_unlock(&commit_transaction->t_handle_lock);
+}
+
/*
* Definitions which augment the buffer_head layer
*/
--
2.31.1

2022-01-13 07:51:09

by Ritesh Harjani

Subject: [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates

jbd2_journal_wait_updates() waits on the j_wait_updates waitqueue for the
t_updates atomic_t to drop to zero, so from code review it looks like we
don't need the t_handle_lock spinlock just for checking the t_updates value.
Hence this patch gets rid of the spinlock protection in
jbd2_journal_wait_updates().

Signed-off-by: Ritesh Harjani <[email protected]>
---
include/linux/jbd2.h | 4 ----
1 file changed, 4 deletions(-)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 34b051aa9009..9bef47622b9d 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1768,22 +1768,18 @@ static inline void jbd2_journal_wait_updates(journal_t *journal)
if (!commit_transaction)
return;

- spin_lock(&commit_transaction->t_handle_lock);
while (atomic_read(&commit_transaction->t_updates)) {
DEFINE_WAIT(wait);

prepare_to_wait(&journal->j_wait_updates, &wait,
TASK_UNINTERRUPTIBLE);
if (atomic_read(&commit_transaction->t_updates)) {
- spin_unlock(&commit_transaction->t_handle_lock);
write_unlock(&journal->j_state_lock);
schedule();
write_lock(&journal->j_state_lock);
- spin_lock(&commit_transaction->t_handle_lock);
}
finish_wait(&journal->j_wait_updates, &wait);
}
- spin_unlock(&commit_transaction->t_handle_lock);
}

/*
--
2.31.1

2022-01-13 17:04:02

by Jan Kara

Subject: Re: [PATCH 2/6] ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin()

On Thu 13-01-22 08:56:25, Ritesh Harjani wrote:
> ext4_prepare_inline_data() already checks ext4_get_max_inline_size()
> and returns -ENOSPC if the data does not fit, so there is no need to check it
> twice within ext4_da_write_inline_data_begin(). This patch removes the extra
> check.
>
> It also makes the code cleaner.
>
> No functionality change in this patch.
>
> Signed-off-by: Ritesh Harjani <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inline.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index 31741e8a462e..c52b0037983d 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -913,7 +913,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
> struct page **pagep,
> void **fsdata)
> {
> - int ret, inline_size;
> + int ret;
> handle_t *handle;
> struct page *page;
> struct ext4_iloc iloc;
> @@ -930,14 +930,9 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
> goto out;
> }
>
> - inline_size = ext4_get_max_inline_size(inode);
> -
> - ret = -ENOSPC;
> - if (inline_size >= pos + len) {
> - ret = ext4_prepare_inline_data(handle, inode, pos + len);
> - if (ret && ret != -ENOSPC)
> - goto out_journal;
> - }
> + ret = ext4_prepare_inline_data(handle, inode, pos + len);
> + if (ret && ret != -ENOSPC)
> + goto out_journal;
>
> /*
> * We cannot recurse into the filesystem as the transaction
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-01-13 17:04:02

by Jan Kara

Subject: Re: [PATCH 3/6] ext4: Fix error handling in ext4_fc_record_modified_inode()

On Thu 13-01-22 08:56:26, Ritesh Harjani wrote:
> The current code does not fully take care of the krealloc() error case,
> which could lead to silent memory corruption or a kernel bug.
> This patch fixes that.
>
> It also cleans up some duplicated error handling logic from various
> functions in fast_commit.c.
>
> Reported-by: luo penghao <[email protected]>
> Suggested-by: Lukas Czerner <[email protected]>
> Signed-off-by: Ritesh Harjani <[email protected]>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/fast_commit.c | 64 ++++++++++++++++++++-----------------------
> 1 file changed, 29 insertions(+), 35 deletions(-)
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 5ae8026a0c56..4541c8468c01 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -1392,14 +1392,15 @@ static int ext4_fc_record_modified_inode(struct super_block *sb, int ino)
> if (state->fc_modified_inodes[i] == ino)
> return 0;
> if (state->fc_modified_inodes_used == state->fc_modified_inodes_size) {
> - state->fc_modified_inodes_size +=
> - EXT4_FC_REPLAY_REALLOC_INCREMENT;
> state->fc_modified_inodes = krealloc(
> - state->fc_modified_inodes, sizeof(int) *
> - state->fc_modified_inodes_size,
> - GFP_KERNEL);
> + state->fc_modified_inodes,
> + sizeof(int) * (state->fc_modified_inodes_size +
> + EXT4_FC_REPLAY_REALLOC_INCREMENT),
> + GFP_KERNEL);
> if (!state->fc_modified_inodes)
> return -ENOMEM;
> + state->fc_modified_inodes_size +=
> + EXT4_FC_REPLAY_REALLOC_INCREMENT;
> }
> state->fc_modified_inodes[state->fc_modified_inodes_used++] = ino;
> return 0;
> @@ -1431,7 +1432,9 @@ static int ext4_fc_replay_inode(struct super_block *sb, struct ext4_fc_tl *tl,
> }
> inode = NULL;
>
> - ext4_fc_record_modified_inode(sb, ino);
> + ret = ext4_fc_record_modified_inode(sb, ino);
> + if (ret)
> + goto out;
>
> raw_fc_inode = (struct ext4_inode *)
> (val + offsetof(struct ext4_fc_inode, fc_raw_inode));
> @@ -1621,6 +1624,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
> }
>
> ret = ext4_fc_record_modified_inode(sb, inode->i_ino);
> + if (ret)
> + goto out;
>
> start = le32_to_cpu(ex->ee_block);
> start_pblk = ext4_ext_pblock(ex);
> @@ -1638,18 +1643,14 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
> map.m_pblk = 0;
> ret = ext4_map_blocks(NULL, inode, &map, 0);
>
> - if (ret < 0) {
> - iput(inode);
> - return 0;
> - }
> + if (ret < 0)
> + goto out;
>
> if (ret == 0) {
> /* Range is not mapped */
> path = ext4_find_extent(inode, cur, NULL, 0);
> - if (IS_ERR(path)) {
> - iput(inode);
> - return 0;
> - }
> + if (IS_ERR(path))
> + goto out;
> memset(&newex, 0, sizeof(newex));
> newex.ee_block = cpu_to_le32(cur);
> ext4_ext_store_pblock(
> @@ -1663,10 +1664,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
> up_write((&EXT4_I(inode)->i_data_sem));
> ext4_ext_drop_refs(path);
> kfree(path);
> - if (ret) {
> - iput(inode);
> - return 0;
> - }
> + if (ret)
> + goto out;
> goto next;
> }
>
> @@ -1679,10 +1678,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
> ret = ext4_ext_replay_update_ex(inode, cur, map.m_len,
> ext4_ext_is_unwritten(ex),
> start_pblk + cur - start);
> - if (ret) {
> - iput(inode);
> - return 0;
> - }
> + if (ret)
> + goto out;
> /*
> * Mark the old blocks as free since they aren't used
> * anymore. We maintain an array of all the modified
> @@ -1702,10 +1699,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
> ext4_ext_is_unwritten(ex), map.m_pblk);
> ret = ext4_ext_replay_update_ex(inode, cur, map.m_len,
> ext4_ext_is_unwritten(ex), map.m_pblk);
> - if (ret) {
> - iput(inode);
> - return 0;
> - }
> + if (ret)
> + goto out;
> /*
> * We may have split the extent tree while toggling the state.
> * Try to shrink the extent tree now.
> @@ -1717,6 +1712,7 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
> }
> ext4_ext_replay_shrink_inode(inode, i_size_read(inode) >>
> sb->s_blocksize_bits);
> +out:
> iput(inode);
> return 0;
> }
> @@ -1746,6 +1742,8 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
> }
>
> ret = ext4_fc_record_modified_inode(sb, inode->i_ino);
> + if (ret)
> + goto out;
>
> jbd_debug(1, "DEL_RANGE, inode %ld, lblk %d, len %d\n",
> inode->i_ino, le32_to_cpu(lrange.fc_lblk),
> @@ -1755,10 +1753,8 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
> map.m_len = remaining;
>
> ret = ext4_map_blocks(NULL, inode, &map, 0);
> - if (ret < 0) {
> - iput(inode);
> - return 0;
> - }
> + if (ret < 0)
> + goto out;
> if (ret > 0) {
> remaining -= ret;
> cur += ret;
> @@ -1773,15 +1769,13 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
> ret = ext4_ext_remove_space(inode, lrange.fc_lblk,
> lrange.fc_lblk + lrange.fc_len - 1);
> up_write(&EXT4_I(inode)->i_data_sem);
> - if (ret) {
> - iput(inode);
> - return 0;
> - }
> + if (ret)
> + goto out;
> ext4_ext_replay_shrink_inode(inode,
> i_size_read(inode) >> sb->s_blocksize_bits);
> ext4_mark_inode_dirty(NULL, inode);
> +out:
> iput(inode);
> -
> return 0;
> }
>
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-01-13 17:04:19

by Jan Kara

Subject: Re: [PATCH 4/6] jbd2: Cleanup unused functions declarations from jbd2.h

On Thu 13-01-22 08:56:27, Ritesh Harjani wrote:
> During code review, no references were found for a few of the function
> declarations below. This patch removes them from jbd2.h.
>
> Signed-off-by: Ritesh Harjani <[email protected]>

Spring cleaning :). Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> include/linux/jbd2.h | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index fd933c45281a..f76598265896 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1419,9 +1419,7 @@ extern void jbd2_journal_unfile_buffer(journal_t *, struct journal_head *);
> extern bool __jbd2_journal_refile_buffer(struct journal_head *);
> extern void jbd2_journal_refile_buffer(journal_t *, struct journal_head *);
> extern void __jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
> -extern void __journal_free_buffer(struct journal_head *bh);
> extern void jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
> -extern void __journal_clean_data_list(transaction_t *transaction);
> static inline void jbd2_file_log_bh(struct list_head *head, struct buffer_head *bh)
> {
> list_add_tail(&bh->b_assoc_buffers, head);
> @@ -1486,9 +1484,6 @@ extern int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
> struct buffer_head **bh_out,
> sector_t blocknr);
>
> -/* Transaction locking */
> -extern void __wait_on_journal (journal_t *);
> -
> /* Transaction cache support */
> extern void jbd2_journal_destroy_transaction_cache(void);
> extern int __init jbd2_journal_init_transaction_cache(void);
> @@ -1774,8 +1769,6 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
> #define BJ_Reserved 4 /* Buffer is reserved for access by journal */
> #define BJ_Types 5
>
> -extern int jbd_blocks_per_page(struct inode *inode);
> -
> /* JBD uses a CRC32 checksum */
> #define JBD_MAX_CHECKSUM_SIZE 4
>
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-01-13 17:16:21

by Jan Kara

Subject: Re: [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function

On Thu 13-01-22 08:56:28, Ritesh Harjani wrote:
> No functional change in this patch. It only refactors the common piece of
> code which waits for t_updates to finish into a new helper,
> jbd2_journal_wait_updates(journal_t *).
>
> Signed-off-by: Ritesh Harjani <[email protected]>

Just one nit below; otherwise feel free to add:

Reviewed-by: Jan Kara <[email protected]>

> @@ -1757,6 +1757,35 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
> return max_t(long, free, 0);
> }
>
> +/*
> + * Waits for any outstanding t_updates to finish.
> + * This is called with write j_state_lock held.
> + */
> +static inline void jbd2_journal_wait_updates(journal_t *journal)
> +{
> + transaction_t *commit_transaction = journal->j_running_transaction;
> +
> + if (!commit_transaction)
> + return;
> +
> + spin_lock(&commit_transaction->t_handle_lock);
> + while (atomic_read(&commit_transaction->t_updates)) {
> + DEFINE_WAIT(wait);
> +
> + prepare_to_wait(&journal->j_wait_updates, &wait,
> + TASK_UNINTERRUPTIBLE);
> + if (atomic_read(&commit_transaction->t_updates)) {
> + spin_unlock(&commit_transaction->t_handle_lock);
> + write_unlock(&journal->j_state_lock);
> + schedule();
> + write_lock(&journal->j_state_lock);
> + spin_lock(&commit_transaction->t_handle_lock);
> + }
> + finish_wait(&journal->j_wait_updates, &wait);
> + }
> + spin_unlock(&commit_transaction->t_handle_lock);
> +}
> +

I don't think making this inline makes sense. Neither the commit code nor
jbd2_journal_lock_updates() is so hot that it would warrant such a large
inline function...

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-01-13 17:16:21

by Jan Kara

Subject: Re: [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates

On Thu 13-01-22 08:56:29, Ritesh Harjani wrote:
> jbd2_journal_wait_updates() waits on the j_wait_updates waitqueue for the
> t_updates atomic_t to drop to zero, so from code review it looks like we
> don't need the t_handle_lock spinlock just for checking the t_updates value.
> Hence this patch gets rid of the spinlock protection in
> jbd2_journal_wait_updates().
>
> Signed-off-by: Ritesh Harjani <[email protected]>

This patch looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Actually looking at it, t_handle_lock seems to be very much unused. I agree
we don't need it when waiting for outstanding handles but the only
remaining uses are:

1) jbd2_journal_extend() where it is not needed either - we use
atomic_add_return() to manipulate t_outstanding_credits and hold
j_state_lock for reading which provides us enough exclusion.

2) update_t_max_wait() - this is the only valid use of t_handle_lock but we
can just switch it to cmpxchg loop with a bit of care. Something like:

unsigned long old;

ts = jbd2_time_diff(ts, transaction->t_start);
old = transaction->t_max_wait;
while (old < ts)
old = cmpxchg(&transaction->t_max_wait, old, ts);

So perhaps you can add two more patches to remove other t_handle_lock uses
and drop it completely.

Honza

> ---
> include/linux/jbd2.h | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 34b051aa9009..9bef47622b9d 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1768,22 +1768,18 @@ static inline void jbd2_journal_wait_updates(journal_t *journal)
> if (!commit_transaction)
> return;
>
> - spin_lock(&commit_transaction->t_handle_lock);
> while (atomic_read(&commit_transaction->t_updates)) {
> DEFINE_WAIT(wait);
>
> prepare_to_wait(&journal->j_wait_updates, &wait,
> TASK_UNINTERRUPTIBLE);
> if (atomic_read(&commit_transaction->t_updates)) {
> - spin_unlock(&commit_transaction->t_handle_lock);
> write_unlock(&journal->j_state_lock);
> schedule();
> write_lock(&journal->j_state_lock);
> - spin_lock(&commit_transaction->t_handle_lock);
> }
> finish_wait(&journal->j_wait_updates, &wait);
> }
> - spin_unlock(&commit_transaction->t_handle_lock);
> }
>
> /*
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-01-13 17:18:15

by Ritesh Harjani

Subject: Re: [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function

On 22/01/13 12:30PM, Jan Kara wrote:
> On Thu 13-01-22 08:56:28, Ritesh Harjani wrote:
> > No functional change in this patch. It only refactors the common piece of
> > code which waits for t_updates to finish into a new helper,
> > jbd2_journal_wait_updates(journal_t *).
> >
> > Signed-off-by: Ritesh Harjani <[email protected]>
>
> Just one nit below; otherwise feel free to add:
>
> Reviewed-by: Jan Kara <[email protected]>
>
> > @@ -1757,6 +1757,35 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
> > return max_t(long, free, 0);
> > }
> >
> > +/*
> > + * Waits for any outstanding t_updates to finish.
> > + * This is called with write j_state_lock held.
> > + */
> > +static inline void jbd2_journal_wait_updates(journal_t *journal)
> > +{
> > + transaction_t *commit_transaction = journal->j_running_transaction;
> > +
> > + if (!commit_transaction)
> > + return;
> > +
> > + spin_lock(&commit_transaction->t_handle_lock);
> > + while (atomic_read(&commit_transaction->t_updates)) {
> > + DEFINE_WAIT(wait);
> > +
> > + prepare_to_wait(&journal->j_wait_updates, &wait,
> > + TASK_UNINTERRUPTIBLE);
> > + if (atomic_read(&commit_transaction->t_updates)) {
> > + spin_unlock(&commit_transaction->t_handle_lock);
> > + write_unlock(&journal->j_state_lock);
> > + schedule();
> > + write_lock(&journal->j_state_lock);
> > + spin_lock(&commit_transaction->t_handle_lock);
> > + }
> > + finish_wait(&journal->j_wait_updates, &wait);
> > + }
> > + spin_unlock(&commit_transaction->t_handle_lock);
> > +}
> > +
>
> I don't think making this inline makes sense. Neither the commit code nor
> jbd2_journal_lock_updates() is so hot that it would warrant such a large
> inline function...

Yes, makes sense. Thanks for the review.
Will do the needful in v2.

-ritesh

>
> Honza
>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2022-01-13 20:20:33

by Ritesh Harjani

Subject: Re: [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates

On 22/01/13 12:27PM, Jan Kara wrote:
> On Thu 13-01-22 08:56:29, Ritesh Harjani wrote:
> > jbd2_journal_wait_updates() waits on the j_wait_updates waitqueue for the
> > t_updates atomic_t to drop to zero, so from code review it looks like we
> > don't need the t_handle_lock spinlock just for checking the t_updates value.
> > Hence this patch gets rid of the spinlock protection in
> > jbd2_journal_wait_updates().
> >
> > Signed-off-by: Ritesh Harjani <[email protected]>
>
> This patch looks good. Feel free to add:
>
> Reviewed-by: Jan Kara <[email protected]>
>
> Actually looking at it, t_handle_lock seems to be very much unused. I agree

I too had this thought in mind. Thanks for taking a deeper look into it :)

>
> we don't need it when waiting for outstanding handles but the only
> remaining uses are:
>
> 1) jbd2_journal_extend() where it is not needed either - we use
> atomic_add_return() to manipulate t_outstanding_credits and hold
> j_state_lock for reading which provides us enough exclusion.
>
> 2) update_t_max_wait() - this is the only valid use of t_handle_lock but we
> can just switch it to cmpxchg loop with a bit of care. Something like:
>
> unsigned long old;
>
> ts = jbd2_time_diff(ts, transaction->t_start);
> old = transaction->t_max_wait;
> while (old < ts)
> old = cmpxchg(&transaction->t_max_wait, old, ts);
>
> So perhaps you can add two more patches to remove other t_handle_lock uses
> and drop it completely.

Thanks for providing the details, Jan :)
Agree with jbd2_journal_extend(). I did look a bit around t_max_wait and
I agree that something like the above could work. I will spend some more time
around that code and will submit those changes together in v2.

-ritesh

>
> Honza
>
> > ---
> > include/linux/jbd2.h | 4 ----
> > 1 file changed, 4 deletions(-)
> >
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index 34b051aa9009..9bef47622b9d 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1768,22 +1768,18 @@ static inline void jbd2_journal_wait_updates(journal_t *journal)
> > if (!commit_transaction)
> > return;
> >
> > - spin_lock(&commit_transaction->t_handle_lock);
> > while (atomic_read(&commit_transaction->t_updates)) {
> > DEFINE_WAIT(wait);
> >
> > prepare_to_wait(&journal->j_wait_updates, &wait,
> > TASK_UNINTERRUPTIBLE);
> > if (atomic_read(&commit_transaction->t_updates)) {
> > - spin_unlock(&commit_transaction->t_handle_lock);
> > write_unlock(&journal->j_state_lock);
> > schedule();
> > write_lock(&journal->j_state_lock);
> > - spin_lock(&commit_transaction->t_handle_lock);
> > }
> > finish_wait(&journal->j_wait_updates, &wait);
> > }
> > - spin_unlock(&commit_transaction->t_handle_lock);
> > }
> >
> > /*
> > --
> > 2.31.1
> >
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2022-01-18 02:25:48

by Ritesh Harjani

Subject: Re: [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates

On 22/01/13 06:08PM, Ritesh Harjani wrote:
> On 22/01/13 12:27PM, Jan Kara wrote:
> > On Thu 13-01-22 08:56:29, Ritesh Harjani wrote:
> > > jbd2_journal_wait_updates() waits on the j_wait_updates waitqueue for the
> > > t_updates atomic_t to drop to zero, so from code review it looks like we
> > > don't need the t_handle_lock spinlock just for checking the t_updates value.
> > > Hence this patch gets rid of the spinlock protection in
> > > jbd2_journal_wait_updates().
> > >
> > > Signed-off-by: Ritesh Harjani <[email protected]>
> >
> > This patch looks good. Feel free to add:
> >
> > Reviewed-by: Jan Kara <[email protected]>
> >
> > Actually looking at it, t_handle_lock seems to be very much unused. I agree

Thanks, Jan, for your help with this.
I have dropped this patch from v2 in order to discuss a few more things, and I
felt that killing t_handle_lock completely can be sent as a separate patch
series.


>
> I too had this thought in mind. Thanks for taking a deeper look into it :)
>
> >
> > we don't need it when waiting for outstanding handles but the only
> > remaining uses are:
> >
> > 1) jbd2_journal_extend() where it is not needed either - we use
> > atomic_add_return() to manipulate t_outstanding_credits and hold
> > j_state_lock for reading which provides us enough exclusion.

I looked into jbd2_journal_extend() and yes, we don't need t_handle_lock
for updating transaction->t_outstanding_credits, since that already happens
via atomic API calls.

Now, I do see that we update handle->h_**_credits in that function.
But I think this is per process (based on task_struct, current->journal_info)
and doesn't need any lock protection, right?
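
If that is right, then with t_handle_lock gone the core of
jbd2_journal_extend() could become something roughly like below. This is just
an untested sketch with details trimmed, so please treat the exact code as
illustrative only:

	read_lock(&journal->j_state_lock);
	...
	/*
	 * t_outstanding_credits is only ever updated with atomic ops and we
	 * hold j_state_lock for reading, so no t_handle_lock is needed here.
	 */
	wanted = atomic_add_return(nblocks,
				   &transaction->t_outstanding_credits);
	if (wanted > journal->j_max_transaction_buffers) {
		atomic_sub(nblocks, &transaction->t_outstanding_credits);
		goto error_out;
	}

	/* handle->h_*_credits are private to this task (current->journal_info) */
	handle->h_total_credits += nblocks;
	handle->h_requested_credits += nblocks;
	result = 0;
error_out:
	read_unlock(&journal->j_state_lock);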


> >
> > 2) update_t_max_wait() - this is the only valid use of t_handle_lock but we
> > can just switch it to cmpxchg loop with a bit of care. Something like:
> >
> > unsigned long old;
> >
> > ts = jbd2_time_diff(ts, transaction->t_start);
> > old = transaction->t_max_wait;
> > while (old < ts)
> > old = cmpxchg(&transaction->t_max_wait, old, ts);

I think there might be a simpler and more straightforward way of updating
t_max_wait.

I did look into the t_max_wait logic and all the places where we update it.

t_max_wait is the max time a handle had to wait to start (& attach to) a _new_
running transaction. Is this understanding correct?
From the code I don't see t_max_wait getting updated for the time taken to
start a handle on an existing running transaction.

Here is how -
update_t_max_wait() will only update t_max_wait if the
transaction->t_start is after ts
(ts is nothing but when start_this_handle() was called).
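
For reference, the current helper looks roughly like this (simplified here,
exact details may differ slightly):

static inline void update_t_max_wait(transaction_t *transaction,
				     unsigned long ts)
{
#ifdef CONFIG_JBD2_DEBUG
	if (jbd2_journal_enable_debug &&
	    time_after(transaction->t_start, ts)) {
		/* ts was taken when start_this_handle() was entered */
		ts = jbd2_time_diff(ts, transaction->t_start);
		spin_lock(&transaction->t_handle_lock);
		if (ts > transaction->t_max_wait)
			transaction->t_max_wait = ts;
		spin_unlock(&transaction->t_handle_lock);
	}
#endif
}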

1. This means that for transaction->t_start to be greater than ts, it has to be
the new transaction that gets started (in start_this_handle()), right?

2. The second place where transaction->t_start is updated is just after the
start of commit phase 7. But this only means that this transaction has become
the committing transaction. That means someone has to allocate a new running
transaction, which again is case 1.

Now I think this spinlock was added since multiple processes can start a handle
in parallel and attach a running transaction.

Also, this was later moved under CONFIG_JBD2_DEBUG in order to avoid spinlock
contention on SMP systems when multiple handles are started by different
processes.

Now, looking at all of the above, I think we can move update_t_max_wait()
inside jbd2_get_transaction() in start_this_handle(), because that is where
a new transaction gets started and transaction->t_start will be greater than
ts. This is also protected by the j_state_lock write lock, so we don't need
the spinlock.
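
i.e. something along these lines (untested sketch just to show the idea; the
exact signatures and code will differ):

/*
 * Pass ts (the time at which start_this_handle() started waiting) down to
 * jbd2_get_transaction(), which installs the new running transaction while
 * the caller holds j_state_lock for writing.
 */
static void jbd2_get_transaction(journal_t *journal,
				 transaction_t *transaction,
				 unsigned long ts)
{
	...
	transaction->t_start = jiffies;
	...
	journal->j_running_transaction = transaction;
	/*
	 * j_state_lock is held for writing here, so t_max_wait can be
	 * updated without taking t_handle_lock.
	 */
	update_t_max_wait(transaction, ts);
}

and then the update_t_max_wait() call can be dropped from start_this_handle().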

This would also mean that we can move t_max_wait outside of CONFIG_JBD2_DEBUG
and jbd2_journal_enable_debug.

Jan, could you confirm whether the above understanding is correct, and shall I
go ahead with the above changes?

-ritesh

> >
> > So perhaps you can add two more patches to remove other t_handle_lock uses
> > and drop it completely.
>
> Thanks for providing the details Jan :)
> Agree with jbd2_journal_extend().





> I did look a bit around t_max_wait and
> I agree that something like the above could work. I will spend some more time
> around that code and will submit those changes together in v2.
>
> -ritesh
>
> >
> > Honza
> >
> > > ---
> > > include/linux/jbd2.h | 4 ----
> > > 1 file changed, 4 deletions(-)
> > >
> > > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > > index 34b051aa9009..9bef47622b9d 100644
> > > --- a/include/linux/jbd2.h
> > > +++ b/include/linux/jbd2.h
> > > @@ -1768,22 +1768,18 @@ static inline void jbd2_journal_wait_updates(journal_t *journal)
> > > if (!commit_transaction)
> > > return;
> > >
> > > - spin_lock(&commit_transaction->t_handle_lock);
> > > while (atomic_read(&commit_transaction->t_updates)) {
> > > DEFINE_WAIT(wait);
> > >
> > > prepare_to_wait(&journal->j_wait_updates, &wait,
> > > TASK_UNINTERRUPTIBLE);
> > > if (atomic_read(&commit_transaction->t_updates)) {
> > > - spin_unlock(&commit_transaction->t_handle_lock);
> > > write_unlock(&journal->j_state_lock);
> > > schedule();
> > > write_lock(&journal->j_state_lock);
> > > - spin_lock(&commit_transaction->t_handle_lock);
> > > }
> > > finish_wait(&journal->j_wait_updates, &wait);
> > > }
> > > - spin_unlock(&commit_transaction->t_handle_lock);
> > > }
> > >
> > > /*
> > > --
> > > 2.31.1
> > >
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR

2022-01-18 02:32:48

by Jan Kara

Subject: Re: [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates

On Mon 17-01-22 18:25:27, Ritesh Harjani wrote:
> On 22/01/13 06:08PM, Ritesh Harjani wrote:
> > On 22/01/13 12:27PM, Jan Kara wrote:
> > > On Thu 13-01-22 08:56:29, Ritesh Harjani wrote:
> > > > jbd2_journal_wait_updates() waits on the j_wait_updates waitqueue for the
> > > > t_updates atomic_t to drop to zero, so from code review it looks like we
> > > > don't need the t_handle_lock spinlock just for checking the t_updates value.
> > > > Hence this patch gets rid of the spinlock protection in
> > > > jbd2_journal_wait_updates().
> > > >
> > > > Signed-off-by: Ritesh Harjani <[email protected]>
> > >
> > > This patch looks good. Feel free to add:
> > >
> > > Reviewed-by: Jan Kara <[email protected]>
> > >
> > > Actually looking at it, t_handle_lock seems to be very much unused. I agree
>
> Thanks, Jan, for your help with this.
> I have dropped this patch from v2 in order to discuss a few more things, and I
> felt that killing t_handle_lock completely can be sent as a separate patch
> series.

Yes, probably a good choice.

> > I too had this thought in mind. Thanks for taking a deeper look into it :)
> >
> > >
> > > we don't need it when waiting for outstanding handles but the only
> > > remaining uses are:
> > >
> > > 1) jbd2_journal_extend() where it is not needed either - we use
> > > atomic_add_return() to manipulate t_outstanding_credits and hold
> > > j_state_lock for reading which provides us enough exclusion.
>
> I looked into jbd2_journal_extend() and yes, we don't need t_handle_lock
> for updating transaction->t_outstanding_credits, since that already happens
> via atomic API calls.
>
> Now, I do see that we update handle->h_**_credits in that function.
> But I think this is per process (based on task_struct, current->journal_info)
> and doesn't need any lock protection, right?

Yes, the handle is per process, so no lock is needed there.

> > > 2) update_t_max_wait() - this is the only valid use of t_handle_lock but we
> > > can just switch it to cmpxchg loop with a bit of care. Something like:
> > >
> > > unsigned long old;
> > >
> > > ts = jbd2_time_diff(ts, transaction->t_start);
> > > old = transaction->t_max_wait;
> > > while (old < ts)
> > > old = cmpxchg(&transaction->t_max_wait, old, ts);
>
> I think there might be a simpler and more straightforward way of updating
> t_max_wait.
>
> I did look into the t_max_wait logic and all the places where we update it.
>
> t_max_wait is the max time a handle had to wait to start (& attach to) a _new_
> running transaction. Is this understanding correct?

Correct. It is the maximum time we had to wait for a new transaction to be
created.

> From the code I don't see t_max_wait getting updated for the time taken to
> start a handle on an existing running transaction.
>
> Here is how -
> update_t_max_wait() will only update t_max_wait if the
> transaction->t_start is after ts
> (ts is nothing but when start_this_handle() was called).
>
> 1. This means that for transaction->t_start to be greater than ts, it has to be
> the new transaction that gets started (in start_this_handle()), right?
>
> 2. The second place where transaction->t_start is updated is just after the
> start of commit phase 7. But this only means that this transaction has become
> the committing transaction. That means someone has to allocate a new running
> transaction, which again is case 1.
>
> Now I think this spinlock was added since multiple processes can start a handle
> in parallel and attach a running transaction.
>
> Also, this was later moved under CONFIG_JBD2_DEBUG in order to avoid spinlock
> contention on SMP systems when multiple handles are started by different
> processes.
>
> Now, looking at all of the above, I think we can move update_t_max_wait()
> inside jbd2_get_transaction() in start_this_handle(), because that is where
> a new transaction gets started and transaction->t_start will be greater than
> ts. This is also protected by the j_state_lock write lock, so we don't need
> the spinlock.

All of the above is correct up to this point. The catch is that there can be
(and often are) more processes in start_this_handle() waiting in
wait_transaction_switching() and then racing to create the new transaction.
The process calling jbd2_get_transaction() is not necessarily the one which
entered start_this_handle() first, and thus t_max_wait would not really be
the maximum time someone had to wait.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR