On a lockdep-enabled kernel, xfstests generic/027 fails due to a lockdep
warning when run on ext4 mounted with -o test_dummy_encryption:
xfs_io/4594 is trying to acquire lock:
(jbd2_handle
){++++.+}, at:
[<ffffffff813096ef>] jbd2_log_wait_commit+0x5/0x11b
but task is already holding lock:
(jbd2_handle
){++++.+}, at:
[<ffffffff813000de>] start_this_handle+0x354/0x3d8
The abbreviated call stack is:
[<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b
[<ffffffff8130972a>] jbd2_log_wait_commit+0x40/0x11b
[<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b
[<ffffffff8130987b>] ? __jbd2_journal_force_commit+0x76/0xa6
[<ffffffff81309896>] __jbd2_journal_force_commit+0x91/0xa6
[<ffffffff813098b9>] jbd2_journal_force_commit_nested+0xe/0x18
[<ffffffff812a6049>] ext4_should_retry_alloc+0x72/0x79
[<ffffffff812f0c1f>] ext4_xattr_set+0xef/0x11f
[<ffffffff812cc35b>] ext4_set_context+0x3a/0x16b
[<ffffffff81258123>] fscrypt_inherit_context+0xe3/0x103
[<ffffffff812ab611>] __ext4_new_inode+0x12dc/0x153a
[<ffffffff812bd371>] ext4_create+0xb7/0x161
When a file is created in an encrypted directory, ext4_set_context() is
called to set an encryption context on the new file. This calls
ext4_xattr_set(), which contains a retry loop where the journal is
forced to commit if an ENOSPC error is encountered.
If the task actually were to wait for the journal to commit in this
case, then it would deadlock because a handle remains open from
__ext4_new_inode(), so the running transaction can't be committed yet.
Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not
allowing the running transaction to be committed while the current task
has it open. However, the above lockdep warning is still triggered.
Fix the problem by passing the handle through the 'fs_data' argument to
ext4_set_context(), then using ext4_xattr_set_handle() instead of
ext4_xattr_set(). And in the case where no journal handle is specified
and ext4_set_context() has to open one, add an ENOSPC retry loop since
in that case it is the outermost transaction.
Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/ialloc.c | 3 +--
fs/ext4/super.c | 32 ++++++++++++++++++++++----------
2 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 170421e..f543281 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1115,8 +1115,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
}
if (encrypt) {
- /* give pointer to avoid set_context with journal ops. */
- err = fscrypt_inherit_context(dir, inode, &encrypt, true);
+ err = fscrypt_inherit_context(dir, inode, handle, true);
if (err)
goto fail_free_drop;
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ff6f3ab..22d50cb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1114,14 +1114,22 @@ static int ext4_prepare_context(struct inode *inode)
static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
void *fs_data)
{
- handle_t *handle;
- int res, res2;
+ handle_t *handle = fs_data;
+ int res, res2, retries = 0;
- /* fs_data is null when internally used. */
- if (fs_data) {
- res = ext4_xattr_set(inode, EXT4_XATTR_INDEX_ENCRYPTION,
- EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, ctx,
- len, 0);
+ /*
+ * If a journal handle was specified, then the encryption context is
+ * being set on a new inode via inheritance and is part of a larger
+ * transaction to create the inode. Otherwise the encryption context is
+ * being set on an existing inode in its own transaction. Only in the
+ * latter case should the "retry on ENOSPC" logic be used.
+ */
+
+ if (handle) {
+ res = ext4_xattr_set_handle(handle, inode,
+ EXT4_XATTR_INDEX_ENCRYPTION,
+ EXT4_XATTR_NAME_ENCRYPTION_CONTEXT,
+ ctx, len, 0);
if (!res) {
ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
ext4_clear_inode_state(inode,
@@ -1130,14 +1138,15 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
return res;
}
+retry:
handle = ext4_journal_start(inode, EXT4_HT_MISC,
ext4_jbd2_credits_xattr(inode));
if (IS_ERR(handle))
return PTR_ERR(handle);
- res = ext4_xattr_set(inode, EXT4_XATTR_INDEX_ENCRYPTION,
- EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, ctx,
- len, 0);
+ res = ext4_xattr_set_handle(handle, inode, EXT4_XATTR_INDEX_ENCRYPTION,
+ EXT4_XATTR_NAME_ENCRYPTION_CONTEXT,
+ ctx, len, 0);
if (!res) {
ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
res = ext4_mark_inode_dirty(handle, inode);
@@ -1145,6 +1154,9 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
EXT4_ERROR_INODE(inode, "Failed to mark inode dirty");
}
res2 = ext4_journal_stop(handle);
+
+ if (res == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
+ goto retry;
if (!res)
res = res2;
return res;
--
2.8.0.rc3.226.g39d4020
On Mon 14-11-16 13:03:36, Eric Biggers wrote:
> On a lockdep-enabled kernel, xfstests generic/027 fails due to a lockdep
> warning when run on ext4 mounted with -o test_dummy_encryption:
>
> xfs_io/4594 is trying to acquire lock:
> (jbd2_handle
> ){++++.+}, at:
> [<ffffffff813096ef>] jbd2_log_wait_commit+0x5/0x11b
>
> but task is already holding lock:
> (jbd2_handle
> ){++++.+}, at:
> [<ffffffff813000de>] start_this_handle+0x354/0x3d8
>
> The abbreviated call stack is:
>
> [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b
> [<ffffffff8130972a>] jbd2_log_wait_commit+0x40/0x11b
> [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b
> [<ffffffff8130987b>] ? __jbd2_journal_force_commit+0x76/0xa6
> [<ffffffff81309896>] __jbd2_journal_force_commit+0x91/0xa6
> [<ffffffff813098b9>] jbd2_journal_force_commit_nested+0xe/0x18
> [<ffffffff812a6049>] ext4_should_retry_alloc+0x72/0x79
> [<ffffffff812f0c1f>] ext4_xattr_set+0xef/0x11f
> [<ffffffff812cc35b>] ext4_set_context+0x3a/0x16b
> [<ffffffff81258123>] fscrypt_inherit_context+0xe3/0x103
> [<ffffffff812ab611>] __ext4_new_inode+0x12dc/0x153a
> [<ffffffff812bd371>] ext4_create+0xb7/0x161
>
> When a file is created in an encrypted directory, ext4_set_context() is
> called to set an encryption context on the new file. This calls
> ext4_xattr_set(), which contains a retry loop where the journal is
> forced to commit if an ENOSPC error is encountered.
>
> If the task actually were to wait for the journal to commit in this
> case, then it would deadlock because a handle remains open from
> __ext4_new_inode(), so the running transaction can't be committed yet.
> Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not
> allowing the running transaction to be committed while the current task
> has it open. However, the above lockdep warning is still triggered.
>
> Fix the problem by passing the handle through the 'fs_data' argument to
> ext4_set_context(), then using ext4_xattr_set_handle() instead of
> ext4_xattr_set(). And in the case where no journal handle is specified
> and ext4_set_context() has to open one, add an ENOSPC retry loop since
> in that case it is the outermost transaction.
>
> Signed-off-by: Eric Biggers <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> fs/ext4/ialloc.c | 3 +--
> fs/ext4/super.c | 32 ++++++++++++++++++++++----------
> 2 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 170421e..f543281 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -1115,8 +1115,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
> }
>
> if (encrypt) {
> - /* give pointer to avoid set_context with journal ops. */
> - err = fscrypt_inherit_context(dir, inode, &encrypt, true);
> + err = fscrypt_inherit_context(dir, inode, handle, true);
> if (err)
> goto fail_free_drop;
> }
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index ff6f3ab..22d50cb 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1114,14 +1114,22 @@ static int ext4_prepare_context(struct inode *inode)
> static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> void *fs_data)
> {
> - handle_t *handle;
> - int res, res2;
> + handle_t *handle = fs_data;
> + int res, res2, retries = 0;
>
> - /* fs_data is null when internally used. */
> - if (fs_data) {
> - res = ext4_xattr_set(inode, EXT4_XATTR_INDEX_ENCRYPTION,
> - EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, ctx,
> - len, 0);
> + /*
> + * If a journal handle was specified, then the encryption context is
> + * being set on a new inode via inheritance and is part of a larger
> + * transaction to create the inode. Otherwise the encryption context is
> + * being set on an existing inode in its own transaction. Only in the
> + * latter case should the "retry on ENOSPC" logic be used.
> + */
> +
> + if (handle) {
> + res = ext4_xattr_set_handle(handle, inode,
> + EXT4_XATTR_INDEX_ENCRYPTION,
> + EXT4_XATTR_NAME_ENCRYPTION_CONTEXT,
> + ctx, len, 0);
> if (!res) {
> ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> ext4_clear_inode_state(inode,
> @@ -1130,14 +1138,15 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> return res;
> }
>
> +retry:
> handle = ext4_journal_start(inode, EXT4_HT_MISC,
> ext4_jbd2_credits_xattr(inode));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> - res = ext4_xattr_set(inode, EXT4_XATTR_INDEX_ENCRYPTION,
> - EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, ctx,
> - len, 0);
> + res = ext4_xattr_set_handle(handle, inode, EXT4_XATTR_INDEX_ENCRYPTION,
> + EXT4_XATTR_NAME_ENCRYPTION_CONTEXT,
> + ctx, len, 0);
> if (!res) {
> ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
> res = ext4_mark_inode_dirty(handle, inode);
> @@ -1145,6 +1154,9 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
> EXT4_ERROR_INODE(inode, "Failed to mark inode dirty");
> }
> res2 = ext4_journal_stop(handle);
> +
> + if (res == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> + goto retry;
> if (!res)
> res = res2;
> return res;
> --
> 2.8.0.rc3.226.g39d4020
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, Nov 14, 2016 at 01:03:36PM -0800, Eric Biggers wrote:
> If the task actually were to wait for the journal to commit in this
> case, then it would deadlock because a handle remains open from
> __ext4_new_inode(), so the running transaction can't be committed yet.
> Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not
> allowing the running transaction to be committed while the current task
> has it open. However, the above lockdep warning is still triggered.
So this is a false positive introduced by
1eaa566d368b: jbd2: track more dependencies on transaction commit
Instead of working around the problem here, perhaps it would be better
to fix __jbd2_journal_force_commit() so that it calls a newly created
__jbd2_log_wait_commit() which skips the jbd2_might_wait_for_commit()
(and then have jbd2_log_wait_commit call __jbd2_log_wait_commit with
the might_wait_for_commit check)?
This isn't the only place where jbd2_journal_force_commit() is called
so if the problem is with the lockdep check, maybe we should just fix
the logic in the jbd2 layer, hmm?
- Ted
On Wed, Nov 16, 2016 at 10:47:38AM -0500, Theodore Ts'o wrote:
>
> So this is a false positive introduced by
>
> 1eaa566d368b: jbd2: track more dependencies on transaction commit
>
> Instead of working around the problem here, perhaps it would be better
> to fix __jbd2_journal_force_commit() so that it calls a newly created
> __jbd2_log_wait_commit() which skips the jbd2_might_wait_for_commit()
> (and then have jbd2_log_wait_commit call __jbd2_log_wait_commit with
> the might_wait_for_commit check)?
>
> This isn't the only place where jbd2_journal_force_commit() is called
> so if the problem is with the lockdep check, maybe we should just fix
> the logic in the jbd2 layer, hmm?
>
> - Ted
The lockdep warning still seems helpful because it will show places that try to
force-commit the journal while holding an open handle. Callers probably won't
expect that this will be a no-op, and it may indicate that the attempt to
force-commit the journal is in the wrong place, as it was here.
You and Jan know more about this than I do, though; I could be wrong.
Eric
On Wed 16-11-16 10:47:38, Ted Tso wrote:
> On Mon, Nov 14, 2016 at 01:03:36PM -0800, Eric Biggers wrote:
> > If the task actually were to wait for the journal to commit in this
> > case, then it would deadlock because a handle remains open from
> > __ext4_new_inode(), so the running transaction can't be committed yet.
> > Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not
> > allowing the running transaction to be committed while the current task
> > has it open. However, the above lockdep warning is still triggered.
>
> So this is a false positive introduced by
>
> 1eaa566d368b: jbd2: track more dependencies on transaction commit
>
> Instead of working around the problem here, perhaps it would be better
> to fix __jbd2_journal_force_commit() so that it calls a newly created
> __jbd2_log_wait_commit() which skips the jbd2_might_wait_for_commit()
> (and then have jbd2_log_wait_commit call __jbd2_log_wait_commit with
> the might_wait_for_commit check)?
>
> This isn't the only place where jbd2_journal_force_commit() is called
> so if the problem is with the lockdep check, maybe we should just fix
> the logic in the jbd2 layer, hmm?
So it is a false positive in the sense that it is not a deadlock
possibility. OTOH it is a valid report in the sense that the code does not
do what it is intended to - i.e., force a transaction commit to free some
blocks and retry block allocation. So I find the original fix actually
worthwhile.
Looking into ext4 codebase there are only 3 callsites of
jbd2_journal_force_commit_nested() - all in some kind of retry loops and
two of the call sites actually should not need the "nested" variant. Just
ext4_should_retry_alloc() may be called when a transaction is held open and
then the caller likely wanted something else anyway so it would be good to
find these cases and fix them and then get rid of the "nested" variant of
forcing a commit since that is easy to use in a wrong way.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, Nov 21, 2016 at 03:18:13PM +0100, Jan Kara wrote:
> On Wed 16-11-16 10:47:38, Ted Tso wrote:
> > On Mon, Nov 14, 2016 at 01:03:36PM -0800, Eric Biggers wrote:
> > > If the task actually were to wait for the journal to commit in this
> > > case, then it would deadlock because a handle remains open from
> > > __ext4_new_inode(), so the running transaction can't be committed yet.
> > > Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not
> > > allowing the running transaction to be committed while the current task
> > > has it open. However, the above lockdep warning is still triggered.
> >
> > So this is a false positive introduced by
> >
> > 1eaa566d368b: jbd2: track more dependencies on transaction commit
> >
> > Instead of working around the problem here, perhaps it would be better
> > to fix __jbd2_journal_force_commit() so that it calls a newly created
> > __jbd2_log_wait_commit() which skips the jbd2_might_wait_for_commit()
> > (and then have jbd2_log_wait_commit call __jbd2_log_wait_commit with
> > the might_wait_for_commit check)?
> >
> > This isn't the only place where jbd2_journal_force_commit() is called
> > so if the problem is with the lockdep check, maybe we should just fix
> > the logic in the jbd2 layer, hmm?
>
> So it is a false positive in the sense that it is not a deadlock
> possibility. OTOH it is a valid report in the sense that the code does not
> do what it is intended to - i.e., force a transaction commit to free some
> blocks and retry block allocation. So I find the original fix actually
> worthwhile.
OK, I'll take Eric's patch. Thanks!!
> Looking into ext4 codebase there are only 3 callsites of
> jbd2_journal_force_commit_nested() - all in some kind of retry loops and
> two of the call sites actually should not need the "nested" variant. Just
> ext4_should_retry_alloc() may be called when a transaction is held open and
> then the caller likely wanted something else anyway so it would be good to
> find these cases and fix them and then get rid of the "nested" variant of
> forcing a commit since that is easy to use in a wrong way.
Fair enough. The original use case was for undo access, which we're
not using any more since we added mballoc. It looks like we can
remove all of the support for get_undo_access while we're at it.
- Ted