At least 3 places might sleep in update_qgroup_limit_item(), as shown below:
update_qgroup_limit_item
  btrfs_alloc_path
    /* allocate memory non-atomically, might sleep */
    kmem_cache_zalloc(btrfs_path_cachep, GFP_NOFS)
  btrfs_search_slot
    setup_nodes_for_search
      reada_for_balance
        btrfs_readahead_node_child
          btrfs_readahead_tree_block
            btrfs_find_create_tree_block
              alloc_extent_buffer
                kmem_cache_zalloc
                  /* allocate memory non-atomically, might sleep */
                  kmem_cache_alloc(GFP_NOFS|__GFP_NOFAIL|__GFP_ZERO)
            read_extent_buffer_pages
              submit_extent_page
                /* disk IO, might sleep */
                submit_one_bio
Fix this by delaying the limit item update until after the spin lock is
released. Also add might_sleep() to some places.
ChenXiaoSong (3):
btrfs: add might_sleep() to some places in update_qgroup_limit_item()
btrfs: qgroup: introduce btrfs_update_quota_limit() helper
btrfs: qgroup: fix sleep from invalid context bug in
update_qgroup_limit_item()
fs/btrfs/ctree.c | 2 ++
fs/btrfs/qgroup.c | 45 ++++++++++++++++++++++++++-------------------
2 files changed, 28 insertions(+), 19 deletions(-)
--
2.31.1
As the potential sleeping under a spin lock is hard to spot, we should add
might_sleep() to some places.
Signed-off-by: ChenXiaoSong <[email protected]>
---
fs/btrfs/ctree.c | 2 ++
fs/btrfs/qgroup.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a9543f01184c..809053e9cfde 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1934,6 +1934,8 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
int min_write_lock_level;
int prev_cmp;
+ might_sleep();
+
lowest_level = p->lowest_level;
WARN_ON(lowest_level && ins_len > 0);
WARN_ON(p->nodes[0] != NULL);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9334c3157c22..d0480b9c6c86 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -779,6 +779,8 @@ static int update_qgroup_limit_item(struct btrfs_trans_handle *trans,
int ret;
int slot;
+ might_sleep();
+
key.objectid = 0;
key.type = BTRFS_QGROUP_LIMIT_KEY;
key.offset = qgroup->qgroupid;
--
2.31.1
On Wed, Nov 16, 2022 at 01:17:07AM +0800, ChenXiaoSong wrote:
> As the potential sleeping under a spin lock is hard to spot, we should add
> might_sleep() to some places.
>
> Signed-off-by: ChenXiaoSong <[email protected]>
> ---
> fs/btrfs/ctree.c | 2 ++
> fs/btrfs/qgroup.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a9543f01184c..809053e9cfde 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1934,6 +1934,8 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
> int min_write_lock_level;
> int prev_cmp;
>
> + might_sleep();
This needs some explanation in the changelog; the reason was mentioned
in a past patch iteration, namely that it's due to potential IO if the blocks
are not cached.
> +
> lowest_level = p->lowest_level;
> WARN_ON(lowest_level && ins_len > 0);
> WARN_ON(p->nodes[0] != NULL);
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 9334c3157c22..d0480b9c6c86 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -779,6 +779,8 @@ static int update_qgroup_limit_item(struct btrfs_trans_handle *trans,
> int ret;
> int slot;
>
> + might_sleep();
This one is redundant, no? There's a call to btrfs_search_slot() a few lines
below.
> +
> key.objectid = 0;
> key.type = BTRFS_QGROUP_LIMIT_KEY;
> key.offset = qgroup->qgroupid;
> --
> 2.31.1
>
On 2022/11/16 01:17, ChenXiaoSong wrote:
> As the potential sleeping under a spin lock is hard to spot, we should add
> might_sleep() to some places.
>
> Signed-off-by: ChenXiaoSong <[email protected]>
Looks good.
We may want to add more in other locations, but this is really a good start.
Reviewed-by: Qu Wenruo <[email protected]>
Thanks,
Qu
> ---
> fs/btrfs/ctree.c | 2 ++
> fs/btrfs/qgroup.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a9543f01184c..809053e9cfde 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1934,6 +1934,8 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
> int min_write_lock_level;
> int prev_cmp;
>
> + might_sleep();
> +
> lowest_level = p->lowest_level;
> WARN_ON(lowest_level && ins_len > 0);
> WARN_ON(p->nodes[0] != NULL);
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 9334c3157c22..d0480b9c6c86 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -779,6 +779,8 @@ static int update_qgroup_limit_item(struct btrfs_trans_handle *trans,
> int ret;
> int slot;
>
> + might_sleep();
> +
> key.objectid = 0;
> key.type = BTRFS_QGROUP_LIMIT_KEY;
> key.offset = qgroup->qgroupid;
On 2022/11/16 16:09, ChenXiaoSong wrote:
> On 2022/11/16 6:48, Qu Wenruo wrote:
>> Looks good.
>>
>> We may want to add more in other locations, but this is really a good
>> start.
>>
>> Reviewed-by: Qu Wenruo <[email protected]>
>>
>> Thanks,
>> Qu
>
> If I just add might_sleep() in btrfs_alloc_path() and
> btrfs_search_slot(), is it reasonable?
Adding it to btrfs_search_slot() is definitely correct.
But why for btrfs_alloc_path()? Wouldn't kmem_cache_zalloc() itself
already do the might_sleep_if() somewhere?
I just looked the call chain, and indeed it is doing the check already:
btrfs_alloc_path()
|- kmem_cache_zalloc()
|- kmem_cache_alloc()
|- __kmem_cache_alloc_lru()
|- slab_alloc()
|- slab_alloc_node()
|- slab_pre_alloc_hook()
|- might_alloc()
|- might_sleep_if()
Thanks,
Qu
>
> Or just add might_sleep() to one place in update_qgroup_limit_item() ?
On 2022/11/16 6:48, Qu Wenruo wrote:
> Looks good.
>
> We may want to add more in other locations, but this is really a good
> start.
>
> Reviewed-by: Qu Wenruo <[email protected]>
>
> Thanks,
> Qu
If I just add might_sleep() in btrfs_alloc_path() and
btrfs_search_slot(), is it reasonable?
Or just add might_sleep() to one place in update_qgroup_limit_item() ?
On 2022/11/16 20:24, David Sterba wrote:
> On Wed, Nov 16, 2022 at 04:43:50PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/11/16 16:09, ChenXiaoSong wrote:
>>> On 2022/11/16 6:48, Qu Wenruo wrote:
>>>> Looks good.
>>>>
>>>> We may want to add more in other locations, but this is really a good
>>>> start.
>>>>
>>>> Reviewed-by: Qu Wenruo <[email protected]>
>>>>
>>>> Thanks,
>>>> Qu
>>>
>>> If I just add might_sleep() in btrfs_alloc_path() and
>>> btrfs_search_slot(), is it reasonable?
>>
>> Adding it to btrfs_search_slot() is definitely correct.
>>
>> But why for btrfs_alloc_path()? Wouldn't kmem_cache_zalloc() itself
>> already do the might_sleep_if() somewhere?
>>
>> I just looked the call chain, and indeed it is doing the check already:
>>
>> btrfs_alloc_path()
>> |- kmem_cache_zalloc()
>> |- kmem_cache_alloc()
>> |- __kmem_cache_alloc_lru()
>> |- slab_alloc()
>> |- slab_alloc_node()
>> |- slab_pre_alloc_hook()
>> |- might_alloc()
>> |- might_sleep_if()
>
> The call chain is unconditional so the check will always happen, but the
> condition itself in might_sleep_if() does not recognize GFP_NOFS:
>
> static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
> {
>         return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
> }
>
> #define GFP_NOFS        (__GFP_RECLAIM | __GFP_IO)
>
> And I think the qgroup limit was exactly a spin lock held over btrfs_alloc_path(),
> so it did not help. A might_sleep() inside btrfs_alloc_path() is a very minimal
> but reliable check we could add; the paths are used in many places, so it would
> increase the coverage.
OK, then it makes sense now for btrfs_alloc_path().
But I still believe this looks like a bug in gfpflags_allow_blocking()...
Thanks,
Qu
On Wed, Nov 16, 2022 at 04:43:50PM +0800, Qu Wenruo wrote:
>
>
> On 2022/11/16 16:09, ChenXiaoSong wrote:
> > On 2022/11/16 6:48, Qu Wenruo wrote:
> >> Looks good.
> >>
> >> We may want to add more in other locations, but this is really a good
> >> start.
> >>
> >> Reviewed-by: Qu Wenruo <[email protected]>
> >>
> >> Thanks,
> >> Qu
> >
> > If I just add might_sleep() in btrfs_alloc_path() and
> > btrfs_search_slot(), is it reasonable?
>
> Adding it to btrfs_search_slot() is definitely correct.
>
> But why for btrfs_alloc_path()? Wouldn't kmem_cache_zalloc() itself
> already do the might_sleep_if() somewhere?
>
> I just looked the call chain, and indeed it is doing the check already:
>
> btrfs_alloc_path()
> |- kmem_cache_zalloc()
> |- kmem_cache_alloc()
> |- __kmem_cache_alloc_lru()
> |- slab_alloc()
> |- slab_alloc_node()
> |- slab_pre_alloc_hook()
> |- might_alloc()
> |- might_sleep_if()
The call chain is unconditional so the check will always happen, but the
condition itself in might_sleep_if() does not recognize GFP_NOFS:
static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
{
        return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
}

#define GFP_NOFS        (__GFP_RECLAIM | __GFP_IO)
And I think the qgroup limit was exactly a spin lock held over btrfs_alloc_path(),
so it did not help. A might_sleep() inside btrfs_alloc_path() is a very minimal
but reliable check we could add; the paths are used in many places, so it would
increase the coverage.