2021-10-14 08:27:01

by Zqiang

[permalink] [raw]
Subject: [PATCH] mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited()

<IRQ>
__init_work+0x2d/0x50
synchronize_rcu_expedited+0x3af/0x650
bdi_remove_from_list [inline]
bdi_unregister+0x17f/0x5c0
release_bdi+0xa1/0xc0
kref_put [inline]
bdi_put+0x72/0xa0
bdev_free_inode+0x11e/0x220
i_callback+0x3f/0x70
rcu_do_batch [inline]
rcu_core+0x76d/0x16c0
__do_softirq+0x1d7/0x93b
invoke_softirq [inline]
__irq_exit_rcu [inline]
irq_exit_rcu+0xf2/0x130
sysvec_apic_timer_interrupt+0x93/0xc0

The bdi_remove_from_list() is called in RCU softirq, however the
synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
instead of it.

Reported-by: Hao Sun <[email protected]>
Signed-off-by: Zqiang <[email protected]>
---
include/linux/backing-dev-defs.h | 1 +
mm/backing-dev.c | 4 +---
2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 33207004cfde..35a093384518 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -202,6 +202,7 @@ struct backing_dev_info {
#ifdef CONFIG_DEBUG_FS
struct dentry *debug_dir;
#endif
+ struct rcu_head rcu;
};

enum {
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index c878d995af06..45d866a3a4a2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -935,8 +935,6 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
rb_erase(&bdi->rb_node, &bdi_tree);
list_del_rcu(&bdi->bdi_list);
spin_unlock_bh(&bdi_lock);
-
- synchronize_rcu_expedited();
}

void bdi_unregister(struct backing_dev_info *bdi)
@@ -969,7 +967,7 @@ static void release_bdi(struct kref *ref)
bdi_unregister(bdi);
WARN_ON_ONCE(bdi->dev);
wb_exit(&bdi->wb);
- kfree(bdi);
+ kfree_rcu(bdi, rcu);
}

void bdi_put(struct backing_dev_info *bdi)
--
2.17.1


2021-10-14 11:29:04

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited()

On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> The bdi_remove_from_list() is called in RCU softirq, however the
> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
> instead of it.
>
> Reported-by: Hao Sun <[email protected]>
> Signed-off-by: Zqiang <[email protected]>
> ---
> include/linux/backing-dev-defs.h | 1 +
> mm/backing-dev.c | 4 +---
> 2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> index 33207004cfde..35a093384518 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -202,6 +202,7 @@ struct backing_dev_info {
> #ifdef CONFIG_DEBUG_FS
> struct dentry *debug_dir;
> #endif
> + struct rcu_head rcu;
> };

Instead of growing struct backing_dev_info, it seems to me this rcu_head
could be placed in a union with rb_node, since it will have been removed
from the bdi_tree by this point and the tree is never walked under
RCU protection?

2021-10-14 13:13:39

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited()

On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> <IRQ>
> __init_work+0x2d/0x50
> synchronize_rcu_expedited+0x3af/0x650
> bdi_remove_from_list [inline]
> bdi_unregister+0x17f/0x5c0
> release_bdi+0xa1/0xc0
> kref_put [inline]
> bdi_put+0x72/0xa0
> bdev_free_inode+0x11e/0x220
> i_callback+0x3f/0x70
> rcu_do_batch [inline]
> rcu_core+0x76d/0x16c0
> __do_softirq+0x1d7/0x93b
> invoke_softirq [inline]
> __irq_exit_rcu [inline]
> irq_exit_rcu+0xf2/0x130
> sysvec_apic_timer_interrupt+0x93/0xc0
>
> The bdi_remove_from_list() is called in RCU softirq, however the
> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
> instead of it.

What workload is this running? If we hit the RCU unregister path
from inode freeing we have a lifetime problem somewhere that we need
to fix first.

2021-10-15 11:53:09

by zhangqiang

[permalink] [raw]
Subject: Re: [PATCH] mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited()


On 2021/10/14 下午7:24, Matthew Wilcox wrote:
> On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
>> The bdi_remove_from_list() is called in RCU softirq, however the
>> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
>> instead of it.
>>
>> Reported-by: Hao Sun <[email protected]>
>> Signed-off-by: Zqiang <[email protected]>
>> ---
>> include/linux/backing-dev-defs.h | 1 +
>> mm/backing-dev.c | 4 +---
>> 2 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
>> index 33207004cfde..35a093384518 100644
>> --- a/include/linux/backing-dev-defs.h
>> +++ b/include/linux/backing-dev-defs.h
>> @@ -202,6 +202,7 @@ struct backing_dev_info {
>> #ifdef CONFIG_DEBUG_FS
>> struct dentry *debug_dir;
>> #endif
>> + struct rcu_head rcu;
>> };
> Instead of growing struct backing_dev_info, it seems to me this rcu_head
> could be placed in a union with rb_node, since it will have been removed
> from the bdi_tree by this point and the tree is never walked under
> RCU protection?
>
Thanks for your advice, I find this bdi_tree is traversed under the
protection of a spin lock, not under the protection of RCU.
I find this modification does not avoid the problem described in patch,
the flush_delayed_work() may be called in release_bdi()
The same will cause problems.

may be  we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
i_callback) or do you have any better suggestions?

Thanks
Zqiang


2021-10-15 23:02:20

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited()

On Fri, Oct 15, 2021 at 01:06:02PM +0800, Zqiang wrote:
>
> On 2021/10/15 上午10:57, Qiang Zhang wrote:
> >
> >
> > Matthew Wilcox <[email protected] <mailto:[email protected]>>
> > 于2021年10月14日周四 下午7:26写道:
> >
> > On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> > > The bdi_remove_from_list() is called in RCU softirq, however the
> > > synchronize_rcu_expedited() will produce sleep action, use
> > kfree_rcu()
> > > instead of it.
> > >
> > > Reported-by: Hao Sun <[email protected]
> > <mailto:[email protected]>>
> > > Signed-off-by: Zqiang <[email protected]
> > <mailto:[email protected]>>
> > > ---
> > >  include/linux/backing-dev-defs.h | 1 +
> > >  mm/backing-dev.c                 | 4 +---
> > >  2 files changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/backing-dev-defs.h
> > b/include/linux/backing-dev-defs.h
> > > index 33207004cfde..35a093384518 100644
> > > --- a/include/linux/backing-dev-defs.h
> > > +++ b/include/linux/backing-dev-defs.h
> > > @@ -202,6 +202,7 @@ struct backing_dev_info {
> > >  #ifdef CONFIG_DEBUG_FS
> > >       struct dentry *debug_dir;
> > >  #endif
> > > +     struct rcu_head rcu;
> > >  };
> >
> > >Instead of growing struct backing_dev_info, it seems to me this
> > rcu_head
> > >could be placed in a union with rb_node, since it will have been
> > removed
> > >from the bdi_tree by this point and the tree is never walked under
> > >RCU protection?
> >
> >
> > Thanks for your advice, I find this bdi_tree is traversed under the
> > protection of a spin lock, not under the protection of RCU.
> > I find this modification does not avoid the problem described in patch,
> > the flush_delayed_work() may be called in release_bdi()
> > The same will cause problems.
> > may be  we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
> > i_callback) or do you have any better suggestions?

What? All I was suggesting was:

+++ b/include/linux/backing-dev-defs.h
@@ -168,7 +168,10 @@ struct bdi_writeback {

struct backing_dev_info {
u64 id;
- struct rb_node rb_node; /* keyed by ->id */
+ union {
+ struct rb_node rb_node; /* keyed by ->id */
+ struct rcu_head rcu;
+ };
struct list_head bdi_list;
unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
unsigned long io_pages; /* max allowed IO size */


Christoph, independent of the inode lifetime problem, this actually seems
like a good approach to take. I don't see why we should synchronize_rcu()
here? Adding Jens (original introducer of the synchronize_rcu()), Mikulas
(converted it to use _expedited) and Tejun (worked around a problem when
using _expedited).

2021-10-16 03:30:10

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited()

On Fri, Oct 15, 2021 at 01:35:56PM +0100, Matthew Wilcox wrote:
> struct backing_dev_info {
> u64 id;
> - struct rb_node rb_node; /* keyed by ->id */
> + union {
> + struct rb_node rb_node; /* keyed by ->id */
> + struct rcu_head rcu;
> + };
> struct list_head bdi_list;
> unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
> unsigned long io_pages; /* max allowed IO size */
>
>
> Christoph, independent of the inode lifetime problem, this actually seems
> like a good approach to take. I don't see why we should synchronize_rcu()
> here? Adding Jens (original introducer of the synchronize_rcu()), Mikulas
> (converted it to use _expedited) and Tejun (worked around a problem when
> using _expedited).

The kfree+rcu + your suggestion does seem like a good idea in general to
me. But I'd still like to fix the actual bug being reported before
optimizing the area in a way that papers over the bug.

2021-10-18 03:50:13

by Zqiang

[permalink] [raw]
Subject: Re: [PATCH] mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited()


On 2021/10/15 下午8:35, Matthew Wilcox wrote:
> On Fri, Oct 15, 2021 at 01:06:02PM +0800, Zqiang wrote:
>> On 2021/10/15 上午10:57, Qiang Zhang wrote:
>>>
>>> Matthew Wilcox <[email protected] <mailto:[email protected]>>
>>> 于2021年10月14日周四 下午7:26写道:
>>>
>>> On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
>>> > The bdi_remove_from_list() is called in RCU softirq, however the
>>> > synchronize_rcu_expedited() will produce sleep action, use
>>> kfree_rcu()
>>> > instead of it.
>>> >
>>> > Reported-by: Hao Sun <[email protected]
>>> <mailto:[email protected]>>
>>> > Signed-off-by: Zqiang <[email protected]
>>> <mailto:[email protected]>>
>>> > ---
>>> >  include/linux/backing-dev-defs.h | 1 +
>>> >  mm/backing-dev.c                 | 4 +---
>>> >  2 files changed, 2 insertions(+), 3 deletions(-)
>>> >
>>> > diff --git a/include/linux/backing-dev-defs.h
>>> b/include/linux/backing-dev-defs.h
>>> > index 33207004cfde..35a093384518 100644
>>> > --- a/include/linux/backing-dev-defs.h
>>> > +++ b/include/linux/backing-dev-defs.h
>>> > @@ -202,6 +202,7 @@ struct backing_dev_info {
>>> >  #ifdef CONFIG_DEBUG_FS
>>> >       struct dentry *debug_dir;
>>> >  #endif
>>> > +     struct rcu_head rcu;
>>> >  };
>>>
>>> >Instead of growing struct backing_dev_info, it seems to me this
>>> rcu_head
>>> >could be placed in a union with rb_node, since it will have been
>>> removed
>>> >from the bdi_tree by this point and the tree is never walked under
>>> >RCU protection?
>>>
>>>
>>> Thanks for your advice, I find this bdi_tree is traversed under the
>>> protection of a spin lock, not under the protection of RCU.
>>> I find this modification does not avoid the problem described in patch,
>>> the flush_delayed_work() may be called in release_bdi()
>>> The same will cause problems.
>>> may be  we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
>>> i_callback) or do you have any better suggestions?
> What? All I was suggesting was:
>
> +++ b/include/linux/backing-dev-defs.h
> @@ -168,7 +168,10 @@ struct bdi_writeback {
>
> struct backing_dev_info {
> u64 id;
> - struct rb_node rb_node; /* keyed by ->id */
> + union {
> + struct rb_node rb_node; /* keyed by ->id */
> + struct rcu_head rcu;
> + };
> struct list_head bdi_list;
> unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
> unsigned long io_pages; /* max allowed IO size */
>
>
> Christoph, independent of the inode lifetime problem, this actually seems
> like a good approach to take. I don't see why we should synchronize_rcu()
> here? Adding Jens (original introducer of the synchronize_rcu()), Mikulas
> (converted it to use _expedited) and Tejun (worked around a problem when
> using _expedited).

Sorry,this my mistake.   this problem and the inode lifetime cycle are
two different problems

Can this modification which use kfree_rcu() instead of synchronize_rcu()
be accepted?


Thanks

Zqiang