<IRQ>
__init_work+0x2d/0x50
synchronize_rcu_expedited+0x3af/0x650
bdi_remove_from_list [inline]
bdi_unregister+0x17f/0x5c0
release_bdi+0xa1/0xc0
kref_put [inline]
bdi_put+0x72/0xa0
bdev_free_inode+0x11e/0x220
i_callback+0x3f/0x70
rcu_do_batch [inline]
rcu_core+0x76d/0x16c0
__do_softirq+0x1d7/0x93b
invoke_softirq [inline]
__irq_exit_rcu [inline]
irq_exit_rcu+0xf2/0x130
sysvec_apic_timer_interrupt+0x93/0xc0
The bdi_remove_from_list() is called in RCU softirq, however the
synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
instead of it.
Reported-by: Hao Sun <[email protected]>
Signed-off-by: Zqiang <[email protected]>
---
include/linux/backing-dev-defs.h | 1 +
mm/backing-dev.c | 4 +---
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 33207004cfde..35a093384518 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -202,6 +202,7 @@ struct backing_dev_info {
#ifdef CONFIG_DEBUG_FS
struct dentry *debug_dir;
#endif
+ struct rcu_head rcu;
};
enum {
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index c878d995af06..45d866a3a4a2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -935,8 +935,6 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
rb_erase(&bdi->rb_node, &bdi_tree);
list_del_rcu(&bdi->bdi_list);
spin_unlock_bh(&bdi_lock);
-
- synchronize_rcu_expedited();
}
void bdi_unregister(struct backing_dev_info *bdi)
@@ -969,7 +967,7 @@ static void release_bdi(struct kref *ref)
bdi_unregister(bdi);
WARN_ON_ONCE(bdi->dev);
wb_exit(&bdi->wb);
- kfree(bdi);
+ kfree_rcu(bdi, rcu);
}
void bdi_put(struct backing_dev_info *bdi)
--
2.17.1
On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> The bdi_remove_from_list() is called in RCU softirq, however the
> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
> instead of it.
>
> Reported-by: Hao Sun <[email protected]>
> Signed-off-by: Zqiang <[email protected]>
> ---
> include/linux/backing-dev-defs.h | 1 +
> mm/backing-dev.c | 4 +---
> 2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> index 33207004cfde..35a093384518 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -202,6 +202,7 @@ struct backing_dev_info {
> #ifdef CONFIG_DEBUG_FS
> struct dentry *debug_dir;
> #endif
> + struct rcu_head rcu;
> };
Instead of growing struct backing_dev_info, it seems to me this rcu_head
could be placed in a union with rb_node, since it will have been removed
from the bdi_tree by this point and the tree is never walked under
RCU protection?
On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> <IRQ>
> __init_work+0x2d/0x50
> synchronize_rcu_expedited+0x3af/0x650
> bdi_remove_from_list [inline]
> bdi_unregister+0x17f/0x5c0
> release_bdi+0xa1/0xc0
> kref_put [inline]
> bdi_put+0x72/0xa0
> bdev_free_inode+0x11e/0x220
> i_callback+0x3f/0x70
> rcu_do_batch [inline]
> rcu_core+0x76d/0x16c0
> __do_softirq+0x1d7/0x93b
> invoke_softirq [inline]
> __irq_exit_rcu [inline]
> irq_exit_rcu+0xf2/0x130
> sysvec_apic_timer_interrupt+0x93/0xc0
>
> The bdi_remove_from_list() is called in RCU softirq, however the
> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
> instead of it.
What workload is this running? If we hit the RCU unregister path
from inode freeing we have a lifetime problem somewhere that we need
to fix first.
On 2021/10/14 下午7:24, Matthew Wilcox wrote:
> On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
>> The bdi_remove_from_list() is called in RCU softirq, however the
>> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
>> instead of it.
>>
>> Reported-by: Hao Sun <[email protected]>
>> Signed-off-by: Zqiang <[email protected]>
>> ---
>> include/linux/backing-dev-defs.h | 1 +
>> mm/backing-dev.c | 4 +---
>> 2 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
>> index 33207004cfde..35a093384518 100644
>> --- a/include/linux/backing-dev-defs.h
>> +++ b/include/linux/backing-dev-defs.h
>> @@ -202,6 +202,7 @@ struct backing_dev_info {
>> #ifdef CONFIG_DEBUG_FS
>> struct dentry *debug_dir;
>> #endif
>> + struct rcu_head rcu;
>> };
> Instead of growing struct backing_dev_info, it seems to me this rcu_head
> could be placed in a union with rb_node, since it will have been removed
> from the bdi_tree by this point and the tree is never walked under
> RCU protection?
>
Thanks for your advice, I find this bdi_tree is traversed under the
protection of a spin lock, not under the protection of RCU.
I find this modification does not avoid the problem described in patch,
the flush_delayed_work() may be called in release_bdi()
The same will cause problems.
may be we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
i_callback) or do you have any better suggestions?
Thanks
Zqiang
On Fri, Oct 15, 2021 at 01:06:02PM +0800, Zqiang wrote:
>
> On 2021/10/15 上午10:57, Qiang Zhang wrote:
> >
> >
> > Matthew Wilcox <[email protected] <mailto:[email protected]>>
> > 于2021年10月14日周四 下午7:26写道:
> >
> > On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> > > The bdi_remove_from_list() is called in RCU softirq, however the
> > > synchronize_rcu_expedited() will produce sleep action, use
> > kfree_rcu()
> > > instead of it.
> > >
> > > Reported-by: Hao Sun <[email protected]
> > <mailto:[email protected]>>
> > > Signed-off-by: Zqiang <[email protected]
> > <mailto:[email protected]>>
> > > ---
> > > include/linux/backing-dev-defs.h | 1 +
> > > mm/backing-dev.c | 4 +---
> > > 2 files changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/backing-dev-defs.h
> > b/include/linux/backing-dev-defs.h
> > > index 33207004cfde..35a093384518 100644
> > > --- a/include/linux/backing-dev-defs.h
> > > +++ b/include/linux/backing-dev-defs.h
> > > @@ -202,6 +202,7 @@ struct backing_dev_info {
> > > #ifdef CONFIG_DEBUG_FS
> > > struct dentry *debug_dir;
> > > #endif
> > > + struct rcu_head rcu;
> > > };
> >
> > >Instead of growing struct backing_dev_info, it seems to me this
> > rcu_head
> > >could be placed in a union with rb_node, since it will have been
> > removed
> > >from the bdi_tree by this point and the tree is never walked under
> > >RCU protection?
> >
> >
> > Thanks for your advice, I find this bdi_tree is traversed under the
> > protection of a spin lock, not under the protection of RCU.
> > I find this modification does not avoid the problem described in patch,
> > the flush_delayed_work() may be called in release_bdi()
> > The same will cause problems.
> > may be we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
> > i_callback) or do you have any better suggestions?
What? All I was suggesting was:
+++ b/include/linux/backing-dev-defs.h
@@ -168,7 +168,10 @@ struct bdi_writeback {
struct backing_dev_info {
u64 id;
- struct rb_node rb_node; /* keyed by ->id */
+ union {
+ struct rb_node rb_node; /* keyed by ->id */
+ struct rcu_head rcu;
+ };
struct list_head bdi_list;
unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
unsigned long io_pages; /* max allowed IO size */
Christoph, independent of the inode lifetime problem, this actually seems
like a good approach to take. I don't see why we should synchronize_rcu()
here? Adding Jens (original introducer of the synchronize_rcu()), Mikulas
(converted it to use _expedited) and Tejun (worked around a problem when
using _expedited).
On Fri, Oct 15, 2021 at 01:35:56PM +0100, Matthew Wilcox wrote:
> struct backing_dev_info {
> u64 id;
> - struct rb_node rb_node; /* keyed by ->id */
> + union {
> + struct rb_node rb_node; /* keyed by ->id */
> + struct rcu_head rcu;
> + };
> struct list_head bdi_list;
> unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
> unsigned long io_pages; /* max allowed IO size */
>
>
> Christoph, independent of the inode lifetime problem, this actually seems
> like a good approach to take. I don't see why we should synchronize_rcu()
> here? Adding Jens (original introducer of the synchronize_rcu()), Mikulas
> (converted it to use _expedited) and Tejun (worked around a problem when
> using _expedited).
The kfree+rcu + your suggestion does seem like a good idea in general to
me. But I'd still like to fix the actual bug being reported before
optimizing the area in a way that papers over the bug.
On 2021/10/15 下午8:35, Matthew Wilcox wrote:
> On Fri, Oct 15, 2021 at 01:06:02PM +0800, Zqiang wrote:
>> On 2021/10/15 上午10:57, Qiang Zhang wrote:
>>>
>>> Matthew Wilcox <[email protected] <mailto:[email protected]>>
>>> 于2021年10月14日周四 下午7:26写道:
>>>
>>> On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
>>> > The bdi_remove_from_list() is called in RCU softirq, however the
>>> > synchronize_rcu_expedited() will produce sleep action, use
>>> kfree_rcu()
>>> > instead of it.
>>> >
>>> > Reported-by: Hao Sun <[email protected]
>>> <mailto:[email protected]>>
>>> > Signed-off-by: Zqiang <[email protected]
>>> <mailto:[email protected]>>
>>> > ---
>>> > include/linux/backing-dev-defs.h | 1 +
>>> > mm/backing-dev.c | 4 +---
>>> > 2 files changed, 2 insertions(+), 3 deletions(-)
>>> >
>>> > diff --git a/include/linux/backing-dev-defs.h
>>> b/include/linux/backing-dev-defs.h
>>> > index 33207004cfde..35a093384518 100644
>>> > --- a/include/linux/backing-dev-defs.h
>>> > +++ b/include/linux/backing-dev-defs.h
>>> > @@ -202,6 +202,7 @@ struct backing_dev_info {
>>> > #ifdef CONFIG_DEBUG_FS
>>> > struct dentry *debug_dir;
>>> > #endif
>>> > + struct rcu_head rcu;
>>> > };
>>>
>>> >Instead of growing struct backing_dev_info, it seems to me this
>>> rcu_head
>>> >could be placed in a union with rb_node, since it will have been
>>> removed
>>> >from the bdi_tree by this point and the tree is never walked under
>>> >RCU protection?
>>>
>>>
>>> Thanks for your advice, I find this bdi_tree is traversed under the
>>> protection of a spin lock, not under the protection of RCU.
>>> I find this modification does not avoid the problem described in patch,
>>> the flush_delayed_work() may be called in release_bdi()
>>> The same will cause problems.
>>> may be we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
>>> i_callback) or do you have any better suggestions?
> What? All I was suggesting was:
>
> +++ b/include/linux/backing-dev-defs.h
> @@ -168,7 +168,10 @@ struct bdi_writeback {
>
> struct backing_dev_info {
> u64 id;
> - struct rb_node rb_node; /* keyed by ->id */
> + union {
> + struct rb_node rb_node; /* keyed by ->id */
> + struct rcu_head rcu;
> + };
> struct list_head bdi_list;
> unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
> unsigned long io_pages; /* max allowed IO size */
>
>
> Christoph, independent of the inode lifetime problem, this actually seems
> like a good approach to take. I don't see why we should synchronize_rcu()
> here? Adding Jens (original introducer of the synchronize_rcu()), Mikulas
> (converted it to use _expedited) and Tejun (worked around a problem when
> using _expedited).
Sorry,this my mistake. this problem and the inode lifetime cycle are
two different problems
Can this modification which use kfree_rcu() instead of synchronize_rcu()
be accepted?
Thanks
Zqiang