I can reproduce this fairly easily by using ext4 w/o a journal, running
under KVM with 1024megs memory, with fsstress (xfstests #13):
013 [ 45.152457]
[ 45.153294] =====================================
[ 45.154784] [ BUG: bad unlock balance detected! ]
[ 45.155591] 3.5.0-rc1-00002-gb22b1f1 #124 Not tainted
[ 45.155591] -------------------------------------
[ 45.155591] flush-254:16/2499 is trying to release lock (&(&wb->list_lock)->rlock) at:
[ 45.155591] [<c022c3da>] writeback_sb_inodes+0x160/0x327
[ 45.155591] but there are no more locks to release!
[ 45.155591]
[ 45.155591] other info that might help us debug this:
[ 45.155591] 1 lock held by flush-254:16/2499:
[ 45.155591] #0: (&sb->s_type->i_lock_key#13){+.+...}, at: [<c022c33d>] writeback_sb_inodes+0xc3/0x327
[ 45.155591]
[ 45.155591] stack backtrace:
[ 45.155591] Pid: 2499, comm: flush-254:16 Not tainted 3.5.0-rc1-00002-gb22b1f1 #124
[ 45.155591] Call Trace:
[ 45.155591] [<c022c3da>] ? writeback_sb_inodes+0x160/0x327
[ 45.155591] [<c019930e>] print_unlock_inbalance_bug+0xb4/0xc1
[ 45.155591] [<c022c3da>] ? writeback_sb_inodes+0x160/0x327
[ 45.155591] [<c022c3da>] ? writeback_sb_inodes+0x160/0x327
[ 45.155591] [<c019b5c2>] lock_release_non_nested+0x9f/0x1e8
[ 45.155591] [<c022c3da>] ? writeback_sb_inodes+0x160/0x327
[ 45.155591] [<c022c3da>] ? writeback_sb_inodes+0x160/0x327
[ 45.155591] [<c019b875>] lock_release+0x16a/0x18a
[ 45.155591] [<c06ec713>] _raw_spin_unlock+0x1b/0x25
[ 45.155591] [<c022c3da>] writeback_sb_inodes+0x160/0x327
[ 45.155591] [<c022c784>] wb_writeback+0xf2/0x1c1
[ 45.155591] [<c0160477>] ? _local_bh_enable_ip+0x9d/0xa6
[ 45.155591] [<c022c8c4>] wb_do_writeback+0x71/0x18c
[ 45.155591] [<c022ca7b>] bdi_writeback_thread+0x9c/0x18d
[ 45.155591] [<c0199b97>] ? trace_hardirqs_on+0xb/0xd
[ 45.155591] [<c022c9df>] ? wb_do_writeback+0x18c/0x18c
[ 45.155591] [<c017316a>] kthread+0x6c/0x71
[ 45.155591] [<c01730fe>] ? __init_kthread_worker+0x47/0x47
[ 45.155591] [<c06f237a>] kernel_thread_helper+0x6/0x10
[ 105.196666] INFO: rcu_sched self-detected stall on CPU { 1} (t=18000 jiffies)
[ 105.196670] Pid: 2499, comm: flush-254:16 Not tainted 3.5.0-rc1-00002-gb22b1f1 #124
[ 105.199991] INFO: rcu_sched self-detected stall on CPU { 0} (t=18000 jiffies)
On Fri 08-06-12 10:36:13, Ted Tso wrote:
>
> I can reproduce this fairly easily by using ext4 w/o a journal, running
> under KVM with 1024megs memory, with fsstress (xfstests #13):
Argh, I wonder how come I didn't hit this. Does attached patch fix the
problem?
Honza
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Fri, Jun 08, 2012 at 05:07:36PM +0200, Jan Kara wrote:
> On Fri 08-06-12 10:36:13, Ted Tso wrote:
> >
> > I can reproduce this fairly easily by using ext4 w/o a journal, running
> > under KVM with 1024megs memory, with fsstress (xfstests #13):
Good catch, thanks!
> Argh, I wonder how come I didn't hit this. Does attached patch fix the
> problem?
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 8d2fb8c..41a3ccf 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -664,6 +664,7 @@ static long writeback_sb_inodes(struct super_block *sb,
> /* Wait for I_SYNC. This function drops i_lock... */
> inode_sleep_on_writeback(inode);
> /* Inode may be gone, start again */
> + spin_lock(&wb->list_lock);
> continue;
> }
That looks like the fix. So I pushed it to writeback-for-next.
Thanks for the quick fix!
I have yet to set up xfstests to run regularly, so that I can catch this
kind of problem earlier in the future.
Thanks,
Fengguang
On Fri, Jun 8, 2012 at 5:45 PM, Fengguang Wu <[email protected]> wrote:
> That looks like the fix. So I pushed it to writeback-for-next.
> Thanks for the quick fix!
>
s/writeback-for-next/writeback-for-linus ?
- Sedat -
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 08, 2012 at 05:51:48PM +0200, Sedat Dilek wrote:
> s/writeback-for-next/writeback-for-linus ?
I use the same branch for next and linus. Before sending git pull
requests, I add a tag somewhere in the branch and ask Linus to pull
that tag :)
Thanks,
Fengguang
On Fri, Jun 08, 2012 at 05:07:36PM +0200, Jan Kara wrote:
> Argh, I wonder how come I didn't hit this. Does attached patch fix the
> problem?
Just to confirm (although there was little doubt), I've built a kernel
with this patch and it fixes the lockdep complaint.
- Ted
On Sat 09-06-12 00:45:35, Wu Fengguang wrote:
> That looks like the fix. So I pushed it to writeback-for-next.
> Thanks for the quick fix!
Attached is a version with proper sign-off and changelog. Just in case
you didn't create one yourself already.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Fri, Jun 08, 2012 at 04:28:40PM -0400, Theodore Ts'o wrote:
> On Fri, Jun 08, 2012 at 05:07:36PM +0200, Jan Kara wrote:
> > Argh, I wonder how come I didn't hit this. Does attached patch fix the
> > problem?
>
> Just to confirm (although there was little doubt), I've built a kernel
> with this patch and it fixes the lockdep complaint.
Great, thank you!
Here is the updated changelog:
writeback: Fix lock imbalance in writeback_sb_inodes()
Fix a bug introduced by commit 169ebd90: we have to hold wb->list_lock when
restarting the writeback loop after having waited for inode writeback.
Bug description by Ted Tso:
I can reproduce this fairly easily by using ext4 w/o a journal, running
under KVM with 1024megs memory, with fsstress (xfstests #13):
[ 45.153294] =====================================
[ 45.154784] [ BUG: bad unlock balance detected! ]
[ 45.155591] 3.5.0-rc1-00002-gb22b1f1 #124 Not tainted
[ 45.155591] -------------------------------------
[ 45.155591] flush-254:16/2499 is trying to release lock (&(&wb->list_lock)->rlock) at:
[ 45.155591] [<c022c3da>] writeback_sb_inodes+0x160/0x327
[ 45.155591] but there are no more locks to release!
Reported-by: Theodore Ts'o <[email protected]>
Tested-by: Theodore Ts'o <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Fengguang Wu <[email protected]>
Thanks,
Fengguang
On Sat, Jun 09, 2012 at 08:37:28AM +0900, Fengguang Wu wrote:
>
> Here is the updated changelog:
>
> writeback: Fix lock imbalance in writeback_sb_inodes()
>
> Fix a bug introduced by commit 169ebd90: we have to hold wb->list_lock when
> restarting the writeback loop after having waited for inode writeback.
>
> Bug description by Ted Tso:
>
> I can reproduce this fairly easily by using ext4 w/o a journal, running
> under KVM with 1024megs memory, with fsstress (xfstests #13):
Not that it matters that much, but I typo'ed the description; sorry
about that. The KVM only had 1024k of memory....
- Ted
On Fri, Jun 08, 2012 at 10:38:04PM -0400, Theodore Ts'o wrote:
> Not that it matters that much, but I typo'ed the description; sorry
> about that. The KVM only had 1024k of memory....
1MB memory? How do you manage to boot it? Recently I tried running a
big fat kernel (with almost everything built in) that can easily OOM
(at boot time, before swapon) even given 256MB memory...
Thanks,
Fengguang
On Sat, Jun 09, 2012 at 07:07:31PM +0800, Fengguang Wu wrote:
>
> 1MB memory? How do you manage to boot it? Recently I tried running a
> big fat kernel (with almost everything built in) that can easily OOM
> (at boot time, before swapon) even given 256MB memory...
Sorry, I was confused; it was megs. That'll teach me to send e-mails
when I'm really tired. I normally run with half that (to really
stress the paging/writeback systems), and I was thinking that a gig of
memory was way more than I normally use. I think I was running into
some failures at 512m, which is why it got bumped to 1024. I'll have
to try it again at 512 and see if it blows up or not.
- Ted
On Sat, 2012-06-09 at 00:45 +0900, Fengguang Wu wrote:
> I have yet to set up xfstests to run regularly, so that I can catch this
> kind of problem earlier in the future.
You definitely need to do this before merging anything upstream. I'd
recommend testing with at least ext4 and xfs, and probably btrfs too,
though I never tested it with xfstests.
--
Best Regards,
Artem Bityutskiy
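For anyone setting up the regular xfstests runs suggested above, a minimal configuration might look like the following sketch. The device nodes and mount points are assumptions for a KVM guest, not details from this thread:

```shell
# Hypothetical xfstests local.config -- adjust devices and paths to your VM.
export TEST_DEV=/dev/vdb          # small disk dedicated to testing
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/vdc
export SCRATCH_MNT=/mnt/scratch
export FSTYP=ext4
export MKFS_OPTIONS="-O ^has_journal"   # ext4 without a journal, as in Ted's repro

# Then, from the xfstests checkout, after formatting and mounting TEST_DEV:
#   mkfs.ext4 $MKFS_OPTIONS $TEST_DEV && mount $TEST_DEV $TEST_DIR
#   ./check 013          # the fsstress test that triggered this bug
```

Switching FSTYP between ext4, xfs, and btrfs then reuses the same setup for each filesystem.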
On Mon, Jun 11, 2012 at 02:21:16PM +0300, Artem Bityutskiy wrote:
Sure! I'll try to test all of them.
Thanks,
Fengguang