2016-08-03 07:22:08

by Angel Shtilianov

[permalink] [raw]
Subject: Sleeping function called in invalid context

While doing some testing on today's checkout of Linus' master branch I got the following:

[ 9.302725] BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358
[ 9.304403] in_atomic(): 1, irqs_disabled(): 0, pid: 1718, name: mount
[ 9.305633] 8 locks held by mount/1718:
[ 9.306382] #0: (sb_writers#6){.+.+.+}, at: [<ffffffff811f0473>] __sb_start_write+0xd3/0xf0
[ 9.308178] #1: (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff811f9574>] lock_rename+0xe4/0x120
[ 9.310096] #2: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff8120c157>] lock_two_nondirectories+0x47/0x90
[ 9.312195] #3: (&sb->s_type->i_mutex_key#9/4){+.+.+.}, at: [<ffffffff8120c18f>] lock_two_nondirectories+0x7f/0x90
[ 9.314315] #4: (&sbi->s_journal_flag_rwsem){.+.+.+}, at: [<ffffffff812988c0>] ext4_writepages+0x50/0xff0
[ 9.316278] #5: (jbd2_handle){++++..}, at: [<ffffffff812e8d43>] start_this_handle+0x143/0x4c0
[ 9.318032] #6: (&ei->i_data_sem){++++..}, at: [<ffffffff81294803>] ext4_map_blocks+0x123/0x540
[ 9.319793] #7: (&(&bgl->locks[i].lock)->rlock){+.+...}, at: [<ffffffff812d0df3>] ext4_mb_init_cache+0x3e3/0xa70
[ 9.321887] CPU: 0 PID: 1718 Comm: mount Not tainted 4.7.0-clouder1 #5
[ 9.322873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[ 9.322873] 0000000000000000 ffff88003a4d3058 ffffffff81375c97 ffffffff819c29f9
[ 9.322873] 0000000000000166 ffff880038a41500 ffffffff819c29f9 ffff88003a4d3088
[ 9.322873] ffffffff8108dc43 ffff88003a4d3088 0000000000000000 0000000000000166
[ 9.322873] Call Trace:
[ 9.322873] [<ffffffff81375c97>] dump_stack+0x6b/0xa4
[ 9.322873] [<ffffffff8108dc43>] ___might_sleep+0x183/0x240
[ 9.322873] [<ffffffff8108dd52>] __might_sleep+0x52/0x90
[ 9.322873] [<ffffffff812ae9c1>] ext4_commit_super+0x191/0x2a0
[ 9.322873] [<ffffffff812af790>] __ext4_grp_locked_error+0x170/0x240
[ 9.322873] [<ffffffff8108dd52>] ? __might_sleep+0x52/0x90
[ 9.322873] [<ffffffff812cf00c>] ext4_mb_generate_buddy+0x20c/0x360
[ 9.322873] [<ffffffff812d0e61>] ext4_mb_init_cache+0x451/0xa70
[ 9.322873] [<ffffffff812d167c>] ext4_mb_init_group+0x1fc/0x2b0
[ 9.322873] [<ffffffff812d73dd>] ? ext4_mb_new_blocks+0x24d/0x1160
[ 9.322873] [<ffffffff812d18af>] ext4_mb_good_group+0x17f/0x190
[ 9.322873] [<ffffffff812d6fa7>] ext4_mb_regular_allocator+0x2c7/0x4b0
[ 9.322873] [<ffffffff812d05ae>] ? ext4_mb_use_preallocated+0x3e/0x4a0
[ 9.322873] [<ffffffff811c9e0c>] ? kmem_cache_alloc+0x24c/0x2d0
[ 9.322873] [<ffffffff812d73dd>] ? ext4_mb_new_blocks+0x24d/0x1160
[ 9.322873] [<ffffffff812cdea4>] ? ext4_mb_initialize_context+0x84/0x1a0
[ 9.322873] [<ffffffff812d7a84>] ext4_mb_new_blocks+0x8f4/0x1160
[ 9.322873] [<ffffffff810b1bf6>] ? check_usage+0x86/0x4c0
[ 9.322873] [<ffffffff810b2f9b>] ? __lock_acquire+0x39b/0x1800
[ 9.322873] [<ffffffff810b379e>] ? __lock_acquire+0xb9e/0x1800
[ 9.322873] [<ffffffff812c28de>] ? ext4_find_extent+0x23e/0x2b0
[ 9.322873] [<ffffffff810c6fff>] ? rcu_read_lock_sched_held+0x4f/0x90
[ 9.322873] [<ffffffff811caf84>] ? __kmalloc+0x274/0x320
[ 9.322873] [<ffffffff812c28de>] ? ext4_find_extent+0x23e/0x2b0
[ 9.322873] [<ffffffff812c28de>] ? ext4_find_extent+0x23e/0x2b0
[ 9.322873] [<ffffffff812c7f5c>] ext4_ext_map_blocks+0xcfc/0x1140
[ 9.322873] [<ffffffff812de7d8>] ? ext4_es_lookup_extent+0x58/0x3a0
[ 9.322873] [<ffffffff81294803>] ? ext4_map_blocks+0x123/0x540
[ 9.322873] [<ffffffff81294803>] ? ext4_map_blocks+0x123/0x540
[ 9.322873] [<ffffffff81294827>] ext4_map_blocks+0x147/0x540
[ 9.322873] [<ffffffff81298ea8>] ? ext4_writepages+0x638/0xff0
[ 9.322873] [<ffffffff812991c4>] ext4_writepages+0x954/0xff0
[ 9.322873] [<ffffffff810b0908>] ? add_lock_to_list.clone.0+0x78/0xb0
[ 9.322873] [<ffffffff810b379e>] ? __lock_acquire+0xb9e/0x1800
[ 9.322873] [<ffffffff816fc1db>] ? _raw_spin_unlock+0x2b/0x40
[ 9.322873] [<ffffffff81223fb9>] ? wbc_attach_and_unlock_inode+0x179/0x290
[ 9.322873] [<ffffffff8116eb43>] do_writepages+0x23/0x40
[ 9.322873] [<ffffffff8115c6a5>] __filemap_fdatawrite_range+0xb5/0x100
[ 9.322873] [<ffffffff810b0908>] ? add_lock_to_list.clone.0+0x78/0xb0
[ 9.322873] [<ffffffff8115c8fc>] filemap_flush+0x1c/0x20
[ 9.322873] [<ffffffff81293bf6>] ext4_alloc_da_blocks+0x56/0x160
[ 9.322873] [<ffffffff812a7655>] ext4_rename+0x665/0x900
[ 9.322873] [<ffffffff8120c18f>] ? lock_two_nondirectories+0x7f/0x90
[ 9.322873] [<ffffffff8120c18f>] ? lock_two_nondirectories+0x7f/0x90
[ 9.322873] [<ffffffff812a7912>] ext4_rename2+0x22/0x40
[ 9.322873] [<ffffffff811fa5ec>] vfs_rename+0x27c/0x5e0
[ 9.322873] [<ffffffff811fea71>] SyS_renameat2+0x431/0x4c0
[ 9.322873] [<ffffffff811f88ce>] SyS_rename+0x1e/0x20
[ 9.322873] [<ffffffff816fc365>] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 9.322873] [<ffffffff816fc31a>] ? entry_SYSCALL_64_after_swapgs+0x17/0x4a

Line 358 is lock_buffer


2016-08-04 16:05:58

by Jan Kara

[permalink] [raw]
Subject: Re: Sleeping function called in invalid context

On Wed 03-08-16 10:22:03, Nikolay Borisov wrote:
> While doing some testing on today's checkout of Linus' master branch I
> got the following:

>
> [ 9.302725] BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358
> [ 9.304403] in_atomic(): 1, irqs_disabled(): 0, pid: 1718, name: mount
> [ 9.305633] 8 locks held by mount/1718:

Yeah, this looks like a regression cause by commit 4743f83990614af "ext4:
Fix WARN_ON_ONCE in ext4_commit_super()". Arguably that cure is worse than
the disease but OTOH calling ext4_commit_super() from an atomic context
(like __ext4_grp_locked_error() does) sucks as well.

I'm not sure what the right fix is here. The cleanest would probably be to
always drop group lock in __ext4_grp_locked_error() and make sure we always
properly bail out of mballoc code on such error. But that's a non-trivial
amount of work. Not sure if other ext4 people have opinion on this?

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-08-04 20:59:01

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Sleeping function called in invalid context

On Thu, Aug 04, 2016 at 06:05:50PM +0200, Jan Kara wrote:
> On Wed 03-08-16 10:22:03, Nikolay Borisov wrote:
> > While doing some testing on today's checkout of Linus' master branch I
> > got the following:
>
> >
> > [ 9.302725] BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358
> > [ 9.304403] in_atomic(): 1, irqs_disabled(): 0, pid: 1718, name: mount
> > [ 9.305633] 8 locks held by mount/1718:
>
> Yeah, this looks like a regression cause by commit 4743f83990614af "ext4:
> Fix WARN_ON_ONCE in ext4_commit_super()". Arguably that cure is worse than
> the disease but OTOH calling ext4_commit_super() from an atomic context
> (like __ext4_grp_locked_error() does) sucks as well.
>
> I'm not sure what the right fix is here. The cleanest would probably be to
> always drop group lock in __ext4_grp_locked_error() and make sure we always
> properly bail out of mballoc code on such error. But that's a non-trivial
> amount of work. Not sure if other ext4 people have opinion on this?

The easist way to fix this is defer the ext4_commit_super() to a
workqueue. We only need this in the errors=continue case, and in that
scenario we're not in a hurry when the superblock gets written out.

In fact, we probably want to be doing this for all of the
errors=continue cases when we want to save the error state to the
superblock, so we can do the update properly using the journal,
instead of calling ext4_commit_super() which just force writes the
block.

(Of course, if the journal is aborted we'll need to fall back to using
ext4_commit_super, of course.)

- Ted

2016-08-05 06:30:04

by Angel Shtilianov

[permalink] [raw]
Subject: Re: Sleeping function called in invalid context



On 08/04/2016 11:58 PM, Theodore Ts'o wrote:
> On Thu, Aug 04, 2016 at 06:05:50PM +0200, Jan Kara wrote:
>> On Wed 03-08-16 10:22:03, Nikolay Borisov wrote:
>>> While doing some testing on today's checkout of Linus' master branch I
>>> got the following:
>>
>>>
>>> [ 9.302725] BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358
>>> [ 9.304403] in_atomic(): 1, irqs_disabled(): 0, pid: 1718, name: mount
>>> [ 9.305633] 8 locks held by mount/1718:
>>
>> Yeah, this looks like a regression cause by commit 4743f83990614af "ext4:
>> Fix WARN_ON_ONCE in ext4_commit_super()". Arguably that cure is worse than
>> the disease but OTOH calling ext4_commit_super() from an atomic context
>> (like __ext4_grp_locked_error() does) sucks as well.
>>
>> I'm not sure what the right fix is here. The cleanest would probably be to
>> always drop group lock in __ext4_grp_locked_error() and make sure we always
>> properly bail out of mballoc code on such error. But that's a non-trivial
>> amount of work. Not sure if other ext4 people have opinion on this?
>
> The easist way to fix this is defer the ext4_commit_super() to a
> workqueue. We only need this in the errors=continue case, and in that
> scenario we're not in a hurry when the superblock gets written out.

Is errors=continue the default option if nothing specifically is
specified at mount time, since I don't have this set explicitly:

/dev/vda / ext4 rw,relatime,data=ordered 0 0

>
> In fact, we probably want to be doing this for all of the
> errors=continue cases when we want to save the error state to the
> superblock, so we can do the update properly using the journal,
> instead of calling ext4_commit_super() which just force writes the
> block.
>
> (Of course, if the journal is aborted we'll need to fall back to using
> ext4_commit_super, of course.)
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2016-08-05 14:56:18

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Sleeping function called in invalid context

On Fri, Aug 05, 2016 at 09:29:59AM +0300, Nikolay Borisov wrote:
> > The easist way to fix this is defer the ext4_commit_super() to a
> > workqueue. We only need this in the errors=continue case, and in that
> > scenario we're not in a hurry when the superblock gets written out.
>
> Is errors=continue the default option if nothing specifically is
> specified at mount time, since I don't have this set explicitly:
>
> /dev/vda / ext4 rw,relatime,data=ordered 0 0

Yes, it's the default. I keep wondering whether we should change the
default to remount-ro or even panic, since people sometimes don't
notice that the "file system has been corrupted" messages, and then
they can end up losing a lot more detail if we forced them to address
the issue right away.

- Ted

2016-08-05 17:07:12

by Darrick J. Wong

[permalink] [raw]
Subject: Re: Sleeping function called in invalid context

On Fri, Aug 05, 2016 at 10:56:04AM -0400, Theodore Ts'o wrote:
> On Fri, Aug 05, 2016 at 09:29:59AM +0300, Nikolay Borisov wrote:
> > > The easist way to fix this is defer the ext4_commit_super() to a
> > > workqueue. We only need this in the errors=continue case, and in that
> > > scenario we're not in a hurry when the superblock gets written out.
> >
> > Is errors=continue the default option if nothing specifically is
> > specified at mount time, since I don't have this set explicitly:
> >
> > /dev/vda / ext4 rw,relatime,data=ordered 0 0
>
> Yes, it's the default. I keep wondering whether we should change the
> default to remount-ro or even panic, since people sometimes don't
> notice that the "file system has been corrupted" messages, and then
> they can end up losing a lot more detail if we forced them to address
> the issue right away.

Yes, please. :)

(At the moment I'm also trying to sort out some ext3 bug where jbd
flushes some dirty pages to disk ahead of writing out a transaction
but the dirty page writeback fails. Apparently the userspace program
is long gone by the time this happens (I haven't been able to verify
that they're actually calling fsync) and the FS doesn't shut down,
but a later umount/mount/reread shows old file contents. Hence I
now know about data_err=abort.)

--D

>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-08-11 19:52:38

by Andreas Dilger

[permalink] [raw]
Subject: Re: Sleeping function called in invalid context


> On Aug 5, 2016, at 8:56 AM, Theodore Ts'o <[email protected]> wrote:
>
> On Fri, Aug 05, 2016 at 09:29:59AM +0300, Nikolay Borisov wrote:
>>> The easist way to fix this is defer the ext4_commit_super() to a
>>> workqueue. We only need this in the errors=continue case, and in that
>>> scenario we're not in a hurry when the superblock gets written out.
>>
>> Is errors=continue the default option if nothing specifically is
>> specified at mount time, since I don't have this set explicitly:
>>
>> /dev/vda / ext4 rw,relatime,data=ordered 0 0
>
> Yes, it's the default. I keep wondering whether we should change the
> default to remount-ro or even panic, since people sometimes don't
> notice that the "file system has been corrupted" messages, and then
> they can end up losing a lot more detail if we forced them to address
> the issue right away.

I'd definitely be in favour of making the default "errors=remount-ro".
We've been setting that explicitly for years, since otherwise people
may not notice their ongoing problems until the filesystem completely
explodes.

Related to that, there is a Lustre patch to handle inconsistencies
between group descriptors and block/inode bitmaps by marking only the
group as unusable for new allocations, instead of marking the whole
filesystem in error. Is that something that is of interest to a wider
audience?

Patch against RHEL7 is attached, but could be updated for newer kernels
if there is interest.

Cheers, Andreas





Attachments:
ext4-corrupted-inode-block-bitmaps-handling-patches.patch (15.18 kB)
signature.asc (833.00 B)
Message signed with OpenPGP using GPGMail
Download all attachments