2016-06-15 20:49:57

by Calvin Owens

[permalink] [raw]
Subject: WARNING at fs/ext4/inode.c ext4_evict_inode() triggers on 4.0

Hi Jan,

I'm hitting the following warning on a 4.0 kernel:

WARNING: CPU: 15 PID: 1005611 at fs/ext4/inode.c:233 ext4_evict_inode+0x4be/0x4d0()
CPU: 15 PID: 1005611 Comm: rocksdb:bg0 Not tainted 4.0.9-60_fbk10_rc1_3974_g796b9b6 #1
Call Trace:
[<ffffffff8176af8e>] dump_stack+0x4d/0x63
[<ffffffff8106c4dc>] warn_slowpath_common+0x8c/0xd0
[<ffffffff8106c5da>] warn_slowpath_null+0x1a/0x20
[<ffffffff81242c0e>] ext4_evict_inode+0x4be/0x4d0
[<ffffffff811d248b>] evict+0xbb/0x190
[<ffffffff811d2d6d>] iput+0x17d/0x1e0
[<ffffffff811ce0e0>] __dentry_kill+0x190/0x1e0
[<ffffffff811ce2d1>] dput+0x1a1/0x1f0
[<ffffffff811b952a>] __fput+0x17a/0x210
[<ffffffff811b960e>] ____fput+0xe/0x10
[<ffffffff81086dbf>] task_work_run+0xbf/0x100
[<ffffffff81002bcc>] do_notify_resume+0x7c/0x90
[<ffffffff81771529>] int_signal+0x12/0x17

Commit 822dbba ("ext4: fix warning in ext4_evict_inode()") proportedly fixed
this in 3.11. The check was entirely removed in 4.6.

Is it interesting to you that this triggers on 4.0? I can revert 600be30 and
see if I can reproduce it on upstream, but since the check got removed I
was wondering if there was post-4.0 work that makes it obsolete?

Thanks,
Calvin

--
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=822dbba
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=600be30


2016-06-16 08:33:07

by Jan Kara

[permalink] [raw]
Subject: Re: WARNING at fs/ext4/inode.c ext4_evict_inode() triggers on 4.0

Hi Calvin,

On Wed 15-06-16 16:49:39, Calvin Owens wrote:
> I'm hitting the following warning on a 4.0 kernel:
>
> WARNING: CPU: 15 PID: 1005611 at fs/ext4/inode.c:233 ext4_evict_inode+0x4be/0x4d0()
> CPU: 15 PID: 1005611 Comm: rocksdb:bg0 Not tainted 4.0.9-60_fbk10_rc1_3974_g796b9b6 #1
> Call Trace:
> [<ffffffff8176af8e>] dump_stack+0x4d/0x63
> [<ffffffff8106c4dc>] warn_slowpath_common+0x8c/0xd0
> [<ffffffff8106c5da>] warn_slowpath_null+0x1a/0x20
> [<ffffffff81242c0e>] ext4_evict_inode+0x4be/0x4d0
> [<ffffffff811d248b>] evict+0xbb/0x190
> [<ffffffff811d2d6d>] iput+0x17d/0x1e0
> [<ffffffff811ce0e0>] __dentry_kill+0x190/0x1e0
> [<ffffffff811ce2d1>] dput+0x1a1/0x1f0
> [<ffffffff811b952a>] __fput+0x17a/0x210
> [<ffffffff811b960e>] ____fput+0xe/0x10
> [<ffffffff81086dbf>] task_work_run+0xbf/0x100
> [<ffffffff81002bcc>] do_notify_resume+0x7c/0x90
> [<ffffffff81771529>] int_signal+0x12/0x17
>
> Commit 822dbba ("ext4: fix warning in ext4_evict_inode()") proportedly fixed
> this in 3.11. The check was entirely removed in 4.6.

OK, so this is the warning:

WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));

It was removed in 4.6 since we maintained i_ioend_count only to be able to
do this check and it didn't trigger for a long time. So it is interesting
that it actually triggered for you with 4.0.

> Is it interesting to you that this triggers on 4.0? I can revert 600be30 and
> see if I can reproduce it on upstream, but since the check got removed I
> was wondering if there was post-4.0 work that makes it obsolete?

It would be great. I'm attaching a revert and an additional debug patch. If
you can run with these two on the latest kernel (or even just apply the debug
patch on top of 4.0) and reproduce the issue with it, I would be grateful.

Also if you reproduce the issue, please send me also your System.map so that
I can map the address of the function where ioend was created. Thanks!

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-06-17 17:40:56

by Calvin Owens

[permalink] [raw]
Subject: Re: WARNING at fs/ext4/inode.c ext4_evict_inode() triggers on 4.0

On Thursday 06/16 at 10:33 +0200, Jan Kara wrote:
> Hi Calvin,
>
> On Wed 15-06-16 16:49:39, Calvin Owens wrote:
> > I'm hitting the following warning on a 4.0 kernel:
> >
> > WARNING: CPU: 15 PID: 1005611 at fs/ext4/inode.c:233 ext4_evict_inode+0x4be/0x4d0()
> > CPU: 15 PID: 1005611 Comm: rocksdb:bg0 Not tainted 4.0.9-60_fbk10_rc1_3974_g796b9b6 #1
> > Call Trace:
> > [<ffffffff8176af8e>] dump_stack+0x4d/0x63
> > [<ffffffff8106c4dc>] warn_slowpath_common+0x8c/0xd0
> > [<ffffffff8106c5da>] warn_slowpath_null+0x1a/0x20
> > [<ffffffff81242c0e>] ext4_evict_inode+0x4be/0x4d0
> > [<ffffffff811d248b>] evict+0xbb/0x190
> > [<ffffffff811d2d6d>] iput+0x17d/0x1e0
> > [<ffffffff811ce0e0>] __dentry_kill+0x190/0x1e0
> > [<ffffffff811ce2d1>] dput+0x1a1/0x1f0
> > [<ffffffff811b952a>] __fput+0x17a/0x210
> > [<ffffffff811b960e>] ____fput+0xe/0x10
> > [<ffffffff81086dbf>] task_work_run+0xbf/0x100
> > [<ffffffff81002bcc>] do_notify_resume+0x7c/0x90
> > [<ffffffff81771529>] int_signal+0x12/0x17
> >
> > Commit 822dbba ("ext4: fix warning in ext4_evict_inode()") proportedly fixed
> > this in 3.11. The check was entirely removed in 4.6.
>
> OK, so this is the warning:
>
> WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
>
> It was removed in 4.6 since we maintained i_ioend_count only to be able to
> do this check and it didn't trigger for a long time. So it is interesting
> that it actually triggered for you with 4.0.
>
> > Is it interesting to you that this triggers on 4.0? I can revert 600be30 and
> > see if I can reproduce it on upstream, but since the check got removed I
> > was wondering if there was post-4.0 work that makes it obsolete?
>
> It would be great. I'm attaching a revert and an additional debug patch. If
> you can run with these two on the latest kernel (or even just apply the debug
> patch on top of 4.0) and reproduce the issue with it, I would be grateful.

Forgot to attach the patches? Or did my mailserver eat them? :)

Thanks,
Calvin

> Also if you reproduce the issue, please send me also your System.map so that
> I can map the address of the function where ioend was created. Thanks!
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2016-06-17 18:04:42

by Jan Kara

[permalink] [raw]
Subject: Re: WARNING at fs/ext4/inode.c ext4_evict_inode() triggers on 4.0

On Fri 17-06-16 13:40:33, Calvin Owens wrote:
> On Thursday 06/16 at 10:33 +0200, Jan Kara wrote:
> > Hi Calvin,
> >
> > On Wed 15-06-16 16:49:39, Calvin Owens wrote:
> > > I'm hitting the following warning on a 4.0 kernel:
> > >
> > > WARNING: CPU: 15 PID: 1005611 at fs/ext4/inode.c:233 ext4_evict_inode+0x4be/0x4d0()
> > > CPU: 15 PID: 1005611 Comm: rocksdb:bg0 Not tainted 4.0.9-60_fbk10_rc1_3974_g796b9b6 #1
> > > Call Trace:
> > > [<ffffffff8176af8e>] dump_stack+0x4d/0x63
> > > [<ffffffff8106c4dc>] warn_slowpath_common+0x8c/0xd0
> > > [<ffffffff8106c5da>] warn_slowpath_null+0x1a/0x20
> > > [<ffffffff81242c0e>] ext4_evict_inode+0x4be/0x4d0
> > > [<ffffffff811d248b>] evict+0xbb/0x190
> > > [<ffffffff811d2d6d>] iput+0x17d/0x1e0
> > > [<ffffffff811ce0e0>] __dentry_kill+0x190/0x1e0
> > > [<ffffffff811ce2d1>] dput+0x1a1/0x1f0
> > > [<ffffffff811b952a>] __fput+0x17a/0x210
> > > [<ffffffff811b960e>] ____fput+0xe/0x10
> > > [<ffffffff81086dbf>] task_work_run+0xbf/0x100
> > > [<ffffffff81002bcc>] do_notify_resume+0x7c/0x90
> > > [<ffffffff81771529>] int_signal+0x12/0x17
> > >
> > > Commit 822dbba ("ext4: fix warning in ext4_evict_inode()") proportedly fixed
> > > this in 3.11. The check was entirely removed in 4.6.
> >
> > OK, so this is the warning:
> >
> > WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
> >
> > It was removed in 4.6 since we maintained i_ioend_count only to be able to
> > do this check and it didn't trigger for a long time. So it is interesting
> > that it actually triggered for you with 4.0.
> >
> > > Is it interesting to you that this triggers on 4.0? I can revert 600be30 and
> > > see if I can reproduce it on upstream, but since the check got removed I
> > > was wondering if there was post-4.0 work that makes it obsolete?
> >
> > It would be great. I'm attaching a revert and an additional debug patch. If
> > you can run with these two on the latest kernel (or even just apply the debug
> > patch on top of 4.0) and reproduce the issue with it, I would be grateful.
>
> Forgot to attach the patches? Or did my mailserver eat them? :)

Sorry, probably forgot to attach. Here they are.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (2.18 kB)
0001-Revert-ext4-remove-i_ioend_count.patch (2.91 kB)
0002-ext4-Debug-outstanding-io_ends.patch (4.55 kB)
Download all attachments

2016-06-30 07:00:36

by Jan Kara

[permalink] [raw]
Subject: Re: WARNING at fs/ext4/inode.c ext4_evict_inode() triggers on 4.0

On Fri 17-06-16 20:04:34, Jan Kara wrote:
> On Fri 17-06-16 13:40:33, Calvin Owens wrote:
> > On Thursday 06/16 at 10:33 +0200, Jan Kara wrote:
> > > Hi Calvin,
> > >
> > > On Wed 15-06-16 16:49:39, Calvin Owens wrote:
> > > > I'm hitting the following warning on a 4.0 kernel:
> > > >
> > > > WARNING: CPU: 15 PID: 1005611 at fs/ext4/inode.c:233 ext4_evict_inode+0x4be/0x4d0()
> > > > CPU: 15 PID: 1005611 Comm: rocksdb:bg0 Not tainted 4.0.9-60_fbk10_rc1_3974_g796b9b6 #1
> > > > Call Trace:
> > > > [<ffffffff8176af8e>] dump_stack+0x4d/0x63
> > > > [<ffffffff8106c4dc>] warn_slowpath_common+0x8c/0xd0
> > > > [<ffffffff8106c5da>] warn_slowpath_null+0x1a/0x20
> > > > [<ffffffff81242c0e>] ext4_evict_inode+0x4be/0x4d0
> > > > [<ffffffff811d248b>] evict+0xbb/0x190
> > > > [<ffffffff811d2d6d>] iput+0x17d/0x1e0
> > > > [<ffffffff811ce0e0>] __dentry_kill+0x190/0x1e0
> > > > [<ffffffff811ce2d1>] dput+0x1a1/0x1f0
> > > > [<ffffffff811b952a>] __fput+0x17a/0x210
> > > > [<ffffffff811b960e>] ____fput+0xe/0x10
> > > > [<ffffffff81086dbf>] task_work_run+0xbf/0x100
> > > > [<ffffffff81002bcc>] do_notify_resume+0x7c/0x90
> > > > [<ffffffff81771529>] int_signal+0x12/0x17
> > > >
> > > > Commit 822dbba ("ext4: fix warning in ext4_evict_inode()") proportedly fixed
> > > > this in 3.11. The check was entirely removed in 4.6.
> > >
> > > OK, so this is the warning:
> > >
> > > WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
> > >
> > > It was removed in 4.6 since we maintained i_ioend_count only to be able to
> > > do this check and it didn't trigger for a long time. So it is interesting
> > > that it actually triggered for you with 4.0.
> > >
> > > > Is it interesting to you that this triggers on 4.0? I can revert 600be30 and
> > > > see if I can reproduce it on upstream, but since the check got removed I
> > > > was wondering if there was post-4.0 work that makes it obsolete?
> > >
> > > It would be great. I'm attaching a revert and an additional debug patch. If
> > > you can run with these two on the latest kernel (or even just apply the debug
> > > patch on top of 4.0) and reproduce the issue with it, I would be grateful.
> >
> > Forgot to attach the patches? Or did my mailserver eat them? :)
>
> Sorry, probably forgot to attach. Here they are.

Any luck with the patch?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-07-08 00:24:26

by Calvin Owens

[permalink] [raw]
Subject: Re: WARNING at fs/ext4/inode.c ext4_evict_inode() triggers on 4.0

On 06/30/2016 12:00 AM, Jan Kara wrote:
> On Fri 17-06-16 20:04:34, Jan Kara wrote:
>> On Fri 17-06-16 13:40:33, Calvin Owens wrote:
>>> On Thursday 06/16 at 10:33 +0200, Jan Kara wrote:
>>>> Hi Calvin,
>>>>
>>>> On Wed 15-06-16 16:49:39, Calvin Owens wrote:
>>>>> I'm hitting the following warning on a 4.0 kernel:
>>>>>
>>>>> WARNING: CPU: 15 PID: 1005611 at fs/ext4/inode.c:233 ext4_evict_inode+0x4be/0x4d0()
>>>>> CPU: 15 PID: 1005611 Comm: rocksdb:bg0 Not tainted 4.0.9-60_fbk10_rc1_3974_g796b9b6 #1
>>>>> Call Trace:
>>>>> [<ffffffff8176af8e>] dump_stack+0x4d/0x63
>>>>> [<ffffffff8106c4dc>] warn_slowpath_common+0x8c/0xd0
>>>>> [<ffffffff8106c5da>] warn_slowpath_null+0x1a/0x20
>>>>> [<ffffffff81242c0e>] ext4_evict_inode+0x4be/0x4d0
>>>>> [<ffffffff811d248b>] evict+0xbb/0x190
>>>>> [<ffffffff811d2d6d>] iput+0x17d/0x1e0
>>>>> [<ffffffff811ce0e0>] __dentry_kill+0x190/0x1e0
>>>>> [<ffffffff811ce2d1>] dput+0x1a1/0x1f0
>>>>> [<ffffffff811b952a>] __fput+0x17a/0x210
>>>>> [<ffffffff811b960e>] ____fput+0xe/0x10
>>>>> [<ffffffff81086dbf>] task_work_run+0xbf/0x100
>>>>> [<ffffffff81002bcc>] do_notify_resume+0x7c/0x90
>>>>> [<ffffffff81771529>] int_signal+0x12/0x17
>>>>>
>>>>> Commit 822dbba ("ext4: fix warning in ext4_evict_inode()") proportedly fixed
>>>>> this in 3.11. The check was entirely removed in 4.6.
>>>>
>>>> OK, so this is the warning:
>>>>
>>>> WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
>>>>
>>>> It was removed in 4.6 since we maintained i_ioend_count only to be able to
>>>> do this check and it didn't trigger for a long time. So it is interesting
>>>> that it actually triggered for you with 4.0.
>>>>
>>>>> Is it interesting to you that this triggers on 4.0? I can revert 600be30 and
>>>>> see if I can reproduce it on upstream, but since the check got removed I
>>>>> was wondering if there was post-4.0 work that makes it obsolete?
>>>>
>>>> It would be great. I'm attaching a revert and an additional debug patch. If
>>>> you can run with these two on the latest kernel (or even just apply the debug
>>>> patch on top of 4.0) and reproduce the issue with it, I would be grateful.
>>>
>>> Forgot to attach the patches? Or did my mailserver eat them? :)
>>
>> Sorry, probably forgot to attach. Here they are.
>
> Any luck with the patch?

No luck at all :/

Whatever triggers this is extraordinarily rare, and I haven't been able to pin it
down: I see the WARN from 1-2 random hosts each day across all of FB's servers.

I'll keep at it and let you know if I get anything, hopefully soon.

Thanks,
Calvin

> Honza
>