Hello Jan,
Just wanted to let you know I hit this[1] again on Linus' latest. The
setup/workload is *identical* to the reported one a few months ago.
[1] https://lkml.org/lkml/2013/8/1/532
Here's the complete output, I hope it helps...
------------[ cut here ]------------
WARNING: CPU: 42 PID: 74607 at fs/ext4/inode.c:230 ext4_evict_inode+0x4a6/0x4e0 [ext4]()
Modules linked in: fuse autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand pcc_cpufreq ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput iTCO_wdt iTCO_vendor_support ipmi_si ipmi_msghandler microcode pcspkr serio_raw lpc_ich mfd_core sg hpilo hpwdt i7core_edac edac_core netxen_nic freq_table ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif crct10dif_common qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 42 PID: 74607 Comm: reaim Tainted: G W 3.12.0-rc5 #40
Hardware name: HP ProLiant DL980 G7, BIOS P66 08/31/2222
00000000000000e6 ffff8908c6c59d28 ffffffff81550dd8 00000000000000e6
0000000000000000 ffff8908c6c59d68 ffffffff8104e29c ffff8908c6c59d78
ffff880afae83500 ffff880afae83608 ffff880afae83450 ffff880afae83500
Call Trace:
[<ffffffff81550dd8>] dump_stack+0x49/0x61
[<ffffffff8104e29c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff8104e2ea>] warn_slowpath_null+0x1a/0x20
[<ffffffffa03afe16>] ext4_evict_inode+0x4a6/0x4e0 [ext4]
[<ffffffff81197ab7>] evict+0xa7/0x1c0
[<ffffffff81197cbf>] iput_final+0xef/0x190
[<ffffffff81197d9e>] iput+0x3e/0x50
[<ffffffff81195858>] d_kill+0xd8/0x110
[<ffffffff81195a1b>] dput+0x18b/0x1c0
[<ffffffff8117f588>] __fput+0x188/0x270
[<ffffffff8117f73e>] ____fput+0xe/0x10
[<ffffffff8106d0bf>] task_work_run+0x8f/0xf0
[<ffffffff81003a84>] do_notify_resume+0x84/0x90
[<ffffffff8155d9da>] int_signal+0x12/0x17
---[ end trace ea7184a5fb48e185 ]---
Thanks,
Davidlohr
On Tue 15-10-13 19:02:04, Davidlohr Bueso wrote:
> Hello Jan,
>
> Just wanted to let you know I hit this[1] again on Linus' latest. The
> setup/workload is *identical* to the reported one a few months ago.
>
> [1] https://lkml.org/lkml/2013/8/1/532
>
> Here's the complete output, I hope it helps...
Thanks for the headup. Last time I wasn't able to reproduce this and
eventually forgot about the problem. Can I send you a debug patch and you
would run a kernel with it? Thanks.
Honza
> ------------[ cut here ]------------
> WARNING: CPU: 42 PID: 74607 at fs/ext4/inode.c:230 ext4_evict_inode+0x4a6/0x4e0 [ext4]()
> Modules linked in: fuse autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand pcc_cpufreq ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput iTCO_wdt iTCO_vendor_support ipmi_si ipmi_msghandler microcode pcspkr serio_raw lpc_ich mfd_core sg hpilo hpwdt i7core_edac edac_core netxen_nic freq_table ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif crct10dif_common qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
> CPU: 42 PID: 74607 Comm: reaim Tainted: G W 3.12.0-rc5 #40
> Hardware name: HP ProLiant DL980 G7, BIOS P66 08/31/2222
> 00000000000000e6 ffff8908c6c59d28 ffffffff81550dd8 00000000000000e6
> 0000000000000000 ffff8908c6c59d68 ffffffff8104e29c ffff8908c6c59d78
> ffff880afae83500 ffff880afae83608 ffff880afae83450 ffff880afae83500
> Call Trace:
> [<ffffffff81550dd8>] dump_stack+0x49/0x61
> [<ffffffff8104e29c>] warn_slowpath_common+0x8c/0xc0
> [<ffffffff8104e2ea>] warn_slowpath_null+0x1a/0x20
> [<ffffffffa03afe16>] ext4_evict_inode+0x4a6/0x4e0 [ext4]
> [<ffffffff81197ab7>] evict+0xa7/0x1c0
> [<ffffffff81197cbf>] iput_final+0xef/0x190
> [<ffffffff81197d9e>] iput+0x3e/0x50
> [<ffffffff81195858>] d_kill+0xd8/0x110
> [<ffffffff81195a1b>] dput+0x18b/0x1c0
> [<ffffffff8117f588>] __fput+0x188/0x270
> [<ffffffff8117f73e>] ____fput+0xe/0x10
> [<ffffffff8106d0bf>] task_work_run+0x8f/0xf0
> [<ffffffff81003a84>] do_notify_resume+0x84/0x90
> [<ffffffff8155d9da>] int_signal+0x12/0x17
> ---[ end trace ea7184a5fb48e185 ]---
>
>
> Thanks,
> Davidlohr
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Wed, 2013-10-16 at 14:50 +0200, Jan Kara wrote:
> On Tue 15-10-13 19:02:04, Davidlohr Bueso wrote:
> > Hello Jan,
> >
> > Just wanted to let you know I hit this[1] again on Linus' latest. The
> > setup/workload is *identical* to the reported one a few months ago.
> >
> > [1] https://lkml.org/lkml/2013/8/1/532
> >
> > Here's the complete output, I hope it helps...
> Thanks for the headup. Last time I wasn't able to reproduce this and
> eventually forgot about the problem. Can I send you a debug patch and you
> would run a kernel with it? Thanks.
Sure.
On Wed 16-10-13 08:56:07, Davidlohr Bueso wrote:
> On Wed, 2013-10-16 at 14:50 +0200, Jan Kara wrote:
> > On Tue 15-10-13 19:02:04, Davidlohr Bueso wrote:
> > > Hello Jan,
> > >
> > > Just wanted to let you know I hit this[1] again on Linus' latest. The
> > > setup/workload is *identical* to the reported one a few months ago.
> > >
> > > [1] https://lkml.org/lkml/2013/8/1/532
> > >
> > > Here's the complete output, I hope it helps...
> > Thanks for the headup. Last time I wasn't able to reproduce this and
> > eventually forgot about the problem. Can I send you a debug patch and you
> > would run a kernel with it? Thanks.
>
> Sure.
OK, attached is the debug patch. Please apply it and send dmesg when the
problem reproduces. Also please attach System.map so that I can map the
'created at' addresses to symbols. Thanks.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Wed 16-10-13 20:38:07, Jan Kara wrote:
> On Wed 16-10-13 08:56:07, Davidlohr Bueso wrote:
> > On Wed, 2013-10-16 at 14:50 +0200, Jan Kara wrote:
> > > On Tue 15-10-13 19:02:04, Davidlohr Bueso wrote:
> > > > Hello Jan,
> > > >
> > > > Just wanted to let you know I hit this[1] again on Linus' latest. The
> > > > setup/workload is *identical* to the reported one a few months ago.
> > > >
> > > > [1] https://lkml.org/lkml/2013/8/1/532
> > > >
> > > > Here's the complete output, I hope it helps...
> > > Thanks for the headup. Last time I wasn't able to reproduce this and
> > > eventually forgot about the problem. Can I send you a debug patch and you
> > > would run a kernel with it? Thanks.
> >
> > Sure.
> OK, attached is the debug patch. Please apply it and send dmesg when the
> problem reproduces. Also please attach System.map so that I can map the
> 'created at' addresses to symbols. Thanks.
Ping? Any luck with the debug patch?
Honza
> From 1411960edc40abe5db5344fed04bf6370dc432e8 Mon Sep 17 00:00:00 2001
> From: Jan Kara <[email protected]>
> Date: Wed, 16 Oct 2013 20:32:58 +0200
> Subject: [PATCH] ext4: Debug outstanding io_ends
>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/ext4/ext4.h | 3 +++
> fs/ext4/inode.c | 24 ++++++++++++++++++++++--
> fs/ext4/page-io.c | 11 +++++++++++
> 3 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index af815ea9d7cc..82333b4ba7bf 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -187,6 +187,7 @@ struct ext4_map_blocks {
> */
> typedef struct ext4_io_end {
> struct list_head list; /* per-file finished IO list */
> + struct list_head full_list;
> handle_t *handle; /* handle reserved for extent
> * conversion */
> struct inode *inode; /* file being written to */
> @@ -196,6 +197,7 @@ typedef struct ext4_io_end {
> loff_t offset; /* offset in the file */
> ssize_t size; /* size of the extent */
> atomic_t count; /* reference counter */
> + unsigned long created_at;
> } ext4_io_end_t;
>
> struct ext4_io_submit {
> @@ -907,6 +909,7 @@ struct ext4_inode_info {
> * transaction reserved
> */
> struct list_head i_rsv_conversion_list;
> + struct list_head i_ioend_list;
> /*
> * Completed IOs that need unwritten extents handling and don't have
> * transaction reserved
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e7e5b3d8f002..c053e977caa1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -176,6 +176,20 @@ int ext4_truncate_restart_trans(handle_t *handle, struct inode *inode,
> return ret;
> }
>
> +static void dump_ioends(struct inode *inode, struct list_head *head)
> +{
> + ext4_io_end_t *io;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&EXT4_I(inode)->i_completed_io_lock, flags);
> + list_for_each_entry(io, head, full_list) {
> + printk("ioend %p, created at 0x%lx: handle=%p, bio=%p, flag=%u, offset=%lu, len=%u, count=%d\n",
> + io, io->created_at, io->handle, io->bio, io->flag, (unsigned long)io->offset,
> + (unsigned)io->size, (int)atomic_read(&io->count));
> + }
> + spin_unlock_irqrestore(&EXT4_I(inode)->i_completed_io_lock, flags);
> +}
> +
> /*
> * Called at the last iput() if i_nlink is zero.
> */
> @@ -216,7 +230,10 @@ void ext4_evict_inode(struct inode *inode)
> }
> truncate_inode_pages(&inode->i_data, 0);
>
> - WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
> + if (WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count))) {
> + printk("ioend_count=%d, i_unwritten=%d\n", (int)atomic_read(&EXT4_I(inode)->i_ioend_count), (int)atomic_read(&EXT4_I(inode)->i_unwritten));
> + dump_ioends(inode, &EXT4_I(inode)->i_ioend_list);
> + }
> goto no_delete;
> }
>
> @@ -227,7 +244,10 @@ void ext4_evict_inode(struct inode *inode)
> ext4_begin_ordered_truncate(inode, 0);
> truncate_inode_pages(&inode->i_data, 0);
>
> - WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
> + if (WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count))) {
> + printk("ioend_count=%d, i_unwritten=%d\n", (int)atomic_read(&EXT4_I(inode)->i_ioend_count), (int)atomic_read(&EXT4_I(inode)->i_unwritten));
> + dump_ioends(inode, &EXT4_I(inode)->i_ioend_list);
> + }
> if (is_bad_inode(inode))
> goto no_delete;
>
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index d488f80ee32d..8cce9abf9d31 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -110,6 +110,7 @@ static void ext4_finish_bio(struct bio *bio)
> static void ext4_release_io_end(ext4_io_end_t *io_end)
> {
> struct bio *bio, *next_bio;
> + unsigned long flags;
>
> BUG_ON(!list_empty(&io_end->list));
> BUG_ON(io_end->flag & EXT4_IO_END_UNWRITTEN);
> @@ -118,6 +119,10 @@ static void ext4_release_io_end(ext4_io_end_t *io_end)
> if (atomic_dec_and_test(&EXT4_I(io_end->inode)->i_ioend_count))
> wake_up_all(ext4_ioend_wq(io_end->inode));
>
> + spin_lock_irqsave(&EXT4_I(io_end->inode)->i_completed_io_lock, flags);
> + list_del(&io_end->full_list);
> + spin_unlock_irqrestore(&EXT4_I(io_end->inode)->i_completed_io_lock, flags);
> +
> for (bio = io_end->bio; bio; bio = next_bio) {
> next_bio = bio->bi_private;
> ext4_finish_bio(bio);
> @@ -252,10 +257,16 @@ ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags)
> {
> ext4_io_end_t *io = kmem_cache_zalloc(io_end_cachep, flags);
> if (io) {
> + unsigned long flags;
> +
> + spin_lock_irqsave(&EXT4_I(inode)->i_completed_io_lock, flags);
> + list_add(&io->full_list, &EXT4_I(inode)->i_ioend_list);
> + spin_unlock_irqrestore(&EXT4_I(inode)->i_completed_io_lock, flags);
> atomic_inc(&EXT4_I(inode)->i_ioend_count);
> io->inode = inode;
> INIT_LIST_HEAD(&io->list);
> atomic_set(&io->count, 1);
> + io->created_at = _RET_IP_;
> }
> return io;
> }
> --
> 1.8.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Fri, 2013-11-29 at 08:37 +0100, Jan Kara wrote:
> On Wed 16-10-13 20:38:07, Jan Kara wrote:
> > On Wed 16-10-13 08:56:07, Davidlohr Bueso wrote:
> > > On Wed, 2013-10-16 at 14:50 +0200, Jan Kara wrote:
> > > > On Tue 15-10-13 19:02:04, Davidlohr Bueso wrote:
> > > > > Hello Jan,
> > > > >
> > > > > Just wanted to let you know I hit this[1] again on Linus' latest. The
> > > > > setup/workload is *identical* to the reported one a few months ago.
> > > > >
> > > > > [1] https://lkml.org/lkml/2013/8/1/532
> > > > >
> > > > > Here's the complete output, I hope it helps...
> > > > Thanks for the headup. Last time I wasn't able to reproduce this and
> > > > eventually forgot about the problem. Can I send you a debug patch and you
> > > > would run a kernel with it? Thanks.
> > >
> > > Sure.
> > OK, attached is the debug patch. Please apply it and send dmesg when the
> > problem reproduces. Also please attach System.map so that I can map the
> > 'created at' addresses to symbols. Thanks.
> Ping? Any luck with the debug patch?
Sorry for the delay, I haven't had much time to look into this. I do
remember applying the patch and getting a panic - I don't have any more
data at the moment, hopefully I can get back soon with details.
Thanks,
Davidlohr