Subject: Call trace in ext4_es_lru_add on 3.10 stable

Hi List,

I'm seeing this error pretty often on a vanilla 3.10.53 kernel.

Can anybody point me to a patch or tell me if this is a known bug?

Call Trace:
[<ffffffffa0311e56>] ext4_es_lru_add+0x26/0x80 [ext4]
[<ffffffffa0311f41>] ext4_es_lookup_extent+0x91/0x190 [ext4]
[<ffffffffa02d66b3>] ext4_map_blocks+0x43/0x450 [ext4]
[<ffffffffa02d8167>] _ext4_get_block+0x87/0x190 [ext4]
[<ffffffffa02d82c6>] ext4_get_block+0x16/0x20 [ext4]
[<ffffffff8117f63f>] generic_block_bmap+0x3f/0x50
[<ffffffffa017e4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
[<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
[<ffffffffa02d5d71>] ext4_bmap+0x91/0xf0 [ext4]
[<ffffffff8116866e>] bmap+0x1e/0x30
[<ffffffffa0187043>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
[<ffffffffa01872fd>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
[<ffffffffa017f238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
[<ffffffff81084e03>] ? idle_balance+0xd3/0x110
[<ffffffff8105a008>] ? lock_timer_base.isra.35+0x38/0x70
[<ffffffffa018491a>] kjournald2+0xba/0x230 [jbd2]
[<ffffffff81070350>] ? finish_wait+0x80/0x80
[<ffffffffa0184860>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
[<ffffffff8106fb50>] kthread+0xc0/0xd0
[<ffffffff8106fa90>] ? kthread_create_on_node+0x130/0x130
[<ffffffff815548ec>] ret_from_fork+0x7c/0xb0
[<ffffffff8106fa90>] ? kthread_create_on_node+0x130/0x130

--
Kind regards
Stefan Priebe
Bachelor of Science in Computer Science (BSCS)
Member of the Management Board (CTO)



2014-09-18 19:21:36

by Theodore Ts'o

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On Thu, Sep 18, 2014 at 03:08:10PM +0200, Stefan Priebe - Profihost AG wrote:
> Hi List,
>
> I'm seeing this error pretty often at a vanilla 3.10.53 kernel.
>
> Can anybody point me to a patch or tell me if this is a known bug?

This is just a call trace; can you send the complete set of kernel
messages? Please send everything starting from a few lines before the

---------------- [ cut here ] ----------

line, and then going all the way to the

----------- [ end trace XXXXXXXXXX ] --------

line.

Cheers,

- Ted

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable


On 18.09.2014 21:21, Theodore Ts'o wrote:
> On Thu, Sep 18, 2014 at 03:08:10PM +0200, Stefan Priebe - Profihost AG wrote:
>> Hi List,
>>
>> I'm seeing this error pretty often at a vanilla 3.10.53 kernel.
>>
>> Can anybody point me to a patch or tell me if this is a known bug?
>
> This is just call trace; can you send the complete set of kernel
> messages? Please send everything starting a few lines before the
>
> ---------------- [ cut here ] ----------
>
> line, and then going all the way to the
>
> ----------- [ end trace XXXXXXXXXX ] --------


Sorry, but the whole output is:
2014-09-18 02:30:34 0000000000000000 ffff881021663b20 ffff881021663b08 ffffffffa02d66b3
2014-09-18 02:30:34 Call Trace:
2014-09-18 02:30:34 [<ffffffffa0311e56>] ext4_es_lru_add+0x26/0x80 [ext4]
2014-09-18 02:30:34 [<ffffffffa0311f41>] ext4_es_lookup_extent+0x91/0x190 [ext4]
2014-09-18 02:30:34 [<ffffffffa02d66b3>] ext4_map_blocks+0x43/0x450 [ext4]
2014-09-18 02:30:34 [<ffffffffa02d8167>] _ext4_get_block+0x87/0x190 [ext4]
2014-09-18 02:30:34 [<ffffffffa02d82c6>] ext4_get_block+0x16/0x20 [ext4]
2014-09-18 02:30:34 [<ffffffff8117f63f>] generic_block_bmap+0x3f/0x50
2014-09-18 02:30:34 [<ffffffffa017e4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
2014-09-18 02:30:34 [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
2014-09-18 02:30:34 [<ffffffffa02d5d71>] ext4_bmap+0x91/0xf0 [ext4]
2014-09-18 02:30:34 [<ffffffff8116866e>] bmap+0x1e/0x30
2014-09-18 02:30:34 [<ffffffffa0187043>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
2014-09-18 02:30:34 [<ffffffffa01872fd>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
2014-09-18 02:30:34 [<ffffffffa017f238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
2014-09-18 02:30:34 [<ffffffff81084e03>] ? idle_balance+0xd3/0x110
2014-09-18 02:30:34 [<ffffffff8105a008>] ? lock_timer_base.isra.35+0x38/0x70
2014-09-18 02:30:34 [<ffffffffa018491a>] kjournald2+0xba/0x230 [jbd2]
2014-09-18 02:30:34 [<ffffffff81070350>] ? finish_wait+0x80/0x80
2014-09-18 02:30:34 [<ffffffffa0184860>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
2014-09-18 02:30:34 [<ffffffff8106fb50>] kthread+0xc0/0xd0
2014-09-18 02:30:34 [<ffffffff8106fa90>] ? kthread_create_on_node+0x130/0x130
2014-09-18 02:30:34 [<ffffffff815548ec>] ret_from_fork+0x7c/0xb0
2014-09-18 02:30:34 [<ffffffff8106fa90>] ? kthread_create_on_node+0x130/0x130
2014-09-18 02:30:34 Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 b8 00 00 01 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 74 0e 0f 1f 40 00 f3 90 0f b7 07 <66> 39 d0 75 f6 5d c3 90 90 90 90 66 66 66 66 90 55 48 89 e5 53
2014-09-18 02:30:34 BUG: soft lockup - CPU#1 stuck for 23s! [jbd2/sdb1-8:1795]
2014-09-18 02:30:34 Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp xt_owner ipt_REJECT xt_multiport mpt2sas raid_class iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_ondemand 8021q garp ext4 crc16 jbd2 mbcache ext2 mperf usbhid coretemp kvm_intel kvm crc32_pclmul ehci_pci ghash_clmulni_intel sb_edac ehci_hcd edac_core microcode usbcore i2c_i801 usb_common button netconsole sg sd_mod igb i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas pps_core megaraid_sas
2014-09-18 02:30:34 CPU: 1 PID: 1795 Comm: jbd2/sdb1-8 Not tainted 3.10.53+85-ph #1
2014-09-18 02:30:34 Hardware name: XX
2014-09-18 02:30:34 task: ffff881020d90000 ti: ffff881021662000 task.ti: ffff881021662000
2014-09-18 02:30:34 RIP: 0010:[<ffffffff81553cb5>] [<ffffffff81553cb5>] _raw_spin_lock+0x25/0x30
2014-09-18 02:30:34 RSP: 0018:ffff881021663a38 EFLAGS: 00000297
2014-09-18 02:30:34 RAX: 000000000000c7a2 RBX: ffff881020800310 RCX: 0000000000000000
2014-09-18 02:30:34 RDX: 000000000000c7a3 RSI: 00000000000046b6 RDI: ffff8810265c9440
2014-09-18 02:30:34 RBP: ffff881021663a38 R08: ffffffffa02d82b0 R09: ffffea001d48dd40
2014-09-18 02:30:34 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000020
2014-09-18 02:30:34 R13: 000000000000c79f R14: 0000000000000000 R15: ffff88107fc359c0
2014-09-18 02:30:34 FS: 0000000000000000(0000) GS:ffff88107fc20000(0000) knlGS:0000000000000000
2014-09-18 02:30:34 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2014-09-18 02:30:34 CR2: 000000000509d700 CR3: 0000000001a0b000 CR4: 00000000000407e0
2014-09-18 02:30:34 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2014-09-18 02:30:34 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2014-09-18 02:30:34 Stack:
2014-09-18 02:30:34 ffff881021663a58 ffffffffa0311e56 00000000000046b6 ffff8810208000b0
2014-09-18 02:30:34 ffff881021663a88 ffffffffa0311f41 ffff8810208000b0 0000000000000000


Stefan

>
> line.
>
> Cheers,
>
> - Ted
>

2014-09-18 19:43:11

by Theodore Ts'o

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On Thu, Sep 18, 2014 at 09:29:37PM +0200, Stefan Priebe wrote:
>
> Sorry but whole output is:
> 2014-09-18 02:30:34 0000000000000000 ffff881021663b20 ffff881021663b08
> ffffffffa02d66b3
...

That's not the whole message; you just weren't able to capture it all.
How are you capturing these messages, by the way? Serial console?

Is this reproducible? Can you try a newer kernel?

Cheers,

- Ted

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

Hi Ted,

On 18.09.2014 21:43, Theodore Ts'o wrote:
> On Thu, Sep 18, 2014 at 09:29:37PM +0200, Stefan Priebe wrote:
>>
>> Sorry but whole output is:
>> 2014-09-18 02:30:34 0000000000000000 ffff881021663b20 ffff881021663b08
>> ffffffffa02d66b3
> ...
>
> That's not the whole message; you just weren't able to capture it all.
> How are you capturing these messages, by the way? Serial console?

Sorry, this was an incomplete copy and paste on my part.

Here is the complete output:
[1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
[1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp xt_owner mpt2sas raid_class ipt_REJECT xt_multiport iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_ondemand 8021q garp ext4 crc16 jbd2 mbcache ext2 k8temp ehci_pci mperf coretemp kvm_intel kvm crc32_pclmul ehci_hcd ghash_clmulni_intel sb_edac edac_core usbcore i2c_i801 microcode usb_common button netconsole sg sd_mod igb i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas megaraid_sas pps_core
[1578545.192373] CPU: 7 PID: 29281 Comm: mysqld Tainted: G W 3.10.53+85-ph #1
[1578545.254369] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a 03/06/2012
[1578545.317333] task: ffff880d5bab9900 ti: ffff880048da4000 task.ti: ffff880048da4000
[1578545.380284] RIP: 0010:[<ffffffff81553cb2>] [<ffffffff81553cb2>] _raw_spin_lock+0x22/0x30
[1578545.444138] RSP: 0000:ffff880048da5878 EFLAGS: 00000297
[1578545.507802] RAX: 000000000000f53c RBX: ffffffffa0372a69 RCX: 000000008802cc10
[1578545.571007] RDX: 000000000000f53d RSI: 0000000000000000 RDI: ffff8810265a6440
[1578545.632916] RBP: ffff880048da5878 R08: 1038000000000000 R09: 0ab3417d081c0000
[1578545.694103] R10: 0000000000000005 R11: dead000000100100 R12: ffffffff812ba03b
[1578545.755009] R13: ffff880048da57e8 R14: ffffffff810fa58c R15: ffff880048da58a8
[1578545.815734] FS: 00007f55e0d1e700(0000) GS:ffff88107fdc0000(0000) knlGS:0000000000000000
[1578545.877485] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1578545.939121] CR2: 00007f5505000000 CR3: 0000001024ba0000 CR4: 00000000000407e0
[1578546.001641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1578546.064081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1578546.125544] Stack:
[1578546.186027] ffff880048da5898 ffffffffa0373350 ffff880ab3417c70 ffff880ab3417c70
[1578546.248189] ffff880048da58b8 ffffffffa03548b5 ffff880ab3417db8 ffff880ab3417db8
[1578546.310187] ffff880048da58d8 ffffffffa033cf43 ffff880ab3417c70 ffff880ab3417d70
[1578546.371830] Call Trace:
[1578546.432274] [<ffffffffa0373350>] ext4_es_lru_del+0x30/0x80 [ext4]
[1578546.493276] [<ffffffffa03548b5>] ext4_clear_inode+0x45/0x90 [ext4]
[1578546.554093] [<ffffffffa033cf43>] ext4_evict_inode+0x83/0x4d0 [ext4]
[1578546.614444] [<ffffffff81169ef0>] evict+0xb0/0x1b0
[1578546.673944] [<ffffffff8116a031>] dispose_list+0x41/0x50
[1578546.733061] [<ffffffff8116ae23>] prune_icache_sb+0x183/0x340
[1578546.792425] [<ffffffff81154c7b>] prune_super+0x17b/0x1b0
[1578546.851603] [<ffffffff810fd0f1>] shrink_slab+0x151/0x2e0
[1578546.910609] [<ffffffff8110dd22>] ? compact_zone+0x32/0x430
[1578546.969573] [<ffffffff810ffc55>] do_try_to_free_pages+0x405/0x540
[1578547.028754] [<ffffffff810fffc8>] try_to_free_pages+0xf8/0x180
[1578547.087924] [<ffffffff810f5d63>] __alloc_pages_nodemask+0x553/0x900
[1578547.147165] [<ffffffff81131a05>] alloc_pages_vma+0xa5/0x150
[1578547.206584] [<ffffffff811445a4>] do_huge_pmd_anonymous_page+0x174/0x3d0
[1578547.265245] [<ffffffff8111c568>] ? change_protection+0x5b8/0x670
[1578547.322947] [<ffffffff81114a22>] handle_mm_fault+0x292/0x340
[1578547.379690] [<ffffffff81032b68>] __do_page_fault+0x168/0x460
[1578547.434929] [<ffffffff8111c777>] ? mprotect_fixup+0x157/0x280
[1578547.488655] [<ffffffff8111851b>] ? remove_vma+0x5b/0x70
[1578547.541197] [<ffffffff81032e9e>] do_page_fault+0xe/0x10
[1578547.594037] [<ffffffff81554242>] page_fault+0x22/0x30

> It this reproducible? Can you try a newer kernel?

I'm seeing this on various systems doing rsync backups to an ext4
partition. I can't try a newer kernel, and I also don't have exact steps
to reproduce; it just happens sometimes.

Greets,
Stefan

2014-09-22 16:47:15

by Theodore Ts'o

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On Mon, Sep 22, 2014 at 08:56:23AM +0200, Stefan Priebe wrote:
> >That's not the whole message; you just weren't able to capture it all.
> >How are you capturing these messages, by the way? Serial console?
>
> Sorry this was an incomplete copy and paste by me.
>
> Here is the complete output:
> [1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
> [1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4

OK, thanks, this is a known bug: when ext4 is under heavy memory
pressure, we can end up stalling in reclaim. This message indicates
that the system stalled for 22 seconds, which is not good, since it
hurts the interactivity of your system and increases the long-tail
latency of requests to servers running on it, but it doesn't cause any
data loss, nor will it cause any of your processes to crash or
otherwise stop functioning (except temporarily).

It's something that we are working on, and there are patches which
Zheng Liu submitted that still need a bit of polishing, but I hope to
have it addressed soon.
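
To make that stall pattern concrete, here is a minimal userspace sketch.
It is not the ext4 code; every name, list size and timing in it is made
up for illustration. One thread plays a naive shrinker that scans a long
LRU list while holding a single spinlock, and another plays the hot
ext4_es_lru_add()-style path that only needs the same lock for a brief
update. The add path spins for as long as the whole scan takes, which is
the same shape as the jbd2/mysqld stalls in the traces above.

/*
 * Minimal userspace sketch -- NOT the ext4 code -- of the contention
 * pattern described above: a "shrinker" walks a long LRU list while
 * holding one spinlock, so a hot "lru add" path has to spin on that
 * same lock for the duration of the scan.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define NODES 2000000                  /* pretend this is a huge extent-status LRU */

struct node { struct node *next; int touched; };

static pthread_spinlock_t lru_lock;    /* stands in for a per-sb LRU spinlock */
static struct node *lru_head;

static double now_ms(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Reclaim side: scan the whole list under the lock, like a naive shrinker. */
static void *shrinker(void *arg)
{
	pthread_spin_lock(&lru_lock);
	for (struct node *n = lru_head; n; n = n->next)
		n->touched++;              /* pretend to do per-entry reclaim work */
	pthread_spin_unlock(&lru_lock);
	return NULL;
}

/* Hot path: wants the lock only briefly, like an LRU "move to tail". */
static void *lru_add(void *arg)
{
	usleep(1000);                      /* make it likely the shrinker locks first */
	double start = now_ms();
	pthread_spin_lock(&lru_lock);      /* spins here while the scan runs */
	double waited = now_ms() - start;
	pthread_spin_unlock(&lru_lock);
	printf("lru_add waited %.1f ms for the lock\n", waited);
	return NULL;
}

int main(void)
{
	pthread_t scan, add;

	pthread_spin_init(&lru_lock, PTHREAD_PROCESS_PRIVATE);
	for (long i = 0; i < NODES; i++) { /* build the long LRU list */
		struct node *n = malloc(sizeof(*n));
		n->touched = 0;
		n->next = lru_head;
		lru_head = n;
	}

	pthread_create(&scan, NULL, shrinker, NULL);
	pthread_create(&add, NULL, lru_add, NULL);
	pthread_join(scan, NULL);
	pthread_join(add, NULL);
	return 0;
}

Build it with something like "gcc -O2 -pthread sketch.c" if you want to
play with it (the filename is arbitrary); scaling NODES up makes the
measured wait grow accordingly.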

Cheers,

- Ted

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

Hi,
On 22.09.2014 18:47, Theodore Ts'o wrote:
> On Mon, Sep 22, 2014 at 08:56:23AM +0200, Stefan Priebe wrote:
>>> That's not the whole message; you just weren't able to capture it all.
>>> How are you capturing these messages, by the way? Serial console?
>>
>> Sorry this was an incomplete copy and paste by me.
>>
>> Here is the complete output:
>> [1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
>> [1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
>
> OK, thanks, this is a known bug, where when ext4 is under heavy memory
> pressure, we can end up stalling in reclaim. This message indicates
> that the system got stalled for 22 seconds, which is not good, since
> it impacts the interactivity of your system, and increases the
> long-tail latency of requests to servers running on your system, but
> it doesn't cause any data loss or will cause any of your processes to
> crash or otherwise stop functioning (except for temporarily).
>
> It's something that we are working on, and there are patches which
> Zheng Liu submitted that still need a bit of polishing, but I hope to
> have it addressed soon.

Thanks for your feedback. Will those patches go to stable? Any link to
those patches?

Stefan

>
> Cheers,
>
> - Ted
>

2014-09-22 20:20:08

by Theodore Ts'o

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On Mon, Sep 22, 2014 at 08:29:54PM +0200, Stefan Priebe wrote:
> Hi,
> Am 22.09.2014 18:47, schrieb Theodore Ts'o:
> >On Mon, Sep 22, 2014 at 08:56:23AM +0200, Stefan Priebe wrote:
> >>>That's not the whole message; you just weren't able to capture it all.
> >>>How are you capturing these messages, by the way? Serial console?
> >>
> >>Sorry this was an incomplete copy and paste by me.
> >>
> >>Here is the complete output:
> >>[1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
> >>[1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
> >
> >OK, thanks, this is a known bug, where when ext4 is under heavy memory
> >pressure, we can end up stalling in reclaim. This message indicates
> >that the system got stalled for 22 seconds, which is not good, since
> >it impacts the interactivity of your system, and increases the
> >long-tail latency of requests to servers running on your system, but
> >it doesn't cause any data loss or will cause any of your processes to
> >crash or otherwise stop functioning (except for temporarily).
> >
> >It's something that we are working on, and there are patches which
> >Zheng Liu submitted that still need a bit of polishing, but I hope to
> >have it addressed soon.
>
> Thanks for your feedback. Will those patches go to stable? Any link to
> those patches?

I'm not sure they will go to Stable when they are ready, because the
patches are somewhat complex and so they may not apply cleanly to much
older kernels.

The patches under discussion (some have been applied, others have been
waiting for some requested changes) can be found here:

http://patchwork.ozlabs.org/patch/377720
http://patchwork.ozlabs.org/patch/377721
http://patchwork.ozlabs.org/patch/377722
http://patchwork.ozlabs.org/patch/377723
http://patchwork.ozlabs.org/patch/377724
http://patchwork.ozlabs.org/patch/377725
http://patchwork.ozlabs.org/patch/377727

- Ted

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable


On 22.09.2014 at 22:20, Theodore Ts'o wrote:
> On Mon, Sep 22, 2014 at 08:29:54PM +0200, Stefan Priebe wrote:
>> Hi,
>> Am 22.09.2014 18:47, schrieb Theodore Ts'o:
>>> On Mon, Sep 22, 2014 at 08:56:23AM +0200, Stefan Priebe wrote:
>>>>> That's not the whole message; you just weren't able to capture it all.
>>>>> How are you capturing these messages, by the way? Serial console?
>>>>
>>>> Sorry this was an incomplete copy and paste by me.
>>>>
>>>> Here is the complete output:
>>>> [1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
>>>> [1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
>>>
>>> OK, thanks, this is a known bug, where when ext4 is under heavy memory
>>> pressure, we can end up stalling in reclaim. This message indicates
>>> that the system got stalled for 22 seconds, which is not good, since
>>> it impacts the interactivity of your system, and increases the
>>> long-tail latency of requests to servers running on your system, but
>>> it doesn't cause any data loss or will cause any of your processes to
>>> crash or otherwise stop functioning (except for temporarily).
>>>
>>> It's something that we are working on, and there are patches which
>>> Zheng Liu submitted that still need a bit of polishing, but I hope to
>>> have it addressed soon.
>>
>> Thanks for your feedback. Will those patches go to stable? Any link to
>> those patches?
>
> I'm not sure they will go to Stable when they are ready, because the
> patches are somewhat complex and so they may not apply cleanly to much
> older kernels.
>
> The patches under discussion (some have been applied, others hae been
> waiting for some requested changes) can be found here:
>
> http://patchwork.ozlabs.org/patch/377720
> http://patchwork.ozlabs.org/patch/377721
> http://patchwork.ozlabs.org/patch/377722
> http://patchwork.ozlabs.org/patch/377723
> http://patchwork.ozlabs.org/patch/377724
> http://patchwork.ozlabs.org/patch/377725
> http://patchwork.ozlabs.org/patch/377727

Whew, that's a lot. Are they ALL needed to fix this? Is no workaround
possible? What will Red Hat do with their 3.10 RHEL 7 kernel?

Stefan

2014-09-23 09:42:04

by Jan Kara

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On Tue 23-09-14 09:50:25, Stefan Priebe - Profihost AG wrote:
>
> Am 22.09.2014 um 22:20 schrieb Theodore Ts'o:
> > On Mon, Sep 22, 2014 at 08:29:54PM +0200, Stefan Priebe wrote:
> >> Hi,
> >> Am 22.09.2014 18:47, schrieb Theodore Ts'o:
> >>> On Mon, Sep 22, 2014 at 08:56:23AM +0200, Stefan Priebe wrote:
> >>>>> That's not the whole message; you just weren't able to capture it all.
> >>>>> How are you capturing these messages, by the way? Serial console?
> >>>>
> >>>> Sorry this was an incomplete copy and paste by me.
> >>>>
> >>>> Here is the complete output:
> >>>> [1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
> >>>> [1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
> >>>
> >>> OK, thanks, this is a known bug, where when ext4 is under heavy memory
> >>> pressure, we can end up stalling in reclaim. This message indicates
> >>> that the system got stalled for 22 seconds, which is not good, since
> >>> it impacts the interactivity of your system, and increases the
> >>> long-tail latency of requests to servers running on your system, but
> >>> it doesn't cause any data loss or will cause any of your processes to
> >>> crash or otherwise stop functioning (except for temporarily).
> >>>
> >>> It's something that we are working on, and there are patches which
> >>> Zheng Liu submitted that still need a bit of polishing, but I hope to
> >>> have it addressed soon.
> >>
> >> Thanks for your feedback. Will those patches go to stable? Any link to
> >> those patches?
> >
> > I'm not sure they will go to Stable when they are ready, because the
> > patches are somewhat complex and so they may not apply cleanly to much
> > older kernels.
> >
> > The patches under discussion (some have been applied, others hae been
> > waiting for some requested changes) can be found here:
> >
> > http://patchwork.ozlabs.org/patch/377720
> > http://patchwork.ozlabs.org/patch/377721
> > http://patchwork.ozlabs.org/patch/377722
> > http://patchwork.ozlabs.org/patch/377723
> > http://patchwork.ozlabs.org/patch/377724
> > http://patchwork.ozlabs.org/patch/377725
> > http://patchwork.ozlabs.org/patch/377727
>
> hui that's a lot. Are they ALL needed to fix this?
Yes, all of them are needed.

> No workaround possible?
I don't know of any.

> What will Redhat do with their 3.10 RHEL 7 kernel?
Well, I cannot speak for the RH guys, but for SLES, if there's a
customer request, we'll just go and backport the patches...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable


On 23.09.2014 11:42, Jan Kara wrote:
> On Tue 23-09-14 09:50:25, Stefan Priebe - Profihost AG wrote:
>>
>> Am 22.09.2014 um 22:20 schrieb Theodore Ts'o:
>>> On Mon, Sep 22, 2014 at 08:29:54PM +0200, Stefan Priebe wrote:
>>>> Hi,
>>>> Am 22.09.2014 18:47, schrieb Theodore Ts'o:
>>>>> On Mon, Sep 22, 2014 at 08:56:23AM +0200, Stefan Priebe wrote:
>>>>>>> That's not the whole message; you just weren't able to capture it all.
>>>>>>> How are you capturing these messages, by the way? Serial console?
>>>>>>
>>>>>> Sorry this was an incomplete copy and paste by me.
>>>>>>
>>>>>> Here is the complete output:
>>>>>> [1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
>>>>>> [1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
>>>>>
>>>>> OK, thanks, this is a known bug, where when ext4 is under heavy memory
>>>>> pressure, we can end up stalling in reclaim. This message indicates
>>>>> that the system got stalled for 22 seconds, which is not good, since
>>>>> it impacts the interactivity of your system, and increases the
>>>>> long-tail latency of requests to servers running on your system, but
>>>>> it doesn't cause any data loss or will cause any of your processes to
>>>>> crash or otherwise stop functioning (except for temporarily).
>>>>>
>>>>> It's something that we are working on, and there are patches which
>>>>> Zheng Liu submitted that still need a bit of polishing, but I hope to
>>>>> have it addressed soon.
>>>>
>>>> Thanks for your feedback. Will those patches go to stable? Any link to
>>>> those patches?
>>>
>>> I'm not sure they will go to Stable when they are ready, because the
>>> patches are somewhat complex and so they may not apply cleanly to much
>>> older kernels.
>>>
>>> The patches under discussion (some have been applied, others hae been
>>> waiting for some requested changes) can be found here:
>>>
>>> http://patchwork.ozlabs.org/patch/377720
>>> http://patchwork.ozlabs.org/patch/377721
>>> http://patchwork.ozlabs.org/patch/377722
>>> http://patchwork.ozlabs.org/patch/377723
>>> http://patchwork.ozlabs.org/patch/377724
>>> http://patchwork.ozlabs.org/patch/377725
>>> http://patchwork.ozlabs.org/patch/377727
>>
>> hui that's a lot. Are they ALL needed to fix this?
> Yes, all of them are needed.

How can I get notified when they're ready / polished?

Stefan

>> No workaround possible?
> I don't know about any.
>
>> What will Redhat do with their 3.10 RHEL 7 kernel?
> Well, I cannot speak for RH guys but for SLES if there's a customer
> request, we'll just go and backport the patches...
> Honza
>

2014-09-23 14:43:40

by Jan Kara

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On Tue 23-09-14 14:23:29, Stefan Priebe wrote:
>
> Am 23.09.2014 11:42, schrieb Jan Kara:
> >On Tue 23-09-14 09:50:25, Stefan Priebe - Profihost AG wrote:
> >>
> >>Am 22.09.2014 um 22:20 schrieb Theodore Ts'o:
> >>>On Mon, Sep 22, 2014 at 08:29:54PM +0200, Stefan Priebe wrote:
> >>>>Hi,
> >>>>Am 22.09.2014 18:47, schrieb Theodore Ts'o:
> >>>>>On Mon, Sep 22, 2014 at 08:56:23AM +0200, Stefan Priebe wrote:
> >>>>>>>That's not the whole message; you just weren't able to capture it all.
> >>>>>>>How are you capturing these messages, by the way? Serial console?
> >>>>>>
> >>>>>>Sorry this was an incomplete copy and paste by me.
> >>>>>>
> >>>>>>Here is the complete output:
> >>>>>>[1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
> >>>>>>[1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
> >>>>>
> >>>>>OK, thanks, this is a known bug, where when ext4 is under heavy memory
> >>>>>pressure, we can end up stalling in reclaim. This message indicates
> >>>>>that the system got stalled for 22 seconds, which is not good, since
> >>>>>it impacts the interactivity of your system, and increases the
> >>>>>long-tail latency of requests to servers running on your system, but
> >>>>>it doesn't cause any data loss or will cause any of your processes to
> >>>>>crash or otherwise stop functioning (except for temporarily).
> >>>>>
> >>>>>It's something that we are working on, and there are patches which
> >>>>>Zheng Liu submitted that still need a bit of polishing, but I hope to
> >>>>>have it addressed soon.
> >>>>
> >>>>Thanks for your feedback. Will those patches go to stable? Any link to
> >>>>those patches?
> >>>
> >>>I'm not sure they will go to Stable when they are ready, because the
> >>>patches are somewhat complex and so they may not apply cleanly to much
> >>>older kernels.
> >>>
> >>>The patches under discussion (some have been applied, others hae been
> >>>waiting for some requested changes) can be found here:
> >>>
> >>>http://patchwork.ozlabs.org/patch/377720
> >>>http://patchwork.ozlabs.org/patch/377721
> >>>http://patchwork.ozlabs.org/patch/377722
> >>>http://patchwork.ozlabs.org/patch/377723
> >>>http://patchwork.ozlabs.org/patch/377724
> >>>http://patchwork.ozlabs.org/patch/377725
> >>>http://patchwork.ozlabs.org/patch/377727
> >>
> >>hui that's a lot. Are they ALL needed to fix this?
> > Yes, all of them are needed.
>
> How can i get notified when they're ready / polished?
Watching changes to fs/ext4/extents_status.c is probably the most
reliable way. Or maybe Zheng (CCed) can add you to the CC list when he
submits the patch set next time.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

Hi all,

I'm still getting a lot of these call traces:
"
Call Trace:
[<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
[<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
[<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
[<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
[<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
[<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
[<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
[<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
[<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
[<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
[<ffffffff811686de>] bmap+0x1e/0x30
[<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
[<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
[<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
[<ffffffff81084e13>] ? idle_balance+0xd3/0x110
[<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
[<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
[<ffffffff81070360>] ? finish_wait+0x80/0x80
[<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
[<ffffffff8106fb60>] kthread+0xc0/0xd0
[<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
[<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
[<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
"

Is there any chance to fix them in vanilla 3.10.61?

Greets,
Stefan

2014-11-26 08:25:52

by Jan Kara

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

Hi,

On Wed 26-11-14 09:06:43, Stefan Priebe - Profihost AG wrote:
> i'm still getting a lot of those call traces:
> "
> Call Trace:
> [<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
> [<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
> [<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
> [<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
> [<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
> [<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
> [<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
> [<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
> [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
> [<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
> [<ffffffff811686de>] bmap+0x1e/0x30
> [<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
> [<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
> [<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
> [<ffffffff81084e13>] ? idle_balance+0xd3/0x110
> [<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
> [<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
> [<ffffffff81070360>] ? finish_wait+0x80/0x80
> [<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
> [<ffffffff8106fb60>] kthread+0xc0/0xd0
> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
> [<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
> "
>
> Is there any chance to fix them in vanilla 3.10.61?
Ted is currently testing patches to fix these. You are welcome to give
them a try as well (tarball attached). I'm not sure the patches will be
backported as far back as 3.10-stable, but once the patches get some
testing in mainline, I'll be porting them to 3.12-stable for our
enterprise kernel...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
ext4_status_shrinker.tar.gz (16.45 kB)
Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On 26.11.2014 at 09:25, Jan Kara wrote:
> Hi,
>
> On Wed 26-11-14 09:06:43, Stefan Priebe - Profihost AG wrote:
>> i'm still getting a lot of those call traces:
>> "
>> Call Trace:
>> [<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
>> [<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
>> [<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
>> [<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
>> [<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
>> [<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
>> [<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
>> [<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
>> [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
>> [<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
>> [<ffffffff811686de>] bmap+0x1e/0x30
>> [<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
>> [<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
>> [<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
>> [<ffffffff81084e13>] ? idle_balance+0xd3/0x110
>> [<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
>> [<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
>> [<ffffffff81070360>] ? finish_wait+0x80/0x80
>> [<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
>> [<ffffffff8106fb60>] kthread+0xc0/0xd0
>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>> [<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>> "
>>
>> Is there any chance to fix them in vanilla 3.10.61?
> Ted is just testing patches to fix these. You are welcome if you can give
> them a try as well (tarball attached). I'm not sure patches will be
> backported as far as to 3.10-stable but when the patches get some testing
> in mainline, I'll be porting them to 3.12-stable for our enterprise
> kernel...

Thanks. Which kernel do they apply to? They do not apply to a current 3.17.

Stefan

2014-11-26 10:38:28

by Jan Kara

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On Wed 26-11-14 11:28:26, Stefan Priebe - Profihost AG wrote:
> Am 26.11.2014 um 09:25 schrieb Jan Kara:
> > Hi,
> >
> > On Wed 26-11-14 09:06:43, Stefan Priebe - Profihost AG wrote:
> >> i'm still getting a lot of those call traces:
> >> "
> >> Call Trace:
> >> [<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
> >> [<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
> >> [<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
> >> [<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
> >> [<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
> >> [<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
> >> [<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
> >> [<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
> >> [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
> >> [<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
> >> [<ffffffff811686de>] bmap+0x1e/0x30
> >> [<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
> >> [<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
> >> [<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
> >> [<ffffffff81084e13>] ? idle_balance+0xd3/0x110
> >> [<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
> >> [<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
> >> [<ffffffff81070360>] ? finish_wait+0x80/0x80
> >> [<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
> >> [<ffffffff8106fb60>] kthread+0xc0/0xd0
> >> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
> >> [<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
> >> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
> >> "
> >>
> >> Is there any chance to fix them in vanilla 3.10.61?
> > Ted is just testing patches to fix these. You are welcome if you can give
> > them a try as well (tarball attached). I'm not sure patches will be
> > backported as far as to 3.10-stable but when the patches get some testing
> > in mainline, I'll be porting them to 3.12-stable for our enterprise
> > kernel...
>
> Thanks, on which kernel do they apply? They do not apply to a current 3.17.
They are based on Linus' current git tree, so you have to try something
like 3.18-rc5.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On 26.11.2014 at 09:25, Jan Kara wrote:
> Hi,
>
> On Wed 26-11-14 09:06:43, Stefan Priebe - Profihost AG wrote:
>> i'm still getting a lot of those call traces:
>> "
>> Call Trace:
>> [<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
>> [<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
>> [<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
>> [<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
>> [<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
>> [<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
>> [<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
>> [<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
>> [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
>> [<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
>> [<ffffffff811686de>] bmap+0x1e/0x30
>> [<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
>> [<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
>> [<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
>> [<ffffffff81084e13>] ? idle_balance+0xd3/0x110
>> [<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
>> [<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
>> [<ffffffff81070360>] ? finish_wait+0x80/0x80
>> [<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
>> [<ffffffff8106fb60>] kthread+0xc0/0xd0
>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>> [<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>> "
>>
>> Is there any chance to fix them in vanilla 3.10.61?
> Ted is just testing patches to fix these. You are welcome if you can give
> them a try as well (tarball attached). I'm not sure patches will be
> backported as far as to 3.10-stable but when the patches get some testing
> in mainline, I'll be porting them to 3.12-stable for our enterprise
> kernel...

OK, I tried to port them to 3.10, but it seems I can't manage it; there
are too many differences. Are any workarounds possible? Currently the
3.10 kernel is also crashing completely with this backtrace.

Greets,
Stefan

2014-11-26 20:26:38

by Jan Kara

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On Wed 26-11-14 16:11:37, Stefan Priebe - Profihost AG wrote:
> Am 26.11.2014 um 09:25 schrieb Jan Kara:
> > Hi,
> >
> > On Wed 26-11-14 09:06:43, Stefan Priebe - Profihost AG wrote:
> >> i'm still getting a lot of those call traces:
> >> "
> >> Call Trace:
> >> [<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
> >> [<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
> >> [<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
> >> [<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
> >> [<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
> >> [<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
> >> [<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
> >> [<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
> >> [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
> >> [<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
> >> [<ffffffff811686de>] bmap+0x1e/0x30
> >> [<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
> >> [<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
> >> [<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
> >> [<ffffffff81084e13>] ? idle_balance+0xd3/0x110
> >> [<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
> >> [<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
> >> [<ffffffff81070360>] ? finish_wait+0x80/0x80
> >> [<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
> >> [<ffffffff8106fb60>] kthread+0xc0/0xd0
> >> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
> >> [<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
> >> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
> >> "
> >>
> >> Is there any chance to fix them in vanilla 3.10.61?
> > Ted is just testing patches to fix these. You are welcome if you can give
> > them a try as well (tarball attached). I'm not sure patches will be
> > backported as far as to 3.10-stable but when the patches get some testing
> > in mainline, I'll be porting them to 3.12-stable for our enterprise
> > kernel...
>
> OK i tried to port them to 3.10 but it seems i can't handle this. There
> are so many differences. Are there any workarounds possible? Currently
> the 3.10 kernel is also completely crashing with this backtrace.
No workarounds I'm aware of. Sorry. When I have patches for 3.12, you can
try porting them to 3.10. That should be an easier task...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable


On 26.11.2014 21:26, Jan Kara wrote:
> On Wed 26-11-14 16:11:37, Stefan Priebe - Profihost AG wrote:
>> Am 26.11.2014 um 09:25 schrieb Jan Kara:
>>> Hi,
>>>
>>> On Wed 26-11-14 09:06:43, Stefan Priebe - Profihost AG wrote:
>>>> i'm still getting a lot of those call traces:
>>>> "
>>>> Call Trace:
>>>> [<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
>>>> [<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
>>>> [<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
>>>> [<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
>>>> [<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
>>>> [<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
>>>> [<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
>>>> [<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
>>>> [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
>>>> [<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
>>>> [<ffffffff811686de>] bmap+0x1e/0x30
>>>> [<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
>>>> [<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
>>>> [<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
>>>> [<ffffffff81084e13>] ? idle_balance+0xd3/0x110
>>>> [<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
>>>> [<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
>>>> [<ffffffff81070360>] ? finish_wait+0x80/0x80
>>>> [<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
>>>> [<ffffffff8106fb60>] kthread+0xc0/0xd0
>>>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>>>> [<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
>>>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>>>> "
>>>>
>>>> Is there any chance to fix them in vanilla 3.10.61?
>>> Ted is just testing patches to fix these. You are welcome if you can give
>>> them a try as well (tarball attached). I'm not sure patches will be
>>> backported as far as to 3.10-stable but when the patches get some testing
>>> in mainline, I'll be porting them to 3.12-stable for our enterprise
>>> kernel...
>>
>> OK i tried to port them to 3.10 but it seems i can't handle this. There
>> are so many differences. Are there any workarounds possible? Currently
>> the 3.10 kernel is also completely crashing with this backtrace.
> No workarounds I'm aware of. Sorry. When I have patches for 3.12, you can
> try porting them to 3.10. That should be an easier task...

Yes, I think I'll be able to do this. Any idea when you'll have the
patches ready?

Thanks,
Stefan

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable


On 26.11.2014 at 21:26, Jan Kara wrote:
> On Wed 26-11-14 16:11:37, Stefan Priebe - Profihost AG wrote:
>> Am 26.11.2014 um 09:25 schrieb Jan Kara:
>>> Hi,
>>>
>>> On Wed 26-11-14 09:06:43, Stefan Priebe - Profihost AG wrote:
>>>> i'm still getting a lot of those call traces:
>>>> "
>>>> Call Trace:
>>>> [<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
>>>> [<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
>>>> [<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
>>>> [<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
>>>> [<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
>>>> [<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
>>>> [<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
>>>> [<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
>>>> [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
>>>> [<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
>>>> [<ffffffff811686de>] bmap+0x1e/0x30
>>>> [<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
>>>> [<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
>>>> [<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
>>>> [<ffffffff81084e13>] ? idle_balance+0xd3/0x110
>>>> [<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
>>>> [<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
>>>> [<ffffffff81070360>] ? finish_wait+0x80/0x80
>>>> [<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
>>>> [<ffffffff8106fb60>] kthread+0xc0/0xd0
>>>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>>>> [<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
>>>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>>>> "
>>>>
>>>> Is there any chance to fix them in vanilla 3.10.61?
>>> Ted is just testing patches to fix these. You are welcome if you can give
>>> them a try as well (tarball attached). I'm not sure patches will be
>>> backported as far as to 3.10-stable but when the patches get some testing
>>> in mainline, I'll be porting them to 3.12-stable for our enterprise
>>> kernel...
>>
>> OK i tried to port them to 3.10 but it seems i can't handle this. There
>> are so many differences. Are there any workarounds possible? Currently
>> the 3.10 kernel is also completely crashing with this backtrace.
> No workarounds I'm aware of. Sorry. When I have patches for 3.12, you can
> try porting them to 3.10. That should be an easier task...
>
> Honza

Those patches work absolutely fine on a 3.16 kernel. Do you have any
idea when your 3.12 backport will be done?


Thanks!

Greets,
Stefan Priebe

2014-12-04 18:35:39

by Jan Kara

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

On Thu 04-12-14 16:06:40, Stefan Priebe - Profihost AG wrote:
>
> Am 26.11.2014 um 21:26 schrieb Jan Kara:
> > On Wed 26-11-14 16:11:37, Stefan Priebe - Profihost AG wrote:
> >> Am 26.11.2014 um 09:25 schrieb Jan Kara:
> >>> Hi,
> >>>
> >>> On Wed 26-11-14 09:06:43, Stefan Priebe - Profihost AG wrote:
> >>>> i'm still getting a lot of those call traces:
> >>>> "
> >>>> Call Trace:
> >>>> [<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
> >>>> [<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
> >>>> [<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
> >>>> [<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
> >>>> [<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
> >>>> [<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
> >>>> [<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
> >>>> [<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
> >>>> [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
> >>>> [<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
> >>>> [<ffffffff811686de>] bmap+0x1e/0x30
> >>>> [<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
> >>>> [<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
> >>>> [<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
> >>>> [<ffffffff81084e13>] ? idle_balance+0xd3/0x110
> >>>> [<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
> >>>> [<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
> >>>> [<ffffffff81070360>] ? finish_wait+0x80/0x80
> >>>> [<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
> >>>> [<ffffffff8106fb60>] kthread+0xc0/0xd0
> >>>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
> >>>> [<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
> >>>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
> >>>> "
> >>>>
> >>>> Is there any chance to fix them in vanilla 3.10.61?
> >>> Ted is just testing patches to fix these. You are welcome if you can give
> >>> them a try as well (tarball attached). I'm not sure patches will be
> >>> backported as far as to 3.10-stable but when the patches get some testing
> >>> in mainline, I'll be porting them to 3.12-stable for our enterprise
> >>> kernel...
> >>
> >> OK i tried to port them to 3.10 but it seems i can't handle this. There
> >> are so many differences. Are there any workarounds possible? Currently
> >> the 3.10 kernel is also completely crashing with this backtrace.
> > No workarounds I'm aware of. Sorry. When I have patches for 3.12, you can
> > try porting them to 3.10. That should be an easier task...
> >
> > Honza
>
> those patches work absolutely fine on a 3.16 kernel. Do you have any
> idea, when your 3.12 backport is done?
Thanks for the confirmation. I hope to backport the patches sometime
next week.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable

Hi,
On 04.12.2014 at 19:35, Jan Kara wrote:
> On Thu 04-12-14 16:06:40, Stefan Priebe - Profihost AG wrote:
>>
>> Am 26.11.2014 um 21:26 schrieb Jan Kara:
>>> On Wed 26-11-14 16:11:37, Stefan Priebe - Profihost AG wrote:
>>>> Am 26.11.2014 um 09:25 schrieb Jan Kara:
>>>>> Hi,
>>>>>
>>>>> On Wed 26-11-14 09:06:43, Stefan Priebe - Profihost AG wrote:
>>>>>> i'm still getting a lot of those call traces:
>>>>>> "
>>>>>> Call Trace:
>>>>>> [<ffffffffa01d7006>] ext4_es_lru_add+0x26/0x80 [ext4]
>>>>>> [<ffffffffa01d7286>] ext4_es_insert_extent+0x96/0x100 [ext4]
>>>>>> [<ffffffffa01c3fd3>] ? ext4_find_delalloc_range+0x23/0x60 [ext4]
>>>>>> [<ffffffffa019b781>] ext4_map_blocks+0x111/0x450 [ext4]
>>>>>> [<ffffffffa019d167>] _ext4_get_block+0x87/0x190 [ext4]
>>>>>> [<ffffffffa019d2c6>] ext4_get_block+0x16/0x20 [ext4]
>>>>>> [<ffffffff8117f73f>] generic_block_bmap+0x3f/0x50
>>>>>> [<ffffffffa013f4ae>] ? jbd2_journal_file_buffer+0x4e/0x80 [jbd2]
>>>>>> [<ffffffff810f6242>] ? mapping_tagged+0x12/0x20
>>>>>> [<ffffffffa019ad71>] ext4_bmap+0x91/0xf0 [ext4]
>>>>>> [<ffffffff811686de>] bmap+0x1e/0x30
>>>>>> [<ffffffffa0148063>] jbd2_journal_bmap+0x33/0xb0 [jbd2]
>>>>>> [<ffffffffa014831d>] jbd2_journal_next_log_block+0x7d/0x90 [jbd2]
>>>>>> [<ffffffffa0140238>] jbd2_journal_commit_transaction+0x7f8/0x1ae0 [jbd2]
>>>>>> [<ffffffff81084e13>] ? idle_balance+0xd3/0x110
>>>>>> [<ffffffff8105a018>] ? lock_timer_base.isra.35+0x38/0x70
>>>>>> [<ffffffffa014593a>] kjournald2+0xba/0x230 [jbd2]
>>>>>> [<ffffffff81070360>] ? finish_wait+0x80/0x80
>>>>>> [<ffffffffa0145880>] ? jbd2_journal_release_jbd_inode+0x130/0x130 [jbd2]
>>>>>> [<ffffffff8106fb60>] kthread+0xc0/0xd0
>>>>>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>>>>>> [<ffffffff81554c2c>] ret_from_fork+0x7c/0xb0
>>>>>> [<ffffffff8106faa0>] ? kthread_create_on_node+0x130/0x130
>>>>>> "
>>>>>>
>>>>>> Is there any chance to fix them in vanilla 3.10.61?
>>>>> Ted is just testing patches to fix these. You are welcome if you can give
>>>>> them a try as well (tarball attached). I'm not sure patches will be
>>>>> backported as far as to 3.10-stable but when the patches get some testing
>>>>> in mainline, I'll be porting them to 3.12-stable for our enterprise
>>>>> kernel...
>>>>
>>>> OK i tried to port them to 3.10 but it seems i can't handle this. There
>>>> are so many differences. Are there any workarounds possible? Currently
>>>> the 3.10 kernel is also completely crashing with this backtrace.
>>> No workarounds I'm aware of. Sorry. When I have patches for 3.12, you can
>>> try porting them to 3.10. That should be an easier task...
>>>
>>> Honza
>>
>> those patches work absolutely fine on a 3.16 kernel. Do you have any
>> idea, when your 3.12 backport is done?
> Thanks for confirmation. I hope to backport the patches sometime next
> week.

Did you have some time last week?

Stefan