From: Jan Kara Subject: Re: Yet another ext4/jbd2 lockup Date: Mon, 3 Aug 2015 17:15:25 +0200 Message-ID: <20150803151525.GE9657@quack.suse.cz> References: <55B8D61E.7090609@kyup.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: linux-ext4@vger.kernel.org, Theodore Ts'o , Marian Marinov To: Nikolay Borisov Return-path: Received: from mx2.suse.de ([195.135.220.15]:58426 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751947AbbHCPPb (ORCPT ); Mon, 3 Aug 2015 11:15:31 -0400 Content-Disposition: inline In-Reply-To: <55B8D61E.7090609@kyup.com> Sender: linux-ext4-owner@vger.kernel.org List-ID: Hello, On Wed 29-07-15 16:33:18, Nikolay Borisov wrote: > Again, I have come up against a rather annoying ext4/jbd2 > lock and I'm not sure it's the same issue as the last one > I posted (http://marc.info/?l=linux-ext4&m=143522748527527&w=2). > Again, all of this is happening on an 3.12.28 kernel. I did grep for > the ext4 commits that have come afterwards but none of them allude to > fixing something similar. This time the blocked process is an rsync > which has been running OUTSIDE of a container e.g. in the / cgroup > but it was copying stuff from a dm-thin volume acting as the root > device for a cgroup: > > rsync D ffffffff81622100 0 33158 1 0x00000004 > ffff882ad6257b18 0000000000000086 ffff883cfb3e5280 0000000000011640 > ffff882ad6257fd8 ffff882ad6256010 0000000000011640 0000000000011640 > ffff882ad6257fd8 0000000000011640 ffff881fd3213180 ffff883cfb3e5280 > Call Trace: > [] schedule+0x24/0x70 > [] wait_transaction_locked+0x7c/0xb0 > [] ? wake_up_bit+0x40/0x40 > [] start_this_handle+0x2c2/0x660 > [] ? d_rehash+0x1d/0x30 > [] ? kmem_cache_alloc+0xb5/0x150 > [] jbd2__journal_start+0x109/0x1b0 > [] __ext4_journal_start_sb+0x61/0xb0 > [] __ext4_new_inode+0x5b7/0x1330 > [] ? __lookup_hash+0x1e/0x40 > [] ext4_mkdir+0x18c/0x400 > [] ? generic_permission+0xc5/0x230 > [] vfs_mkdir+0xb8/0x100 > [] SyS_mkdirat+0xb2/0x100 > [] SyS_mkdir+0x14/0x20 > [] system_call_fastpath+0x16/0x1b So this is waiting to start a new transaction in the journal. > Sampling echo "w" > /proc/sysrq-trigger I found some other You can just attach full output from the above command. That way we can be sure we see all the pieces... > blocked process which might be related to the issue. In particular: > > jbd2/dm-24-8 D 0000000000000000 0 31263 2 0x00000000 > ffff882af46df9f8 0000000000000046 ffff882643db0000 0000000000011640 > ffff882af46dffd8 ffff882af46de010 0000000000011640 0000000000011640 > ffff882af46dffd8 0000000000011640 ffff8827c519ca40 ffff882643db0000 > Call Trace: > [] ? __lock_page+0x70/0x70 > [] schedule+0x24/0x70 > [] io_schedule+0x87/0xd0 > [] sleep_on_page+0x9/0x10 > [] __wait_on_bit_lock+0x55/0xb0 > [] ? find_get_pages_tag+0xca/0x170 > [] __lock_page+0x62/0x70 > [] ? wake_atomic_t_function+0x40/0x40 > [] ? pagevec_lookup_tag+0x20/0x30 > [] write_cache_pages+0x391/0x470 > [] ? set_page_dirty+0x60/0x60 > [] generic_writepages+0x4c/0x70 > [] jbd2_journal_commit_transaction+0x41b/0x1790 > [] ? lock_timer_base+0x37/0x70 > [] kjournald2+0xc2/0x250 > [] ? wake_up_bit+0x40/0x40 > [] ? commit_timeout+0x10/0x10 > [] kthread+0xc6/0xd0 > [] ? kthread_freezable_should_stop+0x60/0x60 > [] ret_from_fork+0x7c/0xb0 > [] ? kthread_freezable_should_stop+0x60/0x60 So jbd2 thread is waiting for page lock. Interesting... > kworker/u81:90 D 0000000000000000 0 36695 2 0x00000000 > Workqueue: writeback bdi_writeback_workfn (flush-253:24) > ffff880155ee7888 0000000000000046 ffff881ea1a71080 0000000000011640 > ffff880155ee7fd8 ffff880155ee6010 0000000000011640 0000000000011640 > ffff880155ee7fd8 0000000000011640 ffff882119215ac0 ffff881ea1a71080 > Call Trace: > [] schedule+0x24/0x70 > [] jbd2_log_wait_commit+0xb5/0x130 > [] ? wake_up_bit+0x40/0x40 > [] ? __wake_up+0x4e/0x70 > [] jbd2_journal_stop+0x1bb/0x2c0 > [] __ext4_journal_stop+0x78/0xa0 > [] ext4_writepages+0x45f/0xa50 > [] do_writepages+0x1b/0x30 > [] __writeback_single_inode+0x3d/0x130 > [] writeback_sb_inodes+0x207/0x3f0 > [] __writeback_inodes_wb+0x96/0xc0 > [] wb_writeback+0x1db/0x1f0 > [] wb_do_writeback+0x191/0x1a0 > [] bdi_writeback_workfn+0x78/0x1b0 > [] ? wake_up_process+0x22/0x40 > [] process_one_work+0x13b/0x460 > [] worker_thread+0x119/0x3f0 > [] ? manage_workers+0x2c0/0x2c0 > [] kthread+0xc6/0xd0 > [] ? kthread_freezable_should_stop+0x60/0x60 > [] ret_from_fork+0x7c/0xb0 > [] ? kthread_freezable_should_stop+0x60/0x60 And this is likely the culprit - flusher thread is waiting for journal commit from jbd2_journal_stop() at which point it may be holding page locks which are blocking jbd2 thread. Now waiting from transaction commit from jbd2_journal_stop() is really unexpected. That happens only if the handle was a sync one and the only way I can see handle becoming a sync one from a writeback path is when this would be the first file over 4 GB in the filesystem which looks really unlikely... Can you post output of dumpe2fs for the filesystem? Thanks! Honza > Basically dm-24 is the device this rsync was syncing from and the > kworker is running as part of writeback for the same device > (253:24 is the major:minor for this particular device). > Furthermore, I see there have been a number of "sync" executed and > their stack traces look like the following: > > sync D ffffffff81622100 0 17596 17532 0x00000000 > ffff8827cf1fdd78 0000000000000086 ffff88348291b9c0 0000000000011640 > ffff8827cf1fdfd8 ffff8827cf1fc010 0000000000011640 0000000000011640 > ffff8827cf1fdfd8 0000000000011640 ffff881fd32118c0 ffff88348291b9c0 > Call Trace: > [] schedule+0x24/0x70 > [] schedule_timeout+0x185/0x1c0 > [] ? __queue_delayed_work+0x82/0x150 > [] wait_for_completion+0x94/0x100 > [] ? try_to_wake_up+0x2b0/0x2b0 > [] ? bdi_queue_work+0x48/0xb0 > [] ? fdatawrite_one_bdev+0x20/0x20 > [] sync_inodes_sb+0xa6/0x1c0 > [] ? fdatawrite_one_bdev+0x20/0x20 > [] sync_inodes_one_sb+0x15/0x20 > [] iterate_supers+0xa9/0xe0 > [] sys_sync+0x30/0x90 > [] system_call_fastpath+0x16/0x1b > > I believe it is this sync which has triggered the bdi_writeback_workfn > for the particular dm device. Essentially what I gather is that the > rsync is locked since it wants to do some filesystem activity, hence > it wants to start a transaction but it's blocked due to an unfinished > transaction - presumably the one from the writeback path. However, > the writeback is blocked in jbd2_log_wait_commit, waiting for its > own transaction to complete and it is not progressing due to some > reason. There hasn't been any OOM messages for the container in > question and the global OOM killer hasn't triggered. The memory > state of the machine when this was observed. There still some > memory: > > SysRq : Show Memory > Mem-Info: > Node 0 DMA per-cpu: > CPU 0: hi: 0, btch: 1 usd: 0 > CPU 1: hi: 0, btch: 1 usd: 0 > CPU 2: hi: 0, btch: 1 usd: 0 > CPU 3: hi: 0, btch: 1 usd: 0 > CPU 4: hi: 0, btch: 1 usd: 0 > CPU 5: hi: 0, btch: 1 usd: 0 > CPU 6: hi: 0, btch: 1 usd: 0 > CPU 7: hi: 0, btch: 1 usd: 0 > CPU 8: hi: 0, btch: 1 usd: 0 > CPU 9: hi: 0, btch: 1 usd: 0 > CPU 10: hi: 0, btch: 1 usd: 0 > CPU 11: hi: 0, btch: 1 usd: 0 > CPU 12: hi: 0, btch: 1 usd: 0 > CPU 13: hi: 0, btch: 1 usd: 0 > CPU 14: hi: 0, btch: 1 usd: 0 > CPU 15: hi: 0, btch: 1 usd: 0 > CPU 16: hi: 0, btch: 1 usd: 0 > CPU 17: hi: 0, btch: 1 usd: 0 > CPU 18: hi: 0, btch: 1 usd: 0 > CPU 19: hi: 0, btch: 1 usd: 0 > CPU 20: hi: 0, btch: 1 usd: 0 > CPU 21: hi: 0, btch: 1 usd: 0 > CPU 22: hi: 0, btch: 1 usd: 0 > CPU 23: hi: 0, btch: 1 usd: 0 > CPU 24: hi: 0, btch: 1 usd: 0 > CPU 25: hi: 0, btch: 1 usd: 0 > CPU 26: hi: 0, btch: 1 usd: 0 > CPU 27: hi: 0, btch: 1 usd: 0 > CPU 28: hi: 0, btch: 1 usd: 0 > CPU 29: hi: 0, btch: 1 usd: 0 > CPU 30: hi: 0, btch: 1 usd: 0 > CPU 31: hi: 0, btch: 1 usd: 0 > CPU 32: hi: 0, btch: 1 usd: 0 > CPU 33: hi: 0, btch: 1 usd: 0 > CPU 34: hi: 0, btch: 1 usd: 0 > CPU 35: hi: 0, btch: 1 usd: 0 > CPU 36: hi: 0, btch: 1 usd: 0 > CPU 37: hi: 0, btch: 1 usd: 0 > CPU 38: hi: 0, btch: 1 usd: 0 > CPU 39: hi: 0, btch: 1 usd: 0 > Node 0 DMA32 per-cpu: > CPU 0: hi: 186, btch: 31 usd: 177 > CPU 1: hi: 186, btch: 31 usd: 129 > CPU 2: hi: 186, btch: 31 usd: 145 > CPU 3: hi: 186, btch: 31 usd: 60 > CPU 4: hi: 186, btch: 31 usd: 105 > CPU 5: hi: 186, btch: 31 usd: 96 > CPU 6: hi: 186, btch: 31 usd: 71 > CPU 7: hi: 186, btch: 31 usd: 53 > CPU 8: hi: 186, btch: 31 usd: 121 > CPU 9: hi: 186, btch: 31 usd: 171 > CPU 10: hi: 186, btch: 31 usd: 77 > CPU 11: hi: 186, btch: 31 usd: 0 > CPU 12: hi: 186, btch: 31 usd: 0 > CPU 13: hi: 186, btch: 31 usd: 0 > CPU 14: hi: 186, btch: 31 usd: 29 > CPU 15: hi: 186, btch: 31 usd: 36 > CPU 16: hi: 186, btch: 31 usd: 0 > CPU 17: hi: 186, btch: 31 usd: 0 > CPU 18: hi: 186, btch: 31 usd: 0 > CPU 19: hi: 186, btch: 31 usd: 22 > CPU 20: hi: 186, btch: 31 usd: 76 > CPU 21: hi: 186, btch: 31 usd: 161 > CPU 22: hi: 186, btch: 31 usd: 73 > CPU 23: hi: 186, btch: 31 usd: 78 > CPU 24: hi: 186, btch: 31 usd: 125 > CPU 25: hi: 186, btch: 31 usd: 73 > CPU 26: hi: 186, btch: 31 usd: 72 > CPU 27: hi: 186, btch: 31 usd: 38 > CPU 28: hi: 186, btch: 31 usd: 118 > CPU 29: hi: 186, btch: 31 usd: 63 > CPU 30: hi: 186, btch: 31 usd: 0 > CPU 31: hi: 186, btch: 31 usd: 0 > CPU 32: hi: 186, btch: 31 usd: 0 > CPU 33: hi: 186, btch: 31 usd: 0 > CPU 34: hi: 186, btch: 31 usd: 0 > CPU 35: hi: 186, btch: 31 usd: 0 > CPU 36: hi: 186, btch: 31 usd: 0 > CPU 37: hi: 186, btch: 31 usd: 0 > CPU 38: hi: 186, btch: 31 usd: 0 > CPU 39: hi: 186, btch: 31 usd: 0 > Node 0 Normal per-cpu: > CPU 0: hi: 186, btch: 31 usd: 188 > CPU 1: hi: 186, btch: 31 usd: 174 > CPU 2: hi: 186, btch: 31 usd: 116 > CPU 3: hi: 186, btch: 31 usd: 96 > CPU 4: hi: 186, btch: 31 usd: 169 > CPU 5: hi: 186, btch: 31 usd: 137 > CPU 6: hi: 186, btch: 31 usd: 102 > CPU 7: hi: 186, btch: 31 usd: 96 > CPU 8: hi: 186, btch: 31 usd: 149 > CPU 9: hi: 186, btch: 31 usd: 203 > CPU 10: hi: 186, btch: 31 usd: 179 > CPU 11: hi: 186, btch: 31 usd: 181 > CPU 12: hi: 186, btch: 31 usd: 159 > CPU 13: hi: 186, btch: 31 usd: 160 > CPU 14: hi: 186, btch: 31 usd: 179 > CPU 15: hi: 186, btch: 31 usd: 158 > CPU 16: hi: 186, btch: 31 usd: 180 > CPU 17: hi: 186, btch: 31 usd: 155 > CPU 18: hi: 186, btch: 31 usd: 156 > CPU 19: hi: 186, btch: 31 usd: 173 > CPU 20: hi: 186, btch: 31 usd: 161 > CPU 21: hi: 186, btch: 31 usd: 163 > CPU 22: hi: 186, btch: 31 usd: 123 > CPU 23: hi: 186, btch: 31 usd: 157 > CPU 24: hi: 186, btch: 31 usd: 19 > CPU 25: hi: 186, btch: 31 usd: 159 > CPU 26: hi: 186, btch: 31 usd: 176 > CPU 27: hi: 186, btch: 31 usd: 140 > CPU 28: hi: 186, btch: 31 usd: 180 > CPU 29: hi: 186, btch: 31 usd: 169 > CPU 30: hi: 186, btch: 31 usd: 158 > CPU 31: hi: 186, btch: 31 usd: 181 > CPU 32: hi: 186, btch: 31 usd: 181 > CPU 33: hi: 186, btch: 31 usd: 159 > CPU 34: hi: 186, btch: 31 usd: 168 > CPU 35: hi: 186, btch: 31 usd: 173 > CPU 36: hi: 186, btch: 31 usd: 161 > CPU 37: hi: 186, btch: 31 usd: 179 > CPU 38: hi: 186, btch: 31 usd: 184 > CPU 39: hi: 186, btch: 31 usd: 182 > Node 1 Normal per-cpu: > CPU 0: hi: 186, btch: 31 usd: 164 > CPU 1: hi: 186, btch: 31 usd: 167 > CPU 2: hi: 186, btch: 31 usd: 160 > CPU 3: hi: 186, btch: 31 usd: 171 > CPU 4: hi: 186, btch: 31 usd: 175 > CPU 5: hi: 186, btch: 31 usd: 184 > CPU 6: hi: 186, btch: 31 usd: 169 > CPU 7: hi: 186, btch: 31 usd: 159 > CPU 8: hi: 186, btch: 31 usd: 171 > CPU 9: hi: 186, btch: 31 usd: 185 > CPU 10: hi: 186, btch: 31 usd: 86 > CPU 11: hi: 186, btch: 31 usd: 73 > CPU 12: hi: 186, btch: 31 usd: 29 > CPU 13: hi: 186, btch: 31 usd: 62 > CPU 14: hi: 186, btch: 31 usd: 91 > CPU 15: hi: 186, btch: 31 usd: 114 > CPU 16: hi: 186, btch: 31 usd: 69 > CPU 17: hi: 186, btch: 31 usd: 117 > CPU 18: hi: 186, btch: 31 usd: 17 > CPU 19: hi: 186, btch: 31 usd: 20 > CPU 20: hi: 186, btch: 31 usd: 163 > CPU 21: hi: 186, btch: 31 usd: 161 > CPU 22: hi: 186, btch: 31 usd: 175 > CPU 23: hi: 186, btch: 31 usd: 172 > CPU 24: hi: 186, btch: 31 usd: 177 > CPU 25: hi: 186, btch: 31 usd: 171 > CPU 26: hi: 186, btch: 31 usd: 157 > CPU 27: hi: 186, btch: 31 usd: 171 > CPU 28: hi: 186, btch: 31 usd: 182 > CPU 29: hi: 186, btch: 31 usd: 183 > CPU 30: hi: 186, btch: 31 usd: 40 > CPU 31: hi: 186, btch: 31 usd: 157 > CPU 32: hi: 186, btch: 31 usd: 76 > CPU 33: hi: 186, btch: 31 usd: 157 > CPU 34: hi: 186, btch: 31 usd: 63 > CPU 35: hi: 186, btch: 31 usd: 62 > CPU 36: hi: 186, btch: 31 usd: 58 > CPU 37: hi: 186, btch: 31 usd: 36 > CPU 38: hi: 186, btch: 31 usd: 19 > CPU 39: hi: 186, btch: 31 usd: 47 > active_anon:6762239 inactive_anon:12245334 isolated_anon:0 > active_file:10705451 inactive_file:11321321 isolated_file:0 > unevictable:5516901 dirty:457441 writeback:0 unstable:0 > free:1028680 slab_reclaimable:3791349 slab_unreclaimable:13461537 > mapped:1567688 shmem:904717 pagetables:362061 bounce:0 > free_cma:0 > Node 0 DMA free:15532kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes > lowmem_reserve[]: 0 1668 128642 128642 > Node 0 DMA32 free:509204kB min:1164kB low:1452kB high:1744kB active_anon:6016kB inactive_anon:13164kB active_file:2584kB inactive_file:5756kB unevictable:1112kB isolated(anon):0kB isolated(file):0kB present:2045980kB managed:1708904kB mlocked:1224kB dirty:20kB writeback:0kB mapped:4164kB shmem:3900kB slab_reclaimable:463020kB slab_unreclaimable:671072kB kernel_stack:1224kB pagetables:4268kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 126973 126973 > Node 0 Normal free:189488kB min:88804kB low:111004kB high:133204kB active_anon:16411204kB inactive_anon:24056744kB active_file:19949588kB inactive_file:22435040kB unevictable:8645768kB isolated(anon):0kB isolated(file):0kB present:132120576kB managed:130020968kB mlocked:8657300kB dirty:1773000kB writeback:0kB mapped:2878728kB shmem:1624252kB slab_reclaimable:6834156kB slab_unreclaimable:29014116kB kernel_stack:65736kB pagetables:711944kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 0 0 > Node 1 Normal free:3393536kB min:90240kB low:112800kB high:135360kB active_anon:10638288kB inactive_anon:24911428kB active_file:22869632kB inactive_file:22844488kB unevictable:13420724kB isolated(anon):0kB isolated(file):0kB present:134217728kB managed:132117884kB mlocked:13427472kB dirty:56744kB writeback:0kB mapped:3387860kB shmem:1990716kB slab_reclaimable:7868220kB slab_unreclaimable:24160960kB kernel_stack:90600kB pagetables:731280kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 0 0 > Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 0*64kB 1*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (MR) = 15532kB > Node 0 DMA32: 612*4kB (UEM) 1081*8kB (UEM) 295*16kB (UEM) 412*32kB (UEM) 288*64kB (UEM) 118*128kB (UEM) 63*256kB (UEM) 49*512kB (UEM) 30*1024kB (UEM) 29*2048kB (UEM) 77*4096kB (UMR) = 509256kB > Node 0 Normal: 44891*4kB (UE) 0*8kB 0*16kB 5*32kB (R) 4*64kB (R) 2*128kB (R) 1*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 1*4096kB (R) = 186124kB > Node 1 Normal: 467218*4kB (UEM) 145540*8kB (UEM) 18995*16kB (UEM) 1482*32kB (UMR) 5*64kB (R) 1*128kB (R) 2*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 1*4096kB (R) = 3391128kB > 23967743 total pagecache pages > 1034858 pages in swap cache > Swap cache stats: add 154167590, delete 153132732, find 2698616740/2708213552 > Free swap = 16464476kB > Total swap = 41943036kB > 67100065 pages RAM > 0 pages HighMem/MovableOnly > 524961 pages reserved > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jan Kara SUSE Labs, CR