Date: Wed, 29 Mar 2017 07:49:17 -0400
From: Brian Foster <bfoster@redhat.com>
To: Michal Hocko
Cc: Ilya Dryomov, Greg Kroah-Hartman, "linux-kernel@vger.kernel.org",
	stable@vger.kernel.org, Sergey Jerusalimov, Jeff Layton,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations
Message-ID: <20170329114917.GB24681@bfoster.bfoster>
References: <20170328122559.966310440@linuxfoundation.org>
	<20170328122601.905696872@linuxfoundation.org>
	<20170328124312.GE18241@dhcp22.suse.cz>
	<20170328133040.GJ18241@dhcp22.suse.cz>
	<20170329104126.GF27994@dhcp22.suse.cz>
	<20170329110515.GA24681@bfoster.bfoster>
	<20170329111833.GJ27994@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170329111833.GJ27994@dhcp22.suse.cz>
User-Agent: Mutt/1.8.0 (2017-02-23)

On Wed, Mar 29, 2017 at 01:18:34PM +0200, Michal Hocko wrote:
> On Wed 29-03-17 13:14:42, Ilya Dryomov wrote:
> > On Wed, Mar 29, 2017 at 1:05 PM, Brian Foster wrote:
> > > On Wed, Mar 29, 2017 at 12:41:26PM +0200, Michal Hocko wrote:
> > >> [CC xfs guys]
> > >>
> > >> On Wed 29-03-17 11:21:44, Ilya Dryomov wrote:
> > >> [...]
> > >> > This is a set of stack traces from
> > >> > http://tracker.ceph.com/issues/19309 (linked in the changelog):
> > >> >
> > >> > Workqueue: ceph-msgr con_work [libceph]
> > >> > ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
> > >> > 0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
> > >> > ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
> > >> > Call Trace:
> > >> > [] schedule+0x29/0x70
> > >> > [] schedule_timeout+0x1bd/0x200
> > >> > [] ? ttwu_do_wakeup+0x2c/0x120
> > >> > [] ? ttwu_do_activate.constprop.135+0x66/0x70
> > >> > [] wait_for_completion+0xbf/0x180
> > >> > [] ? try_to_wake_up+0x390/0x390
> > >> > [] flush_work+0x165/0x250
> > >>
> > >> I suspect this is xlog_cil_push_now -> flush_work(&cil->xc_push_work)
> > >> right? I kind of got lost where this waits on an IO.
> > >>
> > >
> > > Yep. That means a CIL push is already in progress. We wait on that to
> > > complete here. After that, the resulting task queues execution of
> > > xlog_cil_push_work()->xlog_cil_push() on m_cil_workqueue. That task
> > > may submit I/O to the log.
> > >
> > > I don't see any reference to xlog_cil_push() anywhere in the traces
> > > here or in the bug referenced above, however..?
> >
> > Well, it's prefaced with "Interesting is:"... Sergey (the original
> > reporter, CCed here) might still have the rest of them.
>
> JFTR
> http://tracker.ceph.com/attachments/download/2769/full_kern_trace.txt
>
> [288420.754637] Workqueue: xfs-cil/rbd1 xlog_cil_push_work [xfs]
> [288420.754638] ffff880130c1fb38 0000000000000046 ffff880130c1fac8 ffff880130d72180
> [288420.754640] 0000000000012b00 ffff880130c1fad8 ffff880130c1ffd8 0000000000012b00
> [288420.754641] ffff8810297b6480 ffff880130d72180 ffffffffa03b1264 ffff8820263d6800
> [288420.754643] Call Trace:
> [288420.754652] [] ? xlog_bdstrat+0x34/0x70 [xfs]
> [288420.754653] [] schedule+0x29/0x70
> [288420.754661] [] xlog_state_get_iclog_space+0xdc/0x2e0 [xfs]
> [288420.754669] [] ? xlog_bdstrat+0x34/0x70 [xfs]
> [288420.754670] [] ? try_to_wake_up+0x390/0x390
> [288420.754678] [] xlog_write+0x190/0x730 [xfs]
> [288420.754686] [] xlog_cil_push+0x24e/0x3e0 [xfs]
> [288420.754693] [] xlog_cil_push_work+0x15/0x20 [xfs]
> [288420.754695] [] process_one_work+0x159/0x4f0
> [288420.754697] [] process_scheduled_works+0x2c/0x40
> [288420.754698] [] worker_thread+0x29b/0x530
> [288420.754699] [] ? create_worker+0x1d0/0x1d0
> [288420.754701] [] kthread+0xc9/0xe0
> [288420.754703] [] ? flush_kthread_worker+0x90/0x90
> [288420.754705] [] ret_from_fork+0x58/0x90
> [288420.754707] [] ? flush_kthread_worker+0x90/0x90

Ah, thanks. According to the above, xfs-cil is waiting on log space to
free up. This means xfs-cil is probably in:

  xlog_state_get_iclog_space()
    ->xlog_wait(&log->l_flush_wait, &log->l_icloglock);

l_flush_wait is awoken during log I/O completion handling via the
xfs-log workqueue. That guy is here:

[288420.773968] INFO: task kworker/6:3:420227 blocked for more than 300 seconds.
[288420.773986] Not tainted 3.18.43-40 #1
[288420.773997] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[288420.774017] kworker/6:3 D ffff880103893650 0 420227 2 0x00000000
[288420.774027] Workqueue: xfs-log/rbd1 xfs_log_worker [xfs]
[288420.774028] ffff88010357fac8 0000000000000046 0000000000000000 ffff880103893240
[288420.774030] 0000000000012b00 ffff880146361128 ffff88010357ffd8 0000000000012b00
[288420.774031] ffff8810297b7540 ffff880103893240 ffff88010357fae8 ffff88010357fbf8
[288420.774033] Call Trace:
[288420.774035] [] schedule+0x29/0x70
[288420.774036] [] schedule_timeout+0x1bd/0x200
[288420.774038] [] ? ttwu_do_wakeup+0x2c/0x120
[288420.774040] [] ? ttwu_do_activate.constprop.135+0x66/0x70
[288420.774042] [] wait_for_completion+0xbf/0x180
[288420.774043] [] ? try_to_wake_up+0x390/0x390
[288420.774044] [] flush_work+0x165/0x250
[288420.774046] [] ? worker_detach_from_pool+0xd0/0xd0
[288420.774054] [] xlog_cil_force_lsn+0x81/0x200 [xfs]
[288420.774056] [] ? dequeue_entity+0x17c/0x520
[288420.774063] [] _xfs_log_force+0x6e/0x280 [xfs]
[288420.774065] [] ? put_prev_entity+0x3c/0x2e0
[288420.774067] [] ? sched_clock_cpu+0x95/0xd0
[288420.774068] [] ? __switch_to+0xf2/0x5f0
[288420.774076] [] xfs_log_force+0x39/0xe0 [xfs]
[288420.774083] [] xfs_log_worker+0x28/0x50 [xfs]
[288420.774085] [] process_one_work+0x159/0x4f0
[288420.774086] [] worker_thread+0x11b/0x530
[288420.774088] [] ? create_worker+0x1d0/0x1d0
[288420.774089] [] kthread+0xc9/0xe0
[288420.774091] [] ? flush_kthread_worker+0x90/0x90
[288420.774093] [] ret_from_fork+0x58/0x90
[288420.774095] [] ? flush_kthread_worker+0x90/0x90

... which is back waiting on xfs-cil.

Ilya,

Have you looked into this[1] patch by any chance? Note that 7a29ac474
("xfs: give all workqueues rescuer threads") may also be a potential
band aid for this. Or IOW, the lack thereof in v3.18.z may make this
problem more likely.
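For anyone following along without the XFS code handy, the
xlog_cil_push_now() interaction described above looks roughly like this
(a heavily simplified sketch of the code paths named in this thread,
not a verbatim copy of fs/xfs/xfs_log_cil.c; checks and locking
elided):

	/* background CIL push, runs on m_cil_workqueue */
	static void
	xlog_cil_push_work(
		struct work_struct	*work)
	{
		struct xfs_cil	*cil = container_of(work, struct xfs_cil,
							xc_push_work);

		/*
		 * Formats and writes the CIL to the log. May block in
		 * xlog_state_get_iclog_space() -> xlog_wait(&log->l_flush_wait,
		 * ...) until log I/O completion frees up iclog space.
		 */
		xlog_cil_push(cil->xc_log);
	}

	static void
	xlog_cil_push_now(
		struct xlog	*log,
		xfs_lsn_t	push_seq)
	{
		struct xfs_cil	*cil = log->l_cilp;

		/* wait for any in-progress background push to complete... */
		flush_work(&cil->xc_push_work);

		/* ...then kick off a new one (empty/already-pushed checks
		 * elided) */
		cil->xc_push_seq = push_seq;
		queue_work(log->l_mp->m_cil_workqueue, &cil->xc_push_work);
	}

So anything that reaches xlog_cil_force_lsn() ->
flush_work(&cil->xc_push_work) inherits a dependency on log I/O
completion, which is exactly the cycle visible in the two traces above.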
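FWIW, assuming the 4.4-stable backport in the subject matches the
upstream libceph change, the fix being discussed is to wrap the socket
allocation in the PF_MEMALLOC_NOIO scope API so that every allocation
beneath it is implicitly GFP_NOIO. A sketch of that shape (hypothetical
function name, error handling trimmed; not the verbatim diff):

	#include <linux/err.h>
	#include <linux/net.h>
	#include <linux/sched.h>	/* memalloc_noio_save/restore */

	static struct socket *ceph_tcp_sock_create(struct net *net,
						   int family)
	{
		struct socket *sock;
		unsigned int noio_flag;
		int ret;

		/*
		 * sock_create_kern() allocates with GFP_KERNEL internally.
		 * Mark the task PF_MEMALLOC_NOIO for the duration so that
		 * reclaim entered from those allocations cannot issue fs
		 * I/O and wait on the very XFS workqueues this ceph-msgr
		 * worker is holding up.
		 */
		noio_flag = memalloc_noio_save();
		ret = sock_create_kern(net, family, SOCK_STREAM,
				       IPPROTO_TCP, &sock);
		memalloc_noio_restore(noio_flag);

		return ret ? ERR_PTR(ret) : sock;
	}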
Brian

[1] http://www.spinics.net/lists/linux-xfs/msg04886.html

> --
> Michal Hocko
> SUSE Labs