Date: Wed, 31 Jul 2013 08:35:30 +0900
From: Minchan Kim
To: Michal Hocko
Cc: Peter Zijlstra, Andrew Morton, Dave Jones, "Aneesh Kumar K.V", Rik van Riel, KAMEZAWA Hiroyuki, Hillf Danton, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] hugetlb: fix lockdep splat caused by pmd sharing
Message-ID: <20130730233530.GA19340@bbox>
References: <20130730142957.GG15847@dhcp22.suse.cz> <1375195560-23888-1-git-send-email-mhocko@suse.cz> <20130730145834.GA32226@laptop.programming.kicks-ass.net> <20130730152333.GJ15847@dhcp22.suse.cz>
In-Reply-To: <20130730152333.GJ15847@dhcp22.suse.cz>

On Tue, Jul 30, 2013 at 05:23:33PM +0200, Michal Hocko wrote:
> On Tue 30-07-13 16:58:34, Peter Zijlstra wrote:
> > On Tue, Jul 30, 2013 at 04:46:00PM +0200, Michal Hocko wrote:
> [...]
> > > +/*
> > > + * Now, reclaim path never holds hugetlbfs_inode->i_mmap_mutex while it could
> > > + * hold normal inode->i_mmap_mutex so this annotation avoids a lockdep splat.
> >
> > How about something like:
> >
> > /*
> >  * Hugetlbfs is not reclaimable; therefore its i_mmap_mutex will never
> >  * be taken from reclaim -- unlike regular filesystems. This needs an
> >  * annotation because huge_pmd_share() does an allocation under
> >  * i_mmap_mutex.
> >  */
> >
> > It clarifies the exact conditions and makes it easier to verify the
> > validity of the annotation.
>
> Yes, looks much better. Thanks!
> ---
> From 673cbe2ca7df0decd7320987d97585660542e468 Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Tue, 30 Jul 2013 17:22:14 +0200
> Subject: [PATCH] hugetlb: fix lockdep splat caused by pmd sharing
>
> Dave has reported the following lockdep splat:
> [128095.470960] =================================
> [128095.471315] [ INFO: inconsistent lock state ]
> [128095.471660] 3.11.0-rc1+ #9 Not tainted
> [128095.472156] ---------------------------------
> [128095.472905] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
> [128095.473650] kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [128095.474373] (&mapping->i_mmap_mutex){+.+.?.}, at: [] page_referenced+0x87/0x5e3
> [128095.475128] {RECLAIM_FS-ON-W} state was registered at:
> [128095.475866] [] mark_held_locks+0x81/0xe7
> [128095.476597] [] lockdep_trace_alloc+0x5e/0xbc
> [128095.477322] [] __alloc_pages_nodemask+0x8b/0x9b6
> [128095.478049] [] __get_free_pages+0x20/0x31
> [128095.478769] [] get_zeroed_page+0x12/0x14
> [128095.479477] [] __pmd_alloc+0x1c/0x6b
> [128095.480138] [] huge_pmd_share+0x265/0x283
> [128095.480138] [] huge_pte_alloc+0x5d/0x71
> [128095.480138] [] hugetlb_fault+0x7c/0x64a
> [128095.480138] [] handle_mm_fault+0x255/0x299
> [128095.480138] [] __do_page_fault+0x142/0x55c
> [128095.480138] [] do_page_fault+0xd/0x16
> [128095.480138] [] error_code+0x6c/0x74
> [128095.480138] irq event stamp: 3136917
> [128095.480138] hardirqs last enabled at (3136917): [] _raw_spin_unlock_irq+0x27/0x50
> [128095.480138] hardirqs last disabled at (3136916): [] _raw_spin_lock_irq+0x15/0x78
> [128095.480138] softirqs last enabled at (3136180): [] __do_softirq+0x137/0x30f
> [128095.480138] softirqs last disabled at (3136175): [] irq_exit+0xa8/0xaa
> [128095.480138]
> other info that might help us debug this:
> [128095.480138] Possible unsafe locking scenario:
> [128095.480138] CPU0
> [128095.480138] ----
> [128095.480138] lock(&mapping->i_mmap_mutex);
> [128095.480138]
> [128095.480138] lock(&mapping->i_mmap_mutex);
> [128095.480138]
> *** DEADLOCK ***
> [128095.480138] no locks held by kswapd0/49.
> [128095.480138]
> stack backtrace:
> [128095.480138] CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
> [128095.480138] Hardware name: Dell Inc. Precision WorkStation 490 /0DT031, BIOS A08 04/25/2008
> [128095.480138] c1d32630 00000000 ee39fb18 c15b001e ee395780 ee39fb54 c15acdcb c1751845
> [128095.480138] c1751bbf 00000031 00000000 00000000 00000000 00000000 00000001 00000001
> [128095.480138] c1751bbf 00000008 ee395c44 00000100 ee39fb88 c10a6130 00000008 0000d8fb
> [128095.480138] Call Trace:
> [128095.480138] [] dump_stack+0x4b/0x79
> [128095.480138] [] print_usage_bug+0x1d9/0x1e3
> [128095.480138] [] mark_lock+0x1e0/0x261
> [128095.480138] [] ? check_usage_backwards+0x109/0x109
> [128095.480138] [] __lock_acquire+0x623/0x17f2
> [128095.480138] [] ? sched_clock_cpu+0xcd/0x130
> [128095.480138] [] ? sched_clock_local+0x42/0x12e
> [128095.480138] [] lock_acquire+0x7d/0x195
> [128095.480138] [] ? page_referenced+0x87/0x5e3
> [128095.480138] [] mutex_lock_nested+0x6c/0x3a7
> [128095.480138] [] ? page_referenced+0x87/0x5e3
> [128095.480138] [] ? page_referenced+0x87/0x5e3
> [128095.480138] [] ? mem_cgroup_charge_statistics.isra.24+0x61/0x9e
> [128095.480138] [] page_referenced+0x87/0x5e3
> [128095.480138] [] ? raid0_congested+0x26/0x8a [raid0]
> [128095.480138] [] shrink_page_list+0x3d9/0x947
> [128095.480138] [] ? trace_hardirqs_on+0xb/0xd
> [128095.480138] [] shrink_inactive_list+0x155/0x4cb
> [128095.480138] [] shrink_lruvec+0x300/0x5ce
> [128095.480138] [] shrink_zone+0x53/0x14e
> [128095.480138] [] kswapd+0x517/0xa75
> [128095.480138] [] ? mem_cgroup_shrink_node_zone+0x280/0x280
> [128095.480138] [] kthread+0xa8/0xaa
> [128095.480138] [] ? trace_hardirqs_on+0xb/0xd
> [128095.480138] [] ret_from_kernel_thread+0x1b/0x28
> [128095.480138] [] ? insert_kthread_work+0x63/0x63
>
> which is a false positive caused by hugetlb pmd sharing code which
> allocates a new pmd from within mapping->i_mmap_mutex. If this
> allocation causes reclaim then the lockdep detector complains that we
> might self-deadlock.
>
> This is not correct though, because hugetlb pages are not reclaimable so
> their mapping will never be touched from the reclaim path.
>
> The patch tells lockdep that hugetlb i_mmap_mutex is special
> by assigning it a separate lockdep class so it won't report possible
> deadlocks on unrelated mappings.
>
> [peterz@infradead.org: comment for annotation]
> Reported-by: Dave Jones
> Signed-off-by: Michal Hocko

Reviewed-by: Minchan Kim

Thanks, Michal!
The only remaining thing is Dave's testing.

--
Kind regards,
Minchan Kim
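
For anyone following along, here is a rough user-space sketch of why the splat is a false positive and why a separate class silences it. It assumes only what the changelog says: lockdep records usage per lock class, not per lock instance, so an allocation under a hugetlbfs mapping's mutex and a reclaim-time acquisition of a regular mapping's mutex look like the same lock. All names below (toy_mutex, mark_usage, splat_reported, the usage bits) are made up for illustration and are not kernel code.

```c
#include <stdbool.h>

/* Usage bits recorded per lock class. A toy stand-in for lockdep's
 * RECLAIM_FS-ON-W / IN-RECLAIM_FS-W states, not the kernel's encoding. */
enum usage {
	HELD_OVER_ALLOC  = 1 << 0, /* held across an allocation that may reclaim */
	TAKEN_IN_RECLAIM = 1 << 1, /* acquired from the reclaim path */
};

/* Hypothetical lock class: state is shared by every lock assigned to it. */
struct lock_class {
	const char *name;
	unsigned int usage;
};

/* Hypothetical mutex that only remembers which class it belongs to. */
struct toy_mutex {
	struct lock_class *class;
};

/* Record a usage bit on the lock's class. Returns true once the class has
 * been seen in both states, which is the condition the splat reports. */
static bool mark_usage(struct toy_mutex *m, enum usage bit)
{
	m->class->usage |= bit;
	return (m->class->usage & HELD_OVER_ALLOC) &&
	       (m->class->usage & TAKEN_IN_RECLAIM);
}

/* Replays the two events from the report. `separate` selects whether the
 * hugetlbfs mapping gets its own class, as the patch description says. */
static bool splat_reported(bool separate)
{
	struct lock_class common = { "i_mmap_mutex", 0 };
	struct lock_class huge = { "hugetlbfs i_mmap_mutex", 0 };
	struct toy_mutex regular_mapping = { &common };
	struct toy_mutex hugetlb_mapping = { separate ? &huge : &common };
	bool bad = false;

	/* huge_pmd_share(): pmd allocation under the hugetlbfs mapping's mutex */
	bad |= mark_usage(&hugetlb_mapping, HELD_OVER_ALLOC);
	/* kswapd: page_referenced() takes a regular mapping's mutex */
	bad |= mark_usage(&regular_mapping, TAKEN_IN_RECLAIM);
	return bad;
}
```

In this model splat_reported(false) returns true (both mutexes share one class, so the two usages collide) and splat_reported(true) returns false. In the kernel, a class is normally assigned with lockdep_set_class(); the actual hunk is elided in the quote above, so it is not reproduced here.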