Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751735AbdFGHTk (ORCPT ); Wed, 7 Jun 2017 03:19:40 -0400 Received: from mx2.suse.de ([195.135.220.15]:38297 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751514AbdFGHTi (ORCPT ); Wed, 7 Jun 2017 03:19:38 -0400 Subject: Re: Sleeping BUG in khugepaged for i586 To: Larry Finger , Andrew Morton Cc: LKML , linux-mm@kvack.org, David Rientjes References: <968ae9a9-5345-18ca-c7ce-d9beaf9f43b6@lwfinger.net> <20170605144401.5a7e62887b476f0732560fa0@linux-foundation.org> <1e883924-9766-4d2a-936c-7a49b337f9e2@lwfinger.net> From: Vlastimil Babka Message-ID: <9ab81c3c-e064-66d2-6e82-fc9bac125f56@suse.cz> Date: Wed, 7 Jun 2017 09:19:36 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1 MIME-Version: 1.0 In-Reply-To: <1e883924-9766-4d2a-936c-7a49b337f9e2@lwfinger.net> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2672 Lines: 64 On 06/06/2017 05:01 PM, Larry Finger wrote: > On 06/06/2017 09:02 AM, Vlastimil Babka wrote: >> On 06/05/2017 11:44 PM, Andrew Morton wrote: >>> On Sat, 3 Jun 2017 14:24:26 -0500 Larry Finger wrote: >>> >>>> I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which >>>> requires a 32-bit kernel. In the log I found the following: >>>> >>>> BUG: sleeping function called from invalid context at mm/khugepaged.c:655 >>>> in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged >>>> 1 lock held by khugepaged/20: >>>> #0: (&mm->mmap_sem){++++++}, at: [] >>>> collapse_huge_page.isra.47+0x439/0x1240 >>>> CPU: 0 PID: 20 Comm: khugepaged Tainted: G W >> >> W means thre was WARN earler. Could be related... Got logs? > > When I grabbed a splat, I got the last one in my log. The first one shows "Not > tainted". > >> >>>> 4.12.0-rc1-wl-12125-g952a068 #80 >> >> What is "wl-12125-g952a068"? What patches on top of mainline? > > I found this while chasing a problem with one of the wireless drivers. For that > reason I use Kalle Valo's wireless-testing-next, which happens to be the only > kernel tree I have on this laptop. I'm reasonably certain that the extra updates > are not the cause of the problem as the first one appears before any of the > wireless drivers are loaded, but I will pull a clean copy of mainline to test > that assumption. > >>>> Hardware name: Dell Computer Corporation Latitude D600 >>>> /03U652, BIOS A05 05/29/2003 >>>> Call Trace: >>>> dump_stack+0x76/0xb2 >>>> ___might_sleep+0x174/0x230 >>>> collapse_huge_page.isra.47+0xacf/0x1240 >>>> khugepaged_scan_mm_slot+0x41e/0xc00 >>>> ? _raw_spin_lock+0x46/0x50 >>>> khugepaged+0x277/0x4f0 >>>> ? prepare_to_wait_event+0xe0/0xe0 >>>> kthread+0xeb/0x120 >>>> ? khugepaged_scan_mm_slot+0xc00/0xc00 >>>> ? kthread_create_on_node+0x30/0x30 >>>> ret_from_fork+0x21/0x30 >>>> >>>> I have no idea when this problem was introduced. Of course, I will test any >>>> proposed fixes. >>>> >>> >>> Odd. There's nothing wrong with cond_resched() while holding mmap_sem. >>> It looks like khugepaged forgot to do a spin_unlock somewhere and we >>> leaked a preempt_count. >> >> Hmm I'd expect such spin lock to be reported together with mmap_sem in >> the debugging "locks held" message? > > My bisection of the problem is about half done. My latest good version is commit > 7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go. Hmm, your bisection will most likely just find commit 338a16ba15495 which added the cond_resched() at mm/khugepaged.c:655. CCing David who added it.